.. pbh5tools documentation master file, created by
   sphinx-quickstart on Thu Nov 10 17:09:22 2011.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

=========
pbh5tools
=========

``pbh5tools`` is a collection of tools that manipulate the content of, or
extract data from, two types of h5 files:

* ``cmp.h5``: files that contain alignment information.
* ``bas.h5`` and ``pls.h5``: files that contain base-call information.

``pbh5tools`` consists of two executables: ``cmph5tools.py`` and
``bash5tools.py``. At the moment, ``cmph5tools.py`` provides a rich set of
tools to manipulate and analyze the data in a ``cmp.h5`` file, while
``bash5tools.py`` provides mechanisms to extract base-call information from
``bas.h5`` files.

############
Installation
############

To install ``pbh5tools``, run the following command from the ``pbh5tools``
root directory: ::

    python setup.py install

####################
Tool: bash5tools.py
####################

``bash5tools.py`` can extract read sequences and quality values for both Raw
and circular consensus sequencing (CCS) read types and use them to create
``fastq`` and ``fasta`` files.
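
To make the two output formats concrete, the following is a minimal sketch of
how one read's sequence and quality values map onto a FASTA and a FASTQ
record. It is an illustration only, not the code used by ``bash5tools.py``:
the function names, the read name, and the Phred+33 quality encoding are
assumptions introduced for this example.

.. code-block:: python

    def to_fasta(name, sequence):
        """Format one read as a FASTA record (header line plus sequence)."""
        return ">{0}\n{1}\n".format(name, sequence)


    def to_fastq(name, sequence, qualities, offset=33):
        """Format one read as a FASTQ record.

        ``qualities`` holds one integer quality value per base. Each value is
        encoded as a single character using ``offset`` (33 is the common
        Phred+33 convention, assumed here).
        """
        if len(sequence) != len(qualities):
            raise ValueError("sequence and qualities must have the same length")
        encoded = "".join(chr(q + offset) for q in qualities)
        return "@{0}\n{1}\n+\n{2}\n".format(name, sequence, encoded)


    if __name__ == "__main__":
        # Hypothetical read name, bases, and quality values.
        print(to_fasta("read/0", "ACGT"), end="")
        print(to_fastq("read/0", "ACGT", [10, 20, 30, 40]), end="")

A FASTA record keeps only the sequence, whereas a FASTQ record also carries
one quality character per base, which is why ``fastq`` output requires the
quality values.
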

-----
Usage
-----

::

    usage: bash5tools.py [-h] [--verbose] [--version] [--profile] [--debug]
                         [--outFilePrefix OUTFILEPREFIX]
                         [--readType {ccs,subreads,unrolled}] [--outType OUTTYPE]
                         [--minLength MINLENGTH] [--minReadScore MINREADSCORE]
                         [--minPasses MINPASSES]
                         input.bas.h5

    Tool for extracting data from .bas.h5 files

    positional arguments:
      input.bas.h5          input .bas.h5 filename

    optional arguments:
      -h, --help            show this help message and exit
      --verbose, -v         Set the verbosity level (default: None)
      --version             show program's version number and exit
      --profile             Print runtime profile at exit (default: False)
      --debug               Run within a debugger session (default: False)
      --outFilePrefix OUTFILEPREFIX
                            output filename prefix [None]
      --readType {ccs,subreads,unrolled}
                            read type (ccs, subreads, or unrolled) []
      --outType OUTTYPE     output file type (fasta, fastq) [fasta]

    Read filtering arguments:
      --minLength MINLENGTH
                            min read length [0]
      --minReadScore MINREADSCORE
                            min read score, valid only with
                            --readType={unrolled,subreads} [0]
      --minPasses MINPASSES
                            min number of CCS passes, valid only with
                            --readType=ccs [0]

--------
Examples
--------

Extract all raw (``unrolled``) reads from ``input.bas.h5`` without any
filtering and export them to FASTA (``myreads.fasta``): ::

    python bash5tools.py input.bas.h5 --outFilePrefix myreads --outType fasta --readType unrolled

Extract all CCS reads from ``input.bas.h5`` with a minimum read length of 100
and export them to FASTQ (``myreads.fastq``): ::

    python bash5tools.py input.bas.h5 --outFilePrefix myreads --outType fastq --readType ccs --minLength 100

####################
Tool: cmph5tools.py
####################

``cmph5tools.py`` is a command-line tool that provides access to the
following subtools:

1. **merge**: Merge multiple ``cmp.h5`` files into a single file.
2. **sort**: Sort a ``cmp.h5`` file.
3. **select**: Create a new file from a ``cmp.h5`` file by specifying which
   reads to include.
4. **equal**: Compare the contents of two ``cmp.h5`` files for equivalence.
5. **summarize**: Summarize the contents of a ``cmp.h5`` file in a verbose,
   human-readable format.
6. **stats**: Extract summary metrics from a ``cmp.h5`` file into a ``csv``
   file.
7. **valid**: Determine whether a ``cmp.h5`` file is valid.
8. **listMetrics**: Emit the available metrics and statistics for use in the
   ``select`` and ``stats`` subcommands.

To list all subtools provided by ``cmph5tools.py``, run: ::

    cmph5tools.py --help

Each subtool has its own usage information, which can be displayed by
running: ::

    cmph5tools.py <subtool> --help

When running any subtool, consider passing the ``--info`` command-line
argument, which prints progress information to stdout while the script is
running: ::

    cmph5tools.py --info

.. toctree::
   :maxdepth: 2

   cmph5tools-examples

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`