pbh5tools
pbh5tools is a collection of tools that can manipulate the content of, or
extract data from, two types of h5 files:
- cmp.h5: files that contain alignment information.
- bas.h5 and pls.h5: files that contain base-call information.
pbh5tools consists of two executables: cmph5tools.py and
bash5tools.py. At the moment, the cmph5tools.py program
provides a rich set of tools to manipulate and analyze the data in a
cmp.h5 file. The bash5tools.py program provides mechanisms to extract
basecall information from bas.h5 files.
Installation
To install pbh5tools, run the following command from the pbh5tools root directory:
python setup.py install
Tool: bash5tools.py
bash5tools.py can extract read sequences and quality values for
both Raw and circular consensus sequencing (CCS) read types and use
them to create FASTQ and FASTA files.
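To make the difference between the two output types concrete, here is a minimal sketch of how a single read could be written as a FASTA or FASTQ record. It is not taken from bash5tools.py: the function names and the read name are hypothetical, and the quality string assumes the common Phred+33 encoding.

```python
def to_fasta(name, seq):
    """Format one read as a FASTA record: a '>' header line, then the sequence."""
    return ">{}\n{}\n".format(name, seq)


def to_fastq(name, seq, quals):
    """Format one read as a FASTQ record.

    quals is a list of integer quality values, one per base. They are
    encoded as ASCII characters with an offset of 33 (Phred+33), which is
    an assumption of this sketch.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality values differ in length")
    qual_string = "".join(chr(q + 33) for q in quals)
    return "@{}\n{}\n+\n{}\n".format(name, seq, qual_string)


if __name__ == "__main__":
    # "read1" is a made-up read name used only for illustration.
    print(to_fasta("read1", "ACGT"), end="")
    print(to_fastq("read1", "ACGT", [10, 20, 30, 40]), end="")
```

FASTA keeps only the sequence, while FASTQ also carries one quality character per base, which is why quality values are only relevant for the fastq output type.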
Usage
usage: bash5tools.py [-h] [--verbose] [--version] [--profile] [--debug]
                     [--outFilePrefix OUTFILEPREFIX]
                     [--readType {ccs,subreads,unrolled}] [--outType OUTTYPE]
                     [--minLength MINLENGTH] [--minReadScore MINREADSCORE]
                     [--minPasses MINPASSES]
                     input.bas.h5
Tool for extracting data from .bas.h5 files
positional arguments:
  input.bas.h5          input .bas.h5 filename
optional arguments:
  -h, --help            show this help message and exit
  --verbose, -v         Set the verbosity level (default: None)
  --version             show program's version number and exit
  --profile             Print runtime profile at exit (default: False)
  --debug               Run within a debugger session (default: False)
  --outFilePrefix OUTFILEPREFIX
                        output filename prefix [None]
  --readType {ccs,subreads,unrolled}
                        read type (ccs, subreads, or unrolled) []
  --outType OUTTYPE     output file type (fasta, fastq) [fasta]
Read filtering arguments:
  --minLength MINLENGTH
                        min read length [0]
  --minReadScore MINREADSCORE
                        min read score, valid only with
                        --readType={unrolled,subreads} [0]
  --minPasses MINPASSES
                        min number of CCS passes, valid only with
                        --readType=ccs [0]
Examples
Extracting all raw (unrolled) reads from input.bas.h5 without any filtering
and exporting to FASTA (myreads.fasta):
python bash5tools.py input.bas.h5 --outFilePrefix myreads --outType fasta --readType unrolled
Extracting all CCS reads from input.bas.h5 that have read lengths
larger than 100 and exporting to FASTQ (myreads.fastq):
python bash5tools.py input.bas.h5 --outFilePrefix myreads --outType fastq --readType ccs --minLength 100
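When bash5tools.py is driven from a script, it can help to assemble the command line programmatically and reject option combinations that the usage text marks as invalid. The following is a rough sketch under that assumption: build_bash5tools_cmd is a hypothetical helper, not part of pbh5tools, and it uses only the flags and choices listed in the usage text above.

```python
READ_TYPES = ("ccs", "subreads", "unrolled")  # choices from the usage text
OUT_TYPES = ("fasta", "fastq")                # output types from the usage text


def build_bash5tools_cmd(input_file, out_prefix, out_type="fasta",
                         read_type=None, min_length=None,
                         min_read_score=None, min_passes=None):
    """Return an argv list for bash5tools.py (hypothetical helper)."""
    if out_type not in OUT_TYPES:
        raise ValueError("unknown output type: {}".format(out_type))
    if read_type is not None and read_type not in READ_TYPES:
        raise ValueError("unknown read type: {}".format(read_type))
    # The usage text restricts these two filters to specific read types.
    if min_read_score is not None and read_type not in ("unrolled", "subreads"):
        raise ValueError("--minReadScore needs --readType unrolled or subreads")
    if min_passes is not None and read_type != "ccs":
        raise ValueError("--minPasses needs --readType ccs")

    cmd = ["python", "bash5tools.py", input_file,
           "--outFilePrefix", out_prefix, "--outType", out_type]
    if read_type is not None:
        cmd += ["--readType", read_type]
    if min_length is not None:
        cmd += ["--minLength", str(min_length)]
    if min_read_score is not None:
        cmd += ["--minReadScore", str(min_read_score)]
    if min_passes is not None:
        cmd += ["--minPasses", str(min_passes)]
    return cmd


if __name__ == "__main__":
    # Print the assembled command as a single shell-style line.
    print(" ".join(build_bash5tools_cmd(
        "input.bas.h5", "myreads", out_type="fastq",
        read_type="ccs", min_length=100)))
```

The returned list can be passed to subprocess.run(cmd, check=True) once bash5tools.py is available in the working directory.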
Tool: cmph5tools.py
cmph5tools.py is a multi-command tool that provides access to
the following subtools:
1. merge: Merge multiple cmp.h5 files into a single file.
2. sort: Sort a cmp.h5 file.
3. select: Create a new file from a cmp.h5 file by specifying
which reads to include.
4. equal: Compare the contents of 2 cmp.h5 files for
equivalence.
5. summarize: Summarize the contents of a cmp.h5 file in a
verbose, human readable format.
6. stats: Extract summary metrics from a cmp.h5 file into a
csv file.
7. valid: Determine whether a cmp.h5 file is valid.
8. listMetrics: Emit the available metrics and statistics for use
in the select and stats subcommands.
To list all available subtools provided by cmph5tools.py simply
run:
cmph5tools.py --help
Each subtool has its own usage information which can be generated by running:
cmph5tools.py <toolname> --help
When running any subtool, consider passing the --info command-line
argument, which prints progress information to stdout while the script
is running:
cmph5tools.py <toolname> --info <other arguments>
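The invocation pattern above can also be wrapped in a small helper when cmph5tools.py is called from Python. This is only a sketch: build_cmph5tools_cmd is a hypothetical function, the subtool names are copied from the list in this section, and aligned.cmp.h5 is a placeholder file name.

```python
SUBTOOLS = ("merge", "sort", "select", "equal",
            "summarize", "stats", "valid", "listMetrics")


def build_cmph5tools_cmd(toolname, *args, info=True):
    """Return an argv list of the form: cmph5tools.py <toolname> --info <args>."""
    if toolname not in SUBTOOLS:
        raise ValueError("unknown subtool: {}".format(toolname))
    cmd = ["cmph5tools.py", toolname]
    if info:
        # Ask the subtool to print progress information while it runs.
        cmd.append("--info")
    cmd.extend(args)
    return cmd


if __name__ == "__main__":
    # aligned.cmp.h5 is a made-up file name used only for illustration.
    print(" ".join(build_cmph5tools_cmd("summarize", "aligned.cmp.h5")))
```

Checking the subtool name before launching the process gives an early, readable error for a mistyped name; the subtool-specific arguments are passed through unchanged and are best looked up with cmph5tools.py <toolname> --help.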