pbh5tools
pbh5tools is a collection of tools that manipulate the content of, or extract data from, two types of h5 files:
- cmp.h5: files that contain alignment information.
- bas.h5 and pls.h5: files that contain base-call information.
pbh5tools consists of two executables: cmph5tools.py and bash5tools.py. At the moment, cmph5tools.py provides a rich set of tools to manipulate and analyze the data in a cmp.h5 file, while bash5tools.py provides mechanisms to extract base-call information from bas.h5 files.
Installation
To install pbh5tools, run the following command from the pbh5tools root directory:
python setup.py install
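After installation, a quick way to confirm that the two executables are reachable is to look them up on the PATH. The sketch below assumes that the install step places cmph5tools.py and bash5tools.py on the PATH, which the text above does not state explicitly; `locate_executables` is a hypothetical helper and not part of pbh5tools.

```python
import shutil

# Executable names taken from the introduction above.
EXECUTABLES = ("cmph5tools.py", "bash5tools.py")


def locate_executables(names=EXECUTABLES):
    """Map each script name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_executables().items():
        print("%s: %s" % (name, path or "not found on PATH"))
```

A value of None for either script suggests the install location is not on the PATH of the current shell.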
Tool: bash5tools.py
bash5tools.py can extract read sequences and quality values for both raw and circular consensus sequencing (CCS) read types and use them to create FASTQ and FASTA files.
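For orientation, a FASTQ file conventionally stores each read as four lines: an `@` header, the sequence, a `+` separator, and a quality string of the same length as the sequence. The following is a minimal sketch of a reader for that four-line layout, useful for inspecting exported files. It assumes that bash5tools.py writes unwrapped four-line records, which this document does not confirm; `read_fastq` and the sample data are illustrative only.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record: %r" % header)
        yield header[1:], seq, qual


# Made-up sample records, not real bash5tools.py output.
SAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGG\n+\n!!\n"
records = list(read_fastq(io.StringIO(SAMPLE)))
print(records)
```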
Usage
usage: bash5tools.py [-h] [--verbose] [--version] [--profile] [--debug]
                     [--outFilePrefix OUTFILEPREFIX]
                     [--readType {ccs,subreads,unrolled}] [--outType OUTTYPE]
                     [--minLength MINLENGTH] [--minReadScore MINREADSCORE]
                     [--minPasses MINPASSES]
                     input.bas.h5

Tool for extracting data from .bas.h5 files

positional arguments:
  input.bas.h5          input .bas.h5 filename

optional arguments:
  -h, --help            show this help message and exit
  --verbose, -v         Set the verbosity level (default: None)
  --version             show program's version number and exit
  --profile             Print runtime profile at exit (default: False)
  --debug               Run within a debugger session (default: False)
  --outFilePrefix OUTFILEPREFIX
                        output filename prefix [None]
  --readType {ccs,subreads,unrolled}
                        read type (ccs, subreads, or unrolled) []
  --outType OUTTYPE     output file type (fasta, fastq) [fasta]

Read filtering arguments:
  --minLength MINLENGTH
                        min read length [0]
  --minReadScore MINREADSCORE
                        min read score, valid only with
                        --readType={unrolled,subreads} [0]
  --minPasses MINPASSES
                        min number of CCS passes, valid only with
                        --readType=ccs [0]
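The help text ties two of the filters to specific read types: --minReadScore applies only to unrolled and subreads, and --minPasses applies only to ccs. A small pre-flight check can catch mismatched combinations before the tool is launched. This is a sketch that only mirrors the constraints stated in the help text; `check_filter_options` is a hypothetical helper, and how bash5tools.py itself reacts to a mismatched combination is not described here.

```python
# Read types listed in the usage text.
READ_TYPES = ("ccs", "subreads", "unrolled")


def check_filter_options(read_type, min_read_score=None, min_passes=None):
    """Return a list of problems with the chosen option combination."""
    problems = []
    if read_type not in READ_TYPES:
        problems.append("unknown read type: %r" % (read_type,))
    # --minReadScore is documented as valid only for unrolled/subreads.
    if min_read_score is not None and read_type not in ("unrolled", "subreads"):
        problems.append("--minReadScore is valid only with --readType unrolled or subreads")
    # --minPasses is documented as valid only for ccs.
    if min_passes is not None and read_type != "ccs":
        problems.append("--minPasses is valid only with --readType ccs")
    return problems


if __name__ == "__main__":
    print(check_filter_options("ccs", min_passes=3))
    print(check_filter_options("ccs", min_read_score=0.8))
```

An empty list means the combination is consistent with the help text.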
Examples
Extracting all raw (unrolled) reads from input.bas.h5 without any filtering and exporting to FASTA (myreads.fasta):
python bash5tools.py input.bas.h5 --outFilePrefix myreads --outType fasta --readType unrolled
Extracting all CCS reads from input.bas.h5 that have read lengths larger than 100 and exporting to FASTQ (myreads.fastq):
python bash5tools.py input.bas.h5 --outFilePrefix myreads --outType fastq --readType ccs --minLength 100
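When the extraction is driven from a script rather than typed by hand, the same kind of invocation can be assembled as an argument list. The sketch below uses only flags that appear in the usage text, passes the input file as the positional argument, and assumes bash5tools.py is on the PATH. `build_bash5tools_cmd` and `run` are hypothetical helpers, not part of pbh5tools.

```python
import subprocess


def build_bash5tools_cmd(bas_h5, out_prefix, out_type="fasta",
                         read_type="subreads", min_length=0):
    """Assemble an argv list using only flags listed in the usage text."""
    return [
        "bash5tools.py", bas_h5,
        "--outFilePrefix", out_prefix,
        "--outType", out_type,
        "--readType", read_type,
        "--minLength", str(min_length),
    ]


def run(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.check_call(cmd)


if __name__ == "__main__":
    cmd = build_bash5tools_cmd("input.bas.h5", "myreads",
                               out_type="fastq", read_type="ccs",
                               min_length=100)
    print(" ".join(cmd))
    # run(cmd)  # uncomment where bash5tools.py and input.bas.h5 exist
```

Passing a list to subprocess avoids shell quoting issues with file names that contain spaces.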
Tool: cmph5tools.py
cmph5tools.py is a multi-command tool that provides access to the following subtools:
1. merge: Merge multiple cmp.h5 files into a single file.
2. sort: Sort a cmp.h5 file.
3. select: Create a new file from a cmp.h5 file by specifying which reads to include.
4. equal: Compare the contents of two cmp.h5 files for equivalence.
5. summarize: Summarize the contents of a cmp.h5 file in a verbose, human-readable format.
6. stats: Extract summary metrics from a cmp.h5 file into a csv file.
7. valid: Determine whether a cmp.h5 file is valid.
8. listMetrics: Emit the available metrics and statistics for use in the select and stats subcommands.
To list all available subtools provided by cmph5tools.py, simply run:
cmph5tools.py --help
Each subtool has its own usage information which can be generated by running:
cmph5tools.py <toolname> --help
When running any subtool, consider passing the --info command-line argument, which prints progress information to stdout while the script is running:
cmph5tools.py <toolname> --info <other arguments>
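The invocation pattern above can also be wrapped in a small helper that checks the subtool name against the documented list and places --info directly after it. This is a sketch under stated assumptions: `build_cmph5tools_cmd` is a hypothetical helper, cmph5tools.py is assumed to be on the PATH, and aligned.cmp.h5 is a placeholder file name. The arguments each subtool accepts are not covered here and should be taken from its --help output.

```python
# Subtool names as listed in this section.
SUBTOOLS = ("merge", "sort", "select", "equal",
            "summarize", "stats", "valid", "listMetrics")


def build_cmph5tools_cmd(toolname, other_args=(), info=True):
    """Build argv following: cmph5tools.py <toolname> --info <other arguments>."""
    if toolname not in SUBTOOLS:
        raise ValueError("unknown subtool: %r" % (toolname,))
    cmd = ["cmph5tools.py", toolname]
    if info:
        cmd.append("--info")
    cmd.extend(other_args)
    return cmd


if __name__ == "__main__":
    # Per-subtool usage information.
    print(" ".join(build_cmph5tools_cmd("stats", ["--help"], info=False)))
    # Progress reporting enabled; the file name is a placeholder.
    print(" ".join(build_cmph5tools_cmd("summarize", ["aligned.cmp.h5"])))
```

The resulting list can be handed to subprocess.check_call once the subtool's arguments have been confirmed.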