Version history¶
Version 0.6.1¶
2014-02-27
- added parser classes for BED and Wiggle format
Patch versions:
- 0.6.1p1 (2014-04-13)
- Fixed incorrect version tag
Version 0.6.0¶
2014-02-26
- Several changes and improvements to htseq-count:
- BAM files can now be read natively. (New option
--format
) - Paired-end SAM files can be used also if sorted by position. No need any mroe to sort by name. (New option
--order
.) - Documentation extended by a FAQ section.
- Default for
--minaqual
is now 10. (was: 0)
- BAM files can now be read natively. (New option
- New chapter in documentation, with more information on counting reads.
- New function
pair_SAM_alignments_with_buffer
to implement pairing for position-sorted SAM files.
Version 0.5.4¶
2013-02-20
Various bugs fixed, including
- GFF_Reader interpreted the constructor’s “end_included” flag in the wrong way, hence the end position of intervals of GFF features was off by 1 base pair before
- htseq-count no longer warns about missing chromosomes, as this warning was often misleading. Also, these reads are no properly included in the “no_feature” count.
- default for “max_qual” in “htseq-qa” is now 41, to accommodate newer Illumina FASTQ files
- BAM_Reader used to incorrectly label single-end reads as paired-end
Patch versions:
- v0.5.4p1 (2013-02-22):
- changed default for GFF_Reader to end_included=True, which is actually the correct style for Ensemble GTF files. Now the behavious should be as it was before.
- v0.5.4p2 (2013-04-18):
- fixed issue blocking proper built on Windows
- v0.5.4p3 (2013-04-29):
- htseq-count now correctly skips over “M0” cigar operations
- v0.5.4p4 (2013-08-28):
- added
.get_original_line()
function toVariantCall
- firex a bug with reads not being read as paired if they were not flagged as proper pair
- added
- v0.5.4p5 (2013-10-02/2013-10-10):
- parsing of GFF attribute field no longer fails on quoted semicolons
- fixed issue with get_line_number_string
Version 0.5.3¶
2011-06-29
- added the ‘–stranded=reverse’ option to htseq-count
Patch versions:
- v0.5.3p1 (2011-07-15):
- fix a bug in pair_sam_Alignment (many thanks for Justin Powell for finding the bug and suggesting a patch)
- v0.5.3p2 (2011-09-15)
- fixed a bug (and a documentation bug) in trim_left/right_end_with_quals
- v0.5.3p3 (2011-09-15)
- p2 was built improperly
- v0.5.3p5 (2012-05-29)
- added ‘to_line’ function to VariantCall objects and ‘meta_info’ function to VCF_Reader objects to print VCF-lines / -headers respectively
- v0.5.3p5b (2012-06-01) - added ‘flag’ field to SAM_Alignment objects and fixed ‘get_sam_line’ function of those
- v0.5.3p6 (2012-06-11) - fixed mix-up between patches p3, p4 and p5
- v0.5.3p7 (2012-06-13) - switched global pysam import to on-demand version
- v0.5.3p9ur1 (2012-08-31) - corrected get_sam_line: tab isntead of space between optional fields
Version 0.5.1¶
2011-05-03
- added steps method to GenomicArray
Patch versions:
- v0.5.1p1 (2011-05-11):
- fixed a bug in step_vector.h causing linkage failure under GCC 4.2
- v0.5.1p2 (2011-05-12):
- fixed pickling
- v0.5.1p3 (2011-05-22):
- fixed quality plot in htseq-qa (top pixel row, for quality score 40, was cut off)
Version 0.5.0¶
2011-04-21
- refactoring of GenomicArray class:
- field
step_vectors
replaced withchrom_vector
. These now contain dicts of dicts ofChromVector
objects rather thanStepVector
ones. chrom_vectors
is now always a dict of dict, even for unstranded GenomicArrays to make it easier to loop over them. (The inner dict has either keys"+"
and"-"
, or just one key,"."
.)- The new
ChromVector
class wraps the actual vector and supports three different storage modes:step
,ndarray
andmemmap
, the latter two being numpy arrays, without and with memory mapping. - The
GenomicArray
constructor now take two new arguments, one for the storage class, one for the memmap directory (if needed). - The
add_value
methods had been replaced with an__iadd__
method, to enable the+=
semantics. - Similarily,
+=
forGenomicArrayOfSets
adds an element to the sets. - Instead of
get_steps
, now usesteps
.
- field
- new parser class
VCF_Reader
and record classVariantCall
- new parser class
BAM_Reader
, to add BAM support (including indexed random access) (requires PySam to be installed) - new documentation page A detailed use case: TSS plots
Fasta_Reader
now allows indexed access to Fasta files (requires Pysam to be installed)- peek function removed
Patch Versions:
- v0.5.0p1 (2011-04-22):
- build was incomplete; fixed
- v0.5.0p2 (2011-04-22):
- build was still faulty; new try
- v0.5.0p3 (2011-04-26)
- fixed regression bug in htseq-count
Version 0.4.7¶
2010-12-22
- added new option
-o
(or--samout
) to htseq-count
Patch versions:
- Version 0.4.7p1 (2011-02-14)
- bug fix: GFF files with empty attribute fiels are now read correctly
- Version 0.4.7p2 (2011-03-13)
- fixed assertion error in pair_SAM_alignment, triggered by incorrect flags
- Version 0.4.7p3 (2011-03-15)
- fixed problem due to SAM_Alignment.peek (by removing the method)
- Version 0.4.7p4 (2011-03-18)
- removed left-over debugging print statement
Version 0.4.6¶
2010-12-09
- pair_SAM_alignments now handles multiple matches properly
- SAM_Alignments now allows access to optional fields via the new methods optional_field and optional_fields
- htseq-count now skips reads that are non-uniquely mapped according to the ‘NH’ optional field
- updated documentation
Patch versions:
- Version 0.4.6p1 (2010-12-17)
- updated htseq-count documentation page
- htseq-count now accepts ‘-‘ as SAM file name
- Version 0.4.6p2 (2012-12-21)
- corrected a bug in htseq-count regarding the handling of warnings and added SAM_Reader.peek.
Version 0.4.5¶
2010-08-30
- correction to GenomicArray.get_steps() when called without arguments
- correction to FileOrSequence.get_line_number_string
- removed use of urllib’s quote and unquote in GFF parsing/writing
- GFF_Reader now stores “meta information”
- qa.py now gives progress report
- auto add chrom now also works on read access
- refactored CIGAR parser
- added bool fields to SAM_Alignment for all flag bits
Patch versions:
- Version 0.4.5p1 (2010-10-08)
- correction of a mistake in CIGAR checking, misreading symbol “N”
- Version 0.4.5p2 (2010-10-13)
- Sequence.add_bases_to_count_array and hence htseq-qa now accepts ‘.’ instead of ‘N’ in a fastq file
- Version 0.4.5p3 (2010-10-20)
- fixed error reporting for PE in htseq-count
- Version 0.4.5p4 (2010-10-21)
- fixed another error reporting for PE in htseq-count
- Version 0.4.5p5 (2010-10-28)
- Not only ‘N’ but also ‘S’ was read the wrong way. Fixed.
- Cython had some odd way handling properties overloading attributes, which caused issues with ‘Alignment.read’. Worked around.
- Version 0.4.5p6 (2010-11-02)
- write_to_fastq should not break lines. Fixed.
- Version 0.4.5p7 (2010-11-16)
- added fallback to distutils in case setuptools in unavailable
- fixed documentation of ‘-a’ option to htseq-count
Version 0.4.4¶
2010-05-19
- StepVectors (and hence also GenomicArrays) now notice if, when setting the value of a step, this value is equal to an adjacent step and merge the steps.
- GenomicArray’s constructor now allows the special value
"auto"
for its first arguments in order to start without chromosomes and automatically add them when first encountered.
Patch versions:
- Version 0.4.4p1 (2010-05-26):
- minor change to make it run on Python 2.5 again
- changed ‘str’ to ‘bytes’ at various places, now compiles with Cython 0.12 (but no longer with Cython 0.11 and Python 2.5)
- Version 0.4.4p2 (2010-06-05):
- change to SAM parser: if flag “query unmapped is set” but RNAME is not “*”, a warning (rather than an error) is issued
- Version 0.4.4p3 (2010-06-25)
- again removed an “except sth as e”
- Version 0.4.4p4 (2010-07-12)
- dto.
- Version 0.4.4p5 (2010-07-13)
- rebuilt with Cython 0.12.1 (previous one was accidently built with Cython 0.11.1, causing it to fail with Python 2.5)
- Version 0.4.4p6 (2010-07-21)
- fixed bug in error reporting in count.py
- losened GFF attribute parsing
- changed “mio” to “millions” in qa output
- improved error reporting in GFF parser
- made SAM parsing more tolerant
Version 0.4.3¶
2010-05-01
New argument to constructer of GFF_Reader: end_include
- Version 0.4.3-p1 (2010-05-04): version number was messed up; fixed
- Version 0.4.3-p2 (2010-05-15): fixed ‘-q’ option in htseq-count
- Version 0.4.3-p3 (2010-05-15): parse_GFF_attribute_string can now deal with empty fields; score treated as float, not int
- Version 0.4.3-p3 (2010-05-15): - parse_GFF_attribute_string can now deal with empty fields; score treated as float, not int - fixed bug in SAM_Reader: can now deal with SAM files with 11 columns - SAM_Alignment._tags is now a list of strings
- Version 0.4.3-p4 (2010-05-16): bumped version number again just to make sure
Version 0.4.2¶
2010-04-19
Bug fixes to htseq-count and pair_SAM_alignments. Bumped version number to avoid confusion.
- Version 0.4.2-p1 (2010-04-20): there was still a bug left in htseq-count, fixed.
- Version 0.4.2-p2 (2010-04-26): bug fix: adapter trimming failed if the adapter was completely included in the sequence
- Version 0.4.2-p3
- Version 0.4.2-p4 (2010-04-29): bug fix: error in warning when htseq-count encountered an unknown chromosome
- Version 0.4.2-p5 (2010-04-30): bug fixes: error in warning when PE positions are mismatched, and misleading error when calling get_steps with unstranded interval in a stranded array
Version 0.4.1¶
2010-04-19
Bug fixes:
- Fixed bug in
htseq-count
: CIGAR strings with gaps were not correctly handled - Fixed bug in Tour (last section, on counting): An wrong indent, and accidental
change to the
exons
variable invalidated data. - SolexaExportReader no longer complains about multiplexing (indexing) not being supported.
- Mention link to example data in Tour.
- Fix installation instructions. (
--user
does not work for Python 2.5.)
Enhancements:
- Paired-end support for SAM_Alignment.
- “_as_pos” attributes for GenomicInterval
Version 0.4.0¶
2010-04-07
First “official” release, i.e., uploaded to PyPI and announced at SeqAnswers