Class BitSequenceReader.BitArrayWorker<C extends Compound>

java.lang.Object
org.biojava.nbio.core.sequence.storage.BitSequenceReader.BitArrayWorker<C>
Type Parameters:
C - The Compound to use
Direct Known Subclasses:
FourBitSequenceReader.FourBitArrayWorker, TwoBitSequenceReader.TwoBitArrayWorker
Enclosing class:
BitSequenceReader<C extends Compound>

public abstract static class BitSequenceReader.BitArrayWorker<C extends Compound> extends Object
The logic of working with bits has been separated out into this class so that developers can create bit-packed data structures without having to put the code into an intermediate format, and can use the format without copying this code. This class behaves just like a Sequence but does not implement the Sequence interface.
Author:
ayates
  • Field Details

  • Constructor Details

    • BitArrayWorker

      public BitArrayWorker(Sequence<C> sequence)
    • BitArrayWorker

      public BitArrayWorker(String sequence, CompoundSet<C> compoundSet)
    • BitArrayWorker

      public BitArrayWorker(CompoundSet<C> compoundSet, int length)
    • BitArrayWorker

      public BitArrayWorker(CompoundSet<C> compoundSet, int[] sequence)
  • Method Details

    • bitMask

      protected abstract byte bitMask()
      This method should return the bit mask used to extract the bits you are interested in working with. See the concrete implementations for how to create these.
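      The mask isolates a single compound once the packed int has been shifted into place. The following is a standalone sketch, not the BioJava code: `BitMaskDemo` and `extract` are hypothetical names, and the low-bits-first slot order is an assumption.

```java
// Hypothetical sketch: building an n-bit mask and using it to read one packed slot.
// Assumes the first compound sits in the lowest-order bits of the int; the real
// BioJava subclasses may order the slots differently.
class BitMaskDemo {

    /** Mask with the lowest bitsPerCompound bits set, e.g. 0b11 for 2-bit encoding. */
    static byte bitMask(int bitsPerCompound) {
        return (byte) ((1 << bitsPerCompound) - 1);
    }

    /** Reads the compound index stored in the given 0-based slot of a packed int. */
    static int extract(int packed, int slot, int bitsPerCompound) {
        int shift = slot * bitsPerCompound;
        // & 0xFF stops the byte mask sign-extending when it is widened to int.
        int mask = bitMask(bitsPerCompound) & 0xFF;
        return (packed >>> shift) & mask;
    }
}
```

      For example, with slots holding 1, 2 and 3 lowest first (`0b111001`), `extract(0b111001, 1, 2)` shifts right by 2 and masks with `0b11`, giving 2. A 4-bit encoding would use the mask `0b1111` (15) instead.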
    • compoundsPerDatatype

      protected abstract int compoundsPerDatatype()
      Should return the maximum number of compounds that can be encoded per int.
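      The number of compounds per int and the number of bits per compound are two views of the same packing arithmetic. A minimal sketch, assuming a 32-bit Java int as the backing datatype; `PackingArithmetic` and its static methods are hypothetical stand-ins for the protected instance methods on this page:

```java
// Hypothetical sketch of the packing arithmetic, assuming a 32-bit int is the
// backing datatype.
class PackingArithmetic {

    /** Number of bits in the backing datatype (a Java int). */
    static final int BITS_PER_DATATYPE = Integer.SIZE; // 32

    /** How many compounds of the given width fit into one int. */
    static int compoundsPerDatatype(int bitsPerCompound) {
        return BITS_PER_DATATYPE / bitsPerCompound;
    }

    /** The inverse: the width of each compound when this many share one int. */
    static int bitsPerCompound(int compoundsPerDatatype) {
        return BITS_PER_DATATYPE / compoundsPerDatatype;
    }
}
```

      A 2-bit encoding therefore fits 16 compounds per int and a 4-bit encoding fits 8. The integer division is only exact when the width divides 32; a 3-bit width, for instance, would fit 10 compounds and leave 2 bits unused.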
    • generateIndexToCompounds

      protected abstract List<C> generateIndexToCompounds()
      Should return the inverse of the information returned by generateCompoundsToIndex(); i.e. if the Compound C maps to 1 in generateCompoundsToIndex(), that compound should be found here at position 1.
    • generateCompoundsToIndex

      protected abstract Map<C,Integer> generateCompoundsToIndex()
      Returns the value of a compound in the backing bit storage; e.g. in 2-bit storage the value 0 is encoded as 00 (in binary).
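      The two lookup tables must be exact inverses of each other. As a sketch, with `Character` standing in for BioJava's `Compound` type and an assumed A, C, G, T ordering (both illustrative choices, not the library's), the map can be derived from the list so that the two cannot drift apart. `LookupTables` and `toBits` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain Characters stand in for BioJava Compound objects.
// The ordering A, C, G, T is an assumption for illustration only.
class LookupTables {

    /** Index -> compound, playing the role of generateIndexToCompounds(). */
    static List<Character> indexToCompounds() {
        return Collections.unmodifiableList(Arrays.asList('A', 'C', 'G', 'T'));
    }

    /** Compound -> index, playing the role of generateCompoundsToIndex(); built as the inverse. */
    static Map<Character, Integer> compoundsToIndex(List<Character> indexToCompounds) {
        Map<Character, Integer> map = new HashMap<>();
        for (int i = 0; i < indexToCompounds.size(); i++) {
            map.put(indexToCompounds.get(i), i);
        }
        return map;
    }

    /** Binary form of an index, left-padded to the encoding width, e.g. 0 -> "00". */
    static String toBits(int index, int bitsPerCompound) {
        String bin = Integer.toBinaryString(index);
        StringBuilder sb = new StringBuilder();
        for (int i = bin.length(); i < bitsPerCompound; i++) {
            sb.append('0');
        }
        return sb.append(bin).toString();
    }
}
```

      Under this ordering G maps to 2, stored as `10`, and position 2 of the list holds G again.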
    • bitsPerCompound

      protected int bitsPerCompound()
      Returns how many bits are used to represent a compound, e.g. 2 if using 2-bit encoding.
    • seqArraySize

      public int seqArraySize(int length)
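      No description is given for seqArraySize(int). Judging only from its name and signature, it presumably returns how many backing ints are needed to hold `length` compounds. The sketch below assumes that reading; `ArraySizing` and the extra `compoundsPerDatatype` parameter are hypothetical, added so the example runs standalone.

```java
// Hypothetical sketch of what seqArraySize(int) plausibly computes: the number of
// ints needed to hold `length` compounds. This is an assumption, not the BioJava source.
class ArraySizing {

    static int seqArraySize(int length, int compoundsPerDatatype) {
        // Ceiling division: a partially filled final int still needs to be allocated.
        // Written as quotient + remainder check so it cannot overflow near Integer.MAX_VALUE.
        return length / compoundsPerDatatype + (length % compoundsPerDatatype == 0 ? 0 : 1);
    }
}
```

      With 16 compounds per int, 16 compounds need one int and 17 need two.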
    • populate

      public void populate(Sequence<C> sequence)
      Loops through the Compounds in a Sequence and passes each one on to setCompoundAt(Compound, int).
    • populate

      public void populate(String sequence)
      Loops through the chars in a String and passes each one on to setCompoundAt(char, int).
    • setCompoundAt

      public void setCompoundAt(char base, int position)
      Converts the char to a Compound and sets it at the given biological (1-based) index.
    • setCompoundAt

      public void setCompoundAt(C compound, int position)
      Sets the compound at the specified biological index.
    • getCompoundAt

      public C getCompoundAt(int position)
      Returns the compound at the specified biological index.
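      The populate, setCompoundAt and getCompoundAt methods above form one round trip: each char is converted to its code, written into its slot at a 1-based position, and read back with a shift and mask. A self-contained sketch of that cycle for a 2-bit DNA encoding; `TwoBitSketch` is a hypothetical class, chars stand in for Compounds, and the "ACGT" alphabet and low-bits-first layout are assumptions rather than the library's actual layout.

```java
// Hypothetical, self-contained sketch of the populate -> setCompoundAt ->
// getCompoundAt cycle for a 2-bit DNA encoding. Assumptions: chars stand in for
// Compound objects, the alphabet is "ACGT", positions are 1-based ("biological"),
// and slots are filled lowest-order bits first. None of this is the BioJava source.
class TwoBitSketch {

    private static final String ALPHABET = "ACGT";          // index = 2-bit code
    private static final int BITS = 2;
    private static final int PER_INT = Integer.SIZE / BITS; // 16 compounds per int
    private static final int MASK = (1 << BITS) - 1;        // 0b11

    private final int length;
    private final int[] data;

    TwoBitSketch(int length) {
        this.length = length;
        this.data = new int[length / PER_INT + (length % PER_INT == 0 ? 0 : 1)];
    }

    /** Loops over the chars and hands each one to setCompoundAt(char, int). */
    void populate(String sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            setCompoundAt(sequence.charAt(i), i + 1); // 0-based loop -> 1-based position
        }
    }

    /** Converts the char to its 2-bit code and writes it at the 1-based position. */
    void setCompoundAt(char base, int position) {
        checkPosition(position);
        int code = ALPHABET.indexOf(Character.toUpperCase(base));
        if (code < 0) {
            throw new IllegalStateException("Cannot encode '" + base + "' at " + position);
        }
        int slot = position - 1;
        int shift = (slot % PER_INT) * BITS;
        // Clear the two bits for this slot, then OR in the new code.
        data[slot / PER_INT] = (data[slot / PER_INT] & ~(MASK << shift)) | (code << shift);
    }

    /** Reads the 2-bit code at the 1-based position and maps it back to a char. */
    char getCompoundAt(int position) {
        checkPosition(position);
        int slot = position - 1;
        int shift = (slot % PER_INT) * BITS;
        return ALPHABET.charAt((data[slot / PER_INT] >>> shift) & MASK);
    }

    int getLength() {
        return length;
    }

    private void checkPosition(int position) {
        if (position < 1 || position > length) {
            throw new IndexOutOfBoundsException("Position " + position + " not in 1.." + length);
        }
    }
}
```

      Clearing the slot before writing matters: without the `& ~(MASK << shift)` step, overwriting a T (`11`) with an A (`00`) would leave the old bits in place.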
    • processUnknownCompound

      protected byte processUnknownCompound(C compound, int position) throws IllegalStateException
      Since bit encoding supports only a finite number of bases, when processing a sequence you are likely to encounter a compound that the encoding does not cover, e.g. N in a 2-bit sequence. You can override this method to convert the unknown base into one you can process, or to store the locations of unknown bases for post-processing in your subclass.
      Parameters:
      compound - Compound to process
      Returns:
      Byte representation of the compound
      Throws:
      IllegalStateException - Thrown by the default implementation whenever this method is invoked
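      The override pattern described above can be sketched as follows: the default hook throws, and a subclass substitutes a placeholder code while recording where the unknown bases occurred. `StrictEncoder`, `NTolerantEncoder` and `encode` are hypothetical names, chars stand in for Compounds, and using A (code 0) as the placeholder is an arbitrary choice for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the processUnknownCompound hook. The class names are
// illustrative, not part of BioJava, and chars stand in for Compounds.
class StrictEncoder {
    protected static final String ALPHABET = "ACGT";

    byte encode(char base, int position) {
        int code = ALPHABET.indexOf(base);
        return code >= 0 ? (byte) code : processUnknownCompound(base, position);
    }

    /** Default behaviour: refuse to encode anything outside the alphabet. */
    protected byte processUnknownCompound(char base, int position) {
        throw new IllegalStateException(
            "Do not know how to encode '" + base + "' at position " + position);
    }
}

class NTolerantEncoder extends StrictEncoder {
    /** 1-based positions of bases that were replaced by the placeholder. */
    final List<Integer> unknownPositions = new ArrayList<>();

    /** Stores unknown bases as A (code 0) and remembers where they were. */
    @Override
    protected byte processUnknownCompound(char base, int position) {
        unknownPositions.add(position);
        return 0;
    }
}
```

      The substitution alone loses information, since a stored A is then indistinguishable from a real A; keeping the positions list is what allows the post-processing step to restore the Ns afterwards.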
    • getIndexToCompoundsLookup

      protected List<C> getIndexToCompoundsLookup()
      Returns a list of compounds whose index positions are used to translate from the byte representation into a compound.
    • getCompoundsToIndexLookup

      protected Map<C,Integer> getCompoundsToIndexLookup()
      Returns a map that converts from a compound to its integer representation.
    • getCompoundSet

      public CompoundSet<C> getCompoundSet()
      Returns the compound set backing this store.
    • getLength

      public int getLength()
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
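      This page only notes that hashCode and equals override Object; it does not say which fields take part. One plausible value-based contract for a packed store compares the length and the packed int array, as in this sketch. `PackedStore` is a hypothetical class, and a real implementation would likely also compare the compound set.

```java
import java.util.Arrays;

// Hypothetical sketch: value-based equality for a packed bit store. Which fields the
// real BitArrayWorker compares is not stated; length and the packed int[] are assumed.
final class PackedStore {
    private final int length;
    private final int[] data;

    PackedStore(int length, int[] data) {
        this.length = length;
        this.data = data.clone(); // defensive copy so equality cannot change later
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PackedStore)) return false; // also handles null
        PackedStore other = (PackedStore) o;
        return length == other.length && Arrays.equals(data, other.data);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals, so equal stores always hash equally.
        return 31 * length + Arrays.hashCode(data);
    }
}
```

      `Arrays.equals` and `Arrays.hashCode` compare array contents; the inherited `Object` versions would compare array identity, so two stores holding the same packed bits would otherwise be unequal.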