Class SequenceUtil

java.lang.Object
org.biojava.nbio.data.sequence.SequenceUtil

public final class SequenceUtil extends Object
Utility class for operations on sequences
Since:
3.0.2
Version:
1.0
Author:
Peter Troshin
  • Field Details

    • WHITE_SPACE

      public static final Pattern WHITE_SPACE
      A whitespace character: [\t\n\x0B\f\r]
    • DIGIT

      public static final Pattern DIGIT
      A digit
    • NONWORD

      public static final Pattern NONWORD
      Non word
    • AA

      public static final Pattern AA
      Valid Amino acids
    • NON_AA

      public static final Pattern NON_AA
      Inversion of the AA pattern
    • AMBIGUOUS_AA

      public static final Pattern AMBIGUOUS_AA
      Same as AA pattern but with two additional letters - XU
    • NUCLEOTIDE

      public static final Pattern NUCLEOTIDE
      Nucleotides a, t, g, c, u
    • AMBIGUOUS_NUCLEOTIDE

      public static final Pattern AMBIGUOUS_NUCLEOTIDE
      Ambiguous nucleotide
    • NON_NUCLEOTIDE

      public static final Pattern NON_NUCLEOTIDE
      Non nucleotide
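
The fields above are compiled java.util.regex.Pattern objects, so they are used through Pattern.matcher(...). The regular expressions behind them are not shown on this page; the sketch below uses hypothetical stand-ins (the class name, the regexes, and the helper methods are all assumptions) only to illustrate how such constants are typically applied.

```java
import java.util.regex.Pattern;

// Hypothetical stand-ins: the real SequenceUtil regexes may differ.
class PatternFieldSketch {
    static final Pattern WHITE_SPACE = Pattern.compile("\\s");
    static final Pattern DIGIT = Pattern.compile("\\d");
    static final Pattern NONWORD = Pattern.compile("\\W");

    // find() reports whether the pattern occurs anywhere in the input.
    static boolean containsDigit(String s) {
        return DIGIT.matcher(s).find();
    }

    static boolean containsNonWord(String s) {
        return NONWORD.matcher(s).find();
    }

    static boolean containsWhitespace(String s) {
        return WHITE_SPACE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsDigit("ACG7T"));
        System.out.println(containsNonWord("AC-GT"));
        System.out.println(containsWhitespace("ACGT"));
    }
}
```

Note that find() looks for a match anywhere in the input, while matches() requires the whole input to match; a single-character pattern such as the ones assumed here is normally used with find().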
  • Method Details

    • isNucleotideSequence

      public static boolean isNucleotideSequence(FastaSequence s)
      Returns:
      true if the sequence contains only the letters a, c, t, g, u
    • isNonAmbNucleotideSequence

      public static boolean isNonAmbNucleotideSequence(String sequence)
      Ambiguous DNA characters: AGTCRYMKSWHBVDN. Only one of these, B, is not also a valid protein character.
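
To make the distinction between plain and ambiguous nucleotide alphabets concrete, here is a minimal sketch. It assumes the non-ambiguous alphabet is a, c, g, t, u (as in the NUCLEOTIDE field) and takes the ambiguous DNA characters from the list above; the class and method names are hypothetical and this is not the library's implementation.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SequenceUtil.
class NucleotideCheckSketch {
    // Assumed non-ambiguous alphabet: a, c, g, t, u.
    private static final Pattern UNAMBIGUOUS = Pattern.compile("[ACGTU]+");
    // Ambiguous DNA characters as listed in the method description.
    private static final Pattern AMBIGUOUS = Pattern.compile("[AGTCRYMKSWHBVDN]+");

    // True if every character is one of a, c, g, t, u (case-insensitive).
    static boolean isUnambiguous(String s) {
        return UNAMBIGUOUS.matcher(s.toUpperCase(Locale.ROOT)).matches();
    }

    // True if the sequence is valid under the ambiguous alphabet
    // but contains at least one character outside a, c, g, t, u.
    static boolean usesAmbiguityCodes(String s) {
        String seq = s.toUpperCase(Locale.ROOT);
        return AMBIGUOUS.matcher(seq).matches() && !UNAMBIGUOUS.matcher(seq).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUnambiguous("acgtu"));
        System.out.println(usesAmbiguityCodes("ACGN"));
    }
}
```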
    • cleanSequence

      public static String cleanSequence(String sequence)
      Removes all whitespace chars in the sequence string
      Parameters:
      sequence -
      Returns:
      cleaned up sequence
    • deepCleanSequence

      public static String deepCleanSequence(String sequence)
      Removes all special characters and digits as well as whitespace chars from the sequence
      Parameters:
      sequence -
      Returns:
      cleaned up sequence
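
The two cleaning methods above can be approximated with plain regular-expression replacement. The sketch below is an assumption about their behaviour based only on the descriptions: it treats "special characters" as anything matched by \W, and the class and method names are hypothetical. The real methods may perform extra normalisation that is not documented here.

```java
// Hypothetical re-implementation for illustration only.
class CleanSequenceSketch {
    // Removes all whitespace characters.
    static String clean(String sequence) {
        return sequence.replaceAll("\\s+", "");
    }

    // Removes whitespace, digits, and non-word characters.
    // Note: '_' counts as a word character for \W, so it is kept.
    static String deepClean(String sequence) {
        return sequence.replaceAll("[\\s\\d\\W]+", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("AC GT\n\tAA"));
        System.out.println(deepClean(" A1C-G\n2T "));
    }
}
```

Under these assumptions, clean keeps digits and punctuation and only strips whitespace, while deepClean leaves just letters (and underscores).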
    • isProteinSequence

      public static boolean isProteinSequence(String sequence)
      Parameters:
      sequence -
      Returns:
      true if the sequence is a protein sequence, false otherwise
    • isAmbiguosProtein

      public static boolean isAmbiguosProtein(String sequence)
      Checks whether the sequence conforms to an ambiguous protein sequence
      Parameters:
      sequence -
      Returns:
      true only if the sequence is an ambiguous protein sequence; false otherwise, e.g. if the sequence is a non-ambiguous protein or DNA
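
The relationship between the two protein checks can be sketched as follows. This assumes the AA pattern covers the 20 standard amino acid letters and that the ambiguous pattern adds X and U, as the AMBIGUOUS_AA field description says; the class name, method names, and exact regexes are hypothetical, and the library's own checks may differ in detail.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch, not the library's implementation.
class ProteinCheckSketch {
    private static final Pattern NUCLEOTIDE_ONLY = Pattern.compile("[ACGTU]+");
    // Assumed: the 20 standard amino acid one-letter codes.
    private static final Pattern AA_ONLY = Pattern.compile("[ARNDCQEGHILKMFPSTWYV]+");
    // Same alphabet plus X and U.
    private static final Pattern AMBIGUOUS_AA_ONLY = Pattern.compile("[ARNDCQEGHILKMFPSTWYVXU]+");

    private static String normalise(String s) {
        return s.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
    }

    // Protein: only standard amino acid letters, and not purely nucleotides.
    static boolean isProtein(String s) {
        String seq = normalise(s);
        return !NUCLEOTIDE_ONLY.matcher(seq).matches() && AA_ONLY.matcher(seq).matches();
    }

    // Ambiguous protein: valid only once X and U are allowed,
    // so plain proteins and nucleotide sequences return false.
    static boolean isAmbiguousProtein(String s) {
        String seq = normalise(s);
        return !NUCLEOTIDE_ONLY.matcher(seq).matches()
                && !AA_ONLY.matcher(seq).matches()
                && AMBIGUOUS_AA_ONLY.matcher(seq).matches();
    }

    public static void main(String[] args) {
        System.out.println(isProtein("MKVLA"));
        System.out.println(isAmbiguousProtein("MKXLA"));
    }
}
```

The nucleotide test comes first because the letters a, c, g, t are also valid amino acid codes, so a DNA string would otherwise pass the protein check.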
    • writeFasta

      public static void writeFasta(OutputStream outstream, List<FastaSequence> sequences, int width) throws IOException
      Writes a list of FastaSequence objects to the outstream, formatting each sequence so that it contains width chars on each line
      Parameters:
      outstream -
      sequences -
      width - the maximum number of characters to write in one line
      Throws:
      IOException
    • readFasta

      public static List<FastaSequence> readFasta(InputStream inStream) throws IOException
      Reads fasta sequences from inStream into the list of FastaSequence objects
      Parameters:
      inStream - the stream to read the sequences from
      Returns:
      list of FastaSequence objects
      Throws:
      IOException
    • writeFasta

      public static void writeFasta(OutputStream os, List<FastaSequence> sequences) throws IOException
      Writes the FastaSequence objects to the output stream; each sequence takes one line only
      Parameters:
      os -
      sequences -
      Throws:
      IOException
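
The FASTA reading and writing methods above can be illustrated with a small self-contained sketch. It does not use the library: FastaEntry is a hypothetical stand-in for FastaSequence (an id plus a sequence string), and the line-wrapping and parsing rules are assumptions based on the usual FASTA layout (a ">" header line followed by sequence lines).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FastaSequence.
class FastaEntry {
    final String id;
    final String sequence;
    FastaEntry(String id, String sequence) { this.id = id; this.sequence = sequence; }
}

class FastaIoSketch {
    // Writes ">id" followed by the sequence wrapped at `width` characters per line.
    static void write(OutputStream out, List<FastaEntry> entries, int width) throws IOException {
        if (width <= 0) throw new IllegalArgumentException("width must be positive");
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        for (FastaEntry e : entries) {
            w.write(">" + e.id + "\n");
            for (int i = 0; i < e.sequence.length(); i += width) {
                int len = Math.min(width, e.sequence.length() - i);
                w.write(e.sequence, i, len);
                w.write("\n");
            }
        }
        w.flush(); // closing the stream is left to the caller
    }

    // One line per sequence: equivalent to an unlimited width.
    static void write(OutputStream out, List<FastaEntry> entries) throws IOException {
        write(out, entries, Integer.MAX_VALUE);
    }

    // Reads entries, joining wrapped sequence lines back together.
    static List<FastaEntry> read(InputStream in) throws IOException {
        List<FastaEntry> result = new ArrayList<>();
        BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String id = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = r.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                if (id != null) result.add(new FastaEntry(id, seq.toString()));
                id = line.substring(1);
                seq.setLength(0);
            } else if (id != null) {
                seq.append(line);
            }
        }
        if (id != null) result.add(new FastaEntry(id, seq.toString()));
        return result;
    }

    // Convenience wrappers over in-memory streams for quick experiments.
    static String format(List<FastaEntry> entries, int width) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            write(out, entries, width);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<FastaEntry> parse(String text) {
        try {
            return read(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a ten-letter sequence written with a width of 4 produces lines of 4, 4, and 2 characters, and reading the result back yields the original unwrapped sequence.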