Package org.biojava.nbio.data.sequence
Class SequenceUtil
java.lang.Object
org.biojava.nbio.data.sequence.SequenceUtil
Utility class for operations on sequences
- Since:
- 3.0.2
- Version:
- 1.0
- Author:
- Peter Troshin
-
Field Summary
Modifier and Type | Field | Description
static final Pattern | AA | Valid amino acids
static final Pattern | AMBIGUOUS_AA | Same as the AA pattern but with two additional letters, X and U
static final Pattern | AMBIGUOUS_NUCLEOTIDE | Ambiguous nucleotide
static final Pattern | DIGIT | A digit
static final Pattern | NON_AA | Inversion of the AA pattern
static final Pattern | NON_NUCLEOTIDE | Non-nucleotide
static final Pattern | NONWORD | Non-word
static final Pattern | NUCLEOTIDE | Nucleotides a, t, g, c, u
static final Pattern | WHITE_SPACE | A whitespace character: [\t\n\x0B\f\r]
-
Method Summary
Modifier and Type | Method | Description
static String | cleanSequence(String sequence) | Removes all whitespace characters from the sequence string
static String | deepCleanSequence(String sequence) | Removes all special characters and digits, as well as whitespace characters, from the sequence
static boolean | isAmbiguosProtein(String sequence) | Checks whether the sequence conforms to an ambiguous protein sequence
static boolean | isNonAmbNucleotideSequence(String sequence) | Ambiguous DNA chars: AGTCRYMKSWHBVDN (differs from the protein alphabet in only one (!) char, B)
static boolean | isNucleotideSequence |
static boolean | isProteinSequence(String sequence) |
static List<FastaSequence> | readFasta(InputStream inStream) | Reads FASTA sequences from inStream into a list of FastaSequence objects
static void | writeFasta(OutputStream os, List<FastaSequence> sequences) | Writes FastaSequence objects to the output stream, each sequence taking one line only
static void | writeFasta(OutputStream outstream, List<FastaSequence> sequences, int width) | Writes a list of FastaSequence objects to outstream, formatting each sequence so that it contains width chars on each line
-
Field Details
-
WHITE_SPACE
A whitespace character: [\t\n\x0B\f\r]
-
DIGIT
A digit
-
NONWORD
Non-word
-
AA
Valid amino acids
-
NON_AA
Inversion of the AA pattern
-
AMBIGUOUS_AA
Same as the AA pattern but with two additional letters, X and U
-
NUCLEOTIDE
Nucleotides a, t, g, c, u
-
AMBIGUOUS_NUCLEOTIDE
Ambiguous nucleotide
-
NON_NUCLEOTIDE
Non-nucleotide
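As a rough illustration of how pattern constants like these can be applied, the sketch below defines stand-in patterns with java.util.regex and tests whole strings against them. The regular expressions are assumptions inferred from the field descriptions (this page does not show the expressions SequenceUtil actually compiles), and the class name PatternSketch and the helper matchesFully are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical stand-ins for a few of the constants described above.
// The regular expressions are assumptions, not the library's definitions.
final class PatternSketch {
    // Assumed: nucleotides a, t, g, c, u in either case
    static final Pattern NUCLEOTIDE =
            Pattern.compile("[atgcu]+", Pattern.CASE_INSENSITIVE);
    // Assumed: the 20 standard amino acid one-letter codes
    static final Pattern AA =
            Pattern.compile("[ARNDCQEGHILKMFPSTWYV]+", Pattern.CASE_INSENSITIVE);
    // Assumed: the AA alphabet plus the two additional letters X and U
    static final Pattern AMBIGUOUS_AA =
            Pattern.compile("[ARNDCQEGHILKMFPSTWYVXU]+", Pattern.CASE_INSENSITIVE);

    // True when every character of the sequence is accepted by the pattern
    static boolean matchesFully(Pattern pattern, String sequence) {
        return pattern.matcher(sequence).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesFully(NUCLEOTIDE, "ATGCUatgcu"));  // true
        System.out.println(matchesFully(AA, "MKVLAX"));              // false, X is not in AA
        System.out.println(matchesFully(AMBIGUOUS_AA, "MKVLAX"));    // true
    }
}
```

Using matches() rather than find() requires the whole string to fit the alphabet, which is usually what a validity check wants.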
-
-
Method Details
-
isNucleotideSequence
- Returns:
- true if the sequence contains only the letters a, c, t, g, u
-
isNonAmbNucleotideSequence
Ambiguous DNA chars: AGTCRYMKSWHBVDN (differs from the protein alphabet in only one (!) char, B)
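The remark that the ambiguous DNA alphabet differs from the protein alphabet in a single character can be checked with a small set difference. In this sketch the class name AlphabetDiffSketch and the helper lettersNotIn are hypothetical, and STANDARD_AA is assumed to be the 20 standard amino acid one-letter codes.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper that compares two alphabets letter by letter.
final class AlphabetDiffSketch {
    // Ambiguous DNA characters as listed in the description above
    static final String AMBIGUOUS_DNA = "AGTCRYMKSWHBVDN";
    // Assumption: the 20 standard amino acid one-letter codes
    static final String STANDARD_AA = "ARNDCQEGHILKMFPSTWYV";

    // Letters of `alphabet` that do not occur in `other`, in original order
    static Set<Character> lettersNotIn(String alphabet, String other) {
        Set<Character> result = new LinkedHashSet<>();
        for (char c : alphabet.toCharArray()) {
            if (other.indexOf(c) < 0) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lettersNotIn(AMBIGUOUS_DNA, STANDARD_AA)); // [B]
    }
}
```

Because every other ambiguous DNA letter is also an amino acid code, a string made only of those letters cannot be told apart from a protein by alphabet alone.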
cleanSequence
Removes all whitespace characters from the sequence string.
- Parameters:
- sequence
- Returns:
- cleaned up sequence
-
deepCleanSequence
Removes all special characters and digits, as well as whitespace characters, from the sequence.
- Parameters:
- sequence
- Returns:
- cleaned up sequence
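The difference between the two cleaning levels can be sketched with plain java.util.regex calls. This is a minimal stand-in under stated assumptions, not the SequenceUtil implementation: the class name CleanSequenceSketch and the methods clean and deepClean are hypothetical, and the patterns \s, \d and \W are assumed from the WHITE_SPACE, DIGIT and NONWORD descriptions.

```java
import java.util.regex.Pattern;

// Hypothetical re-creation of the two cleaning levels described above.
final class CleanSequenceSketch {
    // Assumed equivalents of WHITE_SPACE, DIGIT and NONWORD
    private static final Pattern WHITE_SPACE = Pattern.compile("\\s");
    private static final Pattern DIGIT = Pattern.compile("\\d");
    private static final Pattern NONWORD = Pattern.compile("\\W");

    // Strips whitespace only (spaces, tabs, line breaks)
    static String clean(String sequence) {
        return WHITE_SPACE.matcher(sequence).replaceAll("");
    }

    // Strips whitespace, then digits, then any remaining non-word characters
    static String deepClean(String sequence) {
        String cleaned = clean(sequence);
        cleaned = DIGIT.matcher(cleaned).replaceAll("");
        return NONWORD.matcher(cleaned).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = " 1 MKV LAA*\n 11 GHT-K \n";
        System.out.println(clean(raw));     // 1MKVLAA*11GHT-K
        System.out.println(deepClean(raw)); // MKVLAAGHTK
    }
}
```

With BioJava on the classpath, the corresponding calls would be SequenceUtil.cleanSequence(raw) and SequenceUtil.deepCleanSequence(raw); their exact output may differ from this sketch. Note that \W treats the underscore as a word character, so this sketch leaves underscores in place.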
-
isProteinSequence
- Parameters:
- sequence
- Returns:
- true if the sequence is a protein sequence, false otherwise
-
isAmbiguosProtein
Checks whether the sequence conforms to an ambiguous protein sequence.
- Parameters:
- sequence
- Returns:
- true only if the sequence is an ambiguous protein sequence; false otherwise, e.g. if the sequence is a non-ambiguous protein or DNA
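One way to read the return contract above is as a three-way classification: nucleotide, plain protein, or ambiguous protein. The sketch below is an assumption about how such checks could relate to each other, not the library's logic; the class ProteinCheckSketch, its methods, and the three alphabets are hypothetical stand-ins.

```java
import java.util.regex.Pattern;

// Hypothetical classification following the documented return values.
final class ProteinCheckSketch {
    // Assumed alphabets; the real patterns are not shown on this page
    private static final Pattern NUCLEOTIDE =
            Pattern.compile("[ATGCU]+", Pattern.CASE_INSENSITIVE);
    private static final Pattern AA =
            Pattern.compile("[ARNDCQEGHILKMFPSTWYV]+", Pattern.CASE_INSENSITIVE);
    private static final Pattern AMBIGUOUS_AA =
            Pattern.compile("[ARNDCQEGHILKMFPSTWYVXU]+", Pattern.CASE_INSENSITIVE);

    static boolean isNucleotide(String s) {
        return NUCLEOTIDE.matcher(s).matches();
    }

    // Plain protein: only standard amino acid letters, and not readable as nucleotides
    static boolean isProtein(String s) {
        return AA.matcher(s).matches() && !isNucleotide(s);
    }

    // Ambiguous protein: fits the extended alphabet, but is neither a plain
    // protein nor a nucleotide sequence
    static boolean isAmbiguousProtein(String s) {
        return AMBIGUOUS_AA.matcher(s).matches()
                && !AA.matcher(s).matches()
                && !isNucleotide(s);
    }

    public static void main(String[] args) {
        System.out.println(isAmbiguousProtein("MKXLA")); // true, X marks ambiguity
        System.out.println(isAmbiguousProtein("MKVLA")); // false, plain protein
        System.out.println(isAmbiguousProtein("ATGC"));  // false, nucleotide
    }
}
```

The nucleotide check comes first in spirit: a, t, g and c are also amino acid codes, so without it every DNA string would pass as a protein.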
-
writeFasta
public static void writeFasta(OutputStream outstream, List<FastaSequence> sequences, int width) throws IOException
Writes a list of FastaSequence objects to outstream, formatting each sequence so that it contains width chars on each line.
- Parameters:
- outstream
- sequences
- width - the maximum number of characters to write in one line
- Throws:
- IOException
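The effect of the width parameter can be sketched without the library. The code below is an illustrative assumption about the output layout (a ">id" header line followed by the sequence in chunks of at most width characters); the class FastaWrapSketch and its methods format and writeWrapped are hypothetical and take a plain id and sequence string instead of FastaSequence objects.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of width-limited FASTA output for a single record.
final class FastaWrapSketch {
    // Builds the text of one record: header line, then lines of at most `width` chars
    static String format(String id, String sequence, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(id).append('\n');
        for (int start = 0; start < sequence.length(); start += width) {
            int end = Math.min(start + width, sequence.length());
            sb.append(sequence, start, end).append('\n');
        }
        return sb.toString();
    }

    // Writes the formatted record to the stream as UTF-8 bytes
    static void writeWrapped(OutputStream out, String id, String sequence, int width)
            throws IOException {
        out.write(format(id, sequence, width).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeWrapped(out, "seq1", "MKVLAAGHTK", 4);
        System.out.print(out.toString("UTF-8"));
        // >seq1
        // MKVL
        // AAGH
        // TK
    }
}
```

Passing a width at least as long as the sequence puts the whole sequence on one line.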
-
readFasta
Reads FASTA sequences from inStream into a list of FastaSequence objects.
- Parameters:
- inStream - from
- Returns:
- list of FastaSequence objects
- Throws:
- IOException
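For orientation, reading FASTA text from a stream usually means collecting each ">" header and concatenating the lines that follow it. The sketch below shows that idea under stated assumptions: the class FastaReadSketch and its methods read and parse are hypothetical, records are returned as an ordered map of id to sequence rather than FastaSequence objects, and the parsing rules are not taken from the library.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal FASTA reader; not the SequenceUtil implementation.
final class FastaReadSketch {
    // Parses FASTA text into an ordered map of id -> sequence
    static Map<String, String> read(InputStream in) throws IOException {
        Map<String, String> records = new LinkedHashMap<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String id = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                if (id != null) {
                    records.put(id, seq.toString()); // close the previous record
                }
                id = line.substring(1).trim();
                seq.setLength(0);
            } else if (id != null) {
                seq.append(line); // sequence lines are joined without separators
            }
        }
        if (id != null) {
            records.put(id, seq.toString());
        }
        return records;
    }

    // Convenience wrapper for in-memory text; wraps the checked exception
    static Map<String, String> parse(String fasta) {
        try {
            return read(new ByteArrayInputStream(fasta.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(">a\nMKV\nLAA\n>b\nATGC\n")); // {a=MKVLAA, b=ATGC}
    }
}
```

In this sketch a repeated id overwrites the earlier record, and any text before the first header is ignored; a real reader may handle both cases differently.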
-
writeFasta
Writes FastaSequence objects to the output stream, each sequence taking one line only.
- Parameters:
- os
- sequences
- Throws:
- IOException
-