Package org.biojava.nbio.structure.io
Class FastaStructureParser
java.lang.Object
org.biojava.nbio.structure.io.FastaStructureParser
Reads a protein sequence from a fasta file and attempts to match it to a
3D structure. Any gaps ('-') in the fasta file are preserved as null atoms in
the output, allowing structural alignments to be read from fasta files.
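The gap-preservation idea above can be sketched in plain Java. This is a standalone illustration rather than BioJava's implementation: `GapMapping`, `mapAligned`, and the integer residue identifiers are hypothetical stand-ins for the parser's internal matching and for real residue objects.

```java
import java.util.Arrays;

// Standalone illustration; GapMapping and mapAligned are hypothetical names,
// not part of BioJava.
final class GapMapping {
    /**
     * Maps each column of an aligned sequence to a residue identifier.
     * Gap columns ('-') become null; every other column consumes the next identifier.
     */
    static Integer[] mapAligned(String aligned, int[] residueIds) {
        Integer[] mapped = new Integer[aligned.length()];
        int next = 0;
        for (int col = 0; col < aligned.length(); col++) {
            if (aligned.charAt(col) == '-') {
                mapped[col] = null; // gap: no residue at this column
            } else if (next < residueIds.length) {
                mapped[col] = residueIds[next++];
            }
            // columns beyond the available identifiers stay null
        }
        return mapped;
    }

    public static void main(String[] args) {
        Integer[] mapped = mapAligned("AC--DE", new int[] {10, 11, 12, 13});
        System.out.println(Arrays.toString(mapped)); // prints [10, 11, null, null, 12, 13]
    }
}
```

Because the output keeps one entry per alignment column, two sequences from the same alignment can be compared position by position.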
Structures are loaded from an AtomCache. For this to work, the accession
for each protein should be parsed from the fasta header line into a form
understood by AtomCache.getStructure(String).
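As a rough sketch of what a header parser needs to produce, the snippet below pulls the first whitespace-delimited token out of a fasta header line. `HeaderAccession` and `accessionFrom` are hypothetical names, the header layout is an assumption, and whether a given token (for example `4HHB.A`) is accepted depends on the identifier forms supported by the AtomCache in use.

```java
// Standalone illustration; HeaderAccession is a hypothetical helper,
// not a BioJava header parser.
final class HeaderAccession {
    /** Returns the first whitespace-delimited token after an optional leading '>'. */
    static String accessionFrom(String headerLine) {
        String header = headerLine.startsWith(">") ? headerLine.substring(1) : headerLine;
        header = header.trim();
        int end = 0;
        while (end < header.length() && !Character.isWhitespace(header.charAt(end))) {
            end++;
        }
        return header.substring(0, end);
    }

    public static void main(String[] args) {
        // Assumed header layout: accession first, free-text description after.
        System.out.println(accessionFrom(">4HHB.A hemoglobin alpha chain")); // prints 4HHB.A
    }
}
```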
Lowercase letters are sometimes used to specify unaligned residues.
This information can be preserved by using a CasePreservingSequenceCreator,
which allows the case of residues to be accessed through the
AbstractSequence.getUserCollection() method.
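The case-preservation idea can be illustrated without the library: record one flag per residue before normalizing the sequence to uppercase. `CaseRecord` is a hypothetical type used only for this sketch. It is not the CasePreservingSequenceCreator, and the way BioJava actually encodes case in the user collection may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Standalone illustration; CaseRecord is a hypothetical type, not a BioJava class.
final class CaseRecord {
    final String upper;               // sequence normalized to uppercase
    final List<Boolean> wasLowercase; // one flag per residue, in sequence order

    private CaseRecord(String upper, List<Boolean> wasLowercase) {
        this.upper = upper;
        this.wasLowercase = wasLowercase;
    }

    /** Records the case of every character, then uppercases the sequence. */
    static CaseRecord of(String raw) {
        List<Boolean> flags = new ArrayList<>(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            flags.add(Character.isLowerCase(raw.charAt(i)));
        }
        return new CaseRecord(raw.toUpperCase(Locale.ROOT), flags);
    }

    public static void main(String[] args) {
        CaseRecord record = CaseRecord.of("ACdeF");
        System.out.println(record.upper);        // prints ACDEF
        System.out.println(record.wasLowercase); // prints [false, false, true, true, false]
    }
}
```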
Author: Spencer Bliven
Constructor Summary
FastaStructureParser(File file, SequenceHeaderParserInterface<ProteinSequence, AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache)
FastaStructureParser(InputStream is, SequenceHeaderParserInterface<ProteinSequence, AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache)
FastaStructureParser(FastaReader<ProteinSequence, AminoAcidCompound> reader, AtomCache cache)
Method Summary
String[] getAccessions()
Gets the protein accessions mapped from the Fasta file.
ResidueNumber[][] getResidues()
For each residue in the fasta file, return the ResidueNumber in the corresponding structure.
ProteinSequence[] getSequences()
Gets the protein sequences read from the Fasta file.
Structure[] getStructures()
Gets the protein structures mapped from the Fasta file.
void process()
Parses the fasta file and loads it into memory.
-
Constructor Details
-
FastaStructureParser
public FastaStructureParser(InputStream is, SequenceHeaderParserInterface<ProteinSequence, AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache)
-
FastaStructureParser
public FastaStructureParser(File file, SequenceHeaderParserInterface<ProteinSequence, AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache) throws FileNotFoundException
Throws:
FileNotFoundException
-
FastaStructureParser
public FastaStructureParser(FastaReader<ProteinSequence, AminoAcidCompound> reader, AtomCache cache)
-
-
Method Details
-
process
Parses the fasta file and loads it into memory. Information can be subsequently accessed through getSequences(), getStructures(), getResidues(), and getAccessions().
Throws:
IOException
StructureException
-
getSequences
Gets the protein sequences read from the Fasta file. Returns null if process() has not been called.
Returns:
An array of ProteinSequences from parsing the fasta file, or null if process() hasn't been called.
-
getStructures
Gets the protein structures mapped from the Fasta file. Returns null if process() has not been called.
Returns:
An array of Structures for each protein in the fasta file, or null if process() hasn't been called.
-
getResidues
For each residue in the fasta file, return the ResidueNumber in the corresponding structure. If the residue cannot be found in the structure, that entry will be null. This can happen if that residue was not included in the PDB file (e.g., disordered residues), if the fasta sequence does not match the PDB sequence, or if errors occur during the matching process.
Returns:
A 2D array of ResidueNumbers, or null if process() hasn't been called.
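Since entries of the returned array may be null, callers usually need a null check before using them. A minimal sketch of one such check counts the matched residues per sequence. `ResidueCoverage` is a hypothetical helper, and `Object[][]` stands in for `ResidueNumber[][]` so the example compiles without the library.

```java
// Standalone illustration; ResidueCoverage is a hypothetical helper.
// Object[][] stands in for the ResidueNumber[][] returned by getResidues().
final class ResidueCoverage {
    /** Counts the non-null (matched) entries in each row of a 2D mapping. */
    static int[] matchedPerSequence(Object[][] residues) {
        int[] counts = new int[residues.length];
        for (int i = 0; i < residues.length; i++) {
            for (Object residue : residues[i]) {
                if (residue != null) {
                    counts[i]++; // this fasta position was found in the structure
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Object[][] residues = { {"1", "2", null}, {null, null} };
        int[] counts = matchedPerSequence(residues);
        System.out.println(counts[0] + " " + counts[1]); // prints 2 0
    }
}
```

A row with a low count relative to its length may point to a sequence mismatch or to disordered regions missing from the structure.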
-
getAccessions
Gets the protein accessions mapped from the Fasta file. Returns null if process() has not been called.
Returns:
An array of accessions, one for each protein in the fasta file, or null if process() hasn't been called.
-