org.biojava.nbio.structure.io.FastaStructureParser

public class FastaStructureParser extends Object

Reads a protein sequence from a fasta file and attempts to match it to a 3D structure. Any gaps ('-') in the fasta file are preserved as null atoms in the output, allowing structural alignments to be read from fasta files.
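The gap-preservation idea can be illustrated without the library. The following is a minimal sketch in plain Java, not BioJava's implementation: the class name GapMappingSketch and the method mapColumns are hypothetical, and 0-based indices into the ungapped sequence stand in for the atoms of the matched structure.

```java
import java.util.Arrays;

public class GapMappingSketch {

    /**
     * Maps each column of a gapped sequence to the 0-based index of the
     * corresponding residue in the ungapped sequence. Columns holding a
     * gap character ('-') map to null.
     */
    static Integer[] mapColumns(String gapped) {
        Integer[] indices = new Integer[gapped.length()];
        int next = 0; // index of the next ungapped residue
        for (int col = 0; col < gapped.length(); col++) {
            if (gapped.charAt(col) == '-') {
                indices[col] = null; // gap: no residue at this column
            } else {
                indices[col] = next++;
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        // prints [0, 1, null, 2, 3, null, null, 4]
        System.out.println(Arrays.toString(mapColumns("MK-LV--A")));
    }
}
```

Because every column keeps its position, two rows of an alignment processed this way stay column-aligned, which is what makes it possible to read a structural alignment back from a fasta file.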

Structures are loaded from an AtomCache. For this to work, the accession for each protein should be parsed from the fasta header line into a form understood by AtomCache.getStructure(String).
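As an illustration of the header requirement, here is a minimal plain-Java sketch that extracts an accession from a header line. It assumes the accession is the first whitespace-delimited token of the header; the helper accessionFromHeader and the example header are hypothetical, and the sketch does not implement SequenceHeaderParserInterface.

```java
public class HeaderAccessionSketch {

    /**
     * Returns the first whitespace-delimited token of a fasta header,
     * ignoring a leading '>' if present. Returns an empty string for a
     * header with no token.
     */
    static String accessionFromHeader(String header) {
        String h = header.startsWith(">") ? header.substring(1) : header;
        h = h.trim();
        int end = 0;
        while (end < h.length() && !Character.isWhitespace(h.charAt(end))) {
            end++;
        }
        return h.substring(0, end);
    }

    public static void main(String[] args) {
        // prints 4hhb.A
        System.out.println(accessionFromHeader(">4hhb.A hemoglobin alpha chain"));
    }
}
```

Whatever rule is used, the resulting string has to be an identifier that AtomCache.getStructure(String) can resolve; a header parser that leaves free-text description in the accession will likely cause the structure lookup to fail.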

Lowercase letters are sometimes used to specify unaligned residues. This information can be preserved by using a CasePreservingSequenceCreator, which allows the case of residues to be accessed through the AbstractSequence.getUserCollection() method.
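The kind of per-residue case information described above can be sketched in plain Java. This is an assumption-laden illustration rather than the library's code: the class CaseTrackingSketch and the method recordCase are hypothetical, and the choice of TRUE for uppercase, FALSE for lowercase, and null for characters without case (such as gaps) is an assumed encoding.

```java
import java.util.ArrayList;
import java.util.List;

public class CaseTrackingSketch {

    /**
     * Records the case of each character in a sequence string:
     * TRUE for uppercase, FALSE for lowercase, null for characters
     * that have no case (for example the gap character '-').
     */
    static List<Boolean> recordCase(String seq) {
        List<Boolean> flags = new ArrayList<>();
        for (char c : seq.toCharArray()) {
            if (Character.isUpperCase(c)) {
                flags.add(Boolean.TRUE);
            } else if (Character.isLowerCase(c)) {
                flags.add(Boolean.FALSE);
            } else {
                flags.add(null);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // prints [true, true, false, false, null, true]
        System.out.println(recordCase("MKlv-A"));
    }
}
```

Keeping the flags in a list parallel to the sequence means the residues themselves can be normalized to a single case for matching while the aligned/unaligned distinction is still available afterwards.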

Author: Spencer Bliven

Constructor Summary

Constructors

Constructor

Description

FastaStructureParser(File file, SequenceHeaderParserInterface<ProteinSequence,AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache)

FastaStructureParser(InputStream is, SequenceHeaderParserInterface<ProteinSequence,AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache)

FastaStructureParser(FastaReader<ProteinSequence,AminoAcidCompound> reader, AtomCache cache)

Method Summary

Modifier and Type

Method

Description

String[]

getAccessions()

Gets the protein accessions mapped from the Fasta file.

ResidueNumber[][]

getResidues()

For each residue in the fasta file, return the ResidueNumber in the corresponding structure.

ProteinSequence[]

getSequences()

Gets the protein sequences read from the Fasta file.

Structure[]

getStructures()

Gets the protein structures mapped from the Fasta file.

void

process()

Parses the fasta file and loads it into memory.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- FastaStructureParser
  
  public FastaStructureParser(InputStream is, SequenceHeaderParserInterface<ProteinSequence,AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache)
- FastaStructureParser
  
  public FastaStructureParser(File file, SequenceHeaderParserInterface<ProteinSequence,AminoAcidCompound> headerParser, SequenceCreatorInterface<AminoAcidCompound> sequenceCreator, AtomCache cache) throws FileNotFoundException
  
  Throws:
  
  FileNotFoundException
- FastaStructureParser
  
  public FastaStructureParser(FastaReader<ProteinSequence,AminoAcidCompound> reader, AtomCache cache)

Method Details
- process
  
  public void process() throws IOException, StructureException
  
  Parses the fasta file and loads it into memory. Information can be subsequently accessed through getSequences(), getStructures(), getResidues(), and getAccessions().
  
  Throws:
  
  IOException
  
  StructureException
- getSequences
  
  public ProteinSequence[] getSequences()
  
  Gets the protein sequences read from the Fasta file. Returns null if process() has not been called.
  
  Returns:
  
  An array of ProteinSequences from parsing the fasta file, or null if process() hasn't been called.
- getStructures
  
  public Structure[] getStructures()
  
  Gets the protein structures mapped from the Fasta file. Returns null if process() has not been called.
  
  Returns:
  
  An array of Structures for each protein in the fasta file, or null if process() hasn't been called.
- getResidues
  
  public ResidueNumber[][] getResidues()
  
  For each residue in the fasta file, return the ResidueNumber in the corresponding structure. If the residue cannot be found in the structure, that entry will be null. This can happen if the residue was not included in the PDB file (e.g. disordered residues), if the fasta sequence does not match the PDB sequence, or if errors occur during the matching process.
  
  Returns:
  
  A 2D array of ResidueNumbers, or null if process() hasn't been called.
  
  See Also:
  
  StructureSequenceMatcher.matchSequenceToStructure(ProteinSequence, Structure)
- getAccessions
  
  public String[] getAccessions()
  
  Gets the protein accessions mapped from the Fasta file. Returns null if process() has not been called.
  
  Returns:
  
  An array of accessions, one for each protein in the fasta file, or null if process() hasn't been called.
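
Since getResidues() may contain null entries, and the whole array is null before process() is called, callers need to check for both. The following plain-Java sketch counts the matched (non-null) entries per sequence. It is an illustration only: ResidueCoverageSketch and countMatched are hypothetical names, and Object[][] stands in for ResidueNumber[][] so the example runs without the library.

```java
import java.util.Arrays;

public class ResidueCoverageSketch {

    /**
     * Counts the non-null entries in each row of a 2D array.
     * Returns an empty array if the input itself is null, which
     * mirrors the "not yet processed" case.
     */
    static int[] countMatched(Object[][] residues) {
        if (residues == null) {
            return new int[0];
        }
        int[] counts = new int[residues.length];
        for (int i = 0; i < residues.length; i++) {
            for (Object residue : residues[i]) {
                if (residue != null) {
                    counts[i]++; // residue was located in the structure
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Strings stand in for ResidueNumber objects in this sketch.
        String[][] residues = {
            {"1", "2", null, "4"},
            {null, null},
        };
        // prints [3, 0]
        System.out.println(Arrays.toString(countMatched(residues)));
    }
}
```

A row with few or no matches is a hint that the fasta sequence and the loaded structure disagree, for example because the accession parsed from the header points at the wrong entry.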
