Package org.biojava.nbio.structure.io
Class StructureSequenceMatcher
java.lang.Object
org.biojava.nbio.structure.io.StructureSequenceMatcher
A utility class with methods for matching ProteinSequences with
Structures.
- Author:
- Spencer Bliven
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
- static ProteinSequence getProteinSequenceForStructure(Structure struct, Map<Integer, Group> groupIndexPosition)
Generates a ProteinSequence corresponding to the sequence of struct, and maintains a mapping from the sequence back to the original groups.
- static Structure getSubstructureMatchingProteinSequence(ProteinSequence sequence, Structure wholeStructure)
Gets a substructure of wholeStructure containing only the Groups that are included in sequence.
- static ResidueNumber[] matchSequenceToStructure(ProteinSequence seq, Structure struct)
Given a sequence and the corresponding Structure, gets the ResidueNumber for each residue in the sequence.
- static ProteinSequence removeGaps(ProteinSequence gapped)
Removes all gaps ('-') from a protein sequence.
- static <T> T[][] removeGaps(T[][] gapped)
Creates a new array consisting of all columns of gapped where no row contained a null value.
-
Constructor Details
-
StructureSequenceMatcher
public StructureSequenceMatcher()
-
-
Method Details
-
getSubstructureMatchingProteinSequence
public static Structure getSubstructureMatchingProteinSequence(ProteinSequence sequence, Structure wholeStructure)
Get a substructure of wholeStructure containing only the Groups that are included in sequence. The resulting structure will contain only ATOM residues; the SEQ-RES will be empty. The Chains of the Structure will be new instances (cloned), but the Groups will not.
- Parameters:
sequence
- The input protein sequence
wholeStructure
- The structure from which to take a substructure
- Returns:
- The resulting structure
- Throws:
StructureException
-
getProteinSequenceForStructure
public static ProteinSequence getProteinSequenceForStructure(Structure struct, Map<Integer, Group> groupIndexPosition)
Generates a ProteinSequence corresponding to the sequence of struct, and maintains a mapping from the sequence back to the original groups. Chains are appended to one another. 'X' is used for heteroatoms.
- Parameters:
struct
- Input structure
groupIndexPosition
- An empty map, which will be populated with (residue index in returned ProteinSequence) -> (Group within struct)
- Returns:
- A ProteinSequence with the full sequence of struct. Chains are concatenated in the same order as in the input structure
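The concatenation and index mapping described above can be illustrated with a minimal, library-free sketch. The types and names below (ChainConcatenationSketch, Residue, concatenate) are hypothetical stand-ins, not BioJava classes, and the 0-based indexing of the map keys is an assumption made for illustration rather than something this page states.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical stand-in types, not the BioJava implementation. */
final class ChainConcatenationSketch {

    /** Simplified residue: a label, a one-letter code, and a hetero flag. */
    static final class Residue {
        final String id;       // illustrative label such as "A:12"
        final char code;       // one-letter amino acid code
        final boolean hetero;  // true for heteroatom groups

        Residue(String id, char code, boolean hetero) {
            this.id = id;
            this.code = code;
            this.hetero = hetero;
        }
    }

    /**
     * Appends all chains into one sequence string and fills the (initially empty)
     * map with sequence index -> residue. Indices are 0-based here (an assumption).
     */
    static String concatenate(List<List<Residue>> chains, Map<Integer, Residue> indexToResidue) {
        StringBuilder sequence = new StringBuilder();
        for (List<Residue> chain : chains) {
            for (Residue residue : chain) {
                // The index of the character about to be appended is the current length.
                indexToResidue.put(sequence.length(), residue);
                sequence.append(residue.hetero ? 'X' : residue.code);
            }
        }
        return sequence.toString();
    }

    public static void main(String[] args) {
        List<Residue> chainA = List.of(new Residue("A:1", 'M', false), new Residue("A:2", 'K', false));
        List<Residue> chainB = List.of(new Residue("B:1", 'G', false), new Residue("B:101", '?', true));
        Map<Integer, Residue> index = new LinkedHashMap<>();
        String sequence = concatenate(List.of(chainA, chainB), index);
        System.out.println(sequence);         // prints MKGX
        System.out.println(index.get(3).id);  // prints B:101
    }
}
```

Because the map is filled as a side effect, the caller keeps a way to get from any position in the concatenated sequence back to the group it came from, which is the purpose of the groupIndexPosition parameter.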
-
matchSequenceToStructure
public static ResidueNumber[] matchSequenceToStructure(ProteinSequence seq, Structure struct)
Given a sequence and the corresponding Structure, get the ResidueNumber for each residue in the sequence.
Smith-Waterman alignment is used to match the sequences. Residues that are in the sequence but not in the structure, or that are mismatched between sequence and structure, will have a null entry, while residues in the structure but not in the sequence are ignored with a warning.
- Parameters:
seq
- The protein sequence. Should match the sequence of struct very closely.
struct
- The corresponding protein structure
- Returns:
- An array of ResidueNumbers of the same length as seq, containing either the corresponding residue or null.
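The null-handling rules above can be sketched on an alignment that has already been computed. The example below is a simplified illustration and not the BioJava implementation: it skips the Smith-Waterman step entirely, takes two pre-aligned strings with '-' as the gap character, and uses plain String labels in place of ResidueNumber. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: maps a pre-aligned sequence onto residue labels of a structure. */
final class AlignedMappingSketch {

    /**
     * @param alignedSeq       query sequence with '-' for gaps
     * @param alignedStruct    structure sequence with '-' for gaps, same length as alignedSeq
     * @param structResidueIds one label per non-gap character of alignedStruct, in order
     * @return one entry per non-gap character of alignedSeq: the matching label, or null
     */
    static String[] mapAligned(String alignedSeq, String alignedStruct, String[] structResidueIds) {
        if (alignedSeq.length() != alignedStruct.length()) {
            throw new IllegalArgumentException("Aligned strings must have equal length");
        }
        int seqLength = 0;
        for (char c : alignedSeq.toCharArray()) {
            if (c != '-') {
                seqLength++;
            }
        }
        String[] result = new String[seqLength]; // entries default to null
        int seqPos = 0;
        int structPos = 0;
        for (int col = 0; col < alignedSeq.length(); col++) {
            char s = alignedSeq.charAt(col);
            char t = alignedStruct.charAt(col);
            // Only identical, non-gap pairs receive a residue label.
            if (s != '-' && t != '-' && s == t) {
                result[seqPos] = structResidueIds[structPos];
            }
            if (s != '-') {
                seqPos++;    // sequence-only and mismatched residues stay null
            }
            if (t != '-') {
                structPos++; // structure-only residues are skipped
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Structure residues M, T, A, L, I carry the labels 1, 3, 4, 5, 6.
        String[] ids = {"1", "3", "4", "5", "6"};
        String[] mapped = mapAligned("MKT-LV", "M-TALI", ids);
        System.out.println(Arrays.toString(mapped)); // prints [1, null, 3, 5, null]
    }
}
```

In the output, K has no structural counterpart and V is mismatched against I, so both positions are null, while the structure-only residue A is dropped without affecting the length of the result.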
-
removeGaps
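public static ProteinSequence removeGaps(ProteinSequence gapped)
Removes all gaps ('-') from a protein sequence.
- Parameters:
gapped
- The protein sequence, possibly containing gaps ('-')
- Returns:
- The sequence with all gaps removed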
-
removeGaps
public static <T> T[][] removeGaps(T[][] gapped)
Creates a new array consisting of all columns of gapped where no row contained a null value. Here, "row" refers to the first index and "column" to the second, e.g. gapped[row][column].
- Parameters:
gapped
- A rectangular matrix containing null to mark gaps
- Returns:
- A new array without columns containing nulls
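The column-filtering behaviour described above can be sketched with the standard library alone. This is an illustration of the technique, not the BioJava source: the class and method names (GapColumnFilter, removeGapColumns) are hypothetical, and the input is assumed to be rectangular, as the parameter description requires.

```java
import java.util.Arrays;

/** Illustrative only: drops every column of a rectangular matrix that contains a null. */
final class GapColumnFilter {

    static <T> T[][] removeGapColumns(T[][] gapped) {
        if (gapped.length == 0) {
            return Arrays.copyOf(gapped, 0);
        }
        int columns = gapped[0].length;
        boolean[] keep = new boolean[columns];
        int kept = 0;
        // First pass: a column survives only if no row has a null in it.
        for (int col = 0; col < columns; col++) {
            boolean complete = true;
            for (T[] row : gapped) {
                if (row[col] == null) {
                    complete = false;
                    break;
                }
            }
            keep[col] = complete;
            if (complete) {
                kept++;
            }
        }
        // Arrays.copyOf preserves the runtime array type, which avoids generic array creation.
        T[][] result = Arrays.copyOf(gapped, gapped.length);
        // Second pass: copy the surviving columns into fresh, shorter rows.
        for (int row = 0; row < gapped.length; row++) {
            T[] compact = Arrays.copyOf(gapped[row], kept);
            int next = 0;
            for (int col = 0; col < columns; col++) {
                if (keep[col]) {
                    compact[next++] = gapped[row][col];
                }
            }
            result[row] = compact;
        }
        return result;
    }

    public static void main(String[] args) {
        Character[][] alignment = {
            {'A', null, 'C'},
            {'D', 'E', 'F'}
        };
        System.out.println(Arrays.deepToString(removeGapColumns(alignment))); // prints [[A, C], [D, F]]
    }
}
```

The input matrix is left unmodified; only the returned array has the gap-containing columns removed.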
-