Class StructureSequenceMatcher

java.lang.Object
org.biojava.nbio.structure.io.StructureSequenceMatcher

public class StructureSequenceMatcher extends Object
A utility class with methods for matching ProteinSequences with Structures.
Author:
Spencer Bliven
  • Constructor Details

    • StructureSequenceMatcher

      public StructureSequenceMatcher()
  • Method Details

    • getSubstructureMatchingProteinSequence

      public static Structure getSubstructureMatchingProteinSequence(ProteinSequence sequence, Structure wholeStructure)
      Gets a substructure of wholeStructure containing only the Groups that are included in sequence. The resulting structure will contain only ATOM residues; the SEQRES will be empty. The Chains of the returned Structure will be new instances (cloned), but the Groups will not be cloned.
      Parameters:
      sequence - The input protein sequence
      wholeStructure - The structure from which to take a substructure
      Returns:
      The resulting structure
      Throws:
      StructureException
    • getProteinSequenceForStructure

      public static ProteinSequence getProteinSequenceForStructure(Structure struct, Map<Integer,Group> groupIndexPosition)
      Generates a ProteinSequence corresponding to the sequence of struct, and maintains a mapping from the sequence back to the original groups. Chains are appended to one another. 'X' is used for heteroatoms.
      Parameters:
      struct - Input structure
      groupIndexPosition - An empty map, which will be populated with (residue index in returned ProteinSequence) -> (Group within struct)
      Returns:
      A ProteinSequence with the full sequence of struct. Chains are concatenated in the same order as in the input structure.
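
      The index-to-group bookkeeping described above can be illustrated with a minimal plain-Java sketch. It does not use BioJava and is not the library's implementation: the Residue class, the buildSequence method, and the sample data are hypothetical stand-ins for Group, getProteinSequenceForStructure, and a real Structure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SequenceIndexSketch {

    /** Hypothetical stand-in for a structure group (amino acid or heteroatom). */
    static final class Residue {
        final String chainId;
        final int number;
        final char oneLetter;
        final boolean hetero;

        Residue(String chainId, int number, char oneLetter, boolean hetero) {
            this.chainId = chainId;
            this.number = number;
            this.oneLetter = oneLetter;
            this.hetero = hetero;
        }

        @Override
        public String toString() {
            return chainId + ":" + number;
        }
    }

    /**
     * Concatenates all chains into one sequence string and records, for each
     * position in that string, the residue it came from. Heteroatoms become 'X'.
     */
    static String buildSequence(List<List<Residue>> chains, Map<Integer, Residue> indexToResidue) {
        StringBuilder sequence = new StringBuilder();
        for (List<Residue> chain : chains) {
            for (Residue residue : chain) {
                // The current length is the index the next character will occupy.
                indexToResidue.put(sequence.length(), residue);
                sequence.append(residue.hetero ? 'X' : residue.oneLetter);
            }
        }
        return sequence.toString();
    }

    public static void main(String[] args) {
        List<Residue> chainA = new ArrayList<>();
        chainA.add(new Residue("A", 1, 'M', false));
        chainA.add(new Residue("A", 2, 'K', false));
        List<Residue> chainB = new ArrayList<>();
        chainB.add(new Residue("B", 1, '?', true)); // heteroatom
        chainB.add(new Residue("B", 2, 'G', false));

        Map<Integer, Residue> index = new LinkedHashMap<>();
        String sequence = buildSequence(List.of(chainA, chainB), index);
        System.out.println(sequence);     // MKXG
        System.out.println(index.get(2)); // B:1
    }
}
```

      Passing in an empty map and filling it as a side effect mirrors the groupIndexPosition parameter: the caller keeps the map and can translate any sequence position back to its source group.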
    • matchSequenceToStructure

      public static ResidueNumber[] matchSequenceToStructure(ProteinSequence seq, Structure struct)
      Given a sequence and the corresponding Structure, get the ResidueNumber for each residue in the sequence.

      Smith-Waterman alignment is used to match the sequences. Residues that are in the sequence but not in the structure, or that are mismatched between sequence and structure, will have a null entry in the result, while residues that are in the structure but not in the sequence are ignored with a warning.

      Parameters:
      seq - The protein sequence. Should match the sequence of struct very closely.
      struct - The corresponding protein structure
      Returns:
      An array of ResidueNumbers of the same length as seq, containing either the corresponding residue or null.
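
      The null-filling rule can be illustrated with a minimal plain-Java sketch. It is not BioJava's implementation and performs no alignment itself: it assumes the alignment has already been computed and is supplied as two equal-length gapped strings. The mapResidues method, the string residue identifiers, and the sample data are hypothetical.

```java
import java.util.Arrays;

final class AlignmentMappingSketch {

    /**
     * Walks a precomputed pairwise alignment column by column and returns one
     * entry per residue of the (ungapped) sequence: the identifier of the
     * aligned structure residue if the two characters are identical, else null.
     */
    static String[] mapResidues(String alignedSeq, String alignedStruct, String[] structResidueIds) {
        if (alignedSeq.length() != alignedStruct.length()) {
            throw new IllegalArgumentException("Aligned strings must have equal length");
        }
        int seqLength = alignedSeq.replace("-", "").length();
        String[] result = new String[seqLength]; // entries default to null

        int seqPos = 0;    // index into the ungapped sequence
        int structPos = 0; // index into structResidueIds
        for (int col = 0; col < alignedSeq.length(); col++) {
            char s = alignedSeq.charAt(col);
            char t = alignedStruct.charAt(col);
            if (s != '-' && t != '-' && s == t) {
                result[seqPos] = structResidueIds[structPos];
            }
            // Gaps and mismatches leave the entry null; structure-only
            // columns advance structPos without touching the result.
            if (s != '-') seqPos++;
            if (t != '-') structPos++;
        }
        return result;
    }

    public static void main(String[] args) {
        String alignedSeq    = "MKT-LV";
        String alignedStruct = "-KTALI";
        String[] ids = {"A:2", "A:3", "A:4", "A:5", "A:6"}; // K, T, A, L, I
        System.out.println(Arrays.toString(mapResidues(alignedSeq, alignedStruct, ids)));
        // [null, A:2, A:3, A:5, null]
    }
}
```

      In the example, M has no structure partner and V is mismatched with I, so both map to null; the structure-only residue A:4 does not appear in the result at all.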
    • removeGaps

      public static ProteinSequence removeGaps(ProteinSequence gapped)
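      Removes all gaps ('-') from a protein sequence.
      Parameters:
      gapped - The protein sequence, possibly containing gap characters
      Returns:
      The protein sequence with all gaps removed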
    • removeGaps

      public static <T> T[][] removeGaps(T[][] gapped)
      Creates a new array consisting of all columns of gapped where no row contained a null value. Here, "row" refers to the first index and "column" to the second, e.g. gapped[row][column].
      Parameters:
      gapped - A rectangular matrix containing null to mark gaps
      Returns:
      A new array without the columns that contained nulls
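
      The column-filtering behaviour can be illustrated with a minimal plain-Java sketch. It is written independently of BioJava and is not the library's implementation; the class and method names (GapColumnSketch, removeGapColumns) are hypothetical, and the input is assumed to be rectangular.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GapColumnSketch {

    /** Returns a copy of gapped that keeps only the columns with no null in any row. */
    @SuppressWarnings("unchecked")
    static <T> T[][] removeGapColumns(T[][] gapped) {
        if (gapped == null || gapped.length == 0) {
            return gapped;
        }
        int width = gapped[0].length;

        // Collect the indices of columns that are null-free in every row.
        List<Integer> keep = new ArrayList<>();
        for (int col = 0; col < width; col++) {
            boolean hasGap = false;
            for (T[] row : gapped) {
                if (row[col] == null) {
                    hasGap = true;
                    break;
                }
            }
            if (!hasGap) {
                keep.add(col);
            }
        }

        // Build the result with the same runtime array types as the input.
        Class<?> rowType = gapped.getClass().getComponentType(); // T[]
        Class<?> cellType = rowType.getComponentType();          // T
        T[][] result = (T[][]) Array.newInstance(rowType, gapped.length);
        for (int r = 0; r < gapped.length; r++) {
            T[] row = (T[]) Array.newInstance(cellType, keep.size());
            for (int k = 0; k < keep.size(); k++) {
                row[k] = gapped[r][keep.get(k)];
            }
            result[r] = row;
        }
        return result;
    }

    public static void main(String[] args) {
        Integer[][] gapped = { {1, null, 3, 4}, {5, 6, null, 8} };
        System.out.println(Arrays.deepToString(removeGapColumns(gapped)));
        // [[1, 4], [5, 8]]
    }
}
```

      A column is dropped as soon as any single row has a null in it, so only columns 0 and 3 survive in the example.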