Class MultipleAlignmentTools

java.lang.Object
org.biojava.nbio.structure.align.multiple.util.MultipleAlignmentTools

public class MultipleAlignmentTools extends Object
Utility functions for working with MultipleAlignment.

Supported functions:

  • Obtain the alignment as sequence strings
  • Map from sequence alignment position to structure Atom
  • Map from sequence alignment position to Block number
  • Transform the aligned Atoms of a MultipleAlignment
  • Get all the core alignment positions of the alignment
  • Calculate the average residue distance of all aligned positions
  • Sort Blocks in a MultipleAlignment by a specified row
  • Convert a MultipleAlignment to a MultipleSequenceAlignment
Since:
4.1.0
Author:
Spencer Bliven, Aleix Lafita
  • Constructor Details

    • MultipleAlignmentTools

      public MultipleAlignmentTools()
  • Method Details

    • getSequenceAlignment

      public static List<String> getSequenceAlignment(MultipleAlignment alignment, List<Integer> mapSeqToStruct)
      Calculate the sequence alignment Strings for the whole alignment. This method creates a sequence alignment where aligned residues are in uppercase and unaligned residues are in lowercase, thus providing a more compact way to represent the alignment.

      Blocks are concatenated in the order returned by MultipleAlignment.getBlocks(), so sequences may not be sequential. Gaps are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is a possible discontinuity.

      Parameters:
      alignment - input MultipleAlignment
      mapSeqToStruct - provides a link from the sequence alignment position to the structure alignment position. Specially designed for the GUI. Has to be initialized previously and will be overwritten.
      Returns:
      a string for each row in the alignment, giving the 1-letter code for each aligned residue.
    • getSequenceAlignment

      public static List<String> getSequenceAlignment(MultipleAlignment msa)
      Calculate the sequence alignment Strings for the whole alignment. This method creates a sequence alignment where aligned residues are in uppercase and unaligned residues are in lowercase, thus providing a more compact way to represent the alignment.

      Blocks are concatenated in the order returned by MultipleAlignment.getBlocks(), so sequences may not be sequential. Gaps are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is a possible discontinuity.

      Parameters:
      alignment - input MultipleAlignment
      Returns:
      String for each row in the alignment, giving the 1-letter code for each aligned residue.
    • getBlockSequenceAlignment

      public static List<String> getBlockSequenceAlignment(MultipleAlignment alignment, List<Integer> mapSeqToStruct)
      Calculate the sequence alignment Strings for the alignment Blocks in an alignment. This method creates a sequence alignment where all residues are in uppercase and a residue alone with gaps in all the other structures represents unaligned residues. Because of this latter constraint only the residues within the Blocks are represented, for a more compact alignment. For a sequence alignment of the full protein use getSequenceAlignment(MultipleAlignment).

      Blocks are concatenated in the order returned by MultipleAlignment.getBlocks(), so sequences may not be sequential. Gaps between blocks are omitted, while gaps within blocks are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is something unaligned inbetween.

      Parameters:
      alignment - input MultipleAlignment
      mapSeqToStruct - provides a link from the sequence alignment position to the structure alignment position. Specially designed for the GUI. Has to be initialized previously and will be overwritten.
      Returns:
      a string for each row in the alignment, giving the 1-letter code for each aligned residue.
    • getBlockSequenceAlignment

      public static List<String> getBlockSequenceAlignment(MultipleAlignment ma)
      Calculate the sequence alignment Strings for the alignment Blocks in an alignment. This method creates a sequence alignment where all residues are in uppercase and a residue alone with gaps in all the other structures represents unaligned residues. Because of this latter constraint only the residues within the Blocks are represented, for a more compact alignment. For a sequence alignment of the full protein use getSequenceAlignment(MultipleAlignment).

      Blocks are concatenated in the order returned by MultipleAlignment.getBlocks(), so sequences may not be sequential. Gaps between blocks are omitted, while gaps within blocks are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is something unaligned inbetween.

      Parameters:
      alignment - input MultipleAlignment
      Returns:
      String for each row in the alignment, giving the 1-letter code for each aligned residue.
    • getAtomForSequencePosition

      public static Atom getAtomForSequencePosition(MultipleAlignment msa, List<Integer> mapSeqToStruct, int str, int sequencePos)
      Returns the Atom of the specified structure that is aligned in the sequence alignment position specified.
      Parameters:
      multAln - the MultipleAlignment object from where the sequence alignment has been generated
      mapSeqToStruct - the mapping between sequence and structure generated with the sequence alignment
      str - the structure index of the alignment (row)
      sequencePos - the sequence alignment position (column)
      Returns:
      Atom the atom in that position or null if there is a gap
    • getBlockForSequencePosition

      public static int getBlockForSequencePosition(MultipleAlignment multAln, List<Integer> mapSeqToStruct, int sequencePos)
      Returns the block number of a specified position in the sequence alignment, given the mapping from structure to function.
      Parameters:
      multAln - the MultipleAlignment object from where the sequence alignment has been generated.
      mapSeqToStruct - the mapping between sequence and structure generated with the sequence alignment
      sequencePos - the position in the sequence alignment (column)
      Returns:
      int the block index, or -1 if the position is not aligned
    • getAverageResidueDistances

      public static Matrix getAverageResidueDistances(MultipleAlignment msa)
      The average residue distance Matrix contains the average distance from each residue to all other residues aligned with it.

      Complexity: T(n,l) = O(l*n^2), if n=number of structures and l=alignment length.

      Parameters:
      alignment - MultipleAlignment
      Returns:
      Matrix containing all average residue distances
    • getAverageResidueDistances

      public static Matrix getAverageResidueDistances(List<Atom[]> transformed)
      The average residue distance Matrix contains the average distance from each residue to all other residues aligned with it.

      Complexity: T(n,l) = O(l*n^2), if n=number of structures and l=alignment length.

      Parameters:
      transformed - List of Atom arrays containing only the aligned atoms of each structure, or null if there is a gap.
      Returns:
      Matrix containing all average residue distances. Entry -1 means there is a gap in the position.
    • transformAtoms

      public static List<Atom[]> transformAtoms(MultipleAlignment alignment)
      Transforms atoms according to the superposition stored in the alignment.

      For each structure in the alignment, returns an atom for each representative atom in the aligned columns, omitting unaligned residues (i.e. an array of length alignment.length() ).

      All blocks are concatenated together, so Atoms may not appear in the same order as in their parent structure. If the alignment blocks contain null residues (gaps), then the returned array will also contain null Atoms in the same positions.

      Parameters:
      alignment - MultipleAlignment
      Returns:
      List of Atom arrays of only the aligned atoms of every structure (null Atom if a gap position)
    • getCorePositions

      public static List<Integer> getCorePositions(Block block)
      Calculate a List of alignment indicies that correspond to the core of a Block, which means that all structures have a residue in that positon.
      Parameters:
      block - alignment Block
      Returns:
      List of positions in the core of the alignment
    • sortBlocks

      public static void sortBlocks(List<Block> blocks, int sortedIndex)
      Sort blocks so that the specified row is in sequential order. The sort happens in place.
      Parameters:
      blocks - List of blocks
      sortedIndex - Index of the row to be sorted
    • toProteinMSA

      Convert a MultipleAlignment into a MultipleSequenceAlignment of AminoAcid residues. This method is only valid for protein structure alignments.
      Parameters:
      msta - Multiple Structure Alignment
      Returns:
      MultipleSequenceAlignment of protein sequences
      Throws:
      CompoundNotFoundException
    • getRMSDMatrix

      public static Matrix getRMSDMatrix(MultipleAlignment msa)
      Calculate the RMSD matrix of a MultipleAlignment, that is, entry (i,j) of the matrix contains the RMSD between structures i and j.
      Parameters:
      msa - Multiple Structure Alignment
      Returns:
      Matrix of RMSD with size the number of structures squared
    • getKimuraTree

      public static Phylogeny getKimuraTree(MultipleAlignment msta) throws CompoundNotFoundException, IOException
      Calculate a phylogenetic tree of the MultipleAlignment using Kimura distances and the Neighbor Joining algorithm from forester.
      Parameters:
      msta - MultipleAlignment of protein structures
      Returns:
      Phylogeny phylogenetic tree
      Throws:
      CompoundNotFoundException
      IOException
    • getHSDMTree

      public static Phylogeny getHSDMTree(MultipleAlignment msta) throws CompoundNotFoundException, IOException
      Calculate a phylogenetic tree of the MultipleAlignment using dissimilarity scores (DS), based in SDM Substitution Matrix (ideal for distantly related proteins, structure-derived) and the Neighbor Joining algorithm from forester.
      Parameters:
      msta - MultipleAlignment of protein structures
      Returns:
      Phylogeny phylogenetic tree
      Throws:
      CompoundNotFoundException
      IOException
    • getStructuralTree

      public static Phylogeny getStructuralTree(MultipleAlignment msta)
      Calculate a phylogenetic tree of the MultipleAlignment using RMSD distances and the Neighbor Joining algorithm from forester.
      Parameters:
      msta - MultipleAlignment of protein structures
      Returns:
      Phylogeny phylogenetic tree
      Throws:
      CompoundNotFoundException