Class MultipleAlignmentTools
MultipleAlignment
.
Supported functions:
- Obtain the alignment as sequence strings
- Map from sequence alignment position to structure Atom
- Map from sequence alignment position to Block number
- Transform the aligned Atoms of a MultipleAlignment
- Get all the core alignment positions of the alignment
- Calculate the average residue distance of all aligned positions
- Sort Blocks in a MultipleAlignment by a specified row
- Convert a MultipleAlignment to a MultipleSequenceAlignment
- Since:
- 4.1.0
- Author:
- Spencer Bliven, Aleix Lafita
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionstatic Atom
getAtomForSequencePosition
(MultipleAlignment msa, List<Integer> mapSeqToStruct, int str, int sequencePos) Returns the Atom of the specified structure that is aligned in the sequence alignment position specified.static Matrix
getAverageResidueDistances
(List<Atom[]> transformed) The average residue distance Matrix contains the average distance from each residue to all other residues aligned with it.static Matrix
The average residue distance Matrix contains the average distance from each residue to all other residues aligned with it.static int
getBlockForSequencePosition
(MultipleAlignment multAln, List<Integer> mapSeqToStruct, int sequencePos) Returns the block number of a specified position in the sequence alignment, given the mapping from structure to function.Calculate the sequence alignment Strings for the alignment Blocks in an alignment.getBlockSequenceAlignment
(MultipleAlignment alignment, List<Integer> mapSeqToStruct) Calculate the sequence alignment Strings for the alignment Blocks in an alignment.getCorePositions
(Block block) Calculate a List of alignment indicies that correspond to the core of a Block, which means that all structures have a residue in that positon.static Phylogeny
getHSDMTree
(MultipleAlignment msta) Calculate a phylogenetic tree of the MultipleAlignment using dissimilarity scores (DS), based in SDM Substitution Matrix (ideal for distantly related proteins, structure-derived) and the Neighbor Joining algorithm from forester.static Phylogeny
Calculate a phylogenetic tree of the MultipleAlignment using Kimura distances and the Neighbor Joining algorithm from forester.static Matrix
Calculate the RMSD matrix of a MultipleAlignment, that is, entry (i,j) of the matrix contains the RMSD between structures i and j.Calculate the sequence alignment Strings for the whole alignment.getSequenceAlignment
(MultipleAlignment alignment, List<Integer> mapSeqToStruct) Calculate the sequence alignment Strings for the whole alignment.static Phylogeny
Calculate a phylogenetic tree of the MultipleAlignment using RMSD distances and the Neighbor Joining algorithm from forester.static void
sortBlocks
(List<Block> blocks, int sortedIndex) Sort blocks so that the specified row is in sequential order.Convert a MultipleAlignment into a MultipleSequenceAlignment of AminoAcid residues.transformAtoms
(MultipleAlignment alignment) Transforms atoms according to the superposition stored in the alignment.
-
Constructor Details
-
MultipleAlignmentTools
public MultipleAlignmentTools()
-
-
Method Details
-
getSequenceAlignment
public static List<String> getSequenceAlignment(MultipleAlignment alignment, List<Integer> mapSeqToStruct) Calculate the sequence alignment Strings for the whole alignment. This method creates a sequence alignment where aligned residues are in uppercase and unaligned residues are in lowercase, thus providing a more compact way to represent the alignment.Blocks are concatenated in the order returned by
MultipleAlignment.getBlocks()
, so sequences may not be sequential. Gaps are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is a possible discontinuity.- Parameters:
alignment
- input MultipleAlignmentmapSeqToStruct
- provides a link from the sequence alignment position to the structure alignment position. Specially designed for the GUI. Has to be initialized previously and will be overwritten.- Returns:
- a string for each row in the alignment, giving the 1-letter code for each aligned residue.
-
getSequenceAlignment
Calculate the sequence alignment Strings for the whole alignment. This method creates a sequence alignment where aligned residues are in uppercase and unaligned residues are in lowercase, thus providing a more compact way to represent the alignment.Blocks are concatenated in the order returned by
MultipleAlignment.getBlocks()
, so sequences may not be sequential. Gaps are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is a possible discontinuity.- Parameters:
alignment
- input MultipleAlignment- Returns:
- String for each row in the alignment, giving the 1-letter code for each aligned residue.
-
getBlockSequenceAlignment
public static List<String> getBlockSequenceAlignment(MultipleAlignment alignment, List<Integer> mapSeqToStruct) Calculate the sequence alignment Strings for the alignment Blocks in an alignment. This method creates a sequence alignment where all residues are in uppercase and a residue alone with gaps in all the other structures represents unaligned residues. Because of this latter constraint only the residues within the Blocks are represented, for a more compact alignment. For a sequence alignment of the full protein usegetSequenceAlignment(MultipleAlignment)
.Blocks are concatenated in the order returned by
MultipleAlignment.getBlocks()
, so sequences may not be sequential. Gaps between blocks are omitted, while gaps within blocks are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is something unaligned inbetween.- Parameters:
alignment
- input MultipleAlignmentmapSeqToStruct
- provides a link from the sequence alignment position to the structure alignment position. Specially designed for the GUI. Has to be initialized previously and will be overwritten.- Returns:
- a string for each row in the alignment, giving the 1-letter code for each aligned residue.
-
getBlockSequenceAlignment
Calculate the sequence alignment Strings for the alignment Blocks in an alignment. This method creates a sequence alignment where all residues are in uppercase and a residue alone with gaps in all the other structures represents unaligned residues. Because of this latter constraint only the residues within the Blocks are represented, for a more compact alignment. For a sequence alignment of the full protein usegetSequenceAlignment(MultipleAlignment)
.Blocks are concatenated in the order returned by
MultipleAlignment.getBlocks()
, so sequences may not be sequential. Gaps between blocks are omitted, while gaps within blocks are represented by '-'. Separation between different Blocks is indicated by a gap in all positions, meaning that there is something unaligned inbetween.- Parameters:
alignment
- input MultipleAlignment- Returns:
- String for each row in the alignment, giving the 1-letter code for each aligned residue.
-
getAtomForSequencePosition
public static Atom getAtomForSequencePosition(MultipleAlignment msa, List<Integer> mapSeqToStruct, int str, int sequencePos) Returns the Atom of the specified structure that is aligned in the sequence alignment position specified.- Parameters:
multAln
- the MultipleAlignment object from where the sequence alignment has been generatedmapSeqToStruct
- the mapping between sequence and structure generated with the sequence alignmentstr
- the structure index of the alignment (row)sequencePos
- the sequence alignment position (column)- Returns:
- Atom the atom in that position or null if there is a gap
-
getBlockForSequencePosition
public static int getBlockForSequencePosition(MultipleAlignment multAln, List<Integer> mapSeqToStruct, int sequencePos) Returns the block number of a specified position in the sequence alignment, given the mapping from structure to function.- Parameters:
multAln
- the MultipleAlignment object from where the sequence alignment has been generated.mapSeqToStruct
- the mapping between sequence and structure generated with the sequence alignmentsequencePos
- the position in the sequence alignment (column)- Returns:
- int the block index, or -1 if the position is not aligned
-
getAverageResidueDistances
The average residue distance Matrix contains the average distance from each residue to all other residues aligned with it.Complexity: T(n,l) = O(l*n^2), if n=number of structures and l=alignment length.
- Parameters:
alignment
- MultipleAlignment- Returns:
- Matrix containing all average residue distances
-
getAverageResidueDistances
The average residue distance Matrix contains the average distance from each residue to all other residues aligned with it.Complexity: T(n,l) = O(l*n^2), if n=number of structures and l=alignment length.
- Parameters:
transformed
- List of Atom arrays containing only the aligned atoms of each structure, or null if there is a gap.- Returns:
- Matrix containing all average residue distances. Entry -1 means there is a gap in the position.
-
transformAtoms
Transforms atoms according to the superposition stored in the alignment.For each structure in the alignment, returns an atom for each representative atom in the aligned columns, omitting unaligned residues (i.e. an array of length alignment.length() ).
All blocks are concatenated together, so Atoms may not appear in the same order as in their parent structure. If the alignment blocks contain null residues (gaps), then the returned array will also contain null Atoms in the same positions.
- Parameters:
alignment
- MultipleAlignment- Returns:
- List of Atom arrays of only the aligned atoms of every structure (null Atom if a gap position)
-
getCorePositions
Calculate a List of alignment indicies that correspond to the core of a Block, which means that all structures have a residue in that positon.- Parameters:
block
- alignment Block- Returns:
- List of positions in the core of the alignment
-
sortBlocks
Sort blocks so that the specified row is in sequential order. The sort happens in place.- Parameters:
blocks
- List of blockssortedIndex
- Index of the row to be sorted
-
toProteinMSA
public static MultipleSequenceAlignment<ProteinSequence,AminoAcidCompound> toProteinMSA(MultipleAlignment msta) throws CompoundNotFoundException Convert a MultipleAlignment into a MultipleSequenceAlignment of AminoAcid residues. This method is only valid for protein structure alignments.- Parameters:
msta
- Multiple Structure Alignment- Returns:
- MultipleSequenceAlignment of protein sequences
- Throws:
CompoundNotFoundException
-
getRMSDMatrix
Calculate the RMSD matrix of a MultipleAlignment, that is, entry (i,j) of the matrix contains the RMSD between structures i and j.- Parameters:
msa
- Multiple Structure Alignment- Returns:
- Matrix of RMSD with size the number of structures squared
-
getKimuraTree
public static Phylogeny getKimuraTree(MultipleAlignment msta) throws CompoundNotFoundException, IOException Calculate a phylogenetic tree of the MultipleAlignment using Kimura distances and the Neighbor Joining algorithm from forester.- Parameters:
msta
- MultipleAlignment of protein structures- Returns:
- Phylogeny phylogenetic tree
- Throws:
CompoundNotFoundException
IOException
-
getHSDMTree
public static Phylogeny getHSDMTree(MultipleAlignment msta) throws CompoundNotFoundException, IOException Calculate a phylogenetic tree of the MultipleAlignment using dissimilarity scores (DS), based in SDM Substitution Matrix (ideal for distantly related proteins, structure-derived) and the Neighbor Joining algorithm from forester.- Parameters:
msta
- MultipleAlignment of protein structures- Returns:
- Phylogeny phylogenetic tree
- Throws:
CompoundNotFoundException
IOException
-
getStructuralTree
Calculate a phylogenetic tree of the MultipleAlignment using RMSD distances and the Neighbor Joining algorithm from forester.- Parameters:
msta
- MultipleAlignment of protein structures- Returns:
- Phylogeny phylogenetic tree
- Throws:
CompoundNotFoundException
-