Class AlignmentTools
Current methods: replace optimal alignment, create new AFPChain, format conversion, update superposition, etc.
- Author:
- Spencer Bliven, Aleix Lafita
-
Nested Class Summary
Modifier and Type, Class, Description
- static class AlignmentTools.IdentityMap: A Map<K,V> can be viewed as a function from K to V.
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
- alignmentAsMap(AFPChain afpChain): Creates a Map specifying the alignment as a mapping between residue indices of protein 1 and residue indices of protein 2.
- static void alignmentToSIF(Writer out, AFPChain afpChain, Atom[] ca1, Atom[] ca2, String backboneInteraction, String alignmentInteraction): Creates a simple interaction format (SIF) file for an alignment.
- static <S,T> Map<S, T> applyAlignment(Map<S, T> alignmentMap, Map<T, S> identity, int k): Applies an alignment k times.
- static <T> Map<T, T> applyAlignment(Map<T, T> alignmentMap, int k): Applies an alignment k times.
- static int[] calculateBlockGap(int[][][] optAln): Method that calculates the number of gaps in each subunit block of an optimal AFP alignment.
- static AFPChain createAFPChain(Atom[] ca1, Atom[] ca2, ResidueNumber[] aligned1, ResidueNumber[] aligned2): Fundamentally, an alignment is just a list of aligned residues in each protein.
- fromConciseAlignmentString(String string)
- getOptAlnAsList(AFPChain afpChain): Retrieves the optimum alignment from an AFPChain and returns it as a java collection.
- static int getSymmetryOrder(Map<Integer, Integer> alignment, int maxSymmetry, float minimumMetricChange): Helper for getSymmetryOrder(Map, Map, int, float) with a true identity function (X->X).
- static int getSymmetryOrder(Map<Integer, Integer> alignment, Map<Integer, Integer> identity, int maxSymmetry, float minimumMetricChange): Tries to detect symmetry in an alignment.
- static int getSymmetryOrder(AFPChain afpChain, int maxSymmetry, float minimumMetricChange): Guesses the order of symmetry in an alignment.
- static Map<Integer,Integer> guessSequentialAlignment(Map<Integer, Integer> alignment, boolean inverseAlignment): Takes a potentially non-sequential alignment and guesses a sequential version of it.
- static boolean isSequentialAlignment(AFPChain afpChain, boolean checkWithinBlocks): Checks that the alignment given by afpChain is sequential.
- static AFPChain replaceOptAln(int[][][] newAlgn, AFPChain afpChain, Atom[] ca1, Atom[] ca2): It replaces an optimal alignment of an AFPChain and calculates all the new alignment scores and variables.
- static AFPChain replaceOptAln(AFPChain afpChain, Atom[] ca1, Atom[] ca2, int blockNum, int[] optLens, int[][][] optAln)
- static AFPChain replaceOptAln(AFPChain afpChain, Atom[] ca1, Atom[] ca2, Map<Integer, Integer> alignment): Takes an AFPChain and replaces the optimal alignment based on an alignment map.
- static Object resizeArray(Object oldArray, int newSize): Reallocates an array with a new size, and copies the contents of the old array to the new array.
- static AFPChain splitBlocksByTopology(AFPChain a, Atom[] ca1, Atom[] ca2)
- static <S,T> String toConciseAlignmentString(Map<S, T> alignment, Map<T, S> identity): Print an alignment map in a concise representation.
- static <T> String toConciseAlignmentString(Map<T, T> alignment)
- static void updateSuperposition(AFPChain afpChain, Atom[] ca1, Atom[] ca2): After the alignment changes (optAln, optLen, blockNum, at a minimum), many other properties which depend on the superposition will be invalid.
-
Field Details
-
debug
public static boolean debug
-
-
Constructor Details
-
AlignmentTools
public AlignmentTools()
-
-
Method Details
-
isSequentialAlignment
Checks that the alignment given by afpChain is sequential. This means that the residue indices of both proteins increase monotonically as a function of the alignment position (i.e. both proteins are sorted). This will return false for circularly permuted alignments or other non-topological alignments. It will also return false for cases where the alignment itself is sequential but it is not stored in the afpChain in a sorted manner.
Since algorithms which create non-sequential alignments split the alignment into multiple blocks, some computational time can be saved by only checking block boundaries for sequentiality. Setting checkWithinBlocks to true makes this function slower, but detects AFPChains with non-sequential blocks.
Note that this method should give the same results as AFPChain.isSequentialAlignment(). However, the AFPChain version relies on the StructureAlignment algorithm correctly setting this parameter, which is sadly not always the case.
- Parameters:
afpChain - An alignment
checkWithinBlocks - Indicates whether individual blocks should be checked for sequentiality
- Returns:
- True if the alignment is sequential.
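The block-boundary shortcut described above can be illustrated with a minimal sketch. It is not the library's implementation: it works on plain arrays laid out as optAln[block][chain][position] with optLen[block] valid positions per block, and the class and method names are hypothetical.

```java
final class SequentialCheckSketch {

    /**
     * Hypothetical helper: returns true if the residue indices of both
     * chains strictly increase along the alignment. When checkWithinBlocks
     * is false, only the first and last position of each block are compared.
     */
    static boolean isSequential(int[][][] optAln, int[] optLen, boolean checkWithinBlocks) {
        int prev1 = -1; // last residue index seen in chain 1
        int prev2 = -1; // last residue index seen in chain 2
        for (int b = 0; b < optAln.length; b++) {
            int len = optLen[b];
            if (len < 1) {
                continue; // empty block, nothing to compare
            }
            // The block must start after the previous block ended.
            if (optAln[b][0][0] <= prev1 || optAln[b][1][0] <= prev2) {
                return false;
            }
            if (checkWithinBlocks) {
                for (int i = 1; i < len; i++) {
                    if (optAln[b][0][i] <= optAln[b][0][i - 1]
                            || optAln[b][1][i] <= optAln[b][1][i - 1]) {
                        return false;
                    }
                }
            }
            prev1 = optAln[b][0][len - 1];
            prev2 = optAln[b][1][len - 1];
        }
        return true;
    }

    public static void main(String[] args) {
        // Second block restarts chain 2 at residue 0: a circular permutation.
        int[][][] permuted = {{{0, 1, 2}, {5, 6, 7}}, {{3, 4}, {0, 1}}};
        System.out.println(isSequential(permuted, new int[] {3, 2}, false));
    }
}
```

A block that is unsorted internally passes the boundary-only check and is caught only when checkWithinBlocks is true, which is the trade-off the description mentions.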
-
alignmentAsMap
Creates a Map specifying the alignment as a mapping between residue indices of protein 1 and residue indices of protein 2.
For example,
1234
5678
becomes
1->5 2->6 3->7 4->8
- Parameters:
afpChain - An alignment
- Returns:
- A mapping from aligned residues of protein 1 to their partners in protein 2.
- Throws:
StructureException - If afpChain is not one-to-one
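As an illustration of the conversion, the following sketch builds such a map from plain arrays instead of an AFPChain. The array layout (optAln[block][chain][position], optLen[block]) and the names are assumptions, and it throws IllegalArgumentException where the library method throws StructureException.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class AlignmentAsMapSketch {

    /** Hypothetical helper: maps each aligned residue of chain 1 to its partner in chain 2. */
    static Map<Integer, Integer> asMap(int[][][] optAln, int[] optLen) {
        Map<Integer, Integer> map = new LinkedHashMap<>();
        Set<Integer> usedTargets = new HashSet<>();
        for (int b = 0; b < optAln.length; b++) {
            for (int i = 0; i < optLen[b]; i++) {
                int res1 = optAln[b][0][i];
                int res2 = optAln[b][1][i];
                // Reject alignments where a residue is aligned more than once.
                if (map.containsKey(res1) || !usedTargets.add(res2)) {
                    throw new IllegalArgumentException("alignment is not one-to-one");
                }
                map.put(res1, res2);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        int[][][] optAln = {{{1, 2, 3, 4}, {5, 6, 7, 8}}};
        System.out.println(asMap(optAln, new int[] {4})); // prints {1=5, 2=6, 3=7, 4=8}
    }
}
```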
-
applyAlignment
Applies an alignment k times. E.g. if alignmentMap defines function f(x), this returns a function f^k(x)=f(f(...f(x)...)).
- Type Parameters:
T -
- Parameters:
alignmentMap - The input function, as a map (see alignmentAsMap(AFPChain))
k - The number of times to apply the alignment
- Returns:
- A new alignment. If the input function is not automorphic (one-to-one), then some inputs may map to null, indicating that the function is undefined for that input.
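Repeated application of a map can be sketched as follows. This is an illustrative stand-alone version with hypothetical names, not the library code; it assumes the input map contains no null values.

```java
import java.util.HashMap;
import java.util.Map;

final class ApplyAlignmentSketch {

    /** Hypothetical helper: composes f with itself k times over the keys of f. */
    static <T> Map<T, T> apply(Map<T, T> f, int k) {
        Map<T, T> result = new HashMap<>();
        for (T x : f.keySet()) {
            T y = x;
            // Follow the map k times; stop early once the value is undefined.
            for (int i = 0; i < k && y != null; i++) {
                y = f.get(y);
            }
            result.put(x, y); // y is null where f^k is undefined for x
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cycle = Map.of(1, 2, 2, 3, 3, 1);
        System.out.println(apply(cycle, 3)); // a 3-cycle applied 3 times maps every key to itself
    }
}
```

For a map that is not a bijection on its own domain, such as {1->2, 2->3}, applying it twice leaves 2 mapped to null because f(3) is undefined.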
-
applyAlignment
Applies an alignment k times. E.g. if alignmentMap defines function f(x), this returns a function f^k(x)=f(f(...f(x)...)).
To allow for functions with different domains and codomains, the identity function allows converting back in a reasonable way. For instance, if alignmentMap represented an alignment between two proteins with different numbering schemes, the identity function could calculate the offset between residue numbers, e.g. I(x) = x-offset.
When an identity function is provided, the returned function calculates f^k(x) = f(I( f(I( ... f(x) ... )) )).
- Type Parameters:
S -
T -
- Parameters:
alignmentMap - The input function, as a map (see alignmentAsMap(AFPChain))
identity - An identity-like function providing the isomorphism between the codomain of alignmentMap (of type T) and the domain (type S).
k - The number of times to apply the alignment
- Returns:
- A new alignment. If the input function is not automorphic (one-to-one), then some inputs may map to null, indicating that the function is undefined for that input.
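The alternation f, I, f, I, ..., f can be sketched as below. The names are hypothetical and the sketch assumes k >= 1. The example uses the offset case from the description, with the second protein numbered 100 higher than the first.

```java
import java.util.HashMap;
import java.util.Map;

final class ApplyWithIdentitySketch {

    /** Hypothetical helper: computes f(I(f(I(...f(x)...)))) with f applied k times. */
    static <S, T> Map<S, T> apply(Map<S, T> f, Map<T, S> identity, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        Map<S, T> result = new HashMap<>();
        for (S x : f.keySet()) {
            T y = f.get(x);
            for (int i = 1; i < k && y != null; i++) {
                S back = identity.get(y);              // convert back to the domain of f
                y = (back == null) ? null : f.get(back);
            }
            result.put(x, y);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> f = Map.of(1, 102, 2, 103, 3, 101);
        Map<Integer, Integer> identity = Map.of(101, 1, 102, 2, 103, 3); // I(x) = x - 100
        System.out.println(apply(f, identity, 2));
    }
}
```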
-
getSymmetryOrder
public static int getSymmetryOrder(Map<Integer, Integer> alignment, int maxSymmetry, float minimumMetricChange)
Helper for getSymmetryOrder(Map, Map, int, float) with a true identity function (X->X).
This method should only be used in cases where the two proteins aligned have identical numbering, as for self-alignments. See getSymmetryOrder(AFPChain, int, float) for a way to guess the sequential correspondence between two proteins.
- Parameters:
alignment -
maxSymmetry -
minimumMetricChange -
- Returns:
-
getSymmetryOrder
public static int getSymmetryOrder(Map<Integer, Integer> alignment, Map<Integer, Integer> identity, int maxSymmetry, float minimumMetricChange)
Tries to detect symmetry in an alignment.
Conceptually, an alignment is a function f:A->B between two sets of integers. The function may have simple topology (meaning that if two elements of A are close, then their images in B will also be close), or may have more complex topology (such as a circular permutation). This function checks alignment against a reference function identity, which should have simple topology. It then tries to determine the symmetry order of alignment relative to identity, up to a maximum order of maxSymmetry.
Details
Considers the offset (in number of residues) which a residue moves after undergoing n alternating transforms by alignment and identity. If n corresponds to the intrinsic order of the alignment, this will be small. This algorithm tries increasing values of n and looks for abrupt decreases in the root mean squared offset. If none are found at n<=maxSymmetry, the alignment is reported as non-symmetric.
- Parameters:
alignment - The alignment to test for symmetry
identity - An alignment with simple topology which approximates the sequential relationship between the two proteins. Should map in the reverse direction from alignment.
maxSymmetry - Maximum symmetry to consider. High values increase the calculation time and can lead to overfitting.
minimumMetricChange - Percent decrease in root mean squared offsets required in order to declare symmetry. 0.4f seems to work well for CeSymm.
- Returns:
- The order of symmetry of alignment, or 1 if no order <= maxSymmetry is found.
- See Also:
-
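The procedure under Details can be sketched roughly as follows. This is an illustration rather than the library's code. In particular, it assumes that minimumMetricChange is the fractional drop required relative to the lowest root mean squared offset seen so far, and the library's exact comparison may differ. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SymmetryOrderSketch {

    /** Hypothetical helper: returns the first n > 1 at which the RMS offset drops abruptly, else 1. */
    static int symmetryOrder(Map<Integer, Integer> alignment, Map<Integer, Integer> identity,
                             int maxSymmetry, float minimumMetricChange) {
        List<Integer> preimage = new ArrayList<>(alignment.keySet());
        List<Integer> image = new ArrayList<>(preimage); // current position of each residue
        double best = Double.POSITIVE_INFINITY;          // lowest RMS offset seen so far
        for (int n = 1; n <= maxSymmetry; n++) {
            long sumSq = 0;
            int count = 0;
            for (int i = 0; i < image.size(); i++) {
                Integer cur = image.get(i);
                Integer mid = (cur == null) ? null : alignment.get(cur);
                Integer post = (mid == null) ? null : identity.get(mid);
                image.set(i, post);
                if (post != null) {
                    long delta = post - preimage.get(i); // offset after n transforms
                    sumSq += delta * delta;
                    count++;
                }
            }
            if (count == 0) {
                break; // every residue has left the alignment
            }
            double rms = Math.sqrt((double) sumSq / count);
            // Assumption: "abrupt decrease" means a drop of at least the given fraction.
            if (n > 1 && rms < best * (1.0 - minimumMetricChange)) {
                return n;
            }
            best = Math.min(best, rms);
        }
        return 1;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> rotation = new HashMap<>();
        Map<Integer, Integer> identity = new HashMap<>();
        for (int x = 0; x < 9; x++) {
            rotation.put(x, (x + 3) % 9); // three repeats of three residues each
            identity.put(x, x);
        }
        System.out.println(symmetryOrder(rotation, identity, 8, 0.4f)); // prints 3
    }
}
```

In the example the offset stays near 4.2 residues for n = 1 and n = 2 and falls to 0 at n = 3, so the sketch reports order 3. A plain shift such as x -> x+1 never produces a drop and is reported as order 1.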
getSymmetryOrder
public static int getSymmetryOrder(AFPChain afpChain, int maxSymmetry, float minimumMetricChange) throws StructureException
Guesses the order of symmetry in an alignment.
Uses getSymmetryOrder(Map alignment, Map identity, int, float) to determine the symmetry order. For the identity alignment, sorts the aligned residues of each protein sequentially, then defines the ith residues of each protein to be equivalent.
Note that the selection of the identity alignment here is very naive, and only works for proteins with very good coverage. Wherever possible, it is better to construct an identity function explicitly from a sequence alignment (or use an AlignmentTools.IdentityMap for internally symmetric proteins) and use getSymmetryOrder(Map, Map, int, float).
- Throws:
StructureException
-
guessSequentialAlignment
public static Map<Integer,Integer> guessSequentialAlignment(Map<Integer, Integer> alignment, boolean inverseAlignment)
Takes a potentially non-sequential alignment and guesses a sequential version of it. Residues from each structure are sorted sequentially and then compared directly.
The results of this method are consistent with what one might expect from an identity function, and are therefore useful with getSymmetryOrder(Map, Map identity, int, float).
- Perfect self-alignments will have the same pre-image and image, so will map X->X
- Gaps and alignment errors will cause errors in the resulting map, but only locally. Errors do not propagate through the whole alignment.
Example:
A non-sequential alignment, represented schematically as
12456789
78912345
would result in a map
12456789
12345789
- Parameters:
alignment - The non-sequential input alignment
inverseAlignment - If false, map from structure1 to structure2. If true, generate the inverse of that map.
- Returns:
- A mapping from sequential residues of one protein to those of the other
- Throws:
IllegalArgumentException - if the input alignment is not one-to-one.
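The sort-and-pair idea can be sketched in a few lines. This is an illustrative version with hypothetical names. It reproduces the example above: the keys 1,2,4,5,6,7,8,9 are paired in order with the sorted values 1,2,3,4,5,7,8,9.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GuessSequentialSketch {

    /** Hypothetical helper: pairs the i-th smallest key with the i-th smallest value. */
    static Map<Integer, Integer> guess(Map<Integer, Integer> alignment, boolean inverse) {
        List<Integer> keys = new ArrayList<>(alignment.keySet());
        List<Integer> values = new ArrayList<>(alignment.values());
        if (new HashSet<>(values).size() != values.size()) {
            throw new IllegalArgumentException("alignment is not one-to-one");
        }
        Collections.sort(keys);
        Collections.sort(values);
        Map<Integer, Integer> result = new TreeMap<>();
        for (int i = 0; i < keys.size(); i++) {
            if (inverse) {
                result.put(values.get(i), keys.get(i));
            } else {
                result.put(keys.get(i), values.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] res1 = {1, 2, 4, 5, 6, 7, 8, 9};
        int[] res2 = {7, 8, 9, 1, 2, 3, 4, 5};
        Map<Integer, Integer> alignment = new TreeMap<>();
        for (int i = 0; i < res1.length; i++) {
            alignment.put(res1[i], res2[i]);
        }
        System.out.println(guess(alignment, false));
    }
}
```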
-
getOptAlnAsList
Retrieves the optimum alignment from an AFPChain and returns it as a java collection. The result is indexed in the same way as AFPChain.getOptAln(), but has the correct size().
List<List<List<Integer>>> aln = getOptAlnAsList(AFPChain afpChain);
aln.get(blockNum).get(structureNum={0,1}).get(pos)
- Parameters:
afpChain -
- Returns:
-
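The point about size() is that the backing arrays may be larger than the aligned region. A minimal sketch of the conversion is given below. It assumes the caller supplies the arrays together with the per-block lengths and the number of blocks, which in the library come from the AFPChain, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class OptAlnAsListSketch {

    /** Hypothetical helper: copies only the valid part of optAln into nested lists. */
    static List<List<List<Integer>>> asList(int[][][] optAln, int[] optLen, int blockNum) {
        List<List<List<Integer>>> blocks = new ArrayList<>(blockNum);
        for (int b = 0; b < blockNum; b++) {
            List<List<Integer>> chains = new ArrayList<>(2);
            for (int c = 0; c < 2; c++) {
                List<Integer> positions = new ArrayList<>(optLen[b]);
                for (int p = 0; p < optLen[b]; p++) {
                    positions.add(optAln[b][c][p]);
                }
                chains.add(positions);
            }
            blocks.add(chains);
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Arrays are over-allocated: one valid block holding three aligned pairs.
        int[][][] optAln = {{{1, 2, 3, 0, 0}, {4, 5, 6, 0, 0}}, {{0, 0}, {0, 0}}};
        List<List<List<Integer>>> aln = asList(optAln, new int[] {3, 0}, 1);
        System.out.println(aln.get(0).get(1)); // prints [4, 5, 6]
    }
}
```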
createAFPChain
public static AFPChain createAFPChain(Atom[] ca1, Atom[] ca2, ResidueNumber[] aligned1, ResidueNumber[] aligned2) throws StructureException
Fundamentally, an alignment is just a list of aligned residues in each protein. This method converts two lists of ResidueNumbers into an AFPChain.
Parameters are filled with defaults (often null) or sometimes calculated.
For a way to modify the alignment of an existing AFPChain, see replaceOptAln(AFPChain, Atom[], Atom[], Map)
- Parameters:
ca1 - CA atoms of the first protein
ca2 - CA atoms of the second protein
aligned1 - A list of aligned residues from the first protein
aligned2 - A list of aligned residues from the second protein. Must be the same length as aligned1.
- Returns:
- An AFPChain representing the alignment. Many properties may be null or another default.
- Throws:
StructureException - if an error occurred during superposition
IllegalArgumentException - if aligned1 and aligned2 have different lengths
- See Also:
-
splitBlocksByTopology
public static AFPChain splitBlocksByTopology(AFPChain a, Atom[] ca1, Atom[] ca2) throws StructureException
- Parameters:
a -
ca1 -
ca2 -
- Returns:
- Throws:
StructureException - if an error occurred during superposition
-
replaceOptAln
public static AFPChain replaceOptAln(int[][][] newAlgn, AFPChain afpChain, Atom[] ca1, Atom[] ca2) throws StructureException
Replaces the optimal alignment of an AFPChain and recalculates all the alignment scores and variables.
- Throws:
StructureException
-
replaceOptAln
public static AFPChain replaceOptAln(AFPChain afpChain, Atom[] ca1, Atom[] ca2, Map<Integer, Integer> alignment) throws StructureException
Takes an AFPChain and replaces the optimal alignment based on an alignment map.
Parameters are filled with defaults (often null) or sometimes calculated.
For a way to create a new AFPChain, see createAFPChain(Atom[], Atom[], ResidueNumber[], ResidueNumber[])
- Parameters:
afpChain - The alignment to be modified
alignment - The new alignment, as a Map
- Throws:
StructureException - if an error occurred during superposition
- See Also:
-
replaceOptAln
public static AFPChain replaceOptAln(AFPChain afpChain, Atom[] ca1, Atom[] ca2, int blockNum, int[] optLens, int[][][] optAln) throws StructureException
- Parameters:
afpChain - Input AFPChain. UNMODIFIED
ca1 -
ca2 -
optLens -
optAln -
- Returns:
- A NEW AFPChain based off the input but with the optAln modified
- Throws:
StructureException - if an error occurred during superposition
-
updateSuperposition
public static void updateSuperposition(AFPChain afpChain, Atom[] ca1, Atom[] ca2) throws StructureException
After the alignment changes (optAln, optLen, blockNum, at a minimum), many other properties which depend on the superposition will be invalid. This method re-runs a rigid superposition over the whole alignment and repopulates the required properties, including RMSD (TotalRMSD) and TM-Score.
- Parameters:
afpChain -
ca1 -
ca2 - Second set of ca atoms. Will be modified based on the superposition
- Throws:
StructureException
-
resizeArray
Reallocates an array with a new size, and copies the contents of the old array to the new array.
- Parameters:
oldArray - the old array, to be reallocated.
newSize - the new array size.
- Returns:
- A new array with the same contents.
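A common way to write such a type-agnostic resize in Java uses java.lang.reflect.Array. The sketch below shows that idiom under a hypothetical class name. Whether the library uses exactly this approach is not stated in this page.

```java
import java.lang.reflect.Array;

final class ResizeArraySketch {

    /** Allocates an array of the same element type and copies as many elements as fit. */
    static Object resizeArray(Object oldArray, int newSize) {
        int oldSize = Array.getLength(oldArray);
        Class<?> elementType = oldArray.getClass().getComponentType();
        Object newArray = Array.newInstance(elementType, newSize);
        int preserved = Math.min(oldSize, newSize);
        if (preserved > 0) {
            System.arraycopy(oldArray, 0, newArray, 0, preserved);
        }
        return newArray;
    }

    public static void main(String[] args) {
        int[] grown = (int[]) resizeArray(new int[] {1, 2, 3}, 5);
        System.out.println(java.util.Arrays.toString(grown)); // prints [1, 2, 3, 0, 0]
    }
}
```

Because the method takes and returns Object, the caller has to cast the result back to the concrete array type.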
-
toConciseAlignmentString
Print an alignment map in a concise representation. Edges are given as two numbers separated by '>'. They are chained together where possible, or separated by spaces where disjoint or branched.
Note that more concise representations may be possible.
Examples:
- 1>2>3>1
- 1>2>3>2 4>3
- Parameters:
alignment - The input function, as a map (see alignmentAsMap(AFPChain))
identity - An identity-like function providing the isomorphism between the codomain of alignment (of type T) and the domain (type S).
- Returns:
-
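One way to produce strings of this shape is to start chains at keys that nothing maps to, follow each chain until it reaches an already-visited key or leaves the map, and then handle any remaining cycles. The sketch below does this for the single-type case only. The traversal order, the names, and the assumption that the map holds no null values are illustrative choices, not the library's documented algorithm.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeSet;

final class ConciseStringSketch {

    /** Hypothetical helper: chains edges x>f(x)>f(f(x))... and separates chains by spaces. */
    static String toConcise(Map<Integer, Integer> alignment) {
        Set<Integer> image = new HashSet<>(alignment.values());
        Set<Integer> sortedKeys = new TreeSet<>(alignment.keySet());
        List<Integer> starts = new ArrayList<>();
        for (Integer key : sortedKeys) {
            if (!image.contains(key)) {
                starts.add(key); // chain heads: keys that are nobody's image
            }
        }
        starts.addAll(sortedKeys); // then everything else, which picks up pure cycles

        Set<Integer> visited = new HashSet<>();
        StringJoiner out = new StringJoiner(" ");
        for (Integer start : starts) {
            if (visited.contains(start)) {
                continue;
            }
            StringBuilder chain = new StringBuilder().append(start);
            Integer cur = start;
            // Extend the chain until we hit a visited key or a value with no outgoing edge.
            while (cur != null && alignment.containsKey(cur) && visited.add(cur)) {
                cur = alignment.get(cur);
                chain.append('>').append(cur);
            }
            out.add(chain);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toConcise(Map.of(1, 2, 2, 3, 3, 1)));       // prints 1>2>3>1
        System.out.println(toConcise(Map.of(1, 2, 2, 3, 3, 2, 4, 3))); // prints 1>2>3>2 4>3
    }
}
```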
toConciseAlignmentString
- See Also:
-
fromConciseAlignmentString
- See Also:
-
calculateBlockGap
public static int[] calculateBlockGap(int[][][] optAln)
Method that calculates the number of gaps in each subunit block of an optimal AFP alignment.
INPUT: an optimal alignment in the format int[][][].
OUTPUT: an int[] array with one entry per block, containing the number of gaps in each block as int[block].
-
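As a rough illustration of gap counting per block, the sketch below counts one gap whenever either chain skips at least one residue between consecutive aligned positions. That counting rule is an assumption made for the example, so the library may define gaps differently, and the names are hypothetical.

```java
import java.util.Arrays;

final class BlockGapSketch {

    /** Hypothetical helper: gaps[b] is the number of non-contiguous steps in block b. */
    static int[] blockGaps(int[][][] optAln) {
        int[] gaps = new int[optAln.length];
        for (int b = 0; b < optAln.length; b++) {
            for (int j = 1; j < optAln[b][0].length; j++) {
                boolean skip1 = optAln[b][0][j] > optAln[b][0][j - 1] + 1;
                boolean skip2 = optAln[b][1][j] > optAln[b][1][j - 1] + 1;
                if (skip1 || skip2) {
                    gaps[b]++;
                }
            }
        }
        return gaps;
    }

    public static void main(String[] args) {
        int[][][] optAln = {
            {{1, 2, 4, 5}, {10, 11, 12, 14}}, // chain 1 skips residue 3, chain 2 skips 13
            {{1, 2, 3}, {4, 5, 6}}            // fully contiguous
        };
        System.out.println(Arrays.toString(blockGaps(optAln))); // prints [2, 0]
    }
}
```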
alignmentToSIF
public static void alignmentToSIF(Writer out, AFPChain afpChain, Atom[] ca1, Atom[] ca2, String backboneInteraction, String alignmentInteraction) throws IOException
Creates a simple interaction format (SIF) file for an alignment. The SIF file can be read by network software (e.g. Cytoscape) to analyze alignments as graphs. This function creates a graph with residues as nodes and two types of edges:
1. backbone edges, which connect adjacent residues in the aligned protein
2. alignment edges, which connect aligned residues
- Parameters:
out - Stream to write to
afpChain - alignment to write
ca1 - First protein, used to generate node names
ca2 - Second protein, used to generate node names
backboneInteraction - Two-letter string used to identify backbone edges
alignmentInteraction - Two-letter string used to identify alignment edges
- Throws:
IOException
-
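A SIF file lists one edge per line as source node, interaction type, and target node. The sketch below writes the two edge types described above from plain arrays. The node names, the tab delimiter, and the method signature are assumptions for illustration, since the library derives node names from the Atom arrays in a way this page does not specify.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

final class SifSketch {

    /**
     * Hypothetical helper: names1 and names2 hold one node name per residue,
     * and aligned holds index pairs {i, j} meaning names1[i] is aligned to names2[j].
     */
    static void writeSif(Writer out, String[] names1, String[] names2, int[][] aligned,
                         String backboneInteraction, String alignmentInteraction) throws IOException {
        // Backbone edges connect neighbouring residues within each protein.
        for (String[] names : new String[][] {names1, names2}) {
            for (int i = 0; i + 1 < names.length; i++) {
                out.write(names[i] + "\t" + backboneInteraction + "\t" + names[i + 1] + "\n");
            }
        }
        // Alignment edges connect each aligned pair across the two proteins.
        for (int[] pair : aligned) {
            out.write(names1[pair[0]] + "\t" + alignmentInteraction + "\t" + names2[pair[1]] + "\n");
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeSif(out, new String[] {"A1", "A2"}, new String[] {"B1", "B2"},
                new int[][] {{0, 1}}, "bb", "ev");
        System.out.print(out);
    }
}
```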