Package org.biojava.nbio.structure
Class StructureTools
java.lang.Object
org.biojava.nbio.structure.StructureTools
A class that provides some tool methods.
- Since:
- 1.0
- Author:
- Andreas Prlic, Jules Jacobsen
-
Field Summary
Modifier and Type, Field, Description
- static final String C_ATOM_NAME: The atom name for the backbone carbonyl carbon
- static final String C1_ATOM_NAME: The atom name of the backbone C1' in RNA
- static final String C2_ATOM_NAME: The atom name of the backbone C2' in RNA
- static final String C3_ATOM_NAME: The atom name of the backbone C3' in RNA
- static final String C4_ATOM_NAME: The atom name of the backbone C4' in RNA
- static final String CA_ATOM_NAME: The atom name of the backbone C-alpha atom.
- static final String CB_ATOM_NAME: The atom name of the side-chain C-beta atom
- static final String N_ATOM_NAME: The atom name for the backbone amide nitrogen
- static final String NUCLEOTIDE_REPRESENTATIVE: The atom used as representative for nucleotides, equivalent to CA_ATOM_NAME for proteins
- static final String O_ATOM_NAME: The atom name for the backbone carbonyl oxygen
- static final String O2_ATOM_NAME: The atom name of the backbone O2' in RNA
- static final String O3_ATOM_NAME: The atom name of the backbone O3' in RNA
- static final String O4_ATOM_NAME: The atom name of the backbone O4' in RNA
- static final String O5_ATOM_NAME: The atom name of the backbone O5' in RNA
- static final String OP1_ATOM_NAME: The atom name of the backbone phosphate oxygen OP1 in RNA
- static final String OP2_ATOM_NAME: The atom name of the backbone phosphate oxygen OP2 in RNA
- static final String P_ATOM_NAME: The atom name of the backbone phosphate in RNA
- static final double RATIO_RESIDUES_TO_TOTAL: Below this ratio of aminoacid/nucleotide residues to the sequence total, we use simple majority of aminoacid/nucleotide residues to decide the character of the chain (protein/nucleotide)
- static final char UNKNOWN_GROUP_LABEL: The character to use for unknown compounds in sequence strings
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
- static final Atom[] cloneAtomArray(Atom[] ca): Provides an equivalent copy of Atoms in a new array.
- static final Atom[] cloneCAArray(Atom[] ca): Deprecated. Use cloneAtomArray(Atom[]) instead.
- static Group[] cloneGroups(Atom[] ca): Clones a set of representative Atoms, but returns the parent groups.
- static final Character convert_3code_1code(String code3): Deprecated. Use get1LetterCodeAmino(String) instead.
- static final String convertAtomsToSeq(Atom[] atoms)
- static Atom[] duplicateCA2(Atom[] ca2): Utility method for working with circular permutations.
- filterLigands(List<Group> allGroups): Removes all polymeric and solvent groups from a list of groups.
- static final Character get1LetterCode(String groupCode3): Convert a three letter amino acid or nucleotide code into a single character code.
- static final Character get1LetterCodeAmino(String groupCode3): Convert three character amino acid codes into single character, e.g. convert CYS to C.
- static final Atom[] getAllAtomArray(Chain c): Returns an array of all atoms of the chain (first model), including Hydrogens (if present) and all HETATOMs.
- static final Atom[] getAllAtomArray(Structure s): Convert all atoms of the structure (first model) into an Atom array.
- static final Atom[] getAllNonHAtomArray(Chain c, boolean hetAtoms): Returns an array of all non-Hydrogen atoms in the given Chain, optionally including HET atoms. Waters are not included.
- static final Atom[] getAllNonHAtomArray(Structure s, boolean hetAtoms): Returns an array of all non-Hydrogen atoms in the given Structure, optionally including HET atoms.
- static final Atom[] getAtomArray(Chain c, String[] atomNames): Returns an array of the requested Atoms from the Chain object.
- static final Atom[] getAtomArray(Structure s, String[] atomNames): Returns an array of the requested Atoms from the Structure object.
- static final Atom[] getAtomArrayAllModels(Structure s, String[] atomNames): Returns an array of the requested Atoms from the Structure object.
- static final Atom[] getAtomCAArray(Chain c): Returns an Atom array of the C-alpha atoms.
- static Atom[] getAtomCAArray(Structure s): Return an Atom array of the C-alpha atoms.
- static AtomContactSet getAtomsCAInContact(Chain chain, double cutoff): Returns the set of intra-chain contacts for the given chain for C-alpha atoms (including non-standard aminoacids appearing as HETATM groups), i.e. the contact map.
- static AtomContactSet getAtomsInContact(Chain chain, double cutoff): Returns the set of intra-chain contacts for the given chain for all non-H atoms of non-hetatoms, i.e. the contact map.
- static AtomContactSet getAtomsInContact(Chain chain, String[] atomNames, double cutoff): Returns the set of intra-chain contacts for the given chain for given atom names, i.e. the contact map.
- static AtomContactSet getAtomsInContact(Chain chain1, Chain chain2, double cutoff, boolean hetAtoms): Returns the set of inter-chain contacts between the two given chains for all non-H atoms.
- static AtomContactSet getAtomsInContact(Chain chain1, Chain chain2, String[] atomNames, double cutoff, boolean hetAtoms): Returns the set of inter-chain contacts between the two given chains for the given atom names.
- static Atom[] getBackboneAtomArray(Structure s): Return an Atom array of the main chain atoms: CA, C, N, O. Any group that contains those atoms will be included, be it a standard aminoacid or not.
- static final Group getGroupByPDBResidueNumber(Structure struc, ResidueNumber pdbResNum): Get a group represented by a ResidueNumber.
- static Map<Group,Double> getGroupDistancesWithinShell(Structure structure, Atom centroid, Set<ResidueNumber> excludeResidues, double radius, boolean includeWater, boolean useAverageDistance): Finds Groups in structure that contain at least one Atom that is within radius Angstroms of centroid.
- getGroupsWithinShell(Structure structure, Atom atom, Set<ResidueNumber> excludeResidues, double distance, boolean includeWater)
- static Set<Group> getGroupsWithinShell(Structure structure, Group group, double distance, boolean includeWater): Returns a Set of Groups in a structure within the distance specified of a given group.
- static final int getNrAtoms(Structure s): Count how many Atoms are contained within a Structure object.
- static final int getNrGroups(Structure s): Count how many groups are contained within a structure object.
- static GroupType getPredominantGroupType(Chain c): Get the predominant GroupType for a given Chain. If the ratio of the number of residues of a certain GroupType to total non-water residues is above the threshold 0.95, that GroupType is returned; otherwise the GroupType with the most members is chosen, and this is logged.
- static final Structure getReducedStructure(Structure s, int chainNr): Deprecated. Use StructureIdentifier.reduce(Structure) instead (v. 4.2.0).
- static final Structure getReducedStructure(Structure s, String chainId): Deprecated. Use StructureIdentifier.reduce(Structure) instead (v. 4.2.0).
- static final Atom[] getRepresentativeAtomArray(Chain c): Gets a representative atom for each group that is part of the chain backbone.
- static Atom[] getRepresentativeAtomArray(Structure s): Gets a representative atom for each group that is part of the chain backbone.
- static AtomContactSet getRepresentativeAtomsInContact(Chain chain, double cutoff): Returns the set of intra-chain contacts for the given chain for C-alpha or C3' atoms (including non-standard aminoacids appearing as HETATM groups), i.e. the contact map.
- static Structure getStructure(String name): Short version of getStructure(String, PDBFileParser, AtomCache) which creates new parsers when needed.
- static Structure getStructure(String name, PDBFileParser parser, AtomCache cache): Flexibly get a structure from an input String.
- static final Structure getSubRanges(Structure s, String ranges): Deprecated. Use StructureIdentifier instead (4.2.0).
- getUnalignedGroups(Atom[] ca): List of groups from the structure not included in ca (e.g. ligands).
- static boolean isChainPureNonPolymer(Chain c): Returns true if the given chain is composed of non-polymeric groups only.
- static boolean isChainWaterOnly(Chain c): Returns true if the given chain is composed of water molecules only.
- static boolean isNucleicAcid(Chain c): Tell whether given chain is DNA or RNA.
- static final boolean isNucleotide(String groupCode3): Test if the three-letter code of an ATOM entry corresponds to a nucleotide or to an aminoacid.
- static boolean isProtein(Chain c): Tell whether given chain is a protein chain.
- static Structure removeModels(Structure s): Remove all models from a Structure and keep only the first.
-
Field Details
-
CA_ATOM_NAME
The atom name of the backbone C-alpha atom. Note that this can be ambiguous depending on the context, since calcium atoms use the same name in PDB.
-
N_ATOM_NAME
The atom name for the backbone amide nitrogen
-
C_ATOM_NAME
The atom name for the backbone carbonyl carbon
-
O_ATOM_NAME
The atom name for the backbone carbonyl oxygen
-
CB_ATOM_NAME
The atom name of the side-chain C-beta atom
-
C1_ATOM_NAME
The atom name of the backbone C1' in RNA
-
C2_ATOM_NAME
The atom name of the backbone C2' in RNA
-
C3_ATOM_NAME
The atom name of the backbone C3' in RNA
-
C4_ATOM_NAME
The atom name of the backbone C4' in RNA
-
O2_ATOM_NAME
The atom name of the backbone O2' in RNA
-
O3_ATOM_NAME
The atom name of the backbone O3' in RNA
-
O4_ATOM_NAME
The atom name of the backbone O4' in RNA
-
O5_ATOM_NAME
The atom name of the backbone O5' in RNA
-
OP1_ATOM_NAME
The atom name of the backbone phosphate oxygen OP1 in RNA
-
OP2_ATOM_NAME
The atom name of the backbone phosphate oxygen OP2 in RNA
-
P_ATOM_NAME
The atom name of the backbone phosphate in RNA
-
NUCLEOTIDE_REPRESENTATIVE
The atom used as representative for nucleotides, equivalent to CA_ATOM_NAME for proteins
-
UNKNOWN_GROUP_LABEL
public static final char UNKNOWN_GROUP_LABEL
The character to use for unknown compounds in sequence strings
-
RATIO_RESIDUES_TO_TOTAL
public static final double RATIO_RESIDUES_TO_TOTAL
Below this ratio of aminoacid/nucleotide residues to the sequence total, we use simple majority of aminoacid/nucleotide residues to decide the character of the chain (protein/nucleotide)
-
-
Constructor Details
-
StructureTools
public StructureTools()
-
-
Method Details
-
getNrAtoms
Count how many Atoms are contained within a Structure object.
- Parameters:
s
- the structure object
- Returns:
- the number of Atoms in this Structure
-
getNrGroups
Count how many groups are contained within a structure object.
- Parameters:
s
- the structure object
- Returns:
- the number of groups in the structure
-
getAtomArray
Returns an array of the requested Atoms from the Structure object. Iterates over all groups and checks whether the requested atoms are in each group, regardless of whether it is an AminoAcid or HetatomImpl group. If a group does not contain all requested atoms, no atoms are added for that group. For structures with more than one model, only model 0 is used.
- Parameters:
s
- the structure to get the atoms from
atomNames
- contains the atom names to be used.
- Returns:
- an Atom[] array
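The all-or-nothing rule described for getAtomArray (a group contributes atoms only if it contains every requested atom name) can be illustrated with a minimal sketch. This is not BioJava code: the class AtomSelectionSketch, its methods, and the modelling of a group as a map from atom name to a label are hypothetical stand-ins chosen so the example runs without the library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the all-or-nothing selection rule; not BioJava code. */
final class AtomSelectionSketch {

    /**
     * Each group is modelled as a map from atom name to an atom label.
     * A group contributes atoms only if it contains every requested name.
     */
    static List<String> selectAtoms(List<Map<String, String>> groups, String[] atomNames) {
        List<String> selected = new ArrayList<>();
        for (Map<String, String> group : groups) {
            List<String> fromThisGroup = new ArrayList<>();
            for (String name : atomNames) {
                String atom = group.get(name);
                if (atom == null) {
                    break; // a requested atom is missing in this group
                }
                fromThisGroup.add(atom);
            }
            // Add the atoms only when all requested names were found.
            if (fromThisGroup.size() == atomNames.length) {
                selected.addAll(fromThisGroup);
            }
        }
        return selected;
    }

    /** Small helper that builds a group from alternating name/label pairs. */
    static Map<String, String> group(String... nameLabelPairs) {
        Map<String, String> g = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameLabelPairs.length; i += 2) {
            g.put(nameLabelPairs[i], nameLabelPairs[i + 1]);
        }
        return g;
    }
}
```

With the requested names {"N", "CA"}, a group that only has a CA atom is skipped entirely, while groups that have both atoms contribute them in the requested order.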
-
getAtomArrayAllModels
Returns an array of the requested Atoms from the Structure object. In contrast to getAtomArray(Structure, String[]), this method iterates over all models. Iterates over all chains and groups and checks whether the requested atoms are in each group, regardless of whether it is an AminoAcid or HetatomImpl group. If a group does not contain all requested atoms, no atoms are added for that group.
- Parameters:
s
- the structure to get the atoms from
atomNames
- contains the atom names to be used.
- Returns:
- an Atom[] array
-
getAllAtomArray
Convert all atoms of the structure (first model) into an Atom array.
- Parameters:
s
- input structure
- Returns:
- all atom array
-
getAllAtomArray
Returns an array of all atoms of the chain (first model), including Hydrogens (if present) and all HETATOMs. Waters are not included.
- Parameters:
c
- input chain
- Returns:
- all atom array
-
getUnalignedGroups
List of groups from the structure not included in ca (e.g. ligands). Unaligned groups are searched from all chains referenced in ca, as well as any chains in the first model of the structure from ca[0], if any.
- Parameters:
ca
- an array of atoms
- Returns:
- the groups from the structure that are not included in ca
-
getAllNonHAtomArray
Returns an array of all non-Hydrogen atoms in the given Structure, optionally including HET atoms. Waters are not included.
- Parameters:
s
- the structure
hetAtoms
- if true HET atoms are included in array, if false they are not
- Returns:
- an array of the non-Hydrogen atoms
-
getAllNonHAtomArray
Returns an array of all non-Hydrogen atoms in the given Chain, optionally including HET atoms. Waters are not included.
- Parameters:
c
- the chain
hetAtoms
- if true HET atoms are included in array, if false they are not
- Returns:
- an array of the non-Hydrogen atoms
-
getAtomArray
Returns an array of the requested Atoms from the Chain object. Iterates over all groups and checks whether the requested atoms are in each group, regardless of whether it is an AminoAcid or Hetatom group. If a group does not contain all requested atoms, no atoms are added for that group.
- Parameters:
c
- the Chain to get the atoms from
atomNames
- contains the atom names to be used.
- Returns:
- an Atom[] array
-
getAtomCAArray
Returns an Atom array of the C-alpha atoms. Any atom that is a carbon and has the name CA will be returned.
- Parameters:
c
- the chain object
- Returns:
- an Atom[] array
-
getRepresentativeAtomArray
Gets a representative atom for each group that is part of the chain backbone. Note that modified aminoacids won't be returned as part of the backbone if the ReducedChemCompProvider was used to load the structure. For amino acids, the representative is a CA carbon. For nucleotides, the representative is the "P". Other group types will be ignored.
- Parameters:
c
- the chain
- Returns:
- representative Atoms of the chain backbone
- Since:
- Biojava 4.1.0
-
cloneCAArray
Deprecated. Use the better-named cloneAtomArray(Atom[]) instead.
Provides an equivalent copy of Atoms in a new array. Clones everything, starting with parent groups and chains. The chain will only contain groups that are part of the input array.
- Parameters:
ca
- array of representative atoms, e.g. CA atoms
- Returns:
- Atom array
-
cloneAtomArray
Provides an equivalent copy of Atoms in a new array. Clones everything, starting with parent groups and chains. The chain will only contain groups that are part of the input array.
- Parameters:
ca
- array of representative atoms, e.g. CA atoms
- Returns:
- Atom array
- Since:
- Biojava 4.1.0
-
cloneGroups
Clones a set of representative Atoms, but returns the parent groups.
- Parameters:
ca
- Atom array
- Returns:
- Group array
-
duplicateCA2
Utility method for working with circular permutations. Creates a duplicated and cloned set of C-alpha atoms from the input array.
- Parameters:
ca2
- atom array
- Returns:
- cloned and duplicated set of input array
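Why a duplicated array helps with circular permutations can be shown with a small generic sketch. It assumes that "duplicated" means the input sequence followed by a second copy of itself, and it works on strings rather than Atom objects; the class CircularDuplicateSketch and its methods are hypothetical and not part of BioJava, and the cloning of atoms and parent groups is left out.

```java
import java.util.Arrays;

/** Hypothetical illustration of the doubling trick for circular permutations; not BioJava code. */
final class CircularDuplicateSketch {

    /** Returns the input followed by a second copy of itself (length 2n). */
    static String[] duplicate(String[] items) {
        String[] doubled = Arrays.copyOf(items, items.length * 2);
        System.arraycopy(items, 0, doubled, items.length, items.length);
        return doubled;
    }

    /** A circular shift by k (0 <= k <= n) is the window [k, k + n) of the doubled array. */
    static String[] rotation(String[] items, int k) {
        return Arrays.copyOfRange(duplicate(items), k, k + items.length);
    }
}
```

Because every circular shift of the original sequence appears as a contiguous window of the doubled array, an ordinary sequential alignment against the doubled array can recover a circularly permuted match.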
-
getAtomCAArray
Return an Atom array of the C-alpha atoms. Any atom that is a carbon and has the name CA will be returned.
- Parameters:
s
- the structure object
- Returns:
- an Atom[] array
-
getRepresentativeAtomArray
Gets a representative atom for each group that is part of the chain backbone. Note that modified aminoacids won't be returned as part of the backbone if the ReducedChemCompProvider was used to load the structure. For amino acids, the representative is a CA carbon. For nucleotides, the representative is the "P". Other group types will be ignored.
- Parameters:
s
- Input structure
- Returns:
- representative Atoms of the structure backbone
- Since:
- Biojava 4.1.0
-
getBackboneAtomArray
Return an Atom array of the main chain atoms: CA, C, N, O. Any group that contains those atoms will be included, be it a standard aminoacid or not.
- Parameters:
s
- the structure object
- Returns:
- an Atom[] array
-
get1LetterCodeAmino
Convert three character amino acid codes into single character, e.g. convert CYS to C. Valid 3-letter codes will be those of the standard 20 amino acids plus MSE, CSE, SEC, PYH, PYL (see the aminoAcids map).
- Parameters:
groupCode3
- a three character amino acid representation String
- Returns:
- the 1 letter code, or null if the given 3 letter code does not correspond to an amino acid code
-
convert_3code_1code
Deprecated. Use get1LetterCodeAmino(String) instead.
- Parameters:
code3
- a three character amino acid code
- Returns:
- the 1 letter code
-
get1LetterCode
Convert a three letter amino acid or nucleotide code into a single character code. If the code does not correspond to an amino acid or nucleotide, returns UNKNOWN_GROUP_LABEL. Returned null for nucleotides prior to version 4.0.1.
- Parameters:
groupCode3
- three letter representation
- Returns:
- The 1-letter abbreviation
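The difference between the two lookups (get1LetterCodeAmino returns null for a non-amino-acid code, while get1LetterCode falls back to UNKNOWN_GROUP_LABEL) can be sketched as follows. The class OneLetterCodeSketch is hypothetical and not BioJava code; its tables hold only a handful of illustrative entries, and the placeholder character 'X' is an assumption standing in for the value of UNKNOWN_GROUP_LABEL, which this page does not state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the lookup-with-fallback behaviour; not BioJava code. */
final class OneLetterCodeSketch {

    /** Assumed placeholder for unknown groups; stands in for UNKNOWN_GROUP_LABEL. */
    static final char UNKNOWN = 'X';

    // Deliberately partial tables, for illustration only.
    private static final Map<String, Character> AMINO = new HashMap<>();
    private static final Map<String, Character> NUCLEOTIDE = new HashMap<>();

    static {
        AMINO.put("ALA", 'A');
        AMINO.put("CYS", 'C');
        AMINO.put("GLY", 'G');
        AMINO.put("MSE", 'M');
        NUCLEOTIDE.put("DA", 'A');
        NUCLEOTIDE.put("DT", 'T');
        NUCLEOTIDE.put("U", 'U');
    }

    /** Returns the amino-acid letter, or null when the code is not in the amino-acid table. */
    static Character oneLetterAmino(String code3) {
        return AMINO.get(code3);
    }

    /** Tries amino acids, then nucleotides, and falls back to the unknown label. */
    static Character oneLetter(String code3) {
        Character c = AMINO.get(code3);
        if (c == null) {
            c = NUCLEOTIDE.get(code3);
        }
        return c != null ? c : Character.valueOf(UNKNOWN);
    }
}
```

A caller that needs to distinguish "not an amino acid" from "unknown compound" would use the null-returning lookup; a caller that builds a sequence string would use the one with the fallback.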
-
isNucleotide
Test if the three-letter code of an ATOM entry corresponds to a nucleotide or to an aminoacid.
- Parameters:
groupCode3
- 3-character code for a group.
-
getReducedStructure
@Deprecated public static final Structure getReducedStructure(Structure s, String chainId) throws StructureException
Deprecated. Use StructureIdentifier.reduce(Structure) instead (v. 4.2.0).
Reduce a structure to provide a smaller representation. Only takes the first model of the structure. If chainId is provided, only return a structure containing that Chain ID. Converts lower case chain IDs to upper case if the structure does not contain a chain with that ID.
- Parameters:
s
- the structure to reduce
chainId
- the ID of the chain to keep
- Returns:
- Structure
- Throws:
StructureException
- Since:
- 3.0
-
getReducedStructure
@Deprecated public static final Structure getReducedStructure(Structure s, int chainNr) throws StructureException
Deprecated. Use StructureIdentifier.reduce(Structure) instead (v. 4.2.0).
Reduce a structure to provide a smaller representation. Only takes the first model of the structure. If chainNr >= 0, only takes the chain at that position into account.
- Parameters:
s
- the structure to reduce
chainNr
- can be -1 to request all chains of model 0, otherwise will only add chain at this position
- Returns:
- Structure object
- Throws:
StructureException
- Since:
- 3.0
-
getSubRanges
@Deprecated public static final Structure getSubRanges(Structure s, String ranges) throws StructureException
Deprecated. Use StructureIdentifier instead (4.2.0).
In addition to the functionality provided by getReducedStructure(Structure, int) and getReducedStructure(Structure, String), also provides a way to specify sub-regions of a structure with the following specification:
- ranges can be surrounded by ( and ), which will be removed
- ranges are specified as PDBresnum1 : PDBresnum2
- a list of ranges is separated by ,
Example:
4GCR (A:1-83)
1CDG (A:407-495,A:582-686)
1CDG (A_407-495,A_582-686)
- Parameters:
s
- The full structure
ranges
- A comma-separated list of ranges, optionally surrounded by parentheses
- Returns:
- Substructure of s specified by ranges
- Throws:
IllegalArgumentException
- for malformed range strings
StructureException
- for errors when reducing the Structure
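The range syntax shown in the examples above (optional surrounding parentheses, comma-separated items, a chain ID followed by ':' or '_' and a start-end pair) can be parsed along the following lines. This is a sketch of the string format only, not the parser BioJava uses: the class RangeSpecSketch and its Range type are hypothetical, and the sketch assumes plain non-negative integer residue numbers, so negative numbers and insertion codes are out of scope.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for range strings such as "(A:407-495,A:582-686)"; not BioJava code. */
final class RangeSpecSketch {

    /** One parsed range; the field names are illustrative. */
    static final class Range {
        final String chain;
        final int start;
        final int end;

        Range(String chain, int start, int end) {
            this.chain = chain;
            this.start = start;
            this.end = end;
        }
    }

    // Chain ID, then ':' or '_', then start-end (non-negative integers only in this sketch).
    private static final Pattern RANGE = Pattern.compile("([A-Za-z0-9]+)[:_](\\d+)-(\\d+)");

    static List<Range> parse(String ranges) {
        String s = ranges.trim();
        // Surrounding parentheses are optional and are removed.
        if (s.startsWith("(") && s.endsWith(")")) {
            s = s.substring(1, s.length() - 1);
        }
        List<Range> result = new ArrayList<>();
        for (String part : s.split(",")) {
            Matcher m = RANGE.matcher(part.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("Malformed range: " + part);
            }
            result.add(new Range(m.group(1),
                    Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))));
        }
        return result;
    }
}
```

Malformed input is reported with an IllegalArgumentException, mirroring the exception documented for malformed range strings.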
-
convertAtomsToSeq
-
getGroupByPDBResidueNumber
public static final Group getGroupByPDBResidueNumber(Structure struc, ResidueNumber pdbResNum) throws StructureException
Get a group represented by a ResidueNumber.
- Parameters:
struc
- a Structure
pdbResNum
- a ResidueNumber
- Returns:
- a group in the structure that is represented by the pdbResNum.
- Throws:
StructureException
- if the group cannot be found.
-
getAtomsInContact
Returns the set of intra-chain contacts for the given chain for given atom names, i.e. the contact map. Uses a geometric hashing algorithm that speeds up the calculation without the need for a full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
- Parameters:
chain
- the chain
atomNames
- the array with atom names to be used. Beware: CA will match both C-alphas and calciums. If null, all non-H atoms of non-hetatoms will be used.
cutoff
- the distance cutoff for a contact
- Returns:
- the set of intra-chain contacts
-
getAtomsInContact
Returns the set of intra-chain contacts for the given chain for all non-H atoms of non-hetatoms, i.e. the contact map. Uses a geometric hashing algorithm that speeds up the calculation without the need for a full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
- Parameters:
chain
- the chain
cutoff
- the distance cutoff for a contact
- Returns:
- the set of intra-chain contacts
-
getAtomsCAInContact
Returns the set of intra-chain contacts for the given chain for C-alpha atoms (including non-standard aminoacids appearing as HETATM groups), i.e. the contact map. Uses a geometric hashing algorithm that speeds up the calculation without the need for a full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
- Parameters:
chain
- the chain
cutoff
- the distance cutoff for a contact
- Returns:
- the set of intra-chain contacts
-
getRepresentativeAtomsInContact
Returns the set of intra-chain contacts for the given chain for C-alpha or C3' atoms (including non-standard aminoacids appearing as HETATM groups), i.e. the contact map. Uses a geometric hashing algorithm that speeds up the calculation without the need for a full distance matrix.
- Parameters:
chain
- the chain
cutoff
- the distance cutoff for a contact
- Returns:
- the set of intra-chain contacts
- Since:
- Biojava 4.1.0
-
getAtomsInContact
public static AtomContactSet getAtomsInContact(Chain chain1, Chain chain2, String[] atomNames, double cutoff, boolean hetAtoms)
Returns the set of inter-chain contacts between the two given chains for the given atom names. Uses a geometric hashing algorithm that speeds up the calculation without the need for a full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
- Parameters:
chain1
- the first chain
chain2
- the second chain
atomNames
- the array with atom names to be used. For C-alphas use {"CA"}; if null, all non-H atoms will be used. Note HET atoms are ignored unless this parameter is null.
cutoff
- the distance cutoff for a contact
hetAtoms
- if true HET atoms are included, if false they are not
- Returns:
- the set of inter-chain contacts
-
getAtomsInContact
public static AtomContactSet getAtomsInContact(Chain chain1, Chain chain2, double cutoff, boolean hetAtoms)
Returns the set of inter-chain contacts between the two given chains for all non-H atoms. Uses a geometric hashing algorithm that speeds up the calculation without the need for a full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
- Parameters:
chain1
- the first chain
chain2
- the second chain
cutoff
- the distance cutoff for a contact
hetAtoms
- if true HET atoms are included, if false they are not
- Returns:
- the set of inter-chain contacts
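The contact methods above mention a geometric hashing approach that avoids a full distance matrix. The sketch below shows one generic way to get that effect, a uniform spatial grid whose cell size equals the cutoff, so that each point is compared only with points in its own and neighbouring cells. It is an illustration of the general idea under that assumption, not BioJava's implementation; the class GridContactSketch is hypothetical, and points are plain coordinate triples rather than Atom objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical uniform-grid contact search; not BioJava code. */
final class GridContactSketch {

    /** Returns index pairs (i < j) of points whose distance is at most cutoff (cutoff > 0). */
    static List<int[]> contacts(double[][] xyz, double cutoff) {
        // Bin every point into a cubic cell with edge length equal to the cutoff.
        Map<String, List<Integer>> cells = new HashMap<>();
        for (int i = 0; i < xyz.length; i++) {
            String k = key(cell(xyz[i][0], cutoff), cell(xyz[i][1], cutoff), cell(xyz[i][2], cutoff));
            cells.computeIfAbsent(k, unused -> new ArrayList<>()).add(i);
        }
        List<int[]> result = new ArrayList<>();
        double cutoffSq = cutoff * cutoff;
        for (int i = 0; i < xyz.length; i++) {
            long cx = cell(xyz[i][0], cutoff);
            long cy = cell(xyz[i][1], cutoff);
            long cz = cell(xyz[i][2], cutoff);
            // Any point within the cutoff lies in this cell or one of its 26 neighbours.
            for (long dx = -1; dx <= 1; dx++) {
                for (long dy = -1; dy <= 1; dy++) {
                    for (long dz = -1; dz <= 1; dz++) {
                        List<Integer> bucket = cells.get(key(cx + dx, cy + dy, cz + dz));
                        if (bucket == null) {
                            continue;
                        }
                        for (int j : bucket) {
                            // j > i reports each pair exactly once.
                            if (j > i && distSq(xyz[i], xyz[j]) <= cutoffSq) {
                                result.add(new int[] {i, j});
                            }
                        }
                    }
                }
            }
        }
        return result;
    }

    private static long cell(double coord, double size) {
        return (long) Math.floor(coord / size);
    }

    private static String key(long x, long y, long z) {
        return x + "," + y + "," + z;
    }

    private static double distSq(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return dx * dx + dy * dy + dz * dz;
    }
}
```

For points spread over a volume much larger than the cutoff, this visits far fewer pairs than the all-against-all comparison a full distance matrix would require.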
-
getGroupDistancesWithinShell
public static Map<Group,Double> getGroupDistancesWithinShell(Structure structure, Atom centroid, Set<ResidueNumber> excludeResidues, double radius, boolean includeWater, boolean useAverageDistance)
Finds Groups in structure that contain at least one Atom that is within radius Angstroms of centroid.
- Parameters:
structure
- The structure from which to find Groups
centroid
- The centroid of the shell
excludeResidues
- A list of ResidueNumbers to exclude
radius
- The radius from centroid, in Angstroms
includeWater
- Whether to include Groups whose only atoms are water
useAverageDistance
- When set to true, distances are the arithmetic mean (1-norm) of the distances of atoms that belong to the group and that are within the shell; otherwise, distances are the minimum of these values
- Returns:
- A map of Groups within (or partially within) the shell, to their distances in Angstroms
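The effect of useAverageDistance can be made concrete with a small sketch: for each group, only the atoms inside the shell are considered, and the reported distance is either their mean or their minimum. The class ShellDistanceSketch is hypothetical and not BioJava code; groups are modelled as labelled coordinate arrays, residue exclusion and water handling are omitted, and treating an atom exactly at the radius as inside the shell is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of mean versus minimum in-shell distances; not BioJava code. */
final class ShellDistanceSketch {

    /**
     * groups maps a group label to the coordinates of its atoms. The result holds, for each
     * group with at least one atom within radius of the centroid, either the mean or the
     * minimum of the distances of its in-shell atoms.
     */
    static Map<String, Double> distancesWithinShell(Map<String, double[][]> groups,
            double[] centroid, double radius, boolean useAverageDistance) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, double[][]> e : groups.entrySet()) {
            double sum = 0.0;
            double min = Double.POSITIVE_INFINITY;
            int inShell = 0;
            for (double[] atom : e.getValue()) {
                double d = distance(atom, centroid);
                if (d <= radius) { // assumption: the boundary counts as inside
                    inShell++;
                    sum += d;
                    min = Math.min(min, d);
                }
            }
            // Groups without any atom in the shell are left out of the map.
            if (inShell > 0) {
                result.put(e.getKey(), useAverageDistance ? sum / inShell : min);
            }
        }
        return result;
    }

    private static double distance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
}
```

Note that atoms of a group lying outside the shell do not enter the mean, which matches the wording "atoms that belong to the group and that are within the shell".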
-
getGroupsWithinShell
-
getGroupsWithinShell
public static Set<Group> getGroupsWithinShell(Structure structure, Group group, double distance, boolean includeWater)
Returns a Set of Groups in a structure within the distance specified of a given group.
Updated 18-Sep-2015 by sroughley to return a Set, so that only a unique set of Groups is returned.
- Parameters:
structure
- The structure to work with
group
- The 'query' group
distance
- The cutoff distance
includeWater
- Should water residues be included in the output?
- Returns:
- a LinkedHashSet of Groups that have at least one atom within distance of at least one atom in group
-
removeModels
Remove all models from a Structure and keep only the first.
- Parameters:
s
- original Structure
- Returns:
- a structure that contains only the first model
- Since:
- 3.0.5
-
filterLigands
Removes all polymeric and solvent groups from a list of groups
-
getStructure
Short version of getStructure(String, PDBFileParser, AtomCache) which creates new parsers when needed.
- Parameters:
name
- Some reference to the protein structure
- Returns:
- A Structure object
- Throws:
IOException
StructureException
-
getStructure
public static Structure getStructure(String name, PDBFileParser parser, AtomCache cache) throws IOException, StructureException
Flexibly get a structure from an input String. The intent of this method is to allow any reasonable string which could refer to a structure to be correctly parsed. The following are currently supported:
- Filename (if name refers to an existing file)
- PDB ID
- SCOP domains
- PDP domains
- Residue ranges
- Other formats supported by AtomCache
- Parameters:
name
- Some reference to the protein structure
parser
- A clean PDBFileParser to use if it is a file. If null, a PDBFileParser will be instantiated if needed.
cache
- An AtomCache to use if the structure can be fetched from the PDB. If null, an AtomCache will be instantiated if needed.
- Returns:
- A Structure object
- Throws:
IOException
- if name is an existing file, but doesn't parse correctly
StructureException
- if the format is unknown, or if AtomCache throws an exception.
-
isProtein
Tell whether given chain is a protein chain.
- Parameters:
c
- the chain
- Returns:
- true if protein, false if nucleotide or ligand
-
isNucleicAcid
Tell whether given chain is DNA or RNA.
- Parameters:
c
- the chain
- Returns:
- true if nucleic acid, false if protein or ligand
-
getPredominantGroupType
Get the predominant GroupType for a given Chain, following these rules:
- if the ratio of the number of residues of a certain GroupType to total non-water residues is above the threshold 0.95, then that GroupType is returned
- if there is no GroupType above the threshold, then the GroupType with the most members is chosen, and this is logged
See also ChemComp.getPolymerType() and ChemComp.getResidueType(), which follow the PDB chemical component dictionary and provide a much more accurate description of groups and their linking.
- Parameters:
c
- the chain
- Returns:
- the predominant GroupType of the chain
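The two rules for the predominant group type can be sketched over a plain list of residue types. This is an illustration of the rules as stated above, not BioJava code: the class PredominantTypeSketch and the enum Kind (a stand-in for GroupType, with assumed constant names) are hypothetical, the input is assumed to contain the non-water residues only, and logging is reduced to a comment.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the threshold-then-majority rule; not BioJava code. */
final class PredominantTypeSketch {

    /** Illustrative stand-in for GroupType; the constant names are assumptions. */
    enum Kind { AMINOACID, NUCLEOTIDE, HETATM }

    /** The threshold quoted in the rule description. */
    static final double THRESHOLD = 0.95;

    /** kinds holds the type of every non-water residue; returns null for an empty list. */
    static Kind predominant(List<Kind> kinds) {
        if (kinds.isEmpty()) {
            return null;
        }
        Map<Kind, Integer> counts = new EnumMap<>(Kind.class);
        for (Kind k : kinds) {
            counts.merge(k, 1, Integer::sum);
        }
        Kind best = null;
        int bestCount = 0;
        for (Map.Entry<Kind, Integer> e : counts.entrySet()) {
            // Rule 1: a type whose share exceeds the threshold is returned directly.
            if ((double) e.getValue() / kinds.size() > THRESHOLD) {
                return e.getKey();
            }
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        // Rule 2: otherwise the most frequent type is chosen (the documented method logs this case).
        return best;
    }
}
```

A type above the threshold is necessarily also the most frequent one, so in this sketch the two rules differ only in whether the fallback path (and its logging) is taken.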
-
isChainWaterOnly
Returns true if the given chain is composed of water molecules only.
- Parameters:
c
- the chain
- Returns:
- true if the chain contains only water molecules
-
isChainPureNonPolymer
Returns true if the given chain is composed of non-polymeric groups only.
- Parameters:
c
- the chain
- Returns:
- true if the chain contains only non-polymeric groups
-