Class StructureTools

java.lang.Object
org.biojava.nbio.structure.StructureTools

public class StructureTools extends Object
A class that provides some tool methods.
Since:
1.0
Author:
Andreas Prlic, Jules Jacobsen
  • Field Details

    • CA_ATOM_NAME

      public static final String CA_ATOM_NAME
      The atom name of the backbone C-alpha atom. Note that this can be ambiguous depending on the context since Calcium atoms use the same name in PDB.
    • N_ATOM_NAME

      public static final String N_ATOM_NAME
      The atom name for the backbone amide nitrogen
    • C_ATOM_NAME

      public static final String C_ATOM_NAME
      The atom name for the backbone carbonyl carbon
    • O_ATOM_NAME

      public static final String O_ATOM_NAME
      The atom name for the backbone carbonyl oxygen
    • CB_ATOM_NAME

      public static final String CB_ATOM_NAME
      The atom name of the side-chain C-beta atom
    • C1_ATOM_NAME

      public static final String C1_ATOM_NAME
      The atom name of the backbone C1' in RNA
    • C2_ATOM_NAME

      public static final String C2_ATOM_NAME
      The atom name of the backbone C2' in RNA
    • C3_ATOM_NAME

      public static final String C3_ATOM_NAME
      The atom name of the backbone C3' in RNA
    • C4_ATOM_NAME

      public static final String C4_ATOM_NAME
      The atom name of the backbone C4' in RNA
    • O2_ATOM_NAME

      public static final String O2_ATOM_NAME
      The atom name of the backbone O2' in RNA
    • O3_ATOM_NAME

      public static final String O3_ATOM_NAME
      The atom name of the backbone O3' in RNA
    • O4_ATOM_NAME

      public static final String O4_ATOM_NAME
      The atom name of the backbone O4' in RNA
    • O5_ATOM_NAME

      public static final String O5_ATOM_NAME
      The atom name of the backbone O5' in RNA
    • OP1_ATOM_NAME

      public static final String OP1_ATOM_NAME
      The atom name of the backbone phosphate oxygen OP1 in RNA
    • OP2_ATOM_NAME

      public static final String OP2_ATOM_NAME
      The atom name of the backbone phosphate oxygen OP2 in RNA
    • P_ATOM_NAME

      public static final String P_ATOM_NAME
      The atom name of the backbone phosphorus (P) in RNA
    • NUCLEOTIDE_REPRESENTATIVE

      public static final String NUCLEOTIDE_REPRESENTATIVE
      The atom used as representative for nucleotides, equivalent to CA_ATOM_NAME for proteins
    • UNKNOWN_GROUP_LABEL

      public static final char UNKNOWN_GROUP_LABEL
      The character to use for unknown compounds in sequence strings
    • RATIO_RESIDUES_TO_TOTAL

      public static final double RATIO_RESIDUES_TO_TOTAL
      Threshold ratio of amino acid or nucleotide residues to the sequence total. Below this ratio, a simple majority of amino acid versus nucleotide residues decides the character of the chain (protein or nucleotide).
  • Constructor Details

    • StructureTools

      public StructureTools()
  • Method Details

    • getNrAtoms

      public static final int getNrAtoms(Structure s)
      Count how many Atoms are contained within a Structure object.
      Parameters:
      s - the structure object
      Returns:
      the number of Atoms in this Structure
    • getNrGroups

      public static final int getNrGroups(Structure s)
      Count how many groups are contained within a structure object.
      Parameters:
      s - the structure object
      Returns:
      the number of groups in the structure
    • getAtomArray

      public static final Atom[] getAtomArray(Structure s, String[] atomNames)
      Returns an array of the requested Atoms from the Structure object. Iterates over all groups and checks whether the requested atoms are in each group, regardless of whether it is an AminoAcid or HetatomImpl group. If a group does not contain all requested atoms, no atoms are added for that group. For structures with more than one model, only model 0 will be used.
      Parameters:
      s - the structure to get the atoms from
      atomNames - contains the atom names to be used.
      Returns:
      an Atom[] array
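      The all-or-nothing rule per group can be illustrated without any BioJava types. The following is a minimal sketch under stated assumptions: each group is modeled as a plain map from atom name to an atom label, and GroupAtomFilterSketch and selectAtoms are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAtomFilterSketch {

    // Hypothetical stand-in: a "group" is a map from atom name to an atom label.
    static List<String> selectAtoms(List<Map<String, String>> groups, String[] atomNames) {
        List<String> selected = new ArrayList<>();
        for (Map<String, String> group : groups) {
            List<String> found = new ArrayList<>();
            for (String name : atomNames) {
                String atom = group.get(name);
                if (atom != null) {
                    found.add(atom);
                }
            }
            // Keep the atoms only if every requested name was present in this group.
            if (found.size() == atomNames.length) {
                selected.addAll(found);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> ala = new LinkedHashMap<>();
        ala.put("N", "ALA1:N");
        ala.put("CA", "ALA1:CA");
        ala.put("C", "ALA1:C");
        Map<String, String> water = new LinkedHashMap<>();
        water.put("O", "HOH2:O");
        // The water group lacks CA and C, so it contributes nothing.
        System.out.println(selectAtoms(Arrays.asList(ala, water), new String[] {"CA", "C"}));
    }
}
```

      A group that holds only some of the requested names is skipped entirely, which keeps the returned atoms in complete per-group sets.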
    • getAtomArrayAllModels

      public static final Atom[] getAtomArrayAllModels(Structure s, String[] atomNames)
      Returns an array of the requested Atoms from the Structure object. In contrast to getAtomArray(Structure, String[]), this method iterates over all models, as its name indicates. It iterates over all chains and groups of each model and checks whether the requested atoms are in each group, regardless of whether it is an AminoAcid or HetatomImpl group. If a group does not contain all requested atoms, no atoms are added for that group.
      Parameters:
      s - the structure to get the atoms from
      atomNames - contains the atom names to be used.
      Returns:
      an Atom[] array
    • getAllAtomArray

      public static final Atom[] getAllAtomArray(Structure s)
      Convert all atoms of the structure (first model) into an Atom array
      Parameters:
      s - input structure
      Returns:
      all atom array
    • getAllAtomArray

      public static final Atom[] getAllAtomArray(Chain c)
      Returns an array of all atoms of the chain (first model), including hydrogens (if present) and all HETATOMs. Waters are not included.
      Parameters:
      c - input chain
      Returns:
      all atom array
    • getUnalignedGroups

      public static List<Group> getUnalignedGroups(Atom[] ca)
      List of groups from the structure not included in ca (e.g. ligands). Unaligned groups are searched from all chains referenced in ca, as well as any chains in the first model of the structure from ca[0], if any.
      Parameters:
      ca - an array of atoms
      Returns:
    • getAllNonHAtomArray

      public static final Atom[] getAllNonHAtomArray(Structure s, boolean hetAtoms)
      Returns an array of all non-hydrogen atoms in the given Structure, optionally including HET atoms. Waters are not included.
      Parameters:
      s -
      hetAtoms - if true HET atoms are included in array, if false they are not
      Returns:
    • getAllNonHAtomArray

      public static final Atom[] getAllNonHAtomArray(Chain c, boolean hetAtoms)
      Returns an array of all non-hydrogen atoms in the given Chain, optionally including HET atoms. Waters are not included.
      Parameters:
      c -
      hetAtoms - if true HET atoms are included in array, if false they are not
      Returns:
    • getAtomArray

      public static final Atom[] getAtomArray(Chain c, String[] atomNames)
      Returns an array of the requested Atoms from the Chain object. Iterates over all groups and checks whether the requested atoms are in each group, regardless of whether it is an AminoAcid or Hetatom group. If a group does not contain all requested atoms, no atoms are added for that group.
      Parameters:
      c - the Chain to get the atoms from
      atomNames - contains the atom names to be used.
      Returns:
      an Atom[] array
    • getAtomCAArray

      public static final Atom[] getAtomCAArray(Chain c)
      Returns an Atom array of the C-alpha atoms. Any atom that is a carbon and has CA name will be returned.
      Parameters:
      c - the chain object
      Returns:
      an Atom[] array
    • getRepresentativeAtomArray

      public static final Atom[] getRepresentativeAtomArray(Chain c)
      Gets a representative atom for each group that is part of the chain backbone. Note that modified aminoacids won't be returned as part of the backbone if the ReducedChemCompProvider was used to load the structure. For amino acids, the representative is a CA carbon. For nucleotides, the representative is the "P". Other group types will be ignored.
      Parameters:
      c -
      Returns:
      representative Atoms of the chain backbone
      Since:
      Biojava 4.1.0
    • cloneCAArray

      @Deprecated public static final Atom[] cloneCAArray(Atom[] ca)
      Deprecated.
      Use the better-named cloneAtomArray(Atom[]) instead
      Provides an equivalent copy of Atoms in a new array. Clones everything, starting with parent groups and chains. The chain will only contain groups that are part of the input array.
      Parameters:
      ca - array of representative atoms, e.g. CA atoms
      Returns:
      Atom array
    • cloneAtomArray

      public static final Atom[] cloneAtomArray(Atom[] ca)
      Provides an equivalent copy of Atoms in a new array. Clones everything, starting with parent groups and chains. The chain will only contain groups that are part of the input array.
      Parameters:
      ca - array of representative atoms, e.g. CA atoms
      Returns:
      Atom array
      Since:
      Biojava 4.1.0
    • cloneGroups

      public static Group[] cloneGroups(Atom[] ca)
      Clones a set of representative Atoms, but returns the parent groups of the clones
      Parameters:
      ca - Atom array
      Returns:
      Group array
    • duplicateCA2

      public static Atom[] duplicateCA2(Atom[] ca2)
      Utility method for working with circular permutations. Creates a duplicated and cloned set of C-alpha atoms from the input array.
      Parameters:
      ca2 - atom array
      Returns:
      cloned and duplicated set of input array
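      Why duplication helps with circular permutations can be shown with plain lists. This is a minimal sketch, assuming that "duplicated" means the input sequence repeated twice in a row; DuplicateForCircularPermutationSketch and duplicate are hypothetical names, and the cloning step of the real method is left out because the sketch uses immutable strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DuplicateForCircularPermutationSketch {

    // Returns a new list holding the input twice in a row: [a, b, c] -> [a, b, c, a, b, c].
    static <T> List<T> duplicate(List<T> items) {
        List<T> doubled = new ArrayList<>(items.size() * 2);
        doubled.addAll(items);
        doubled.addAll(items);
        return doubled;
    }

    public static void main(String[] args) {
        List<String> ca = Arrays.asList("CA1", "CA2", "CA3");
        List<String> doubled = duplicate(ca);
        // Every rotation of the input is a contiguous window of the doubled list.
        System.out.println(doubled.subList(2, 5)); // prints [CA3, CA1, CA2]
    }
}
```

      Because each rotation appears as a contiguous window, an alignment against the doubled array can match a circularly permuted partner without wrapping around the array end.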
    • getAtomCAArray

      public static Atom[] getAtomCAArray(Structure s)
      Return an Atom array of the C-alpha atoms. Any atom that is a carbon and has CA name will be returned.
      Parameters:
      s - the structure object
      Returns:
      an Atom[] array
    • getRepresentativeAtomArray

      public static Atom[] getRepresentativeAtomArray(Structure s)
      Gets a representative atom for each group that is part of the chain backbone. Note that modified aminoacids won't be returned as part of the backbone if the ReducedChemCompProvider was used to load the structure. For amino acids, the representative is a CA carbon. For nucleotides, the representative is the "P". Other group types will be ignored.
      Parameters:
      s - Input structure
      Returns:
      representative Atoms of the structure backbone
      Since:
      Biojava 4.1.0
    • getBackboneAtomArray

      public static Atom[] getBackboneAtomArray(Structure s)
      Return an Atom array of the main chain atoms: CA, C, N, O. Any group that contains those atoms will be included, be it a standard aminoacid or not.
      Parameters:
      s - the structure object
      Returns:
      an Atom[] array
    • get1LetterCodeAmino

      public static final Character get1LetterCodeAmino(String groupCode3)
      Converts a three-character amino acid code into a single character, e.g. CYS to C. Valid 3-letter codes are those of the standard 20 amino acids plus MSE, CSE, SEC, PYH and PYL (see the aminoAcids map).
      Parameters:
      groupCode3 - a three character amino acid representation String
      Returns:
      the 1 letter code, or null if the given 3 letter code does not correspond to an amino acid code
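      The lookup-or-null contract can be sketched with a small table. The sketch below is illustrative only: the table is deliberately partial, OneLetterCodeSketch and toOneLetter are hypothetical names, and the MSE entry assumes the common convention of reporting selenomethionine with methionine's letter.

```java
import java.util.HashMap;
import java.util.Map;

final class OneLetterCodeSketch {

    // Partial, illustrative table; the library's aminoAcids map covers more codes.
    private static final Map<String, Character> AMINO = new HashMap<>();
    static {
        AMINO.put("ALA", 'A');
        AMINO.put("CYS", 'C');
        AMINO.put("GLY", 'G');
        AMINO.put("MSE", 'M'); // assumption: selenomethionine reported as methionine
    }

    // Returns the one-letter code, or null when the code is not in the table.
    static Character toOneLetter(String code3) {
        return code3 == null ? null : AMINO.get(code3);
    }

    public static void main(String[] args) {
        System.out.println(toOneLetter("CYS")); // prints C
        System.out.println(toOneLetter("HOH")); // prints null
    }
}
```

      Returning a Character rather than a char is what allows null to signal "not an amino acid code"; callers have to check for null before unboxing.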
    • convert_3code_1code

      @Deprecated public static final Character convert_3code_1code(String code3)
      Deprecated.
      Parameters:
      code3 -
      Returns:
    • get1LetterCode

      public static final Character get1LetterCode(String groupCode3)
      Convert a three letter amino acid or nucleotide code into a single character code. If the code does not correspond to an amino acid or nucleotide, returns UNKNOWN_GROUP_LABEL. Returned null for nucleotides prior to version 4.0.1.
      Parameters:
      groupCode3 - three letter representation
      Returns:
      The 1-letter abbreviation
    • isNucleotide

      public static final boolean isNucleotide(String groupCode3)
      Test if the three-letter code of an ATOM entry corresponds to a nucleotide or to an aminoacid.
      Parameters:
      groupCode3 - 3-character code for a group.
    • getReducedStructure

      @Deprecated public static final Structure getReducedStructure(Structure s, String chainId) throws StructureException
      Deprecated.
      Reduce a structure to provide a smaller representation. Only takes the first model of the structure. If chainId is provided, only return a structure containing that chain ID. Converts lower case chain IDs to upper case if the structure does not contain a chain with that ID.
      Parameters:
      s -
      chainId -
      Returns:
      Structure
      Throws:
      StructureException
      Since:
      3.0
    • getReducedStructure

      @Deprecated public static final Structure getReducedStructure(Structure s, int chainNr) throws StructureException
      Deprecated.
      Reduce a structure to provide a smaller representation. Only takes the first model of the structure. If chainNr >=0 only takes the chain at that position into account.
      Parameters:
      s -
      chainNr - can be -1 to request all chains of model 0, otherwise will only add chain at this position
      Returns:
      Structure object
      Throws:
      StructureException
      Since:
      3.0
    • getSubRanges

      @Deprecated public static final Structure getSubRanges(Structure s, String ranges) throws StructureException
      Deprecated.
      Use StructureIdentifier instead (4.2.0)
      In addition to the functionality provided by getReducedStructure(Structure, int) and getReducedStructure(Structure, String), also provides a way to specify sub-regions of a structure with the following specification:

        • Ranges can be surrounded by ( and ); the parentheses are removed.
        • Ranges are specified as PDBresnum1 : PDBresnum2
        • A list of ranges is separated by ,
        • Example:
            4GCR (A:1-83)
            1CDG (A:407-495,A:582-686)
            1CDG (A_407-495,A_582-686)
      Parameters:
      s - The full structure
      ranges - A comma-separated list of ranges, optionally surrounded by parentheses
      Returns:
      Substructure of s specified by ranges
      Throws:
      IllegalArgumentException - for malformed range strings
      StructureException - for errors when reducing the Structure
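      The range syntax shown in the examples can be parsed with a short regular expression. This is a minimal sketch, not the library's parser: RangeStringSketch and parse are hypothetical names, the accepted grammar (chain, then ':' or '_', then start-end with plain integer residue numbers) is inferred from the examples alone, and insertion codes are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RangeStringSketch {

    // Chain ID, separator ':' or '_', start residue number, '-', end residue number.
    private static final Pattern RANGE =
            Pattern.compile("([A-Za-z0-9]+)[:_](-?\\d+)-(-?\\d+)");

    // Returns one {chain, start, end} triple per range in the comma-separated list.
    static List<String[]> parse(String ranges) {
        String s = ranges.trim();
        // Optional surrounding parentheses are stripped first.
        if (s.startsWith("(") && s.endsWith(")")) {
            s = s.substring(1, s.length() - 1);
        }
        List<String[]> result = new ArrayList<>();
        for (String part : s.split(",")) {
            Matcher m = RANGE.matcher(part.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("Malformed range: " + part);
            }
            result.add(new String[] {m.group(1), m.group(2), m.group(3)});
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] r : parse("(A:407-495,A_582-686)")) {
            System.out.println(r[0] + " " + r[1] + " " + r[2]);
        }
    }
}
```

      Rejecting malformed input with IllegalArgumentException mirrors the exception documented for this method; everything else about the sketch is an assumption.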
    • convertAtomsToSeq

      public static final String convertAtomsToSeq(Atom[] atoms)
    • getGroupByPDBResidueNumber

      public static final Group getGroupByPDBResidueNumber(Structure struc, ResidueNumber pdbResNum) throws StructureException
      Get a group represented by a ResidueNumber.
      Parameters:
      struc - a Structure
      pdbResNum - a ResidueNumber
      Returns:
      a group in the structure that is represented by the pdbResNum.
      Throws:
      StructureException - if the group cannot be found.
    • getAtomsInContact

      public static AtomContactSet getAtomsInContact(Chain chain, String[] atomNames, double cutoff)
      Returns the set of intra-chain contacts for the given chain for given atom names, i.e. the contact map. Uses a geometric hashing algorithm that speeds up the calculation without need of full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
      Parameters:
      chain -
      atomNames - the array with atom names to be used. Beware: CA will match both C-alphas and calciums. If null, all non-H atoms of non-hetatoms will be used.
      cutoff -
      Returns:
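      How a contact search can avoid the full distance matrix is sketched below with a uniform grid (cell list). This is an assumption-laden illustration, not the library's implementation: GridContactSketch and contacts are hypothetical names, points are raw coordinate triples, and a pair counts as a contact when its distance is at most the cutoff.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GridContactSketch {

    // Returns index pairs {i, j}, i < j, for points at most `cutoff` apart (cutoff must be > 0).
    static List<int[]> contacts(double[][] pts, double cutoff) {
        // Bin every point into a cubic cell whose side length equals the cutoff.
        Map<String, List<Integer>> cells = new HashMap<>();
        for (int i = 0; i < pts.length; i++) {
            cells.computeIfAbsent(cellKey(pts[i], cutoff, 0, 0, 0), k -> new ArrayList<>()).add(i);
        }
        List<int[]> result = new ArrayList<>();
        double cutoffSq = cutoff * cutoff;
        for (int i = 0; i < pts.length; i++) {
            // Partners within the cutoff can only sit in the 27 cells around point i.
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dz = -1; dz <= 1; dz++) {
                        List<Integer> bucket = cells.get(cellKey(pts[i], cutoff, dx, dy, dz));
                        if (bucket == null) continue;
                        for (int j : bucket) {
                            if (j > i && distSq(pts[i], pts[j]) <= cutoffSq) {
                                result.add(new int[] {i, j});
                            }
                        }
                    }
                }
            }
        }
        return result;
    }

    // Key of the cell containing p, shifted by (dx, dy, dz) cells.
    private static String cellKey(double[] p, double cutoff, int dx, int dy, int dz) {
        long x = (long) Math.floor(p[0] / cutoff) + dx;
        long y = (long) Math.floor(p[1] / cutoff) + dy;
        long z = (long) Math.floor(p[2] / cutoff) + dz;
        return x + "," + y + "," + z;
    }

    private static double distSq(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return dx * dx + dy * dy + dz * dz;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0, 0}, {1, 0, 0}, {5, 0, 0}, {5.5, 0, 0}};
        System.out.println(contacts(pts, 1.5).size()); // prints 2
    }
}
```

      Since two points within the cutoff differ by at most one cell index along each axis, checking the 27 surrounding cells is sufficient, and the work grows with the number of nearby atoms rather than with all pairs.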
    • getAtomsInContact

      public static AtomContactSet getAtomsInContact(Chain chain, double cutoff)
      Returns the set of intra-chain contacts for the given chain for all non-H atoms of non-hetatoms, i.e. the contact map. Uses a geometric hashing algorithm that speeds up the calculation without need of full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
      Parameters:
      chain -
      cutoff -
      Returns:
    • getAtomsCAInContact

      public static AtomContactSet getAtomsCAInContact(Chain chain, double cutoff)
      Returns the set of intra-chain contacts for the given chain for C-alpha atoms (including non-standard aminoacids appearing as HETATM groups), i.e. the contact map. Uses a geometric hashing algorithm that speeds up the calculation without need of full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
      Parameters:
      chain -
      cutoff -
      Returns:
    • getRepresentativeAtomsInContact

      public static AtomContactSet getRepresentativeAtomsInContact(Chain chain, double cutoff)
      Returns the contact map for the given chain, i.e. the set of intra-chain contacts between representative atoms: C-alpha atoms for amino acids (including non-standard aminoacids appearing as HETATM groups) and the NUCLEOTIDE_REPRESENTATIVE atom for nucleotides. Uses a geometric hashing algorithm that speeds up the calculation without need of full distance matrix.
      Parameters:
      chain -
      cutoff -
      Returns:
      Since:
      Biojava 4.1.0
    • getAtomsInContact

      public static AtomContactSet getAtomsInContact(Chain chain1, Chain chain2, String[] atomNames, double cutoff, boolean hetAtoms)
      Returns the set of inter-chain contacts between the two given chains for the given atom names. Uses a geometric hashing algorithm that speeds up the calculation without need of full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
      Parameters:
      chain1 -
      chain2 -
      atomNames - the array with atom names to be used. For C-alphas use {"CA"}; if null, all non-H atoms will be used. Note that HET atoms are ignored unless this parameter is null.
      cutoff -
      hetAtoms - if true HET atoms are included, if false they are not
      Returns:
    • getAtomsInContact

      public static AtomContactSet getAtomsInContact(Chain chain1, Chain chain2, double cutoff, boolean hetAtoms)
      Returns the set of inter-chain contacts between the two given chains for all non-H atoms. Uses a geometric hashing algorithm that speeds up the calculation without need of full distance matrix. The parsing mode FileParsingParameters.setAlignSeqRes(boolean) needs to be set to true for this to work.
      Parameters:
      chain1 -
      chain2 -
      cutoff -
      hetAtoms - if true HET atoms are included, if false they are not
      Returns:
    • getGroupDistancesWithinShell

      public static Map<Group,Double> getGroupDistancesWithinShell(Structure structure, Atom centroid, Set<ResidueNumber> excludeResidues, double radius, boolean includeWater, boolean useAverageDistance)
      Finds Groups in structure that contain at least one Atom that is within radius Angstroms of centroid.
      Parameters:
      structure - The structure from which to find Groups
      centroid - The centroid of the shell
      excludeResidues - A list of ResidueNumbers to exclude
      radius - The radius from centroid, in Angstroms
      includeWater - Whether to include Groups whose only atoms are water
      useAverageDistance - When set to true, distances are the arithmetic mean (1-norm) of the distances of atoms that belong to the group and that are within the shell; otherwise, distances are the minimum of these values
      Returns:
      A map of Groups within (or partially within) the shell, to their distances in Angstroms
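      The effect of useAverageDistance on the per-group distance can be sketched with raw coordinates. This is a minimal sketch under stated assumptions: ShellDistanceSketch and groupDistance are hypothetical names, atoms are plain {x, y, z} triples, and an atom exactly at the radius is treated as inside the shell.

```java
import java.util.OptionalDouble;

final class ShellDistanceSketch {

    // atoms: each row is {x, y, z}; centroid: {x, y, z}.
    // Returns empty when no atom of the group lies within the radius.
    static OptionalDouble groupDistance(double[][] atoms, double[] centroid,
                                        double radius, boolean useAverageDistance) {
        double sum = 0.0;
        double min = Double.POSITIVE_INFINITY;
        int inShell = 0;
        for (double[] a : atoms) {
            double dx = a[0] - centroid[0];
            double dy = a[1] - centroid[1];
            double dz = a[2] - centroid[2];
            double d = Math.sqrt(dx * dx + dy * dy + dz * dz);
            // Only atoms inside the shell contribute to either statistic.
            if (d <= radius) {
                inShell++;
                sum += d;
                min = Math.min(min, d);
            }
        }
        if (inShell == 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(useAverageDistance ? sum / inShell : min);
    }

    public static void main(String[] args) {
        double[][] group = {{1, 0, 0}, {3, 0, 0}, {10, 0, 0}};
        double[] origin = {0, 0, 0};
        System.out.println(groupDistance(group, origin, 5.0, true).getAsDouble());  // prints 2.0
        System.out.println(groupDistance(group, origin, 5.0, false).getAsDouble()); // prints 1.0
    }
}
```

      The atom at distance 10 lies outside the shell and is ignored in both modes, so the mean is taken over the two remaining atoms only.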
    • getGroupsWithinShell

      public static Set<Group> getGroupsWithinShell(Structure structure, Atom atom, Set<ResidueNumber> excludeResidues, double distance, boolean includeWater)
    • getGroupsWithinShell

      public static Set<Group> getGroupsWithinShell(Structure structure, Group group, double distance, boolean includeWater)

      Returns a Set of Groups in a structure within the distance specified of a given group.

      Updated 18-Sep-2015 sroughley to return a Set so only a unique set of Groups returned

      Parameters:
      structure - The structure to work with
      group - The 'query' group
      distance - The cutoff distance
      includeWater - Should water residues be included in the output?
      Returns:
      LinkedHashSet of Groups with at least one atom within distance of at least one atom in group
    • removeModels

      public static Structure removeModels(Structure s)
      Remove all models from a Structure and keep only the first
      Parameters:
      s - original Structure
      Returns:
      a structure that contains only the first model
      Since:
      3.0.5
    • filterLigands

      public static List<Group> filterLigands(List<Group> allGroups)
      Removes all polymeric and solvent groups from a list of groups
    • getStructure

      public static Structure getStructure(String name) throws IOException, StructureException
      Short version of getStructure(String, PDBFileParser, AtomCache) which creates new parsers when needed
      Parameters:
      name -
      Returns:
      Throws:
      IOException
      StructureException
    • getStructure

      public static Structure getStructure(String name, PDBFileParser parser, AtomCache cache) throws IOException, StructureException
      Flexibly get a structure from an input String. The intent of this method is to allow any reasonable string which could refer to a structure to be correctly parsed. The following are currently supported:
      1. Filename (if name refers to an existing file)
      2. PDB ID
      3. SCOP domains
      4. PDP domains
      5. Residue ranges
      6. Other formats supported by AtomCache
      Parameters:
      name - Some reference to the protein structure
      parser - A clean PDBFileParser to use if it is a file. If null, a PDBFileParser will be instantiated if needed.
      cache - An AtomCache to use if the structure can be fetched from the PDB. If null, an AtomCache will be instantiated if needed.
      Returns:
      A Structure object
      Throws:
      IOException - if name is an existing file, but doesn't parse correctly
      StructureException - if the format is unknown, or if AtomCache throws an exception.
    • isProtein

      public static boolean isProtein(Chain c)
      Tell whether given chain is a protein chain
      Parameters:
      c -
      Returns:
      true if protein, false if nucleotide or ligand
    • isNucleicAcid

      public static boolean isNucleicAcid(Chain c)
      Tell whether given chain is DNA or RNA
      Parameters:
      c -
      Returns:
      true if nucleic acid, false if protein or ligand
    • getPredominantGroupType

      public static GroupType getPredominantGroupType(Chain c)
      Get the predominant GroupType for a given Chain, following these rules:
        • If the ratio of the number of residues of a certain GroupType to the total number of non-water residues is above the threshold 0.95, that GroupType is returned.
        • If no GroupType is above the threshold, the GroupType with the most members is chosen and the choice is logged.
      See also ChemComp.getPolymerType() and ChemComp.getResidueType(), which follow the PDB chemical component dictionary and provide a much more accurate description of groups and their linking.

      Parameters:
      c -
      Returns:
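      The two rules can be sketched over per-type residue counts. This is a minimal sketch, not the library's code: PredominantTypeSketch, Kind and predominant are hypothetical names, Kind stands in for GroupType, the counts are assumed to exclude waters already, and logging is reduced to a message on standard error.

```java
import java.util.EnumMap;
import java.util.Map;

final class PredominantTypeSketch {

    // Hypothetical stand-in for GroupType.
    enum Kind { AMINOACID, NUCLEOTIDE, HETATM }

    // Threshold taken from the rule text.
    static final double THRESHOLD = 0.95;

    // counts: number of non-water residues per type. Returns null for an empty chain.
    static Kind predominant(Map<Kind, Integer> counts) {
        int total = 0;
        for (int n : counts.values()) {
            total += n;
        }
        if (total == 0) {
            return null;
        }
        // Find the type with the most members; ties go to the first one iterated.
        Kind best = null;
        int bestCount = -1;
        for (Map.Entry<Kind, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        double ratio = (double) bestCount / total;
        if (ratio <= THRESHOLD) {
            // Second rule: no type is above the threshold, so fall back to the majority and log it.
            System.err.println("No type above " + THRESHOLD + "; using majority type " + best);
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Kind, Integer> counts = new EnumMap<>(Kind.class);
        counts.put(Kind.AMINOACID, 96);
        counts.put(Kind.HETATM, 4);
        System.out.println(predominant(counts)); // prints AMINOACID
    }
}
```

      A type above the 0.95 threshold is necessarily the one with the most members, so both rules return the same candidate; in this sketch the threshold only decides whether the fallback is logged.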
    • isChainWaterOnly

      public static boolean isChainWaterOnly(Chain c)
      Returns true if the given chain is composed of water molecules only
      Parameters:
      c -
      Returns:
    • isChainPureNonPolymer

      public static boolean isChainPureNonPolymer(Chain c)
      Returns true if the given chain is composed of non-polymeric groups only
      Parameters:
      c -
      Returns: