Class UniprotProxySequenceReader<C extends Compound>

java.lang.Object
org.biojava.nbio.core.sequence.loader.UniprotProxySequenceReader<C>
Type Parameters:
C - the type of Compound held in the sequence
All Implemented Interfaces:
Iterable<C>, DatabaseReferenceInterface, FeaturesKeyWordInterface, Accessioned, ProxySequenceReader<C>, Sequence<C>, SequenceReader<C>

public class UniprotProxySequenceReader<C extends Compound> extends Object implements ProxySequenceReader<C>, FeaturesKeyWordInterface, DatabaseReferenceInterface
Pass in a UniProt ID, and this ProxySequenceReader, when passed to a ProteinSequence, will retrieve the sequence data and the other data elements that UniProt associates with the protein. This is an example of how to map external databases of proteins and features to the BioJava3 ProteinSequence. It is important to call setUniprotDirectoryCache(String) to enable caching of the XML files, so that they do not need to be reloaded each time. The class does not manage the cache itself.
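The caching arrangement described above can be sketched as a file-per-accession store: look for <cacheDir>/<accession>.xml and call the loader (for example, a network download) only when that file is missing. XmlDirectoryCache and its file-naming scheme are hypothetical and not taken from BioJava; like the class described here, the sketch never evicts or cleans up cached files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical file-per-accession cache; not part of BioJava.
class XmlDirectoryCache {
    private final Path cacheDir;
    private final Function<String, String> loader; // accession -> XML text, e.g. a download

    XmlDirectoryCache(Path cacheDir, Function<String, String> loader) {
        this.cacheDir = cacheDir;
        this.loader = loader;
    }

    /** Returns the cached XML if the file exists; otherwise loads it and writes it to disk. */
    String get(String accession) {
        Path file = cacheDir.resolve(accession + ".xml");
        try {
            if (Files.exists(file)) {
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            }
            String xml = loader.apply(accession);
            Files.createDirectories(cacheDir);
            Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
            return xml;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A second get() for the same accession is served from disk, so the loader runs only once per accession for as long as the file is kept.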
  • Field Details

    • UP_AC_PATTERN

      public static final Pattern UP_AC_PATTERN
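UniProt documents its accession numbers as six or ten characters matching the regular expression below. Whether UP_AC_PATTERN uses exactly this expression is an assumption; the sketch only shows how such a pattern separates accessions such as P69905 from entry names such as YA745_GIBZE. UniprotAccession is a hypothetical name.

```java
import java.util.regex.Pattern;

// Accession format published by UniProt; assumed, not confirmed, to match UP_AC_PATTERN.
class UniprotAccession {
    static final Pattern ACCESSION = Pattern.compile(
            "[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}");

    /** True only if the whole string is an accession; matches() anchors both ends. */
    static boolean isAccession(String id) {
        return ACCESSION.matcher(id).matches();
    }
}
```

Using find() instead of matches() would also accept an accession embedded in longer text.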
  • Constructor Details

    • UniprotProxySequenceReader

      public UniprotProxySequenceReader(String accession, CompoundSet<C> compoundSet) throws CompoundNotFoundException, IOException
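      The UniProt ID is used to retrieve the UniProt XML, which is then parsed into a DOM object so that all of the information about the protein is available. If an error occurs, an exception is thrown; possible causes include an invalid UniProt ID or a network error.
      Parameters:
      accession - the UniProt ID of the protein to retrieve
      compoundSet - the compound set used to interpret the sequence
      Throws:
      CompoundNotFoundException - if the sequence contains a compound that is not in compoundSet
      IOException - if problems occur while reading the UniProt XML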
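      The retrieve-then-parse step can be sketched with the JDK's own XML API. This is a minimal sketch, not BioJava's code: UniprotXmlLoader is a hypothetical name, and converting parser errors into IOException is an assumption chosen so that malformed content and network failures surface through the same exception type. DocumentBuilder.parse(String) accepts any URI the JDK can open, so the method works for http:, https: and file: locations.

```java
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical loader illustrating "retrieve, parse, or throw"; not BioJava's code.
class UniprotXmlLoader {
    /** Parses the XML found at uri (http:, https: or file:) into a DOM tree. */
    static Document load(String uri) throws IOException {
        try {
            // Network and file errors propagate from parse() as IOException.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(uri);
        } catch (ParserConfigurationException | SAXException e) {
            // Malformed content is reported through the same exception type.
            throw new IOException("Could not parse UniProt XML from " + uri, e);
        }
    }
}
```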
    • UniprotProxySequenceReader

      public UniprotProxySequenceReader(Document document, CompoundSet<C> compoundSet) throws CompoundNotFoundException
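      The UniProt XML is passed in as a DOM object, so all of the information about the protein is available. If an error occurs, an exception is thrown; a possible cause is an invalid UniProt ID.
      Parameters:
      document - the parsed UniProt XML
      compoundSet - the compound set used to interpret the sequence
      Throws:
      CompoundNotFoundException - if the sequence contains a compound that is not in compoundSet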
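      Once a DOM Document is available, pulling the raw sequence out of it can be sketched with XPath. The class and method names are hypothetical, and the path /uniprot/entry/sequence assumes UniProt's element layout. Real UniProt files declare a default namespace; this sketch assumes the JDK's default, non-namespace-aware parser, under which unprefixed paths still match, whereas a namespace-aware setup would need a NamespaceContext. The sequence text can span several lines, so whitespace is removed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helpers; the element path assumes UniProt's XML layout.
class UniprotDomSequence {
    /** Text of /uniprot/entry/sequence with line breaks and spaces removed ("" if absent). */
    static String sequenceOf(Document document) {
        try {
            String raw = XPathFactory.newInstance().newXPath()
                    .evaluate("/uniprot/entry/sequence", document);
            return raw.replaceAll("\\s+", "");
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath expression", e);
        }
    }

    /** Builds a DOM with the JDK default (non-namespace-aware) parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }
}
```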
  • Method Details

    • parseUniprotXMLString

      public static <C extends Compound> UniprotProxySequenceReader<C> parseUniprotXMLString(String xml, CompoundSet<C> compoundSet)
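      The XML string is parsed into a DOM object, so all of the information about the protein is available. If an error occurs, an exception is thrown; a possible cause is an invalid UniProt ID.
      Parameters:
      xml - the UniProt XML as a string
      compoundSet - the compound set used to interpret the sequence
      Returns:
      a UniprotProxySequenceReader built from the XML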
      Throws:
      Exception
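      Turning an XML string into a DOM can be sketched as follows. UniprotXmlString and parseEntryXml are hypothetical names, and the extra check that the root is <uniprot> with at least one <entry> is an assumption about how a response for an invalid ID might be detected; it is not taken from BioJava. Failures are reported as IllegalArgumentException, so callers need not handle checked exceptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical string-to-DOM parser with a basic sanity check on the content.
class UniprotXmlString {
    static Document parseEntryXml(String xml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        // Well-formed XML without an <entry> is treated as "no such protein".
        if (!"uniprot".equals(document.getDocumentElement().getTagName())
                || document.getElementsByTagName("entry").getLength() == 0) {
            throw new IllegalArgumentException("XML does not contain a UniProt entry");
        }
        return document;
    }
}
```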
    • setCompoundSet

      public void setCompoundSet(CompoundSet<C> compoundSet)
      Specified by:
      setCompoundSet in interface SequenceReader<C extends Compound>
    • setContents

      public void setContents(String sequence) throws CompoundNotFoundException
      Once the sequence has been retrieved, sets the contents and checks that they are valid.
      Specified by:
      setContents in interface SequenceReader<C extends Compound>
      Parameters:
      sequence - the sequence as a string
      Throws:
      CompoundNotFoundException - if the sequence contains a compound that is not in the compound set
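      The validation step can be pictured as checking each character against the allowed alphabet. This sketch is hypothetical: it uses a fixed 20-letter amino-acid alphabet instead of a CompoundSet (BioJava's amino-acid compound set may accept additional symbols), and throws IllegalArgumentException where BioJava would throw CompoundNotFoundException. Reporting the 1-based position of the offending character makes bad input easier to locate.

```java
// Hypothetical validator standing in for setContents; not BioJava's code.
class ContentsValidator {
    // Assumed alphabet: the 20 standard amino acids.
    private static final String ALPHABET = "ACDEFGHIKLMNPQRSTVWY";

    /** Throws if any character is outside ALPHABET, reporting its 1-based position. */
    static void validate(String sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (ALPHABET.indexOf(c) < 0) {
                // BioJava would signal this with CompoundNotFoundException.
                throw new IllegalArgumentException(
                        "Compound '" + c + "' at position " + (i + 1) + " is not in the alphabet");
            }
        }
    }
}
```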
    • getLength

      public int getLength()
      Returns the length of the sequence.
      Specified by:
      getLength in interface Sequence<C extends Compound>
      Returns:
      the number of compounds in the sequence
    • getCompoundAt

      public C getCompoundAt(int position)
      Description copied from interface: Sequence
      Returns the Compound at the given biological index
      Specified by:
      getCompoundAt in interface Sequence<C extends Compound>
      Parameters:
      position - the biological (1-based) index
      Returns:
      the compound at that position
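      Sequence positions in this API are biological indexes, so the first residue is at position 1. A minimal String-backed sketch of getLength and getCompoundAt under that convention follows; the class is hypothetical, and rejecting out-of-range positions with IndexOutOfBoundsException is an assumption, not documented behavior.

```java
// Hypothetical String-backed sequence illustrating 1-based (biological) indexing.
class BioIndexedSequence {
    private final String residues;

    BioIndexedSequence(String residues) {
        this.residues = residues;
    }

    int getLength() {
        return residues.length();
    }

    char getCompoundAt(int position) {
        if (position < 1 || position > residues.length()) {
            throw new IndexOutOfBoundsException(
                    "Position " + position + " outside 1.." + residues.length());
        }
        return residues.charAt(position - 1); // biological index -> 0-based Java index
    }
}
```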
    • getIndexOf

      public int getIndexOf(C compound)
      Description copied from interface: Sequence
      Scans through the Sequence looking for the first occurrence of the given compound
      Specified by:
      getIndexOf in interface Sequence<C extends Compound>
      Parameters:
      compound - the compound to search for
      Returns:
      the index of the first occurrence of the compound
    • getLastIndexOf

      public int getLastIndexOf(C compound)
      Description copied from interface: Sequence
      Scans through the Sequence looking for the last occurrence of the given compound
      Specified by:
      getLastIndexOf in interface Sequence<C extends Compound>
      Parameters:
      compound - the compound to search for
      Returns:
      the index of the last occurrence of the compound
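      Both searches can be sketched on a String by shifting Java's 0-based results to biological positions. The class is hypothetical, and returning 0 when the compound is absent is a sentinel chosen for this sketch (String.indexOf's -1 plus one); check BioJava's implementation for the value it actually returns.

```java
// Hypothetical String-based versions of getIndexOf / getLastIndexOf.
class CompoundSearch {
    /** 1-based position of the first occurrence, or 0 if absent (this sketch's sentinel). */
    static int indexOf(String sequence, char compound) {
        return sequence.indexOf(compound) + 1; // -1 (absent) becomes 0
    }

    /** 1-based position of the last occurrence, or 0 if absent. */
    static int lastIndexOf(String sequence, char compound) {
        return sequence.lastIndexOf(compound) + 1;
    }
}
```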
    • toString

      public String toString()
      Overrides:
      toString in class Object
      Returns:
    • getSequenceAsString

      public String getSequenceAsString()
      Description copied from interface: Sequence
      Returns the String representation of the Sequence
      Specified by:
      getSequenceAsString in interface Sequence<C extends Compound>
      Returns:
      the sequence as a String
    • getAsList

      public List<C> getAsList()
      Description copied from interface: Sequence
      Returns the Sequence as a List of compounds
      Specified by:
      getAsList in interface Sequence<C extends Compound>
      Returns:
      the sequence as a List of compounds
    • getInverse

      public SequenceView<C> getInverse()
      Description copied from interface: Sequence
      Returns the inverse of the current Sequence: the Sequence is reversed and, where the compounds allow it, also complemented.
      Specified by:
      getInverse in interface Sequence<C extends Compound>
      Returns:
      a view of the inverted sequence
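      Amino acids have no complement, so for a protein the inverse amounts to reversal; for nucleotides each compound would also be complemented. The sketch below shows both cases on plain Strings. The class name and the Map-based complement table are illustrative assumptions, and unlike BioJava it returns a new String rather than a SequenceView.

```java
import java.util.Map;

// Hypothetical String version of getInverse; BioJava returns a SequenceView instead.
class InverseSketch {
    /** Reverses the sequence; if complement is non-null, also complements each compound. */
    static String inverse(String sequence, Map<Character, Character> complement) {
        StringBuilder out = new StringBuilder(sequence.length());
        for (int i = sequence.length() - 1; i >= 0; i--) {
            char c = sequence.charAt(i);
            // Compounds missing from the table are kept unchanged.
            out.append(complement == null ? c : complement.getOrDefault(c, c));
        }
        return out.toString();
    }
}
```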
    • getSequenceAsString

      public String getSequenceAsString(Integer bioBegin, Integer bioEnd, Strand strand)
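      Returns the region from bioBegin to bioEnd on the given strand as a String.
      Parameters:
      bioBegin - the first biological (1-based) position of the region
      bioEnd - the last biological position of the region
      strand - the strand to read
      Returns:
      the requested region as a String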
    • getSubSequence

      public SequenceView<C> getSubSequence(Integer bioBegin, Integer bioEnd)
      Description copied from interface: Sequence
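      Returns the portion of the sequence between the given positions. Positions are indexed from 1.
      Specified by:
      getSubSequence in interface Sequence<C extends Compound>
      Parameters:
      bioBegin - the first position of the portion
      bioEnd - the last position of the portion
      Returns:
      a view of the requested portion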
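      The 1-based coordinates translate to Java's 0-based substring as shown in this sketch. It assumes both bioBegin and bioEnd are inclusive, so positions 2 to 4 yield three residues; the class name is hypothetical and it returns a String rather than a SequenceView.

```java
// Hypothetical String version of getSubSequence with inclusive 1-based bounds (assumed).
class SubSequenceSketch {
    static String subSequence(String sequence, int bioBegin, int bioEnd) {
        if (bioBegin < 1 || bioEnd > sequence.length() || bioBegin > bioEnd) {
            throw new IndexOutOfBoundsException(
                    bioBegin + ".." + bioEnd + " not within 1.." + sequence.length());
        }
        // 1-based inclusive [bioBegin, bioEnd] == 0-based half-open [bioBegin - 1, bioEnd)
        return sequence.substring(bioBegin - 1, bioEnd);
    }
}
```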
    • iterator

      public Iterator<C> iterator()
      Specified by:
      iterator in interface Iterable<C extends Compound>
      Returns:
      an iterator over the compounds of the sequence
    • getCompoundSet

      public CompoundSet<C> getCompoundSet()
      Description copied from interface: Sequence
      Gets the compound set used to back this Sequence
      Specified by:
      getCompoundSet in interface Sequence<C extends Compound>
      Returns:
      the compound set backing this sequence
    • getAccession

      public AccessionID getAccession()
      Description copied from interface: Accessioned
      Returns the AccessionID this location is currently bound with
      Specified by:
      getAccession in interface Accessioned
      Returns:
      the AccessionID of this sequence
    • getAccessions

      public ArrayList<AccessionID> getAccessions() throws XPathExpressionException
      Retrieves the UniProt accessions associated with this sequence.
      Returns:
      a list of the accessions
      Throws:
      XPathExpressionException
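      Reading the accessions can be sketched with an XPath query over the entry's <accession> elements. In UniProt XML the first accession listed is the primary one and any others are secondary. The class name, the List<String> result (BioJava returns AccessionID objects) and the unqualified path /uniprot/entry/accession, which relies on a non-namespace-aware parser, are assumptions of this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical extractor; returns plain strings instead of AccessionID objects.
class AccessionExtractor {
    /** Accessions in document order; the first is the primary accession. */
    static List<String> accessions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/uniprot/entry/accession", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read accessions", e);
        }
    }
}
```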
    • getAliases

      public ArrayList<String> getAliases() throws XPathExpressionException
      Retrieves the UniProt protein aliases associated with this sequence.
      Returns:
      a list of the aliases
      Throws:
      XPathExpressionException
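      Aliases can be collected the same way. Which names count as aliases is an assumption here: this sketch takes the fullName of each <alternativeName> under <protein> and skips the recommended name, which may differ from the elements BioJava actually reads. AliasExtractor is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical alias reader: alternative full names of the protein.
class AliasExtractor {
    static List<String> aliases(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/uniprot/entry/protein/alternativeName/fullName", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read aliases", e);
        }
    }
}
```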
    • countCompounds

      public int countCompounds(C... compounds)
      Description copied from interface: Sequence
      Returns the number of times we found a compound in the Sequence
      Specified by:
      countCompounds in interface Sequence<C extends Compound>
      Parameters:
      compounds - the compounds to count
      Returns:
      the total number of occurrences of the given compounds
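      Counting can be sketched as a single pass over the sequence that checks each position against the requested compounds. CompoundCounter is a hypothetical name and works on chars rather than Compound objects; in this sketch each position is counted at most once, even if the same compound is passed twice.

```java
// Hypothetical char-based version of countCompounds.
class CompoundCounter {
    static int countCompounds(String sequence, char... compounds) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            for (char target : compounds) {
                if (c == target) {
                    count++;
                    break; // count this position once, even for duplicate targets
                }
            }
        }
        return count;
    }
}
```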
    • getUniprotbaseURL

      public static String getUniprotbaseURL()
      The UniProt base URL currently in use, which can be changed to deal with caching issues. www.uniprot.org is load-balanced, but pir.uniprot.org can be accessed directly.
      Returns:
      the uniprotbaseURL
    • setUniprotbaseURL

      public static void setUniprotbaseURL(String aUniprotbaseURL)
      Sets the UniProt base URL.
      Parameters:
      aUniprotbaseURL - the uniprotbaseURL to set
    • getUniprotDirectoryCache

      public static String getUniprotDirectoryCache()
      The local directory used to cache downloaded UniProt XML files.
      Returns:
      the uniprotDirectoryCache
    • setUniprotDirectoryCache

      public static void setUniprotDirectoryCache(String aUniprotDirectoryCache)
      Sets the local directory used to cache downloaded UniProt XML files.
      Parameters:
      aUniprotDirectoryCache - the uniprotDirectoryCache to set
    • main

      public static void main(String[] args)
    • getGeneName

      public String getGeneName()
      Returns the gene name associated with this sequence.
      Returns:
      the gene name
    • getOrganismName

      public String getOrganismName()
      Returns the organism name assigned to this sequence.
      Returns:
      the organism name
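      Both names sit in predictable places in UniProt XML: the gene name in <gene><name type="primary">, the organism in <organism><name type="scientific">. The sketch below reads them with XPath; the class name, the choice of the primary and scientific names, and returning an empty string when an element is missing are assumptions rather than documented BioJava behavior.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical reader for the gene and organism names of a UniProt entry.
class EntryNames {
    final String geneName;
    final String organismName;

    EntryNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // evaluate() returns "" when no element matches.
            geneName = xpath.evaluate("/uniprot/entry/gene/name[@type='primary']", doc).trim();
            organismName = xpath.evaluate("/uniprot/entry/organism/name[@type='scientific']", doc).trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read names", e);
        }
    }
}
```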
    • getKeyWords

      public ArrayList<String> getKeyWords()
      Retrieves the UniProt keywords, a mixed set of terms associated with this sequence.
      Specified by:
      getKeyWords in interface FeaturesKeyWordInterface
      Returns:
      a list of the keywords
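      In UniProt XML, keywords appear as <keyword> elements directly under <entry>, each carrying an id attribute. A hypothetical sketch that returns their text in document order follows; the class name and the plain List<String> result are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical keyword reader for a UniProt entry.
class KeywordExtractor {
    static List<String> keywords(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/uniprot/entry/keyword", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim()); // keyword text, not its id
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read keywords", e);
        }
    }
}
```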
    • getDatabaseReferences

      public LinkedHashMap<String,ArrayList<DBReferenceInfo>> getDatabaseReferences()
      Returns the UniProt mappings from this sequence to identifiers in other databases.
      Specified by:
      getDatabaseReferences in interface DatabaseReferenceInterface
      Returns:
      a map from each database name to the references in that database
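      The grouping can be sketched by walking the entry's <dbReference> children and bucketing them by their type attribute in a LinkedHashMap, which keeps databases in the order they first appear. The path /uniprot/entry/dbReference deliberately skips <dbReference> elements nested inside citations and evidence. Storing bare id strings instead of DBReferenceInfo objects, and the class name, are simplifications of this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical grouping of an entry's cross-references by database type.
class DbReferenceGrouper {
    /** Maps each database type (e.g. "PDB") to its ids, in first-seen order. */
    static LinkedHashMap<String, List<String>> byType(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Direct children of <entry> only: citation and evidence references are skipped.
            NodeList refs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/uniprot/entry/dbReference", doc, XPathConstants.NODESET);
            LinkedHashMap<String, List<String>> grouped = new LinkedHashMap<>();
            for (int i = 0; i < refs.getLength(); i++) {
                Element ref = (Element) refs.item(i);
                grouped.computeIfAbsent(ref.getAttribute("type"), k -> new ArrayList<>())
                        .add(ref.getAttribute("id"));
            }
            return grouped;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read database references", e);
        }
    }
}
```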