Package vcf

Class VcfRec

  • All Implemented Interfaces:
    IntArray, DuplicatesGTRec, GTRec, MarkerContainer

    public final class VcfRec
    extends java.lang.Object
    implements GTRec

    Class VcfRec represents a VCF record. If one allele in a diploid genotype is missing, then both alleles are set to missing.

    Instances of class VcfRec are immutable.

    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String GL_FORMAT
      The VCF FORMAT code for log-scaled genotype likelihood data: "GL".
      static java.lang.String PL_FORMAT
      The VCF FORMAT code for phred-scaled genotype likelihood data: "PL".
    • Method Summary

      Modifier and Type Method Description
      int allele1(int sample)
      Returns the first allele for the specified sample or -1 if the allele is missing.
      int allele2(int sample)
      Returns the second allele for the specified sample or -1 if the allele is missing.
      int[] alleles()
      Returns an array of length this.size() whose j-th element is equal to this.get(j).
      java.lang.String filter()
      Returns the FILTER field.
      java.lang.String format()
      Returns the FORMAT field.
      java.lang.String[] formatData(java.lang.String formatCode)
      Returns an array of length this.size() containing the specified FORMAT subfield data for each sample.
      int formatIndex(java.lang.String formatCode)
      Returns the index of the specified FORMAT subfield if the specified subfield is defined for this VCF record, and returns -1 otherwise.
      java.lang.String formatSubfield(int subfieldIndex)
      Returns the specified FORMAT subfield.
      static VcfRec fromGL(VcfHeader vcfHeader, java.lang.String vcfRecord, float maxLR)
      Constructs and returns a new VcfRec instance from a VCF record and its GL or PL format subfield data.
      static VcfRec fromGT(VcfHeader vcfHeader, java.lang.String vcfRecord)
      Constructs and returns a new VcfRec instance from a VCF record and its GT format subfield data.
      static VcfRec fromGTGL(VcfHeader vcfHeader, java.lang.String vcfRecord, float maxLR)
      Constructs and returns a new VcfRec instance from a VCF record and its GT, GL, and PL format subfield data.
      int get(int hap)
      Returns the specified allele for the specified haplotype or -1 if the allele is missing.
      float gl(int sample, int allele1, int allele2)
      Returns the probability of the observed data for the specified sample if the specified pair of ordered alleles is the true ordered genotype.
      static int gtIndex(int a1, int a2)
      Returns the VCF genotype index for the specified pair of alleles.
      boolean hasFormat(java.lang.String formatCode)
      Returns true if the specified FORMAT subfield is present, and returns false otherwise.
      java.lang.String info()
      Returns the INFO field.
      boolean isPhased()
      Returns true if the genotype for every sample is a phased, non-missing genotype, and returns false otherwise.
      boolean isPhased(int sample)
      Returns true if the genotype for the specified sample has non-missing alleles and is either haploid or diploid with a phased allele separator, and returns false otherwise.
      Marker marker()
      Returns the marker.
      int nFormatSubfields()
      Returns the number of FORMAT subfields.
      java.lang.String qual()
      Returns the QUAL field.
      java.lang.String sampleData(int sample)
      Returns the data for the specified sample.
      java.lang.String sampleData(int sample, int subfieldIndex)
      Returns the specified data for the specified sample.
      java.lang.String sampleData(int sample, java.lang.String formatCode)
      Returns the specified data for the specified sample.
      Samples samples()
      Returns the list of samples.
      int size()
      Returns the number of haplotypes.
      java.lang.String toString()
      Returns the VCF record.
      VcfHeader vcfHeader()
      Returns the VCF meta-information lines and the VCF header line.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • GL_FORMAT

        public static final java.lang.String GL_FORMAT
        The VCF FORMAT code for log-scaled genotype likelihood data: "GL".
        See Also:
        Constant Field Values
      • PL_FORMAT

        public static final java.lang.String PL_FORMAT
        The VCF FORMAT code for phred-scaled genotype likelihood data: "PL".
        See Also:
        Constant Field Values
    • Method Detail

      • gtIndex

        public static int gtIndex(int a1,
                                  int a2)
        Returns the VCF genotype index for the specified pair of alleles.
        Parameters:
        a1 - the first allele
        a2 - the second allele
        Returns:
        the VCF genotype index for the specified pair of alleles
        Throws:
        java.lang.IllegalArgumentException - if a1 < 0 || a2 < 0
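        The genotype index can be illustrated with the ordering that the VCF specification defines for diploid GL and PL values, where genotype j/k with j <= k has index k*(k+1)/2 + j. The class below is a hypothetical stand-alone sketch, not the VcfRec source; in particular, treating the two alleles as unordered (so that 1,0 and 0,1 give the same index) is an assumption.

```java
// Hypothetical stand-alone sketch; GtIndexSketch is not part of package vcf.
final class GtIndexSketch {

    /** Returns the index of the unordered genotype {a1, a2} in VCF GL/PL order. */
    static int gtIndex(int a1, int a2) {
        if (a1 < 0 || a2 < 0) {
            throw new IllegalArgumentException("negative allele: " + a1 + ", " + a2);
        }
        // Assumption: allele order is ignored, so sort the pair first.
        int lo = Math.min(a1, a2);
        int hi = Math.max(a1, a2);
        // VCF specification ordering for genotype lo/hi with lo <= hi.
        return hi * (hi + 1) / 2 + lo;
    }

    public static void main(String[] args) {
        // Lists the genotypes of a marker with three alleles in index order.
        for (int a2 = 0; a2 < 3; ++a2) {
            for (int a1 = 0; a1 <= a2; ++a1) {
                System.out.println(a1 + "/" + a2 + " -> " + gtIndex(a1, a2));
            }
        }
    }
}
```

        With this ordering, a marker with n alleles has n*(n+1)/2 genotype indices, which is the number of comma-separated values expected in a diploid GL or PL subfield.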
      • fromGT

        public static VcfRec fromGT(VcfHeader vcfHeader,
                                    java.lang.String vcfRecord)
        Constructs and returns a new VcfRec instance from a VCF record and its GT format subfield data.
        Parameters:
        vcfHeader - meta-information lines and header line for the specified VCF record
        vcfRecord - a VCF record with a GT format field corresponding to the specified vcfHeader object
        Returns:
        a new VcfRec instance
        Throws:
        java.lang.IllegalArgumentException - if the VCF record does not have a GT format field
        java.lang.IllegalArgumentException - if a VCF record format error is detected
        java.lang.IllegalArgumentException - if there are not vcfHeader.nHeaderFields() tab-delimited fields in the specified VCF record
        java.lang.NullPointerException - if vcfHeader == null || vcfRecord == null
      • fromGL

        public static VcfRec fromGL(VcfHeader vcfHeader,
                                    java.lang.String vcfRecord,
                                    float maxLR)
        Constructs and returns a new VcfRec instance from a VCF record and its GL or PL format subfield data. If both GL and PL format subfields are present, the GL format field will be used. If the maximum normalized genotype likelihood is 1.0 for a sample, then any other genotype likelihood for the sample that is less than lrThreshold is set to 0.
        Parameters:
        vcfHeader - meta-information lines and header line for the specified VCF record
        vcfRecord - a VCF record with a GL or PL format field corresponding to the specified vcfHeader object
        maxLR - the maximum likelihood ratio
        Returns:
        a new VcfRec instance
        Throws:
        java.lang.IllegalArgumentException - if the VCF record does not have a GL or PL format field
        java.lang.IllegalArgumentException - if a VCF record format error is detected
        java.lang.IllegalArgumentException - if there are not vcfHeader.nHeaderFields() tab-delimited fields in the specified VCF record
        java.lang.NullPointerException - if vcfHeader == null || vcfRecord == null
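        The phrase "maximum normalized genotype likelihood is 1.0" can be illustrated by rescaling GL (log10-scaled) or PL (phred-scaled) values so that the largest likelihood equals 1.0. The class below is a hypothetical sketch, not the VcfRec implementation: the class and method names are invented, missing values (".") are not handled, and the thresholding step is omitted because this page does not define how lrThreshold relates to maxLR.

```java
// Hypothetical stand-alone sketch; LikelihoodSketch is not part of package vcf.
final class LikelihoodSketch {

    /** Converts a comma-separated GL subfield (log10 likelihoods) to likelihoods with maximum 1.0. */
    static float[] fromGL(String glField) {
        String[] tokens = glField.split(",");
        double[] log10 = new double[tokens.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < tokens.length; ++j) {
            log10[j] = Double.parseDouble(tokens[j]);
            max = Math.max(max, log10[j]);
        }
        float[] like = new float[tokens.length];
        for (int j = 0; j < like.length; ++j) {
            // Subtracting the maximum rescales the largest likelihood to 1.0.
            like[j] = (float) Math.pow(10.0, log10[j] - max);
        }
        return like;
    }

    /** Converts a comma-separated PL subfield (PL = -10 * log10 likelihood) to likelihoods with maximum 1.0. */
    static float[] fromPL(String plField) {
        String[] tokens = plField.split(",");
        int[] pl = new int[tokens.length];
        int min = Integer.MAX_VALUE;
        for (int j = 0; j < tokens.length; ++j) {
            pl[j] = Integer.parseInt(tokens[j]);
            min = Math.min(min, pl[j]);
        }
        float[] like = new float[tokens.length];
        for (int j = 0; j < like.length; ++j) {
            // The smallest PL value corresponds to the largest likelihood.
            like[j] = (float) Math.pow(10.0, (min - pl[j]) / 10.0);
        }
        return like;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(fromGL("-0.5,-1.5,-2.5")));
        System.out.println(java.util.Arrays.toString(fromPL("0,10,20")));
    }
}
```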
      • fromGTGL

        public static VcfRec fromGTGL(VcfHeader vcfHeader,
                                      java.lang.String vcfRecord,
                                      float maxLR)
        Constructs and returns a new VcfRec instance from a VCF record and its GT, GL, and PL format subfield data. If the GT format subfield is present and non-missing, the GT format subfield is used to determine genotype likelihoods. Otherwise the GL or PL format subfield is used to determine genotype likelihoods. If both the GL and PL format subfields are present, only the GL format subfield will be used. If the maximum normalized genotype likelihood is 1.0 for a sample, then any other genotype likelihood for the sample that is less than lrThreshold is set to 0.
        Parameters:
        vcfHeader - meta-information lines and header line for the specified VCF record
        vcfRecord - a VCF record with a GT, a GL or a PL format field corresponding to the specified vcfHeader object
        maxLR - the maximum likelihood ratio
        Returns:
        a new VcfRec instance
        Throws:
        java.lang.IllegalArgumentException - if the VCF record does not have a GT, GL, or PL format field
        java.lang.IllegalArgumentException - if a VCF record format error is detected
        java.lang.IllegalArgumentException - if there are not vcfHeader.nHeaderFields() tab-delimited fields in the specified VCF record
        java.lang.NullPointerException - if vcfHeader == null || vcfRecord == null
      • qual

        public java.lang.String qual()
        Returns the QUAL field.
        Returns:
        the QUAL field
      • filter

        public java.lang.String filter()
        Returns the FILTER field.
        Returns:
        the FILTER field
      • info

        public java.lang.String info()
        Returns the INFO field.
        Returns:
        the INFO field
      • format

        public java.lang.String format()
        Returns the FORMAT field. Returns the empty string ("") if the FORMAT field is missing.
        Returns:
        the FORMAT field
      • nFormatSubfields

        public int nFormatSubfields()
        Returns the number of FORMAT subfields.
        Returns:
        the number of FORMAT subfields
      • formatSubfield

        public java.lang.String formatSubfield(int subfieldIndex)
        Returns the specified FORMAT subfield.
        Parameters:
        subfieldIndex - a FORMAT subfield index
        Returns:
        the specified FORMAT subfield
        Throws:
        java.lang.IndexOutOfBoundsException - if subfieldIndex < 0 || subfieldIndex >= this.nFormatSubfields()
      • hasFormat

        public boolean hasFormat(java.lang.String formatCode)
        Returns true if the specified FORMAT subfield is present, and returns false otherwise.
        Parameters:
        formatCode - a FORMAT subfield code
        Returns:
        true if the specified FORMAT subfield is present
      • formatIndex

        public int formatIndex(java.lang.String formatCode)
        Returns the index of the specified FORMAT subfield if the specified subfield is defined for this VCF record, and returns -1 otherwise.
        Parameters:
        formatCode - the format subfield code
        Returns:
        the index of the specified FORMAT subfield if the specified subfield is defined for this VCF record, and -1 otherwise
      • sampleData

        public java.lang.String sampleData(int sample)
        Returns the data for the specified sample.
        Parameters:
        sample - a sample index
        Returns:
        the data for the specified sample
        Throws:
        java.lang.IndexOutOfBoundsException - if sample < 0 || sample >= this.size()
      • sampleData

        public java.lang.String sampleData(int sample,
                                           java.lang.String formatCode)
        Returns the specified data for the specified sample.
        Parameters:
        sample - a sample index
        formatCode - a FORMAT subfield code
        Returns:
        the specified data for the specified sample
        Throws:
        java.lang.IllegalArgumentException - if this.hasFormat(formatCode)==false
        java.lang.IndexOutOfBoundsException - if sample < 0 || sample >= this.size()
      • sampleData

        public java.lang.String sampleData(int sample,
                                           int subfieldIndex)
        Returns the specified data for the specified sample.
        Parameters:
        sample - a sample index
        subfieldIndex - a FORMAT subfield index
        Returns:
        the specified data for the specified sample
        Throws:
        java.lang.IndexOutOfBoundsException - if subfieldIndex < 0 || subfieldIndex >= this.nFormatSubfields()
        java.lang.IndexOutOfBoundsException - if sample < 0 || sample >= this.size()
      • formatData

        public java.lang.String[] formatData(java.lang.String formatCode)
        Returns an array of length this.size() containing the specified FORMAT subfield data for each sample. The k-th element of the array is the specified FORMAT subfield data for the k-th sample.
        Parameters:
        formatCode - a format subfield code
        Returns:
        an array of length this.size() containing the specified FORMAT subfield data for each sample
        Throws:
        java.lang.IllegalArgumentException - if this.hasFormat(formatCode) == false
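        The relationship between formatIndex, sampleData, and formatData can be sketched on raw VCF strings: the FORMAT field is a colon-separated list of codes, and each sample field is a colon-separated list of values in the same order. The class below is a hypothetical illustration, not the VcfRec implementation. Returning "." when a sample omits trailing subfields is an assumption based on the VCF specification, which permits trailing subfields to be dropped.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; FormatSketch is not part of package vcf.
final class FormatSketch {

    /** Returns the index of formatCode in the colon-separated FORMAT field, or -1 if absent. */
    static int formatIndex(String format, String formatCode) {
        String[] codes = format.split(":");
        for (int j = 0; j < codes.length; ++j) {
            if (codes[j].equals(formatCode)) {
                return j;
            }
        }
        return -1;
    }

    /** Returns the subfield at subfieldIndex in one colon-separated sample field. */
    static String sampleData(String sampleField, int subfieldIndex) {
        if (subfieldIndex < 0) {
            throw new IndexOutOfBoundsException("subfieldIndex: " + subfieldIndex);
        }
        String[] values = sampleField.split(":");
        // Assumption: dropped trailing subfields are reported as missing (".").
        return subfieldIndex < values.length ? values[subfieldIndex] : ".";
    }

    /** Returns the formatCode subfield for every sample, in sample order. */
    static String[] formatData(String format, String[] sampleFields, String formatCode) {
        int index = formatIndex(format, formatCode);
        if (index < 0) {
            throw new IllegalArgumentException("missing FORMAT code: " + formatCode);
        }
        String[] data = new String[sampleFields.length];
        for (int k = 0; k < data.length; ++k) {
            data[k] = sampleData(sampleFields[k], index);
        }
        return data;
    }

    public static void main(String[] args) {
        String format = "GT:DP:GL";
        String[] samples = {"0|1:12:-0.1,-1.0,-2.0", "1/1:7"};
        System.out.println(Arrays.toString(formatData(format, samples, "DP")));
    }
}
```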
      • samples

        public Samples samples()
        Description copied from interface: GTRec
        Returns the list of samples.
        Specified by:
        samples in interface GTRec
        Returns:
        the list of samples
      • vcfHeader

        public VcfHeader vcfHeader()
        Returns the VCF meta-information lines and the VCF header line.
        Returns:
        the VCF meta-information lines and the VCF header line
      • allele1

        public int allele1(int sample)
        Description copied from interface: DuplicatesGTRec
        Returns the first allele for the specified sample or -1 if the allele is missing. The two alleles for a sample are arbitrarily ordered if this.isPhased(sample) == false.
        Specified by:
        allele1 in interface DuplicatesGTRec
        Parameters:
        sample - a sample index
        Returns:
        the first allele for the specified sample
      • allele2

        public int allele2(int sample)
        Description copied from interface: DuplicatesGTRec
        Returns the second allele for the specified sample or -1 if the allele is missing. The two alleles for a sample are arbitrarily ordered if this.isPhased(sample) == false.
        Specified by:
        allele2 in interface DuplicatesGTRec
        Parameters:
        sample - a sample index
        Returns:
        the second allele for the specified sample
      • get

        public int get(int hap)
        Description copied from interface: DuplicatesGTRec
        Returns the specified allele for the specified haplotype or -1 if the allele is missing. The two alleles for a sample at a marker are arbitrarily ordered if this.isPhased(hap/2) == false.
        Specified by:
        get in interface DuplicatesGTRec
        Specified by:
        get in interface IntArray
        Parameters:
        hap - a haplotype index
        Returns:
        the specified allele for the specified haplotype
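        The haplotype indexing can be sketched as follows. The description above maps haplotype hap to sample hap/2; the sketch additionally assumes that even haplotype indices carry the first allele and odd indices carry the second, which this page does not state. The class and its array arguments are hypothetical and are not part of VcfRec.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; HapIndexSketch is not part of package vcf.
final class HapIndexSketch {

    /** Returns the allele on haplotype hap, given per-sample first and second alleles. */
    static int get(int[] allele1, int[] allele2, int hap) {
        if (hap < 0 || hap >= 2 * allele1.length) {
            throw new IndexOutOfBoundsException("hap: " + hap);
        }
        int sample = hap / 2;
        // Assumption: even haplotype index is the first allele, odd is the second.
        return (hap % 2 == 0) ? allele1[sample] : allele2[sample];
    }

    /** Returns an array whose j-th element equals get(allele1, allele2, j). */
    static int[] alleles(int[] allele1, int[] allele2) {
        int[] out = new int[2 * allele1.length];
        for (int hap = 0; hap < out.length; ++hap) {
            out[hap] = get(allele1, allele2, hap);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a1 = {0, 1, -1};   // -1 marks a missing allele
        int[] a2 = {1, 1, -1};
        System.out.println(Arrays.toString(alleles(a1, a2)));
    }
}
```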
      • alleles

        public int[] alleles()
        Description copied from interface: DuplicatesGTRec
        Returns an array of length this.size() whose j-th element is equal to this.get(j).
        Specified by:
        alleles in interface DuplicatesGTRec
        Returns:
        an array of length this.size() whose j-th element is equal to this.get(j)
      • isPhased

        public boolean isPhased(int sample)
        Description copied from interface: DuplicatesGTRec
        Returns true if the genotype for the specified sample has non-missing alleles and is either haploid or diploid with a phased allele separator, and returns false otherwise.
        Specified by:
        isPhased in interface DuplicatesGTRec
        Parameters:
        sample - a sample index
        Returns:
        true if the genotype for the specified sample is a phased, non-missing genotype
      • isPhased

        public boolean isPhased()
        Description copied from interface: DuplicatesGTRec
        Returns true if the genotype for every sample is a phased, non-missing genotype, and returns false otherwise.
        Specified by:
        isPhased in interface DuplicatesGTRec
        Returns:
        true if the genotype for each sample is a phased, non-missing genotype
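        The phasing rules above, together with the class-level rule that a diploid genotype with one missing allele has both alleles set to missing, can be sketched by parsing a raw GT string. This is a hypothetical illustration, not the VcfRec parser: the class name is invented, only haploid and diploid genotypes are handled, and storing a haploid allele in both allele slots is an assumption.

```java
// Hypothetical stand-alone sketch; GtSketch is not part of package vcf.
final class GtSketch {

    static final int MISSING = -1;

    final int allele1;
    final int allele2;
    final boolean phased;

    private GtSketch(int allele1, int allele2, boolean phased) {
        this.allele1 = allele1;
        this.allele2 = allele2;
        this.phased = phased;
    }

    private static int parseAllele(String s) {
        return s.equals(".") ? MISSING : Integer.parseInt(s);
    }

    /** Parses a haploid ("1") or diploid ("0|1", "0/1") GT string. */
    static GtSketch parse(String gt) {
        int sep = Math.max(gt.indexOf('|'), gt.indexOf('/'));
        if (sep < 0) {
            // Haploid: assumed to be stored in both slots; phased if non-missing.
            int a = parseAllele(gt);
            return new GtSketch(a, a, a != MISSING);
        }
        int a1 = parseAllele(gt.substring(0, sep));
        int a2 = parseAllele(gt.substring(sep + 1));
        if (a1 == MISSING || a2 == MISSING) {
            // One missing allele makes both alleles missing.
            return new GtSketch(MISSING, MISSING, false);
        }
        return new GtSketch(a1, a2, gt.charAt(sep) == '|');
    }

    /** Returns true if every genotype is phased and non-missing. */
    static boolean allPhased(String[] gts) {
        for (String gt : gts) {
            if (!parse(gt).phased) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        GtSketch g = parse("./1");
        System.out.println(g.allele1 + " " + g.allele2 + " " + g.phased);
    }
}
```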
      • gl

        public float gl(int sample,
                        int allele1,
                        int allele2)
        Returns the probability of the observed data for the specified sample if the specified pair of ordered alleles is the true ordered genotype. Returns 1.0f if the corresponding genotype determined by the isPhased(), allele1(), and allele2() methods is consistent with the specified ordered genotype, and returns 0.0f otherwise.
        Parameters:
        sample - the sample index
        allele1 - the first allele index
        allele2 - the second allele index
        Returns:
        the probability of the observed data for the specified sample if the specified pair of ordered alleles is the true ordered genotype.
        Throws:
        java.lang.IndexOutOfBoundsException - if sample < 0 || sample >= this.size()
        java.lang.IndexOutOfBoundsException - if allele1 < 0 || allele1 >= this.marker().nAlleles()
        java.lang.IndexOutOfBoundsException - if allele2 < 0 || allele2 >= this.marker().nAlleles()
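        One possible reading of "consistent with the specified ordered genotype" is sketched below. The rules are assumptions, not taken from the VcfRec source: a phased genotype is consistent only with the identical ordered pair, an unphased genotype is consistent with either ordering of its two alleles, and a missing genotype is consistent with every ordered pair. The class and parameter names are hypothetical.

```java
// Hypothetical stand-alone sketch; GtLikelihoodSketch is not part of package vcf.
final class GtLikelihoodSketch {

    /**
     * Returns 1.0f if the observed genotype (obs1, obs2, phased) is consistent
     * with the ordered genotype (allele1, allele2), and 0.0f otherwise.
     */
    static float gl(int obs1, int obs2, boolean phased, int allele1, int allele2) {
        if (obs1 < 0 || obs2 < 0) {
            // Assumption: a missing genotype does not exclude any ordered genotype.
            return 1.0f;
        }
        if (allele1 == obs1 && allele2 == obs2) {
            return 1.0f;
        }
        // Assumption: an unphased genotype also matches the swapped ordering.
        if (!phased && allele1 == obs2 && allele2 == obs1) {
            return 1.0f;
        }
        return 0.0f;
    }

    public static void main(String[] args) {
        System.out.println(gl(0, 1, true, 1, 0));
        System.out.println(gl(0, 1, false, 1, 0));
    }
}
```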
      • size

        public int size()
        Description copied from interface: DuplicatesGTRec
        Returns the number of haplotypes.
        Specified by:
        size in interface DuplicatesGTRec
        Specified by:
        size in interface IntArray
        Returns:
        the number of haplotypes
      • toString

        public java.lang.String toString()
        Returns the VCF record.
        Overrides:
        toString in class java.lang.Object
        Returns:
        the VCF record