Class Soundex

  • All Implemented Interfaces:
    Encoder, StringEncoder

    public class Soundex
    extends java.lang.Object
    implements StringEncoder
    Encodes a string into a Soundex value. Soundex is an encoding used to relate similar names, but it can also be used as a general-purpose scheme to find words with similar phonemes.

    This class is thread-safe. Although it is not strictly immutable, its only mutable state (the deprecated maxLength setting) does not affect the encoding.
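    The following is a minimal, self-contained sketch of the American Soundex algorithm as this page describes it for US_ENGLISH: the first letter is kept, H and W are skipped, vowels act as separators, and the result is zero-padded to four characters. SoundexSketch and its methods are hypothetical names for illustration, not part of this library, and the cleaning step (dropping non-letters and upper-casing) is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of American Soundex; not the Commons Codec source. */
final class SoundexSketch {
    // Codes for A..Z: vowels, H, W and Y are '0'; consonant groups are 1..6.
    private static final String MAPPING = "01230120022455012623010202";

    /** Encodes a non-null name; returns "" if it contains no letters. */
    static String encode(String name) {
        // Assumed cleaning step: keep letters only, upper-cased.
        StringBuilder letters = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (Character.isLetter(c)) {
                letters.append(Character.toUpperCase(c));
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        char[] out = {'0', '0', '0', '0'}; // fixed length, zero-padded
        int count = 0;
        char first = letters.charAt(0);
        out[count++] = first;              // first letter is kept as-is
        char last = code(first);
        for (int i = 1; i < letters.length() && count < out.length; i++) {
            char ch = letters.charAt(i);
            if (ch == 'H' || ch == 'W') {
                continue; // silent: neither coded nor a separator
            }
            char digit = code(ch);
            if (digit != '0' && digit != last) {
                out[count++] = digit; // new consonant code
            }
            last = digit; // a vowel ('0') resets, so a repeat after it is kept
        }
        return new String(out);
    }

    /** Looks up an upper-case letter; rejects anything outside A..Z. */
    private static char code(char upper) {
        int index = upper - 'A';
        if (index < 0 || index >= MAPPING.length()) {
            throw new IllegalArgumentException("Not mapped: " + upper);
        }
        return MAPPING.charAt(index);
    }

    /** Groups names that share the same code, preserving input order. */
    static Map<String, List<String>> group(List<String> names) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String n : names) {
            groups.computeIfAbsent(encode(n), k -> new ArrayList<>()).add(n);
        }
        return groups;
    }
}
```

    With this sketch, "Robert" and "Rupert" both encode to R163, so group(...) places them in the same bucket. With the library itself, one would call Soundex.US_ENGLISH.encode(...) instead.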

    • Field Summary

      Fields 
      Modifier and Type Field Description
      static char SILENT_MARKER
      The marker character used to indicate a silent (ignored) character.
      static Soundex US_ENGLISH
      An instance of Soundex using the US_ENGLISH_MAPPING_STRING mapping.
      static Soundex US_ENGLISH_GENEALOGY
      An instance of Soundex using the mapping as per the Genealogy site: http://www.genealogy.com/articles/research/00000060.html
      static java.lang.String US_ENGLISH_MAPPING_STRING
      This is a default mapping of the 26 letters used in US English.
      static Soundex US_ENGLISH_SIMPLIFIED
      An instance of Soundex using the Simplified Soundex mapping, as described here: http://west-penwith.org.uk/misc/soundex.htm
    • Constructor Summary

      Constructors 
      Constructor Description
      Soundex()
      Creates an instance using US_ENGLISH_MAPPING_STRING.
      Soundex(char[] mapping)
      Creates a soundex instance using the given mapping.
      Soundex(java.lang.String mapping)
      Creates a soundex instance using a custom mapping.
      Soundex(java.lang.String mapping, boolean specialCaseHW)
      Creates a soundex instance using a custom mapping, with explicit control over the special treatment of H and W.
    • Method Summary

      Modifier and Type Method Description
      int difference(java.lang.String s1, java.lang.String s2)
      Encodes the Strings and returns the number of positions at which the two encoded Strings have the same character.
      java.lang.Object encode(java.lang.Object obj)
      Encodes an Object using the soundex algorithm.
      java.lang.String encode(java.lang.String str)
      Encodes a String using the soundex algorithm.
      int getMaxLength()
      Deprecated.
      This feature is not needed since the encoding size must be constant.
      void setMaxLength(int maxLength)
      Deprecated.
      This feature is not needed since the encoding size must be constant.
      java.lang.String soundex(java.lang.String str)
      Retrieves the Soundex code for a given String object.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • SILENT_MARKER

        public static final char SILENT_MARKER
        The marker character used to indicate a silent (ignored) character. Characters mapped to this marker are ignored except when they appear as the first character.

        Note: the US_ENGLISH_MAPPING_STRING does not use this mechanism because changing it might break existing code. Mappings that don't contain a silent marker code are treated as though H and W are silent.

        To override this, use the Soundex(String, boolean) constructor.

        Since:
        1.11
        See Also:
        Constant Field Values
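        To illustrate how a silent marker differs from a '0' code, the hypothetical helper below isolates the duplicate-dropping step: given the code of the first letter and the codes of the remaining letters, it returns the digits that would be kept before padding or truncation. It assumes SILENT_MARKER is '-' (confirm on the Constant Field Values page) and mirrors the rules described on this page; it is not the library's code.

```java
/** Hypothetical helper isolating the duplicate-dropping rule; not a library API. */
final class SilentMarkerDemo {
    // Assumed value of SILENT_MARKER; check the Constant Field Values page.
    static final char SILENT = '-';

    /**
     * Given the code of the first letter and the codes of the remaining
     * letters, returns the digits kept (before padding or truncation).
     */
    static String keptDigits(char firstCode, String codes) {
        StringBuilder kept = new StringBuilder();
        char last = firstCode;
        for (char digit : codes.toCharArray()) {
            if (digit == SILENT) {
                continue; // silent: skipped entirely, 'last' is untouched
            }
            if (digit != '0' && digit != last) {
                kept.append(digit); // a consonant code that is not a repeat
            }
            last = digit; // a '0' lands here and acts as a separator
        }
        return kept.toString();
    }
}
```

        For "Tymczak", the letters after T map to "052202" with the standard American codes, but to "-522-2" with a mapping that marks vowels as silent. keptDigits('3', "052202") keeps "522", because the A coded 0 separates the two 2s, while keptDigits('3', "-522-2") keeps only "52", because the silent A does not.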
      • US_ENGLISH_MAPPING_STRING

        public static final java.lang.String US_ENGLISH_MAPPING_STRING
        This is a default mapping of the 26 letters used in US English. A value of 0 for a letter position means do not encode, but treat as a separator when it occurs between consonants with the same code.

        (This constant is provided as both an implementation convenience and to allow Javadoc to pick up the value for the constant values page.)

        Note that letters H and W are treated specially. They are ignored (after the first letter) and don't act as separators between consonants with the same code.

        See Also:
        Constant Field Values
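        As a rough illustration of how a 26-character mapping string can be indexed, the sketch below looks up each letter by its offset from 'A'. The string used is the standard American Soundex letter grouping (B, F, P, V = 1; C, G, J, K, Q, S, X, Z = 2; D, T = 3; L = 4; M, N = 5; R = 6; all other letters 0). It is assumed, not confirmed here, to match the value of US_ENGLISH_MAPPING_STRING; MappingLookup is a hypothetical name.

```java
/** Hypothetical illustration of indexing a 26-letter mapping string. */
final class MappingLookup {
    // Standard American Soundex codes for A..Z, one character per letter.
    static final String US_MAPPING = "01230120022455012623010202";

    /** Returns the code for an ASCII letter (either case) by its offset from 'A'. */
    static char codeFor(char letter) {
        return US_MAPPING.charAt(Character.toUpperCase(letter) - 'A');
    }

    /** Maps every letter of a word (ASCII letters only) to its code. */
    static String codesFor(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (char c : word.toCharArray()) {
            sb.append(codeFor(c));
        }
        return sb.toString();
    }
}
```

        Here codesFor("Robert") yields "601063"; an encoder then keeps the first letter and drops the 0s and repeats to produce R163.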
      • US_ENGLISH

        public static final Soundex US_ENGLISH
        An instance of Soundex using the US_ENGLISH_MAPPING_STRING mapping. This treats H and W as silent letters: apart from when they appear as the first letter, they are ignored, and they don't act as separators between duplicate codes.
        See Also:
        US_ENGLISH_MAPPING_STRING
      • US_ENGLISH_SIMPLIFIED

        public static final Soundex US_ENGLISH_SIMPLIFIED
        An instance of Soundex using the Simplified Soundex mapping, as described here: http://west-penwith.org.uk/misc/soundex.htm

        This treats H and W the same as vowels (AEIOUY). Such letters aren't encoded (after the first), but they do act as separators when dropping duplicate codes. The mapping is otherwise the same as for US_ENGLISH.

        Since:
        1.11
      • US_ENGLISH_GENEALOGY

        public static final Soundex US_ENGLISH_GENEALOGY
        An instance of Soundex using the mapping as per the Genealogy site: http://www.genealogy.com/articles/research/00000060.html

        This treats vowels (AEIOUY), H and W as silent letters. Such letters are ignored (after the first) and do not act as separators when dropping duplicate codes.

        The codes for consonants are otherwise the same as for US_ENGLISH_MAPPING_STRING and US_ENGLISH_SIMPLIFIED.

        Since:
        1.11
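        The three preconfigured instances can be compared with the hypothetical, parameterized sketch below. It assumes that US_ENGLISH behaves like (standard mapping, H and W special-cased), US_ENGLISH_SIMPLIFIED like (standard mapping, no special case), and US_ENGLISH_GENEALOGY like (a mapping with vowels, H, W and Y set to an assumed silent marker '-', no special case). Those mapping strings are reconstructed from the descriptions above, not copied from the library.

```java
/** Hypothetical, parameterized Soundex sketch; not the Commons Codec source. */
final class VariantSoundex {
    static final char SILENT = '-'; // assumed SILENT_MARKER value
    // Assumed mappings for A..Z; consonant codes are identical in both.
    static final String US = "01230120022455012623010202";
    static final String GENEALOGY = "-123-12--22455-12623-1-2-2";

    /** Encodes a non-empty, upper-case A..Z string. */
    static String encode(String upper, String mapping, boolean specialCaseHW) {
        char[] out = {'0', '0', '0', '0'};
        int count = 0;
        out[count++] = upper.charAt(0);
        char last = mapping.charAt(upper.charAt(0) - 'A');
        for (int i = 1; i < upper.length() && count < out.length; i++) {
            char ch = upper.charAt(i);
            if (specialCaseHW && (ch == 'H' || ch == 'W')) {
                continue; // US_ENGLISH-style: H/W ignored, no separation
            }
            char digit = mapping.charAt(ch - 'A');
            if (digit == SILENT) {
                continue; // silent letter: skipped, 'last' unchanged
            }
            if (digit != '0' && digit != last) {
                out[count++] = digit;
            }
            last = digit; // '0' acts as a separator between repeats
        }
        return new String(out);
    }
}
```

        Under these assumptions, "ASHCRAFT" encodes to A261 with the US_ENGLISH rules but to A226 with the simplified rules, because the H then separates the S and C codes. "TYMCZAK" encodes to T522 with US_ENGLISH but to T520 with the genealogy rules, because the silent A no longer separates the two 2s.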
    • Constructor Detail

      • Soundex

        public Soundex(char[] mapping)
        Creates a soundex instance using the given mapping. This constructor can be used to provide an internationalized mapping for a non-Western character set. Every letter of the alphabet is "mapped" to a numerical value; this char array holds the values to which each letter is mapped. This implementation contains a default map for US_ENGLISH.

        If the mapping contains an instance of SILENT_MARKER, then H and W are not given special treatment.

        Parameters:
        mapping - Mapping array to use when finding the corresponding code for a given character
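        The rule above, that the presence of SILENT_MARKER switches off the special treatment of H and W, can be sketched as a simple scan of the mapping. HwRule is a hypothetical helper, and the marker value '-' is an assumption to be checked against the Constant Field Values page.

```java
/** Hypothetical helper mirroring the documented rule; not a library API. */
final class HwRule {
    static final char SILENT = '-'; // assumed SILENT_MARKER value

    /** Returns true when H and W should get US_ENGLISH-style special treatment. */
    static boolean specialCaseHW(char[] mapping) {
        for (char c : mapping) {
            if (c == SILENT) {
                return false; // the mapping declares its own silent letters
            }
        }
        return true; // no marker: treat H and W as silent
    }
}
```

        A mapping with no marker, such as the standard American codes, yields true, while a mapping that marks vowels, H and W with '-' yields false.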
      • Soundex

        public Soundex(java.lang.String mapping)
        Creates a soundex instance using a custom mapping. This constructor can be used to customize the mapping, and/or possibly provide an internationalized mapping for a non-Western character set.

        If the mapping contains an instance of SILENT_MARKER, then H and W are not given special treatment.

        Parameters:
        mapping - Mapping string to use when finding the corresponding code for a given character
        Since:
        1.4
      • Soundex

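        public Soundex(java.lang.String mapping,
                       boolean specialCaseHW)
        Creates a soundex instance using a custom mapping. This constructor can be used to customize the mapping, and/or possibly provide an internationalized mapping for a non-Western character set.
        Parameters:
        mapping - Mapping string to use when finding the corresponding code for a given character
        specialCaseHW - if true, then H and W are treated as silent letters: they are ignored (after the first letter) and do not act as separators between consonants with the same code; if false, they are encoded according to the mapping like any other letter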
        Since:
        1.11
    • Method Detail

      • difference

        public int difference(java.lang.String s1,
                              java.lang.String s2)
                       throws EncoderException
        Encodes the Strings and returns the number of positions at which the two encoded Strings have the same character. The return value ranges from 0 through 4: 0 indicates little or no similarity, and 4 indicates strong similarity or identical values.
        Parameters:
        s1 - A String that will be encoded and compared.
        s2 - A String that will be encoded and compared.
        Returns:
        The number of positions (0 to 4) at which the two encoded Strings have the same character.
        Throws:
        EncoderException - if an error occurs encoding one of the strings
        Since:
        1.3
        See Also:
        SoundexUtils.difference(StringEncoder,String,String), MS T-SQL DIFFERENCE
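        A minimal sketch of the comparison step, assuming (as the description of the return value suggests) that the encoded Strings are compared position by position. It works on codes that have already been produced, for example by US_ENGLISH; SoundexDifference is a hypothetical name and is not SoundexUtils.

```java
/** Hypothetical sketch of a DIFFERENCE-style score on two Soundex codes. */
final class SoundexDifference {
    /** Counts the positions at which the two codes hold the same character. */
    static int difference(String code1, String code2) {
        if (code1 == null || code2 == null) {
            return 0; // nothing to compare
        }
        int n = Math.min(code1.length(), code2.length());
        int same = 0;
        for (int i = 0; i < n; i++) {
            if (code1.charAt(i) == code2.charAt(i)) {
                same++;
            }
        }
        return same;
    }
}
```

        For example, R163 ("Robert") and R150 ("Rubin") agree in the first two positions, so the score is 2; two identical codes score 4.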
      • encode

        public java.lang.Object encode(java.lang.Object obj)
                                throws EncoderException
        Encodes an Object using the soundex algorithm. This method is provided in order to satisfy the requirements of the Encoder interface, and will throw an EncoderException if the supplied object is not of type java.lang.String.
        Specified by:
        encode in interface Encoder
        Parameters:
        obj - Object to encode
        Returns:
        An object (of type java.lang.String) containing the soundex code that corresponds to the String supplied.
        Throws:
        EncoderException - if the parameter supplied is not of type java.lang.String
        java.lang.IllegalArgumentException - if a character is not mapped
      • encode

        public java.lang.String encode(java.lang.String str)
        Encodes a String using the soundex algorithm.
        Specified by:
        encode in interface StringEncoder
        Parameters:
        str - A String object to encode
        Returns:
        A Soundex code corresponding to the String supplied
        Throws:
        java.lang.IllegalArgumentException - if a character is not mapped
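        One way the "character is not mapped" case can arise is sketched below. It assumes the input is first cleaned by dropping non-letters and upper-casing; a letter that survives cleaning but falls outside A to Z, such as Ü, then has no entry in a 26-letter mapping and triggers an IllegalArgumentException. CleanAndMap is a hypothetical helper, not the library's implementation.

```java
/** Hypothetical sketch of input cleaning and the "not mapped" failure. */
final class CleanAndMap {
    static final String US = "01230120022455012623010202"; // codes for A..Z

    /** Keeps only letters and upper-cases them (assumed cleaning step). */
    static String clean(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isLetter(c)) {
                sb.append(Character.toUpperCase(c));
            }
        }
        return sb.toString();
    }

    /** Looks up a cleaned letter; anything outside A..Z is rejected. */
    static char map(char upper) {
        int index = upper - 'A';
        if (index < 0 || index >= US.length()) {
            throw new IllegalArgumentException("The character is not mapped: " + upper);
        }
        return US.charAt(index);
    }
}
```

        Here clean("O'Brien") gives "OBRIEN", which maps without error, whereas the Ü in "Müller" is a letter, survives cleaning, and is rejected by map.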
      • getMaxLength

        @Deprecated
        public int getMaxLength()
        Deprecated.
        This feature is not needed since the encoding size must be constant. Will be removed in 2.0.
        Returns the maxLength. Standard Soundex codes have a fixed length of four characters.
        Returns:
        int
      • setMaxLength

        @Deprecated
        public void setMaxLength(int maxLength)
        Deprecated.
        This feature is not needed since the encoding size must be constant. Will be removed in 2.0.
        Sets the maxLength.
        Parameters:
        maxLength - The maxLength to set
      • soundex

        public java.lang.String soundex(java.lang.String str)
        Retrieves the Soundex code for a given String object.
        Parameters:
        str - String to encode using the Soundex algorithm
        Returns:
        A soundex code for the String supplied
        Throws:
        java.lang.IllegalArgumentException - if a character is not mapped