java.text

Class RuleBasedCollator

Implemented Interfaces:
Cloneable, Comparator<T>

public class RuleBasedCollator
extends Collator

This class is a concrete subclass of Collator suitable for string collation in a wide variety of languages. An instance of this class is normally returned by the getInstance method of Collator with rules predefined for the requested locale. However, an instance of this class can be created manually with any desired rules.
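The sketch below shows both ways of obtaining a collator. It assumes a JDK whose Collator.getInstance returns a RuleBasedCollator for the requested locale, which is why it uses an instanceof check instead of a bare cast. The class CollatorSources, its helpers localeCollator and fromRules, and the rule string "< c < b < a" are hypothetical and exist only for illustration.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Locale;

final class CollatorSources {
    /** Returns the locale's collator if it is rule-based, otherwise null. */
    static RuleBasedCollator localeCollator(Locale locale) {
        Collator c = Collator.getInstance(locale);
        return (c instanceof RuleBasedCollator) ? (RuleBasedCollator) c : null;
    }

    /** Builds a collator from hand-written rules, rethrowing ParseException as unchecked. */
    static RuleBasedCollator fromRules(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("invalid collation rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator us = localeCollator(Locale.US);
        System.out.println("Locale.US collator is rule-based: " + (us != null));

        // Hypothetical custom ordering that reverses the alphabet for three letters.
        RuleBasedCollator reversed = fromRules("< c < b < a");
        String[] letters = {"a", "b", "c"};
        Arrays.sort(letters, reversed);
        System.out.println(Arrays.toString(letters)); // [c, b, a]
    }
}
```

Wrapping ParseException in fromRules is only a convenience for this sketch. Code that accepts user-supplied rules would usually handle the checked exception directly.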

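Rules take the form of a String with the following syntax:
- Modifier: '@'
- Relation: '<', ';', ',' or '=', followed by a text argument
- Reset: '&', followed by a text argument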

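The modifier character indicates that accents sort backward, as is the case with French. The modifier applies to all rules after the modifier but before the next primary sequence. If placed at the end of the sequence, it applies to all unknown accented characters. The relational operators specify how the text argument relates to the previous term. The relation characters have the following meanings:
- '<': The text argument is greater than the prior term at the primary difference level.
- ';': The text argument is greater than the prior term at the secondary difference level.
- ',': The text argument is greater than the prior term at the tertiary difference level.
- '=': The text argument is equal to the prior term.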
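The effect of the difference-level relations '<', ';' and ',' can be observed by asking, for a pair of strings, which is the weakest strength at which a collator tells them apart. This is a minimal sketch. The rule string "< a ; b , c < d", the class RelationLevels and its helpers are hypothetical, and the commented results assume the standard java.text implementation in a JDK.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class RelationLevels {
    /** Builds the demo collator; the rule string is a hypothetical example. */
    static RuleBasedCollator demoCollator() {
        try {
            // '<' starts a new primary group, ';' is a secondary step, ',' a tertiary step.
            return new RuleBasedCollator("< a ; b , c < d");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns the weakest strength at which the collator distinguishes x and y,
     * or "none" if they compare equal even at TERTIARY strength.
     */
    static String weakestDifference(RuleBasedCollator collator, String x, String y) {
        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        String[] names = {"primary", "secondary", "tertiary"};
        // Work on a clone so the caller's strength setting is left untouched.
        RuleBasedCollator c = (RuleBasedCollator) collator.clone();
        for (int i = 0; i < strengths.length; i++) {
            c.setStrength(strengths[i]);
            if (c.compare(x, y) != 0) {
                return names[i];
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        RuleBasedCollator rbc = demoCollator();
        System.out.println(weakestDifference(rbc, "a", "d")); // primary
        System.out.println(weakestDifference(rbc, "a", "b")); // secondary
        System.out.println(weakestDifference(rbc, "b", "c")); // tertiary
    }
}
```

The '@' modifier and the '=' relation are not exercised here.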

As for the text argument itself, this is any sequence of Unicode characters not in the following ranges: 0x0009-0x000D, 0x0020-0x002F, 0x003A-0x0040, 0x005B-0x0060, and 0x007B-0x007E. If these characters are desired, they must be enclosed in single quotes. Whitespace is ignored; for example, "a b" is equivalent to "ab".
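Quoting and whitespace handling can be checked with a small program. This sketch assumes a JDK's java.text implementation, which rejects unquoted punctuation with a ParseException. The class QuotingDemo, its helper parses and the rule strings are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class QuotingDemo {
    /** Returns true if the rule string is accepted, false if it throws ParseException. */
    static boolean parses(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ParseException {
        // '-' (0x002D) lies in a reserved range, so it must be quoted.
        System.out.println(parses("< a < - < b"));   // false
        System.out.println(parses("< a < '-' < b")); // true

        // Whitespace between tokens is ignored: both rule strings place '-' between a and b.
        RuleBasedCollator spaced = new RuleBasedCollator("< a < '-' < b");
        RuleBasedCollator compact = new RuleBasedCollator("<a<'-'<b");
        System.out.println(spaced.compare("a", "-") < 0 && spaced.compare("-", "b") < 0);   // true
        System.out.println(compact.compare("a", "-") < 0 && compact.compare("-", "b") < 0); // true
    }
}
```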

The reset operation inserts the rule that follows it at the point where its text argument appears in the previously declared rule string. This makes it easy to add new rules to an existing string by simply including them in a reset sequence at the end. Note that the text argument, or at least its first character, must be present somewhere in the previously declared rules in order to be inserted properly. If it is not, a ParseException will be thrown.
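A minimal sketch of extending an existing rule set by appending a reset sequence to the string returned by getRules(). The class ExtendRules, its helper extend and the rule strings are hypothetical. The same pattern should apply to the rules of a locale collator obtained from getInstance, although that rule string is far longer.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class ExtendRules {
    /** Builds a new collator from base's rules with extra rules appended at the end. */
    static RuleBasedCollator extend(RuleBasedCollator base, String extra) throws ParseException {
        return new RuleBasedCollator(base.getRules() + extra);
    }

    public static void main(String[] args) throws ParseException {
        RuleBasedCollator base = new RuleBasedCollator("< a < c");
        // Reset to 'a', then declare 'b' right after it: the effective order is a < b < c.
        RuleBasedCollator extended = extend(base, " & a < b");
        System.out.println(extended.compare("a", "b") < 0); // true
        System.out.println(extended.compare("b", "c") < 0); // true

        // 'y' was never declared, so there is nothing for the reset to attach to.
        try {
            extend(base, " & y < z");
            System.out.println("accepted");
        } catch (ParseException e) {
            System.out.println("rejected");
        }
    }
}
```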

This system of configuring RuleBasedCollator is admittedly complex. It was developed at Taligent and accepted by Sun into the Java standard library.

Here are a couple of examples of rule strings:

"< a < b < c" - This string says that a is less than b, which is less than c (so a sorts first), with all differences being primary differences.

"< a,A < b,B < c,C" - This string says that 'A' is greater than 'a' with a tertiary strength comparison. Both 'b' and 'B' are greater than 'a' and 'A' during a primary strength comparison. But 'B' is greater than 'b' under a tertiary strength comparison.

"< a < c & a < b " - This sequence is identical in function to the "< a < b < c" rule string above. The '&' reset symbol indicates that the rule "< b" is to be inserted after the text argument "a" in the previous rule string segment.

"< a < b & y < z" - This is an error. The character 'y' does not appear anywhere in the previous rule string segment so the rule following the reset rule cannot be inserted.

"< a & A @ < e & E < f& F" - This sequence is equivalent to the following "< a & A < E & e < f & F".

For a description of the various comparison strength types, see the documentation for the Collator class.
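The example rule strings above can be checked with a short program. This is a sketch assuming a JDK's java.text implementation; the class RuleExamples and its helpers are hypothetical, and the '@' example is not exercised. Arrays.sort on an object array is stable, so at PRIMARY strength strings that differ only in case keep their input order.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

final class RuleExamples {
    /** Builds a collator, or returns null if the rules are rejected with ParseException. */
    static RuleBasedCollator tryBuild(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            return null;
        }
    }

    /** Returns a sorted copy of words; Arrays.sort on objects is stable. */
    static String[] sorted(Collator collator, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, collator);
        return copy;
    }

    /** True if both collators give the same sign of compare for every pair of samples. */
    static boolean sameOrdering(Collator x, Collator y, String... samples) {
        for (String s : samples) {
            for (String t : samples) {
                if (Integer.signum(x.compare(s, t)) != Integer.signum(y.compare(s, t))) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "< a < b < c": each '<' places the next letter after the previous one.
        RuleBasedCollator abc = tryBuild("< a < b < c");
        System.out.println(Arrays.toString(sorted(abc, "c", "a", "b"))); // [a, b, c]

        // "< a,A < b,B < c,C": case is only a tertiary difference.
        RuleBasedCollator cased = tryBuild("< a,A < b,B < c,C");
        String[] mixed = {"B", "a", "b", "A"};
        System.out.println(Arrays.toString(sorted(cased, mixed))); // [a, A, b, B]
        cased.setStrength(Collator.PRIMARY);
        System.out.println(Arrays.toString(sorted(cased, mixed))); // [a, A, B, b]

        // "< a < c & a < b" orders a, b, c exactly like "< a < b < c".
        RuleBasedCollator reset = tryBuild("< a < c & a < b");
        System.out.println(sameOrdering(abc, reset, "a", "b", "c")); // true

        // "< a < b & y < z": 'y' was never declared, so the rules are rejected.
        System.out.println(tryBuild("< a < b & y < z") == null); // true
    }
}
```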

As an additional complication to this already overly complex rule scheme, if any characters precede the first rule, these characters are considered ignorable. They will be treated as if they did not exist during comparisons. For example, "- < a < b ..." would make '-' an ignorable character such that the strings "high-tech" and "hightech" would be considered identical.

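A ParseException will be thrown for any of the following conditions:
- Unquoted punctuation characters in a text argument.
- A relational or reset operator not followed by a text argument.
- A reset operator whose text argument is not present in the previous rule string section.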

Field Summary

Fields inherited from class java.text.Collator

CANONICAL_DECOMPOSITION, FULL_DECOMPOSITION, IDENTICAL, NO_DECOMPOSITION, PRIMARY, SECONDARY, TERTIARY

Constructor Summary

RuleBasedCollator(String rules)
This constructor initializes a new instance of RuleBasedCollator with the specified collation rules.

Method Summary

Object clone()
This method creates a copy of this object.
int compare(String source, String target)
This method returns an integer which indicates whether the first specified String is less than, greater than, or equal to the second.
boolean equals(Object obj)
This method tests this object for equality against the specified object.
CollationElementIterator getCollationElementIterator(String source)
This method returns an instance of CollationElementIterator for the specified String under the collation rules for this object.
CollationElementIterator getCollationElementIterator(CharacterIterator source)
This method returns an instance of CollationElementIterator for the String represented by the specified CharacterIterator.
CollationKey getCollationKey(String source)
This method returns an instance of CollationKey for the specified String.
String getRules()
This method returns a String containing the collation rules for this object.
int hashCode()
This method returns a hash value for this object.

Methods inherited from class java.text.Collator

clone, compare, compare, equals, equals, getAvailableLocales, getCollationKey, getDecomposition, getInstance, getInstance, getStrength, hashCode, setDecomposition, setStrength

Methods inherited from class java.lang.Object

clone, equals, getClass, finalize, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details

RuleBasedCollator

public RuleBasedCollator(String rules)
            throws ParseException
This constructor initializes a new instance of RuleBasedCollator with the specified collation rules. Note that an application normally obtains an instance of RuleBasedCollator by calling the getInstance method of Collator. That method automatically loads the proper set of rules for the desired locale.
Parameters:
rules - The collation rule string.
Throws:
ParseException - If the rule string contains syntax errors.

Method Details

clone

public Object clone()
This method creates a copy of this object.
Overrides:
clone in class Collator
Returns:
A copy of this object.

compare

public int compare(String source,
                   String target)
This method returns an integer which indicates whether the first specified String is less than, greater than, or equal to the second. The value depends not only on the collation rules in effect, but also the strength and decomposition settings of this object.
Overrides:
compare in class Collator
Parameters:
source - The first String to compare.
target - A second String to compare to the first.
Returns:
A negative integer if source < target, a positive integer if source > target, or 0 if source == target.

equals

public boolean equals(Object obj)
This method tests this object for equality against the specified object. This will be true if and only if the specified object is another reference to this object.
Specified by:
equals in interface Comparator<T>
Overrides:
equals in class Collator
Parameters:
obj - The Object to compare against this object.
Returns:
true if the specified object is equal to this object, false otherwise.

getCollationElementIterator

public CollationElementIterator getCollationElementIterator(String source)
This method returns an instance of CollationElementIterator for the specified String under the collation rules for this object.
Parameters:
source - The String to return the CollationElementIterator instance for.
Returns:
A CollationElementIterator for the specified String.

getCollationElementIterator

public CollationElementIterator getCollationElementIterator(CharacterIterator source)
This method returns an instance of CollationElementIterator for the String represented by the specified CharacterIterator.
Parameters:
source - The CharacterIterator with the desired String.
Returns:
A CollationElementIterator for the specified String.
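A minimal sketch of walking the collation elements of a string and extracting their primary orders with the static CollationElementIterator.primaryOrder method. The class ElementDump, its helper primaryOrders and the rule string are hypothetical; the exact numeric orders depend on the implementation, so the program prints them without claiming particular values.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

final class ElementDump {
    /** Collects the primary order of each collation element of text. */
    static List<Integer> primaryOrders(RuleBasedCollator collator, String text) {
        List<Integer> primaries = new ArrayList<>();
        CollationElementIterator it = collator.getCollationElementIterator(text);
        // next() returns NULLORDER once the end of the text is reached.
        for (int order = it.next(); order != CollationElementIterator.NULLORDER; order = it.next()) {
            primaries.add(CollationElementIterator.primaryOrder(order));
        }
        return primaries;
    }

    public static void main(String[] args) throws ParseException {
        RuleBasedCollator collator = new RuleBasedCollator("< a, A < b, B < c, C");
        // Upper- and lowercase letters differ only at the tertiary level,
        // so both strings should yield the same primary orders.
        System.out.println(primaryOrders(collator, "abc"));
        System.out.println(primaryOrders(collator, "ABC"));
    }
}
```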

getCollationKey

public CollationKey getCollationKey(String source)
This method returns an instance of CollationKey for the specified String. Comparing CollationKey objects can be faster than repeatedly comparing the strings with compare, which may provide a speed benefit when many comparisons are performed, such as during a sort.
Overrides:
getCollationKey in class Collator
Parameters:
source - The String to create a CollationKey for.
Returns:
A CollationKey for the specified String.
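The usual pattern for sorting with collation keys computes one key per string, sorts the keys, and reads back the source strings. This is a sketch; the class KeySort and its helper sortWithKeys are hypothetical, and the commented output assumes the Locale.US collator supplied by a JDK.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class KeySort {
    /** Sorts words by precomputing one CollationKey per word, then comparing keys. */
    static String[] sortWithKeys(Collator collator, String... words) {
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        Arrays.sort(keys); // CollationKey implements Comparable<CollationKey>
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        System.out.println(Arrays.toString(sortWithKeys(collator, "peach", "apple", "Banana"))); // [apple, Banana, peach]
    }
}
```

Each string is converted to a key once, so the per-comparison cost during the sort is a key comparison rather than a full collation pass.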

getRules

public String getRules()
This method returns a String containing the collation rules for this object.
Returns:
The collation rules for this object.

hashCode

public int hashCode()
This method returns a hash value for this object.
Overrides:
hashCode in class Collator
Returns:
A hash value for this object.

RuleBasedCollator.java -- Concrete Collator Class
Copyright (C) 1998, 1999, 2000, 2001, 2003, 2004, 2005 Free Software Foundation, Inc.

This file is part of GNU Classpath.

GNU Classpath is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

GNU Classpath is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with GNU Classpath; see the file COPYING. If not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

Linking this library statically or dynamically with other modules is making a combined work based on this library. Thus, the terms and conditions of the GNU General Public License cover the whole combination.

As a special exception, the copyright holders of this library give you permission to link this library with independent modules to produce an executable, regardless of the license terms of these independent modules, and to copy and distribute the resulting executable under terms of your choice, provided that you also meet, for each linked independent module, the terms and conditions of the license of that module. An independent module is a module which is not derived from or based on this library. If you modify this library, you may extend this exception to your version of the library, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version.