gnu.java.lang
Interface CharData
This interface contains the information about Unicode characters that
java.lang.Character needs. It is generated automatically from
../doc/unicode/UnicodeData-4.0.0.txt and
../doc/unicode/SpecialCasing-4.0.0.txt by Perl scripts. These Unicode
definition files can be found on the http://www.unicode.org website.
JDK 1.5 uses Unicode version 4.0.0.
The data is stored as string constants, but Character converts these
Strings to their respective char[] components. The fields are stored in
arrays of 17 elements each, one element per Unicode plane. BLOCKS stores
the offset of a block of 2^SHIFT characters within DATA. The DATA field,
in turn, stores information about each character in the low order bits,
and an offset into the attribute tables UPPER, LOWER, NUM_VALUE, and
DIRECTION. Notice that the attribute tables have far fewer than 0xffff
entries, because many characters in Unicode share common attributes.
Numbers that are too large to fit into NUM_VALUE as 16-bit chars are
stored in LARGENUMS; for such a character, NUM_VALUE holds a number N
such that (-N - 3) is the offset into LARGENUMS. The DIRECTION table also
contains a field for detecting characters with multi-character uppercase
expansions. Next, there is a listing for TITLE exceptions (most characters
just have the same title case as upper case). Finally, there are two
tables for multi-character capitalization: UPPER_SPECIAL, which lists the
characters that are special-cased, and UPPER_EXPAND, which lists their
expansions.
static String[] | BLOCKS - The mapping of character blocks to their location in DATA.
static String[] | DATA - Information about each character.
static String[] | DIRECTION - This is the attribute table for computing the directionality class of a character, as well as a marker of characters with a multi-character capitalization.
static int[] | LARGENUMS - The array containing the numeric values that are too large to be stored as chars in NUM_VALUE.
static String[] | LOWER - This is the attribute table for computing the lowercase representation of a character.
static String[] | NUM_VALUE - This is the attribute table for computing the numeric value of a character.
static int[] | SHIFT - The character shift amount to look up the block offset.
static String | SOURCE - The Unicode definition file that was parsed to build this database.
static String | TITLE - This is the listing of titlecase special cases (all other characters can use UPPER to determine their titlecase).
static String[] | UPPER - This is the attribute table for computing the single-character uppercase representation of a character.
static String | UPPER_EXPAND - This is the listing of special case multi-character uppercase sequences.
static String | UPPER_SPECIAL - This is a listing of characters with multi-character uppercase sequences.
BLOCKS
public static final String[] BLOCKS
The mapping of character blocks to their location in DATA.
Each entry has been adjusted so that the 16-bit sum with the desired
character gives the actual index into DATA.
DATA
public static final String[] DATA
Information about each character. The low order 5 bits form the
character type, the next bit is a flag for non-breaking spaces, and the
next bit is a flag for mirrored directionality. The high order 9 bits
form the offset into the attribute tables. Note that this limits the
number of unique character attributes to 512, which is not a problem
as of Unicode version 4.0.0, but may soon become one.
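The bit layout described above can be unpacked with a few masks and shifts. The following is a minimal sketch, not the code used by java.lang.Character; the class and method names are hypothetical, and the sample entry is a made-up value rather than one taken from the real DATA table.

```java
// Hypothetical helper illustrating the DATA bit layout described above.
final class DataEntrySketch {
    // Low order 5 bits: the character type.
    static int type(char entry) {
        return entry & 0x1F;
    }

    // Next bit (bit 5): flag for non-breaking spaces.
    static boolean isNoBreakSpace(char entry) {
        return (entry & 0x20) != 0;
    }

    // Next bit (bit 6): flag for mirrored directionality.
    static boolean isMirrored(char entry) {
        return (entry & 0x40) != 0;
    }

    // High order 9 bits: offset into the attribute tables (0..511).
    static int attributeOffset(char entry) {
        return entry >> 7;
    }

    public static void main(String[] args) {
        // Made-up entry: attribute offset 3, mirrored, type 9.
        char entry = (char) ((3 << 7) | 0x40 | 9);
        System.out.println(type(entry) + " " + isNoBreakSpace(entry) + " "
            + isMirrored(entry) + " " + attributeOffset(entry));
    }
}
```

Because the offset occupies only 9 bits, attributeOffset can never exceed 511, which is the 512-entry limit mentioned above.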
DIRECTION
public static final String[] DIRECTION
This is the attribute table for computing the directionality class
of a character, as well as a marker of characters with a multi-character
capitalization. The direction is taken by performing a signed shift
right by 2 (where a result of -1 means an unknown direction, such as
for undefined characters). The lower 2 bits form a count of the
additional characters that will be added to a String when performing
multi-character uppercase expansion. This count is also used, along with
the offset in UPPER_SPECIAL, to determine how much of UPPER_EXPAND to use
when performing the case conversion. Note that this information is stored
as an unsigned char since this is a String literal.
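As an illustration of the two fields packed into a DIRECTION entry, here is a small sketch. It assumes the 16-bit entry is reinterpreted as a signed short before shifting, which is one way to obtain the signed result described above; the class and method names are hypothetical and the sample entries are made up.

```java
// Hypothetical helper illustrating the DIRECTION entry layout.
final class DirectionEntrySketch {
    // Signed shift right by 2. The cast to short reinterprets the
    // unsigned char as a signed 16-bit value (an assumption of this
    // sketch), so entries with all upper bits set yield -1.
    static int direction(char entry) {
        return ((short) entry) >> 2;
    }

    // Lower 2 bits: count of additional characters produced by a
    // multi-character uppercase expansion.
    static int extraUpperChars(char entry) {
        return entry & 3;
    }

    public static void main(String[] args) {
        char entry = (char) ((1 << 2) | 1); // made-up: direction 1, one extra char
        System.out.println(direction(entry) + " " + extraUpperChars(entry));
        System.out.println(direction('\ufffc')); // unknown direction
    }
}
```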
LARGENUMS
public static final int[] LARGENUMS
The array containing the numeric values that are too large to be stored as
chars in NUM_VALUE. NUM_VALUE in this case will contain a negative integer
N such that LARGENUMS[-N - 3] contains the correct numeric value.
LOWER
public static final String[] LOWER
This is the attribute table for computing the lowercase representation
of a character. The value is the signed difference between the
character and its lowercase version. Note that this is stored as an
unsigned char since this is a String literal.
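A signed difference stored in an unsigned char can be applied with ordinary 16-bit wrapping addition. The sketch below assumes the entry holds (lowercase - character), so that adding it to the character yields the lowercase form; that sign convention, the class name, and the sample deltas are assumptions of this example, not values read from the real table.

```java
// Hypothetical helper: applies a signed 16-bit delta stored as a char.
final class CaseDeltaSketch {
    // The sum is truncated to 16 bits, so a "negative" delta such as
    // 0xFFE0 (-32) wraps around and subtracts 32.
    static char applyDelta(char ch, char entry) {
        return (char) (ch + entry);
    }

    public static void main(String[] args) {
        char plus32 = (char) 32;   // delta +32
        char minus32 = (char) -32; // delta -32, stored as 0xFFE0
        System.out.println(applyDelta('A', plus32));
        System.out.println(applyDelta('a', minus32));
    }
}
```

The same arithmetic would apply to the UPPER table, which stores its differences in the same way.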
NUM_VALUE
public static final String[] NUM_VALUE
This is the attribute table for computing the numeric value of a
character. The value is -1 if Unicode does not define a value, -2
if the value is not a non-negative integer, otherwise it is the value.
Note that this is a signed value, but stored as an unsigned char
since this is a String literal.
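Putting the NUM_VALUE and LARGENUMS descriptions together, a lookup might proceed as in the following sketch. The class and method names are hypothetical, and the largeNums array is a made-up stand-in for LARGENUMS.

```java
// Hypothetical helper combining NUM_VALUE and LARGENUMS decoding.
final class NumValueSketch {
    static int numericValue(char entry, int[] largeNums) {
        // Reinterpret the unsigned char as a signed 16-bit value.
        int n = (short) entry;
        // -1 and -2 are the special markers; anything below them is an
        // encoded index N with the real value at largeNums[-N - 3].
        if (n <= -3) {
            return largeNums[-n - 3];
        }
        return n;
    }

    public static void main(String[] args) {
        int[] largeNums = { 100000, 1000000 }; // made-up sample values
        System.out.println(numericValue((char) 7, largeNums));
        System.out.println(numericValue((char) -1, largeNums));
        System.out.println(numericValue((char) -3, largeNums));
        System.out.println(numericValue((char) -4, largeNums));
    }
}
```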
SHIFT
public static final int[] SHIFT
The character shift amount to look up the block offset. In other words,
if ch is in Unicode plane p and c is the low 16 bits of ch, then
(char) (BLOCKS[p].charAt(c >> SHIFT[p]) + c) is the index where ch is
described in DATA[p]. Note that p is simply the integer division of ch
by 0x10000.
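The block lookup can be sketched with a toy table. The code below is an illustration under stated assumptions, not the Classpath implementation: the class and method names are hypothetical, the tables passed in main are made up (a single plane with blocks of 2^2 characters, where the first two blocks share DATA offset 0 and the third starts at offset 4), and the plane-local offset is taken to be the low 16 bits of the code point.

```java
// Hypothetical helper illustrating the BLOCKS/SHIFT index computation.
final class BlockLookupSketch {
    static int dataIndex(String[] blocks, int[] shift, int codePoint) {
        int plane = codePoint >>> 16;    // integer division by 0x10000
        char offset = (char) codePoint;  // low 16 bits within the plane
        // Each BLOCKS entry is pre-adjusted so that the 16-bit sum with
        // the character gives the index into the plane's DATA string.
        return (char) (blocks[plane].charAt(offset >> shift[plane]) + offset);
    }

    public static void main(String[] args) {
        // Entry for block i is (blockStart - (i << shift)) modulo 2^16:
        // block 0 -> 0, block 1 -> 0 - 4 = 0xFFFC, block 2 -> 4 - 8 = 0xFFFC.
        String[] blocks = { "\u0000\ufffc\ufffc" };
        int[] shift = { 2 };
        System.out.println(dataIndex(blocks, shift, 2));
        System.out.println(dataIndex(blocks, shift, 5));
        System.out.println(dataIndex(blocks, shift, 9));
    }
}
```

Sharing one DATA offset between identical blocks is what keeps the tables small: code points 1 and 5 in the toy table resolve to the same DATA index.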
SOURCE
public static final String SOURCE
The Unicode definition file that was parsed to build this database.
- "../doc/unicode/UnicodeData-4.0.0.txt"
TITLE
public static final String TITLE
This is the listing of titlecase special cases (all other characters
can use UPPER to determine their titlecase). The listing is a sorted
sequence of character pairs; converting the first character of the pair
to titlecase produces the second character.
- "\u01c4\u01c5\u01c5\u01c5\u01c6\u01c5\u01c7\u01c8\u01c8\u01c8\u01c9\u01c8\u01ca\u01cb\u01cb\u01cb\u01cc\u01cb\u01f1\u01f2\u01f2\u01f2\u01f3\u01f2"
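A lookup over the pair listing above could look like the following sketch. The class and method names are hypothetical, and the upperFallback parameter stands in for the result of the UPPER lookup, which this example does not implement. A linear scan is used for brevity; since the pairs are sorted, a binary search would also work.

```java
// Hypothetical helper illustrating a lookup in the TITLE pair listing.
final class TitleCaseSketch {
    // Copied from the TITLE constant above: (character, titlecase) pairs.
    static final String TITLE =
        "\u01c4\u01c5\u01c5\u01c5\u01c6\u01c5\u01c7\u01c8\u01c8\u01c8\u01c9\u01c8"
        + "\u01ca\u01cb\u01cb\u01cb\u01cc\u01cb\u01f1\u01f2\u01f2\u01f2\u01f3\u01f2";

    static char toTitleCase(char ch, char upperFallback) {
        // Even positions hold the source character, odd positions its titlecase.
        for (int i = 0; i < TITLE.length(); i += 2) {
            if (TITLE.charAt(i) == ch) {
                return TITLE.charAt(i + 1);
            }
        }
        // Not a special case: the caller's uppercase result applies.
        return upperFallback;
    }

    public static void main(String[] args) {
        System.out.println((int) toTitleCase('\u01c6', '\u01c4')); // special case
        System.out.println(toTitleCase('a', 'A'));                 // falls back
    }
}
```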
UPPER
public static final String[] UPPER
This is the attribute table for computing the single-character uppercase
representation of a character. The value is the signed difference
between the character and its uppercase version. Note that this is
stored as an unsigned char since this is a String literal. When
capitalizing a String, you must first check whether a multi-character
uppercase sequence exists before using this table.
UPPER_EXPAND
public static final String UPPER_EXPAND
This is the listing of special case multi-character uppercase sequences.
Characters listed in UPPER_SPECIAL index into this table to find their
uppercase expansion. Remember that you must also perform special-casing
on two single-character sequences in the Turkish locale, which are not
covered here in CharData.
- "SS\u02bcNJ\u030c\u0399\u0308\u0301\u03a5\u0308\u0301\u0535\u0552H\u0331T\u0308W\u030aY\u030aA\u02be\u03a5\u0313\u03a5\u0313\u0300\u03a5\u0313\u0301\u03a5\u0313\u0342\u1f08\u0399\u1f09\u0399\u1f0a\u0399\u1f0b\u0399\u1f0c\u0399\u1f0d\u0399\u1f0e\u0399\u1f0f\u0399\u1f08\u0399\u1f09\u0399\u1f0a\u0399\u1f0b\u0399\u1f0c\u0399\u1f0d\u0399\u1f0e\u0399\u1f0f\u0399\u1f28\u0399\u1f29\u0399\u1f2a\u0399\u1f2b\u0399\u1f2c\u0399\u1f2d\u0399\u1f2e\u0399\u1f2f\u0399\u1f28\u0399\u1f29\u0399\u1f2a\u0399\u1f2b\u0399\u1f2c\u0399\u1f2d\u0399\u1f2e\u0399\u1f2f\u0399\u1f68\u0399\u1f69\u0399\u1f6a\u0399\u1f6b\u0399\u1f6c\u0399\u1f6d\u0399\u1f6e\u0399\u1f6f\u0399\u1f68\u0399\u1f69\u0399\u1f6a\u0399\u1f6b\u0399\u1f6c\u0399\u1f6d\u0399\u1f6e\u0399\u1f6f\u0399\u1fba\u0399\u0391\u0399\u0386\u0399\u0391\u0342\u0391\u0342\u0399\u0391\u0399\u1fca\u0399\u0397\u0399\u0389\u0399\u0397\u0342\u0397\u0342\u0399\u0397\u0399\u0399\u0308\u0300\u0399\u0308\u0301\u0399\u0342\u0399\u0308\u0342\u03a5\u0308\u0300\u03a5\u0308\u0301\u03a1\u0313\u03a5\u0342\u03a5\u0308\u0342\u1ffa\u0399\u03a9\u0399\u038f\u0399\u03a9\u0342\u03a9\u0342\u0399\u03a9\u0399FFFIFLFFIFFLSTST\u0544\u0546\u0544\u0535\u0544\u053b\u054e\u0546\u0544\u053d"
UPPER_SPECIAL
public static final String UPPER_SPECIAL
This is a listing of characters with multi-character uppercase sequences.
A character appears in this list exactly when it has a non-zero entry
in the low-order 2-bit field of DIRECTION. The listing is a sorted
sequence of pairs (hence a binary search on the even elements is an
efficient way to look up a character). The first element of a pair is the
character with the expansion, and the second is the index into
UPPER_EXPAND where the expansion begins. Use the 2-bit field of
DIRECTION to determine where the expansion ends.
- "\u00df\000\u0149\002\u01f0\004\u0390\006\u03b0\011\u0587\014\u1e96\016\u1e97\020\u1e98\022\u1e99\024\u1e9a\026\u1f50\030\u1f52\032\u1f54\035\u1f56 \u1f80#\u1f81%\u1f82'\u1f83)\u1f84+\u1f85-\u1f86/\u1f871\u1f883\u1f895\u1f8a7\u1f8b9\u1f8c;\u1f8d=\u1f8e?\u1f8fA\u1f90C\u1f91E\u1f92G\u1f93I\u1f94K\u1f95M\u1f96O\u1f97Q\u1f98S\u1f99U\u1f9aW\u1f9bY\u1f9c[\u1f9d]\u1f9e_\u1f9fa\u1fa0c\u1fa1e\u1fa2g\u1fa3i\u1fa4k\u1fa5m\u1fa6o\u1fa7q\u1fa8s\u1fa9u\u1faaw\u1faby\u1fac{\u1fad}\u1fae\u007f\u1faf\u0081\u1fb2\u0083\u1fb3\u0085\u1fb4\u0087\u1fb6\u0089\u1fb7\u008b\u1fbc\u008e\u1fc2\u0090\u1fc3\u0092\u1fc4\u0094\u1fc6\u0096\u1fc7\u0098\u1fcc\u009b\u1fd2\u009d\u1fd3\u00a0\u1fd6\u00a3\u1fd7\u00a5\u1fe2\u00a8\u1fe3\u00ab\u1fe4\u00ae\u1fe6\u00b0\u1fe7\u00b2\u1ff2\u00b5\u1ff3\u00b7\u1ff4\u00b9\u1ff6\u00bb\u1ff7\u00bd\u1ffc\u00c0\ufb00\u00c2\ufb01\u00c4\ufb02\u00c6\ufb03\u00c8\ufb04\u00cb\ufb05\u00ce\ufb06\u00d0\ufb13\u00d2\ufb14\u00d4\ufb15\u00d6\ufb16\u00d8\ufb17\u00da"
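The binary search and expansion described for UPPER_SPECIAL and UPPER_EXPAND can be sketched as follows. This is an illustration rather than the Classpath implementation: the class and method names are hypothetical, only the first four pairs of UPPER_SPECIAL and the matching prefix of UPPER_EXPAND are reproduced, and the expansion length is assumed to be the DIRECTION count plus one (the count being the number of additional characters). The caller passes that count in directly.

```java
// Hypothetical helper illustrating the multi-character uppercase lookup.
final class UpperExpandSketch {
    // Excerpt: the first four (character, index) pairs of UPPER_SPECIAL.
    static final String UPPER_SPECIAL =
        "\u00df\000\u0149\002\u01f0\004\u0390\006";
    // Excerpt: the matching prefix of UPPER_EXPAND.
    static final String UPPER_EXPAND =
        "SS\u02bcNJ\u030c\u0399\u0308\u0301";

    // extraCount is the low 2-bit field of the character's DIRECTION entry.
    static String expand(char ch, int extraCount) {
        int lo = 0;
        int hi = UPPER_SPECIAL.length() / 2 - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            char key = UPPER_SPECIAL.charAt(mid * 2); // even elements are keys
            if (key == ch) {
                int start = UPPER_SPECIAL.charAt(mid * 2 + 1);
                return UPPER_EXPAND.substring(start, start + extraCount + 1);
            } else if (key < ch) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null; // no multi-character expansion in this excerpt
    }

    public static void main(String[] args) {
        System.out.println(expand('\u00df', 1));          // sharp s
        System.out.println(expand('\u0390', 2).length()); // three-char expansion
        System.out.println(expand('a', 0));               // not special-cased
    }
}
```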
gnu/java/lang/CharData -- Database for java.lang.Character Unicode info
Copyright (C) 2002 Free Software Foundation, Inc.
*** This file is generated by scripts/unicode-muncher.pl ***
This file is part of GNU Classpath.
GNU Classpath is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
GNU Classpath is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with GNU Classpath; see the file COPYING. If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA.
Linking this library statically or dynamically with other modules is
making a combined work based on this library. Thus, the terms and
conditions of the GNU General Public License cover the whole
combination.
As a special exception, the copyright holders of this library give you
permission to link this library with independent modules to produce an
executable, regardless of the license terms of these independent
modules, and to copy and distribute the resulting executable under
terms of your choice, provided that you also meet, for each linked
independent module, the terms and conditions of the license of that
module. An independent module is a module which is not derived from
or based on this library. If you modify this library, you may extend
this exception to your version of the library, but you are not
obligated to do so. If you do not wish to do so, delete this
exception statement from your version.