java.lang.Object
  com.ibm.icu.lang.UCharacter
public final class UCharacter
The UCharacter class provides extensions to the java.lang.Character class. These extensions support more Unicode properties and, together with the UTF16 class, support supplementary characters (those with code points above U+FFFF). Each ICU release supports the latest version of Unicode available at that time.
Code points are represented in this API using ints. While it would be more convenient in Java to have a separate primitive datatype for them, ints suffice in the meantime.
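A minimal sketch of what "code points as ints" looks like in practice. So that it runs without icu4j.jar, it calls the java.lang.Character methods that UCharacter covers (codePointAt, charCount); the assumption is that the UCharacter covers of the same name behave the same way. The class CodePointWalk and the helper codePoints are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointWalk {
    // Collects the code points of a UTF-16 string as ints.
    // Hypothetical helper for illustration; not part of ICU4J.
    static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = Character.codePointAt(s, i); // reads one char, or two for a surrogate pair
            out.add(cp);
            i += Character.charCount(cp);         // 1 for BMP, 2 for supplementary code points
        }
        return out;
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is stored as the surrogate pair D834 DD1E.
        String s = "a\uD834\uDD1Eb";
        for (int cp : codePoints(s)) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```

The string above has four chars but three code points, which is why the loop advances by charCount(cp) rather than by one.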
To use this class, add the jar file icu4j.jar to the class path, since it contains the data files that supply the information used by this class.
E.g. in Windows:
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/ucharacter.jar
Alternatively, copy the files uprops.dat and
unames.icu from the icu4j source subdirectory
$ICU4J_SRC/src/com.ibm.icu.impl.data to your class directory
$ICU4J_CLASS/com.ibm.icu.impl.data.
Aside from the additions for UTF-16 support, and the updated Unicode properties, the main differences between UCharacter and Character are:
Further detailed differences can be determined from the program com.ibm.icu.dev.test.lang.UCharacterCompare.
In addition to Java compatibility functions, which calculate derived properties, this API provides low-level access to the Unicode Character Database.
Unicode assigns each code point (not just assigned character) values for many properties. Most of them are simple boolean flags, or constants from a small enumerated list. For some properties, values are strings or other relatively more complex types.
For more information see "About the Unicode Character Database" (http://www.unicode.org/ucd/) and the ICU User Guide chapter on Properties (http://www.icu-project.org/userguide/properties.html).
There are also functions that provide easy migration from C/POSIX functions like isblank(). Their use is generally discouraged because the C/POSIX standards do not define their semantics beyond the ASCII range, which means that different implementations exhibit very different behavior. Instead, Unicode properties should be used directly.
There are also only a few, broad C/POSIX character classes, and they tend to be used for conflicting purposes. For example, the "isalpha()" class is sometimes used to determine word boundaries, while a more sophisticated approach would at least distinguish initial letters from continuation characters (the latter including combining marks). (In ICU, BreakIterator is the most sophisticated API for word boundaries.) Another example: There is no "istitle()" class for titlecase characters.
ICU 3.4 and later provides API access for all twelve C/POSIX character classes. ICU implements them according to the Standard Recommendations in Annex C: Compatibility Properties of UTS #18 Unicode Regular Expressions (http://www.unicode.org/reports/tr18/#Compatibility_Properties).
API access for C/POSIX character classes is as follows:
- alpha: isUAlphabetic(c) or hasBinaryProperty(c, UProperty.ALPHABETIC)
- lower: isULowercase(c) or hasBinaryProperty(c, UProperty.LOWERCASE)
- upper: isUUppercase(c) or hasBinaryProperty(c, UProperty.UPPERCASE)
- punct: ((1<
The C/POSIX character classes are also available in UnicodeSet patterns,
using patterns like [:graph:] or \p{graph}.
Note: There are several ICU (and Java) whitespace functions.
Comparison:
- isUWhiteSpace=UCHAR_WHITE_SPACE: Unicode White_Space property;
most of general categories "Z" (separators) + most whitespace ISO controls
(including no-break spaces, but excluding IS1..IS4 and ZWSP)
- isWhitespace: Java isWhitespace; Z + whitespace ISO controls but excluding no-break spaces
- isSpaceChar: just Z (including no-break spaces)
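The difference between the last two entries can be seen with a short sketch. It uses java.lang.Character.isWhitespace and isSpaceChar so that it runs without icu4j.jar; the assumption is that the UCharacter methods of the same name agree for these sample code points. The class WhitespaceCompare and the helper describe are hypothetical.

```java
class WhitespaceCompare {
    // Formats both answers for one code point.
    // Hypothetical helper for illustration; not part of ICU4J.
    static String describe(int cp) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b",
                cp, Character.isWhitespace(cp), Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        // SPACE, CHARACTER TABULATION, NO-BREAK SPACE, LINE SEPARATOR
        int[] samples = { 0x0020, 0x0009, 0x00A0, 0x2028 };
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

The tab is an ISO control, so only isWhitespace accepts it; the no-break space is in category Zs, so only isSpaceChar accepts it.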
This class is not subclassable
Note on getUnicodeNumericValue(int): this method gets the numeric value for a Unicode code point as defined in the
Unicode Character Database. A "double" return type is necessary because some numeric values are
fractions, negative, or too large for int. For characters without any numeric value in the Unicode Character
Database, this function returns NO_NUMERIC_VALUE. API Change: In release 2.2 and prior, this API had a
return type of int and returned -1 when the argument ch did not have a
corresponding numeric value. This was changed to stay in sync with ICU4C.
See Also: UCharacterEnums
Nested Class Summary
static interface
UCharacter.DecompositionType
Decomposition Type constants.
static interface
UCharacter.EastAsianWidth
East Asian Width constants.
static interface
UCharacter.GraphemeClusterBreak
Grapheme Cluster Break constants.
static interface
UCharacter.HangulSyllableType
Hangul Syllable Type constants.
static interface
UCharacter.JoiningGroup
Joining Group constants.
static interface
UCharacter.JoiningType
Joining Type constants.
static interface
UCharacter.LineBreak
Line Break constants.
static interface
UCharacter.NumericType
Numeric Type constants.
static interface
UCharacter.SentenceBreak
Sentence Break constants.
static class
UCharacter.UnicodeBlock
A family of character subsets representing the character blocks in the
Unicode specification, generated from Unicode Data file Blocks.txt.
static interface
UCharacter.WordBreak
Word Break constants.
Field Summary
static int
FOLD_CASE_DEFAULT
Option value for case folding: use default mappings defined in CaseFolding.txt.
static int
FOLD_CASE_EXCLUDE_SPECIAL_I
Option value for case folding: exclude the mappings for dotted I
and dotless i marked with 'I' in CaseFolding.txt.
static int
MAX_CODE_POINT
Cover the JDK 1.5 API, for convenience.
static char
MAX_HIGH_SURROGATE
Cover the JDK 1.5 API, for convenience.
static char
MAX_LOW_SURROGATE
Cover the JDK 1.5 API, for convenience.
static int
MAX_RADIX
Compatibility constant for Java Character's MAX_RADIX.
static char
MAX_SURROGATE
Cover the JDK 1.5 API, for convenience.
static int
MAX_VALUE
The highest Unicode code point value (scalar value) according to the
Unicode Standard.
static int
MIN_CODE_POINT
Cover the JDK 1.5 API, for convenience.
static char
MIN_HIGH_SURROGATE
Cover the JDK 1.5 API, for convenience.
static char
MIN_LOW_SURROGATE
Cover the JDK 1.5 API, for convenience.
static int
MIN_RADIX
Compatibility constant for Java Character's MIN_RADIX.
static int
MIN_SUPPLEMENTARY_CODE_POINT
Cover the JDK 1.5 API, for convenience.
static char
MIN_SURROGATE
Cover the JDK 1.5 API, for convenience.
static int
MIN_VALUE
The lowest Unicode code point value.
static double
NO_NUMERIC_VALUE
Special value that is returned by getUnicodeNumericValue(int) when no
numeric value is defined for a code point.
static int
REPLACEMENT_CHAR
Unicode value used when translating into Unicode encoding form and there
is no existing character.
static int
SUPPLEMENTARY_MIN_VALUE
The minimum value for Supplementary code points
static int
TITLECASE_NO_BREAK_ADJUSTMENT
Do not adjust the titlecasing indexes from BreakIterator::next() indexes;
titlecase exactly the characters at breaks from the iterator.
static int
TITLECASE_NO_LOWERCASE
Do not lowercase non-initial parts of words when titlecasing.
Method Summary
static int
charCount(int cp)
Cover the JDK 1.5 API, for convenience.
static int
codePointAt(char[] text,
int index)
Cover the JDK 1.5 API, for convenience.
static int
codePointAt(char[] text,
int index,
int limit)
Cover the JDK 1.5 API, for convenience.
static int
codePointAt(CharSequence seq,
int index)
Cover the JDK 1.5 API, for convenience.
static int
codePointBefore(char[] text,
int index)
Cover the JDK 1.5 API, for convenience.
static int
codePointBefore(char[] text,
int index,
int limit)
Cover the JDK 1.5 API, for convenience.
static int
codePointBefore(CharSequence seq,
int index)
Cover the JDK 1.5 API, for convenience.
static int
codePointCount(char[] text,
int start,
int limit)
Cover the JDK API, for convenience.
static int
codePointCount(CharSequence text,
int start,
int limit)
Cover the JDK API, for convenience.
static int
digit(int ch)
Retrieves the numeric value of a decimal digit code point.
static int
digit(int ch,
int radix)
Retrieves the numeric value of a decimal digit code point.
static int
foldCase(int ch,
boolean defaultmapping)
The given character is mapped to its case folding equivalent according
to UnicodeData.txt and CaseFolding.txt; if the character has no case
folding equivalent, the character itself is returned.
static int
foldCase(int ch,
int options)
The given character is mapped to its case folding equivalent according
to UnicodeData.txt and CaseFolding.txt; if the character has no case
folding equivalent, the character itself is returned.
static String
foldCase(String str,
boolean defaultmapping)
The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
static String
foldCase(String str,
int options)
The given string is mapped to its case folding equivalent according to
UnicodeData.txt and CaseFolding.txt; if any character has no case
folding equivalent, the character itself is returned.
static char
forDigit(int digit,
int radix)
Provide the java.lang.Character forDigit API, for convenience.
static VersionInfo
getAge(int ch)
Get the "age" of the code point.
static int
getCharFromExtendedName(String name)
Find a Unicode character by its name or extended name and return its code
point value.
static int
getCharFromName(String name)
Find a Unicode code point by its most current Unicode name and
return its code point value.
static int
getCharFromName1_0(String name)
Find a Unicode character by its version 1.0 Unicode name and return
its code point value.
static int
getCodePoint(char char16)
Returns the code point corresponding to the UTF16 character.
static int
getCodePoint(char lead,
char trail)
Returns a code point corresponding to the two UTF16 characters.
static int
getCombiningClass(int ch)
Gets the combining class of the argument codepoint
static int
getDirection(int ch)
Returns the Bidirection property of a code point.
static byte
getDirectionality(int cp)
Cover the JDK API, for convenience.
static String
getExtendedName(int ch)
Retrieves a name for a valid codepoint.
static ValueIterator
getExtendedNameIterator()
Gets an iterator for character names, iterating over codepoints.
static int
getHanNumericValue(int ch)
Return numeric value of Han code points.
static int
getIntPropertyMaxValue(int type)
Get the maximum value for an integer/binary Unicode property.
static int
getIntPropertyMinValue(int type)
Get the minimum value for an integer/binary Unicode property type.
static int
getIntPropertyValue(int ch,
int type)
Gets the property value for a Unicode property type of a code point.
static String
getISOComment(int ch)
Get the ISO 10646 comment for a character.
static int
getMirror(int ch)
Maps the specified code point to a "mirror-image" code point.
static String
getName(int ch)
Retrieve the most current Unicode name of the argument code point, or
null if the character is unassigned or outside the range
UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
static String
getName(String s,
String separator)
Gets the names for each of the characters in a string
static String
getName1_0(int ch)
Retrieve the earlier version 1.0 Unicode name of the argument code
point, or null if the character is unassigned or outside the range
UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
static ValueIterator
getName1_0Iterator()
Gets an iterator for character names, iterating over codepoints.
static ValueIterator
getNameIterator()
Gets an iterator for character names, iterating over codepoints.
static int
getNumericValue(int ch)
Returns the numeric value of the code point as a nonnegative
integer.
static int
getPropertyEnum(String propertyAlias)
Return the UProperty selector for a given property name, as
specified in the Unicode database file PropertyAliases.txt.
static String
getPropertyName(int property,
int nameChoice)
Return the Unicode name for a given property, as given in the
Unicode database file PropertyAliases.txt.
static int
getPropertyValueEnum(int property,
String valueAlias)
Return the property value integer for a given value name, as
specified in the Unicode database file PropertyValueAliases.txt.
static String
getPropertyValueName(int property,
int value,
int nameChoice)
Return the Unicode name for a given property value, as given in
the Unicode database file PropertyValueAliases.txt.
static String
getStringPropertyValue(int propertyEnum,
int codepoint,
int nameChoice)
Deprecated. This API is ICU internal only.
static int
getType(int ch)
Returns a value indicating a code point's Unicode category.
static RangeValueIterator
getTypeIterator()
Gets an iterator for character types, iterating over codepoints.
static double
getUnicodeNumericValue(int ch)
Get the numeric value for a Unicode code point as defined in the
Unicode Character Database.
static VersionInfo
getUnicodeVersion()
Gets the version of Unicode data used.
static boolean
hasBinaryProperty(int ch,
int property)
Check a binary Unicode property for a code point.
static boolean
isBaseForm(int ch)
Determines whether the specified code point is of base form.
static boolean
isBMP(int ch)
Determines if the code point is in the BMP plane.
static boolean
isDefined(int ch)
Determines if a code point has a defined meaning in the up-to-date
Unicode standard.
static boolean
isDigit(int ch)
Determines if a code point is a Java digit.
static boolean
isHighSurrogate(char ch)
Cover the JDK 1.5 API, for convenience.
static boolean
isIdentifierIgnorable(int ch)
Determines if the specified code point should be regarded as an
ignorable character in a Unicode identifier.
static boolean
isISOControl(int ch)
Determines if the specified code point is an ISO control character.
static boolean
isJavaIdentifierPart(int cp)
Compatibility override of Java method, delegates to
java.lang.Character.isJavaIdentifierPart.
static boolean
isJavaIdentifierStart(int cp)
Compatibility override of Java method, delegates to
java.lang.Character.isJavaIdentifierStart.
static boolean
isJavaLetter(int cp)
Deprecated. ICU 3.4 (Java)
static boolean
isJavaLetterOrDigit(int cp)
Deprecated. ICU 3.4 (Java)
static boolean
isLegal(int ch)
A code point is illegal if and only if it is one of the following:
- Out of bounds, less than 0 or greater than UCharacter.MAX_VALUE
- A surrogate value, 0xD800 to 0xDFFF
- Not-a-character, having the form 0x xxFFFF or 0x xxFFFE
Note: legal does not mean that it is assigned in this version of Unicode.
static boolean
isLegal(String str)
A string is legal iff all its code points are legal.
static boolean
isLetter(int ch)
Determines if the specified code point is a letter.
static boolean
isLetterOrDigit(int ch)
Determines if the specified code point is a letter or digit.
static boolean
isLowerCase(int ch)
Determines if the specified code point is a lowercase character.
static boolean
isLowSurrogate(char ch)
Cover the JDK 1.5 API, for convenience.
static boolean
isMirrored(int ch)
Determines whether the code point has the "mirrored" property.
static boolean
isPrintable(int ch)
Determines whether the specified code point is a printable character
according to the Unicode standard.
static boolean
isSpace(int ch)
Deprecated. ICU 3.4 (Java)
static boolean
isSpaceChar(int ch)
Determines if the specified code point is a Unicode specified space
character, i.e. if code point is in the category Zs, Zl and Zp.
static boolean
isSupplementary(int ch)
Determines if the code point is a supplementary character.
static boolean
isSupplementaryCodePoint(int cp)
Cover the JDK 1.5 API, for convenience.
static boolean
isSurrogatePair(char high,
char low)
Cover the JDK 1.5 API, for convenience.
static boolean
isTitleCase(int ch)
Determines if the specified code point is a titlecase character.
static boolean
isUAlphabetic(int ch)
Check if a code point has the Alphabetic Unicode property.
static boolean
isULowercase(int ch)
Check if a code point has the Lowercase Unicode property.
static boolean
isUnicodeIdentifierPart(int ch)
Determines if the specified code point may be any part of a Unicode
identifier other than the starting character.
static boolean
isUnicodeIdentifierStart(int ch)
Determines if the specified code point is permissible as the first
character in a Unicode identifier.
static boolean
isUpperCase(int ch)
Determines if the specified code point is an uppercase character.
static boolean
isUUppercase(int ch)
Check if a code point has the Uppercase Unicode property.
static boolean
isUWhiteSpace(int ch)
Check if a code point has the White_Space Unicode property.
static boolean
isValidCodePoint(int cp)
Cover the JDK 1.5 API, for convenience.
static boolean
isWhitespace(int ch)
Determines if the specified code point is a white space character.
static int
offsetByCodePoints(char[] text,
int start,
int count,
int index,
int codePointOffset)
Cover the JDK API, for convenience.
static int
offsetByCodePoints(CharSequence text,
int index,
int codePointOffset)
Cover the JDK API, for convenience.
static char[]
toChars(int cp)
Cover the JDK 1.5 API, for convenience.
static int
toChars(int cp,
char[] dst,
int dstIndex)
Cover the JDK 1.5 API, for convenience.
static int
toCodePoint(char high,
char low)
Cover the JDK 1.5 API, for convenience.
static int
toLowerCase(int ch)
The given code point is mapped to its lowercase equivalent; if the code
point has no lowercase equivalent, the code point itself is returned.
static String
toLowerCase(Locale locale,
String str)
Gets lowercase version of the argument string.
static String
toLowerCase(String str)
Gets lowercase version of the argument string.
static String
toLowerCase(ULocale locale,
String str)
Gets lowercase version of the argument string.
static String
toString(int ch)
Converts argument code point and returns a String object representing
the code point's value in UTF16 format.
static int
toTitleCase(int ch)
Converts the code point argument to titlecase.
static String
toTitleCase(Locale locale,
String str,
BreakIterator breakiter)
Gets the titlecase version of the argument string.
static String
toTitleCase(String str,
BreakIterator breakiter)
Gets the titlecase version of the argument string.
static String
toTitleCase(ULocale locale,
String str,
BreakIterator titleIter)
Gets the titlecase version of the argument string.
static String
toTitleCase(ULocale locale,
String str,
BreakIterator titleIter,
int options)
Gets the titlecase version of the argument string.
static int
toUpperCase(int ch)
Converts the character argument to uppercase.
static String
toUpperCase(Locale locale,
String str)
Gets uppercase version of the argument string.
static String
toUpperCase(String str)
Gets uppercase version of the argument string.
static String
toUpperCase(ULocale locale,
String str)
Gets uppercase version of the argument string.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
MIN_VALUE
public static final int MIN_VALUE
MAX_VALUE
public static final int MAX_VALUE
Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE
SUPPLEMENTARY_MIN_VALUE
public static final int SUPPLEMENTARY_MIN_VALUE
REPLACEMENT_CHAR
public static final int REPLACEMENT_CHAR
NO_NUMERIC_VALUE
public static final double NO_NUMERIC_VALUE
See Also: getUnicodeNumericValue(int), Constant Field Values
MIN_RADIX
public static final int MIN_RADIX
MAX_RADIX
public static final int MAX_RADIX
TITLECASE_NO_LOWERCASE
public static final int TITLECASE_NO_LOWERCASE
See Also: toTitleCase(int), Constant Field Values
TITLECASE_NO_BREAK_ADJUSTMENT
public static final int TITLECASE_NO_BREAK_ADJUSTMENT
See Also: toTitleCase(int), TITLECASE_NO_LOWERCASE, Constant Field Values
FOLD_CASE_DEFAULT
public static final int FOLD_CASE_DEFAULT
FOLD_CASE_EXCLUDE_SPECIAL_I
public static final int FOLD_CASE_EXCLUDE_SPECIAL_I
MIN_HIGH_SURROGATE
public static final char MIN_HIGH_SURROGATE
See Also: UTF16.LEAD_SURROGATE_MIN_VALUE, Constant Field Values
MAX_HIGH_SURROGATE
public static final char MAX_HIGH_SURROGATE
See Also: UTF16.LEAD_SURROGATE_MAX_VALUE, Constant Field Values
MIN_LOW_SURROGATE
public static final char MIN_LOW_SURROGATE
See Also: UTF16.TRAIL_SURROGATE_MIN_VALUE, Constant Field Values
MAX_LOW_SURROGATE
public static final char MAX_LOW_SURROGATE
See Also: UTF16.TRAIL_SURROGATE_MAX_VALUE, Constant Field Values
MIN_SURROGATE
public static final char MIN_SURROGATE
See Also: UTF16.SURROGATE_MIN_VALUE, Constant Field Values
MAX_SURROGATE
public static final char MAX_SURROGATE
See Also: UTF16.SURROGATE_MAX_VALUE, Constant Field Values
MIN_SUPPLEMENTARY_CODE_POINT
public static final int MIN_SUPPLEMENTARY_CODE_POINT
See Also: UTF16.SUPPLEMENTARY_MIN_VALUE, Constant Field Values
MAX_CODE_POINT
public static final int MAX_CODE_POINT
See Also: UTF16.CODEPOINT_MAX_VALUE, Constant Field Values
MIN_CODE_POINT
public static final int MIN_CODE_POINT
See Also: UTF16.CODEPOINT_MIN_VALUE, Constant Field Values
Method Detail
digit
public static int digit(int ch,
int radix)
This method observes the semantics of java.lang.Character.digit(). Note that this
will return positive values for code points for which isDigit
returns false, just like java.lang.Character.
Semantic Change: In release 1.3.1 and
prior, this did not treat the European letters as having a
digit value, and also treated numeric letters and other numbers as
digits.
This has been changed to conform to the Java semantics.
A code point is a valid digit if and only if:
ch
- the code point to query
radix
- the radix
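Since the text states that this method observes the semantics of java.lang.Character.digit(), the point about letters can be sketched with the JDK method itself, which runs without icu4j.jar. The class DigitDemo and the helper digitValue are hypothetical names; that UCharacter.digit returns the same values is an assumption drawn from the description above.

```java
class DigitDemo {
    // Value of cp as a digit in the given radix, or -1 if it is not a digit in that radix.
    // Hypothetical wrapper around the JDK method, for illustration only.
    static int digitValue(int cp, int radix) {
        return Character.digit(cp, radix);
    }

    public static void main(String[] args) {
        // 'a' has a digit value in radix 16 even though isDigit('a') is false.
        System.out.println(digitValue('a', 16) + " " + Character.isDigit('a'));
        // '9' is not a valid digit in radix 8.
        System.out.println(digitValue('9', 8));
        // U+0663 ARABIC-INDIC DIGIT THREE is a decimal digit.
        System.out.println(digitValue(0x0663, 10));
    }
}
```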
digit
public static int digit(int ch)
This is a convenience overload of digit(int, int)
that provides a decimal radix.
Semantic Change: In release 1.3.1 and prior, this
treated numeric letters and other numbers as digits. This has
been changed to conform to the java semantics.
ch
- the code point to query
getNumericValue
public static int getNumericValue(int ch)
If the code point does not have a numeric value, then -1 is returned.
If the code point has a numeric value that cannot be represented as a
nonnegative integer (for example, a fractional value), then -2 is
returned.
ch
- the code point to query
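The two sentinel values can be handled as in the sketch below. It calls java.lang.Character.getNumericValue, whose documented -1 and -2 results match the description above, so that it runs without icu4j.jar; treating it as a stand-in for UCharacter.getNumericValue is an assumption. The class NumericValueDemo and the helper classify are hypothetical.

```java
class NumericValueDemo {
    // Interprets the int result, including the two sentinel values.
    // Hypothetical helper for illustration; not part of ICU4J.
    static String classify(int cp) {
        int v = Character.getNumericValue(cp);
        if (v == -1) {
            return "no numeric value";
        }
        if (v == -2) {
            return "not a nonnegative integer";
        }
        return "value " + v;
    }

    public static void main(String[] args) {
        System.out.println(classify('7'));     // a decimal digit
        System.out.println(classify('!'));     // punctuation
        System.out.println(classify(0x00BD));  // VULGAR FRACTION ONE HALF
    }
}
```

When the fractional value itself is needed, getUnicodeNumericValue(int) with its double return type is the method to reach for.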
getUnicodeNumericValue
public static double getUnicodeNumericValue(int ch)
ch
- Code point to get the numeric value for.
isSpace
public static boolean isSpace(int ch)
ch
- the code point
getType
public static int getType(int ch)
Return results are constants from the interface UCharacterCategory.
NOTE: the UCharacterCategory values are not compatible with
those returned by java.lang.Character.getType. UCharacterCategory values
match the ones used in ICU4C, while java.lang.Character type
values, though similar, skip the value 17.
ch
- code point whose type is to be determined
public static boolean isDefined(int ch)
ch
- code point to be determined if it is defined in the most
current version of Unicode
public static boolean isDigit(int ch)
This method observes the semantics of java.lang.Character.isDigit(). It returns true for decimal
digits only.
ch
- code point to query
public static boolean isISOControl(int ch)
ch
- code point to determine if it is an ISO control character
public static boolean isLetter(int ch)
ch
- code point to determine if it is a letter
public static boolean isLetterOrDigit(int ch)
ch
- code point to determine if it is a letter or a digit
public static boolean isJavaLetter(int cp)
cp
- the code point
public static boolean isJavaLetterOrDigit(int cp)
cp
- the code point
public static boolean isJavaIdentifierStart(int cp)
cp
- the code point
public static boolean isJavaIdentifierPart(int cp)
cp
- the code point
public static boolean isLowerCase(int ch)
ch
- code point to determine if it is in lowercase
public static boolean isWhitespace(int ch)
ch
- code point to determine if it is a white space
public static boolean isSpaceChar(int ch)
ch
- code point to determine if it is a space
public static boolean isTitleCase(int ch)
ch
- code point to determine if it is in title case
public static boolean isUnicodeIdentifierPart(int ch)
ch
- code point to determine if it can be part of a Unicode
identifier
public static boolean isUnicodeIdentifierStart(int ch)
ch
- code point to determine if it can start a Unicode identifier
public static boolean isIdentifierIgnorable(int ch)
ch
- code point to be determined if it can be ignored in a Unicode
identifier.
public static boolean isUpperCase(int ch)
ch
- code point to determine if it is in uppercase
public static int toLowerCase(int ch)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://www.icu-project.org/userguide/posix.html#case_mappings
ch
- code point whose lowercase equivalent is to be retrieved
public static String toString(int ch)
ch
- code point
public static int toTitleCase(int ch)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://www.icu-project.org/userguide/posix.html#case_mappings
ch
- code point whose title case is to be retrieved
public static int toUpperCase(int ch)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They take into account the string context and the language and can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://www.icu-project.org/userguide/posix.html#case_mappings
ch
- code point whose uppercase is to be retrieved
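The difference between simple and full case mappings can be sketched with the JDK equivalents, which run without icu4j.jar: Character.toUpperCase(int) for the single-code point mapping and String.toUpperCase(Locale) for the full mapping. Treating these as stand-ins for the UCharacter methods is an assumption; the class CaseMappingDemo and its helpers are hypothetical.

```java
import java.util.Locale;

class CaseMappingDemo {
    // Simple mapping: one code point in, one code point out.
    static int simpleUpper(int cp) {
        return Character.toUpperCase(cp);
    }

    // Full mapping: works on the whole string and may change its length.
    static String fullUpper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // U+00DF LATIN SMALL LETTER SHARP S has no single-code point uppercase form,
        // so the simple mapping returns it unchanged.
        System.out.println(Integer.toHexString(simpleUpper(0x00DF)));
        // The full mapping expands it to "SS".
        System.out.println(fullUpper("stra\u00DFe"));
    }
}
```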
public static boolean isSupplementary(int ch)
ch
- code point to be determined if it is in the supplementary
plane
public static boolean isBMP(int ch)
ch
- code point to be determined if it is not a supplementary
character
public static boolean isPrintable(int ch)
ch
- code point to be determined if it is printable
public static boolean isBaseForm(int ch)
ch
- code point to be determined if it is of base form
public static int getDirection(int ch)
ch
- the code point whose direction is to be determined
public static boolean isMirrored(int ch)
ch
- code point whose mirror is to be determined
public static int getMirror(int ch)
ch
- code point whose mirror is to be retrieved
public static int getCombiningClass(int ch)
ch
- code point whose combining class is to be retrieved
public static boolean isLegal(int ch)
ch
- code point to determine if it is a legal code point by itself
public static boolean isLegal(String str)
str
- string containing the code points to examine
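The three conditions listed for isLegal in the method summary can be sketched as plain Java. This is an illustration of the documented rule only, not the ICU4J implementation, which may reject further code points; the class LegalCodePoint and its constant are hypothetical.

```java
class LegalCodePoint {
    // Assumed to equal UCharacter.MAX_VALUE, the highest Unicode code point.
    static final int MAX_VALUE = 0x10FFFF;

    // Applies the three documented conditions in order.
    static boolean isLegal(int cp) {
        if (cp < 0 || cp > MAX_VALUE) {
            return false;                 // out of bounds
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;                 // surrogate value
        }
        if ((cp & 0xFFFE) == 0xFFFE) {
            return false;                 // not-a-character of the form xxFFFE or xxFFFF
        }
        return true;
    }

    // A string is legal iff all of its code points are legal.
    static boolean isLegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);    // an unpaired surrogate is returned as itself
            if (!isLegal(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegal(0x41));
        System.out.println(isLegal(0xFFFF));
        System.out.println(isLegal("a\uD800"));
    }
}
```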
public static VersionInfo getUnicodeVersion()
public static String getName(int ch)
ch
- the code point for which to get the name
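A name lookup that guards against the null result can be sketched as follows. It uses java.lang.Character.getName(int), which also returns null for unassigned code points, so that it runs without icu4j.jar; the JDK method differs from UCharacter.getName in other respects, so this is only an approximation. The class NameLookupDemo and the helper nameOrPlaceholder are hypothetical.

```java
class NameLookupDemo {
    // Returns the character name, or a placeholder when the lookup yields null.
    // Hypothetical helper for illustration; not part of ICU4J.
    static String nameOrPlaceholder(int cp) {
        String name = Character.getName(cp);
        return name != null ? name : "<no name>";
    }

    public static void main(String[] args) {
        System.out.println(nameOrPlaceholder('A'));
        System.out.println(nameOrPlaceholder(0x20AC));
    }
}
```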
public static String getName(String s, String separator)
s
- string to format
separator
- string to go between names
public static String getName1_0(int ch)
ch
- the code point for which to get the name
public static String getExtendedName(int ch)
Retrieves a name for a valid codepoint. Unlike getName(int) and getName1_0(int), this method will return a name even for codepoints that are not assigned a name in UnicodeData.txt.
The names are returned in the following order.
ch
- the code point for which to get the name
public static String getISOComment(int ch)
ch
- The code point for which to get the ISO comment.
It must be 0<=c<=0x10ffff.
public static int getCharFromName(String name)
Find a Unicode code point by its most current Unicode name and return its code point value. All Unicode names are in uppercase.
Note: calling any method related to code point names, e.g. get*Name*(), incurs a one-time initialisation cost to construct the name tables.
name
- most current Unicode character name whose code point is to
be returned
public static int getCharFromName1_0(String name)
Find a Unicode character by its version 1.0 Unicode name and return its code point value. All Unicode names are in uppercase.
Note: calling any method related to code point names, e.g. get*Name*(), incurs a one-time initialisation cost to construct the name tables.
name
- Unicode 1.0 code point name whose code point is to
be returned
public static int getCharFromExtendedName(String name)
Find a Unicode character by its name or extended name and return its code point value. All Unicode names are in uppercase. Extended names are all lowercase except for numbers and are contained within angle brackets.
The names are searched in the following order
name
- codepoint name
public static String getPropertyName(int property, int nameChoice)
property
- UProperty selector.
nameChoice
- UProperty.NameChoice selector for which name
to get. All properties have a long name. Most have a short
name, but some do not. Unicode allows for additional names; if
present these will be returned by UProperty.NameChoice.LONG + i,
where i=1, 2,...
IllegalArgumentException
- thrown if property or
nameChoice are invalid.
See Also: UProperty, UProperty.NameChoice
public static int getPropertyEnum(String propertyAlias)
propertyAlias
- the property name to be matched. The name
is compared using "loose matching" as described in
PropertyAliases.txt.
IllegalArgumentException
- thrown if propertyAlias
is not recognized.
See Also: UProperty
public static String getPropertyValueName(int property, int value, int nameChoice)
property
- UProperty selector constant.
UProperty.INT_START <= property < UProperty.INT_LIMIT or
UProperty.BINARY_START <= property < UProperty.BINARY_LIMIT or
UProperty.MASK_START <= property < UProperty.MASK_LIMIT.
If out of range, null is returned.
value
- selector for a value for the given property. In
general, valid values range from 0 up to some maximum. There
are a few exceptions: (1.) UProperty.BLOCK values begin at the
non-zero value BASIC_LATIN.getID(). (2.)
UProperty.CANONICAL_COMBINING_CLASS values are not contiguous
and range from 0..240. (3.) UProperty.GENERAL_CATEGORY_MASK values
are mask values produced by left-shifting 1 by
UCharacter.getType(). This allows grouped categories such as
[:L:] to be represented. Mask values are non-contiguous.
nameChoice
- UProperty.NameChoice selector for which name
to get. All values have a long name. Most have a short name,
but some do not. Unicode allows for additional names; if
present these will be returned by UProperty.NameChoice.LONG + i,
where i=1, 2,...
IllegalArgumentException
- thrown if property, value,
or nameChoice are invalid.
See Also: UProperty, UProperty.NameChoice
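The mask construction described for UProperty.GENERAL_CATEGORY_MASK (left-shifting 1 by the type value) can be sketched as below. To stay runnable without icu4j.jar it uses java.lang.Character.getType and the JDK category constants, whose numeric values are not compatible with UCharacterCategory, so only the technique carries over. The class CategoryMaskDemo and its members are hypothetical.

```java
class CategoryMaskDemo {
    // Mask for the single category of cp: 1 shifted left by the type value.
    static int maskOf(int cp) {
        return 1 << Character.getType(cp);
    }

    // Grouped category "L" (all letters), built by OR-ing the single-category masks.
    static final int LETTER_MASK =
              (1 << Character.UPPERCASE_LETTER)
            | (1 << Character.LOWERCASE_LETTER)
            | (1 << Character.TITLECASE_LETTER)
            | (1 << Character.MODIFIER_LETTER)
            | (1 << Character.OTHER_LETTER);

    // True if the category of cp is one of the letter categories.
    static boolean isInLetterGroup(int cp) {
        return (maskOf(cp) & LETTER_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isInLetterGroup('A'));
        System.out.println(isInLetterGroup('1'));
    }
}
```

Because each category occupies its own bit, a grouped category is a single int and membership is one bitwise AND, which is also why the mask values are non-contiguous.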
public static int getPropertyValueEnum(int property, String valueAlias)
property
- UProperty selector constant.
UProperty.INT_START <= property < UProperty.INT_LIMIT or
UProperty.BINARY_START <= property < UProperty.BINARY_LIMIT or
UProperty.MASK_START <= property < UProperty.MASK_LIMIT.
Only these properties can be enumerated.
valueAlias
- the value name to be matched. The name is
compared using "loose matching" as described in
PropertyValueAliases.txt.
IllegalArgumentException
- if property is not a valid UProperty
selector
See Also: UProperty
public static int getCodePoint(char lead, char trail)
lead
- the lead chartrail
- the trail char
IllegalArgumentException
- thrown when argument characters do
not form a valid codepointpublic static int getCodePoint(char char16)
char16
- the UTF16 character
IllegalArgumentException
- thrown when char16 is not a valid
codepoint
public static String toUpperCase(String str)
str
- source string to be converted
public static String toLowerCase(String str)
str
- source string to be converted
public static String toTitleCase(String str, BreakIterator breakiter)
Gets the titlecase version of the argument string.
Position for titlecasing is determined by the argument break iterator, so callers can supply a customized break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm will be used to determine the titlecase positions.
Only positions returned by the break iterator will be title cased; characters in between the positions will all be in lower case.
Casing is dependent on the default locale and is context-sensitive.
str
- source string to be converted
breakiter
- break iterator to determine the positions in which
the character should be title cased.
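The rule described above (titlecase at each position the break iterator returns, lowercase in between) can be sketched with JDK classes only. This is a simplified illustration, not ICU's implementation: TitleCaseSketch is a hypothetical helper, java.text.BreakIterator stands in for the ICU break iterator, and Locale.ROOT is assumed instead of the default locale so the result is reproducible.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper for illustration; not part of ICU4J.
final class TitleCaseSketch {
    static String toTitleCase(String str, BreakIterator iter) {
        if (iter == null) {
            // Assumption: JDK word boundaries stand in for the default algorithm.
            iter = BreakIterator.getWordInstance(Locale.ROOT);
        }
        iter.setText(str);
        StringBuilder out = new StringBuilder(str.length());
        int start = iter.first();
        // Only forward iteration (first/next) is used.
        for (int end = iter.next(); end != BreakIterator.DONE; start = end, end = iter.next()) {
            int first = str.codePointAt(start);
            // Titlecase the code point at the boundary position...
            out.appendCodePoint(Character.toTitleCase(first));
            // ...and lowercase everything up to the next boundary.
            out.append(str.substring(start + Character.charCount(first), end)
                          .toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTitleCase("hello WORLD", null)); // prints "Hello World"
    }
}
```

Passing a sentence or line break iterator instead of a word iterator would titlecase only the first character of each sentence or line.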
public static String toUpperCase(Locale locale, String str)
locale
- the locale in which the string is to be converted
str
- source string to be converted
public static String toUpperCase(ULocale locale, String str)
locale
- the locale in which the string is to be converted
str
- source string to be converted
public static String toLowerCase(Locale locale, String str)
locale
- the locale in which the string is to be converted
str
- source string to be converted
public static String toLowerCase(ULocale locale, String str)
locale
- the locale in which the string is to be converted
str
- source string to be converted
public static String toTitleCase(Locale locale, String str, BreakIterator breakiter)
Gets the titlecase version of the argument string.
Position for titlecasing is determined by the argument break iterator, so callers can supply a customized break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm will be used to determine the titlecase positions.
Only positions returned by the break iterator will be title cased; characters in between the positions will all be in lower case.
Casing is dependent on the argument locale and is context-sensitive.
locale
- the locale in which the string is to be converted
str
- source string to be converted
breakiter
- break iterator to determine the positions in which
the character should be title cased.
public static String toTitleCase(ULocale locale, String str, BreakIterator titleIter)
Gets the titlecase version of the argument string.
Position for titlecasing is determined by the argument break iterator, so callers can supply a customized break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm will be used to determine the titlecase positions.
Only positions returned by the break iterator will be title cased; characters in between the positions will all be in lower case.
Casing is dependent on the argument locale and is context-sensitive.
locale
- the locale in which the string is to be converted
str
- source string to be converted
titleIter
- break iterator to determine the positions in which
the character should be title cased.
public static String toTitleCase(ULocale locale, String str, BreakIterator titleIter, int options)
Gets the titlecase version of the argument string.
Position for titlecasing is determined by the argument break iterator, so callers can supply a customized break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm will be used to determine the titlecase positions.
Only positions returned by the break iterator will be title cased; characters in between the positions will all be in lower case.
Casing is dependent on the argument locale and is context-sensitive.
locale
- the locale in which the string is to be converted
str
- source string to be converted
titleIter
- break iterator to determine the positions in which
the character should be title cased.
options
- bit set to modify the titlecasing operation
TITLECASE_NO_LOWERCASE, TITLECASE_NO_BREAK_ADJUSTMENT
public static int foldCase(int ch, boolean defaultmapping)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://www.icu-project.org/userguide/posix.html#case_mappings
ch
- the character to be converted
defaultmapping
- Indicates whether all mappings defined in
CaseFolding.txt are to be used; otherwise the
mappings for dotted I and dotless i marked with
'I' in CaseFolding.txt will be skipped.
foldCase(String, boolean)
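The difference between simple (single code point) and full (whole string) case mappings can be seen with the JDK alone. The sketch below uses uppercasing from java.lang.Character and java.lang.String as a stand-in, since the JDK has no case folding API; SimpleVsFullMapping is a hypothetical name, and it is an assumption that ICU's folding functions show the same kind of length change on their own data.

```java
import java.util.Locale;

// Hypothetical illustration using JDK uppercasing rather than ICU case folding.
final class SimpleVsFullMapping {
    // Simple mapping: one code point in, exactly one code point out.
    static int simpleUpper(int cp) {
        return Character.toUpperCase(cp);
    }

    // Full mapping: operates on the whole string and may change its length.
    static String fullUpper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        int sharpS = 0x00DF; // LATIN SMALL LETTER SHARP S
        // No single-code-point uppercase exists, so the simple mapping returns it unchanged.
        System.out.println(simpleUpper(sharpS) == sharpS);   // prints "true"
        // The full mapping expands it to two characters.
        System.out.println(fullUpper("Stra\u00DFe"));        // prints "STRASSE"
    }
}
```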
public static String foldCase(String str, boolean defaultmapping)
str
- the String to be converted
defaultmapping
- Indicates whether all mappings defined in
CaseFolding.txt are to be used; otherwise the
mappings for dotted I and dotless i marked with
'I' in CaseFolding.txt will be skipped.
foldCase(int, boolean)
public static int foldCase(int ch, int options)
This function only returns the simple, single-code point case mapping. Full case mappings should be used whenever possible because they produce better results by working on whole strings. They can map to a result string with a different length as appropriate. Full case mappings are applied by the case mapping functions that take String parameters rather than code points (int). See also the User Guide chapter on C/POSIX migration: http://www.icu-project.org/userguide/posix.html#case_mappings
ch
- the character to be converted
options
- A bit set for special processing. Currently the recognised options are
FOLD_CASE_EXCLUDE_SPECIAL_I and FOLD_CASE_DEFAULT
foldCase(String, boolean)
public static final String foldCase(String str, int options)
str
- the String to be converted
options
- A bit set for special processing. Currently the recognised options are
FOLD_CASE_EXCLUDE_SPECIAL_I and FOLD_CASE_DEFAULT
foldCase(int, boolean)
public static int getHanNumericValue(int ch)
ch
- code point to query
public static RangeValueIterator getTypeIterator()
Gets an iterator for character types, iterating over codepoints.
Example of use:
    RangeValueIterator iterator = UCharacter.getTypeIterator();
    RangeValueIterator.Element element = new RangeValueIterator.Element();
    while (iterator.next(element)) {
        System.out.println("Codepoint \\u" +
                           Integer.toHexString(element.start) +
                           " to codepoint \\u" +
                           Integer.toHexString(element.limit - 1) +
                           " has the character type " + element.value);
    }
public static ValueIterator getNameIterator()
Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the modern, most up-to-date Unicode names. For older 1.0 Unicode names use getName1_0Iterator() or for extended names use getExtendedNameIterator().
Example of use:
    ValueIterator iterator = UCharacter.getNameIterator();
    ValueIterator.Element element = new ValueIterator.Element();
    while (iterator.next(element)) {
        System.out.println("Codepoint \\u" +
                           Integer.toHexString(element.codepoint) +
                           " has the name " + (String)element.value);
    }
The maximal range which the name iterator iterates is from UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
public static ValueIterator getName1_0Iterator()
Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the older 1.0 Unicode names. For modern, most up-to-date Unicode names use getNameIterator() or for extended names use getExtendedNameIterator().
Example of use:
    ValueIterator iterator = UCharacter.getName1_0Iterator();
    ValueIterator.Element element = new ValueIterator.Element();
    while (iterator.next(element)) {
        System.out.println("Codepoint \\u" +
                           Integer.toHexString(element.codepoint) +
                           " has the name " + (String)element.value);
    }
The maximal range which the name iterator iterates is from
public static ValueIterator getExtendedNameIterator()
Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the extended names. For modern, most up-to-date Unicode names use getNameIterator() or for older 1.0 Unicode names use getName1_0Iterator().
Example of use:
    ValueIterator iterator = UCharacter.getExtendedNameIterator();
    ValueIterator.Element element = new ValueIterator.Element();
    while (iterator.next(element)) {
        System.out.println("Codepoint \\u" +
                           Integer.toHexString(element.codepoint) +
                           " has the name " + (String)element.value);
    }
The maximal range which the name iterator iterates is from
public static VersionInfo getAge(int ch)
Get the "age" of the code point.
The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.
This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
ch
- The code point.
public static boolean hasBinaryProperty(int ch, int property)
Check a binary Unicode property for a code point.
Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.
This API is intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
For details about the properties see http://www.unicode.org/.
For names of Unicode properties see the UCD file PropertyAliases.txt.
This API does not check the validity of the codepoint.
Important: If ICU is built with UCD files from Unicode versions below 3.2, then properties marked with "new" are unavailable or only partially available.
ch
- code point to test.
property
- selector constant from com.ibm.icu.lang.UProperty,
identifies which binary property to check.
UProperty
public static boolean isUAlphabetic(int ch)
Check if a code point has the Alphabetic Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.ALPHABETIC).
Different from UCharacter.isLetter(ch)!
ch
- codepoint to be tested
public static boolean isULowercase(int ch)
Check if a code point has the Lowercase Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.LOWERCASE).
This is different from UCharacter.isLowerCase(ch)!
ch
- codepoint to be tested
public static boolean isUUppercase(int ch)
Check if a code point has the Uppercase Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.UPPERCASE).
This is different from UCharacter.isUpperCase(ch)!
ch
- codepoint to be tested
public static boolean isUWhiteSpace(int ch)
Check if a code point has the White_Space Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.WHITE_SPACE).
This is different from both UCharacter.isSpace(ch) and UCharacter.isWhitespace(ch)!
ch
- codepoint to be tested
public static int getIntPropertyValue(int ch, int type)
Gets the property value for a Unicode property type of a code point. Also returns binary and mask property values.
Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.
The properties APIs are intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR). For details about the properties see http://www.unicode.org/.
For names of Unicode properties see the UCD file PropertyAliases.txt.
Sample usage:
    int ea = UCharacter.getIntPropertyValue(c, UProperty.EAST_ASIAN_WIDTH);
    int ideo = UCharacter.getIntPropertyValue(c, UProperty.IDEOGRAPHIC);
    boolean b = (ideo == 1);
ch
- code point to test.
type
- UProperty selector constant, identifies which
property to check. Must be
UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or
UProperty.INT_START <= type < UProperty.INT_LIMIT or
UProperty.MASK_START <= type < UProperty.MASK_LIMIT.
UProperty, hasBinaryProperty(int, int), getIntPropertyMinValue(int), getIntPropertyMaxValue(int), getUnicodeVersion()
public static String getStringPropertyValue(int propertyEnum, int codepoint, int nameChoice)
propertyEnum
-
codepoint
-
nameChoice
-
public static int getIntPropertyMinValue(int type)
type
- UProperty selector constant, identifies which binary
property to check. Must be
UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or
UProperty.INT_START <= type < UProperty.INT_LIMIT.
UProperty, hasBinaryProperty(int, int), getUnicodeVersion(), getIntPropertyMaxValue(int), getIntPropertyValue(int, int)
public static int getIntPropertyMaxValue(int type)
type
- UProperty selector constant, identifies which binary
property to check. Must be
UProperty.BINARY_START <= type < UProperty.BINARY_LIMIT or
UProperty.INT_START <= type < UProperty.INT_LIMIT.
UProperty, hasBinaryProperty(int, int), getUnicodeVersion(), getIntPropertyMinValue(int), getIntPropertyValue(int, int)
public static char forDigit(int digit, int radix)
public static final boolean isValidCodePoint(int cp)
cp
- the code point to check
public static final boolean isSupplementaryCodePoint(int cp)
cp
- the code point to check
public static boolean isHighSurrogate(char ch)
ch
- the char to check
public static boolean isLowSurrogate(char ch)
ch
- the char to check
public static final boolean isSurrogatePair(char high, char low)
high
- the high (lead) char
low
- the low (trail) char
public static int charCount(int cp)
cp
- the code point to check
UTF16.getCharCount(int)
public static final int toCodePoint(char high, char low)
high
- the high (lead) surrogate
low
- the low (trail) surrogate
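The arithmetic behind combining a surrogate pair into one code point is fixed by UTF-16, so it can be sketched independently of ICU. SurrogateSketch is a hypothetical helper; the argument check is a choice made for this sketch and is not claimed to match what UCharacter.toCodePoint does with invalid input.

```java
// Hypothetical helper for illustration; not part of ICU4J.
final class SurrogateSketch {
    static int combine(char high, char low) {
        // Validation is an addition of this sketch.
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        // Each surrogate carries 10 bits; the pair encodes (code point - 0x10000).
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        int cp = combine('\uD83D', '\uDE00');
        System.out.println(Integer.toHexString(cp));                       // prints "1f600"
        // The JDK method of the same name gives the same value.
        System.out.println(cp == Character.toCodePoint('\uD83D', '\uDE00')); // prints "true"
    }
}
```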
public static final int codePointAt(CharSequence seq, int index)
seq
- the characters to check
index
- the index of the first or only char forming the code point
public static final int codePointAt(char[] text, int index)
text
- the characters to check
index
- the index of the first or only char forming the code point
public static final int codePointAt(char[] text, int index, int limit)
text
- the characters to check
index
- the index of the first or only char forming the code point
limit
- the limit of the valid text
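A common use of codePointAt together with charCount is a forward walk over a string by code point rather than by char. The sketch below uses the java.lang.Character methods of the same names; it is an assumption that the UCharacter versions can be substituted one for one. CodePointWalk is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration, written against java.lang.Character.
final class CodePointWalk {
    static List<Integer> codePoints(CharSequence seq) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < seq.length(); ) {
            // i is the index of the first (or only) char of the code point.
            int cp = Character.codePointAt(seq, i);
            out.add(cp);
            // Advance by 1 char for BMP code points, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (a surrogate pair), 'b'
        for (int cp : codePoints("a\uD83D\uDE00b")) {
            System.out.println(Integer.toHexString(cp)); // prints "61", "1f600", "62"
        }
    }
}
```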
public static final int codePointBefore(CharSequence seq, int index)
seq
- the characters to check
index
- the index after the last or only char forming the code point
public static final int codePointBefore(char[] text, int index)
text
- the characters to check
index
- the index after the last or only char forming the code point
public static final int codePointBefore(char[] text, int index, int limit)
text
- the characters to check
index
- the index after the last or only char forming the code point
limit
- the start of the valid text
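codePointBefore supports the mirror-image walk, from the end of the text toward the start. As with the forward case, this sketch is written against java.lang.Character and assumes the UCharacter methods behave the same way; BackwardWalk is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration, written against java.lang.Character.
final class BackwardWalk {
    static List<Integer> reversedCodePoints(CharSequence seq) {
        List<Integer> out = new ArrayList<>();
        for (int i = seq.length(); i > 0; ) {
            // i is the index just after the last (or only) char of the code point.
            int cp = Character.codePointBefore(seq, i);
            out.add(cp);
            // Step back over 1 or 2 chars.
            i -= Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        for (int cp : reversedCodePoints("a\uD83D\uDE00b")) {
            System.out.println(Integer.toHexString(cp)); // prints "62", "1f600", "61"
        }
    }
}
```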
public static final int toChars(int cp, char[] dst, int dstIndex)
cp
- the code point to convert
dst
- the destination array into which to put the char(s) representing the code point
dstIndex
- the index at which to put the first (or only) char
IllegalArgumentException
- if cp is not a valid code point
public static final char[] toChars(int cp)
cp
- the code point to convert
IllegalArgumentException
- if cp is not a valid code point
public static byte getDirectionality(int cp)
UCharacterEnums.ECharacterDirection
since the values are different from the ones defined by java.lang.Character.
cp
- the code point to check
getDirection(int)
public static int codePointCount(CharSequence text, int start, int limit)
text
- the characters to check
start
- the start of the range
limit
- the limit of the range
public static int codePointCount(char[] text, int start, int limit)
text
- the characters to check
start
- the start of the range
limit
- the limit of the range
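Counting code points in a range amounts to counting chars while treating each complete surrogate pair as one. The hand-written loop below is a sketch of that idea, checked against java.lang.Character.codePointCount; it is an assumption that UCharacter.codePointCount follows the same convention, including counting an unpaired surrogate as one code point. CodePointCountSketch is a hypothetical name.

```java
// Hypothetical helper for illustration; mirrors the JDK counting convention.
final class CodePointCountSketch {
    static int count(CharSequence text, int start, int limit) {
        int n = 0;
        for (int i = start; i < limit; i++) {
            // A high surrogate followed by a low surrogate inside the range is one code point.
            if (Character.isHighSurrogate(text.charAt(i))
                    && i + 1 < limit
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                i++; // skip the trail surrogate
            }
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 4 chars, 3 code points
        System.out.println(count(s, 0, s.length()));                    // prints "3"
        System.out.println(Character.codePointCount(s, 0, s.length())); // prints "3"
        // A range that cuts the pair in half counts the lone lead surrogate as one.
        System.out.println(count(s, 0, 2));                             // prints "2"
    }
}
```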
public static int offsetByCodePoints(CharSequence text, int index, int codePointOffset)
text
- the characters to check
index
- the index to adjust
codePointOffset
- the number of code points by which to offset the index
public static int offsetByCodePoints(char[] text, int start, int count, int index, int codePointOffset)
text
- the characters to check
start
- the start of the range to check
count
- the length of the range to check
index
- the index to adjust
codePointOffset
- the number of code points by which to offset the index
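One practical use of offsetByCodePoints is truncating a string to a number of code points without splitting a surrogate pair. The sketch below uses java.lang.Character.offsetByCodePoints and assumes the UCharacter method of the same name can be used the same way; OffsetSketch and firstCodePoints are hypothetical names.

```java
// Hypothetical helper for illustration, written against java.lang.Character.
final class OffsetSketch {
    // Returns the prefix of s that contains exactly n code points.
    // Throws IndexOutOfBoundsException if s has fewer than n code points.
    static String firstCodePoints(String s, int n) {
        int end = Character.offsetByCodePoints(s, 0, n);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        // Two code points occupy three chars here.
        System.out.println(firstCodePoints(s, 2).length());               // prints "3"
        // A negative offset moves the index backward by code points.
        System.out.println(Character.offsetByCodePoints(s, s.length(), -2)); // prints "1"
    }
}
```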