com.ibm.icu.text
Interface UForwardCharacterIterator

All Known Implementing Classes:
UCharacterIterator

public interface UForwardCharacterIterator

Interface that defines an API for forward-only iteration on text objects. This is a minimal interface for iteration without random access or backwards iteration. It is especially useful for wrapping streams with converters into an object for collation or normalization.
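As a rough illustration of the stream-wrapping use case, the sketch below puts a forward-only iterator on top of a java.io.Reader (for example an InputStreamReader, which acts as the converter from bytes to UTF-16). The class ReaderForwardIterator, its helper overUtf8, and the choice of DONE = -1 are hypothetical names and assumptions made for this example. The class does not implement the ICU interface, so that the example compiles without the ICU4J jar.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: forward-only iteration over a converted byte stream.
// It mirrors the shape of next()/nextCodePoint() but is not ICU4J code.
final class ReaderForwardIterator {
    static final int DONE = -1;          // assumption: end-of-text marker
    private static final int NONE = -2;  // marks "no code unit pushed back"

    private final Reader reader;
    private int pushedBack = NONE;       // one code unit of lookahead

    ReaderForwardIterator(Reader reader) {
        this.reader = reader;
    }

    // Convenience factory: decode UTF-8 bytes through an InputStreamReader.
    static ReaderForwardIterator overUtf8(byte[] bytes) {
        return new ReaderForwardIterator(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }

    // Returns the next UTF-16 code unit, or DONE at the end of the stream.
    int next() {
        if (pushedBack != NONE) {
            int c = pushedBack;
            pushedBack = NONE;
            return c;
        }
        try {
            return reader.read(); // Reader.read() itself returns -1 at end of stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the next code point; an unpaired surrogate is returned as is.
    int nextCodePoint() {
        int first = next();
        if (first == DONE || !Character.isHighSurrogate((char) first)) {
            return first;
        }
        int second = next();
        if (second != DONE && Character.isLowSurrogate((char) second)) {
            return Character.toCodePoint((char) first, (char) second);
        }
        pushedBack = second; // not a pair: keep the unit for the next call
        return first;
    }

    public static void main(String[] args) {
        // U+00E9 followed by U+1F600, encoded as UTF-8 bytes
        byte[] utf8 = "\u00E9\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        ReaderForwardIterator it = overUtf8(utf8);
        int cp;
        while ((cp = it.nextCodePoint()) != DONE) {
            System.out.println(Integer.toHexString(cp)); // prints e9, then 1f600
        }
    }
}
```

No random access is needed anywhere in the sketch: a single code unit of lookahead is enough to assemble surrogate pairs, which is why a forward-only interface fits streamed input.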

Characters can be accessed in two ways: as code units or as code points. Unicode code points are 21-bit integers and are the scalar values of Unicode characters. ICU uses the type int for them. Unicode code units are the storage units of a given Unicode/UCS Transformation Format (a character encoding scheme). With UTF-16, all code points can be represented with either one or two code units ("surrogates"). String storage is typically based on code units, while properties of characters are typically determined using code point values. Some processes may be designed to work with sequences of code units, or it may be known that all characters that are important to an algorithm can be represented with single code units. Other processes will need to use the code point access functions.
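The difference between code units and code points can be seen with plain java.lang.String methods, without any ICU classes. The following is a minimal sketch; the class name CodeUnitVsCodePoint and its helper methods are made up for this illustration.

```java
// Sketch: the same text counted and read as code units and as code points.
final class CodeUnitVsCodePoint {
    // "a", then U+1F600 (outside the BMP, stored as a surrogate pair), then "b"
    static final String TEXT = "a\uD83D\uDE00b";

    // Number of UTF-16 code units (what String.length() reports).
    static int codeUnitCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnitCount(TEXT));                       // 4
        System.out.println(codePointCount(TEXT));                      // 3
        System.out.println(Integer.toHexString(TEXT.charAt(1)));       // d83d (high surrogate only)
        System.out.println(Integer.toHexString(TEXT.codePointAt(1)));  // 1f600 (the full code point)
    }
}
```

A process that only looks for ASCII delimiters could work on the four code units directly, while a lookup of character properties for the second character needs the combined value 0x1F600.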

UForwardCharacterIterator provides next() to access a code unit and advance an internal position into the text object, similar to return text[position++].
It provides nextCodePoint() to access a code point and advance an internal position.

nextCodePoint() assumes that the current position is that of the beginning of a code point, i.e., of its first code unit. After nextCodePoint(), this will be true again. In general, access to code units and code points in the same iteration loop should not be mixed. In UTF-16, if the current position is on a second code unit (Low Surrogate), then only that code unit is returned even by nextCodePoint().
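The behavior described above, including the low-surrogate case, can be sketched with a small string-backed iterator. StringForwardIterator is a hypothetical stand-in written for this example, with DONE assumed to be -1. It only mirrors the documented semantics and is not the ICU4J implementation.

```java
// Hypothetical stand-in that mirrors the documented next()/nextCodePoint() semantics.
final class StringForwardIterator {
    static final int DONE = -1;   // assumption of this sketch: end-of-text marker

    private final String text;
    private int position = 0;     // index of the next code unit to return

    StringForwardIterator(String text) {
        this.text = text;
    }

    // Code unit access: equivalent to "return text[position++]" with a bounds check.
    int next() {
        if (position >= text.length()) {
            return DONE;
        }
        return text.charAt(position++);
    }

    // Code point access: combines a valid surrogate pair, otherwise behaves like next().
    int nextCodePoint() {
        int first = next();
        if (first != DONE
                && Character.isHighSurrogate((char) first)
                && position < text.length()
                && Character.isLowSurrogate(text.charAt(position))) {
            return Character.toCodePoint((char) first, text.charAt(position++));
        }
        return first;
    }

    public static void main(String[] args) {
        // Consistent code point iteration over "a" followed by U+1F600.
        StringForwardIterator it = new StringForwardIterator("a\uD83D\uDE00");
        int cp;
        while ((cp = it.nextCodePoint()) != DONE) {
            System.out.println(Integer.toHexString(cp)); // prints 61, then 1f600
        }

        // Mixing access styles: next() consumes the high surrogate, so
        // nextCodePoint() starts on the low surrogate and returns only that unit.
        StringForwardIterator mixed = new StringForwardIterator("\uD83D\uDE00");
        System.out.println(Integer.toHexString(mixed.next()));          // d83d
        System.out.println(Integer.toHexString(mixed.nextCodePoint())); // de00
    }
}
```

The second half of main shows why the two access styles should not be mixed in one loop: once a pair has been split, the code point U+1F600 is never reported.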

Usage:

    public void function1(UForwardCharacterIterator it) {
        int c;
        while ((c = it.next()) != UForwardCharacterIterator.DONE) {
            // use c
        }
    }

Status:
Stable ICU 2.4.

Field Summary
static int DONE
          Indicator that we have reached the end of the UTF-16 text.
 
Method Summary
 int next()
          Returns the UTF-16 code unit at the current index, and increments to the next code unit (post-increment semantics).
 int nextCodePoint()
          Returns the code point at the current index, and increments to the next code point (post-increment semantics).
 

Field Detail

DONE

static final int DONE
Indicator that we have reached the end of the UTF-16 text.

See Also:
Constant Field Values
Status:
Stable ICU 2.4.
Method Detail

next

int next()
Returns the UTF-16 code unit at the current index, and increments to the next code unit (post-increment semantics). If the index is out of range, DONE is returned, and the iterator is reset to the limit of the text.

Returns:
the next UTF-16 code unit, or DONE if the index is at the limit of the text.
Status:
Stable ICU 2.4.

nextCodePoint

int nextCodePoint()
Returns the code point at the current index, and increments to the next code point (post-increment semantics). If the index does not point to a valid surrogate pair, the behavior is the same as next(). Otherwise the iterator is incremented past the surrogate pair, and the code point represented by the pair is returned.

Returns:
the next code point in the text, or DONE if the index is at the limit of the text.
Status:
Stable ICU 2.4.


Copyright (c) 2009 IBM Corporation and others.