com.ibm.icu.text
Class CollationElementIterator

java.lang.Object
  extended by com.ibm.icu.text.CollationElementIterator

public final class CollationElementIterator
extends Object

CollationElementIterator is an iterator created by a RuleBasedCollator to walk through a string. Each iteration returns a 32-bit collation element that defines the ordering priority of the next character or sequence of characters in the source string.

For illustration, consider the following in Spanish:

 "ca" -> the first collation element is collation_element('c') and second
         collation element is collation_element('a').

 Since "ch" in Spanish sorts as one entity, the example below returns one
 collation element for the two characters 'c' and 'h'.

 "cha" -> the first collation element is collation_element('ch') and second
          collation element is collation_element('a').
 
And in German:
 Since the character 'æ' is a composed character of 'a' and 'e', the
 iterator returns two collation elements for the single character 'æ'.

 "æb" -> the first collation element is collation_element('a'), the
              second collation element is collation_element('e'), and the
              third collation element is collation_element('b').
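
The two cases above (several characters yielding one element, and one character yielding several) can be sketched with a toy model. The class name ToyElements, its TABLE, and the use of strings as stand-ins for collation elements are all assumptions made for illustration; a real RuleBasedCollator derives its mappings from its rules and returns 32-bit integers, not labels.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: a hand-written table standing in for collator rules.
final class ToyElements {
    // Hypothetical mappings: "ch" contracts to one element, U+00E6 expands to two.
    static final Map<String, List<String>> TABLE = Map.of(
            "ch", List.of("ch"),
            "\u00e6", List.of("a", "e"));

    // Returns one label per collation element, trying a two-character match first.
    static List<String> elements(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (i + 2 <= text.length() && TABLE.containsKey(text.substring(i, i + 2))) {
                out.addAll(TABLE.get(text.substring(i, i + 2)));
                i += 2;
            } else {
                String one = text.substring(i, i + 1);
                // Characters without an entry map to themselves.
                out.addAll(TABLE.getOrDefault(one, List.of(one)));
                i += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(elements("ca"));
        System.out.println(elements("cha"));
        System.out.println(elements("\u00e6b"));
    }
}
```

The point of the sketch is only that the number of elements need not match the number of characters in the source string.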
 

For collation ordering comparison, the collation element results cannot be compared simply by using basic arithmetic operators such as <, == or >; further processing is required. Details can be found in the ICU user guide. An example of using the CollationElementIterator for collation ordering comparison is the class com.ibm.icu.text.StringSearch.
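
One form the further processing can take is a level-by-level comparison: all primary weights are compared first, and secondary or tertiary weights matter only when the earlier levels tie. The sketch below is a simplified illustration, not ICU's comparison routine. It assumes the 16/8/8 bit layout described for primaryOrder(), secondaryOrder() and tertiaryOrder(), assumes that a zero weight is ignored at its level, and uses hypothetical names (LevelCompare, weights, compare) and made-up element values.

```java
import java.util.Arrays;

// Simplified sketch; real collation has further options this ignores.
final class LevelCompare {
    static int primary(int ce)   { return (ce >>> 16) & 0xFFFF; }
    static int secondary(int ce) { return (ce >>> 8) & 0xFF; }
    static int tertiary(int ce)  { return ce & 0xFF; }

    // Collects the non-zero weights of one level (0 = primary, 1 = secondary, 2 = tertiary).
    static int[] weights(int[] ces, int level) {
        return Arrays.stream(ces)
                .map(ce -> level == 0 ? primary(ce) : level == 1 ? secondary(ce) : tertiary(ce))
                .filter(w -> w != 0) // assumption: zero means "ignorable at this level"
                .toArray();
    }

    // Returns -1, 0 or 1; a lower level is consulted only if all higher levels tie.
    static int compare(int[] a, int[] b) {
        for (int level = 0; level < 3; level++) {
            int c = Arrays.compare(weights(a, level), weights(b, level));
            if (c != 0) {
                return Integer.signum(c);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int[] a = {0x00200505, 0x00100505}; // made-up collation elements
        int[] b = {0x00200605, 0x00050505};
        // Element by element, a[0] < b[0], yet a sorts after b because the
        // primary difference in the second element outranks the secondary
        // difference in the first.
        System.out.println(a[0] < b[0]);
        System.out.println(compare(a, b));
    }
}
```

Arrays.compare(int[], int[]) requires Java 9 or later.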

To construct a CollationElementIterator object, users call the method getCollationElementIterator() on a RuleBasedCollator that defines the desired sorting order.

Example:

  import com.ibm.icu.text.CollationElementIterator;
  import com.ibm.icu.text.RuleBasedCollator;

  String testString = "This is a test";
  // Note: this constructor throws a checked Exception if the rules are invalid.
  RuleBasedCollator rbc = new RuleBasedCollator("&a<b");
  CollationElementIterator iterator = rbc.getCollationElementIterator(testString);
  int order;
  // Stop as soon as next() reports the end of the iteration. (Testing the
  // primary order against NULLORDER instead would never end the loop.)
  while ((order = iterator.next()) != CollationElementIterator.NULLORDER) {
      if (order != CollationElementIterator.IGNORABLE) {
          // order is valid and not ignorable, so we do something with it
          int primaryOrder = CollationElementIterator.primaryOrder(order);
          System.out.println("Next primary order 0x" +
                             Integer.toHexString(primaryOrder));
      }
  }
 

This class is not subclassable.

Author:
Syn Wee Quek
See Also:
Collator, RuleBasedCollator, StringSearch
Status:
Stable ICU 2.8.

Field Summary
static int IGNORABLE
          This constant is returned by the iterator in the methods next() and previous() when a collation element result is to be ignored.
static int NULLORDER
          This constant is returned by the iterator in the methods next() and previous() when the end or the beginning of the source string has been reached, and there are no more valid collation elements to return.
 
Method Summary
 boolean equals(Object that)
          Tests whether the argument object is equal to this CollationElementIterator.
 int getMaxExpansion(int ce)
           Returns the maximum length of any expansion sequence that ends with the specified collation element.
 int getOffset()
          Returns the character offset in the source string corresponding to the next collation element.
 int next()
          Get the next collation element in the source string.
 int previous()
          Get the previous collation element in the source string.
static int primaryOrder(int ce)
          Return the primary order of the specified collation element, i.e. its 16 most significant bits.
 void reset()
           Resets the cursor to the beginning of the string.
static int secondaryOrder(int ce)
          Return the secondary order of the specified collation element, i.e. the middle 8 bits, between the primary and tertiary orders (bits 8 to 15, counting from the least significant bit).
 void setOffset(int offset)
           Sets the iterator to point to the collation element corresponding to the character at the specified offset.
 void setText(CharacterIterator source)
          Set a new source string iterator for iteration, and reset the offset to the beginning of the text.
 void setText(String source)
          Set a new source string for iteration, and reset the offset to the beginning of the text.
 void setText(UCharacterIterator source)
          Set a new source string iterator for iteration, and reset the offset to the beginning of the text.
static int tertiaryOrder(int ce)
          Return the tertiary order of the specified collation element, i.e. its 8 least significant bits.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NULLORDER

public static final int NULLORDER

This constant is returned by the iterator in the methods next() and previous() when the end or the beginning of the source string has been reached, and there are no more valid collation elements to return.

See class documentation for an example of use.

See Also:
next(), previous(), Constant Field Values
Status:
Stable ICU 2.8.

IGNORABLE

public static final int IGNORABLE

This constant is returned by the iterator in the methods next() and previous() when a collation element result is to be ignored.

See class documentation for an example of use.

See Also:
next(), previous(), Constant Field Values
Status:
Stable ICU 2.8.
Method Detail

getOffset

public int getOffset()

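Returns the character offset in the source string corresponding to the next collation element. I.e., getOffset() returns the position in the source string corresponding to the collation element that will be returned by the next call to next(). Because setOffset() moves the iterator to the start of a contracting character sequence when the given offset falls inside one, this value is not guaranteed to equal the offset most recently passed to setOffset().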

Returns:
The character offset in the source string corresponding to the collation element that will be returned by the next call to next().
Status:
Stable ICU 2.8.

getMaxExpansion

public int getMaxExpansion(int ce)

Returns the maximum length of any expansion sequence that ends with the specified collation element. If there is no expansion with this collation element as the last element, returns 1.

Parameters:
ce - a collation element returned by previous() or next().
Returns:
the maximum length of any expansion sequence ending with the specified collation element.
Status:
Stable ICU 2.8.

reset

public void reset()

Resets the cursor to the beginning of the string. The next call to next() or previous() will return the first and last collation element in the string, respectively.

If the RuleBasedCollator used by this iterator has had its attributes changed, calling reset() will reinitialize the iterator to use the new attributes.

Status:
Stable ICU 2.8.

next

public int next()

Get the next collation element in the source string.

This iterator iterates over a sequence of collation elements that were built from the string. Because there isn't necessarily a one-to-one mapping from characters to collation elements, this doesn't mean the same thing as "return the collation element [or ordering priority] of the next character in the string".

This function returns the collation element that the iterator is currently pointing to, and then updates the internal pointer to point to the next element. previous() updates the pointer first, and then returns the element. This means that when you change direction while iterating (i.e., call next() and then call previous(), or call previous() and then call next()), you'll get back the same element twice.

Returns:
the next collation element or NULLORDER if the end of the iteration has been reached.
Status:
Stable ICU 2.8.

previous

public int previous()

Get the previous collation element in the source string.

This iterator iterates over a sequence of collation elements that were built from the string. Because there isn't necessarily a one-to-one mapping from characters to collation elements, this doesn't mean the same thing as "return the collation element [or ordering priority] of the previous character in the string".

This function updates the iterator's internal pointer to point to the collation element preceding the one it's currently pointing to and then returns that element, while next() returns the current element and then updates the pointer. This means that when you change direction while iterating (i.e., call next() and then call previous(), or call previous() and then call next()), you'll get back the same element twice.

Returns:
the previous collation element, or NULLORDER when the start of the iteration has been reached.
Status:
Stable ICU 2.8.
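
The pointer behaviour described for next() and previous() can be sketched with a toy cursor over a fixed array. ToyCursor and its sentinel value are hypothetical and exist only for this illustration; the class models the "return then advance" and "step back then return" rules, not ICU's internals.

```java
// Toy model of the cursor semantics only; not the ICU implementation.
final class ToyCursor {
    static final int NULLORDER = -1; // sentinel chosen for this sketch
    private final int[] elements;
    private int pos = 0;

    ToyCursor(int... elements) {
        this.elements = elements.clone();
    }

    // Returns the element under the cursor, then moves the cursor forward.
    int next() {
        return pos < elements.length ? elements[pos++] : NULLORDER;
    }

    // Moves the cursor back first, then returns the element it now points to.
    int previous() {
        return pos > 0 ? elements[--pos] : NULLORDER;
    }

    public static void main(String[] args) {
        ToyCursor c = new ToyCursor(10, 20, 30);
        System.out.println(c.next());     // 10
        System.out.println(c.next());     // 20
        System.out.println(c.previous()); // 20 again: the direction changed
        System.out.println(c.next());     // 20 once more
    }
}
```

Code that reverses direction therefore has to expect the repeated element and skip it if it is not wanted.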

primaryOrder

public static final int primaryOrder(int ce)
Return the primary order of the specified collation element, i.e. its 16 most significant bits. This value is unsigned.

Parameters:
ce - the collation element
Returns:
the element's 16-bit primary order.
Status:
Stable ICU 2.8.

secondaryOrder

public static final int secondaryOrder(int ce)
Return the secondary order of the specified collation element, i.e. the middle 8 bits, between the primary and tertiary orders (bits 8 to 15, counting from the least significant bit). This value is unsigned.

Parameters:
ce - the collation element
Returns:
the element's 8-bit secondary order.
Status:
Stable ICU 2.8.

tertiaryOrder

public static final int tertiaryOrder(int ce)
Return the tertiary order of the specified collation element, i.e. its 8 least significant bits. This value is unsigned.

Parameters:
ce - the collation element
Returns:
the element's 8-bit tertiary order.
Status:
Stable ICU 2.8.
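
As a rough illustration of how the three orders partition a 32-bit collation element, the sketch below splits a made-up value with plain bit operations. It assumes the 16/8/8 layout described above; CeFields and its helper names are hypothetical and are not taken from the ICU sources.

```java
// Sketch of the bit layout only; use the CollationElementIterator methods in real code.
final class CeFields {
    static int primary(int ce)   { return (ce >>> 16) & 0xFFFF; } // top 16 bits
    static int secondary(int ce) { return (ce >>> 8) & 0xFF; }    // middle 8 bits
    static int tertiary(int ce)  { return ce & 0xFF; }            // low 8 bits

    public static void main(String[] args) {
        int ce = 0xF2A40503; // made-up element whose sign bit is set
        System.out.printf("primary=0x%04X secondary=0x%02X tertiary=0x%02X%n",
                primary(ce), secondary(ce), tertiary(ce));
        // A signed shift would drag the sign bit along and yield a negative
        // number, which is why the unsigned shift (>>>) is used above.
        System.out.println((ce >> 16) < 0);
    }
}
```

Because the results are unsigned, they are always non-negative int values even when the collation element itself is negative as a Java int.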

setOffset

public void setOffset(int offset)

Sets the iterator to point to the collation element corresponding to the character at the specified offset. The value returned by the next call to next() will be the collation element corresponding to the characters at offset.

If offset is in the middle of a contracting character sequence, the iterator is adjusted to the start of the contracting sequence. This means that getOffset() is not guaranteed to return the same value set by this method.

If the decomposition mode is on, and offset is in the middle of a decomposable range of source text, the iterator may not return a correct result for the next forwards or backwards iteration. The user must ensure that the offset is not in the middle of a decomposable range.

Parameters:
offset - the character offset into the original source string to set. Note that this is not an offset into the corresponding sequence of collation elements.
Status:
Stable ICU 2.8.

setText

public void setText(String source)

Set a new source string for iteration, and reset the offset to the beginning of the text.

Parameters:
source - the new source string for iteration.
Status:
Stable ICU 2.8.

setText

public void setText(UCharacterIterator source)

Set a new source string iterator for iteration, and reset the offset to the beginning of the text.

The source iterator's integrity will be preserved since a new copy will be created for use.

Parameters:
source - the new source string iterator for iteration.
Status:
Stable ICU 2.8.

setText

public void setText(CharacterIterator source)

Set a new source string iterator for iteration, and reset the offset to the beginning of the text.

Parameters:
source - the new source string iterator for iteration.
Status:
Stable ICU 2.8.

equals

public boolean equals(Object that)
Tests whether the argument object is equal to this CollationElementIterator. Iterators are equal if they use the same RuleBasedCollator, have the same source text, and are at the same position in the iteration.

Overrides:
equals in class Object
Parameters:
that - the object to test for equality with this CollationElementIterator
Status:
Stable ICU 2.8.


Copyright (c) 2009 IBM Corporation and others.