Qizx/Open v0.3
java.lang.Object
  |
  +--net.xfra.qizxopen.util.DefaultWordExtractor
A default word extractor suitable for European languages compatible with ISO-8859-1.
By default, a word starts with a letter and may contain letters and digits. Characters are folded to lowercase and, unless setKeepAccents(true) is called, accented letters are mapped to the corresponding unaccented letters (e.g. é maps to 'e'). Subclasses can change this behavior by overriding isWordStart, isWordPart and mapChar.
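The default rules described above can be approximated in a few lines. This is a minimal sketch, not the Qizx/Open source: the class name WordCharRules is hypothetical, and stripping accents through java.text.Normalizer is an assumption about how such folding might be done.

```java
import java.text.Normalizer;

// Hypothetical stand-in for the three overridable hooks; not the Qizx/Open code.
class WordCharRules {
    private boolean keepAccents = false;

    void setKeepAccents(boolean keep) {
        keepAccents = keep;
    }

    // A word may begin with a letter.
    boolean isWordStart(char c) {
        return Character.isLetter(c);
    }

    // Inside a word, letters and digits are accepted.
    boolean isWordPart(char c) {
        return Character.isLetterOrDigit(c);
    }

    // Fold to lowercase, then drop the accent unless keepAccents is set.
    char mapChar(char c) {
        char lower = Character.toLowerCase(c);
        if (keepAccents) {
            return lower;
        }
        // Assumption: canonical decomposition (NFD) puts the base letter first,
        // followed by any combining accent marks.
        String decomposed = Normalizer.normalize(String.valueOf(lower), Normalizer.Form.NFD);
        return decomposed.charAt(0);
    }
}
```

With these rules, mapChar('É') yields 'e' by default and 'é' after setKeepAccents(true). Letters without a canonical decomposition, such as 'ß' or 'ø', pass through unchanged in this sketch.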
Constructor Summary
DefaultWordExtractor()
Method Summary
char charAt(int ahead)
    Returns the character at current position + ahead, or 0 if after end.
boolean isWordPart(char c)
    Returns true if a word may contain this character.
boolean isWordStart(char c)
    Returns true if a word may begin with this character.
static void main(java.lang.String[] args)
char mapChar(char c)
    Normalizes a character (belonging to a word).
char nextChar()
    Moves to the next character and returns it; returns 0 if at end.
char[] nextWord()
    Gets the next normalized word, or null if no more words.
void setKeepAccents(boolean keep)
void start(char[] text, int length)
    Starts the analysis of a new text chunk.
int wordLength()
    Returns the original length of the last word returned by nextWord.
int wordOffset()
    Returns the offset of the last word returned by nextWord.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
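The method summary suggests a simple tokenizing loop: call start once, then call nextWord until it returns null, reading wordOffset and wordLength after each word. The sketch below illustrates that loop under stated assumptions: SketchExtractor and its tokenize helper are hypothetical and only mimic the documented contract, and accent folding is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical re-implementation of the documented contract; not the Qizx/Open class.
class SketchExtractor {
    private char[] text;
    private int length;
    private int pos;        // scan position in the current chunk
    private int wordStart;  // offset of the last word returned
    private int wordLen;    // original length of the last word returned

    void start(char[] text, int length) {
        this.text = text;
        this.length = length;
        this.pos = 0;
    }

    char[] nextWord() {
        // Skip characters that cannot begin a word.
        while (pos < length && !Character.isLetter(text[pos])) {
            pos++;
        }
        if (pos >= length) {
            return null;
        }
        wordStart = pos;
        while (pos < length && Character.isLetterOrDigit(text[pos])) {
            pos++;
        }
        wordLen = pos - wordStart;
        char[] word = new char[wordLen];
        for (int i = 0; i < wordLen; i++) {
            // Lowercase only; accent folding is omitted in this sketch.
            word[i] = Character.toLowerCase(text[wordStart + i]);
        }
        return word;
    }

    int wordOffset() { return wordStart; }

    int wordLength() { return wordLen; }

    // Collects "word@offset+length" for every word in the input.
    static List<String> tokenize(String input) {
        SketchExtractor ex = new SketchExtractor();
        char[] chars = input.toCharArray();
        ex.start(chars, chars.length);
        List<String> out = new ArrayList<String>();
        for (char[] w = ex.nextWord(); w != null; w = ex.nextWord()) {
            out.add(new String(w) + "@" + ex.wordOffset() + "+" + ex.wordLength());
        }
        return out;
    }
}
```

For the input "Hello, World 42 x9", tokenize returns hello@0+5, world@7+5 and x9@16+2. The token "42" is skipped because a digit cannot start a word, while "x9" is kept because digits are accepted inside one.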
Constructor Detail
public DefaultWordExtractor()
Method Detail
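public void start(char[] text, int length)
    Description copied from interface: WordExtractor
    Starts the analysis of a new text chunk.
    Specified by: start in interface WordExtractor
public boolean isWordStart(char c)
    Returns true if a word may begin with this character.
    Specified by: isWordStart in interface WordExtractor
public boolean isWordPart(char c)
    Returns true if a word may contain this character.
    Specified by: isWordPart in interface WordExtractor
public char mapChar(char c)
    Description copied from interface: WordExtractor
    Normalizes a character (belonging to a word).
    Specified by: mapChar in interface WordExtractor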
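Since isWordStart, isWordPart and mapChar are the documented extension points, a subclass can widen what counts as a word. The following is a sketch under assumptions: BaseHooks is a hypothetical stand-in for the default behavior (a real subclass would extend DefaultWordExtractor), and firstWord is a helper invented here only to make the effect visible.

```java
// Hypothetical stand-in for the default hooks; not the Qizx/Open code.
class BaseHooks {
    boolean isWordStart(char c) { return Character.isLetter(c); }

    boolean isWordPart(char c) { return Character.isLetterOrDigit(c); }

    char mapChar(char c) { return Character.toLowerCase(c); }

    // Returns the first word in text, normalized through mapChar, or null if none.
    String firstWord(String text) {
        int i = 0;
        while (i < text.length() && !isWordStart(text.charAt(i))) {
            i++;
        }
        if (i == text.length()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(mapChar(text.charAt(i++)));
        while (i < text.length() && isWordPart(text.charAt(i))) {
            sb.append(mapChar(text.charAt(i++)));
        }
        return sb.toString();
    }
}

// Subclass that also accepts '-' and '_' inside a word.
class IdentifierHooks extends BaseHooks {
    @Override
    boolean isWordPart(char c) {
        return c == '-' || c == '_' || super.isWordPart(c);
    }
}
```

On the text "Well-Known term", the base rules stop at the hyphen and return "well", while IdentifierHooks returns "well-known". Only isWordPart is overridden here, so a word still has to begin with a letter.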
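public char[] nextWord()
    Description copied from interface: WordExtractor
    Gets the next normalized word, or null if no more words.
    Specified by: nextWord in interface WordExtractor
public char charAt(int ahead)
    Description copied from interface: WordExtractor
    Returns the character at current position + ahead, or 0 if after end.
    Specified by: charAt in interface WordExtractor
public char nextChar()
    Description copied from interface: WordExtractor
    Moves to the next character and returns it; returns 0 if at end.
    Specified by: nextChar in interface WordExtractor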
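The 0-at-end convention shared by charAt and nextChar can be illustrated with a small cursor. This is a sketch only: CharCursor is a hypothetical class, and it assumes that nextChar advances first and then returns the character at the new position, which is one reading of the description above.

```java
// Hypothetical cursor illustrating the documented 0-at-end convention;
// not the Qizx/Open code.
class CharCursor {
    private final char[] text;
    private final int length;
    private int pos = 0;

    CharCursor(char[] text, int length) {
        this.text = text;
        this.length = length;
    }

    // Character at current position + ahead, or 0 when outside the chunk.
    char charAt(int ahead) {
        int i = pos + ahead;
        return (i >= 0 && i < length) ? text[i] : 0;
    }

    // Assumption: advance by one, then return the character now under the cursor.
    // Once the end is reached the cursor stays there and 0 is returned.
    char nextChar() {
        if (pos < length) {
            pos++;
        }
        return charAt(0);
    }
}
```

Returning 0 instead of throwing lets a scanning loop look ahead without a separate bounds check, at the cost of being unable to distinguish a real NUL character in the text from the end of the chunk.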
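public int wordOffset()
    Description copied from interface: WordExtractor
    Returns the offset of the last word returned by nextWord.
    Specified by: wordOffset in interface WordExtractor
public int wordLength()
    Description copied from interface: WordExtractor
    Returns the original length of the last word returned by nextWord.
    Specified by: wordLength in interface WordExtractor
public void setKeepAccents(boolean keep)
public static void main(java.lang.String[] args)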
Copyright Xavier FRANC 2003-2004