org.apache.lucene.analysis.compound
Class CompoundWordTokenFilterBase
java.lang.Object
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.compound.CompoundWordTokenFilterBase
- Direct Known Subclasses:
- DictionaryCompoundWordTokenFilter, HyphenationCompoundWordTokenFilter
public abstract class CompoundWordTokenFilterBase
- extends TokenFilter
Base class for decomposition token filters.
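To illustrate what a decomposition filter does, here is a minimal self-contained sketch (plain Java, not Lucene's implementation): every dictionary word found inside a compound, within the configured subword-size bounds, is emitted as a subword. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of dictionary-based decomposition, not Lucene's code:
// scan every start position and emit each dictionary word found there
// whose length falls within [minSubwordSize, maxSubwordSize].
public class DecompositionSketch {
    static List<String> decompose(String word, Set<String> dict,
                                  int minSubwordSize, int maxSubwordSize) {
        List<String> subwords = new ArrayList<>();
        String lower = word.toLowerCase();  // dictionary matching is case-insensitive
        for (int start = 0; start < lower.length(); start++) {
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= lower.length(); len++) {
                String candidate = lower.substring(start, start + len);
                if (dict.contains(candidate)) {
                    subwords.add(candidate);
                }
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("rind", "fleisch"));
        // German compound "Rindfleisch" (beef) splits into its dictionary parts.
        System.out.println(decompose("Rindfleisch", dict, 2, 15));
        // prints [rind, fleisch]
    }
}
```

In the real filter the original compound token is also kept in the output stream and, with `onlyLongestMatch` set, only the longest match at each position is emitted; this sketch omits both refinements for brevity.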
Constructor Summary
protected CompoundWordTokenFilterBase(TokenStream input, Set dictionary)
protected CompoundWordTokenFilterBase(TokenStream input, Set dictionary, boolean onlyLongestMatch)
protected CompoundWordTokenFilterBase(TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
protected CompoundWordTokenFilterBase(TokenStream input, String[] dictionary)
protected CompoundWordTokenFilterBase(TokenStream input, String[] dictionary, boolean onlyLongestMatch)
protected CompoundWordTokenFilterBase(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_MIN_WORD_SIZE
public static final int DEFAULT_MIN_WORD_SIZE
- The default minimum word length for a word to be decomposed
- See Also:
- Constant Field Values
DEFAULT_MIN_SUBWORD_SIZE
public static final int DEFAULT_MIN_SUBWORD_SIZE
- The default minimum length of subwords that are propagated to the output of this filter
- See Also:
- Constant Field Values
DEFAULT_MAX_SUBWORD_SIZE
public static final int DEFAULT_MAX_SUBWORD_SIZE
- The default maximum length of subwords that are propagated to the output of this filter
- See Also:
- Constant Field Values
dictionary
protected final CharArraySet dictionary
tokens
protected final LinkedList tokens
minWordSize
protected final int minWordSize
minSubwordSize
protected final int minSubwordSize
maxSubwordSize
protected final int maxSubwordSize
onlyLongestMatch
protected final boolean onlyLongestMatch
CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(TokenStream input,
String[] dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(TokenStream input,
String[] dictionary,
boolean onlyLongestMatch)
CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(TokenStream input,
Set dictionary,
boolean onlyLongestMatch)
CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(TokenStream input,
String[] dictionary)
CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(TokenStream input,
Set dictionary)
CompoundWordTokenFilterBase
protected CompoundWordTokenFilterBase(TokenStream input,
Set dictionary,
int minWordSize,
int minSubwordSize,
int maxSubwordSize,
boolean onlyLongestMatch)
makeDictionary
public static final Set makeDictionary(String[] dictionary)
- Creates a set of words from an array. The resulting Set matches case-insensitively.
TODO: We should look for a faster dictionary lookup approach.
- Parameters:
dictionary - the dictionary words to add
- Returns: a Set containing the lower-cased dictionary words, for case-insensitive matching
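A minimal sketch of the idea behind makeDictionary (illustrative plain Java, not Lucene's CharArraySet-backed implementation): every word is stored lower-cased, so a lower-cased lookup becomes case-insensitive.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of makeDictionary's behavior: build a set in which membership
// tests are effectively case-insensitive by lower-casing on insert.
public class DictionarySketch {
    static Set<String> makeDictionary(String[] words) {
        Set<String> dict = new HashSet<>();
        for (String w : words) {
            dict.add(w.toLowerCase());  // mirrors what addAllLowerCase does
        }
        return dict;
    }

    public static void main(String[] args) {
        Set<String> dict = makeDictionary(new String[] {"Rind", "FLEISCH"});
        System.out.println(dict.contains("rind"));     // true
        System.out.println(dict.contains("fleisch"));  // true
    }
}
```

Lucene's actual implementation uses a CharArraySet over char[] buffers to avoid String allocation per lookup, which is what the TODO above alludes to.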
next
public Token next(Token reusableToken)
throws IOException
- Overrides:
next
in class TokenStream
- Throws:
IOException
addAllLowerCase
protected static final void addAllLowerCase(Set target,
Collection col)
makeLowerCaseCopy
protected static char[] makeLowerCaseCopy(char[] buffer)
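A sketch of what a lower-casing buffer copy like makeLowerCaseCopy does (illustrative, not Lucene's code): the original buffer is left untouched and a lower-cased copy is returned.

```java
// Illustrative sketch: copy a char buffer, lower-casing each character,
// without mutating the input.
public class LowerCaseCopySketch {
    static char[] makeLowerCaseCopy(char[] buffer) {
        char[] copy = new char[buffer.length];
        for (int i = 0; i < buffer.length; i++) {
            copy[i] = Character.toLowerCase(buffer[i]);
        }
        return copy;
    }

    public static void main(String[] args) {
        char[] original = "RindFleisch".toCharArray();
        char[] lower = makeLowerCaseCopy(original);
        System.out.println(new String(lower));     // rindfleisch
        System.out.println(new String(original));  // RindFleisch (unchanged)
    }
}
```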
createToken
protected final Token createToken(int offset,
int length,
Token prototype)
decompose
protected void decompose(Token token)
decomposeInternal
protected abstract void decomposeInternal(Token token)
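The decompose/decomposeInternal pair follows the template-method pattern: the base class applies guards such as minWordSize, while subclasses (dictionary-based or hyphenation-based) supply the splitting strategy in decomposeInternal. A hedged sketch of that relationship, with hypothetical names and a deliberately trivial splitting strategy:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the template-method relationship between decompose and
// decomposeInternal. Class names and the split strategy are illustrative.
abstract class DecomposerBase {
    final int minWordSize;

    DecomposerBase(int minWordSize) { this.minWordSize = minWordSize; }

    // Template method: words shorter than minWordSize are never decomposed.
    final List<String> decompose(String word) {
        List<String> out = new ArrayList<>();
        if (word.length() >= minWordSize) {
            decomposeInternal(word, out);
        }
        return out;
    }

    // Hook for subclasses, analogous to the abstract decomposeInternal.
    protected abstract void decomposeInternal(String word, List<String> out);
}

public class TemplateSketch extends DecomposerBase {
    TemplateSketch() { super(5); }  // hypothetical minimum word size

    @Override
    protected void decomposeInternal(String word, List<String> out) {
        // Trivial strategy for illustration only: split at the midpoint.
        int mid = word.length() / 2;
        out.add(word.substring(0, mid));
        out.add(word.substring(mid));
    }

    public static void main(String[] args) {
        TemplateSketch t = new TemplateSketch();
        System.out.println(t.decompose("butterfly"));  // [butt, erfly]
        System.out.println(t.decompose("cat"));        // [] (below minWordSize)
    }
}
```

DictionaryCompoundWordTokenFilter and HyphenationCompoundWordTokenFilter play the role of the concrete subclass here, each overriding decomposeInternal with its own matching strategy.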
Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.