net.java.sen.dictionary
Class Tokenizer

java.lang.Object
  extended by net.java.sen.dictionary.Tokenizer
Direct Known Subclasses:
JapaneseTokenizer

public abstract class Tokenizer
extends java.lang.Object

A string tokenizer.

The Tokenizer uses a Dictionary to help decompose strings into potential morphemes.


Field Summary
protected  Node bosNode
          A Node representing a beginning-of-string
protected  Dictionary dictionary
          The Dictionary used to find possible morphemes
protected  Node eosNode
          A Node representing an end-of-string
protected  CToken unknownCToken
          A CToken representing an unknown morpheme
protected  java.lang.String unknownPartOfSpeechDescription
          The part-of-speech code to use for unknown tokens
 
Constructor Summary
Tokenizer(Dictionary dictionary, java.lang.String unknownPartOfSpeechDescription)
          Constructs a new Tokenizer that uses the specified Dictionary to find possible morphemes within a given string
 
Method Summary
 Node getBOSNode()
          Creates a unique beginning-of-string Node.
 Dictionary getDictionary()
          Returns the Dictionary used to find possible morphemes.
 Node getEOSNode()
          Creates a unique end-of-string Node.
 Node getUnknownNode(char[] surface, int start, int length, int span)
          Creates an "unknown morpheme" Node with the specified characteristics.
abstract  Node lookup(SentenceIterator iterator, char[] surface)
          Searches for possible morphemes from the given SentenceIterator.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dictionary

protected Dictionary dictionary
The Dictionary used to find possible morphemes


unknownCToken

protected CToken unknownCToken
A CToken representing an unknown morpheme


bosNode

protected Node bosNode
A Node representing a beginning-of-string


eosNode

protected Node eosNode
A Node representing an end-of-string


unknownPartOfSpeechDescription

protected java.lang.String unknownPartOfSpeechDescription
The part-of-speech code to use for unknown tokens

Constructor Detail

Tokenizer

public Tokenizer(Dictionary dictionary,
                 java.lang.String unknownPartOfSpeechDescription)
Constructs a new Tokenizer that uses the specified Dictionary to find possible morphemes within a given string

Parameters:
dictionary - The Dictionary to search within
unknownPartOfSpeechDescription - The part-of-speech code to use for unknown tokens
Method Detail

getDictionary

public Dictionary getDictionary()
Returns:
The Dictionary used to find possible morphemes

getBOSNode

public Node getBOSNode()
Creates a unique beginning-of-string Node. The Node returned by this method is freshly cloned and is not an alias of any other Node.

Returns:
A beginning-of-string Node

getEOSNode

public Node getEOSNode()
Creates a unique end-of-string Node. The Node returned by this method is freshly cloned and is not an alias of any other Node.

Returns:
An end-of-string Node
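
The "freshly cloned and not an alias" guarantee described for getBOSNode() and getEOSNode() can be sketched as follows. This is only an illustration of the clone-per-call pattern, not the library's implementation: SketchNode and CloneSketch are hypothetical stand-ins, and the fields shown are assumptions chosen to make the effect visible.

```java
// Hypothetical stand-in for the library's Node; it models only what the
// sketch needs and is NOT the real net.java.sen.dictionary.Node.
class SketchNode implements Cloneable {
    SketchNode rnext; // mirrors the Node.rnext link named on this page
    int start;        // assumed field, used here to show independent state

    @Override
    public SketchNode clone() {
        try {
            return (SketchNode) super.clone();
        } catch (CloneNotSupportedException e) {
            // Cannot happen: the class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}

// Hypothetical factory showing the clone-per-call idea.
class CloneSketch {
    // Template instance kept inside the factory, in the spirit of the
    // protected bosNode field listed above.
    private final SketchNode bosTemplate = new SketchNode();

    // Hands out a new copy on every call, so no two callers share a Node.
    SketchNode getBOSNode() {
        return bosTemplate.clone();
    }

    public static void main(String[] args) {
        CloneSketch factory = new CloneSketch();
        SketchNode a = factory.getBOSNode();
        SketchNode b = factory.getBOSNode();
        a.start = 5;                 // mutate one copy
        System.out.println(a != b);  // the two copies are distinct objects
        System.out.println(b.start); // the other copy keeps its own state
    }
}
```

Because each caller receives its own copy, a caller can relink or annotate the returned Node without affecting Nodes handed out earlier.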

getUnknownNode

public Node getUnknownNode(char[] surface,
                           int start,
                           int length,
                           int span)
Creates an "unknown morpheme" Node with the specified characteristics. The Node returned by this method is freshly cloned and is not an alias of any other Node.

Parameters:
surface - The underlying surface of which the Node is part
start - The index within the surface of the first character of the Node
length - The length of the Node
span - The span of the Node
Returns:
The new "unknown morpheme" Node
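
One possible reading of the surface, start and length parameters is the usual Java offset/count convention, where start and length select a slice of the shared surface array. The page does not state this explicitly, so treat it as an assumption. The helper surfaceText below is hypothetical and does not belong to the library; span is left out because its exact meaning is not described here.

```java
class SurfaceSketch {
    // Hypothetical helper: returns the text a Node would cover, assuming
    // start and length index into the shared surface array.
    static String surfaceText(char[] surface, int start, int length) {
        return new String(surface, start, length);
    }

    public static void main(String[] args) {
        char[] surface = "abc XYZ def".toCharArray();
        // Characters 4..6 of the surface form the slice "XYZ".
        System.out.println(surfaceText(surface, 4, 3));
    }
}
```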

lookup

public abstract Node lookup(SentenceIterator iterator,
                            char[] surface)
                     throws java.io.IOException
Searches for possible morphemes from the given SentenceIterator. The Node that is returned links through Node.rnext to a list of matches, which may be of varying lengths.

Parameters:
iterator - The iterator to search from
surface - The underlying character surface
Returns:
The head of a chain of Nodes representing the possible morphemes beginning at the given index
Throws:
java.io.IOException
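
A caller of lookup() would typically walk the returned chain through rnext to enumerate the candidate morphemes. The sketch below shows that traversal under stated assumptions: ChainNode is a hypothetical stand-in for Node, the start and length fields are assumed, the head is treated as the first candidate, and the chain is assumed to end with a null rnext. None of these details are confirmed by this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Node; only rnext, start and length are modelled.
class ChainNode {
    ChainNode rnext;  // next candidate in the chain (null ends the chain)
    final int start;  // assumed: offset of the candidate within the surface
    final int length; // assumed: number of characters the candidate covers

    ChainNode(int start, int length) {
        this.start = start;
        this.length = length;
    }
}

class LookupChainSketch {
    // Collects the surface text of every candidate reachable from head.
    static List<String> candidates(ChainNode head, char[] surface) {
        List<String> out = new ArrayList<>();
        for (ChainNode n = head; n != null; n = n.rnext) {
            out.add(new String(surface, n.start, n.length));
        }
        return out;
    }

    public static void main(String[] args) {
        char[] surface = "sunflower".toCharArray();
        // Two candidates of different lengths starting at the same position.
        ChainNode shortMatch = new ChainNode(0, 3);
        ChainNode longMatch = new ChainNode(0, 9);
        shortMatch.rnext = longMatch;
        System.out.println(candidates(shortMatch, surface));
    }
}
```

The two candidates in the usage example start at the same position but differ in length, which matches the "varying lengths" wording above.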


Copyright © 2008. All Rights Reserved.