Ferret::Analysis::WhiteSpaceTokenizer

Summary

A WhiteSpaceTokenizer is a tokenizer that divides text at whitespace. Each maximal run of adjacent non-whitespace characters forms a token, so punctuation attached to a word stays part of that token.

Example

  "Dave's résumé, at http://www.davebalmain.com/ 1234"
    => ["Dave's", "résumé,", "at", "http://www.davebalmain.com/", "1234"]
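
The splitting rule above can be approximated in plain Ruby without loading Ferret. This is only a rough sketch: `whitespace_tokens` is a hypothetical helper, not part of Ferret, and it relies on Ruby's `\S+` pattern rather than the locale-aware C implementation, so behavior on unusual whitespace characters may differ.

```ruby
# Hypothetical helper, not part of Ferret: approximates whitespace
# tokenization by collecting every maximal run of non-whitespace characters.
def whitespace_tokens(text)
  text.scan(/\S+/)
end

tokens = whitespace_tokens("Dave's résumé, at http://www.davebalmain.com/ 1234")
tokens.each { |token| puts token }
```

Punctuation such as the comma after "résumé" and the trailing slash of the URL stays attached to its token, because only whitespace separates tokens.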

Public Class Methods

new(lower = true) → tokenizer

Create a new WhiteSpaceTokenizer which optionally downcases tokens. Downcasing is done according to the current locale.

lower: set to false if you don't wish to downcase tokens
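
The effect of the `lower` parameter can be illustrated with a small plain-Ruby sketch. `tokenize` is a hypothetical stand-in, not Ferret's API. Its default mirrors the signature documented above, and it uses Ruby's `String#downcase` rather than the C locale, so results for non-ASCII text may not match Ferret exactly.

```ruby
# Hypothetical helper, not part of Ferret: splits at whitespace and
# optionally downcases each token, mirroring the documented `lower` parameter.
def tokenize(text, lower = true)
  tokens = text.split            # no argument: split on runs of whitespace
  lower ? tokens.map(&:downcase) : tokens
end

p tokenize("Dave's Site AT Home")         # tokens are downcased
p tokenize("Dave's Site AT Home", false)  # original case is preserved
```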

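static VALUE
frb_whitespace_tokenizer_init(int argc, VALUE *argv, VALUE self)
{
    /* Parse the Ruby arguments; this declares rstr and lower used below. */
    TS_ARGS(false);
#ifndef POSH_OS_WIN32
    /* On non-Windows platforms, set the cached locale from the environment once. */
    if (!frb_locale) frb_locale = setlocale(LC_CTYPE, "");
#endif
    /* Wrap a new multi-byte whitespace tokenizer as the Ruby token stream. */
    return get_wrapped_ts(self, rstr, mb_whitespace_tokenizer_new(lower));
}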

Generated with the Darkfish Rdoc Generator 1.1.6.