A MappingFilter maps strings in tokens. It is usually used to map UTF-8 characters to ASCII characters for easier searching and better search recall. The mapping is compiled into a deterministic finite automaton, so it is fast enough to be used for indexing very large datasets. Regular expressions are not currently supported. If you are really interested in that feature, please contact me at dbalmain@gmail.com.
  mapping = {
    ['à','á','â','ã','ä','å','ā','ă'] => 'a',
    'æ' => 'ae',
    ['ď','đ'] => 'd',
    ['ç','ć','č','ĉ','ċ'] => 'c',
    ['è','é','ê','ë','ē','ę','ě','ĕ','ė',] => 'e',
    ['ƒ'] => 'f',
    ['ĝ','ğ','ġ','ģ'] => 'g',
    ['ĥ','ħ'] => 'h',
    ['ì','ì','í','î','ï','ī','ĩ','ĭ'] => 'i',
    ['į','ı','ĳ','ĵ'] => 'j',
    ['ķ','ĸ'] => 'k',
    ['ł','ľ','ĺ','ļ','ŀ'] => 'l',
    ['ñ','ń','ň','ņ','ŉ','ŋ'] => 'n',
    ['ò','ó','ô','õ','ö','ø','ō','ő','ŏ','ŏ'] => 'o',
    ['œ'] => 'oek',
    ['ą'] => 'q',
    ['ŕ','ř','ŗ'] => 'r',
    ['ś','š','ş','ŝ','ș'] => 's',
    ['ť','ţ','ŧ','ț'] => 't',
    ['ù','ú','û','ü','ū','ů','ű','ŭ','ũ','ų'] => 'u',
    ['ŵ'] => 'w',
    ['ý','ÿ','ŷ'] => 'y',
    ['ž','ż','ź'] => 'z'
  }
  filt = MappingFilter.new(token_stream, mapping)
Create a MappingFilter which maps strings in tokens. It is usually used to map UTF-8 characters to ASCII characters for easier searching and better search recall. The mapping is compiled into a deterministic finite automaton, so it is fast enough to be used for indexing very large datasets. Regular expressions are not currently supported. If you are really interested in that feature, please contact me at dbalmain@gmail.com.
token_stream | TokenStream to be filtered |
mapping | Hash of mappings to apply to tokens. The key can be a String or an Array of Strings. The value must be a String |
  filt = MappingFilter.new(token_stream, {
    ['à','á','â','ã','ä','å'] => 'a',
    ['è','é','ê','ë','ē','ę'] => 'e'
  })
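To see what such a mapping does to a token's text without a Ferret installation, the semantics can be approximated in plain Ruby. This is only a sketch: `flatten_mapping` and `apply_mapping` are hypothetical helpers, not part of Ferret, and they rely on a regular-expression union rather than the compiled automaton the filter uses.

```ruby
# Hypothetical helper (not part of Ferret): expand Array keys so that
# every source string gets its own entry in a flat Hash.
def flatten_mapping(mapping)
  mapping.each_with_object({}) do |(key, value), flat|
    Array(key).each { |source| flat[source] = value }
  end
end

# Hypothetical helper (not part of Ferret): replace every occurrence of a
# source string in +text+ with its mapped value.
def apply_mapping(text, mapping)
  flat = flatten_mapping(mapping)
  return text.dup if flat.empty?
  # Longer source strings go first so that they win over their own prefixes.
  pattern = Regexp.union(flat.keys.sort_by { |k| -k.length })
  text.gsub(pattern, flat)
end

mapping = {
  ['à','á','â','ã','ä','å'] => 'a',
  ['è','é','ê','ë','ē','ę'] => 'e'
}
puts apply_mapping('crème brûlée', mapping)  # prints "creme brûlee"
```

Note that 'û' is left untouched because this particular mapping has no entry for it; only strings listed as keys are rewritten.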
static VALUE
frb_mapping_filter_init(VALUE self, VALUE rsub_ts, VALUE mapping)
{
    TokenStream *ts;
    ts = frb_get_cwrapped_rts(rsub_ts);
    ts = mapping_filter_new(ts);
    rb_hash_foreach(mapping, frb_add_mappings_i, (VALUE)ts);
    mulmap_compile(((MappingFilter *)ts)->mapper);
    object_add(&(TkFilt(ts)->sub_ts), rsub_ts);
    Frt_Wrap_Struct(self, &frb_tf_mark, &frb_tf_free, ts);
    object_add(ts, self);
    return self;
}
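The initializer above collects every key/value pair from the Hash and then compiles the mapper once, before any token is processed. As a rough illustration of why a one-time compile step pays off, here is a minimal Ruby sketch that builds a character trie from a mapping and rewrites a string in a single left-to-right pass. `TrieMapper` is a hypothetical class, not Ferret's actual automaton, and the longest-match rule it applies is an assumption of the sketch.

```ruby
# Hypothetical illustration only; Ferret's real mapper is implemented in C.
class TrieMapper
  Node = Struct.new(:children, :output)

  # "Compile" step: insert every source string into a trie once.
  def initialize(mapping)
    @root = Node.new({}, nil)
    mapping.each do |key, value|
      Array(key).each { |source| insert(source, value) }
    end
  end

  # Rewrite +text+ in one pass, taking the longest source string that
  # matches at each position (an assumption of this sketch).
  def map(text)
    chars = text.chars
    result = +''
    i = 0
    while i < chars.length
      node = @root
      match_len = 0
      match_out = nil
      j = i
      # Walk the trie as far as the input allows, remembering the last hit.
      while j < chars.length && (node = node.children[chars[j]])
        j += 1
        if node.output
          match_len = j - i
          match_out = node.output
        end
      end
      if match_out
        result << match_out
        i += match_len
      else
        result << chars[i]  # no mapping starts here; copy the character
        i += 1
      end
    end
    result
  end

  private

  def insert(source, value)
    return if source.empty?
    node = @root
    source.each_char { |c| node = (node.children[c] ||= Node.new({}, nil)) }
    node.output = value
  end
end

mapper = TrieMapper.new({ ['à','á'] => 'a', 'æ' => 'ae' })
puts mapper.map('voilà, æther')  # prints "voila, aether"
```

Building the trie costs time proportional to the total length of the keys, after which each input character is examined only a bounded number of times, which is the property that makes a precompiled mapper attractive for large indexing jobs.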