BufferedTokenizer

BufferedTokenizer takes a delimiter upon instantiation and acts line-based by default. It allows input to be spoon-fed from some outside source that receives arbitrary-length datagrams, which may or may not contain the token by which entities are delimited.

Commonly used to parse lines out of incoming data:

 module LineBufferedConnection
   def receive_data(data)
     (@buffer ||= BufferedTokenizer.new).extract(data).each do |line|
       receive_line(line)
     end
   end
 end
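
As a sketch of how that module might be mixed into a real handler (the class name, port, and echo behavior below are illustrative, not part of this library):

 require 'eventmachine'

 # Hypothetical echo server: sends each complete line back to the client.
 class EchoLineServer < EventMachine::Connection
   include LineBufferedConnection

   def receive_line(line)
     send_data("#{line}\n")
   end
 end

 EventMachine.run do
   EventMachine.start_server('127.0.0.1', 8081, EchoLineServer)
 end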

Public Class Methods

new(delimiter = "\n", size_limit = nil)

New BufferedTokenizers will operate on lines delimited by "\n" by default, or allow you to specify any delimiter token you choose, which will then be used by String#split to tokenize the input data.

  # File lib/em/buftok.rb, line 36
  def initialize(delimiter = "\n", size_limit = nil)
    # Store the specified delimiter
    @delimiter = delimiter

    # Store the specified size limitation
    @size_limit = size_limit

    # The input buffer is stored as an array.  This is by far the most efficient
    # approach given language constraints (in C a linked list would be a more
    # appropriate data structure).  Segments of input data are stored in a list
    # which is only joined when a token is reached, substantially reducing the
    # number of objects required for the operation.
    @input = []

    # Size of the input buffer
    @input_size = 0
  end
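
For instance (the values here are illustrative), a tokenizer can be built around a custom delimiter, or given a size_limit so that extract raises once the buffered, undelimited input would exceed that many bytes:

 # NUL-delimited records instead of lines
 records = BufferedTokenizer.new("\0")

 # Line-based, but refuse to buffer more than 16 KB without seeing a delimiter
 lines = BufferedTokenizer.new("\n", 16 * 1024)

Note that a multi-character delimiter is only recognized when it arrives within a single datagram; per the disabled code branch in extract below, a delimiter split across two calls will not be reassembled.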

Public Instance Methods

empty?()

Is the buffer empty?

  # File lib/em/buftok.rb, line 135
  def empty?
    @input.empty?
  end

extract(data)

Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract. This makes for easy processing of datagrams using a pattern like:

  tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...
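
A concrete version of that pattern, assuming newline-delimited JSON and using JSON.parse in place of the hypothetical Decode:

 require 'json'

 tokenizer = BufferedTokenizer.new
 ["{\"a\":1}\n{\"b\"", ":2}\n"].each do |datagram|
   tokenizer.extract(datagram).map { |entity| JSON.parse(entity) }.each do |message|
     p message  # prints {"a"=>1}, then {"b"=>2}
   end
 end
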
  # File lib/em/buftok.rb, line 59
  def extract(data)
    # Extract token-delimited entities from the input string with the split command.
    # There's a bit of craftiness here with the -1 parameter.  Normally split would
    # behave no differently regardless of if the token lies at the very end of the
    # input buffer or not (i.e. a literal edge case).  Specifying -1 forces split to
    # return "" in this case, meaning that the last entry in the list represents a
    # new segment of data where the token has not been encountered.
    entities = data.split @delimiter, -1

    # Check to see if the buffer has exceeded capacity, if we're imposing a limit
    if @size_limit
      raise 'input buffer full' if @input_size + entities.first.size > @size_limit
      @input_size += entities.first.size
    end

    # Move the first entry in the resulting array into the input buffer.  It represents
    # the last segment of a token-delimited entity unless it's the only entry in the list.
    @input << entities.shift

    # If the resulting array from the split is empty, the token was not encountered
    # (not even at the end of the buffer).  Since we've encountered no token-delimited
    # entities this go-around, return an empty array.
    return [] if entities.empty?

    # At this point, we've hit a token, or potentially multiple tokens.  Now we can bring
    # together all the data we've buffered from earlier calls without hitting a token,
    # and add it to our list of discovered entities.
    entities.unshift @input.join

=begin
    # Note added by FC, 10Jul07. This paragraph contains a regression. It breaks
    # empty tokens. Think of the empty line that delimits an HTTP header. It will have
    # two "\n" delimiters in a row, and this code mishandles the resulting empty token.
    # If someone figures out how to fix the problem, we can re-enable this code branch.
    # Multi-character token support.
    # Split any tokens that were incomplete on the last iteration but complete now.
    entities.map! do |e|
      e.split @delimiter, -1
    end

    # Flatten the resulting array.  This has the side effect of removing the empty
    # entry at the end that was produced by passing -1 to split.  Add it again if
    # necessary.
    if (entities[-1] == [])
      entities.flatten! << []
    else
      entities.flatten!
    end
=end

    # Now that we've hit a token, joined the input buffer and added it to the entities
    # list, we can go ahead and clear the input buffer.  All of the segments that were
    # stored before the join can now be garbage collected.
    @input.clear

    # The last entity in the list is not token delimited, however, thanks to the -1
    # passed to split.  It represents the beginning of a new list of as-yet-untokenized
    # data, so we add it to the start of the list.
    @input << entities.pop

    # Set the new input buffer size, provided we're keeping track
    @input_size = @input.first.size if @size_limit

    # Now we're left with the list of extracted token-delimited entities we wanted
    # in the first place.  Hooray!
    entities
  end
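
The buffering behavior described in the comments above can be seen directly (return values shown as comments):

 tokenizer = BufferedTokenizer.new
 tokenizer.extract("foo")       #=> []          (no delimiter yet; "foo" is buffered)
 tokenizer.extract("bar\nbaz")  #=> ["foobar"]  ("baz" stays buffered)
 tokenizer.extract("\n")        #=> ["baz"]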
flush()

Flush the contents of the input buffer, i.e. return and clear the buffered input even though a token has not yet been encountered.

  # File lib/em/buftok.rb, line 128
  def flush
    buffer = @input.join
    @input.clear
    buffer
  end
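
Together with empty?, flush lets a caller drain whatever partial data remains when a stream ends without a trailing delimiter. A minimal sketch:

 tokenizer = BufferedTokenizer.new
 tokenizer.extract("trailing data, no newline")  #=> []
 tokenizer.empty?                                #=> false
 tokenizer.flush                                 #=> "trailing data, no newline"
 tokenizer.empty?                                #=> true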
