A tokenizer for the Ruby language. It recognizes all common syntax (and some less common syntax), but because it is not a true lexer, it will make mistakes in some ambiguous cases.
The list of all identifiers recognized as keywords.
Perform Ruby-specific setup.
    # File lib/syntax/lang/ruby.rb, line 18
    def setup
      @selector = false
      @allow_operator = false
      @heredocs = []
    end
Step through a single iteration of the tokenization process.
    # File lib/syntax/lang/ruby.rb, line 25
    def step
      case
        when bol? && check( /=begin/ )
          start_group( :comment, scan_until( /^=end#{EOL}/ ) )
        when bol? && check( /__END__#{EOL}/ )
          start_group( :comment, scan_until( /\Z/ ) )
        else
          case
            when check( /def\s+/ )
              start_group :keyword, scan( /def\s+/ )
              start_group :method, scan_until( /(?=[;(\s]|#{EOL})/ )
            when check( /class\s+/ )
              start_group :keyword, scan( /class\s+/ )
              start_group :class, scan_until( /(?=[;\s<]|#{EOL})/ )
            when check( /module\s+/ )
              start_group :keyword, scan( /module\s+/ )
              start_group :module, scan_until( /(?=[;\s]|#{EOL})/ )
            when check( /::/ )
              start_group :punct, scan(/::/)
            when check( /:"/ )
              start_group :symbol, scan(/:/)
              scan_delimited_region :symbol, :symbol, "", true
              @allow_operator = true
            when check( /:'/ )
              start_group :symbol, scan(/:/)
              scan_delimited_region :symbol, :symbol, "", false
              @allow_operator = true
            when scan( /:[_a-zA-Z@$][$@\w]*[=!?]?/ )
              start_group :symbol, matched
              @allow_operator = true
            when scan( /\?(\\[^\n\r]|[^\\\n\r\s])/ )
              start_group :char, matched
              @allow_operator = true
            when check( /(__FILE__|__LINE__|true|false|nil|self)[?!]?/ )
              if @selector || matched[-1] == ?? || matched[-1] == ?!
                start_group :ident,
                  scan(/(__FILE__|__LINE__|true|false|nil|self)[?!]?/)
              else
                start_group :constant,
                  scan(/(__FILE__|__LINE__|true|false|nil|self)/)
              end
              @selector = false
              @allow_operator = true
            when scan(/0([bB][01]+|[oO][0-7]+|[dD][0-9]+|[xX][0-9a-fA-F]+)/)
              start_group :number, matched
              @allow_operator = true
            else
              case peek(2)
                when "%r"
                  scan_delimited_region :punct, :regex, scan( /../ ), true
                  @allow_operator = true
                when "%w", "%q"
                  scan_delimited_region :punct, :string, scan( /../ ), false
                  @allow_operator = true
                when "%s"
                  scan_delimited_region :punct, :symbol, scan( /../ ), false
                  @allow_operator = true
                when "%W", "%Q", "%x"
                  scan_delimited_region :punct, :string, scan( /../ ), true
                  @allow_operator = true
                when /%[^\sa-zA-Z0-9]/
                  scan_delimited_region :punct, :string, scan( /./ ), true
                  @allow_operator = true
                when "<<"
                  saw_word = ( chunk[1,1] =~ /[\w!?]/ )
                  start_group :punct, scan( /<</ )
                  if saw_word
                    @allow_operator = false
                    return
                  end

                  float_right = scan( /-/ )
                  append "-" if float_right
                  if ( type = scan( /['"]/ ) )
                    append type
                    delim = scan_until( /(?=#{type})/ )
                    if delim.nil?
                      append scan_until( /\Z/ )
                      return
                    end
                  else
                    delim = scan( /\w+/ ) or return
                  end
                  start_group :constant, delim
                  start_group :punct, scan( /#{type}/ ) if type
                  @heredocs << [ float_right, type, delim ]
                  @allow_operator = true
                else
                  case peek(1)
                    when /[\n\r]/
                      unless @heredocs.empty?
                        scan_heredoc(*@heredocs.shift)
                      else
                        start_group :normal, scan( /\s+/ )
                      end
                      @allow_operator = false
                    when /\s/
                      start_group :normal, scan( /\s+/ )
                    when "#"
                      start_group :comment, scan( /#[^\n\r]*/ )
                    when /[A-Z]/
                      start_group @selector ? :ident : :constant, scan( /\w+/ )
                      @allow_operator = true
                    when /[a-z_]/
                      word = scan( /\w+[?!]?/ )
                      if !@selector && KEYWORDS.include?( word )
                        start_group :keyword, word
                        @allow_operator = false
                      else
                        start_group :ident, word
                        @allow_operator = true
                      end
                      @selector = false
                    when /\d/
                      start_group :number,
                        scan( /[\d_]+(\.[\d_]+)?([eE][\d_]+)?/ )
                      @allow_operator = true
                    when '"'
                      scan_delimited_region :punct, :string, "", true
                      @allow_operator = true
                    when '/'
                      if @allow_operator
                        start_group :punct, scan(%{/})
                        @allow_operator = false
                      else
                        scan_delimited_region :punct, :regex, "", true
                        @allow_operator = true
                      end
                    when "'"
                      scan_delimited_region :punct, :string, "", false
                      @allow_operator = true
                    when "."
                      dots = scan( /\.{1,3}/ )
                      start_group :punct, dots
                      @selector = ( dots.length == 1 )
                    when /[@]/
                      start_group :attribute, scan( /@{1,2}\w*/ )
                      @allow_operator = true
                    when /[$]/
                      start_group :global, scan(/\$/)
                      start_group :global, scan( /\w+|./ ) if check(/./)
                      @allow_operator = true
                    when /[-!?*\/+=<>(\[\{}:;,&|%]/
                      start_group :punct, scan(/./)
                      @allow_operator = false
                    when /[)\]]/
                      start_group :punct, scan(/./)
                      @allow_operator = true
                    else
                      # all else just falls through this, to prevent
                      # infinite loops...
                      append getch
                  end
              end
          end
      end
    end
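The role of the @allow_operator flag in step can be easier to see in isolation. The following is a minimal sketch, not the library's code: classify_slashes is a hypothetical helper that uses the standard strscan library and a local allow_operator flag to decide whether each "/" is a division operator or the start of a regex literal. It ignores strings, comments, and escaped slashes, so it only approximates what step does.

```ruby
require 'strscan'

# Hypothetical helper (not part of the Syntax library): report, for every "/"
# in +source+, whether a flag-based tokenizer would treat it as division or as
# the opening of a regex literal.
def classify_slashes(source)
  scanner = StringScanner.new(source)
  allow_operator = false # at the start of input an operand is expected
  results = []

  until scanner.eos?
    if scanner.scan(/[\n\r]+/)
      # A new line starts a new expression, so an operand is expected again.
      allow_operator = false
    elsif scanner.scan(/[ \t]+/)
      # Whitespace inside a line leaves the flag unchanged.
    elsif scanner.scan(/\w+/) || scanner.scan(/[)\]]/)
      # After a word, a number, or a closing bracket an operator may follow.
      allow_operator = true
    elsif scanner.scan(%r{/})
      if allow_operator
        results << :divide
        allow_operator = false
      else
        results << :regex_start
        # Assumption: the regex body contains no escaped slashes.
        scanner.scan_until(%r{/})
        allow_operator = true
      end
    else
      # Any other punctuation is assumed to expect an operand next.
      scanner.getch
      allow_operator = false
    end
  end

  results
end

p classify_slashes("a / b")
p classify_slashes("x = /ab/ / 2")
```

The same heuristic explains why the tokenizer can misclassify ambiguous input: the decision depends only on the previous token, not on a full parse.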
Scan a delimited region of text. This handles the simple cases (strings delimited with quotes) as well as the more complex cases of %-strings and here-documents.
delim_group is the group to use to classify the delimiters of the region
inner_group is the group to use to classify the contents of the region
starter is the text to use as the starting delimiter
exprs is a boolean flag indicating whether the region is an interpolated string or not
delim is the text to use as the delimiter of the region. If nil, the next character will be treated as the delimiter.
heredoc is either false, meaning the region is not a heredoc, or :flush (meaning the delimiter must be flushed left), or :float (meaning the delimiter doesn't have to be flush left).
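To illustrate how the delim and heredoc parameters above shape the end-of-region test, here is a small sketch under stated assumptions. CLOSERS, closing_delimiter, and terminator_pattern are hypothetical names introduced for this example, and the patterns are simplified: they omit the escape and interpolation alternatives that the real method also searches for.

```ruby
# Hypothetical helpers (not part of the Syntax library).

# Opening brackets close with their partner; any other character closes itself.
CLOSERS = { "{" => "}", "(" => ")", "[" => "]", "<" => ">" }.freeze

def closing_delimiter(opener)
  CLOSERS.fetch(opener, opener)
end

# Build a simplified pattern that matches the end of a region.
#   heredoc == false  : the delimiter may appear anywhere
#   heredoc == :flush : the delimiter must start at the beginning of a line
#   heredoc == :float : the delimiter may be preceded by spaces or tabs
def terminator_pattern(delim, heredoc = false)
  escaped = Regexp.escape(delim)
  case heredoc
  when :flush then /^#{escaped}$/
  when :float then /^[ \t]*#{escaped}$/
  else Regexp.new(escaped)
  end
end

p closing_delimiter("{")
p terminator_pattern("EOS", :float)
```

For example, terminator_pattern("EOS", :flush) does not match an indented "  EOS" line, while the :float variant does.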
    # File lib/syntax/lang/ruby.rb, line 201
    def scan_delimited_region( delim_group, inner_group, starter, exprs,
      delim=nil, heredoc=false )
    # begin
      if !delim
        start_group delim_group, starter
        delim = scan( /./ )
        append delim

        delim = case delim
          when '{' then '}'
          when '(' then ')'
          when '[' then ']'
          when '<' then '>'
          else delim
        end
      end

      start_region inner_group

      items = "\\\\|"
      if heredoc
        items << "(^"
        items << '\s*' if heredoc == :float
        items << "#{Regexp.escape(delim)}\s*?)#{EOL}"
      else
        items << "#{Regexp.escape(delim)}"
      end
      items << "|#(\\$|@@?|\\{)" if exprs
      items = Regexp.new( items )

      loop do
        p = pos
        match = scan_until( items )
        if match.nil?
          start_group inner_group, scan_until( /\Z/ )
          break
        else
          text = pre_match[p..-1]
          start_group inner_group, text if text.length > 0
          case matched.strip
            when "\\"
              unless exprs
                case peek(1)
                  when "'"
                    scan(/./)
                    start_group :escape, "\\'"
                  when "\\"
                    scan(/./)
                    start_group :escape, "\\\\"
                  else
                    start_group inner_group, "\\"
                end
              else
                start_group :escape, "\\"
                c = getch
                append c
                case c
                  when 'x'
                    append scan( /[a-fA-F0-9]{1,2}/ )
                  when /[0-7]/
                    append scan( /[0-7]{0,2}/ )
                end
              end
            when delim
              end_region inner_group
              start_group delim_group, matched
              break
            when /^#/
              do_highlight = (option(:expressions) == :highlight)
              start_region :expr if do_highlight
              start_group :expr, matched
              case matched[1]
                when ?{
                  depth = 1
                  content = ""
                  while depth > 0
                    p = pos
                    c = scan_until( /[\{}]/ )
                    if c.nil?
                      content << scan_until( /\Z/ )
                      break
                    else
                      depth += ( matched == "{" ? 1 : -1 )
                      content << pre_match[p..-1]
                      content << matched if depth > 0
                    end
                  end
                  if do_highlight
                    subtokenize "ruby", content
                    start_group :expr, "}"
                  else
                    append content + "}"
                  end
                when ?$, ?@
                  append scan( /\w+/ )
              end
              end_region :expr if do_highlight
            else raise "unexpected match on #{matched}"
          end
        end
      end
    end
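The brace-counting loop used for interpolated expressions can be sketched on its own. This is an illustrative reimplementation rather than the library's code: read_interpolation is a hypothetical helper built on the standard strscan library, and it assumes the scanner is positioned just after the opening "#{". Like the listing above, it counts braces naively, so a brace inside a nested string literal would be miscounted.

```ruby
require 'strscan'

# Hypothetical helper (not part of the Syntax library): return the text of an
# interpolated expression and consume its closing "}". The scanner is assumed
# to be positioned immediately after the opening "#{".
def read_interpolation(scanner)
  depth = 1
  content = String.new

  while depth > 0
    start = scanner.pos
    if scanner.scan_until(/[{}]/).nil?
      # Unterminated expression: take everything that is left.
      content << scanner.rest
      scanner.terminate
      break
    end

    # An opening brace nests one level deeper, a closing brace unwinds one.
    depth += (scanner.matched == "{" ? 1 : -1)
    # pre_match runs from the start of the input, so slice off the part
    # that was already consumed before this iteration.
    content << scanner.pre_match[start..-1]
    # Keep inner braces, but drop the final closing brace.
    content << scanner.matched if depth > 0
  end

  content
end

scanner = StringScanner.new("a + {1=>2}[1]} tail")
p read_interpolation(scanner)
p scanner.rest
```

Tracking depth rather than stopping at the first "}" is what lets an expression such as a hash literal appear inside the interpolation.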
Scan a heredoc beginning at the current position.
float indicates whether the delimiter may be floated to the right
type is nil, a single quote, or a double quote
delim is the delimiter to look for
    # File lib/syntax/lang/ruby.rb, line 309
    def scan_heredoc(float, type, delim)
      scan_delimited_region( :constant, :string, "", type != "'",
        delim, float ? :float : :flush )
    end
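As a rough illustration of where the three arguments come from, the sketch below parses a heredoc opener into a [float, type, delim] triple and derives the two settings that scan_heredoc passes on. parse_heredoc_opener and heredoc_options are hypothetical helpers written for this example with the standard strscan library; they are not part of the Syntax library.

```ruby
require 'strscan'

# Hypothetical helper: parse text such as "<<-'EOS'" into [float, type, delim].
# Returns nil when the text is not a heredoc opener.
def parse_heredoc_opener(text)
  scanner = StringScanner.new(text)
  return nil unless scanner.scan(/<</)

  float = scanner.scan(/-/)     # "-" when the terminator may be indented
  type = scanner.scan(/['"]/)   # quote character, or nil when unquoted
  delim = if type
    scanner.scan_until(/(?=#{type})/) # everything up to the closing quote
  else
    scanner.scan(/\w+/)
  end
  return nil if delim.nil?

  [float, type, delim]
end

# Hypothetical helper: derive the settings implied by the triple. Only a
# single-quoted heredoc disables interpolation.
def heredoc_options(float, type, _delim)
  { interpolate: type != "'", mode: float ? :float : :flush }
end

triple = parse_heredoc_opener("<<-'EOS'")
p triple
p heredoc_options(*triple)
```

An unquoted or double-quoted opener yields interpolate: true, which corresponds to the type != "'" argument in the listing above.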