Syntax::XML

A simple XML lexer that handles most common constructs. It is not a validating lexer: invalid XML is tokenized as well as it can be, and no error is raised.

Public Instance Methods

setup()

Initialize the lexer. Scanning starts outside of any tag.

    # File lib/syntax/lang/xml.rb, line 11
    def setup
      @in_tag = false
    end

step()

Perform a single iteration of the tokenization process. One call may yield several tokens, or none at all.

    # File lib/syntax/lang/xml.rb, line 17
    def step
      start_group :normal, matched if scan( /\s+/ )
      if @in_tag
        # Inside a tag: attribute names, values, and the closing bracket.
        case
          when scan( /([-\w]+):([-\w]+)/ )
            start_group :namespace, subgroup(1)
            start_group :punct, ":"
            start_group :attribute, subgroup(2)
          when scan( /\d+/ )
            start_group :number, matched
          when scan( /[-\w]+/ )
            start_group :attribute, matched
          when scan( %r{[/?]?>} )
            # ">", "/>" or "?>" ends the tag.
            @in_tag = false
            start_group :punct, matched
          when scan( /=/ )
            start_group :punct, matched
          when scan( /["']/ )
            scan_string matched
          else
            append getch
        end
      elsif ( text = scan_until( /(?=[<&])/ ) )
        # Outside a tag: plain text up to the next "<" or "&".
        start_group :normal, text unless text.empty?
        if scan(/<!--.*?(-->|\Z)/)
          start_group :comment, matched
        else
          case peek(1)
            when "<"
              start_group :punct, getch
              case peek(1)
                when "?"
                  append getch
                when "/"
                  append getch
                when "!"
                  append getch
              end
              start_group :normal, matched if scan( /\s+/ )
              if scan( /([-\w]+):([-\w]+)/ )
                start_group :namespace, subgroup(1)
                start_group :punct, ":"
                start_group :tag, subgroup(2)
              elsif scan( /[-\w]+/ )
                start_group :tag, matched
              end
              @in_tag = true
            when "&"
              if scan( /&\S{1,10};/ )
                start_group :entity, matched
              else
                start_group :normal, scan( /&/ )
              end
          end
        end
      else
        # No more markup: the rest of the input is plain text.
        append scan_until( /\Z/ )
      end
    end
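
The two-mode structure of step (one set of rules inside a tag, another outside) can be illustrated with a minimal, self-contained sketch built on Ruby's standard StringScanner. This is not the library's implementation: sketch_tokens is a hypothetical helper, it returns [group, text] pairs instead of calling start_group, it leaves out namespaces, numbers, comments and entities, and each pass through its loop only roughly corresponds to one call to step.

```ruby
require 'strscan'

# Hypothetical helper, not part of Syntax::XML.
# Returns an array of [group, text] pairs for the given input.
def sketch_tokens(xml)
  scanner = StringScanner.new(xml)
  tokens  = []
  in_tag  = false

  until scanner.eos?
    if in_tag
      # Inside a tag: whitespace, closing bracket, names, "=", quoted values.
      if scanner.scan(/\s+/)
        tokens << [:normal, scanner.matched]
      elsif scanner.scan(%r{[/?]?>})
        in_tag = false
        tokens << [:punct, scanner.matched]
      elsif scanner.scan(/[-\w]+/)
        tokens << [:attribute, scanner.matched]
      elsif scanner.scan(/=/)
        tokens << [:punct, scanner.matched]
      elsif scanner.scan(/(["']).*?\1/m)
        tokens << [:string, scanner.matched]
      else
        # Unknown character: pass it through instead of raising.
        tokens << [:normal, scanner.getch]
      end
    elsif scanner.scan(/[^<]+/)
      # Outside a tag: plain text up to the next "<".
      tokens << [:normal, scanner.matched]
    else
      # At "<": emit the opening punctuation and an optional tag name.
      scanner.scan(%r{<[/?!]?})
      tokens << [:punct, scanner.matched]
      tokens << [:tag, scanner.matched] if scanner.scan(/[-\w]+/)
      in_tag = true
    end
  end
  tokens
end

sketch_tokens('<a href="x">hi</a>').each do |group, text|
  puts "#{group}: #{text.inspect}"
end
```

Because every branch consumes at least one character and none of them raises, malformed input such as '<a <b' is still split into tokens, which is the sense in which a lexer of this kind is non-validating.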

Private Instance Methods

scan_string( delim )

Scan a quoted string that starts at the current position and ends at the given delimiter character. Entities and backslash escapes inside the string are handled separately.

    # File lib/syntax/lang/xml.rb, line 81
    def scan_string( delim )
      start_group :punct, delim
      # Stop (without consuming) before "&", a backslash, or the delimiter.
      match = /(?=[&\\]|#{delim})/
      loop do
        break unless ( text = scan_until( match ) )
        start_group :string, text unless text.empty?
        case peek(1)
          when "&"
            if scan( /&\S{1,10};/ )
              start_group :entity, matched
            else
              start_group :string, getch
            end
          when "\\"
            # Keep the backslash and the character it escapes in the string.
            start_group :string, getch
            append getch || ""
          when delim
            start_group :punct, getch
            break
        end
      end
    end
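
The lookahead technique used by scan_string, stopping just before the next interesting character and then deciding what to do based on a one-character peek, can be tried in isolation with the standard StringScanner. The sketch below is an illustration under assumptions, not the library's code: sketch_string_tokens is a hypothetical helper, it returns [group, text] pairs rather than calling start_group, it expects the scanner to sit just after the opening quote, and it escapes the delimiter with Regexp.escape, which the original does not do.

```ruby
require 'strscan'

# Hypothetical helper, not part of Syntax::XML.
# The scanner must be positioned just after the opening quote.
def sketch_string_tokens(scanner, delim)
  tokens = [[:punct, delim]]
  # Zero-width lookahead: scan_until returns the text before the stop
  # character but leaves that character unconsumed.
  stop = /(?=[&\\]|#{Regexp.escape(delim)})/

  while (text = scanner.scan_until(stop))
    tokens << [:string, text] unless text.empty?
    case scanner.peek(1)
    when "&"
      if scanner.scan(/&\S{1,10};/)
        tokens << [:entity, scanner.matched]
      else
        # A bare "&" is treated as ordinary string content.
        tokens << [:string, scanner.getch]
      end
    when "\\"
      # Keep the backslash together with the character it escapes.
      tokens << [:string, scanner.getch + (scanner.getch || "")]
    when delim
      tokens << [:punct, scanner.getch]
      break
    end
  end
  tokens
end

scanner = StringScanner.new('"a &amp; b" rest')
scanner.getch # consume the opening quote
p sketch_string_tokens(scanner, '"')
```

If the closing delimiter never appears, scan_until eventually returns nil and the loop ends without an error, so an unterminated string is tolerated rather than reported.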
