Object
encoding: UTF-8
HTML entity encoding and decoding for Ruby
Legacy compatibility class method allowing direct decoding of XHTML1 entities. See HTMLEntities#decode for description of parameters.
Deprecated.
# File lib/htmlentities/legacy.rb, line 20
def decode_entities(*args)
  xhtml1_entities.decode(*args)
end
Legacy compatibility class method allowing direct encoding of XHTML1 entities. See HTMLEntities#encode for description of parameters.
Deprecated.
# File lib/htmlentities/legacy.rb, line 10
def encode_entities(*args)
  xhtml1_entities.encode(*args)
end
Create a new HTMLEntities coder for the specified flavor. Available flavors are ‘html4’, ‘expanded’ and ‘xhtml1’ (the default).
The only difference in functionality between html4 and xhtml1 is in the handling of the apos (apostrophe) named entity, which is not defined in HTML4.
‘expanded’ includes a large number of additional SGML entities drawn from:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT
That file “maps SGML character entities from various public sets (namely, ISOamsa, ISOamsb, ISOamsc, ISOamsn, ISOamso, ISOamsr, ISObox, ISOcyr1, ISOcyr2, ISOdia, ISOgrk1, ISOgrk2, ISOgrk3, ISOgrk4, ISOlat1, ISOlat2, ISOnum, ISOpub, ISOtech, HTMLspecial, HTMLsymbol) to corresponding Unicode characters” (sgml.txt).
‘expanded’ is a strict superset of the XHTML entities: every XHTML named entity encodes and decodes the same under :expanded as under :xhtml1.
# File lib/htmlentities.rb, line 33
def initialize(flavor='xhtml1')
  @flavor = flavor.to_s.downcase
  raise UnknownFlavor, "Unknown flavor #{flavor}" unless FLAVORS.include?(@flavor)
end
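The flavor check in the constructor can be tried in isolation. The following is a minimal stand-alone sketch, not the library's code: `SKETCH_FLAVORS`, `SketchUnknownFlavor` and `normalize_flavor` are hypothetical names, and the superclass of the error is an assumption, since the listing above does not show how `FLAVORS` or `UnknownFlavor` are defined.

```ruby
# Hypothetical stand-ins for the library's FLAVORS and UnknownFlavor.
# The flavor names are the three listed in the description above.
SKETCH_FLAVORS = %w[html4 xhtml1 expanded].freeze

# Assumption: the real error class may have a different superclass.
class SketchUnknownFlavor < ArgumentError; end

# Mirrors the constructor logic: accept a String or Symbol in any case,
# normalize it to a lowercase String, and reject anything not listed.
def normalize_flavor(flavor = 'xhtml1')
  name = flavor.to_s.downcase
  raise SketchUnknownFlavor, "Unknown flavor #{flavor}" unless SKETCH_FLAVORS.include?(name)

  name
end
```

Because of the `to_s.downcase` step, `:XHTML1`, `'xhtml1'` and `:xhtml1` all select the same flavor, while a name such as `'html5'` raises.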
Decode entities in a string into their UTF-8 equivalents. The string should already be in UTF-8 encoding.
Unknown named entities will not be converted.
# File lib/htmlentities.rb, line 44
def decode(source)
  Decoder.new(@flavor).decode(source)
end
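To illustrate the behavior described for decoding (numeric and known named entities become UTF-8 characters, unknown named entities are left alone), here is a small self-contained sketch. It is not the library's Decoder: `SKETCH_DECODE_NAMED`, `SKETCH_ENTITY` and `decode_sketch` are hypothetical names, and the entity table is a tiny illustrative subset rather than any real flavor's map.

```ruby
# Illustrative subset only; the real flavors define many more names.
SKETCH_DECODE_NAMED = {
  'amp' => 38, 'lt' => 60, 'gt' => 62, 'quot' => 34, 'apos' => 39,
  'eacute' => 233
}.freeze

# Matches decimal (&#1234;), hexadecimal (&#x4d2;) and named (&eacute;) forms.
SKETCH_ENTITY = /&(?:#(\d+)|#x(\h+)|([a-z][a-z0-9]*));/i

def decode_sketch(source)
  source.gsub(SKETCH_ENTITY) do
    match = Regexp.last_match
    codepoint =
      if match[1] then match[1].to_i        # decimal reference
      elsif match[2] then match[2].to_i(16) # hexadecimal reference
      else SKETCH_DECODE_NAMED[match[3]]    # named entity, nil if unknown
      end
    # Leave unknown names and out-of-range codepoints exactly as written.
    if codepoint && codepoint <= 0x10FFFF
      [codepoint].pack('U')
    else
      match[0]
    end
  end
end
```

With this table, `decode_sketch("caf&eacute; &bogus;")` converts `&eacute;` but returns `&bogus;` unchanged, which matches the rule stated above for unknown named entities.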
Encode codepoints into their corresponding entities. Various operations are possible, and may be specified in order:
:basic | Convert the five XML entities (' " < > &) |
:named | Convert non-ASCII characters to their named HTML 4.01 equivalent |
:decimal | Convert non-ASCII characters to decimal entities (e.g. &#1234;) |
:hexadecimal | Convert non-ASCII characters to hexadecimal entities (e.g. &#x4d2;) |
You can specify the commands in any order, but they will be executed in the order listed above to ensure that entity ampersands are not clobbered and that named entities are replaced before numeric ones.
If no instructions are specified, :basic will be used.
Examples:
encode_entities(str) - XML-safe
encode_entities(str, :basic, :decimal) - XML-safe and 7-bit clean
encode_entities(str, :basic, :named, :decimal) - 7-bit clean, with all non-ASCII characters replaced with their named entity where possible, and decimal equivalents otherwise.
Note: It is the program’s responsibility to ensure that the source contains valid UTF-8 before calling this method.
# File lib/htmlentities.rb, line 73
def encode(source, *instructions)
  Encoder.new(@flavor, instructions).encode(source)
end
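The fixed execution order of the encode instructions can be sketched as follows. This is a stand-alone illustration under stated assumptions, not the library's Encoder: `encode_sketch` and the `SKETCH_*` constants are hypothetical, the named-entity table is a two-entry subset, and rendering the apostrophe as `&apos;` is an assumption (that name is not defined in HTML4, as noted above).

```ruby
# Assumption: apostrophe rendered as &apos;; the library's choice may differ.
SKETCH_BASIC = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze

# Illustrative subset of codepoint => entity name.
SKETCH_ENCODE_NAMED = { 0xE9 => 'eacute', 0xA9 => 'copy' }.freeze

# Execution order is fixed, whatever order the caller passes.
SKETCH_ORDER = %i[basic named decimal hexadecimal].freeze
SKETCH_NON_ASCII = /[^\x00-\x7F]/

def encode_sketch(source, *instructions)
  instructions = [:basic] if instructions.empty?
  unknown = instructions - SKETCH_ORDER
  raise ArgumentError, "unknown instructions: #{unknown.inspect}" unless unknown.empty?

  # Array#& keeps the order of SKETCH_ORDER, so :basic always runs first
  # (before any entity ampersands exist) and :named runs before numeric forms.
  (SKETCH_ORDER & instructions).reduce(source) do |text, step|
    case step
    when :basic
      text.gsub(/[&<>"']/, SKETCH_BASIC)
    when :named
      text.gsub(SKETCH_NON_ASCII) do |char|
        name = SKETCH_ENCODE_NAMED[char.ord]
        name ? format('&%s;', name) : char
      end
    when :decimal
      text.gsub(SKETCH_NON_ASCII) { |char| format('&#%d;', char.ord) }
    when :hexadecimal
      text.gsub(SKETCH_NON_ASCII) { |char| format('&#x%x;', char.ord) }
    end
  end
end
```

Passing `:decimal, :named, :basic` gives the same result as `:basic, :named, :decimal`: characters with a name in the table become named entities, and the remaining non-ASCII characters fall through to decimal references.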
Generated with the Darkfish Rdoc Generator 1.1.6.