The original w3m (denoted by w3m below; the executable file is assumed to have the same name) reads its configuration options from
w3m with the multiple character encoding extension (denoted by w3mmee below; the executable file is assumed to have the same name) needs to know the range of encoding schemes considered for automatic detection, the encodings which your terminal accepts, how encodings and character sets are converted, the messages localized for your language, and so on. Hence its startup flow is somewhat complicated.
First, w3mmee examines the value of the environment variable ``W3MLANG'' (or ``LANG'' if ``W3MLANG'' is unset). It converts the letters in the value to lower case, and treats the value as having the form:
<language code>+"_"(underscore)+<country code>+"."(period)+<encoding>
For instance, if ``W3MLANG'' has the value ``ja_JP.UTF-8'', w3mmee will get the language code ``ja'', the country code ``jp'', and the encoding ``utf-8''.
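The decomposition described above can be sketched in Python as follows. This is only an illustration, not w3mmee's actual code: the names `LangSpec`, `parse_lang`, and `lang_from_env` are hypothetical, and the handling of values that lack a country code or an encoding is an assumption, since the text does not say what happens in that case.

```python
from typing import Mapping, NamedTuple, Optional


class LangSpec(NamedTuple):
    language: Optional[str]
    country: Optional[str]
    encoding: Optional[str]


def parse_lang(value: str) -> LangSpec:
    """Split a W3MLANG/LANG-style value into lowercased components."""
    value = value.lower()
    # Text after the first period is taken as the encoding.
    rest, _, encoding = value.partition(".")
    # Text after the first underscore is taken as the country code.
    language, _, country = rest.partition("_")
    # Assumption: a missing component is reported as None.
    return LangSpec(language or None, country or None, encoding or None)


def lang_from_env(env: Mapping[str, str]) -> LangSpec:
    """Prefer W3MLANG, falling back to LANG when W3MLANG is unset."""
    value = env["W3MLANG"] if "W3MLANG" in env else env.get("LANG", "")
    return parse_lang(value)


print(parse_lang("ja_JP.UTF-8"))
```

In real use, `lang_from_env(os.environ)` would read the process environment; a plain mapping is accepted here so that the sketch is easy to try out.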
From these components, w3mmee composes file names:
Next, it reads expositions of the options displayed in the option setup panel from the files:
Lines of the form
<option name>+"="(equal sign)+<exposition>
are recognized as definitions of expositions. Spaces at the beginning of lines, at the end of lines, before equal signs, and after equal signs are removed.
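A minimal Python sketch of the line format just described is given below. The function name `parse_expositions` is hypothetical, and skipping lines without an equal sign is an assumption rather than documented behavior.

```python
def parse_expositions(lines):
    """Collect '<option name>=<exposition>' definitions from text lines."""
    expositions = {}
    for line in lines:
        # Spaces at the beginning and the end of the line are removed.
        line = line.strip()
        name, sep, text = line.partition("=")
        if not sep or not name.strip():
            # Assumption: lines that are not definitions are ignored.
            continue
        # Spaces before and after the equal sign are removed.
        expositions[name.rstrip()] = text.lstrip()
    return expositions


sample = ["  tty_charset = Encoding of terminal I/O  ", "not a definition"]
print(parse_expositions(sample))
```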
Finally
per user configuration file of the same format as $LIB_DIR/w3mconfig*,
per user message setup file of the same format as $LIB_DIR/w3mmessages*,
The contents of this section apply only when you have configured w3mmee to use gettext().
When the return value of the gettext() function contains non US-ASCII characters, the encoding of such characters must be converted to the internal one. gettext() determines the encoding of its output from the codeset name in the current locale, while w3mmee uses MIME charset names. Unfortunately, the codeset name and the MIME charset name for an encoding scheme differ from each other in general, so w3mmee needs a mapping table between them.
Though such a table is already built into w3mmee, it is quite possible that the table is insufficient in your environment. In that case you can tell w3mmee additional correspondences with files
<MIME charset name>+"="(equal sign)+<lang. spec>[+","(comma)+...]
where you may add optional spaces around "=" and ",". <lang. spec> must be a string of the form
<language code>+"_"+<country code>+"."+<codeset name>
where any (but not all) of <language code>, "_"+<country code>, or "."+<codeset name> may be omitted.
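The mapping line format above could be parsed roughly as in the following Python sketch. The function names are hypothetical, the sample charset and codeset names are only illustrative, and the exact characters w3mmee allows inside each component are not stated in the text, so the regular expression is an assumption.

```python
import re

# Each of the three parts is optional, but at least one must be present.
_LANG_SPEC = re.compile(
    r"(?P<language>[^_.]+)?(?:_(?P<country>[^_.]+))?(?:\.(?P<codeset>.+))?"
)


def parse_lang_spec(spec):
    """Return (language, country, codeset); omitted parts are None."""
    m = _LANG_SPEC.fullmatch(spec)
    if m is None or not any(m.groups()):
        raise ValueError("invalid <lang. spec>: %r" % spec)
    return m.group("language"), m.group("country"), m.group("codeset")


def parse_mapping_line(line):
    """Parse '<MIME charset name>=<lang. spec>[,...]' into (charset, specs)."""
    charset, sep, specs = line.partition("=")
    if not sep:
        raise ValueError("missing '=': %r" % line)
    # Optional spaces around "=" and "," are dropped.
    return charset.strip(), [parse_lang_spec(s.strip()) for s in specs.split(",")]


print(parse_mapping_line("EUC-JP = ja_JP.eucJP , .ujis"))
```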
The following is the list of new configuration options concerning character encoding added by the multiple character encoding extension.
Specifies your language. Currently, the value of this option is used only to restrict the range of encoding schemes considered for autodetection.
For example, assume that you have specified
mylang cjk
and try to read a document with no charset specification. Then w3mmee tries to find the encoding scheme among
You can also specify a comma separated list of names of character encoding schemes. In this case, the encoding schemes are used as candidates for autodetection.
Specifies the encoding scheme assumed for a document whose encoding scheme w3mmee fails to autodetect.
Specifies encoding scheme of terminal I/O.
Using this option is deprecated. Please use tty_initial_input_charset and tty_initial_output_charset instead.
When ISO 2022 conforming encoding scheme is specified with tty_charset, initial state of intermediate buffers of that encoding for input stream from tty can be modified to that of encoding scheme specified with this option.
When ISO 2022 conforming encoding scheme is specified with tty_charset, initial state of intermediate buffers of that encoding for output stream to tty can be modified to that of encoding scheme specified with this option.
Specifies conversions of encoding scheme and character set of terminal input.
Please use this option only if you completely understand behavior of the support library used by multiple character encoding extension.
Specifies conversions of encoding scheme and character set of terminal output.
Please use this option only if you completely understand behavior of the support library used by multiple character encoding extension.
Unless the terminal can display a character or a replacement string is specified for the character, the conversions specified by this option are applied to the character.
Please use this option only if you completely understand behavior of the support library used by multiple character encoding extension.
Specifies the encoding scheme of a document which contains no charset specification, and makes w3mmee skip autodetection of the encoding scheme.
Specifies conversion of encoding scheme and character set of characters input from network or a local file.
Please use this option only if you completely understand behavior of the support library used by multiple character encoding extension.
When a document contains no charset specification and w3mmee fails to autodetect encoding scheme of the document, w3mmee assumes that name of encoding scheme of the document is that specified by this option.
If the document contains a form requiring text input, the argument is passed to the action of the form after conversion to the encoding. Currently this is the only case affected by this option.
Specifies conversion of encoding scheme and character set of characters output to network or a local file.
Please use this option only if you completely understand behavior of the support library used by multiple character encoding extension.
Specifies encoding schemes for strings which may be passed to a local process, such as arguments for bookmark registration program.
<string> must be a space separated list of charset specifications of the following form:
<sep1>+<regular expression for process name>+<sep2>+<charset>
or
<charset>
Each space separated token is treated as the first form if its first character is non-alphanumeric. Otherwise it is regarded as the second form. In the first form, if <sep1> is "(", "{", "[", "<", or "^", <sep2> must be ")", "}", "]", ">", or "$", respectively. Otherwise <sep2> must equal <sep1>. <sep1> and <sep2> are treated as part of the regular expression only if they are "^" and "$", respectively. The second form is an abbreviation of
"^.*$"+<charset>
Given a process name, the regular expressions are matched against the name in order. The charset corresponding to the first expression that matches is adopted.
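The token rules and the first-match lookup above can be sketched in Python as follows. Several points are assumptions and not taken from w3mmee itself: Python's `re` module stands in for w3mmee's own regular expression engine, the last occurrence of <sep2> in a token is taken as the closing separator, and an unanchored search is used for matching. The function names and the sample process names are hypothetical.

```python
import re

# Closing separators that pair with special opening separators.
_CLOSERS = {"(": ")", "{": "}", "[": "]", "<": ">", "^": "$"}


def parse_token(token):
    """Return (compiled pattern, charset) for one space separated token."""
    sep1 = token[0]
    if sep1.isalnum():
        # Second form: abbreviation of "^.*$"+<charset>.
        return re.compile(r"^.*$"), token
    sep2 = _CLOSERS.get(sep1, sep1)
    # Assumption: the last <sep2> in the token ends the expression.
    end = token.rfind(sep2, 1)
    if end < 0:
        raise ValueError("missing closing separator in %r" % token)
    body, charset = token[1:end], token[end + 1:]
    if sep1 == "^":
        # Only "^" and "$" are kept as part of the expression.
        body = "^" + body + "$"
    return re.compile(body), charset


def pick_charset(spec, process_name):
    """Return the charset of the first token whose expression matches."""
    for token in spec.split():
        pattern, charset = parse_token(token)
        if pattern.search(process_name):
            return charset
    return None


spec = "^w3mbookmark$euc-jp /helper/iso-8859-1 utf-8"
print(pick_charset(spec, "w3mbookmark"))
```

Because the last token is of the second form, it acts as a catch-all for every process name that the earlier expressions did not match.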
Specifies characters which your terminal can't handle. Instead of any character in the range, w3mmee outputs to the terminal the first matching one in the list:
If options of this type appear twice and one range includes the other, the more specific one is adopted. If the ranges merely overlap, only the overlapping part is overwritten by the latter specification.
Specifies default replacement string for characters which your terminal can't handle.
Specifies a format string for messages representing documentations in buffers with mouse support disabled (including the case that mouse support was disabled when configured).
Specifies a format string for messages representing documentations in buffers with mouse support enabled.
Specifies replacement string when middle part of a long URI is omitted.
Specifies comma separated list of strings leading items of <ul> construct.
Specifies a string leading items of <ul> of which type attribute is "disc".
Specifies a string leading items of <ul> of which type attribute is "circle".
Specifies a string leading items of <ul> of which type attribute is "square".
Specifies replacement string for small images.
Specifies a string used to draw <hr>.
Specifies a comma separated list of menu frame components starting with left-top corner, left to right, and top to bottom.
Specifies a comma separated list of table borders in the order:
Specifies a comma separated list of table bold face borders in the order:
The option setup panel has an additional item to choose whether new setup will be saved to $HOME/.w3mmee/config. This option specifies an exposition of this configuration option.
Specifies a canonical name for non-standard charset names in the form
<canonical name>+"="(equal sign)+<comma separated list of charset names>
No space is allowed around the equal sign or the commas. Charset names are case insensitive.
For example, to treat a page containing the charset specification ``charset=SHIFT-JIS'' as if its charset were ``Shift_JIS'', please add the line
charset_cname shift_jis=shift-jis
to your config file.
If there are two options of this type defining the same canonical name, the latter overrides the former.
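A small Python sketch of how such alias definitions might be resolved is shown below. The function names are hypothetical, the aliases ``sjis'' and ``x-sjis'' are made up for illustration, and the reading that a later option replaces the whole alias list of an earlier one with the same canonical name is an assumption.

```python
def build_alias_table(option_values):
    """Map each alias to its canonical name; names are case insensitive."""
    by_canonical = {}
    for value in option_values:
        canonical, sep, aliases = value.partition("=")
        if not sep:
            raise ValueError("missing '=': %r" % value)
        # A later definition of the same canonical name overrides the former.
        by_canonical[canonical.lower()] = [a.lower() for a in aliases.split(",")]
    table = {}
    for canonical, aliases in by_canonical.items():
        for alias in aliases:
            table[alias] = canonical
    return table


def canonical_name(table, name):
    """Return the canonical name, or the lowercased name if it is unknown."""
    key = name.lower()
    return table.get(key, key)


table = build_alias_table(["shift_jis=shift-jis,sjis"])
print(canonical_name(table, "SHIFT-JIS"))
```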
Specifies the name of a character width table. Recognized names are as follows (names are case insensitive).
The following is the list of new configuration options not concerning character encoding. They are listed in this document because the original w3m does not recognize them for various reasons (my patch was rejected, or I have not yet ported the related code to the original w3m out of laziness).
Binds the value <encoding name> of the HTTP header field "content-encoding", the MIME type <media type>, and a filter program which decodes contents encoded with the method identified by the name <encoding name>. For this option to be functional, you further need to bind <media type> to a file name extension by adding a line
<media type> <the extension>
to the file $HOME/.mime.types.
If options of this type appear twice or more with the same encoding name, the last specification is adopted.
Specifies a comma separated list of file extensions which stand for content languages.
If a file has multiple extensions, the extensions listed in this option are skipped when w3mmee determines the content type of the file.
Specifies whether regular expression search across multiple lines is enabled or not.
Specifies the maximum number of processes to load documents.
Specifies the maximum number of processes to load documents from each server.
Specify how many redirections should be followed.
Specify an optional HTTP request header to be added. The headers Host, Pragma, Cache-Control, and Content-Length are always assigned values generated by w3mmee, and your specifications are ignored. The headers User-Agent, Accept, Accept-Encoding, and Accept-Language are assigned values generated by w3mmee unless you explicitly specify them. The headers Content-Type and Referer are assigned the values which you specify only if there is no other appropriate value. The headers Cookie and Cookie2 are assigned the values which you specify only if cookie support in w3mmee is disabled by a compile option, by a command line option, or by a configuration option. Otherwise w3mmee decides their values.
If options of this type appear twice or more with the same header name, the last specification is adopted.
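The precedence rules for user-specified request headers can be summarized by the following Python sketch. It is only a model of the rules as stated above, not w3mmee's implementation: the function name `merge_headers`, its parameters, and the sample header values are hypothetical.

```python
ALWAYS_GENERATED = {"host", "pragma", "cache-control", "content-length"}
USER_FALLBACK = {"content-type", "referer"}
COOKIE_HEADERS = {"cookie", "cookie2"}


def merge_headers(generated, user, cookies_enabled=True):
    """Combine generated headers with user-specified (name, value) pairs."""
    # When header names coincide, the last user specification is adopted.
    last = {}
    for name, value in user:
        last[name.lower()] = value

    result = {name.lower(): value for name, value in generated.items()}
    for key, value in last.items():
        if key in ALWAYS_GENERATED:
            continue  # user specification is ignored
        if key in USER_FALLBACK and key in result:
            continue  # used only when no other appropriate value exists
        if key in COOKIE_HEADERS and cookies_enabled:
            continue  # used only when cookie support is disabled
        # User-Agent, Accept, etc. and any other header: user value wins.
        result[key] = value
    return result


generated = {"Host": "example.org", "User-Agent": "w3mmee", "Referer": "http://example.org/"}
user = [("Host", "other"), ("User-Agent", "a"), ("User-Agent", "b"), ("X-Test", "1")]
print(merge_headers(generated, user))
```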
Specify the version of each HTTP request. Acceptable values are "1.1" and "1.0" (without double quotation marks). Any other value is silently ignored, and the version is set to "1.1".
Specify the style of referring anchors in a formatted dump of a document. It is passed to the sprintf function together with the number (starting with 1) in the list of all links in the document, so it must contain one and only one sprintf conversion specification "%d".
Specify the style of referring images in a formatted dump of a document. It is passed to the sprintf function together with the number (starting with 1) in the list of all links in the document, so it must contain one and only one sprintf conversion specification "%d".
Specify the style of the optional line number and column information of links to labels within the same document in a formatted dump of a document. It is passed to the sprintf function together with the line number and the column (both starting with 1), so it must contain exactly two sprintf conversion specifications "%d".
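To illustrate the "%d" requirement, the following Python sketch checks a style string before formatting it. Python's `%` operator is used here as a stand-in for C's sprintf, which behaves the same way for a plain "%d"; the function name `render` and the sample style strings are hypothetical.

```python
import re

# Matches "%" followed by at most one character, so "%%" is seen as a unit.
_CONVERSION = re.compile(r"%(.?)")


def render(style, *numbers):
    """Format numbers with a style holding exactly len(numbers) '%d' specs."""
    specs = [m.group(1) for m in _CONVERSION.finditer(style)]
    # "%%" is a literal percent sign, not a conversion specification.
    specs = [s for s in specs if s != "%"]
    if specs != ["d"] * len(numbers):
        raise ValueError("style must contain exactly %d '%%d'" % len(numbers))
    return style % numbers


print(render("[%d]", 3))
print(render("(line %d, column %d)", 12, 5))
```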
When making link references in a formatted output of a document, <string> is used as the URL of the document.
When a cursor moving command is issued and cursor goes outside current view, view scrolls <number> lines or columns.
Specify a mailcap entry of maximal priority, which is intended to change an external viewer temporarily.
Options of this type can appear more than once.
Specify a browsecap entry of maximal priority, which is intended to change an external browser temporarily.
Options of this type can appear more than once.
Specify whether to wrap a line wider than screen width or not.
Specify the indicator of truncated lines.
Specify the indicator of continued lines.
Specify whether to load inline images before actually displayed or not.
Specify the default vertical alignment of inline images.
<position> must be one of D (stands for "default"), T (stands for "top"), M (stands for "middle"), or B (stands for "bottom"). D is almost the same as B, but differs somewhat for small images.
Specify the default vertical alignment in tables.
<position> must be one of T (stands for "top"), M (stands for "middle"), or B (stands for "bottom").
Specify behaviour when HTTP request with method other than GET or HEAD is redirected with HTTP response code 301 or 302. <behaviour> must be one of
0
1
2
3
Specify color of frame borders.
Specify whether or not number of pixels per character can be auto-detected.
Specify whether or not number of pixels per line can be auto-detected.
Specifies a comma separated list of file extensions. When it fails to open a local file, w3mmee appends each of the extensions to the name of the file in turn, and retries opening a file with the new name.
You can specify "*" (asterisk without quotes) as an item in the list, which is expanded to the comma separated list of all the file extensions bound to content encoding methods (".Z,.bz2,.gz" by default, see accept_encoding option).
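The retry behavior can be pictured with the Python sketch below. The function names are hypothetical, and the order in which the expanded extensions are tried is an assumption; the default list of content encoding extensions is the one quoted in the text.

```python
# Extensions bound to content encoding methods by default (from the text).
DEFAULT_ENCODING_EXTENSIONS = (".Z", ".bz2", ".gz")


def candidate_names(path, option_value, encoding_exts=DEFAULT_ENCODING_EXTENSIONS):
    """List the file names tried after opening `path` itself has failed."""
    names = []
    for item in option_value.split(","):
        # "*" expands to all extensions bound to content encoding methods.
        exts = encoding_exts if item == "*" else (item,)
        names.extend(path + ext for ext in exts)
    return names


def open_with_retry(path, option_value):
    """Open `path`, retrying with each configured extension appended."""
    for name in [path] + candidate_names(path, option_value):
        try:
            return open(name, "rb")
        except OSError:
            continue
    raise FileNotFoundError(path)


print(candidate_names("index", ".html,*"))
```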
Specify whether or not you want to edit cached sources of remote pages.
Specify whether or not trailing spaces of each formatted line should be removed.
w3mmee recognizes the following additional %-escapes on string expansion in mailcap entries.
The host part of URL.
The port part of URL.
The whole URL.
First, %<test> is tested to see whether it expands to something. Please notice that "%" is prepended to the beginning of <test>. If it expands to anything, including the empty string, <yes> is processed. Otherwise <no> is processed. If <yes> is omitted, it is treated as if <test> were copied to that place. If <no> is omitted and the expansion of <test> fails, the whole escape is replaced with the empty string.
w3mmee includes a mechanism to determine automatically the external browser invoked on a URL, based on the scheme part of the URL. Bindings of external browsers and schemes are given by "browsecap" files. w3mmee tries to scan two files
The file format is also the same as that of "mailcap" files. The only exception is that the first field of each entry must be of the form
<scheme>+"/"(slash)+<method>
where the currently supported <method> is "post", "get", or "download". The <method> part may be "*" (asterisk), which is treated as a usual wildcard. If the <method> part is "post", the arguments which should be passed to a CGI program are passed to the matched external browser as its standard input.
If the relevant URL contains a query string and the query string includes a component like <word>=<value>, an escape sequence of the form %{<word>} expands to <value>. Further, the escape sequence %? expands to the whole query string (excluding the leading question mark).
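A minimal Python sketch of these two query-related escapes follows. It covers only %{<word>} and %?, and makes several assumptions not stated in the text: components are separated by "&", the first occurrence of a word wins, no percent-decoding is done, and an unknown <word> expands to the empty string. The function name and the sample URL are hypothetical.

```python
import re
from urllib.parse import urlsplit


def expand_query_escapes(template, url):
    """Expand %{<word>} and %? in `template` using the query string of `url`."""
    query = urlsplit(url).query  # the leading "?" is not included
    values = {}
    for component in query.split("&"):
        word, sep, value = component.partition("=")
        if sep:
            values.setdefault(word, value)

    def replace(match):
        if match.group(0) == "%?":
            return query
        return values.get(match.group(1), "")

    return re.sub(r"%\?|%\{([^}]*)\}", replace, template)


print(expand_query_escapes("search %{q} [%?]", "http://example.org/find?q=w3m&lang=en"))
```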
The browsecap facility is also used to determine the editor used to edit the source file of a buffer, the formatted image of a buffer, the value of an input control of text type of a form element, or the contents of a textarea control of a form element. An entry is adopted for this purpose if its first field matches "x-w3m-edit/buffer", "x-w3m-edit/screen", "x-w3m-edit/inputtext", or "x-w3m-edit/textarea", respectively.
Parser of mailcap and browsecap entries in w3mmee recognizes new flags "x-w3m-internal", "x-w3m-cgioutput", "x-w3m-match=<regexp>", and "x-w3m-nc-match=<regexp>".
If the flag "x-w3m-internal" is set in an entry, the entry is restricted to internal use, such as determining the editor process described above. I recommend setting this flag in entries for such editors.
If the flag "x-w3m-cgioutput" is set, the program determined by the entry is treated as if it is a CGI program, that is, various environment variables are set before calling the program and lines before the first empty line in output of the program are parsed as HTTP response header.
Flags "x-w3m-match=<regexp>" and "x-w3m-nc-match=<regexp>" are only recognized in browsecap. They are mutually exclusive, and if both are set for one entry, the latter one is adopted. If one of them is set, <regexp> is matched against the whole URL (in a case-insensitive manner for "x-w3m-nc-match=<regexp>"), and the entry is adopted only when the match has succeeded. When "test=..." is also set, the results are ANDed to determine whether or not to adopt the entry.
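The decision made with these two flags can be modeled by the short Python sketch below. Python's `re` module stands in for w3mmee's regular expression engine, an unanchored search is assumed, and the function name, the flag list representation, and the sample URL are all hypothetical.

```python
import re


def entry_applies(flags, url, test_result=True):
    """Decide whether a browsecap entry is adopted for `url`.

    `flags` is a list of (flag name, regexp) pairs in entry order;
    `test_result` is the outcome of an optional "test=..." field.
    """
    pattern = None
    for name, regexp in flags:
        # If both flags are set, the latter one replaces the former.
        if name == "x-w3m-match":
            pattern = re.compile(regexp)
        elif name == "x-w3m-nc-match":
            pattern = re.compile(regexp, re.IGNORECASE)
    matched = True if pattern is None else pattern.search(url) is not None
    # The match result and the test result are ANDed.
    return matched and test_result


url = "http://example.org/A.PDF"
print(entry_applies([("x-w3m-nc-match", r"\.pdf$")], url))
```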
The first argument of tty_accept_character or of tty_reject_character must be of the following form. For Unicode characters,
"U+"+<hexadecimal notation of Unicode>
or
"U+"+<hexadecimal notation of Unicode of starting character in the range>+"-"+<hexadecimal notation of Unicode of ending character in the range>
For non-Unicode characters,
"I+"+<internal representation of character>
or
"I+"+<internal representation of starting character in the range>+"-"+<internal representation of ending character in the range>
The ``internal representation'' of a non-Unicode character is computed as follows.
First, determine an integer S according to the ISO 2022 classification of the character set:
Then, for a 94, 96, or 94x94 set, let F be the final octet of the designating sequence in the ISO 2022 encoding. For a 94 set which needs the further intermediate octet 2/1 in its designating sequence, further add 0x40 to F. For a non-ISO 2022 character set, the support library assigns each character set an integer to identify the set. We adopt that integer as F.
Finally, order all the codepoints representable in the character set, and assign each codepoint a number C starting with 0, in that order. The hexadecimal notations of S, F, and C joined with ``+'' (plus sign) compose the ``internal representation''. F and C are optional, and their default values are