A variant of w3m with support for multiple character encodings


This document describes a variant of w3m that supports the following character encodings

..., and it further includes the following enhancements, although they are not related to character encoding.


Current status

A page that includes both UTF-8 encoded Unicode characters and ISO 2022 registered characters is rendered without any problem, and can be viewed on kterm with ISO 2022 or on xterm with UTF-8. If the terminal emulator fails to find the corresponding fonts, garbage may appear on screen, but it will disappear once the missing fonts are supplied.

Whether the line input routine can handle double-width characters other than JIS X 0208 has not been tested at all. Right-to-left scripts are not considered at all. The terminal emulators used for testing are kterm-6.2.0, xterm-165 with UTF-8 support, and rxvt-2.6.3 with EUC-JP support.


Installation

If you want to use the character encoding extensions of this variant, you first need to get and install the support library. In most POSIX-conforming environments, you just need to issue the following commands:

make install ; ldconfig
I strongly recommend using the latest version. If you do not use the character encoding extensions of this variant, you do not need this library.

You may also need to install the Boehm-GC library if you do not already have it. Note that you may need to install the include file gc.h, and probably private/gc_private.h, by hand.
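
Before building, it can help to check whether gc.h is already installed. The following is only a sketch: the helper name find_gc_header and the directories passed to it are assumptions made for illustration, not paths this variant requires.

```shell
# Hypothetical helper: print the first directory among the arguments that
# contains gc.h; return non-zero if none of them does.
find_gc_header() {
    for dir in "$@"; do
        if [ -f "$dir/gc.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example call. These directories are common guesses only (assumption).
find_gc_header /usr/include /usr/include/gc /usr/local/include /usr/local/include/gc \
    || echo "gc.h not found; it may need to be installed by hand" >&2
```

If the header is missing, copy it from the Boehm-GC source tree into a directory your compiler searches.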

Then move to the directory where you extracted the tarball and follow the usual process:

configure ; make ; su -c 'make install'
When you run configure, you need to answer "3" to the question
Which language do you prefer?
if you want to enable the character encoding extensions. The other enhancements are still included even if you answer 1 or 2.

When you invoke the installed executable, you need to set the environment variable "W3MLANG" to the value

<your language code>_<your country code>.kterm
if you are using kterm,
<your language code>_<your country code>.UTF-8
if you are using xterm,
<your language code>_<your country code>.eucJISX0208
if you are using rxvt with EUC-JP support,
<your language code>_<your country code>.sjis
if you are using rxvt with Shift_JIS support, or
<your language code>_<your country code>.big5
if you are using rxvt with Big Five support.
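
As an illustration, the value can be assembled in a POSIX shell as sketched below. The helper name w3mlang_value and the terminal labels it accepts (rxvt-euc, rxvt-sjis, rxvt-big5) are made up for this example; only the suffixes come from the list above, and ja and JP stand in for your own language and country codes.

```shell
# Hypothetical helper: build a W3MLANG value from a language code, a country
# code, and a terminal label. The labels are illustrative names for this sketch.
w3mlang_value() {
    lang=$1
    country=$2
    terminal=$3
    case $terminal in
        kterm)     suffix=kterm ;;
        xterm)     suffix=UTF-8 ;;
        rxvt-euc)  suffix=eucJISX0208 ;;
        rxvt-sjis) suffix=sjis ;;
        rxvt-big5) suffix=big5 ;;
        *)         echo "unknown terminal: $terminal" >&2; return 1 ;;
    esac
    printf '%s_%s.%s\n' "$lang" "$country" "$suffix"
}

# Example: a Japanese user on an xterm with UTF-8 support.
W3MLANG=$(w3mlang_value ja JP xterm)
export W3MLANG
echo "$W3MLANG"    # prints ja_JP.UTF-8
```

With the variable exported, start the installed executable from the same shell so that it inherits the setting.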