#####################################################################

jcode.pl: Perl library for Japanese character code conversion

Copyright (c) 1995-2000 Kazumasa Utashiro <utashiro@iij.ad.jp>
Internet Initiative Japan Inc.
3-13 Kanda Nishiki-cho, Chiyoda-ku, Tokyo 101-0054, Japan

Copyright (c) 1992,1993,1994 Kazumasa Utashiro
Software Research Associates, Inc.

Use and redistribution for ANY PURPOSE are granted as long as all
copyright notices are retained.  Redistribution with modification
is allowed provided that you make your modified version obviously
distinguishable from the original one.  THIS SOFTWARE IS PROVIDED
BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES ARE
DISCLAIMED.

Original version was developed under the name of srekcah@sra.co.jp
February 1992 and it was called kconv.pl at the beginning.  This
address was a pen name for group of individuals and it is no longer
valid.

The latest version is available here:

	ftp://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/

;; $rcsid = q$Id: jcode.pl,v 2.13 2000/09/29 16:10:05 utashiro Exp $;

#####################################################################

PERL4 INTERFACE:

	&jcode'getcode(*line)
		Return 'jis', 'sjis', 'euc' or undef according to
		Japanese character code in $line.  Return 'binary' if
		the data has non-character code.

		When evaluated in array context, it returns a list
		contains two items.  First value is the number of
		characters which matched to the expected code, and
		second value is the code name.  It is useful if and
		only if the number is not 0 and the code is undef;
		that case means it couldn't tell 'euc' or 'sjis'
		because the evaluation score was exactly same.  This
		interface is too tricky, though.

		Code detection between euc and sjis is very difficult
		or sometimes impossible or even lead to wrong result
		when it includes JIS X0201 KANA characters.  So JIS
		X0201 KANA is ignored for automatic code detection.

	&jcode'convert(*line, $ocode [, $icode [, $option]])
		Convert the contents of $line to the specified
		Japanese code given in the second argument $ocode.
		$ocode can be any of "jis", "sjis" or "euc", or use
		"noconv" when you don't want the code conversion.
		Input code is recognized automatically from the line
		itself when $icode is not supplied (JIS X0201 KANA is
		ignored in code detection.  See the above descripton
		of &getcode).  $icode also can be specified, but
		xxx2yyy routine is more efficient when both codes are
		known.

		It returns the code of input string in scalar context,
		and a list of pointer of convert subroutine and the
		input code in array context.

		Japanese character code JIS X0201, X0208, X0212 and
		ASCII code are supported.  X0212 characters can not be
		represented in SJIS and they will be replased by
		"geta" character when converted to SJIS.

		See next paragraph for $option parameter.

	&jcode'xxx2yyy(*line [, $option])
		Convert the Japanese code from xxx to yyy.  String xxx
		and yyy are any convination from "jis", "euc" or
		"sjis".  They return *approximate* number of converted
		bytes.  So return value 0 means the line was not
		converted at all.

		Optional parameter $option is used to specify optional
		conversion method.  String "z" is for JIS X0201 KANA
		to X0208 KANA, and "h" is for reverse.

	$jcode'convf{'xxx', 'yyy'}
		The value of this associative array is pointer to the
		subroutine jcode'xxx2yyy().

	&jcode'to($ocode, $line [, $icode [, $option]])
	&jcode'jis($line [, $icode [, $option]])
	&jcode'euc($line [, $icode [, $option]])
	&jcode'sjis($line [, $icode [, $option]])
		These functions are prepared for easy use of
		call/return-by-value interface.  You can use these
		funcitons in s///e operation or any other place for
		convenience.

	&jcode'jis_inout($in, $out)
		Set or inquire JIS start and end sequences.  Default
		is "ESC-$-B" and "ESC-(-B".  If you supplied only one
		character, "ESC-$" or "ESC-(" is prepended for each
		character respectively.  Acutually "ESC-(-B" is not a
		sequence to end JIS code but a sequence to start ASCII
		code set.  So `in' and `out' are somewhat misleading.

	&jcode'get_inout($string)
		Get JIS start and end sequences from $string.

	&jcode'cache()
	&jcode'nocache()
	&jcode'flush()
		Usually, converted character is cached in memory to
		avoid same calculations have to be done many times.
		To disable this caching, call &jcode'nocache().  It
		can be revived by &jcode'cache() and cache is flushed
		by calling &jcode'flush().  &cache() and &nocache()
		functions return previous caching state.

	---------------------------------------------------------------

	&jcode'h2z_xxx(*line)
		JIS X0201 KANA (so-called Hankaku-KANA) to X0208 KANA
		(Zenkaku-KANA) code conversion routine.  String xxx is
		any of "jis", "sjis" and "euc".  From the difficulty
		of recognizing code set from 1-byte KATAKANA string,
		automatic code recognition is not supported.

	&jcode'z2h_xxx(*line)
		X0208 to X0201 KANA code conversion routine.  String
		xxx is any of "jis", "sjis" and "euc".

	$jcode'z2hf{'xxx'}
	$jcode'h2zf{'xxx'}
		These are pointer to the corresponding function just
		as $jcode'convf.

	---------------------------------------------------------------

	&jcode'tr(*line, $from, $to [, $option])
		&jcode'tr emulates tr operator for 2 byte code.  Only 'd'
		is interpreted as an option.

		Range operator like `A-Z' for 2 byte code is partially
		supported.  Code must be JIS or EUC, and first byte
		have to be same on first and last character.

		CAUTION: Handling range operator is a kind of trick
		and it is not perfect.  So if you need to transfer `-' 
		character, please be sure to put it at the beginning
		or the end of $from and $to strings.

	&jcode'trans($line, $from, $to [, $option)
		Same as &jcode'tr but accept string and return string
		after translation.

	---------------------------------------------------------------

	&jcode'init()
		Initialize the variables used in this package.  You
		don't have to call this when using jocde.pl by `do' or
		`require' interface.  Call it first if you embedded
		the jcode.pl at the end of your script.

#####################################################################

PERL5 INTERFACE:

Current jcode.pl is written in Perl 4 but it is possible to use
from Perl 5 using `references'.  Fully perl5 capable version is
future issue.

Since lexical variable is not a subject of typeglob, *string style
call doesn't work if the variable is declared as `my'.  Same thing
happens to special variable $_ if the perl is compiled to use
thread capability.  So using reference is generally recommented to
avoid the mysterious error.

	jcode::getcode(\$line)
	jcode::convert(\$line, $ocode [, $icode [, $option]])
	jcode::xxx2yyy(\$line [, $option])
	&{$jcode::convf{'xxx', 'yyy'}}(\$line)
	jcode::to($ocode, $line [, $icode [, $option]])
	jcode::jis($line [, $icode [, $option]])
	jcode::euc($line [, $icode [, $option]])
	jcode::sjis($line [, $icode [, $option]])
	jcode::jis_inout($in, $out)
	jcode::get_inout($string)
	jcode::cache()
	jcode::nocache()
	jcode::flush()
	jcode::h2z_xxx(\$line)
	jcode::z2h_xxx(\$line)
	&{$jcode::z2hf{'xxx'}}(\$line)
	&{$jcode::h2zf{'xxx'}}(\$line)
	jcode::tr(\$line, $from, $to [, $option])
	jcode::trans($line, $from, $to [, $option)
	jcode::init()

#####################################################################

SAMPLES

Convert any Kanji code to JIS and print each line with code name.

	while (defined($s = <>)) {
	    $code = &jcode'convert(*s, 'jis');
	    print $code, "\t", $s;
	}
	
Convert all lines to JIS according to the first recognized line.

	while (defined($s = <>)) {
	    print, next unless $s =~ /[\033\200-\377]/;
	    (*f, $icode) = &jcode'convert(*s, 'jis');
	    print;
	    defined(&f) || next;
	    while (<>) { &f(*s); print; }
	    last;
	}

The safest way of JIS conversion.

	while (defined($s = <>)) {
	    ($matched, $icode) = &jcode'getcode(*s);
	    if (@buf == 0 && $matched == 0) {
		print $s;
		next;
	    }
	    push(@buf, $s);
	    next unless $icode;
	    while (defined($s = shift(@buf))) {
		&jcode'convert(*s, 'jis', $icode);
		print $s;
	    }
	    while (defined($s = <>)) {
		&jcode'convert(*s, 'jis', $icode);
		print $s;
	    }
	    last;
	}
	print @buf if @buf;