We assume you have already adapted your Linux console and X11 configuration to your keyboard and locale. This is explained in the Danish/International HOWTO and in the other national HOWTOs: Finnish, French, German, Italian, Polish, Slovenian, Spanish, Cyrillic, Hebrew, Chinese, Thai, Esperanto. But please do not follow the advice given in the Thai HOWTO to pretend you are using ISO-8859-1 characters (U0000..U00FF) when what you are typing are actually Thai characters (U0E01..U0E5B). Doing so will only cause problems when you switch to Unicode.
I'm not talking much about the Linux console here, because on those machines on which I don't have xdm running, I use it only to type my login name, my password, and "xinit".
Anyway, the kbd-0.99 package ftp://sunsite.unc.edu/pub/Linux/system/keyboards/kbd-0.99.tar.gz and a heavily extended version of it, the console-tools-0.2.3 package ftp://sunsite.unc.edu/pub/Linux/system/keyboards/console-tools-0.2.3.tar.gz, contain in the kbd-0.99/src/ (or console-tools-0.2.3/screenfonttools/) directory two programs: `unicode_start' and `unicode_stop'. When you call `unicode_start', the console's screen output is interpreted as UTF-8. Also, the keyboard is put into Unicode mode (see "man kbd_mode"). In this mode, Unicode characters typed as Alt-x1 ... Alt-xn (where x1,...,xn are digits on the numeric keypad) will be emitted in UTF-8. If your keyboard or, more precisely, your normal keymap has non-ASCII letter keys (like the German Umlaute) which you would like to be CapsLockable, you need to apply the kernel patch linux-2.2.9-keyboard.diff or linux-2.3.12-keyboard.diff.
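To see which bytes the keyboard emits for a given character in Unicode mode, the UTF-8 encoding can be computed by hand. The following is only a minimal sketch of that encoding rule; it is not part of kbd or console-tools, and the function name utf8_bytes is made up for this illustration.

```shell
# utf8_bytes CODEPOINT
# Illustrative helper (not from kbd/console-tools): print the UTF-8
# encoding of one Unicode code point as upper-case hex bytes.
# CODEPOINT may be decimal or hexadecimal with a 0x prefix.
utf8_bytes() {
    cp=$(( $1 ))
    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        printf '%02X\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '%02X %02X\n' \
            $(( 0xC0 | (cp >> 6) )) \
            $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '%02X %02X %02X\n' \
            $(( 0xE0 | (cp >> 12) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '%02X %02X %02X %02X\n' \
            $(( 0xF0 | (cp >> 18) )) \
            $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    fi
}
```

For example, "utf8_bytes 0x00E4" (German a-umlaut) prints "C3 A4", and "utf8_bytes 0x0E01" (the first Thai character) prints "E0 B8 81".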
You will want to display characters from different scripts on the same screen. For this, you need a Unicode console font. The ftp://sunsite.unc.edu/pub/Linux/system/keyboards/kbd-0.99.tar.gz and ftp://sunsite.unc.edu/pub/Linux/system/keyboards/console-data-1999.08.29.tar.gz packages contain a font (LatArCyrHeb-{08,14,16,19}.psf) which covers the Latin, Cyrillic, Hebrew, and Arabic scripts. It covers ISO 8859 parts 1,2,3,4,5,6,8,9,10 all at once. To install it, copy it to /usr/lib/kbd/consolefonts/ and execute "/usr/bin/setfont /usr/lib/kbd/consolefonts/LatArCyrHeb-14.psf".
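Since the font comes in four heights, it can be handy to build the setfont command from the desired height. The following is a small sketch under the assumption that the fonts were copied to /usr/lib/kbd/consolefonts/ as described above; the helper name console_font_cmd is hypothetical and not part of any package.

```shell
# console_font_cmd HEIGHT
# Hypothetical helper: print the setfont command for one of the
# LatArCyrHeb font heights (08, 14, 16 or 19). It only prints the
# command; it does not run setfont itself.
console_font_cmd() {
    case $1 in
        08|14|16|19)
            printf '/usr/bin/setfont /usr/lib/kbd/consolefonts/LatArCyrHeb-%s.psf\n' "$1"
            ;;
        *)
            echo "console_font_cmd: no LatArCyrHeb font of height '$1'" >&2
            return 1
            ;;
    esac
}
```

Calling "console_font_cmd 16" prints the command for the 16 pixel font; you can then run the printed line on the console.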
A more flexible approach is given by Dmitry Yu. Bolkhovityanov <D.Yu.Bolkhovityanov@inp.nsk.su> in http://www.inp.nsk.su/~bolkhov/files/fonts/univga/index.html and http://www.inp.nsk.su/~bolkhov/files/fonts/univga/uni-vga.tgz. To work around the constraint that a VGA font can only cover 512 characters simultaneously, he provides a rich Unicode font (2279 characters, covering Latin, Greek, Cyrillic, Hebrew, Armenian, IPA, math symbols, arrows, and more) in the typical 8x16 size and a script which lets you extract any 512 of these characters as a console font.
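Before extracting a subset, you may want to check that your selection fits into the 512 character limit. The sketch below is not the script from the uni-vga package; it assumes, purely for illustration, that the selection is a text stream with one code point per line, and the name fits_console_font is made up.

```shell
# fits_console_font
# Illustrative check (not the uni-vga script): read one code point per
# line from standard input, count the distinct non-empty entries, and
# succeed only if they fit into a 512 character VGA font.
fits_console_font() {
    count=$(awk 'NF && !seen[$1]++ { n++ } END { print n + 0 }')
    [ "$count" -le 512 ]
}
```

It would be used as "fits_console_font < selection.txt && echo ok", where selection.txt is your own (hypothetical) list of code points.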
If you want cut&paste to work with UTF-8 consoles, you need the patch linux-2.3.12-console.diff from Edmund Thomas Grimley Evans and Stanislav Voronyi.
In April 2000, Edmund Thomas Grimley Evans <edmundo@rano.org> implemented a UTF-8 console terminal emulator. It uses Unicode fonts and relies on the Linux frame buffer device.
Don't hesitate to install Cyrillic, Chinese, Japanese etc. fonts. Even if they are not Unicode fonts, they will help in displaying Unicode documents: at least Netscape Communicator 4 and Java will make use of foreign fonts when available.
The following programs are useful when installing fonts:
The following fonts are freely available (not a complete list):
Applications wishing to display text belonging to different scripts (like Cyrillic and Greek) at the same time can do so by using different X fonts for the various pieces of text. This is what Netscape Communicator and Java do. However, this approach is more complicated, because instead of working with `Font' and `XFontStruct', the programmer has to deal with `XFontSet', and also because not all fonts in the font set need to have the same dimensions.
$ gunzip unifont.hex.gz
$ hex2bdf < unifont.hex > unifont.bdf
$ bdftopcf -o unifont.pcf unifont.bdf
$ gzip -9 unifont.pcf
# cp unifont.pcf.gz /usr/X11R6/lib/X11/fonts/misc
# cd /usr/X11R6/lib/X11/fonts/misc
# mkfontdir
# xset fp rehash
$ bdftopcf -o cu12.pcf cu12.bdf
$ gzip -9 cu12.pcf
# cp cu12.pcf.gz /usr/X11R6/lib/X11/fonts/misc
# cd /usr/X11R6/lib/X11/fonts/misc
# mkfontdir
# xset fp rehash
xterm is part of X11R6 and XFree86, but is maintained separately by Tom Dickey. http://www.clark.net/pub/dickey/xterm/xterm.html Newer versions (patch level 146 and above) contain support for converting keystrokes to UTF-8 before sending them to the application running in the xterm, and for displaying Unicode characters that the application outputs as UTF-8 byte sequences. They also contain support for double-wide characters (mostly CJK ideographs) and combining characters, contributed by Robert Brady <robert@suse.co.uk>.
To get a UTF-8 xterm running, you need to:
$ cd .../ucs-fonts
$ cat quickbrown.txt
$ cat utf-8-demo.txt
You should be seeing (among others) Greek and Russian characters.
xterm*utf8: 1
xterm*VT100*font: -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
xterm*VT100*wideFont: -misc-fixed-medium-r-normal-ja-13-125-75-75-c-120-iso10646-1
xterm*VT100*boldFont: -misc-fixed-bold-r-semicondensed--13-120-75-75-c-60-iso10646-1
Add the lines above to your $HOME/.Xdefaults (for yourself only).
For CJK text processing with double-width characters, the following settings are probably better:
xterm*VT100*font: -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1
xterm*VT100*wideFont: -Misc-Fixed-Medium-R-Normal-ja-18-120-100-100-C-180-ISO10646-1
I don't recommend changing the system-wide /usr/X11R6/lib/X11/app-defaults/XTerm, because then your changes will be erased next time you upgrade to a new XFree86 version.
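In the settings above, the wideFont is meant to be exactly twice as wide as the normal font. The average width is the 13th dash-separated field of the X font name, given in tenths of a pixel (60 and 120, or 90 and 180). The following sketch checks this for a pair of font names; the function names xlfd_field and is_double_width_pair are invented for this example.

```shell
# xlfd_field NAME N
# Print the N-th dash-separated field of an X font name. Field 1 is
# the empty string before the leading dash, so field 13 is the
# average width and fields 14-15 are the charset registry/encoding.
xlfd_field() {
    printf '%s\n' "$1" | cut -d- -f"$2"
}

# is_double_width_pair FONT WIDEFONT
# Succeed if WIDEFONT's average width is exactly twice FONT's.
is_double_width_pair() {
    narrow=$(xlfd_field "$1" 13)
    wide=$(xlfd_field "$2" 13)
    [ "$wide" -eq $(( 2 * narrow )) ]
}
```

For instance, "xlfd_field -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1 14,15" prints "iso10646-1", which confirms that the name denotes a Unicode font.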
The fonts mentioned above are fixed size and not scalable. For some applications, especially printing, high resolution fonts are necessary, though. The most important type of scalable, high resolution fonts are TrueType fonts. They are currently supported by
Add the line
Load "freetype"
or
Load "xtt"
to the "Module" section of your XF86Config file.
Some no-cost TrueType fonts with large Unicode coverage are:
Covers Roman, Cyrillic, Greek, Hebrew, Arabic, combining diacritical marks, Chinese, Korean, Japanese, and more.
Downloadable from ftp://ftp.netscape.com/pub/communicator/extras/fonts/windows/Cyberbit.ZIP. It is free for non-commercial purposes.
Covers Roman, Cyrillic, Greek, Hebrew, Arabic, some combining diacritical marks, Vietnamese.
Downloadable; look on a search engine for ftp-able files called arial.ttf, ariali.ttf, arialbd.ttf, arialbi.ttf.
Covers Roman, Cyrillic, Greek, Hebrew, combining diacritical marks.
Download: contained in IBM's JDK 1.3.0 for Linux, at http://www.ibm.com/java/jdk/linux130/, or directly downloadable as LucidaSansRegular.ttf and LucidaSansOblique.ttf from ftp://ftp.maths.tcd.ie/Linux/opt/IBMJava2-13/jre/lib/fonts/.
Cover Chinese (both traditional and simplified).
Download: at ftp://ftp.gnu.org/non-gnu/chinese-fonts-truetype/. These fonts are truly free.
Download locations for these and other TrueType fonts can be found at Christoph Singer's list of freely downloadable Unicode TrueType fonts http://www.ccss.de/slovo/unifonts.htm.
TrueType fonts are installed similarly to fixed size fonts, except that they go in a separate directory, and that ttmkfdir must be called before mkfontdir:
# mkdir -p /usr/X11R6/lib/X11/fonts/truetype
# cp /somewhere/Cyberbit.ttf ... /usr/X11R6/lib/X11/fonts/truetype
# cd /usr/X11R6/lib/X11/fonts/truetype
# ttmkfdir > fonts.scale
# mkfontdir
# xset fp rehash
TrueType fonts can be converted to low resolution, non-scalable X11 fonts by use of Mark Leisher's ttf2bdf utility ftp://crl.nmsu.edu/CLR/multiling/General/ttf2bdf-2.8-LINUX.tar.gz. For example, to generate a proportional Unicode font for use with cooledit:
# cd /usr/X11R6/lib/X11/fonts/local
# ttf2bdf ../truetype/Cyberbit.ttf > cyberbit.bdf
# bdftopcf -o cyberbit.pcf cyberbit.bdf
# gzip -9 cyberbit.pcf
# mkfontdir
# xset fp rehash
More information about TrueType fonts can be found in the Linux TrueType HOWTO http://www.moisty.org/~brion/linux/TrueType-HOWTO.html.
A small program which tests whether a Linux console or xterm is in UTF-8 mode can be found in the ftp://sunsite.unc.edu/pub/Linux/system/keyboards/x-lt-1.24.tar.gz package by Ricardas Cepas, files testUTF-8.c and testUTF8.c. Most applications should not use this, however: they should look at the environment variables, see section "Locale environment variables".
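A shell script can perform the environment variable check roughly as follows. This is a minimal sketch, assuming the usual precedence (LC_ALL overrides LC_CTYPE, which overrides LANG) and assuming that a UTF-8 locale can be recognized by the substring "UTF-8" or "utf8" in its name; the function name locale_is_utf8 is made up for this example.

```shell
# locale_is_utf8
# Illustrative check: succeed if the effective character type locale,
# taken from LC_ALL, then LC_CTYPE, then LANG, looks like a UTF-8
# locale. Matching on the locale name is a heuristic, not a guarantee.
locale_is_utf8() {
    effective=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
    case $effective in
        *UTF-8*|*utf-8*|*UTF8*|*utf8*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A script would then write "if locale_is_utf8; then ... fi". Where the "locale" utility is available, comparing the output of "locale charmap" with "UTF-8" avoids guessing from the locale name.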