You can now use any Unicode characters in file names. Neither the kernel nor the file utilities need modification. This is because the kernel accepts any file name that does not contain a null byte, and '/' is used to delimit subdirectories. In UTF-8, every byte of a non-ASCII character has its high bit set, so non-ASCII characters are never encoded using null bytes or slashes. All that happens is that file and directory names occupy more bytes than they contain characters. For example, a filename consisting of five Greek characters will appear to the kernel as a 10-byte filename. The kernel does not know (and does not need to know) that these bytes are displayed as Greek.
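The byte count mentioned above can be checked from the shell. This is a minimal sketch: the five letters (alpha through epsilon) are an arbitrary choice for illustration, written as explicit octal byte escapes so that the result does not depend on how this text itself is encoded.

```shell
# Five Greek letters, alpha through epsilon, as raw UTF-8 bytes.
# Each letter takes two bytes (0xCE followed by 0xB1..0xB5).
name=$(printf '\316\261\316\262\316\263\316\264\316\265')

# wc -c counts bytes, which is what the kernel sees in the file name.
# tr strips the padding that some wc implementations add.
nbytes=$(printf '%s' "$name" | wc -c | tr -d ' ')

echo "$nbytes"
```

None of the ten bytes is a null byte or a slash, so the string is a valid file name as far as the kernel is concerned.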
This is the general theory, so long as your files reside on Linux. On filesystems which are used from other operating systems, you have mount options to control conversion of filenames to or from UTF-8:
The "vfat" filesystem has a mount option "utf8". See file /usr/src/linux/Documentation/filesystems/vfat.txt. When you give an "iocharset" mount option different from the default (which is "iso8859-1"), the results with and without "utf8" are not consistent. Therefore, I do not recommend using the "iocharset" mount option.
The "msdos" and "umsdos" filesystems have the same "utf8" mount option, but it appears to have no effect.
The "iso9660" filesystem has a mount option "utf8". See file /usr/src/linux/Documentation/filesystems/isofs.txt.
Since Linux 2.2.x kernels, the "ntfs" filesystem has a mount option "utf8". See file /usr/src/linux/Documentation/filesystems/ntfs.txt.
The other filesystems (nfs, smbfs, ncpfs, hpfs, etc.) don't convert filenames; therefore they support Unicode file names in UTF-8 encoding only if the other operating system supports them. Please note that to enable a mount option for all future remounts, you add it to the fourth column of the corresponding /etc/fstab line.
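As an illustration of where a mount option goes in /etc/fstab, here is a small sketch. The device /dev/sdb1 and the mount point /mnt/usb are placeholder names, not taken from any real system; only the position of the options in the fourth column matters.

```shell
# Hypothetical /etc/fstab entry for a vfat partition (placeholder names).
# Columns: device, mount point, filesystem type, options, dump, fsck pass.
line='/dev/sdb1  /mnt/usb  vfat  defaults,utf8  0  0'

# The mount options are the fourth whitespace-separated column.
opts=$(printf '%s\n' "$line" | awk '{print $4}')
echo "$opts"

# A one-off mount with the same option would look like this (needs root):
#   mount -t vfat -o utf8 /dev/sdb1 /mnt/usb
```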
You should have the following environment variables set, containing locale names:
LANGUAGE: override for LC_MESSAGES
LC_ALL: override for all other LC_* variables
LC_CTYPE, LC_MESSAGES, LC_COLLATE, LC_NUMERIC, LC_MONETARY, LC_TIME: individual variables for: character types and encoding, natural language messages, sorting rules, number formatting, money amount formatting, date and time display.
LANG: default value for all LC_* variables. (See `man 7 locale' for a detailed description.)
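The order in which these variables are consulted can be sketched as a small shell function. It assumes the usual POSIX precedence (LC_ALL first, then the individual LC_* variable, then LANG); the function name effective_ctype is made up for this example, and the fallback to "C" when nothing is set is what glibc does, though POSIX leaves that default to the implementation.

```shell
# Print the locale name that applies to character types and encoding.
# Precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'   # assumed default when no variable is set
    fi
}

effective_ctype
```

If the printed name does not end in ".UTF-8", applications started from this shell will generally not treat text as UTF-8.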
In order to tell your system and all applications that you are using UTF-8, you need to add a codeset suffix of UTF-8 to your locale names. For example, to run a single application in a UTF-8 Hindi locale from the bash shell, set the variable on the command line of that application:
$ LANG=hi_IN.UTF-8 xman
To make the setting apply to every program started from the current shell, export it instead:
$ export LANG=hi_IN.UTF-8