Multilingual Support of WWW Applications in Ukraine/CIS

Yuri Demchenko, Kiev Polytechnic Institute (demch@cad.polytech.kiev.ua)

Igor Sinitsyn, Volt Computer Services (a-igors@microsoft.com)

The development of Internet networking infrastructure and the active creation of Internet information resources in Ukraine demand support for multiple languages when presenting meaningful information about the cultural and historical heritage of Ukraine.

Multilingual support of WWW applications has to address the following issues:

  1. Development of documents/applications in any of the languages in use and in any codepage (or on any computer platform).
  2. Placing documents on the WWW and delivering them to the client according to the languages and charsets it accepts, which is realised by different server-side and client-side methods.
  3. Viewing and browsing of multilingual WWW documents by clients.
  4. Interactive communication via WWW, which demands multilingual support of client-side input in forms (including e-mail).

Many of the problems mentioned are now solved by popular browsers such as Netscape Navigator 3.0, MS Internet Explorer 3.0 and Alis Tango 1.5, which support different document encodings. Serving the same document in different charsets and codepages is now handled by new HTTP servers and browsers that use the HTTP/1.0 "charset=" parameter during client-server negotiation to choose the document encoding. However, real implementations of this approach run into problems when documents are retrieved, browsed and stored.
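The server-side selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `choose_charset` and `build_response`, the `SUPPORTED` table and the sample request value are hypothetical and not taken from any particular server, and the sketch ignores quality values in the client's list of accepted charsets.

```python
# Charset labels (as sent on the wire) mapped to Python codec names.
# The choice of charsets here is an assumption made for illustration.
SUPPORTED = {
    "koi8-r": "koi8_r",
    "windows-1251": "cp1251",
    "iso-8859-5": "iso8859_5",
}
DEFAULT_CHARSET = "koi8-r"


def choose_charset(accept_charset):
    """Return the first supported charset label listed by the client.

    Quality values (";q=...") are stripped and otherwise ignored.
    """
    for item in accept_charset.split(","):
        label = item.split(";")[0].strip().lower()
        if label in SUPPORTED:
            return label
    return DEFAULT_CHARSET


def build_response(text, accept_charset):
    """Encode text for the client; return (Content-Type value, body bytes)."""
    label = choose_charset(accept_charset)
    body = text.encode(SUPPORTED[label])
    return "text/html; charset=" + label, body


content_type, body = build_response("Київ", "windows-1251, iso-8859-5;q=0.5")
print(content_type, body)
```

The same stored text is thus delivered in whichever codepage the client lists first among those the server can produce, with the label carried in the "charset=" parameter.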

It is common practice in Ukraine to support two (Russian and English) or three (Ukrainian, Russian and English) languages. As educational and humanitarian organisations become involved in creating WWW information resources, use of the Ukrainian language will increase. However, wide use of a Ukrainian charset for Internet exchange faces two problems: standardisation of the Ukrainian codepage across different computer platforms (MS DOS, MS Windows, ISO8859, UNIX KOI8 or Macintosh) and standardisation of keyboard mapping.

The Ukrainian charset encoding known as KOI8-RU is commonly used as a de facto standard for mail, news exchange and WWW publishing, but it is not officially standardised yet. It uses the standardised KOI8-R charset [1] as its base and adds four Ukrainian characters, "ukr. i", "ukr. yi", "ukr. ie" and "ukr. kge (with upturn)" [2, 3], which replace some pseudographics symbols not used in the common practice of mail and news exchange. The official standardisation of this Ukrainian net charset is planned to be considered by a working group of the Committee of Standardisation of Ukraine (CSU).
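The replacement of pseudographics by Ukrainian letters can be illustrated with Python's standard codecs. Python does not ship a KOI8-RU codec, so this sketch uses `koi8_u`, a related extension of KOI8-R, as a stand-in; that substitution is an assumption, and the code positions in KOI8-RU itself may differ. The helper name `reinterpret_as_koi8_r` is hypothetical.

```python
UKRAINIAN_LETTERS = "іїєґ"  # ukr. i, yi, ie, and the letter with upturn


def reinterpret_as_koi8_r(letter):
    """Encode one letter in KOI8-U, then decode the same byte as KOI8-R."""
    raw = letter.encode("koi8_u")
    return raw, raw.decode("koi8_r")


for letter in UKRAINIAN_LETTERS:
    raw, as_koi8_r = reinterpret_as_koi8_r(letter)
    # Print the letter, its byte value, and the code point that a
    # plain KOI8-R reader would see for that same byte.
    print(letter, hex(raw[0]), "U+%04X" % ord(as_koi8_r))
```

A reader that assumes plain KOI8-R therefore shows box-drawing symbols where the Ukrainian letters were intended, which is why an agreed charset label matters.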

In general, it would be expedient to extend KOI8-R to support all three FSU Cyrillic languages: Russian, Ukrainian and Belorussian. Such a charset, KOI8-RUB, could be regarded as a meta (base) encoding from which other encoding types would be derived. The only problem still to be discussed is support of the full Ukrainian charset in ISO8859-5, which currently lacks one Ukrainian character, "gapa" (strong "g").
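Which of the extra letters a given codepage can represent may be checked with a short Python sketch. The codec list is an assumption about which Python codecs correspond to the platforms named earlier, and `missing_letters` is a hypothetical helper; only lowercase letters are tested.

```python
# Python codec names assumed to correspond to the platforms discussed.
CODECS = ["koi8_r", "iso8859_5", "cp1251", "cp866", "mac_cyrillic"]

# Letters beyond the basic Russian alphabet needed for the three languages.
EXTRA_LETTERS = "ёіїєґў"


def missing_letters(codec, letters=EXTRA_LETTERS):
    """Return the letters that cannot be encoded in the given codec."""
    missing = []
    for ch in letters:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


for codec in CODECS:
    print(codec, missing_letters(codec))
```

For `iso8859_5` the only letter reported is the one with upturn, which matches the gap noted above.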

The draft standard for the Ukrainian keyboard now under discussion in the CSU was prepared without wide consultation with IT experts or hardware and software developers. It does not allow a convenient switchable combination of the Ukrainian, Russian and English keyboards.

A more efficient solution can be proposed that combines all three languages, Ukrainian (as the national language), Russian (as the common CIS language) and English (as the Internet metalanguage), by applying the recommendations of ISO/IEC CD2 14755 [4]: some non-ASCII characters are entered by switching standard keys while one of the control keys (CTRL, ALT or ESC) is pressed. This approach does not change the MS Windows Ukrainian keyboard, commonly used in Ukraine as a de facto standard, and can be used to input all "missing" (or seldom used) characters: "rus. io", "ukr. gapa", "belorus. short u".
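The idea of entering the rarely used characters through a control key can be sketched as a small lookup table in Python. The key assignments below are purely hypothetical, chosen for illustration; neither the proposal above nor ISO/IEC CD2 14755 is claimed to prescribe them, and `translate_key` is an invented name.

```python
# Hypothetical assignments: (modifier, base character) -> produced character.
MODIFIED_KEYS = {
    ("ALT", "е"): "ё",  # rus. io
    ("ALT", "г"): "ґ",  # ukr. gapa
    ("ALT", "у"): "ў",  # belorus. short u
}


def translate_key(base_char, modifier=None):
    """Return the character produced by a key press.

    Without a modifier, or for a combination that has no entry in the
    table, the key yields its ordinary character, so the existing
    layout is left unchanged.
    """
    if modifier is None:
        return base_char
    return MODIFIED_KEYS.get((modifier, base_char), base_char)


print(translate_key("г"), translate_key("г", "ALT"))
```

Because unmodified keys pass through untouched, such a table extends an existing layout rather than replacing it.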

This work was significantly advanced by mutual consultation with the members of the Microsoft international test team testing multi-language functionality, in particular for the Ukrainian and Belorussian languages, in Internet products such as IE 3.0 and Internet Mail and News 1.0.

References

[1] RFC 1489. Registration of Cyrillic Character Set. - July 1993.

[2] Shevchenko L., Rizun V., Lysenko Yu. Modern Ukrainian Language. - Kiev. - Lybid'. - 1993. - 336 pp.

[3] Nadine Kano. Developing International Software for Windows 95 and Windows NT: a handbook for software design. - Microsoft Press.

[4] Second CD - ISO/IEC CD2 14755 - Input methods to enter characters from the repertoire of ISO/IEC 10646 with a keyboard or other input devices. - http://www-rocq.inria.fr/~deschamp/www/divers/ALB-CD.html