[Ed: Links to web pages and/or e-mail addresses which have become inactive since the publication of this article have been enclosed in curly brackets { }. Replacement links have been provided where possible.]
As more and more people around the world discover computer-mediated communication such as e-mail and the World-Wide Web, the horizons of communication are steadily expanding. However, people who speak languages other than English often find their options limited in the digital age.
On February 9, François Yergeau spoke on "Characters, Unicode, and Multilingualism on the World-Wide Web" at one of a series of NYU colloquia on computers and communication, sponsored by the ACF and several departments. Dr. Yergeau's talk focused on technologies that allow people who speak different languages to communicate using computers and networks. One stumbling block to communicating in, say, Hindi is the dominance of ASCII, the American Standard Code for Information Interchange. ASCII defines only 128 codes, covering the unaccented letters of the Roman alphabet, digits, punctuation, and control characters: roughly the repertoire of a basic American typewriter. Extensions add accented letters for most Western European languages, but even with these it cannot support other languages or alphabets. Computer users who wish to work in a different script often struggle to find the appropriate software, and two people who wish to communicate must make sure they use compatible software and fonts, or their documents will show up as gibberish at the other end.
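Both problems described here, characters that ASCII simply cannot hold and text that turns into gibberish when sender and receiver assume different character sets, can be reproduced in a few lines. The following is a minimal sketch in modern Python (an illustration added for this purpose, not something shown in the talk) using Python's built-in `ascii`, `latin-1`, and `cp437` codecs; the helper names `ascii_can_encode` and `misread` are hypothetical.

```python
def ascii_can_encode(text: str) -> bool:
    """Return True if every character of text fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one character set, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)


print(ascii_can_encode("Hello"))   # True
print(ascii_can_encode("नमस्ते"))    # False: Devanagari (Hindi) is outside ASCII
print(ascii_can_encode("café"))    # False: accented letters need an extension

# The sender writes in Latin-1, the receiver assumes the IBM PC code page 437.
print(misread("café", "latin-1", "cp437"))  # cafΘ
```

The byte 0xE9 means "é" in Latin-1 but the Greek capital theta in code page 437, so the receiver sees "cafΘ" even though no byte was damaged in transit.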
The Unicode character set solves some of these difficulties. Unicode now supports 24 different character sets and will allow users to communicate using more than 34,000 individual characters. If Unicode is widely adopted, it will mark the advent of a Web that is truly world-wide. Though various resources on the Internet are provided in languages other than English, most require the reader to run specific software. Alis, the company Dr. Yergeau works for, has developed a Unicode-based Web browser whose interface can be operated in a number of languages and which can display documents written in many more.
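Unicode's approach is to give every character in every supported script a single number, its code point, so that one character set covers Latin, Devanagari, Cyrillic, Arabic, and Kanji text at once. The sketch below, again in modern Python with hypothetical helper names, prints the code point and official name of a few characters and checks that a mixed-script string survives a round trip through UTF-8, one of Unicode's encoding forms. The names come from the character database bundled with Python's `unicodedata` module, which reflects a much later version of the standard than the one discussed in the talk.

```python
import unicodedata


def code_point_label(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def utf8_round_trip(text: str) -> str:
    """Encode text as UTF-8 bytes and decode it back."""
    return text.encode("utf-8").decode("utf-8")


# One character each from Latin, Devanagari, Cyrillic, Arabic, and Han (Kanji);
# all of them live in the same character set, so no font or code-page switch is needed.
for ch in ["A", "é", "क", "ж", "ع", "漢"]:
    print(code_point_label(ch))

mixed = "Hello नमस्ते Привет مرحبا 漢字"
print(utf8_round_trip(mixed) == mixed)  # True: no character is lost
```

Because both ends agree on one character set, the same bytes decode to the same text regardless of which languages the document mixes.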
The Unicode standard provides a relatively clean solution to a number of tricky problems that more and more computer users are facing. These include:
Since the Latin script and ASCII are generally the only options, many individuals are unable or unwilling to use computers for communication. Unicode would greatly expand the accessibility of computers and the Internet, making it relatively easy for billions of people all over the world to use the new medium. At the same time it must be acknowledged that Unicode does not remove all the hurdles we face. Unicode will permit almost anyone to view a document written in Kanji, but it will not help me find an e-mail address in Arabic. Nonetheless, we are making progress towards ensuring that the "machine that changed the world" lives up to expectations.
Posted 20 May 1996. Last reviewed 30 November 2005.