© 2004 by Greg Reddick
.NET strings are implicitly Unicode enabled. But any place that you exit .NET to
store data, you must be sure to allocate and store Unicode rather than ASCII.
This includes databases, the registry, web services, text files, and so on.
Encodings & Code Pages
When building web-enabled applications, you need to encode the page using either
UTF-8 or UTF-16, and send it with the appropriate headers. This is
controlled in .NET through the web.config file.
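As a rough, language-neutral illustration (written in Python, not the .NET web.config mechanism itself), the sketch below encodes a page as UTF-8 and builds a header that names the same encoding. The page content and the `headers` dictionary are made up for the example.

```python
# Hypothetical page content containing non-ASCII characters.
page = "<p>Café, naïve, 日本語</p>"

# Encode the text as UTF-8 bytes before it leaves the program.
body = page.encode("utf-8")

# The header must name the same encoding that was used for the bytes.
headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Length": str(len(body)),
}

# ASCII cannot represent these characters, so a strict encode fails
# instead of silently corrupting the data.
try:
    page.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Decoding with the declared encoding round-trips the original text.
round_trip = body.decode("utf-8")
```

The point is that the byte encoding and the declared charset must agree; a mismatch is what produces garbled characters in the browser.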
Use Locale Model
You need to discover what language and region the person using the program is
using. In a Windows application, this is done by several .NET framework calls.
In a web application, there is an HTTP header that has to be parsed.
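The HTTP header in question is Accept-Language. A minimal sketch of parsing it, written in Python for illustration, might look like the following. The function name `parse_accept_language` is hypothetical, and a production parser would need more validation than this.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0  # a tag with no q parameter defaults to 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as unacceptable
        if quality > 0:  # q=0 means "not acceptable"
            weighted.append((-quality, position, tag))
    # Sort by descending quality, keeping header order for ties.
    return [tag for _, _, tag in sorted(weighted)]


preferred = parse_accept_language("fr-CA,fr;q=0.8,en-US;q=0.6,en;q=0.4")
```

The first tag in the result is the best guess at the user's language and region; the rest are fallbacks in order of preference.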
Sorting & String Comparison
Comparing and sorting strings can be simple or complicated, especially when
dealing with multiple cultures. For example, where diacritical marks fall in a
sorting sequence depends on the language and culture. If your application is
dealing with a single culture, then it can be simple. If your application must
deal with multiple cultures, then you can have problems, because data must be
re-sorted based on language and culture information. You won't believe how many hours
the Access program managers talked about how you sort databases in different
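To see why diacritical marks are a problem, here is a small Python sketch comparing a plain code-point sort with a sort that folds accents and case first. The `fold_accents` helper is an assumption for illustration only; it is not a real culture-aware collation, which is what the .NET culture classes provide.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks and case so accented letters sort near their base letter."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["zebra", "éclair", "apple", "Eagle"]

# Plain sort compares code points: uppercase first, accented letters last.
ordinal_order = sorted(words)

# Folded sort puts "éclair" next to the other words starting with "e".
folded_order = sorted(words, key=fold_accents)
```

The two orders differ, and neither is guaranteed to match what a given culture expects, which is why the sort has to be redone per culture.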
Calendars are very complicated things. They only seem simple because we've dealt
with our calendar since we were young. To show how complicated they are, try to
solve this question in your head: What day of the week is your birthday on in
2012? What does it take to figure that out? Figuring a date in any particular
calendar isn't a problem, but if you deal with cultures that use calendars that
aren't Gregorian, such as the Japanese or Taiwanese calendars, then things get
somewhat more complicated. You must allow input in the different calendars and
output them correctly. .NET has some pretty extensive classes for dealing with
calendar issues, but I/O of the calendar data is still your responsibility.
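As one small example, the calendar used in Taiwan counts years from 1912, so its year number is the Gregorian year minus 1911 while months and days are unchanged. The Python sketch below shows that conversion in both directions; the function names are hypothetical, and the Japanese calendar is harder because its eras change in the middle of a year.

```python
from datetime import date

MINGUO_OFFSET = 1911  # Minguo year 1 corresponds to Gregorian 1912


def to_minguo(d: date) -> tuple[int, int, int]:
    """Convert a Gregorian date to (Minguo year, month, day) for output."""
    if d.year <= MINGUO_OFFSET:
        raise ValueError("date precedes the Minguo era")
    return (d.year - MINGUO_OFFSET, d.month, d.day)


def from_minguo(year: int, month: int, day: int) -> date:
    """Convert user input in the Minguo calendar back to a Gregorian date."""
    if year < 1:
        raise ValueError("Minguo years start at 1")
    return date(year + MINGUO_OFFSET, month, day)


minguo = to_minguo(date(2004, 6, 3))
gregorian = from_minguo(*minguo)
```

Storing the Gregorian date and converting only at input and output keeps the stored data independent of the calendar the user sees.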
Outputting dates is different in different parts of the world. The localization
settings in Windows help, but in a Web application, you must determine what
format to output things in. You must be very careful about month/day/year
versus day/month/year confusion. Many people have been burned because 6/3/2004
is ambiguous in an i18n context. It is either June 3rd or March 6th depending
on where you are. Best practice is to output all dates in an unambiguous
manner using either ISO format, 2004-06-03, or a spelled-out month as in
June 3, 2004.
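The ambiguity is easy to demonstrate. The Python sketch below parses the same string under both conventions and then produces the two unambiguous forms. The English `MONTH_NAMES` table is an assumption for the example; a localized application would pull month names from its resources.

```python
from datetime import date, datetime

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def iso_date(d: date) -> str:
    """Format as ISO 8601 year-month-day, e.g. 2004-06-03."""
    return d.isoformat()


def spelled_out(d: date) -> str:
    """Format with the month written as a word so the day cannot be mistaken for it."""
    return f"{MONTH_NAMES[d.month - 1]} {d.day}, {d.year}"


ambiguous = "6/3/2004"
as_month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # June 3
as_day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()    # March 6

iso_text = iso_date(as_month_first)
long_text = spelled_out(as_month_first)
```

The same input yields two dates three months apart, while neither output form can be misread.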
Time formatting typically uses a 12-hour or 24-hour clock. Same issues of isolation.
In an application that is working across the world, it is very important to
store and move around all times in UTC (Coordinated Universal Time), because
local times are ambiguous (and dates can be, too, across the International Date
Line). Converting to and from Local Time may cause more problems than it is
worth, because if end-users are talking on the phone about a display, they may
be seeing different data. Whether you can get away with having end-users deal
with UTC or whether times must be output as local is an important decision.
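A short Python sketch of the store-in-UTC approach follows. The two fixed offsets are assumptions chosen for the example; a real application would use proper time zone data that accounts for daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets for two end-users (no daylight saving handling).
SEATTLE = timezone(timedelta(hours=-7), "PDT")
TOKYO = timezone(timedelta(hours=9), "JST")

# The value that is stored and passed around is always UTC.
stored = datetime(2004, 6, 3, 23, 30, tzinfo=timezone.utc)

# Conversion to local time happens only at the moment of display.
seattle_view = stored.astimezone(SEATTLE)
tokyo_view = stored.astimezone(TOKYO)

# Local input is converted to UTC before it is stored.
entered_in_tokyo = datetime(2004, 6, 4, 8, 30, tzinfo=TOKYO)
normalized = entered_in_tokyo.astimezone(timezone.utc)
```

Here the Seattle user sees 16:30 on June 3 and the Tokyo user sees 08:30 on June 4 for the same stored instant, which is exactly the phone-call confusion described above.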
Currency is a big can of worms. How are input, output, storage, and
conversion handled? There is no universal base monetary unit, so all money
must be stored in a particular currency. Is storage done in one particular
currency (such as US Dollars) or multiple (Dollars, Euros, and Yen)? Is
conversion between currencies done at the moment of input, the moment of
output, or some other time (such as midnight Eastern Time)? Where is the
exchange rate data coming from, and how often is it updated? Is it important to
be especially precise? A difference of .00001 cent in the exchange rate can
make a difference if you are dealing with millions of dollars. If conversion is
done at the moment of input, the exchange rates at that moment may need to be
stored, depending on the application. Input and output are also complicated. Do
you need to allow for input in multiple currencies? This requires some UI
design consideration. Output is simpler, but also has considerations, such as
"." versus "," as separators. .NET has parsing and formatting functions that
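One possible shape for the storage side, sketched in Python: keep the amount together with its currency code, use decimal rather than binary floating-point arithmetic, and record the rate that was applied. The `Money` and `Conversion` types and the exchange rates are invented for illustration, and rounding to two decimal places is itself an assumption that does not hold for every currency.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_EVEN


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str  # currency code such as "USD" or "EUR"


@dataclass(frozen=True)
class Conversion:
    source: Money
    result: Money
    rate: Decimal  # the rate in effect at the moment of conversion


def convert(source: Money, target_currency: str, rate: Decimal) -> Conversion:
    """Convert and keep the rate so the calculation can be audited later."""
    converted = (source.amount * rate).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_EVEN  # assumes a two-decimal currency
    )
    return Conversion(source, Money(converted, target_currency), rate)


usd = Money(Decimal("1000000.00"), "USD")
low = convert(usd, "EUR", Decimal("0.81234"))   # hypothetical rate
high = convert(usd, "EUR", Decimal("0.81235"))  # hypothetical rate, slightly higher
difference = high.result.amount - low.result.amount
```

A change in the fifth decimal place of the rate moves the result by 10.00 on a million, which is why the source and precision of the rate data matter.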
When outputting numbers, the localized setting may have to be addressed ("."
versus ","; where minus signs go; etc.). For a Windows application, this is easy.
For a web application, the proper format must somehow be stored and retrieved
to properly output the number.
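A minimal Python sketch of storing and applying a number format per culture is shown below. The `SEPARATORS` table and `format_number` function are hypothetical stand-ins for whatever per-user format the web application stores; .NET's culture classes carry this information for you.

```python
from decimal import Decimal

# Hypothetical per-culture settings: (grouping separator, decimal separator).
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
}


def format_number(value: Decimal, culture: str, places: int = 2) -> str:
    """Format with grouping, then swap in the culture's separators."""
    group_sep, decimal_sep = SEPARATORS[culture]
    text = f"{value:,.{places}f}"  # always produces "," groups and "." decimals
    # translate() maps both characters in one pass, so they cannot clobber each other.
    return text.translate(str.maketrans({",": group_sep, ".": decimal_sep}))


amount = Decimal("1234567.891")
us_text = format_number(amount, "en-US")
de_text = format_number(amount, "de-DE")
```

The same stored value is displayed as 1,234,567.89 for one user and 1.234.567,89 for another.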
There are UI and storage considerations necessary for addresses, especially
Postal Codes. Countries don't all use the U.S. Zip Code format. If your I/O and
storage are expecting the 99999-9999 format and get a Canadian A9A 9A9 postal
code, will they choke?
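One way to avoid choking is to validate per country and to be permissive when the country is unknown. The Python sketch below covers only the two formats mentioned above; the pattern table and function name are assumptions for the example.

```python
import re

# Patterns for the two formats discussed; other countries would need their own.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),           # 99999 or 99999-9999
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),  # A9A 9A9
}


def is_valid_postal_code(code: str, country: str) -> bool:
    """Validate against the country's pattern, if one is known."""
    normalized = code.strip().upper()
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        # Unknown country: accept any non-empty value rather than reject a real address.
        return bool(normalized)
    return pattern.fullmatch(normalized) is not None


us_ok = is_valid_postal_code("98052-6399", "US")
ca_ok = is_valid_postal_code("K1A 0B1", "CA")
ca_as_us = is_valid_postal_code("K1A 0B1", "US")
```

The storage column also needs to be wide enough, and typed as text rather than as a number, for the longest format you accept.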
Same issue as addresses. Telephone numbers vary across the world.
Paper sizes in the U.S. are typically Letter (8.5" x 11") or Legal (8.5" x 14"),
whereas the rest of the world mostly uses ISO formats such as A4. You must be
very careful when printing reports to allow for the different sizes of paper,
otherwise truncation or excessive white space may occur. Selecting paper sizes
may be important.
Units of Measurement
The U.S. largely uses English measurements, whereas most of the rest of the
world uses Metric measurements. How are I/O and storage of units accomplished?
Do you store everything in Metric, and do conversion as necessary, or do you
store the size and a separate field for unit? You don't want your spacecraft
crashing into Mars because of a screw-up here!
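The first option, storing everything in Metric and converting only at input and output, might be sketched in Python like this. The `Length` class and the "metric"/"us" system names are made up for the example.

```python
from dataclasses import dataclass

METERS_PER_FOOT = 0.3048  # exact by definition of the international foot


@dataclass(frozen=True)
class Length:
    meters: float  # the only stored representation

    @classmethod
    def from_feet(cls, feet: float) -> "Length":
        """Convert at the moment of input."""
        return cls(feet * METERS_PER_FOOT)

    def to_feet(self) -> float:
        return self.meters / METERS_PER_FOOT

    def display(self, system: str) -> str:
        """Convert at the moment of output, always labeling the unit."""
        if system == "metric":
            return f"{self.meters:.2f} m"
        if system == "us":
            return f"{self.to_feet():.2f} ft"
        raise ValueError(f"unknown measurement system: {system}")


entered = Length.from_feet(10)
metric_text = entered.display("metric")
us_text = entered.display("us")
```

Because the unit is fixed by the type rather than implied, a bare number never travels through the program without its unit.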
I/O and Display
Different languages have different requirements for word length. English is
among the most compact Roman-alphabet languages, whereas Finnish typically
requires about 50% more characters to say the same thing. This
has some serious repercussions when designing the UI of a program or web page.
If you create a nice looking screen in English, it may come out truncated or
wrapped when expressed in Finnish. On the other hand, if you allow enough space
to express the UI in Finnish, it may come out with unacceptable white space in
English. In that case, multiple UIs may be necessary to account for different
word length, or the UI may have to dynamically move and resize elements on the
Output of Unicode requires the proper font to be installed on the end-user's
system, otherwise the output will look like a bunch of gibberish. How are you
going to ensure that the proper font is installed, and how are you going to
keep the end-user from uninstalling it? It may be worthwhile in a Windows
application to check that the proper font is installed at startup time. In a
web application, there is not much that you can do here, unless you want to put
all the data into PDF files or GIF files.
Fortunately, Windows does all the proper handling of the input of various
keyboards and international characters.
Complex Script Awareness
Some languages write right-to-left or vertically instead of left-to-right.
Frequently Windows and .NET take the hassle of dealing with these away from the
application, but you may need different UIs to deal with these. There are many
places where .NET has a property that you can set to right-to-left to
accommodate these languages.
Isolate Localizable Resources
When you internationalize an application, you need to isolate all the strings
that need to be localized in a small number of places. This means that an
application cannot embed English strings into the application. All strings must
be stored in a resource file, XML file, database, or other common location. All
dialogs must either have their strings pulled from this repository, or
individually recreated. The people translating the program into various
languages do not want to look at code. .NET provides a number of tools, such as
internationalized resource files to help with this process. Access 1.0 shipped
within 30 days of the English version in three other languages because we took
this into account. The earliest Microsoft had ever shipped another language
edition of a program before that was six months.
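The idea of a single string repository with per-language tables can be sketched in a few lines of Python. The `RESOURCES` dictionary stands in for resource files, and the fallback from a specific culture to its parent language and then to the default table is modeled loosely on how .NET resource lookup behaves; the names here are hypothetical.

```python
# Stand-in for resource files: "" is the default (English) table.
RESOURCES = {
    "": {"save": "Save", "cancel": "Cancel", "greeting": "Hello"},
    "fr": {"save": "Enregistrer", "cancel": "Annuler", "greeting": "Bonjour"},
    "fr-CA": {"greeting": "Allô"},  # overrides only what differs from "fr"
}


def fallback_chain(culture: str) -> list[str]:
    """'fr-CA' -> ['fr-CA', 'fr', '']: most specific table first."""
    chain = []
    while culture:
        chain.append(culture)
        culture = culture.rpartition("-")[0]
    chain.append("")
    return chain


def get_string(key: str, culture: str) -> str:
    """Look the key up in each table along the fallback chain."""
    for name in fallback_chain(culture):
        table = RESOURCES.get(name)
        if table and key in table:
            return table[key]
    raise KeyError(key)


save_label = get_string("save", "fr-CA")
greeting = get_string("greeting", "fr-CA")
untranslated = get_string("save", "de-DE")
```

Translators work only on the tables, and a missing translation degrades to the default string instead of breaking the program.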
Be very careful about string concatenation. Different languages use different
word order than English. For example, English uses Subject-Verb-Direct Object,
but a Maya language uses Verb-Direct Object-Subject. If you concatenate strings
together to form, say, an error message, then when the string is
internationalized, it may become gibberish. .NET provides a String.Format
method that is designed to handle this, but it must be consistently used.
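The sketch below uses Python's `str.format`, which accepts numbered placeholders much like the ones String.Format uses, to show why a whole-sentence template survives translation and concatenation does not. The message table and function names are hypothetical.

```python
# Each language gets a complete sentence with numbered placeholders:
# {0} is the user name, {1} is the file name.
MESSAGES = {
    "en": "{0} deleted the file {1}.",
    "de": "Die Datei {1} wurde von {0} gelöscht.",
}


def deleted_message(language: str, user: str, file: str) -> str:
    """The translator decides where each placeholder goes."""
    return MESSAGES[language].format(user, file)


def concatenated_message(user: str, file: str) -> str:
    """Anti-pattern: the word order is frozen in the code."""
    return user + " deleted the file " + file + "."


english = deleted_message("en", "Greg", "report.doc")
german = deleted_message("de", "Greg", "report.doc")
```

In the German template the file name comes before the user name, which no amount of translating the individual fragments of the concatenated version could achieve.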
Besides the other considerations above, avoid putting text into graphics. An
icon is an icon, not a billboard. If text is placed into graphics, then
those graphics must then be internationalized, as well. .NET also provides
things like multiple Tooltip controls that can be internationalized easily.
The application must be localized into at least one other language to test
whether all the interface items that should be localized have been localized,
and things that shouldn't be localized haven't. Then as development continues,
this needs to be rechecked regularly. I'll give a great trick for this that
Microsoft uses: develop a tool that automatically localizes the English version
into another language. You may think this is impossible because of the
variability of different languages, but there is one language that has
consistent rules for conversion from English: Pig Latin! It also has the added
benefit that it adds two letters per word, making the word length more like
Finnish. There is a Pig Latin version of most Microsoft programs somewhere.
Testing may take considerable time.
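A Pig Latin localizer along these lines is only a few lines of code. The Python sketch below is my own guess at such a tool, not Microsoft's; it uses one common variant of the rules (move the leading consonants to the end and add "ay") so that every word grows by exactly two letters, and it leaves `{...}` format placeholders alone.

```python
import re

VOWELS = "aeiouAEIOU"
# Match either a {placeholder} (left untouched) or a run of letters (translated).
TOKEN = re.compile(r"\{[^{}]*\}|[A-Za-z]+")


def pig_latin_word(word: str) -> str:
    """Move leading consonants to the end and append 'ay'."""
    split = 0
    while split < len(word) and word[split] not in VOWELS:
        split += 1
    if split == len(word):
        split = 0  # no vowel found: keep the letters in place
    result = word[split:] + word[:split].lower() + "ay"
    if word[0].isupper():
        result = result[0].upper() + result[1:]
    return result


def pseudo_localize(text: str) -> str:
    """Translate every word in a UI string, preserving punctuation and placeholders."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        return token if token.startswith("{") else pig_latin_word(token)
    return TOKEN.sub(replace, text)


menu_item = pseudo_localize("Save the file")
prompt = pseudo_localize("Delete {0}?")
```

Any string that still shows up in plain English after running the tool over the resources is one that was hard-coded instead of being pulled from the repository.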
Internationalizing a program is a non-trivial exercise, and planning to
internationalize should be done from the earliest stages of design on a
project. If done with proper planning, the cost can be small, maybe 15% more
effort. If done after the fact, it may take a total redesign from the ground up to
re-do the application.