Things to Consider when Internationalizing an Application

Xoc Software

2004 by Greg Reddick

  • Data Encoding

    • Unicode Enabled

      .NET strings are Unicode (UTF-16) internally. But anywhere data leaves .NET to be stored, you must be sure to allocate and store Unicode rather than ASCII. This includes databases (for example, nvarchar rather than varchar columns in SQL Server), the registry, web services, text files, etc.
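
      The following small Python sketch makes the failure mode concrete. Python is used only for illustration, since the article's context is .NET. The helper names store_ascii and store_utf8 are hypothetical stand-ins for an ASCII-only column or file and a Unicode one. The errors="replace" argument mimics systems that silently substitute "?" for characters they cannot represent.

```python
def store_ascii(text: str) -> str:
    """Round-trip text through an ASCII-only store (lossy)."""
    # errors="replace" substitutes '?' for every character ASCII cannot hold
    return text.encode("ascii", errors="replace").decode("ascii")


def store_utf8(text: str) -> str:
    """Round-trip text through a Unicode (UTF-8) store (lossless)."""
    return text.encode("utf-8").decode("utf-8")
```

      With these helpers, store_ascii("Zürich") comes back as "Z?rich" and store_ascii("東京") comes back as "??". store_utf8 returns both unchanged. The loss happens silently, without any error.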

    • Encodings & Code Pages

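      When building web applications, you need to encode each page as either UTF-8 or UTF-16 and send it with the appropriate headers (for example, Content-Type: text/html; charset=utf-8). In ASP.NET this is controlled through the globalization element (its requestEncoding and responseEncoding attributes) in the web.config file.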
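
      The following Python sketch shows why the charset declaration matters. It is an illustration, not ASP.NET code. The same UTF-8 bytes read correctly when decoded as UTF-8, but they turn into mojibake when a client guesses Latin-1. The function encode_page is hypothetical. A Content-Type header with a charset parameter is the standard HTTP mechanism.

```python
def encode_page(html: str) -> tuple[dict[str, str], bytes]:
    """Encode an HTML page as UTF-8 and declare that encoding in the header."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    return headers, html.encode("utf-8")


headers, body = encode_page("<p>café</p>")

# A client that honors the declared charset gets the right text...
correct = body.decode("utf-8")    # "<p>café</p>"
# ...while one that ignores it and guesses Latin-1 shows mojibake.
misread = body.decode("latin-1")  # "<p>cafÃ©</p>"
```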

  • Locale/Cultural Awareness

    • Use Locale Model

      You need to discover which language and region the person using the program prefers. In a Windows application, .NET exposes this through properties such as CultureInfo.CurrentCulture and CultureInfo.CurrentUICulture. In a web application, the browser sends an Accept-Language HTTP header that has to be parsed.
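
      Parsing the header means honoring its quality values (q=), which rank the user's preferred languages. The Python sketch below is a simplified, hypothetical parser. It orders tags by q-value, with ties keeping header order, and it drops tags with q=0. Matching the result against the cultures your application actually supports is left to the caller.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language tags in an Accept-Language header, best first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = part.strip().split(";")
        tag = fields[0].strip()
        if not tag:
            continue
        quality = 1.0  # default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        if quality > 0:
            # Negate quality so sorting puts the best first; position breaks ties.
            entries.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(entries)]
```

      For example, parse_accept_language("fr-CA,fr;q=0.8,en;q=0.6") returns ["fr-CA", "fr", "en"]. The application would then pick the first of these that it has resources for.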

    • Sorting & String Comparison

      Comparing and sorting strings can be simple or complicated, especially when dealing with multiple cultures. For example, where letters with diacritical marks fall in a sorting sequence depends on the language and culture: German sorts ä together with a, while Swedish sorts it after z. If your application deals with a single culture, this can be simple. If it must deal with multiple cultures, it gets harder, because the same data must be sorted differently depending on the language and culture of the person viewing it. You won't believe how many hours the Access program managers talked about how you sort databases in different cultures!
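
      As a rough illustration, this Python sketch sorts the same words under two simplified, hypothetical rule sets. The first is German-style (DIN 5007 variant 1, where umlauts sort as their base letters). The second is Swedish-style (where å, ä, and ö are separate letters that come after z). It is not a full collation implementation. Real applications should rely on culture-aware comparison, such as .NET's CultureInfo-based string comparison or an ICU collator.

```python
import unicodedata

# Simplified, illustrative rules only; not a complete collation for either language.
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"  # å, ä, ö follow z


def base_letter(ch: str) -> str:
    """Drop diacritics from a single character: 'é' -> 'e', 'ä' -> 'a'."""
    return unicodedata.normalize("NFD", ch)[0]


def sort_key(word: str, culture: str) -> tuple[int, ...]:
    # casefold handles case (and German ß -> ss); NFC keeps å/ä/ö precomposed
    folded = unicodedata.normalize("NFC", word.casefold())
    if culture == "sv":
        key = []
        for ch in folded:
            if ch not in SWEDISH_ORDER:
                ch = base_letter(ch)  # e.g. é sorts as e in this sketch
            # Characters outside the alphabet sort after it, by code point.
            key.append(SWEDISH_ORDER.index(ch) if ch in SWEDISH_ORDER
                       else len(SWEDISH_ORDER) + ord(ch))
        return tuple(key)
    # German-style (DIN 5007-1): accented letters sort as their base letters.
    return tuple(ord(base_letter(ch)) for ch in folded)
```

      Sorting ["Zucker", "Öl", "Apfel", "Ast"] with the "de" key gives Apfel, Ast, Öl, Zucker. With the "sv" key it gives Apfel, Ast, Zucker, Öl. The same data ends up in a different order for each culture.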

    • Calendar Differences

      Calendars are very complicated things. They only seem simple because we've dealt with our calendar since we were young. To see how complicated they are, try to answer this question in your head: on what day of the week does your birthday fall in 2012? What does it take to figure that out? Computing a date within any one calendar isn't a problem, but if you deal with cultures that use non-Gregorian calendars, such as the Japanese or Taiwan calendars, things get more complicated. You must accept input in the different calendars and output dates in them correctly. .NET has some pretty extensive classes for dealing with calendar issues (for example, JapaneseCalendar and TaiwanCalendar), but I/O of the calendar data is still your responsibility.
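
      To give a feel for the conversion involved, this Python sketch maps a Gregorian date to a Japanese era year and to a Taiwan (Minguo) year. The era table covers only the eras from Taisho onward. The function names are hypothetical. In .NET, the JapaneseCalendar and TaiwanCalendar classes handle these conversions.

```python
from datetime import date

# Gregorian start dates of recent Japanese eras, newest first (earlier eras omitted).
JAPANESE_ERAS = [
    (date(2019, 5, 1), "Reiwa"),
    (date(1989, 1, 8), "Heisei"),
    (date(1926, 12, 25), "Showa"),
    (date(1912, 7, 30), "Taisho"),
]


def japanese_era_year(d: date) -> tuple[str, int]:
    """Return (era name, year within era); the first partial year is year 1."""
    for start, name in JAPANESE_ERAS:
        if d >= start:
            return name, d.year - start.year + 1
    raise ValueError("dates before 1912-07-30 are not covered by this sketch")


def taiwan_year(d: date) -> int:
    """Taiwan (Minguo) calendar: year 1 is Gregorian 1912."""
    if d.year < 1912:
        raise ValueError("date precedes Minguo year 1")
    return d.year - 1911
```

      For example, June 3, 2004 is Heisei 16 in the Japanese calendar and year 93 in the Taiwan calendar. Era boundaries fall in the middle of a year: January 7, 1989 is Showa 64, and the next day is Heisei 1.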

    • Date Formatting

      Dates are written differently in different parts of the world. The localization settings in Windows help, but in a web application you must determine which format to output. Be very careful about month/day/year versus day/month/year confusion. Many people have been burned because 6/3/2004 is ambiguous in an i18n context: it is either June 3rd or March 6th depending on where you are. Best practice is to output all dates unambiguously, using either ISO 8601 format, 2004-06-03, or a spelled-out month, as in June 3, 2004.
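
      The Python sketch below illustrates both halves of the advice. The string "6/3/2004" parses to two different dates depending on the assumed convention, while ISO 8601 and spelled-out formats are unambiguous. The helper names are hypothetical. The sketch uses an explicit English month table instead of strftime's %B, whose output depends on the process locale.

```python
from datetime import date, datetime

ENGLISH_MONTHS = ["January", "February", "March", "April", "May", "June",
                  "July", "August", "September", "October", "November",
                  "December"]


def iso_date(d: date) -> str:
    return d.isoformat()  # YYYY-MM-DD


def spelled_out_date(d: date) -> str:
    # Explicit table so the output does not change with the process locale.
    return f"{ENGLISH_MONTHS[d.month - 1]} {d.day}, {d.year}"


# The same string yields two different dates depending on the convention:
us_reading = datetime.strptime("6/3/2004", "%m/%d/%Y").date()      # June 3
europe_reading = datetime.strptime("6/3/2004", "%d/%m/%Y").date()  # March 6
```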

    • Time Formatting

      Times are typically shown on either a 12-hour or a 24-hour clock, and the same localization issues apply as for dates. In an application that works across the world, it is very important to store and move around all times in UTC (Coordinated Universal Time), because local times are ambiguous (and dates can be, too, across the International Date Line). Converting to and from local time may cause more problems than it is worth: if end-users in different time zones are talking on the phone about the same display, they may be seeing different data. Whether you can get away with having end-users deal with UTC or whether times must be output as local time is an important decision.
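
      Here is a minimal Python sketch of the approach of storing times in UTC and converting them only for display. It uses fixed UTC offsets for Tokyo (+9) and Honolulu (-10). That is valid only because neither city observes daylight saving time. Production code should use a time zone database, such as Python's zoneinfo. The display helper is hypothetical. It builds the 12-hour form by hand to avoid locale-dependent AM/PM strings.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are safe only for zones without daylight saving time.
TOKYO = timezone(timedelta(hours=9))
HONOLULU = timezone(timedelta(hours=-10))


def display(instant_utc: datetime, zone: timezone, use_24_hour: bool) -> str:
    """Convert a stored UTC instant to local time and format it for display."""
    local = instant_utc.astimezone(zone)
    if use_24_hour:
        return local.strftime("%Y-%m-%d %H:%M")
    hour12 = local.hour % 12 or 12          # 0 -> 12, 13 -> 1, etc.
    suffix = "AM" if local.hour < 12 else "PM"
    return f"{local:%Y-%m-%d} {hour12}:{local.minute:02d} {suffix}"


# This is the value that gets stored and transmitted.
stored = datetime(2004, 6, 3, 20, 0, tzinfo=timezone.utc)
```

      The single stored instant, 20:00 UTC on June 3, 2004, displays as "2004-06-04 05:00" in Tokyo and as "2004-06-03 10:00 AM" in Honolulu. Two users looking at the same record see different dates as well as different times.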

    • Currency

      Currency is a big can of worms. How are input, output, storage, and conversion handled? There is no universal base monetary unit, so every amount must be stored in a particular currency. Is storage done in one particular currency (such as US dollars) or in several (dollars, euros, and yen)? Is conversion between currencies done at the moment of input, the moment of output, or some other time (such as midnight Eastern Time)? Where does the exchange rate data come from, and how often is it updated? How precise must the conversion be? A difference of .00001 cent in the exchange rate can matter when you are dealing with millions of dollars. If conversion is done at the moment of input, the exchange rate in effect at that moment may need to be stored, depending on the application. Input and output are also complicated. Do you need to allow input in multiple currencies? That requires some UI design consideration. Output is simpler, but it also has its considerations, such as . versus , as the decimal separator. .NET has parsing and formatting functions that help here.
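
      The Python sketch below shows one way to handle storage, precision, and rate tracking. It rests on several assumptions. Every amount carries its ISO 4217 currency code. Arithmetic uses Decimal rather than binary floating point. Only the converted result is rounded, here with round-half-even, which is one of several defensible rules. The rate used is kept alongside the result. The Money, Conversion, and convert names and the exchange rates in the usage note are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_EVEN

# Digits after the decimal point for a few ISO 4217 currencies (yen has none).
MINOR_UNITS = {"USD": 2, "EUR": 2, "JPY": 0}


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str  # ISO 4217 code such as "USD"


@dataclass(frozen=True)
class Conversion:
    source: Money
    result: Money
    rate: Decimal  # kept so the conversion can be audited or redone later


def convert(source: Money, target_currency: str, rate: Decimal) -> Conversion:
    """Convert with exact decimal arithmetic, rounding only the final result."""
    smallest_unit = Decimal(1).scaleb(-MINOR_UNITS[target_currency])  # 0.01 or 1
    raw = source.amount * rate
    rounded = raw.quantize(smallest_unit, rounding=ROUND_HALF_EVEN)
    return Conversion(source, Money(rounded, target_currency), rate)
```

      For example, converting 100.00 USD to JPY at a hypothetical rate of 109.4567 yields 10946 JPY, and the rate 109.4567 stays attached to the result. The precision point in the text is easy to check. A rate difference of .00001 cent is $0.0000001 per dollar, which on $10,000,000 comes to $1.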

    • Number Formatting

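      When outputting numbers, the user's local conventions have to be respected (. versus , as the decimal separator, where the minus sign goes, etc.). For a Windows application this is easy, because the user's regional settings are available. For a web application, the user's culture must somehow be determined and stored (for example, from the Accept-Language header or a saved preference) and then applied when formatting each number.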
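
      Here is a minimal Python sketch of culture-dependent separators. The two-entry SEPARATORS table and format_number are hypothetical. A real table would come from the platform; in .NET, NumberFormatInfo provides the separators and negative-number patterns. This sketch always puts the minus sign first, which not every culture does.

```python
from decimal import Decimal

# Hypothetical per-culture table: (decimal separator, group separator).
SEPARATORS = {"en-US": (".", ","), "de-DE": (",", ".")}


def format_number(value: Decimal, culture: str, places: int = 2) -> str:
    decimal_sep, group_sep = SEPARATORS[culture]
    invariant = format(value, f",.{places}f")  # e.g. "1,234,567.89"
    # Swap both separators in one pass so one replacement can't undo the other.
    return invariant.translate(str.maketrans({",": group_sep, ".": decimal_sep}))
```

      Formatting Decimal("1234567.891") gives "1,234,567.89" for en-US and "1.234.567,89" for de-DE. Swapping both separators in a single translate call avoids a classic bug: replacing "," with "." and then immediately replacing every "." with ",".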

    • Addresses

      There are UI and storage considerations for addresses, especially postal codes. Not all countries use the U.S. ZIP Code format. If your I/O and storage expect the 99999-9999 format and get a Canadian A9A 9A9 postal code, will they choke?
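
      One defensive approach, sketched here in Python, is to store postal codes as free text and validate them only for countries whose format you actually know. The two patterns below are simplified. The Canadian one, for instance, does not exclude letters that Canada Post never uses. The function and table names are hypothetical.

```python
import re

# Simplified patterns for two countries only; other countries are not checked.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),          # 99999 or 99999-9999
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),  # A9A 9A9
}


def validate_postal_code(country: str, code: str) -> bool:
    pattern = POSTAL_PATTERNS.get(country)
    if pattern is None:
        # Unknown format: accept any non-empty value rather than reject it.
        return bool(code.strip())
    return pattern.fullmatch(code.strip().upper()) is not None
```

      Storing postal codes as text matters even within the U.S. A ZIP Code such as 02134 loses its leading zero if it is stored in a numeric column.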

    • Telephone Numbers

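      Same issue as addresses: the length, digit grouping, and prefixes of telephone numbers vary across the world, so a fixed format such as (999) 999-9999 will not hold them.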
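
      A common storage choice is the ITU-T E.164 international format: a "+", a country code, and the subscriber number, with at most 15 digits in total. The Python sketch below, with a hypothetical helper name, only normalizes numbers that are already written with a country code. Turning national numbers into E.164 requires per-country rules, which libraries such as Google's libphonenumber implement.

```python
import re


def to_e164(raw: str) -> str:
    """Normalize an internationally formatted number to E.164 ('+' and digits)."""
    # Drop common formatting characters: whitespace, parentheses, dots, hyphens.
    cleaned = re.sub(r"[\s().-]", "", raw)
    # E.164: '+', a country code that never starts with 0, at most 15 digits.
    if not re.fullmatch(r"\+[1-9][0-9]{1,14}", cleaned):
        raise ValueError(f"expected an international number such as +1 425 555 0100, got {raw!r}")
    return cleaned
```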

    • Paper Size

      Paper sizes in the U.S. are typically Letter (8.5" x 11") or Legal (8.5" x 14"), whereas most of the rest of the world uses ISO sizes such as A4 (210 mm x 297 mm). When printing reports, you must allow for the different paper sizes; otherwise truncation or excessive white space may occur. Letting users select the paper size may be important.
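
      A quick calculation shows how small the differences are and why they still cause trouble. This Python sketch lists the three sizes in millimetres and computes the largest area that fits on every size you must support. The table and function names are hypothetical.

```python
# Paper sizes in millimetres, portrait orientation: (width, height).
PAPER_MM = {
    "Letter": (215.9, 279.4),  # 8.5 x 11 in
    "Legal": (215.9, 355.6),   # 8.5 x 14 in
    "A4": (210.0, 297.0),
}


def common_printable_area(sizes: list[str], margin_mm: float = 0.0) -> tuple[float, float]:
    """Largest page area (width, height) that fits on every listed paper size."""
    width = min(PAPER_MM[s][0] for s in sizes) - 2 * margin_mm
    height = min(PAPER_MM[s][1] for s in sizes) - 2 * margin_mm
    return width, height
```

      A layout that must print on both Letter and A4 has to fit within 210 mm x 279.4 mm. A report designed for Letter is 5.9 mm too wide for A4, and one designed for A4 is 17.6 mm too tall for Letter.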

    • Units of Measurement

      The U.S. largely uses U.S. customary (English) units, whereas most of the rest of the world uses metric units. How are I/O and storage of measurements handled? Do you store everything in metric and convert as necessary, or do you store the value plus a separate field for the unit? You don't want your spacecraft crashing into Mars because of a screw-up here!
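
      The Python sketch below shows one of the two storage options above: normalizing to SI units on input. The table uses exact defined conversion factors: 1 in = 0.0254 m, 1 lb = 0.45359237 kg, and 1 lbf·s = 4.4482216152605 N·s. The unit spellings and the function name are hypothetical. The Mars reference is to the Mars Climate Orbiter, lost in 1999 after thruster impulse data in pound-force seconds was treated as newton-seconds, an error of a factor of about 4.45.

```python
from decimal import Decimal

# Exact conversion factors from U.S. customary units to SI units.
TO_SI = {
    "in": (Decimal("0.0254"), "m"),
    "ft": (Decimal("0.3048"), "m"),
    "lb": (Decimal("0.45359237"), "kg"),
    "lbf*s": (Decimal("4.4482216152605"), "N*s"),  # pound-force second
}
SI_UNITS = {"m", "kg", "N*s"}


def to_si(value: Decimal, unit: str) -> tuple[Decimal, str]:
    """Normalize a measurement to SI before storing it."""
    if unit in SI_UNITS:
        return value, unit
    factor, si_unit = TO_SI[unit]  # KeyError for units this sketch doesn't know
    return value * factor, si_unit
```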

  • I/O and Display

    • Word Length

      Different languages need different amounts of text to say the same thing. English is among the most compact languages written in the Latin alphabet, whereas Finnish typically requires about 50% more characters to say the same thing. This has serious repercussions when designing the UI of a program or web page. If you create a nice-looking screen in English, it may come out truncated or wrapped when expressed in Finnish. On the other hand, if you allow enough space for the Finnish text, the English version may come out with unacceptable white space. In that case, multiple UIs may be necessary to account for the different text lengths, or the UI may have to dynamically move and resize elements on the dialog.
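
      A crude but useful check, sketched in Python, applies the roughly 50% expansion figure to each English string and flags any string that would overflow its control. The function name and the 1.5 default are assumptions. Character count is only a rough proxy for rendered width.

```python
import math


def flag_overflows(labels: dict[str, str], max_chars: int,
                   expansion: float = 1.5) -> list[str]:
    """Return resource keys whose text may no longer fit once translated."""
    return [key for key, text in labels.items()
            if math.ceil(len(text) * expansion) > max_chars]
```

      For example, with a 20-character limit, "Save changes" passes, because its 12 characters become about 18 after expansion. "Unsaved changes will be lost" is flagged, because its 28 characters become about 42.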

    • Fonts

      Output of Unicode requires the proper font to be installed on the end-user's system; otherwise the output will look like a bunch of gibberish (typically boxes or question marks). How are you going to ensure that the proper font is installed, and how are you going to keep the end-user from uninstalling it? In a Windows application, it may be worthwhile to check at startup that the proper font is installed. In a web application, there is not much you can do here, unless you want to put all the data into PDF or GIF files.

    • Keyboards

      Fortunately, Windows properly handles input from the various keyboard layouts, including the entry of international characters.

    • Complex Script Awareness

      Some languages, such as Arabic and Hebrew, are written right-to-left, and some, such as Chinese and Japanese, can be written vertically. Frequently Windows and .NET take the hassle of dealing with these away from the application, but you may need different UIs to deal with them. Many .NET controls have a RightToLeft property that you can set to accommodate these languages.
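
      A program sometimes has to decide the direction of text it did not author, such as user data. One approach is rule P2 of the Unicode Bidirectional Algorithm (UAX #9): use the first strongly directional character. This Python sketch is a simplification that ignores isolates and embeddings, and base_direction is a hypothetical name. Its result could, for example, drive a control's RightToLeft setting.

```python
import unicodedata


def base_direction(text: str, default: str = "ltr") -> str:
    """Guess paragraph direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":            # strong left-to-right (e.g. Latin letters)
            return "ltr"
        if bidi in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    # Only weak or neutral characters (digits, spaces, punctuation) were found.
    return default
```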

  • Localizability

    • Isolate Localizable Resources

      When you internationalize an application, you need to isolate all the strings that need to be localized in a small number of places. This means that an application must not embed English strings in its code. All strings must be stored in a resource file, XML file, database, or other common location. All dialogs must either have their strings pulled from this repository or be individually recreated for each language. The people translating the program into various languages do not want to look at code. .NET provides a number of tools, such as culture-specific resource files, to help with this process. Access 1.0 shipped in three other languages within 30 days of the English version because we took this into account. Before that, the fastest Microsoft had ever shipped another language edition of a program was six months after the English version.
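
      Lookup from such a repository usually falls back from a specific culture to a more general one, as .NET's ResourceManager does. This Python sketch mimics that fallback with in-memory dictionaries. The tables, their contents, and the function names are hypothetical.

```python
# Hypothetical resource tables keyed by culture name; "" is the neutral
# (default) resource set that every lookup falls back to.
RESOURCES = {
    "": {"File": "File", "Save": "Save", "Email": "Email"},
    "fr": {"File": "Fichier", "Email": "E-mail"},
    "fr-CA": {"Email": "Courriel"},
}


def candidate_cultures(culture: str) -> list[str]:
    """Specific to general: 'fr-CA' -> ['fr-CA', 'fr', '']."""
    parts = culture.split("-") if culture else []
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)] + [""]


def get_string(key: str, culture: str) -> str:
    for name in candidate_cultures(culture):
        table = RESOURCES.get(name, {})
        if key in table:
            return table[key]
    raise KeyError(key)  # missing even from the neutral resources
```

      With these tables, a fr-CA user gets "Courriel" for Email. For File, the lookup falls back to the French "Fichier". For Save, it falls back to the neutral "Save", because no French translation exists. A de-DE user gets the neutral strings throughout.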

    • String Handling

      Be very careful about string concatenation. Different languages use different word orders than English. For example, English uses Subject-Verb-Direct Object order, but some Mayan languages use Verb-Direct Object-Subject. If you concatenate strings together to form, say, an error message, then when the pieces are translated the result may become gibberish. .NET provides the String.Format method, whose numbered placeholders ({0}, {1}, and so on) let translators reorder the pieces, but it must be used consistently.
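
      The difference is easy to see in a small Python sketch, in which str.format's numbered placeholders play the role of String.Format's {0} and {1}. The template table and the function names are hypothetical, and the Spanish template is illustrative.

```python
# Each translation is a whole template with numbered placeholders,
# so the translator, not the code, decides the word order.
TEMPLATES = {
    "en": "{0}'s {1}",
    "es": "{1} de {0}",
}


def owner_label(lang: str, owner: str, item: str) -> str:
    return TEMPLATES[lang].format(owner, item)


def owner_label_concat(owner: str, item: str) -> str:
    # Concatenation hard-codes English word order; no translation can fix it.
    return owner + "'s " + item
```

      owner_label("en", "Maria", "document") gives "Maria's document", while owner_label("es", "María", "documento") gives "documento de María". The translator reversed the order without touching the code, which the concatenating version cannot allow.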

    • UI Considerations

      Besides the other considerations above, avoid putting text into graphics. An icon is an icon, not a billboard. If text is placed into graphics, then those graphics must be localized as well. Where explanatory text is needed, .NET's ToolTip component lets you attach it to controls as ordinary strings that can be localized like any other resource.

  • Internationalized Testing

    The application must be localized into at least one other language to test whether all the interface items that should be localized have been localized, and things that shouldn't be localized haven't. As development continues, this needs to be rechecked regularly. Here is a great trick that Microsoft uses: develop a tool that automatically localizes the English version into another language. You may think this is impossible because of the variability of different languages, but there is one language with consistent rules for conversion from English: Pig Latin! It also has the added benefit of adding two or three letters per word, making the text length more like Finnish. There is a Pig Latin version of most Microsoft programs somewhere. Testing may take considerable time.
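
    Here is a minimal Python sketch of such a pseudo-localizer. It uses one common variant of Pig Latin: a leading consonant cluster moves to the end and gets "ay", and a word that starts with a vowel gets "way". It keeps capitalization and leaves {0}-style placeholders untouched, so String.Format-style formatting still works. The names are hypothetical, and this is not Microsoft's tool.

```python
import re

VOWELS = "aeiouAEIOU"
# Match {placeholders} (left alone) or runs of ASCII letters (translated).
TOKEN = re.compile(r"\{[^}]*\}|[A-Za-z]+")


def pig_latin_word(word: str) -> str:
    if word[0] in VOWELS:
        result = word + "way"
    else:
        i = 0
        while i < len(word) and word[i] not in VOWELS:
            i += 1  # find the end of the leading consonant cluster
        result = word[i:] + word[:i] + "ay"
    # Preserve ALL-CAPS and Initial-capital styles of the original word.
    if word.isupper():
        return result.upper()
    if word[0].isupper():
        return result[0].upper() + result[1:].lower()
    return result


def pseudo_localize(text: str) -> str:
    def replace(match: re.Match) -> str:
        token = match.group(0)
        return token if token.startswith("{") else pig_latin_word(token)
    return TOKEN.sub(replace, text)
```

    For example, "Cannot open {0}." becomes "Annotcay openway {0}.", so any English text that survives in the UI stands out immediately.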

Internationalizing a program is a non-trivial exercise, and planning for internationalization should start at the earliest stages of a project's design. If done with proper planning, the cost can be small, maybe 15% more effort. If done after the fact, it may require redesigning the application from the ground up.