Multilingualism in company webs
Website of the Generalitat, Government of Catalonia
How is the web site designed to make production in various languages easier?
Technical aspects
Although HTML is the most common language for building web pages, the elements of a web site can also be written in other languages and technologies, such as XML, XHTML, Java, JavaScript, ASP, or PHP. These guidelines concentrate on HTML, but other languages can also be used, as in the following recommendation:

Recommendation
Using ASP or PHP together with a database makes several languages available from a single page (that is, a separate HTML document is not needed for each language version of the same page). This involves programming calls to a small, customised dictionary (for example, inici = inicio = home). It is only necessary to insert a code such as <% inici %> at the point on the page where the term should appear. The user is then shown the appropriate term for the language he or she has chosen for browsing.
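The dictionary lookup described in this recommendation can be sketched as follows. This is a minimal illustration written in Python rather than ASP or PHP; the DICTIONARY table, the translate and render functions, and the fallback to Catalan are assumptions made for the example and are not part of the original guidelines.

```python
import re

# Hypothetical dictionary: one key per term, one entry per language.
DICTIONARY = {
    "inici": {"ca": "inici", "es": "inicio", "en": "home"},
}

DEFAULT_LANGUAGE = "ca"  # assumed fallback when a translation is missing


def translate(key, language):
    """Return the term for `key` in `language`, falling back to the default language."""
    entry = DICTIONARY.get(key, {})
    # An unknown key is returned unchanged so that the gap stays visible on the page.
    return entry.get(language, entry.get(DEFAULT_LANGUAGE, key))


def render(template, language):
    """Replace every <% key %> placeholder in `template` with its translation."""
    return re.sub(
        r"<%\s*(\w+)\s*%>",
        lambda match: translate(match.group(1), language),
        template,
    )
```

For instance, render('<a href="/"><% inici %></a>', "es") would produce a link labelled "inicio", while the same template rendered with "en" would be labelled "home".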
 

In the early days, web pages were encoded with the ISO-8859-1 (Latin 1) character set, which only allows writing in Western languages. Browsers lacked the support needed to work with other character sets, HTML had no tags for BIDI languages (bidirectional languages, such as Arabic or Hebrew), and so on. For these reasons, the World Wide Web Consortium worked to remove these limitations so that all languages could be present on the Internet without problems of character sets, bidirectional writing, etc.

Most of the work focused on two basic aspects: developing standards for the page markup language (HTML, XML, XHTML, etc.) and improving the HTTP data transmission protocol.

To support non-Latin characters, the standards ISO 10646 and ISTD 1 (Internet Official Protocol Standards) were adopted for use on the Internet.

HTTP is the protocol used to transfer web pages from the server that stores them to the computer that reads them, that is, the client computer. To serve multilingual web pages, the following three recommendations must be respected so that HTTP works correctly:

1. The character set of the page must be indicated when the page is sent from the server that hosts it to the user's computer where it is read (the client computer). The character encoding of the page is given in the charset parameter, which is indicated as follows:

  HTTP: Content-Type: text/html; charset=iso-8859-1
  PHP: header("Content-type: text/html; charset=iso-8859-1");
  Java: resource.setContentType("text/html;charset=iso-8859-1");
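
On the receiving side, the charset parameter can be read back from the Content-Type value. The following is a minimal sketch in Python using the standard library's email.message module; the function name charset_from_content_type and its default argument are assumptions made for this example.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Extract the charset parameter from a Content-Type header value."""
    message = Message()
    message["Content-Type"] = header_value
    # get_content_charset() lower-cases the value and returns `default`
    # when the header carries no charset parameter.
    return message.get_content_charset(default)
```

A value without a charset parameter, such as "text/html", yields the default, which makes it easy to spot pages that do not declare their encoding.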
 
The character encoding of a page can normally be seen in the first lines of its source code.

 

In the Mozilla browser:
Click View > Page Info or press Ctrl+I. This opens the Page Info window, where you select the General tab.

The example shows the Page Info for the URL http://www.softcatala.org/ (Softcatalà).
 
 
In Internet Explorer:
Click View > Source. This opens a text document containing the source code.

The example again uses the URL http://www.softcatala.org/ (Softcatalà).
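
The same check can be made programmatically on the source code of a page. The sketch below, in Python, looks for the encoding declared in a meta element; the class MetaCharsetFinder and the function declared_charset are hypothetical names, and the sketch only reports what the page declares, not the encoding the server actually sends.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Remember the first character encoding declared in a <meta> element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if attributes.get("charset"):
            # Short form: <meta charset="utf-8">
            self.charset = attributes["charset"].lower()
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # Long form: <meta http-equiv="Content-Type" content="text/html; charset=...">
            content = attributes.get("content")
            if content:
                message = Message()
                message["Content-Type"] = content
                self.charset = message.get_content_charset()


def declared_charset(html_source):
    """Return the encoding declared in `html_source`, or None if there is none."""
    finder = MetaCharsetFinder()
    finder.feed(html_source)
    return finder.charset
```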
 
 
The most common encodings used for web pages are:

ASCII
Acronym of American Standard Code for Information Interchange. A 7-bit encoding; extended ASCII uses 8 bits. (Source: Unicode Glossary)

ISO-8859-1 (ISO Latin 1)
The official name is ISO-8859-1, but it is usually called ISO Latin 1. An encoding developed by ISO on the basis of ASCII. (Source: Webopaedia.com)

UTF-8
Acronym of Unicode (or UCS) Transformation Format. An encoding based on 8-bit units. (Source: Unicode Glossary)

UTF-16
Acronym of Unicode (or UCS) Transformation Format. An encoding based on 16-bit units. (Source: Unicode Glossary)

Figure: Coding of web pages
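
The practical difference between these encodings can be seen by encoding the same text in each of them. The following Python sketch counts the bytes produced; the function name encoded_sizes is an assumption made for this example, and "utf-16-le" is chosen so that no byte order mark is included in the count.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` needs in each encoding, or None if it cannot be encoded."""
    sizes = {}
    # "utf-16-le" is used so that no byte order mark is added to the count.
    for encoding in ("ascii", "iso-8859-1", "utf-8", "utf-16-le"):
        try:
            sizes[encoding] = len(text.encode(encoding))
        except UnicodeEncodeError:
            # The text contains a character this encoding cannot represent.
            sizes[encoding] = None
    return sizes
```

For the word "Català", ASCII fails because of the accented letter, ISO-8859-1 needs 6 bytes, UTF-8 needs 7 (the "à" takes two bytes), and UTF-16 needs 12.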
 
2. The language of the page sent by the server must be indicated at the top of the page. Language codes are taken from the list established by ISO 639. If the page is in Catalan, this is indicated as follows (see 3.2.2):

  HTML: LANG attribute (for example, <HTML LANG=ca>)

  XML: xml:lang attribute (for example, <dc:title xml:lang="ca">)

  XHTML 1.0: both lang and xml:lang (for example, <p xml:lang="ca" lang="ca">Llengua.</p>)

  XHTML 1.1: xml:lang attribute (for example, xml:lang="ca")
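
The rules above can be summarised in a small helper that produces the right attributes for each markup language. This is only a sketch in Python; the function name language_attributes and the markup labels it accepts are assumptions made for this example.

```python
# Which language attributes each markup language expects, following the list above.
LANGUAGE_ATTRIBUTE_RULES = {
    "html": ["lang"],
    "xml": ["xml:lang"],
    "xhtml1.0": ["xml:lang", "lang"],
    "xhtml1.1": ["xml:lang"],
}


def language_attributes(markup, code):
    """Return the attribute string declaring language `code` for the given markup."""
    names = LANGUAGE_ATTRIBUTE_RULES[markup.lower()]
    return " ".join(f'{name}="{code}"' for name in names)
```

For example, language_attributes("XHTML1.0", "ca") returns the pair of attributes used in the XHTML 1.0 line above.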
 
See also Placing of translatable text.
 
3. The language that the user understands must be indicated. This is called "language negotiation". If the web site is multilingual, the home page will offer some system for choosing the language:

 
A dropdown list with the names of the languages in text format (W3C advises against this option because characters may be displayed incorrectly).
Flags or other graphics that symbolise the languages (this option is not recommended either, because a flag often does not correspond directly to a language).
The names of the languages inserted in a graphic, such as a map (the option recommended by W3C), or the names of the languages with a link to the home page for each language.
   
 
The example shows the page at the URL http://www.visitbritain.com (Visitbritain).
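
In addition to an explicit choice on the home page, HTTP lets the browser state the user's preferred languages in the Accept-Language request header, which a server can use to pick a default version. The following Python sketch shows one simplified way to do this; the function names and the matching rule (exact tag first, then primary subtag) are assumptions made for this example and do not implement every detail of the HTTP specification.

```python
def parse_accept_language(header_value):
    """Return the language tags of an Accept-Language header, most preferred first."""
    weighted = []
    for position, item in enumerate(header_value.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without a q parameter has quality 1
        for parameter in parts[1:]:
            name, _, value = parameter.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by original position in the header.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(header_value, available, default):
    """Pick the best available language for the header, or `default` if none match."""
    available_lower = {code.lower(): code for code in available}
    for tag in parse_accept_language(header_value):
        # Try the full tag first, then its primary subtag (e.g. "en-gb" -> "en").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available_lower:
                return available_lower[candidate]
    return default
```

With the header "ca, es;q=0.8, en;q=0.5" and a site available in English and Spanish, this sketch selects Spanish. The user should still be able to change the language by hand.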


Other technical recommendations for internationalisation
   
 
HTTP
You can find more information on HTTP issues relating to internationalisation at http://www.w3.org/International/O-HTTP.html.

Charset
You can find more information on the charset parameter at http://www.w3.org/International/O-HTTP-charset.html.

Unicode Glossary
For more information, see the Unicode glossary at http://www.unicode.org/glossary/.
© Generalitat de Catalunya