How is the web site designed to make production in various languages easier?

Technical aspects

Although HTML is the most common language used to design web pages, the various elements of a web site can also be written in other languages such as XML, XHTML, Java, JavaScript, ASP, PHP, etc. In these guidelines we concentrate on HTML, but it should be remembered that other programming languages could be used, as follows:

Using ASP or PHP together with databases makes various languages available on a single page (that is to say, a separate HTML document is not necessary for each language version of the same page). This means programming calls to a small, customised dictionary (for example, inici = inicio = home). It is only necessary to insert a code such as <% inici %> at the point on the page where the term in question should appear. The user is then shown the appropriate term according to the language he or she has chosen for browsing.

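The dictionary lookup described above can be sketched in a few lines. This is only an illustration, not the site's actual ASP or PHP code: the in-memory `DICTIONARY`, the functions `term` and `render`, and the fallback to Catalan are all assumptions, and the `<% key %>` placeholder is treated here as a simple marker to substitute rather than as executable server code.

```python
import re

# Hypothetical dictionary; in the setup described in the text the entries
# would normally be read from a database rather than hard-coded.
DICTIONARY = {
    "inici": {"ca": "inici", "es": "inicio", "en": "home"},
}


def term(key: str, lang: str, default_lang: str = "ca") -> str:
    """Return the term for `key` in `lang`, falling back to `default_lang`."""
    translations = DICTIONARY[key]
    return translations.get(lang, translations[default_lang])


def render(template: str, lang: str) -> str:
    """Replace every <% key %> placeholder with the term in the chosen language."""
    return re.sub(
        r"<%\s*(\w+)\s*%>",
        lambda match: term(match.group(1), lang),
        template,
    )


# One template serves every language version of the page.
print(render('<a href="/"><% inici %></a>', "es"))
```

The point of the design is that the page template exists once, and only the dictionary grows when a language is added.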
In the beginning, web pages were encoded with the standard character set ISO-8859-1 (Latin 1), which only allowed writing in Western languages. Browsers did not have the necessary support to work with other character sets, HTML did not have tags for handling BIDI languages (bidirectional, such as Arabic or Hebrew), etc. For these reasons, the World Wide Web Consortium worked to remove the limitations of the alphabet so that all languages could be present on the Internet without problems of character sets, bidirectional writing, etc.

The most important work was done on two basic aspects: on the one hand, developing standards for the page markup language (whether HTML, XML, XHTML, etc.) and, on the other, improving the HTTP data transmission protocol.

To accept non-Latin characters, the standards ISO 10646 and ISTD 1 (Internet Official Protocol Standards) were drawn up.

HTTP is the protocol used to transfer web pages from the server that stores them to the computer that reads them, that is, the client computer. To access multilingual web pages, it is essential to respect the following three recommendations in order to guarantee that the HTTP protocol functions correctly:

1. The character set of the page must be indicated when the page is sent from the server computer that holds it to the user's computer where it is read (the client computer). The character encoding of the page is given by the charset parameter and is indicated in the following way:

HTTP header: Content-Type: text/html; charset=iso-8859-1
PHP: header("Content-Type: text/html; charset=iso-8859-1");
Java: response.setContentType("text/html;charset=iso-8859-1");

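As a rough illustration of what the charset parameter looks like from the receiving side, the following sketch builds a Content-Type header and reads the charset back out of it. The function names `content_type_header` and `charset_of` are hypothetical, and treating iso-8859-1 as the fallback when no charset is declared is an assumption made for this example only.

```python
from email.message import Message


def content_type_header(charset: str, media_type: str = "text/html") -> str:
    """Build the header line a server would send for a page in `charset`."""
    return f"Content-Type: {media_type}; charset={charset}"


def charset_of(header_value: str, default: str = "iso-8859-1") -> str:
    """Extract the charset parameter from a Content-Type value.

    The fallback to iso-8859-1 is an assumption for this sketch.
    """
    message = Message()
    message["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case,
    # or the supplied default when the parameter is absent.
    return message.get_content_charset(default)


print(content_type_header("iso-8859-1"))
print(charset_of("text/html; charset=UTF-8"))
print(charset_of("text/html"))
```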
The character encoding of the page can normally be seen in the first lines of the source code.

With the Mozilla browser: click on View > Page Info or press the key combination Ctrl+I. This opens the Page Info window, where you then select the General tab. In the example we see the page information for the URL http://www.softcatala.org/

With the Internet Explorer browser: click on View > Source. This opens a text document. As an example we again use the URL http://www.softcatala.org/

The most common encoding systems used for web pages are:

ASCII: Acronym of American Standard Code for Information Interchange. A 7-bit encoding (8 bits for extended ASCII). (Source: Unicode Glossary)

ISO-8859-1 (ISO Latin 1): The official name is ISO-8859-1, but it is usually called ISO Latin 1. It is an encoding developed by ISO from ASCII. (Source: Webopaedia.com)

UTF-8: Acronym of Unicode (or UCS) Transformation Format. An encoding based on 8-bit units. (Source: Unicode Glossary)

UTF-16: Acronym of Unicode (or UCS) Transformation Format. An encoding based on 16-bit units. (Source: Unicode Glossary)

Figure: Coding of web pages

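The practical difference between these encodings shows up as soon as a text leaves the ASCII range. The sketch below encodes three sample words with each system and reports the size in bytes, or `None` when the encoding cannot represent the text. The helper `encoded_sizes` and the sample words are illustrative choices, not part of the guidelines.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes per encoding, or None if unencodable."""
    sizes = {}
    # utf-16-le is used so that no byte-order mark is added to the count.
    for name in ("ascii", "iso-8859-1", "utf-8", "utf-16-le"):
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


samples = {
    "plain Latin": "Llengua",
    "accented Latin": "catal\u00e0",            # "català"
    "Hebrew": "\u05e9\u05dc\u05d5\u05dd",       # four Hebrew letters
}

for label, word in samples.items():
    print(label, encoded_sizes(word))
```

ASCII fails on the accented word and ISO-8859-1 fails on the Hebrew one, while both UTF encodings handle all three samples.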
2. The language of the page sent by the server has to be indicated at the head of the page. The page language codes are taken from the list of language codes established by ISO 639. If the page is in Catalan, it is indicated in the following way (see 3.2.2):

HTML: LANG attribute (for example, <HTML LANG=ca>)
XML: xml:lang attribute (for example, <dc:title xml:lang="ca">)
XHTML 1.0: both lang and xml:lang (for example, <p xml:lang="ca" lang="ca">Llengua.</p>)
XHTML 1.1: xml:lang attribute (for example, xml:lang="ca")

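To check that a page actually declares its language, the attribute can be read back from the markup. The following is a minimal sketch using Python's standard HTML parser; the class `LangFinder` and the function `page_language` are hypothetical names, and only the root html element is inspected.

```python
from html.parser import HTMLParser


class LangFinder(HTMLParser):
    """Remember the language declared on the root <html> element."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        if tag == "html" and self.lang is None:
            attributes = dict(attrs)
            self.lang = attributes.get("lang") or attributes.get("xml:lang")


def page_language(markup: str):
    """Return the declared page language, or None if none is declared."""
    finder = LangFinder()
    finder.feed(markup)
    return finder.lang


print(page_language("<HTML LANG=ca><BODY>Llengua.</BODY></HTML>"))
```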
See also Placing of translatable text.

3. The language that the user understands has to be indicated. This is called "language negotiation". If the web site is multilingual, the initial page will offer some system for choosing the language:

- a dropdown list with the names of the languages in text format (this option is not advised by W3C because it can give errors in the display of characters);
- flags or other graphics which symbolise the languages (this option is not recommended either, because often there is no direct correspondence between the language and the flag);
- the names of the languages inserted in a graphic, such as a map (option recommended by W3C), or the names of the languages with a link to the initial page for each language.
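
Besides an explicit choice on the initial page, the browser can also state the user's preferred languages in the request. The sketch below assumes the standard HTTP Accept-Language header, which the text above does not mention by name, and picks the best match among the languages a site offers. The functions `parse_accept_language` and `choose_language` and the sample language lists are hypothetical.

```python
def parse_accept_language(header: str) -> list:
    """Return the language tags of an Accept-Language value, most preferred first."""
    weighted = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without q= has full weight
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        weighted.append((quality, tag))
    # sorted() is stable, so tags with equal weight keep their header order.
    ordered = sorted(weighted, key=lambda pair: -pair[0])
    return [tag for quality, tag in ordered if quality > 0]


def choose_language(header: str, available: list, default: str) -> str:
    """Pick the first preferred language that the site offers."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # e.g. "en-gb" falls back to "en"
        if primary in available:
            return primary
    return default


print(choose_language("fr;q=0.9, ca, en;q=0.8", ["ca", "es", "en"], "en"))
```

A site would normally still show one of the language selectors listed above, since the browser's preference may not match what the reader wants.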