Polish letters and multiple languages - technical information




Old technology (April '96)

In the old technology, language versions are distinguished by a single letter at the beginning of the extension, as shown in the table:

Document version                 Extension (HTML)   Extension (CGI)
Single                           .html              (none)
Polish - no Polish characters    .html              (none)
Polish - ISO 8859-2              .ihtml             .i
Polish - Win CP 1250             .whtml             .w
English                          .ehtml             .e

Note: Use of CP 1250 documents is not advised.

As you can see, the default version is the one without Polish characters, which is not the best solution nowadays.

This scheme is simple to implement on most HTTP servers. For example, for Apache-Httpd you should put the following lines in the srm.conf config file (or in a .htaccess file in your documents' directory):

AddType text/html .ihtml
AddType text/html .whtml
AddType text/html .ehtml

Alternatively, change the line defining the text/html type in the mime.types config file so that it looks, for example, like this:

text/html       html htm ihtml whtml ehtml

For the Cern-Httpd server, put the following lines in the config file (httpd.conf):

AddType .ihtml text/html 8bit
AddType .whtml text/html 8bit
AddType .ehtml text/html 8bit
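
As an illustration only, the old extension scheme can be sketched as a small lookup table. The function names below are hypothetical and are not part of any of the tools described on this page; the version labels are copied from the table above.

```python
import os

# HTML extensions of the old technology, taken from the table above.
OLD_HTML_VERSIONS = {
    ".html": "Single, or Polish without Polish characters",
    ".ihtml": "Polish, ISO 8859-2",
    ".whtml": "Polish, Win CP 1250",
    ".ehtml": "English",
}


def document_version(filename):
    """Return the version label for a filename, or None if it is not a known HTML extension."""
    _, ext = os.path.splitext(filename)
    return OLD_HTML_VERSIONS.get(ext.lower())


def other_versions(filename):
    """List the filenames the other language versions of the same document would have."""
    root, ext = os.path.splitext(filename)
    return [root + e for e in OLD_HTML_VERSIONS if e != ext.lower()]
```

For example, document_version("index.ihtml") gives the ISO 8859-2 label, and other_versions("index.ehtml") lists index.html, index.ihtml and index.whtml.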

You may use the HtmlConv utility (see below) to automatically create the version without Polish letters, and optionally the CP 1250 version, from the ISO 8859-2 version of a document in this technology.
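
The character conversion itself can be sketched in a few lines of Python. This is not HtmlConv's code, only a minimal sketch under stated assumptions: the function names are hypothetical, the ASCII replacements (e.g. "ł" becomes "l") are an assumed common mapping, and the rewriting of link URLs and of the charset declaration that HtmlConv performs is not covered.

```python
# Polish letters and their plain ASCII stand-ins (an assumed, common mapping).
POLISH_TO_ASCII = str.maketrans("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ", "acelnoszzACELNOSZZ")


def latin2_to_ascii(data: bytes) -> bytes:
    """Convert ISO 8859-2 bytes to a version without Polish letters."""
    text = data.decode("iso8859_2")
    # Any other non-ASCII character is replaced by "?" instead of raising an error.
    return text.translate(POLISH_TO_ASCII).encode("ascii", errors="replace")


def latin2_to_cp1250(data: bytes) -> bytes:
    """Re-encode ISO 8859-2 bytes in the Windows CP 1250 charset."""
    # Characters with no CP 1250 equivalent are replaced by "?".
    return data.decode("iso8859_2").encode("cp1250", errors="replace")
```

The two charsets place some Polish letters at different byte values (for instance "ą" is 0xB1 in ISO 8859-2 and 0xB9 in CP 1250), which is why a separate CP 1250 version has to be generated rather than simply relabelled.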


New technology (August '96)

The new technology uses a feature of the Apache-Httpd server, namely MultiViews. Thanks to this, it makes use of the user's language preferences.

It uses the following extensions for the language versions of the documents:

Document version                 Extension (HTML without SSI)   Extension (HTML with SSI)   Extension (User's CGI)
Single                           .html                          .shtml                      .cgi
Polish - no Polish characters    .html.pn                       .shtml.pn                   .cgi.pn
Polish - ISO 8859-2              .html.po                       .shtml.po                   .cgi.po
Polish - Win CP 1250             .html.pw                       .shtml.pw                   .cgi.pw
English                          .html.en                       .shtml.en                   .cgi.en

Note: Use of CP 1250 documents is not advised.

For CGI scripts that are recognized by directory rather than by extension, you should use the same extensions as in the old technology, or optionally the same ones as for user's CGIs (recognized by extension).

In order to use this technology, put a .htaccess file with the following contents in each document directory (these directives may also be put in the server configuration files, access.conf and srm.conf):

Options All MultiViews

AddLanguage         pl .po
AddLanguage         pn .pn
AddLanguage         pw .pw

LanguagePriority    pl pn pw en

DirectoryIndex index.shtml index.html

The directive declaring the .en extension is present in the srm.conf config file by default.

The proposed .htaccess file turns MultiViews on so that you can use this technology, defines the language extensions it uses, sets the default language preferences, and allows the use of index files with Server-Side Includes.
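
To make the effect of these directives more concrete, here is a deliberately simplified Python model of how a language version might be chosen. It is not Apache's actual negotiation algorithm, which is more involved; the function names are hypothetical. The sketch only assumes that the browser's Accept-Language header is consulted first and that the LanguagePriority order is used when the header gives no usable answer.

```python
# Mirrors the AddLanguage and LanguagePriority directives from the .htaccess above.
SUFFIX_TO_LANG = {".po": "pl", ".pn": "pn", ".pw": "pw", ".en": "en"}
LANGUAGE_PRIORITY = ["pl", "pn", "pw", "en"]


def parse_accept_language(header):
    """Parse e.g. "en, pl;q=0.5" into [("en", 1.0), ("pl", 0.5)]."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    return prefs


def choose_variant(base, available, accept_language=""):
    """Pick one of the available files such as base + ".po" for a request for base."""
    by_lang = {}
    for name in available:
        if name.startswith(base + "."):
            lang = SUFFIX_TO_LANG.get(name[len(base):])
            if lang:
                by_lang[lang] = name
    # The user's preferences come first, highest quality value first.
    prefs = sorted(parse_accept_language(accept_language), key=lambda p: -p[1])
    for tag, quality in prefs:
        if quality > 0 and tag in by_lang:
            return by_lang[tag]
    # Simplifying assumption: otherwise fall back to the server's priority order.
    for lang in LANGUAGE_PRIORITY:
        if lang in by_lang:
            return by_lang[lang]
    return None
```

In this model, a request for index.html from a browser sending "en, pl;q=0.5" selects index.html.en, while a browser sending no language preference gets the first available version in the pl, pn, pw, en order.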

You may use the HtmlConv utility (see below) to automatically create the version without Polish letters, and optionally the CP 1250 version, from the ISO 8859-2 version of a document in this technology.


Preparing documents under DOS

From my own experience I know that preparing HTML documents under DOS is pretty popular. However, there is a problem: the DOS filename length limit or, to be more specific, the 3-character limit on extension length.

I suggest using the following substitute extensions:

Extension (Unix)   Extension (DOS)
.html              .htm
.ihtml             .iht
.whtml             .wht
.ehtml             .eht
.shtml             .sht
.html.po           .htp
.html.pn           .htn
.html.pw           .htw
.html.en           .hte
.shtml.po          .shp
.shtml.pn          .shn
.shtml.pw          .shw
.shtml.en          .she

Additionally, the .htaccess file should be called _htacces under DOS. Translation of filenames from their DOS versions to their Unix versions is done using the Reh utility (see below).
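
The renaming rule can be sketched as follows. Reh itself is a set of shell scripts, so this Python version is only an illustration of the mapping in the table above and not its implementation; the function name and the case-insensitive matching of extensions are assumptions.

```python
# DOS extension -> Unix extension, taken from the table above.
DOS_TO_UNIX = {
    "htm": "html", "iht": "ihtml", "wht": "whtml", "eht": "ehtml", "sht": "shtml",
    "htp": "html.po", "htn": "html.pn", "htw": "html.pw", "hte": "html.en",
    "shp": "shtml.po", "shn": "shtml.pn", "shw": "shtml.pw", "she": "shtml.en",
}


def unix_name(dos_name):
    """Return the Unix filename for a DOS filename; unknown names are left unchanged."""
    if dos_name.lower() == "_htacces":
        return ".htaccess"
    stem, dot, ext = dos_name.rpartition(".")
    if not dot:
        return dos_name  # no extension at all
    long_ext = DOS_TO_UNIX.get(ext.lower())  # assumption: match regardless of case
    if long_ext is None:
        return dos_name  # e.g. images keep their names
    return stem + "." + long_ext
```

For example, unix_name("index.htp") gives "index.html.po", while a file such as "logo.gif" keeps its name.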

If you want to test your pages too, you may use any browser, e.g. Netscape. If you use the old technology, attempts to load files with long extensions will automatically result in loading the files with short extensions. If you use the new one, you will need the ExtFake utility (see below), which translates filenames.

The HtmlConv utility (see below) is also available under DOS.

Because the HtmlConv utility requires the source versions of the documents to use the ISO 8859-2 charset, you should use editors that support it. I personally suggest using any text editor (e.g. TvEdit, Vim) together with the Ogonki utility (written by Andrzej Górbiel), available on the Polish Ogonek Page.


Utility programs

Below I list some utility programs (written by me) that make it easier to create documents compliant with either of the above technologies.

Each archive contains short documentation.

HtmlConv - sources
Utility for converting HTML documents that use the ISO 8859-2 charset into documents without Polish letters and, optionally, into documents with Polish letters in the CP 1250 charset. It changes Polish letters, URLs in links, and the charset declaration. It may be used for documents using either of the above technologies and for documents using a mix of both. The above archive contains sources intended for Unix-like systems. Requires the Coven library (see below).

HtmlConv - DOS
Utility for converting HTML documents that use the ISO 8859-2 charset into documents without Polish letters and, optionally, into documents with Polish letters in the CP 1250 charset. It changes Polish letters, URLs in links, and the charset declaration. It may be used for documents using either of the above technologies and for documents using a mix of both. The above archive contains MS-DOS binaries.

Reh
Shell script (in fact a set of scripts) that translates DOS extensions to their Unix equivalents and also renames _htacces to .htaccess. May be used for both technologies. Requires sh, find, sed, and mv, so it should run on any Unix-like system. Note: the way find is used may be nonportable.

ExtFake
Resident utility (224 bytes) that translates the long extensions used in the new technology to their short equivalents, which saves trouble when testing documents that use the new technology.

Coven library - sources
Library containing many useful functions, e.g. for creating multilanguage CGI scripts (compatible with both technologies described above), converting Polish letters, and making CGI scripts easier to write. The above archive contains the library sources and header files.


Paweł Więcek <coven@vmh.net>
All rights reserved.
This page's URL: http://www.coven.vmh.net/tech/langtech.html