
http://www.linuxjournal.com/content/internationalizing-those-bash-scripts


Internationalizing Those Bash Scripts


Sep 27, 2010 By Louis Iacona


The first software that I was actually paid to develop was a 2-page shell script that
prompted the user for a dozen or so pieces of information before launching a set of
cooperating processes. Those processes formed the core of a performance evaluation
suite for the public telephone network - a rather sizable system for its day with high
visibility.

Thinking through that assignment and the greater application, I can say with
complete certainty that none of its stakeholders were contemplating [human]
language independence - that is, how to render prompts, error messages, progress
diagnostics, etc. in a language other than US-English. Even if we had been thinking
that progressively, the level of facilitation provided by development
languages/platforms was either very limited or non-existent.

Fast forward to 2010: language independence - or Internationalization as it has come
to be known - is now expected of commercial-grade software. That shell script that I
had proudly written back in 1982 was one of a few application modules that interacted
directly with the user or generated progress diagnostics. That is exactly the sort of
shell script that would compel us to consider Internationalization.

My motivation to offer up this column is grounded in a recent experience. Our team
was asked to assess the 'Internationalization readiness' of a large-scale legacy system
- that is, identify modules that were not internationalized and needed to be, and
estimate the effort to apply all required changes. The gap was mainly found to be in
modules implemented in interpreted languages such as Rexx, TCL and the bash
shell. I found that while there seemed to be generally available documentation
around Internationalization for most programming languages used in this
application, there wasn't much to be found for shell scripting (at least nothing that
provided a "how-to" with code samples). One of the more complete online resources I
found was an appendix to a bash-scripting guide, which started out with the
following sentence: "Localization is an undocumented Bash feature." Well, at least it
offered some hope, basic information and code fragments. This column goes on to
distill what I thought was missing into a complete but summarized form.

The Big Picture (in a small frame).

First, let's agree on a common vocabulary - terms that begin to lay out a framework
for the effort and code samples presented thereafter.

Message Catalog: an indexed repository of natural language messages used
by Internationalized applications. The Message Catalog provides for the
decoupling of the [human] language content and the application code. When an
application needs to access a message at run time, something in the underlying
processing stack knows how to retrieve it based on a unique key. The format
and maintenance details of a Message Catalog are typically development-platform
specific, but the goal is always the same - decouple and centralize the
application's natural language text.

Internationalization: the term Internationalization (hereafter referred to by its
commonly known abbreviation: I18N - "I - eighteen letters - N") applies to the
steps that software designers/developers take in order to make an application
language-independent. At the coding level, user readable text is never compiled
into the application or intermixed with a markup language. Instead, the
application code refers to such content through unique message-catalog keys.

Localization: (sometimes abbreviated as "L10N") applies to the process of
adapting an application to specific target languages. If I18N has been applied,
Localization should not involve re-coding, but rather focuses on language
translation and re-deployment. Stated another way, Localization is simply the
process of adding support for a new language - translating Message Catalog
content from one language to another.

Locale: the part of a user's environment that defines location, country and
culture information - most noticeably, the user's language preference. The
Locale is typically installed and configured as part of the underlying operating
system or rendering application such as a browser.

So let's summarize. We internationalize so that we can localize. I18N is a design- and
coding-time effort that requires developers to adhere to certain design and coding
practices with one primary goal in mind - decoupling language-sensitive content
from the source code. For every language that needs to be supported, a Localization
effort is performed - creating a new Message Catalog for that language.
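
To make the decoupling concrete, here is a toy sketch. It is not how gettext stores messages: it simply uses Bash associative arrays (Bash 4 or later) as stand-in catalogs. The names CATALOG_EN, CATALOG_IT and lookup_message are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Toy illustration only: two in-memory "catalogs" that share the same keys.
# Real catalogs live in compiled files; these arrays are hypothetical stand-ins.
declare -A CATALOG_EN=(
    ["Greeting"]="Hello"
    ["Input Error"]="1st number should be lower than the second"
)
declare -A CATALOG_IT=(
    ["Greeting"]="Ciao"
)

# lookup_message LANG KEY
# Print the text stored under KEY for LANG. When the language or the key is
# unknown, print the key itself so the caller always gets something to show.
lookup_message() {
    local lang=$1 key=$2 text=""
    case $lang in
        en) text=${CATALOG_EN[$key]:-} ;;
        it) text=${CATALOG_IT[$key]:-} ;;
    esac
    printf '%s\n' "${text:-$key}"
}
```

The calling code only ever mentions the key ("Greeting"); adding a language means adding a catalog, not editing the callers.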

The good news here is that the I18N process need not start from first principles. Most
modern development languages, including Bash, offer features that facilitate the
basics - leaving the developer with the task of deciding how to integrate these basics
into the lifecycle process and the code base.

In and Out of Scope

The only soft prerequisites to getting the most out of this material are a general
understanding of I18N (independent of programming language, as presented above)
and a basic familiarity with shell scripting.

In the grand scheme, I18N/L10N goes beyond natural language independence.
Although not the focus of this column, a Locale can include preferences that define
date/time format, currency symbols, time zones, non-working days and so on, which all
serve to drive aspects of processing and presentation. The process, coding and testing
examples presented here focus only on language preference. It also should be noted
that a rather simple example of Localization is presented - US English to Italian -
languages that share the same alphabet (more or less). This removes the need to
cover details like extended character sets and the role of localized I/O devices such as
keyboards. Other deeper and broader areas of I18N can be researched for further
study through online and other resources. Here are some examples:

Unicode character encoding standards    http://www.unicode.org
Decent intro to I18N                    http://www.debian.org/doc/manuals/intro-i18n/
W3C related I18N Material               http://www.w3.org/International/
Advanced Bash Scripting Guide           http://www.tldp.org/LDP/abs/html/

The Moving Parts of I18N in Bash

Building on the fundamentals outlined above, let's move on to a real example. This
section demonstrates how I18N and Localization are supported and applied in a bash
environment, using a simple bash script to drive home concepts and details.

First, what sort of shell script elements are sensitive to natural language support?
Well, the short answer is anything that a human user visually reviews as part of using
an application. So that would include:

Textual prompts to the user
Error messages
Progress or error diagnostics diverted to log files or presented on a console
Help text, and other usage information and interactive documentation.
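
As a rough aid for spotting such elements in an existing script, the sketch below lists lines that start with echo, printf or read -p and contain a quoted string. The function name list_text_candidates is hypothetical, and the pattern is only a heuristic: it will miss here-documents and text held in variables, and it will also flag strings that users never see.

```shell
#!/bin/bash
# list_text_candidates FILE
# Print "line-number:line" for each line that begins with echo, printf or
# read -p and contains a quote character. Heuristic only; review by hand.
list_text_candidates() {
    grep -nE "^[[:space:]]*(echo|printf|read -p)[[:space:]].*[\"']" "$1"
}
```

Running it against a script gives a first-pass checklist of the statements to review for natural-language content.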

Just how does Bash facilitate I18N and Localization? We'll begin answering that
question by presenting a shell script that cannot be considered internationalized. The
short script below doesn't have much of a commercial value, but that "quality" will
allow us to focus on the task at hand - identifying and applying changes to language
sensitive areas. This script generates and displays a random number within a range
provided by the user, and logs its activity.

File: orig-rand.sh

#!/bin/bash

# Print a random number in the range low .. high-1
function random {
    typeset low=$1 high=$2
    echo $(( (RANDOM % (high - low)) + low ))
}

# (1)
echo "Hello, I can generate a random number between 2 numbers that you provide"
# (2)
echo -n "What is your low number? "
read -r low
# (3)
echo -n "What is your high number? "
read -r high

if [[ $low -ge $high ]]
then
    # (4)
    echo "1st number should be lower than the second - leaving early." >&2
    exit 1
fi

rand=$(random "$low" "$high")

# (5)
echo "from/to generated (by/at): $low / $high $rand (${LOGNAME} / $(date))" >> /tmp/POC
# (6)
echo "Your Random Number Is: $rand "

exit 0

Running the script produces the expected output.

$: orig-rand.sh
Hello, I can generate a random number between 2 numbers that you provide
What is your low number? 50
What is your high number? 125
Your Random Number Is: 95
$:

Commented lines (1) through (6) have been flagged as requiring change - as they
contain natural language. With this content identified, we can move onto creating a
Message Catalog that can be used by an altered, internationalized script. To introduce
the format, here's an example Message Catalog. It contains 2 messages - a greeting
and an error message. The general format of the file consists of key/value line pairs:
the "msgid" line names a key, and the "msgstr" line associates a natural language
value with it. Each Message Catalog supports exactly one language - in this case,
US-English.

File: en.po

msgid "Main Greeting"
msgstr "Welcome, what do you want to do today?"
msgid "Missing File Error"
msgstr "File Not Found"

Message Catalogs like this can be constructed manually, post-processed and installed
in the environment to support one or more applications. (These Message Catalogs
reside in files that are otherwise referred to as Portable Object files and, by
convention, are named with a .po suffix.)
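
The catalogs in this column are kept minimal. As a hedged aside, PO files can also carry a header entry (the entry with an empty msgid) that declares the file's character set, which becomes relevant once translations contain accented characters. The sketch below writes such a file from a script; the function name write_sample_po is hypothetical and the messages are the ones from the example above.

```shell
#!/bin/bash
# write_sample_po FILE
# Write a small catalog that begins with a header entry (the empty msgid).
# The charset named in the header states how the file itself is encoded.
write_sample_po() {
    cat > "$1" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Main Greeting"
msgstr "Welcome, what do you want to do today?"

msgid "Missing File Error"
msgstr "File Not Found"
EOF
}
```

Generating the file this way is just one option; editing it by hand or with a PO editor works equally well.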

Now let's construct a Message Catalog to maintain the user viewable content found in
the example script above. Notice there are 6 distinct messages that line up with the
content that was embedded in the original script.

File: en.po

msgid "Greeting"
msgstr "Hello, I can generate a random number between 2 numbers that you provide"
msgid "Low Number Prompt"
msgstr "What is your low number"
msgid "High Number Prompt"
msgstr "What is your high number"
msgid "Input Error"
msgstr "1st number should be lower than the second - leaving early."
msgid "Result Title"
msgstr "Your Random Number Is: "
msgid "Activity Log"
msgstr "from/to generated (by/at): "

Okay, at least as far as the Message Catalog is concerned, we now have US English
content covered. Now let's assemble one for another language - Italian.

File: it.po

msgid "Greeting"
msgstr "Ciao, posso generare un numero casuale fra il numero 2 che assicurate"
msgid "Low Number Prompt"
msgstr "Che cosa il vostro numero basso"
msgid "High Number Prompt"
msgstr "Che cosa il vostro alto numero"
msgid "Input Error"
msgstr "il primo numero dovrebbe essere più basso del secondo - andando presto."
msgid "Result Title"
msgstr "Il vostro numero casuale :"
msgid "Activity Log"
msgstr "da/al generato a (da/a):"

Notice that the "msgid" values are constant and have not changed. They will be used
by a modified script - an internationalized script. Now that the language catalogs
exist, what needs to be done to make them accessible by Internationalized scripts?
Linux provides a utility called "msgfmt" that creates 'message object files' (*.mo)
from portable object files (*.po), without changing the portable object files. Refer to
the installed or online manual page for complete command line usage details.
Executing the following commands will generate and install the message object files
for both US-English and Italian.

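# create the target directories first if they do not exist yet
mkdir -p "$HOME/locale/it/LC_MESSAGES" "$HOME/locale/en/LC_MESSAGES"
msgfmt -o rand.sh.mo it.po
cp -p rand.sh.mo "$HOME/locale/it/LC_MESSAGES/"
msgfmt -o rand.sh.mo en.po
cp -p rand.sh.mo "$HOME/locale/en/LC_MESSAGES/"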
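
Once more than two languages are involved, repeating these commands by hand gets tedious. The following is a minimal sketch of the same steps in a loop, assuming each PO file is named after its language code (en.po, it.po). The function names catalog_path and install_catalogs are hypothetical.

```shell
#!/bin/bash
# catalog_path BASE LANG DOMAIN
# Print where the compiled catalog for LANG and DOMAIN belongs under BASE.
catalog_path() {
    printf '%s/%s/LC_MESSAGES/%s.mo\n' "$1" "$2" "$3"
}

# install_catalogs BASE DOMAIN PO_FILE...
# Compile every PO file (assumed to be named LANG.po) straight into place.
install_catalogs() {
    local base=$1 domain=$2 po lang target
    shift 2
    for po in "$@"; do
        lang=$(basename "$po" .po)               # "it.po" -> "it"
        target=$(catalog_path "$base" "$lang" "$domain")
        mkdir -p "$(dirname "$target")" || return 1
        msgfmt -o "$target" "$po" || return 1
    done
}
```

With the files from this column, the call would be: install_catalogs "$HOME/locale" rand.sh en.po it.po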

Now that the Message Catalogs for two languages are installed, how can a bash script
leverage them? The other Linux utility critical to our example is called "gettext".

Given a directory and file naming organization for the Message Catalogs, gettext
provides access to the messages stored in the catalog. The listing below shows how
Message Catalogs must be stored on the file system. For each 2 letter
language code ('en' and 'it' in our example), some number of "text domain" message
object files are stored under a subdirectory called LC_MESSAGES. By convention, a
text domain is related to a single application, but this is an organizational decision to
be made when localizing.

Directory/file listing:

en
en/LC_MESSAGES
en/LC_MESSAGES/rand.sh.mo
it
it/LC_MESSAGES
it/LC_MESSAGES/rand.sh.mo

As shown above, we chose to install the Message Catalogs in a subdirectory called
locale under the user's HOME directory. System locale data that gets distributed
with Linux is normally found under /usr/lib/locale (message catalogs for installed
applications typically live under /usr/share/locale). Here's what some of the
/usr/lib/locale directory listing looks like on my distribution:

aa_DJ
aa_DJ/LC_MESSAGES
aa_DJ.utf8
aa_DJ.utf8/LC_MESSAGES
aa_ER
aa_ER/LC_MESSAGES
aa_ER@saaho
... many others not shown

Retrieving a message stored in a Message Catalog is very straightforward - the
following lines demonstrate basic access. See the installed or online manual page for
complete command line usage. Setting the environment variable TEXTDOMAINDIR
to the base of the Message Catalog directory is required, and TEXTDOMAIN must
name the text domain (rand.sh in our example) so that gettext knows which catalog
to search.

$: export TEXTDOMAINDIR=/home/lji/locale
$: export TEXTDOMAIN=rand.sh
$: gettext -s "Greeting"
Hello, I can generate a random number between 2 numbers that you provide
$:
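
One behavior worth knowing while testing: when gettext cannot find a translation for a key, it prints the key itself rather than failing. The sketch below uses that to flag keys that are missing from a catalog. The function name has_translation is hypothetical, and the check has an obvious blind spot: an entry whose text is identical to its key also looks missing.

```shell
#!/bin/bash
# has_translation DOMAIN KEY
# Succeed only when the catalog for DOMAIN returns something other than KEY.
# Fails as well when the gettext command is unavailable.
has_translation() {
    local domain=$1 key=$2 text
    text=$(gettext -d "$domain" "$key" 2>/dev/null) || return 1
    [[ $text != "$key" ]]
}
```

A call such as has_translation rand.sh "Greeting" || echo "missing: Greeting" can then be run over every key a script uses.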

Notice that the invocation above compelled the 'gettext' utility to present the
US-English copy of the message. This was driven by the language preference value
assigned to the user's Locale. Without elaborating on the details, the 'locale' Linux
utility displays the following values. The first value, LANG, supplies the default
language preference.

$: locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
... other values not shown.
$:

So if you're following along, the next natural question to ask is how to alter language
preference. How can we test access to our Italian Message Catalog? Once again,
without elaborating on the details, setting the environment variable LC_ALL to a
value that includes language and country codes will override every Locale category.
Notice the updated output from the 'locale' utility after Italian/Italy (it/IT) has been
assigned as the language/country.

$: export LC_ALL="it_IT.UTF-8"
$: locale
LANG=it_IT.UTF-8
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
... other values not shown.
$:

Now if the same 'gettext' command is executed, we would expect to display the
equivalent Italian content, and we do as shown below.

$: gettext -s "Greeting"
Ciao, posso generare un numero casuale fra il numero 2 che assicurate
$:
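
If the Italian text does not appear after setting LC_ALL, one possible cause is that the requested locale is not installed on the system. The sketch below checks a locale name against the output of 'locale -a'. The function names are hypothetical, and the name normalization (lower-casing and dropping hyphens) is an assumption about how the names in that list can differ from what users type.

```shell
#!/bin/bash
# normalize_locale NAME
# Lower-case NAME and drop hyphens, so that "it_IT.UTF-8" and "it_IT.utf8"
# compare as equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-'
}

# locale_available NAME
# Succeed when NAME (after normalization) appears in the "locale -a" list.
locale_available() {
    local want
    want=$(normalize_locale "$1")
    locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | tr -d '-' | grep -qxF -- "$want"
}
```

For example: locale_available it_IT.UTF-8 || echo "it_IT.UTF-8 is not installed" >&2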

So if the 'msgfmt' and 'gettext' utilities are the core of basic I18N and Localization in
the bash shell, what's the best way of internationalizing the original example script
and other scripts like it? The first step I took was to build a thin convenience library,
which offers 4 useful functions. I chose this general approach for two reasons: it
insulates the application code from the lowest level details, and it promotes code reuse
by offering developers a straightforward way of dealing with these common
natural-language sensitive operations:

displaying text to standard output
displaying an error message
prompting a user for a response
logging a message to a file

The library code below sets the TEXTDOMAINDIR environment variable and
implements 4 functions.

Source code for i18n-lib.sh



#!/bin/bash
##
# Thin library around basic I18N-facilitated functions:
# basic text display, file logging, error display, and prompting
export TEXTDOMAINDIR=/home/lji/locale

###############################################
##
## Display some text to stderr
## $1 is assumed to be the Message Catalog key
function i18n_error {
    gettext -s "$1" >&2
}

###############################################
##
## Display some text to stdout
## $1 is assumed to be the Message Catalog key
## rest of args are used as misc information
function i18n_display {
    typeset key="$1"
    shift
    echo "$(gettext -s "$key") $*"
}

###############################################
## Append a log message to a file.
## use $1 as target file to append to
## use $2 as catalog key
## rest of args are used as misc information
function i18n_fileout {
    [[ $# -lt 2 ]] && return 1
    typeset file="$1"
    typeset key="$2"
    shift 2
    echo "$(gettext -s "$key") $*" >> "${file}"
}

###############################################
## Prompt the user with a message and echo back the response.
## $1 is assumed to be the Message Catalog key
## (read -p writes the prompt to stderr, so it is still visible
## when the caller captures stdout with $(...))
function i18n_prompt {
    typeset rv
    [[ $# -lt 1 ]] && return 1
    read -r -p "$(gettext "$1")? " rv
    echo "$rv"
}
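
The display and logging functions above always append their extra arguments after the translated text. Languages differ in word order, so one possible extension is to treat the catalog text as a printf format and let each translation decide where the values go. This is a sketch of that idea rather than part of the library as tested; the function name i18n_printf is hypothetical, and it assumes translators keep the same number of %s placeholders.

```shell
#!/bin/bash
# i18n_printf KEY [ARG...]
# Hypothetical companion to the library functions: look up KEY, then use the
# resulting text as a printf format for the remaining arguments.
i18n_printf() {
    local key=$1 format
    shift
    # Fall back to the key itself when gettext is missing or fails.
    format=$(gettext "$key" 2>/dev/null) || format=$key
    # The format string deliberately comes from the catalog.
    printf "$format\n" "$@"
}
```

Because gettext returns the key when no translation exists, a key written as a format string (for example "Range: %s to %s") still produces sensible output before any catalog has been installed.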

So how can we transform the original sample script to leverage this library - that is,
internationalize it? See the re-implemented script below. There are 4 noticeable
changes:

1. The TEXTDOMAIN environment variable is set to the base application value.
2. Our I18N library file is sourced in.
3. The user is given the opportunity to select Italian as the preferred language.
4. All "echo" statements that displayed natural-language content were replaced by
calls to functions offered by the I18N library.

File: i18n-rand.sh

#!/bin/bash
##
# POC around i18n/Localization in a bash script
#(1)
export TEXTDOMAIN=rand.sh
I18NLIB=i18n-lib.sh
#(2)
# source in I18N library - shown above
if [[ -f $I18NLIB ]]
then
    . "$I18NLIB"
else
    echo "ERROR - $I18NLIB NOT FOUND" >&2
    exit 1
fi

## Start of example script

# Print a random number in the range low .. high-1
function random {
    typeset low=$1 high=$2
    echo $(( (RANDOM % (high - low)) + low ))
}
#(3)
## ALLOW USER TO SET LANG PREFERENCE
## assume lang and country code follows
if [[ "$1" = "-lang" ]]
then
    export LC_ALL="${2}_${3}.UTF-8"
fi

#(4)
# Display initial greeting
i18n_display "Greeting"
# ask for input
low=$(i18n_prompt "Low Number Prompt")
high=$(i18n_prompt "High Number Prompt")
# check for error condition and display error if found
if [[ $low -ge $high ]]
then
    i18n_error "Input Error"
    exit 1
fi
rand=$(random "$low" "$high")
# Log what was just done
i18n_fileout "/tmp/POC" "Activity Log" "$low / $high $rand (${LOGNAME} / $(date))"
# Display Results
i18n_display "Result Title" "$rand"
exit 0
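
The script trusts that both answers are numbers. A small guard, sketched below, could reject anything else before the comparison runs. The function name is_integer is hypothetical; the pattern deliberately refuses leading zeros, because Bash arithmetic would read a value such as 08 as invalid octal.

```shell
#!/bin/bash
# is_integer VALUE
# Succeed when VALUE is an optionally signed decimal integer without
# leading zeros.
is_integer() {
    [[ $1 =~ ^[+-]?(0|[1-9][0-9]*)$ ]]
}
```

In the script it could be used right after the prompts, reusing the existing catalog key: is_integer "$low" || { i18n_error "Input Error"; exit 1; }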

Now we can prove that it all works. Two test runs appear below - one using the
English content and the other the Italian content.

$: i18n-rand.sh
Hello, I can generate a random number between 2 numbers that you provide
What is your low number? 100
What is your high number? 1000
Your Random Number Is: 615
## now specify Italian as language preference
$: i18n-rand.sh -lang it IT
Ciao, posso generare un numero casuale fra il numero 2 che assicurate
Che cosa il vostro numero basso? 500
Che cosa il vostro alto numero? 1000
Il vostro numero casuale : 601
$:

The content of the log file is as expected. Notice that this script was not the only
processing affected by changing the Locale. The output of the 'date' command shows
the Italian abbreviations for Sunday (dom) and June (giu). Yes, Linux and its core
utilities are themselves internationalized.

from/to generated (by/at): 50 / 125 95 (lji / Sun Jun 10 12:57:38 EDT 2010)
from/to generated (by/at): 100 / 1000 615 (lji / Sun Jun 10 12:57:59 EDT 2010)
da/al generato a (da/a): 500 / 1000 601 (lji / dom giu 10 12:58:48 EDT 2010)
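
Whether log files should follow the user's Locale is a project decision. If other tools need to parse the log, a timestamp that never changes with the Locale may be preferable. The sketch below shows one way to produce it; the function name log_timestamp and the choice of a UTC, ISO 8601 style format are assumptions made for illustration.

```shell
#!/bin/bash
# log_timestamp
# Print the current time in a fixed form that does not depend on the user's
# Locale: UTC, year-month-day, then a 24-hour time.
log_timestamp() {
    LC_ALL=C date -u '+%Y-%m-%dT%H:%M:%SZ'
}
```

Passing $(log_timestamp) instead of $(date) to i18n_fileout would keep the message text localized while the timestamp stays uniform.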

Summary/Conclusions

Just as information exchange standards such as XML allow systems to be more
interoperable, at its core, I18N allows applications to be more usable - by a broader,
more global user base. I'm not suggesting that every trivial shell script necessarily
warrants I18N, but because all commercial software is potentially a global
commodity, language independence is something that needs to be considered - and
considered early in the design/development process. The lack of such planning would
be quite shortsighted in 2010. As with all core application services, I18N is much less
expensive (overwhelmingly so) to address at the outset of a project rather than to
shoehorn in a solution deep into a product lifecycle.

Every modern development language supports I18N / Localization in a unique way.
But whether your application is a major web site or a 2-page shell script, the same
general concepts always apply. Optimally, architects and designers set the tone by
providing a convenient way for developers to leverage the existing I18N and
Localization tools/APIs. Lead developers can and should implement a thin
convenience wrapper around the low level details of obtaining content from a
Message Catalog. Offering functionality at this level goes a long way to encourage
developers to apply a common solution across all applications and prevent code bloat.

It may be sparsely documented, but there is real support in Linux and its bash shell
for creating and using Message Catalogs. As a relatively small part of large-scale
applications, shell scripts that present a textual interface, or control progress and
error logging, are often forgotten in a sea of browser accessible content. It's just easy
to forget the shell scripts. My hope is that the minor investment in time and effort
put into assembling this material can be leveraged on development efforts that
include shell scripts.

Miscellaneous Notes

The code samples used here were built and tested on SUSE Linux 10.
The Google translator (http://www.google.com/translate_t) was used to translate
the base English Message Catalog into Italian, so the results may not be the most
appropriate, in-context translations. More often than not, language translation
for Localization is performed by a human translator who is familiar with the
application and its customer base.

Photo Credit: © asharkyu/Shutterstock (http://shutterstock.com)

______________________

Louis Iacona has been designing and developing software since 1982 on
UNIX/Linux and other platforms. Most recently, his efforts have focused
on Java/J2EE constructed solutions for enterprise-scoped applications.
Louis is currently on assignment at Je

Comments


Thanks for this info

Submitted by MortenB (not verified) on Sep 27, 2010.

Nice article, but it only scratches the surface. utf-8 is a nightmare.

I have serious problems getting utf-8 characters from bash and perl script on a
redHat5 server into oracle,DB2 and mysql. Even though you use the same
character encoding in the databases they are converted differently.

The ugly solution is to use blobs and store binary data. We usually just turn off
utf-8 via /etc/sysconfig/i18n so it does not confuse the 8-bit ASCII characters.
