Thinking through that assignment and the greater application, I can say with
complete certainty that none of its stakeholders were contemplating [human]
language independence - that is, how to render prompts, error messages, progress
diagnostics, etc. in a language other than US-English. Even if we had been thinking
that progressively, the level of facilitation provided by development
languages/platforms was either very limited or non-existent.
First, let's agree on a common vocabulary - terms that begin to lay out a framework
for the effort and code samples presented thereafter.
The good news here is that the I18N process need not start from first principles. Most
modern development languages, including Bash, offer features that facilitate the
basics - leaving the developer with the task of deciding how to integrate these basics
into the lifecycle process and the code base.
The only soft prerequisites for getting the most out of this material are a general
understanding of I18N (independent of programming language, as presented above)
and a basic familiarity with shell scripting.
Unicode character encoding standards: http://www.unicode.org
Building on the fundamentals outlined above, let's move on to a real example. This
section demonstrates how I18N and Localization are supported and applied in a bash
environment, using a simple bash script to drive home concepts and details.
First, what sort of shell script elements are sensitive to natural language support?
Well, the short answer is anything that a human user visually reviews as part of using
an application. So that would include:
Just how does Bash facilitate I18N and Localization? We'll begin answering that
question by presenting a shell script that cannot be considered internationalized. The
short script below doesn't have much of a commercial value, but that "quality" will
allow us to focus on the task at hand - identifying and applying changes to language
sensitive areas. This script generates and displays a random number within a range
provided by the user, and logs its activity.
http://www.linuxjournal.com/content/internationalizing-those-bash-scripts
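File: orig-rand.sh
#!/bin/bash
function random {
    typeset low=$1 high=$2
    echo $(( ($RANDOM % ($high - $low) ) + $low ))
}
# (1)
echo "Hello, I can generate a random number between 2 numbers that you provide"
# (2)
echo -n "What is your low number? "
read low
# (3)
echo -n "What is your high number? "
read high
# (4)
if [[ $low -ge $high ]]
then
    echo "1st number should be lower than the second - leaving early." >&2
    exit 1
fi
rand=$(random $low $high)
# (5)
echo "from/to generated (by/at): $low / $high $rand (${LOGNAME} / $(date))" >> /tmp/POC
# (6)
echo "Your Random Number Is: $rand"
exit 0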
$: orig-rand.sh
Hello, I can generate a random number between 2 numbers that you provide
What is your low number? 50
What is your high number? 125
Your Random Number Is: 95
$:
Commented lines (1) through (6) have been flagged as requiring change, as they
contain natural language. With this content identified, we can move on to creating a
Message Catalog that can be used by an altered, internationalized script. To introduce
the format, here's an example Message Catalog. It contains 2 messages - a greeting
and an error message. The general format of the file consists of key/value line pairs:
the "msgid" portion names a key, and the "msgstr" portion associates a natural
language value with it. Each Message Catalog supports exactly one language - in this
case, US-English.
File: en.po
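A minimal sketch of what such a two-message catalog could look like, written out by a small shell snippet. The keys "Greeting" and "File Error", their wording, and the temporary directory are illustrative assumptions, not the catalog used later in this article.

```shell
# Hypothetical two-message catalog; keys and wording are illustrative only.
catalog_dir=$(mktemp -d)
cat > "$catalog_dir/en.po" <<'EOF'
msgid "Greeting"
msgstr "Hello, welcome to the application"

msgid "File Error"
msgstr "The requested file could not be opened"
EOF

# Each entry is one msgid/msgstr pair, so the two counts should match.
msgid_count=$(grep -c '^msgid ' "$catalog_dir/en.po")
msgstr_count=$(grep -c '^msgstr ' "$catalog_dir/en.po")
echo "$msgid_count keys, $msgstr_count values"
```

The blank line between entries is optional for this simple case, but it keeps the pairs easy to scan.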
Message Catalogs like this can be constructed manually, post-processed, and installed
in the environment to support one or more applications. (These Message Catalogs
reside in files that are otherwise referred to as Portable Object files and, by
convention, are named with a .po suffix.)
Now let's construct a Message Catalog to maintain the user viewable content found in
the example script above. Notice there are 6 distinct messages that line up with the
content that was embedded in the original script.
File: en.po
msgid "Greeting"
msgstr "Hello, I can generate a random number between 2 numbers that you provide"
msgid "Low Number Prompt"
msgstr "What is your low number"
msgid "High Number Prompt"
msgstr "What is your high number"
msgid "Input Error"
msgstr "1st number should be lower than the second - leaving early."
msgid "Result Title"
msgstr "Your Random Number Is: "
msgid "Activity Log"
msgstr "from/to generated (by/at): "
Okay, at least as far as the Message Catalog is concerned, we now have US English
content covered. Now let's assemble one for another language - Italian.
File: it.po
msgid "Greeting"
msgstr "Ciao, posso generare un numero casuale fra il numero 2 che assicurate"
msgid "Low Number Prompt"
msgstr "Che cosa il vostro numero basso"
msgid "High Number Prompt"
msgstr "Che cosa il vostro alto numero"
msgid "Input Error"
msgstr "il primo numero dovrebbe essere più basso del secondo - andando presto."
msgid "Result Title"
msgstr "Il vostro numero casuale :"
msgid "Activity Log"
msgstr "da/al generato a (da/a):"
Notice that the "msgid" values are constant and have not changed. They will be used
by a modified script - an internationalized script. Now that the language catalogs
exist, what needs to be done to make them accessible by Internationalized scripts?
Linux provides a utility called "msgfmt" that creates 'message object files' (*.mo)
from portable object files (*.po), without changing the portable object files. Refer to
the installed or online manual page for complete command line usage details.
Executing the following commands will generate and install the message object files
for both US-English and Italian.
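As a minimal sketch of that step: msgfmt takes a .po file and, with -o, writes the compiled .mo file to the given path. The sketch assumes the text domain is rand.sh and that each language gets its own LC_MESSAGES directory, as in the layout used in this article. To stay harmless it compiles cut-down, single-message catalogs into temporary directories (an assumption of this sketch) rather than into $HOME/locale.

```shell
# Assumptions: one-message stand-ins for en.po and it.po, text domain
# rand.sh, and temporary directories in place of the real locations.
src_dir=$(mktemp -d)
locale_root=$(mktemp -d)
domain=rand.sh

printf 'msgid "Greeting"\nmsgstr "Hello"\n' > "$src_dir/en.po"
printf 'msgid "Greeting"\nmsgstr "Ciao"\n'  > "$src_dir/it.po"

installed=0
for lang in en it
do
    target_dir="$locale_root/$lang/LC_MESSAGES"
    mkdir -p "$target_dir"
    # msgfmt compiles a .po file into a binary .mo file; -o names the output
    if command -v msgfmt >/dev/null 2>&1 &&
       msgfmt -o "$target_dir/$domain.mo" "$src_dir/$lang.po"
    then
        installed=$((installed + 1))
    fi
done
echo "compiled $installed catalog(s) under $locale_root"
```

For a real installation, locale_root would point at the directory later exported as TEXTDOMAINDIR, and src_dir at the directory holding the full catalogs.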
Now that the Message Catalogs for two languages are installed, how can a bash script
leverage them? The other Linux utility critical to our example is called "gettext".
Given a directory and file naming organization for the Message Catalogs, gettext
provides access to the messages stored in the catalog. The listing below shows how
Message Catalogs must be stored on the file system. For each 2-letter language code
('en' and 'it' in our example), some number of "text domain" message object files are
stored under a subdirectory called LC_MESSAGES. By convention, a text domain is
related to a single application, but this is an organizational decision to be made when
localizing.
Directory/file listing:
en
en/LC_MESSAGES
en/LC_MESSAGES/rand.sh.mo
it
it/LC_MESSAGES
it/LC_MESSAGES/rand.sh.mo
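The listing above uses bare language codes, while a Locale is usually named with a country and an encoding as well (it_IT.UTF-8, for instance). A rough sketch of the catalog paths a lookup would plausibly try for such a name, most specific first. The function name catalog_candidates is hypothetical, and the logic is a simplification: real gettext implementations also consider modifiers and aliases.

```shell
# Print candidate .mo paths for a locale name, most specific first.
# Simplified sketch; modifiers (@euro) and locale aliases are ignored.
function catalog_candidates {
    typeset root="$1" locale="$2" domain="$3"
    typeset no_codeset="${locale%%.*}"      # it_IT.UTF-8 -> it_IT
    typeset lang_only="${no_codeset%%_*}"   # it_IT       -> it
    typeset name previous=""
    for name in "$locale" "$no_codeset" "$lang_only"
    do
        # skip repeats, e.g. when the locale is already a plain "it"
        [[ "$name" == "$previous" ]] && continue
        echo "$root/$name/LC_MESSAGES/$domain.mo"
        previous="$name"
    done
}

catalog_candidates /home/lji/locale it_IT.UTF-8 rand.sh
```

The last candidate printed, /home/lji/locale/it/LC_MESSAGES/rand.sh.mo, is the one that matches the layout shown in the listing.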
We chose to install the Message Catalogs shown above under the user's HOME
directory, in a subdirectory called locale. Locale data distributed with Linux is
normally found under /usr/lib/locale, while the Message Catalogs of installed
applications typically live under /usr/share/locale. Here's what some of the
/usr/lib/locale directory listing looks like on my distribution:
aa_DJ
aa_DJ/LC_MESSAGES
aa_DJ.utf8
aa_DJ.utf8/LC_MESSAGES
aa_ER
aa_ER/LC_MESSAGES
aa_ER@saaho
... many others not shown
$: export TEXTDOMAINDIR=/home/lji/locale
$: gettext -s "Greeting"
Hello, I can generate a random number between 2 numbers that you provide
$:
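One detail worth knowing about this invocation: when gettext finds no translation, it prints the key itself, so a missing catalog or an unset text domain fails silently. The sketch below names the text domain explicitly with -d instead of relying on the TEXTDOMAIN environment variable, and treats "output equals key" as "not found". The function name lookup_or_default and the domain no-such-domain are hypothetical.

```shell
# Look up a key in a text domain; print a built-in default when the
# catalog has no translation (gettext then echoes the key unchanged).
function lookup_or_default {
    typeset domain="$1" key="$2" default="$3" text
    if command -v gettext >/dev/null 2>&1
    then
        text=$(gettext -d "$domain" -s "$key")
    else
        text="$key"    # gettext not installed: behave as "not found"
    fi
    if [[ "$text" == "$key" ]]
    then
        echo "$default"
    else
        echo "$text"
    fi
}

# "no-such-domain" is a deliberately missing text domain
lookup_or_default no-such-domain "Greeting" "Hello (built-in default)"
```

This check only works when keys are symbolic names such as "Greeting"; if the key were the English text itself, an untranslated result would be indistinguishable from a translated one.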
Notice that the invocation above compelled the 'gettext' utility to present the
US-English copy of the message. This was driven by the language preference value
assigned to the user's Locale. Without elaborating on the details, the 'locale' Linux
utility displays the following values. The first value, LANG, supplies the default for
every category that is not set explicitly, so it drives language preference here.
$: locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
... other values not shown.
$:
So if you're following along, the next natural question to ask is how to alter language
preference. How can we test access to our Italian Message Catalog? Once again,
without elaborating on the details, setting the environment variable LC_ALL to a
value that includes language and country codes overrides every Locale category.
Notice the updated output from the 'locale' utility after Italian/Italy (it/IT) has been
assigned as the language/country.
$: export LC_ALL="it_IT.UTF-8"
$: locale
LANG=it_IT.UTF-8
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
... other values not shown.
$:
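The order in which these variables are consulted for message language can be sketched as a small function: LC_ALL wins, then LC_MESSAGES, then LANG, with "C" as the result when none is set. The function name resolve_messages_locale is hypothetical; it takes the three values as arguments so that trying it out does not change the current shell's Locale.

```shell
# Precedence for the message language: LC_ALL, then LC_MESSAGES,
# then LANG; "C" applies when none of them is set.
function resolve_messages_locale {
    typeset lc_all="$1" lc_messages="$2" lang="$3"
    if [[ -n "$lc_all" ]]
    then
        echo "$lc_all"
    elif [[ -n "$lc_messages" ]]
    then
        echo "$lc_messages"
    elif [[ -n "$lang" ]]
    then
        echo "$lang"
    else
        echo "C"
    fi
}

resolve_messages_locale "" "" "en_US.UTF-8"             # only LANG is set
resolve_messages_locale "it_IT.UTF-8" "" "en_US.UTF-8"  # LC_ALL overrides LANG
# For the current shell:
resolve_messages_locale "$LC_ALL" "$LC_MESSAGES" "$LANG"
```

GNU gettext additionally consults a LANGUAGE variable ahead of these when the resulting Locale is not "C"; that variable is left out of this sketch.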
Now if the same 'gettext' command is executed, we would expect it to display the
equivalent Italian content, and it does, as shown below.
$: gettext -s "Greeting"
Ciao, posso generare un numero casuale fra il numero 2 che assicurate
$:
So if the 'msgfmt' and 'gettext' utilities are the core of basic I18N and Localization in
the bash shell, what's the best way of internationalizing the original example script
and other scripts like it? The first step I took was to build a thin convenience library,
which offers 4 useful functions. I chose this general approach for two reasons: it
insulates the application code from the lowest level details, and it promotes code
reuse by offering developers a straightforward way of dealing with these common
natural-language sensitive operations:
The library code below sets the TEXTDOMAINDIR environment variable and
implements 4 functions.
#!/bin/bash
##
# Thin library around basic I18N facilitated functions:
# basic text display, file logging, error display, and prompting
export TEXTDOMAINDIR=/home/lji/locale
###############################################
##
## Display some text to stderr
## $1 is assumed to be the Message Catalog key
function i18n_error {
    echo "$(gettext -s "$1")" >&2
}
###############################################
##
## Display some text to stdout
## $1 is assumed to be the Message Catalog key
## rest of args are used as misc information
function i18n_display {
    typeset key="$1"
    shift
    echo "$(gettext -s "$key") $*"
}
###############################################
## Append a log message to a file.
## use $1 as target file to append to
## use $2 as catalog key
## rest of args are used as misc information
function i18n_fileout {
    [[ $# -lt 2 ]] && return 1
    typeset file="$1"
    typeset key="$2"
    shift 2
    echo "$(gettext -s "$key") $*" >> "${file}"
}
###############################################
## Prompt the user with a message and echo back the response.
## $1 is assumed to be the Message Catalog key
function i18n_prompt {
    typeset rv
    [[ $# -lt 1 ]] && return 1
    # -r keeps backslashes in the user's input intact
    read -r -p "$(gettext "$1"): " rv
    echo "$rv"
}
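The display and logging functions above append their extra arguments after the translated text, which fixes the word order for every language. One possible refinement, offered only as a sketch, is to let the catalog entry carry printf-style placeholders so each translation can position the values itself. The function name i18n_format is hypothetical and is not part of the library above; the message used in the call is not in any catalog, so gettext hands it back unchanged.

```shell
# Treat the catalog text as a printf format and fill in the arguments.
# Falls back to the key itself when gettext is missing or has no entry.
function i18n_format {
    typeset key="$1" fmt
    shift
    if command -v gettext >/dev/null 2>&1
    then
        fmt=$(gettext -s "$key")
    else
        fmt="$key"
    fi
    # the format comes from the catalog; the arguments fill its placeholders
    printf "$fmt\n" "$@"
}

i18n_format "Random number between %s and %s: %s" 50 125 95
```

Because the translated text is used as a format string, a stray % in a catalog entry would need to be written as %%.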
So how can we transform the original sample script to leverage this library - that is,
internationalize it? See the re-implemented script below. There are 4 noticeable
changes:
File: i18n-rand.sh
#!/bin/bash
##
# POC around i18n/Localization in a bash script
#(1)
export TEXTDOMAIN=rand.sh
I18NLIB=i18n-lib.sh
#(2)
# source in I18N library - shown above
if [[ -f $I18NLIB ]]
then
    . "$I18NLIB"
else
    echo "ERROR - $I18NLIB NOT FOUND"
    exit 1
fi
# same helper as in orig-rand.sh; it contains no natural language
function random {
    typeset low=$1 high=$2
    echo $(( ($RANDOM % ($high - $low) ) + $low ))
}
#(4)
# Display initial greeting
i18n_display "Greeting"
# ask for input
low=$(i18n_prompt "Low Number Prompt")
high=$(i18n_prompt "High Number Prompt")
# check for error condition and display error if found
if [[ $low -ge $high ]]
then
    i18n_error "Input Error"
    exit 1
fi
rand=$(random "$low" "$high")
# Log what was just done
i18n_fileout "/tmp/POC" "Activity Log" "$low / $high $rand (${LOGNAME} / $(date))"
# Display Results
i18n_display "Result Title" "$rand"
exit 0
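The Italian test run further down passes "-lang it IT" on the command line, which suggests the script maps those arguments onto LC_ALL. A minimal sketch of how such handling might look. The helper name locale_from_args is hypothetical, and the .UTF-8 suffix is an assumption based on the LC_ALL value used earlier in this article. The sketch only prints the export it would perform, so trying it does not disturb the current shell.

```shell
# Hypothetical helper: turn "-lang <language> <country>" into a locale name.
# Prints nothing when the arguments do not match that shape.
function locale_from_args {
    if [[ "$1" == "-lang" && -n "$2" && -n "$3" ]]
    then
        echo "${2}_${3}.UTF-8"
    fi
}

# Inside a script this would be called as: locale_from_args "$@"
requested=$(locale_from_args -lang it IT)
if [[ -n "$requested" ]]
then
    echo "would run: export LC_ALL=$requested"
fi
```

In a script, the export itself would have to happen before the first call into the library, since gettext reads the Locale each time it runs.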
Now we can prove that it all works. Two test runs appear below - one using the
English content and the other the Italian content.
$: i18n-rand.sh
Hello, I can generate a random number between 2 numbers that you provide
What is your low number? 100
What is your high number? 1000
Your Random Number Is: 615
## now specify Italian as language preference
$: i18n-rand.sh -lang it IT
Ciao, posso generare un numero casuale fra il numero 2 che assicurate
Che cosa il vostro numero basso? 500
Che cosa il vostro alto numero? 1000
Il vostro numero casuale : 601
$:
The content of the log file is as expected. Notice that this script was not the only
processing affected by changing the Locale. The output of the 'date' command shows
the Italian abbreviations for Sunday (dom) and June (giu). Yes, Linux and all of its
utilities are to be considered internationalized.
from/to generated (by/at): 50 / 125 95 (lji / Sun Jun 10 12:57:38 EDT 2010)
from/to generated (by/at): 100 / 1000 615 (lji / Sun Jun 10 12:57:59 EDT 2010)
da/al generato a (da/a): 500 / 1000 601 (lji / dom giu 10 12:58:48 EDT 2010)
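If a log like this is also read by tools rather than people, the Locale-dependent day and month names can get in the way. One option, sketched here under that assumption, is to ask 'date' for numeric fields only, which come out the same whatever the language preference. The function name log_timestamp is hypothetical.

```shell
# Numeric date fields do not change with the language preference,
# unlike the day and month names in the default 'date' output.
function log_timestamp {
    date '+%Y-%m-%d %H:%M:%S'
}

echo "50 / 125 95 (${LOGNAME:-unknown} / $(log_timestamp))"
```

Whether the timestamp should follow the user's Locale or stay fixed is a design decision; for text that is shown to a person, the localized form is usually the friendlier one.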
Summary/Conclusions
It may be sparsely documented, but there is real support in Linux and its bash shell
for creating and using Message Catalogs. As a relatively small part of large-scale
applications, shell scripts that present a textual interface, or control progress and
error logging, are often forgotten in a sea of browser accessible content. It's just easy
to forget the shell scripts. My hope is that the minor investment in time and effort
put into assembling this material can be leveraged on development efforts that
include shell scripts.
Miscellaneous Notes
The code samples used here were built and tested on SUSE Linux 10.
The Google translator (http://www.google.com/translate_t) was used to translate
the base English Message Catalog into Italian, so the results may not be the most
appropriate, in-context translations. More often than not, language translation
for Localization is performed by a human translator who is familiar with the
application and its customer base.
______________________
Louis Iacona has been designing and developing software since 1982 on
UNIX/Linux and other platforms. Most recently, his efforts have focused
on Java/J2EE constructed solutions for enterprise-scoped applications.
Louis is currently on assignment at Je
Comments
I have serious problems getting UTF-8 characters from bash and Perl scripts on a
Red Hat 5 server into Oracle, DB2 and MySQL. Even though you use the same
character encoding in the databases, they are converted differently.
The ugly solution is to use blobs and store binary data. We usually just turn off
UTF-8 via /etc/sysconfig/i18n so it does not confuse the 8-bit ASCII characters.