<HTML>
2
<HEAD>
3
<!-- This HTML file has been created by texi2html 1.52a
4
     from gettext.texi on 9 December 2003 -->
5

    
6
<TITLE>GNU gettext utilities - 8  Producing Binary MO Files</TITLE>
7
</HEAD>
8
<BODY>
9
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_7.html">previous</A>, <A HREF="gettext_9.html">next</A>, <A HREF="gettext_22.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
10
<P><HR><P>
11

    
12

    
13
<H1><A NAME="SEC133" HREF="gettext_toc.html#TOC133">8  Producing Binary MO Files</A></H1>
14

    
15

    
16

    
17
<H2><A NAME="SEC134" HREF="gettext_toc.html#TOC134">8.1  Invoking the <CODE>msgfmt</CODE> Program</A></H2>
18

    
19
<P>
20
<A NAME="IDX853"></A>
21
<A NAME="IDX854"></A>
22

    
23
<PRE>
msgfmt [<VAR>option</VAR>] <VAR>filename</VAR>.po ...
</PRE>
26

    
27
<P>
28
<A NAME="IDX855"></A>
29
The <CODE>msgfmt</CODE> program generates a binary message catalog from a textual
translation description.
31

    
32
</P>
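<P>
As a rough sketch of how the program might be driven from a script, the
Python helper below assembles an argument list for <CODE>msgfmt</CODE>.
The helper name <CODE>msgfmt_command</CODE> and the file names are
hypothetical; the options it emits (<CODE>-o</CODE>, <CODE>--check</CODE>,
<CODE>--statistics</CODE>) are the ones documented in this chapter.
</P>

```python
def msgfmt_command(po_files, output, check=False, statistics=False):
    """Build an argument list for msgfmt.

    Hypothetical helper for illustration; it is not part of gettext.
    """
    argv = ["msgfmt"]
    if check:
        argv.append("--check")        # format, header and domain checks
    if statistics:
        argv.append("--statistics")   # print statistics about translations
    argv += ["-o", output]            # write output to the specified file
    argv += list(po_files)            # one or more filename.po arguments
    return argv


print(msgfmt_command(["de.po"], "de.mo", check=True))
```

<P>
The resulting list can be handed to <CODE>subprocess.run</CODE>, provided
that <CODE>msgfmt</CODE> is installed and on the search path.
</P>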
33

    
34

    
35
<H3><A NAME="SEC135" HREF="gettext_toc.html#TOC135">8.1.1  Input file location</A></H3>
36

    
37
<DL COMPACT>
38

    
39
<DT><SAMP>`<VAR>filename</VAR>.po ...&acute;</SAMP>
40
<DD>
41
<DT><SAMP>`-D <VAR>directory</VAR>&acute;</SAMP>
42
<DD>
43
<DT><SAMP>`--directory=<VAR>directory</VAR>&acute;</SAMP>
44
<DD>
45
<A NAME="IDX856"></A>
46
<A NAME="IDX857"></A>
47
Add <VAR>directory</VAR> to the list of directories.  Source files are
searched relative to this list of directories.  The resulting binary
file will be written relative to the current directory, though.
50

    
51
</DL>
52

    
53
<P>
54
If an input file is <SAMP>`-&acute;</SAMP>, standard input is read.
55

    
56
</P>
57

    
58

    
59
<H3><A NAME="SEC136" HREF="gettext_toc.html#TOC136">8.1.2  Operation mode</A></H3>
60

    
61
<DL COMPACT>
62

    
63
<DT><SAMP>`-j&acute;</SAMP>
64
<DD>
65
<DT><SAMP>`--java&acute;</SAMP>
66
<DD>
67
<A NAME="IDX858"></A>
68
<A NAME="IDX859"></A>
69
<A NAME="IDX860"></A>
70
Java mode: generate a Java <CODE>ResourceBundle</CODE> class.
71

    
72
<DT><SAMP>`--java2&acute;</SAMP>
73
<DD>
74
<A NAME="IDX861"></A>
75
Like --java, and assume Java2 (JDK 1.2 or higher).
76

    
77
<DT><SAMP>`--tcl&acute;</SAMP>
78
<DD>
79
<A NAME="IDX862"></A>
80
<A NAME="IDX863"></A>
81
Tcl mode: generate a tcl/msgcat <TT>`.msg&acute;</TT> file.
82

    
83
<DT><SAMP>`--qt&acute;</SAMP>
84
<DD>
85
<A NAME="IDX864"></A>
86
<A NAME="IDX865"></A>
87
Qt mode: generate a Qt <TT>`.qm&acute;</TT> file.
88

    
89
</DL>
90

    
91

    
92

    
93
<H3><A NAME="SEC137" HREF="gettext_toc.html#TOC137">8.1.3  Output file location</A></H3>
94

    
95
<DL COMPACT>
96

    
97
<DT><SAMP>`-o <VAR>file</VAR>&acute;</SAMP>
98
<DD>
99
<DT><SAMP>`--output-file=<VAR>file</VAR>&acute;</SAMP>
100
<DD>
101
<A NAME="IDX866"></A>
102
<A NAME="IDX867"></A>
103
Write output to specified file.
104

    
105
<DT><SAMP>`--strict&acute;</SAMP>
106
<DD>
107
<A NAME="IDX868"></A>
108
Direct the program to work strictly following the Uniforum/Sun
implementation.  Currently this only affects the naming of the output
file.  If this option is not given, the name of the output file is the
same as the domain name.  If the strict Uniforum mode is enabled, the
suffix <TT>`.mo&acute;</TT> is added to the file name if it is not already
present.

We find this behaviour of Sun's implementation rather silly, and so by
default this mode is <EM>not</EM> selected.
117

    
118
</DL>
119

    
120
<P>
121
If the output <VAR>file</VAR> is <SAMP>`-&acute;</SAMP>, output is written to standard output.
122

    
123
</P>
124

    
125

    
126
<H3><A NAME="SEC138" HREF="gettext_toc.html#TOC138">8.1.4  Output file location in Java mode</A></H3>
127

    
128
<DL COMPACT>
129

    
130
<DT><SAMP>`-r <VAR>resource</VAR>&acute;</SAMP>
131
<DD>
132
<DT><SAMP>`--resource=<VAR>resource</VAR>&acute;</SAMP>
133
<DD>
134
<A NAME="IDX869"></A>
135
<A NAME="IDX870"></A>
136
Specify the resource name.
137

    
138
<DT><SAMP>`-l <VAR>locale</VAR>&acute;</SAMP>
139
<DD>
140
<DT><SAMP>`--locale=<VAR>locale</VAR>&acute;</SAMP>
141
<DD>
142
<A NAME="IDX871"></A>
143
<A NAME="IDX872"></A>
144
Specify the locale name, either a language specification of the form <VAR>ll</VAR>
145
or a combined language and country specification of the form <VAR>ll_CC</VAR>.
146

    
147
<DT><SAMP>`-d <VAR>directory</VAR>&acute;</SAMP>
148
<DD>
149
<A NAME="IDX873"></A>
150
Specify the base directory of classes directory hierarchy.
151

    
152
</DL>
153

    
154
<P>
155
The class name is determined by appending the locale name to the resource name,
156
separated with an underscore.  The <SAMP>`-d&acute;</SAMP> option is mandatory.  The class
157
is written under the specified directory.
158

    
159
</P>
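<P>
The naming rule above is simple enough to state in one line of Python.
The function name <CODE>bundle_class_name</CODE> and the sample values are
hypothetical and serve only as an illustration of the rule.
</P>

```python
def bundle_class_name(resource, locale):
    """Append the locale name to the resource name, separated by an underscore.

    Hypothetical helper; it mirrors the rule in the text, not msgfmt's code.
    """
    return resource + "_" + locale


print(bundle_class_name("Messages", "fr_CA"))
```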
160

    
161

    
162
<H3><A NAME="SEC139" HREF="gettext_toc.html#TOC139">8.1.5  Output file location in Tcl mode</A></H3>
163

    
164
<DL COMPACT>
165

    
166
<DT><SAMP>`-l <VAR>locale</VAR>&acute;</SAMP>
167
<DD>
168
<DT><SAMP>`--locale=<VAR>locale</VAR>&acute;</SAMP>
169
<DD>
170
<A NAME="IDX874"></A>
171
<A NAME="IDX875"></A>
172
Specify the locale name, either a language specification of the form <VAR>ll</VAR>
173
or a combined language and country specification of the form <VAR>ll_CC</VAR>.
174

    
175
<DT><SAMP>`-d <VAR>directory</VAR>&acute;</SAMP>
176
<DD>
177
<A NAME="IDX876"></A>
178
Specify the base directory of <TT>`.msg&acute;</TT> message catalogs.
179

    
180
</DL>
181

    
182
<P>
183
The <SAMP>`-l&acute;</SAMP> and <SAMP>`-d&acute;</SAMP> options are mandatory.  The <TT>`.msg&acute;</TT> file is
184
written in the specified directory.
185

    
186
</P>
187

    
188

    
189
<H3><A NAME="SEC140" HREF="gettext_toc.html#TOC140">8.1.6  Input file syntax</A></H3>
190

    
191
<DL COMPACT>
192

    
193
<DT><SAMP>`-P&acute;</SAMP>
194
<DD>
195
<DT><SAMP>`--properties-input&acute;</SAMP>
196
<DD>
197
<A NAME="IDX877"></A>
198
<A NAME="IDX878"></A>
199
Assume the input files are Java ResourceBundles in Java <CODE>.properties</CODE>
200
syntax, not in PO file syntax.
201

    
202
<DT><SAMP>`--stringtable-input&acute;</SAMP>
203
<DD>
204
<A NAME="IDX879"></A>
205
Assume the input files are NeXTstep/GNUstep localized resource files in
206
<CODE>.strings</CODE> syntax, not in PO file syntax.
207

    
208
</DL>
209

    
210

    
211

    
212
<H3><A NAME="SEC141" HREF="gettext_toc.html#TOC141">8.1.7  Input file interpretation</A></H3>
213

    
214
<DL COMPACT>
215

    
216
<DT><SAMP>`-c&acute;</SAMP>
217
<DD>
218
<DT><SAMP>`--check&acute;</SAMP>
219
<DD>
220
<A NAME="IDX880"></A>
221
<A NAME="IDX881"></A>
222
Perform all the checks implied by <CODE>--check-format</CODE>, <CODE>--check-header</CODE>,
223
<CODE>--check-domain</CODE>.
224

    
225
<DT><SAMP>`--check-format&acute;</SAMP>
226
<DD>
227
<A NAME="IDX882"></A>
228
<A NAME="IDX883"></A>
229
Check language dependent format strings.
230

    
231
If the string represents a format string used in a
<CODE>printf</CODE>-like function, both the original and the translated
string should have the same number of <SAMP>`%&acute;</SAMP> format specifiers, with
matching types.  If the flag <CODE>c-format</CODE> or
<CODE>possible-c-format</CODE> appears in the special comment
<KBD>#,</KBD> for this entry, a check is performed.  For example, the
check will diagnose using <SAMP>`%.*s&acute;</SAMP> against <SAMP>`%s&acute;</SAMP>, or <SAMP>`%d&acute;</SAMP>
against <SAMP>`%s&acute;</SAMP>, or <SAMP>`%d&acute;</SAMP> against <SAMP>`%x&acute;</SAMP>.  It can even handle
positional parameters.
239

    
240
Normally the <CODE>xgettext</CODE> program automatically decides whether a
string is a format string or not.  This algorithm is not perfect,
though.  It might regard a string as a format string even though it is not
used in a <CODE>printf</CODE>-like function, so <CODE>msgfmt</CODE> might report
errors where there are none.
245

    
246
To solve this problem, the programmer can dictate the decision to the
<CODE>xgettext</CODE> program (see section <A HREF="gettext_13.html#SEC221">13.3.1  C Format Strings</A>).  The translator should not
consider removing the flag from the <KBD>#,</KBD> line.  This "fix" would be
reversed as soon as <CODE>msgmerge</CODE> is called the next time.
250

    
251
<DT><SAMP>`--check-header&acute;</SAMP>
252
<DD>
253
<A NAME="IDX884"></A>
254
Verify presence and contents of the header entry.  See section <A HREF="gettext_5.html#SEC38">5.2  Filling in the Header Entry</A>,
255
for a description of the various fields in the header entry.
256

    
257
<DT><SAMP>`--check-domain&acute;</SAMP>
258
<DD>
259
<A NAME="IDX885"></A>
260
Check for conflicts between domain directives and the <CODE>--output-file</CODE>
option.
262

    
263
<DT><SAMP>`-C&acute;</SAMP>
264
<DD>
265
<DT><SAMP>`--check-compatibility&acute;</SAMP>
266
<DD>
267
<A NAME="IDX886"></A>
268
<A NAME="IDX887"></A>
269
<A NAME="IDX888"></A>
270
Check that GNU msgfmt behaves like X/Open msgfmt.  This will give an error
271
when attempting to use the GNU extensions.
272

    
273
<DT><SAMP>`--check-accelerators[=<VAR>char</VAR>]&acute;</SAMP>
274
<DD>
275
<A NAME="IDX889"></A>
276
<A NAME="IDX890"></A>
277
<A NAME="IDX891"></A>
278
<A NAME="IDX892"></A>
279
Check presence of keyboard accelerators for menu items.  This is based on
the convention used in some GUIs that a keyboard accelerator in a menu
item string is designated by an immediately preceding <SAMP>`&#38;&acute;</SAMP> character.
Sometimes a keyboard accelerator is also called "keyboard mnemonic".
This check verifies that if the untranslated string has exactly one
<SAMP>`&#38;&acute;</SAMP> character, the translated string has exactly one <SAMP>`&#38;&acute;</SAMP> as well.
If this option is given with a <VAR>char</VAR> argument, this <VAR>char</VAR> should
be a non-alphanumeric character and is used as keyboard accelerator mark
instead of <SAMP>`&#38;&acute;</SAMP>.
288

    
289
<DT><SAMP>`-f&acute;</SAMP>
290
<DD>
291
<DT><SAMP>`--use-fuzzy&acute;</SAMP>
292
<DD>
293
<A NAME="IDX893"></A>
294
<A NAME="IDX894"></A>
295
<A NAME="IDX895"></A>
296
Use fuzzy entries in output.  Note that using this option is usually wrong,
297
because fuzzy messages are exactly those which have not been validated by
298
a human translator.
299

    
300
</DL>
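<P>
The rule stated for <CODE>--check-accelerators</CODE> above can be sketched
in a few lines of Python.  This is only an illustration of the rule as
described, not the code <CODE>msgfmt</CODE> uses; the function name
<CODE>accelerators_consistent</CODE> is hypothetical, and details such as
the handling of untranslated entries are left out.
</P>

```python
def accelerators_consistent(msgid, msgstr, mark="&"):
    """Apply the accelerator rule described in the text to one message.

    Hypothetical helper: if the untranslated string has exactly one mark,
    the translated string must have exactly one mark as well.
    """
    if msgid.count(mark) != 1:
        return True                   # the rule does not apply to this entry
    return msgstr.count(mark) == 1


print(accelerators_consistent("&File", "&Datei"))   # one mark on each side
print(accelerators_consistent("&File", "Datei"))    # mark lost in translation
```

<P>
Passing a different <CODE>mark</CODE> corresponds to giving the option a
<VAR>char</VAR> argument.
</P>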
301

    
302

    
303

    
304
<H3><A NAME="SEC142" HREF="gettext_toc.html#TOC142">8.1.8  Output details</A></H3>
305

    
306
<DL COMPACT>
307

    
308
<DT><SAMP>`-a <VAR>number</VAR>&acute;</SAMP>
309
<DD>
310
<DT><SAMP>`--alignment=<VAR>number</VAR>&acute;</SAMP>
311
<DD>
312
<A NAME="IDX896"></A>
313
<A NAME="IDX897"></A>
314
Align strings to <VAR>number</VAR> bytes (default: 1).
315

    
316
<DT><SAMP>`--no-hash&acute;</SAMP>
317
<DD>
318
<A NAME="IDX898"></A>
319
Don't include a hash table in the binary file.  Lookup will be more expensive
320
at run time (binary search instead of hash table lookup).
321

    
322
</DL>
323

    
324

    
325

    
326
<H3><A NAME="SEC143" HREF="gettext_toc.html#TOC143">8.1.9  Informative output</A></H3>
327

    
328
<DL COMPACT>
329

    
330
<DT><SAMP>`-h&acute;</SAMP>
331
<DD>
332
<DT><SAMP>`--help&acute;</SAMP>
333
<DD>
334
<A NAME="IDX899"></A>
335
<A NAME="IDX900"></A>
336
Display this help and exit.
337

    
338
<DT><SAMP>`-V&acute;</SAMP>
339
<DD>
340
<DT><SAMP>`--version&acute;</SAMP>
341
<DD>
342
<A NAME="IDX901"></A>
343
<A NAME="IDX902"></A>
344
Output version information and exit.
345

    
346
<DT><SAMP>`--statistics&acute;</SAMP>
347
<DD>
348
<A NAME="IDX903"></A>
349
Print statistics about translations.
350

    
351
<DT><SAMP>`-v&acute;</SAMP>
352
<DD>
353
<DT><SAMP>`--verbose&acute;</SAMP>
354
<DD>
355
<A NAME="IDX904"></A>
356
<A NAME="IDX905"></A>
357
Increase verbosity level.
358

    
359
</DL>
360

    
361

    
362

    
363
<H2><A NAME="SEC144" HREF="gettext_toc.html#TOC144">8.2  Invoking the <CODE>msgunfmt</CODE> Program</A></H2>
364

    
365
<P>
366
<A NAME="IDX906"></A>
367
<A NAME="IDX907"></A>
368

    
369
<PRE>
msgunfmt [<VAR>option</VAR>] [<VAR>file</VAR>]...
</PRE>
372

    
373
<P>
374
<A NAME="IDX908"></A>
375
The <CODE>msgunfmt</CODE> program converts a binary message catalog to a
Uniforum-style <TT>`.po&acute;</TT> file.
377

    
378
</P>
379

    
380

    
381
<H3><A NAME="SEC145" HREF="gettext_toc.html#TOC145">8.2.1  Operation mode</A></H3>
382

    
383
<DL COMPACT>
384

    
385
<DT><SAMP>`-j&acute;</SAMP>
386
<DD>
387
<DT><SAMP>`--java&acute;</SAMP>
388
<DD>
389
<A NAME="IDX909"></A>
390
<A NAME="IDX910"></A>
391
<A NAME="IDX911"></A>
392
Java mode: input is a Java <CODE>ResourceBundle</CODE> class.
393

    
394
<DT><SAMP>`--tcl&acute;</SAMP>
395
<DD>
396
<A NAME="IDX912"></A>
397
<A NAME="IDX913"></A>
398
Tcl mode: input is a tcl/msgcat <TT>`.msg&acute;</TT> file.
399

    
400
</DL>
401

    
402

    
403

    
404
<H3><A NAME="SEC146" HREF="gettext_toc.html#TOC146">8.2.2  Input file location</A></H3>
405

    
406
<DL COMPACT>
407

    
408
<DT><SAMP>`<VAR>file</VAR> ...&acute;</SAMP>
409
<DD>
410
Input .mo files.
411

    
412
</DL>
413

    
414
<P>
415
If no input <VAR>file</VAR> is given or if it is <SAMP>`-&acute;</SAMP>, standard input is read.
416

    
417
</P>
418

    
419

    
420
<H3><A NAME="SEC147" HREF="gettext_toc.html#TOC147">8.2.3  Input file location in Java mode</A></H3>
421

    
422
<DL COMPACT>
423

    
424
<DT><SAMP>`-r <VAR>resource</VAR>&acute;</SAMP>
425
<DD>
426
<DT><SAMP>`--resource=<VAR>resource</VAR>&acute;</SAMP>
427
<DD>
428
<A NAME="IDX914"></A>
429
<A NAME="IDX915"></A>
430
Specify the resource name.
431

    
432
<DT><SAMP>`-l <VAR>locale</VAR>&acute;</SAMP>
433
<DD>
434
<DT><SAMP>`--locale=<VAR>locale</VAR>&acute;</SAMP>
435
<DD>
436
<A NAME="IDX916"></A>
437
<A NAME="IDX917"></A>
438
Specify the locale name, either a language specification of the form <VAR>ll</VAR>
439
or a combined language and country specification of the form <VAR>ll_CC</VAR>.
440

    
441
</DL>
442

    
443
<P>
444
The class name is determined by appending the locale name to the resource name,
445
separated with an underscore.  The class is located using the <CODE>CLASSPATH</CODE>.
446

    
447
</P>
448

    
449

    
450
<H3><A NAME="SEC148" HREF="gettext_toc.html#TOC148">8.2.4  Input file location in Tcl mode</A></H3>
451

    
452
<DL COMPACT>
453

    
454
<DT><SAMP>`-l <VAR>locale</VAR>&acute;</SAMP>
455
<DD>
456
<DT><SAMP>`--locale=<VAR>locale</VAR>&acute;</SAMP>
457
<DD>
458
<A NAME="IDX918"></A>
459
<A NAME="IDX919"></A>
460
Specify the locale name, either a language specification of the form <VAR>ll</VAR>
461
or a combined language and country specification of the form <VAR>ll_CC</VAR>.
462

    
463
<DT><SAMP>`-d <VAR>directory</VAR>&acute;</SAMP>
464
<DD>
465
<A NAME="IDX920"></A>
466
Specify the base directory of <TT>`.msg&acute;</TT> message catalogs.
467

    
468
</DL>
469

    
470
<P>
471
The <SAMP>`-l&acute;</SAMP> and <SAMP>`-d&acute;</SAMP> options are mandatory.  The <TT>`.msg&acute;</TT> file is
472
located in the specified directory.
473

    
474
</P>
475

    
476

    
477
<H3><A NAME="SEC149" HREF="gettext_toc.html#TOC149">8.2.5  Output file location</A></H3>
478

    
479
<DL COMPACT>
480

    
481
<DT><SAMP>`-o <VAR>file</VAR>&acute;</SAMP>
482
<DD>
483
<DT><SAMP>`--output-file=<VAR>file</VAR>&acute;</SAMP>
484
<DD>
485
<A NAME="IDX921"></A>
486
<A NAME="IDX922"></A>
487
Write output to specified file.
488

    
489
</DL>
490

    
491
<P>
492
The results are written to standard output if no output file is specified
493
or if it is <SAMP>`-&acute;</SAMP>.
494

    
495
</P>
496

    
497

    
498
<H3><A NAME="SEC150" HREF="gettext_toc.html#TOC150">8.2.6  Output details</A></H3>
499

    
500
<DL COMPACT>
501

    
502
<DT><SAMP>`--force-po&acute;</SAMP>
503
<DD>
504
<A NAME="IDX923"></A>
505
Always write an output file even if it contains no message.
506

    
507
<DT><SAMP>`-i&acute;</SAMP>
508
<DD>
509
<DT><SAMP>`--indent&acute;</SAMP>
510
<DD>
511
<A NAME="IDX924"></A>
512
<A NAME="IDX925"></A>
513
Write the .po file using indented style.
514

    
515
<DT><SAMP>`--strict&acute;</SAMP>
516
<DD>
517
<A NAME="IDX926"></A>
518
Write out a strict Uniforum conforming PO file.  Note that this
519
Uniforum format should be avoided because it doesn't support the
520
GNU extensions.
521

    
522
<DT><SAMP>`-p&acute;</SAMP>
523
<DD>
524
<DT><SAMP>`--properties-output&acute;</SAMP>
525
<DD>
526
<A NAME="IDX927"></A>
527
<A NAME="IDX928"></A>
528
Write out a Java ResourceBundle in Java <CODE>.properties</CODE> syntax.  Note
529
that this file format doesn't support plural forms and silently drops
530
obsolete messages.
531

    
532
<DT><SAMP>`--stringtable-output&acute;</SAMP>
533
<DD>
534
<A NAME="IDX929"></A>
535
Write out a NeXTstep/GNUstep localized resource file in <CODE>.strings</CODE> syntax.
536
Note that this file format doesn't support plural forms.
537

    
538
<DT><SAMP>`-w <VAR>number</VAR>&acute;</SAMP>
539
<DD>
540
<DT><SAMP>`--width=<VAR>number</VAR>&acute;</SAMP>
541
<DD>
542
<A NAME="IDX930"></A>
543
<A NAME="IDX931"></A>
544
Set the output page width.  Long strings in the output files will be
split across multiple lines in order to ensure that each line's width
(= number of screen columns) is less than or equal to the given <VAR>number</VAR>.
547

    
548
<DT><SAMP>`--no-wrap&acute;</SAMP>
549
<DD>
550
<A NAME="IDX932"></A>
551
Do not break long message lines.  Message lines whose width exceeds the
552
output page width will not be split into several lines.  Only file reference
553
lines which are wider than the output page width will be split.
554

    
555
<DT><SAMP>`-s&acute;</SAMP>
556
<DD>
557
<DT><SAMP>`--sort-output&acute;</SAMP>
558
<DD>
559
<A NAME="IDX933"></A>
560
<A NAME="IDX934"></A>
561
<A NAME="IDX935"></A>
562
Generate sorted output.  Note that using this option makes it much harder
563
for the translator to understand each message's context.
564

    
565
</DL>
566

    
567

    
568

    
569
<H3><A NAME="SEC151" HREF="gettext_toc.html#TOC151">8.2.7  Informative output</A></H3>
570

    
571
<DL COMPACT>
572

    
573
<DT><SAMP>`-h&acute;</SAMP>
574
<DD>
575
<DT><SAMP>`--help&acute;</SAMP>
576
<DD>
577
<A NAME="IDX936"></A>
578
<A NAME="IDX937"></A>
579
Display this help and exit.
580

    
581
<DT><SAMP>`-V&acute;</SAMP>
582
<DD>
583
<DT><SAMP>`--version&acute;</SAMP>
584
<DD>
585
<A NAME="IDX938"></A>
586
<A NAME="IDX939"></A>
587
Output version information and exit.
588

    
589
<DT><SAMP>`-v&acute;</SAMP>
590
<DD>
591
<DT><SAMP>`--verbose&acute;</SAMP>
592
<DD>
593
<A NAME="IDX940"></A>
594
<A NAME="IDX941"></A>
595
Increase verbosity level.
596

    
597
</DL>
598

    
599

    
600

    
601
<H2><A NAME="SEC152" HREF="gettext_toc.html#TOC152">8.3  The Format of GNU MO Files</A></H2>
602
<P>
603
<A NAME="IDX942"></A>
604
<A NAME="IDX943"></A>
605

    
606
</P>
607
<P>
608
The format of the generated MO files is best described by a picture,
which appears below.
610

    
611
</P>
612
<P>
613
<A NAME="IDX944"></A>
The first two words serve to identify the file.  The magic
number always signals a GNU MO file.  The number is stored in the
byte order of the generating machine, so a reader may encounter either
of two values: <CODE>0x950412de</CODE> or, when the byte order is
reversed, <CODE>0xde120495</CODE>.  The second
word describes the current revision of the file format.  For now the
revision is 0.  This might change in future versions; it ensures
that readers of MO files can distinguish new formats from old
ones, so that both can be handled correctly.  The version is kept
separate from the magic number, instead of using different magic
numbers for different formats, mainly because <TT>`/etc/magic&acute;</TT> is
not updated often.  It might be better to have magic separated from
internal format version identification.
626

    
627
</P>
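<P>
A reader can use the two magic values to find out which byte order the
file was written in.  The following Python sketch shows one way to do
so; the function name <CODE>detect_byte_order</CODE> is hypothetical, and
<VAR>data</VAR> is assumed to hold the raw contents of the file.
</P>

```python
MAGIC = 0x950412DE
MAGIC_SWAPPED = 0xDE120495


def detect_byte_order(data):
    """Return "little" or "big" depending on how the magic number was stored.

    Hypothetical helper; data holds the raw bytes of an MO file.
    """
    word = int.from_bytes(data[:4], "little")
    if word == MAGIC:
        return "little"               # the bytes read back unchanged
    if word == MAGIC_SWAPPED:
        return "big"                  # the bytes appear reversed
    raise ValueError("not a GNU MO file")
```

<P>
All further 32-bit words of the file are then decoded in the byte order
that was detected.
</P>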
628
<P>
A number of pointers to later tables in the file follow, allowing
for the extension of the prefix part of MO files without having to
recompile programs reading them.  This might become useful for later
inserting a few flag bits, an indication of the charset used, new
tables, or other things.
634

    
635
</P>
636
<P>
Then, at offset <VAR>O</VAR> and offset <VAR>T</VAR> in the picture, two tables
of string descriptors can be found.  In both tables, each string
descriptor uses two 32-bit integers, one for the string length,
another for the offset of the string in the MO file, counting in bytes
from the start of the file.  The first table contains descriptors
for the original strings, and is sorted so the original strings
are in increasing lexicographical order.  The second table contains
descriptors for the translated strings, and is parallel to the first
table: to find the corresponding translation one has to access the
array slot in the second array with the same index.
647

    
648
</P>
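<P>
The two parallel descriptor tables can be walked as in the Python sketch
below.  The function names are hypothetical; <CODE>count</CODE>,
<CODE>orig_table</CODE> and <CODE>trans_table</CODE> stand for the values
<VAR>N</VAR>, <VAR>O</VAR> and <VAR>T</VAR>, which the caller is assumed to
have read from the start of the file already.
</P>

```python
def read_word(data, offset, byteorder="little"):
    """Read one unsigned 32-bit integer at the given byte offset."""
    return int.from_bytes(data[offset:offset + 4], byteorder)


def read_pairs(data, count, orig_table, trans_table, byteorder="little"):
    """Return (original, translation) byte strings for every index.

    Hypothetical helper: each descriptor is 8 bytes, a length followed by
    an offset counted from the start of the file.
    """
    pairs = []
    for i in range(count):
        strings = []
        for table in (orig_table, trans_table):
            length = read_word(data, table + 8 * i, byteorder)
            offset = read_word(data, table + 8 * i + 4, byteorder)
            strings.append(data[offset:offset + length])
        pairs.append(tuple(strings))
    return pairs
```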
649
<P>
Having the original strings sorted enables the use of a simple binary
search when the MO file does not contain a hash table, or
when it is not practical to use the hash table provided in
the MO file.  This also has another advantage: the empty string
in a PO file is usually <EM>translated</EM> by GNU <CODE>gettext</CODE> into
some system information attached to that particular MO file, and the
empty string necessarily becomes the first in both the original and
translated tables, making the system information very easy to find.
658

    
659
</P>
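<P>
A lookup that relies only on the sort order might look like the following
Python sketch.  It is not the lookup code of GNU <CODE>gettext</CODE>; the
function name <CODE>find_translation</CODE> is hypothetical, the arguments
<CODE>count</CODE>, <CODE>orig_table</CODE> and <CODE>trans_table</CODE>
stand for <VAR>N</VAR>, <VAR>O</VAR> and <VAR>T</VAR>, and entries with
plural forms are not treated specially.
</P>

```python
def find_translation(data, key, count, orig_table, trans_table,
                     byteorder="little"):
    """Binary-search the sorted original strings; return the translation.

    Hypothetical helper; returns None when key is not in the catalog.
    """

    def string_at(table, index):
        # Each descriptor is a 32-bit length followed by a 32-bit offset.
        base = table + 8 * index
        length = int.from_bytes(data[base:base + 4], byteorder)
        offset = int.from_bytes(data[base + 4:base + 8], byteorder)
        return data[offset:offset + length]

    low, high = 0, count
    while low < high:
        mid = (low + high) // 2
        candidate = string_at(orig_table, mid)
        if candidate == key:
            return string_at(trans_table, mid)
        if candidate < key:           # bytes compare lexicographically
            low = mid + 1
        else:
            high = mid
    return None
```

<P>
Looking up the empty byte string with this function returns the system
information, if the file carries any.
</P>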
660
<P>
661
<A NAME="IDX945"></A>
662
The size <VAR>S</VAR> of the hash table can be zero.  In this case, the
663
hash table itself is not contained in the MO file.  Some people might
664
prefer this because a precomputed hashing table takes disk space, and
665
does not win <EM>that</EM> much speed.  The hash table contains indices
666
to the sorted array of strings in the MO file.  Conflict resolution is
667
done by double hashing.  The precise hashing algorithm used is fairly
668
dependent on GNU <CODE>gettext</CODE> code, and is not documented here.
669

    
670
</P>
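<P>
Because the hash table may be omitted, a minimal MO image is easy to
produce by hand.  The Python sketch below writes one with <VAR>S</VAR>
set to zero, in little-endian byte order and without alignment padding.
The function name <CODE>build_mo</CODE> is hypothetical, and storing the
offset of the first string in the <VAR>H</VAR> field is a choice made
here for illustration.
</P>

```python
def build_mo(catalog):
    """Serialize {original: translation} byte strings as an MO image, S = 0.

    Hypothetical helper, not the writer used by msgfmt.
    """
    originals = sorted(catalog)                  # increasing lexicographical order
    count = len(originals)
    orig_table = 28                              # right after the seven header words
    trans_table = orig_table + 8 * count
    strings_start = trans_table + 8 * count      # no hash table in between

    descriptors = []
    blob = b""
    for text in originals + [catalog[o] for o in originals]:
        descriptors.append((len(text), strings_start + len(blob)))
        blob += text + b"\0"                     # NUL is not counted in the length

    header = [0x950412DE, 0, count, orig_table, trans_table, 0, strings_start]
    words = header + [n for pair in descriptors for n in pair]
    return b"".join(w.to_bytes(4, "little") for w in words) + blob
```

<P>
A reader then has to fall back on binary search over the sorted original
strings.
</P>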
671
<P>
As for the strings themselves, they follow the hash table.  Each
is terminated with a <KBD>NUL</KBD>, and this <KBD>NUL</KBD> is not counted in
the length which appears in the string descriptor.  The <CODE>msgfmt</CODE>
program has an option selecting the alignment for MO file strings.
With this option, each string is separately aligned so it starts at
an offset which is a multiple of the alignment value.  On some RISC
machines, a correct alignment will speed things up.
679

    
680
</P>
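<P>
Aligning a string amounts to rounding its offset up to the next multiple
of the alignment value.  A small Python sketch, with the hypothetical
function name <CODE>align_offset</CODE>:
</P>

```python
def align_offset(offset, alignment=1):
    """Round offset up to the next multiple of alignment.

    Hypothetical helper; an alignment of 1 leaves the offset unchanged.
    """
    return (offset + alignment - 1) // alignment * alignment


print(align_offset(37, 4))
```

<P>
The bytes skipped between the end of one string and the aligned start of
the next are padding and belong to neither string.
</P>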
681
<P>
682
<A NAME="IDX946"></A>
Plural forms are stored by letting the plural of the original string
follow the singular of the original string, separated by a
<KBD>NUL</KBD> byte.  The length which appears in the string descriptor
includes both.  However, only the singular of the original string
takes part in the hash table lookup.  The plural variants of the
translation are all stored consecutively, separated by a
<KBD>NUL</KBD> byte.  Here also, the length in the string descriptor
includes all of them.
691

    
692
</P>
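<P>
Once the bytes covered by a descriptor have been extracted, an entry with
plural forms can be taken apart as in this Python sketch.  The function
name <CODE>split_plural_entry</CODE> and the sample strings are
hypothetical; the arguments are assumed to exclude the terminating
<KBD>NUL</KBD>, as the descriptor lengths do.
</P>

```python
def split_plural_entry(original, translation):
    """Split NUL-separated MO strings into (singular, plural, variants).

    Hypothetical helper; plural is empty if original contains no NUL byte.
    """
    singular, _, plural = original.partition(b"\0")
    variants = translation.split(b"\0")
    return singular, plural, variants


print(split_plural_entry(b"%d file\0%d files", b"%d Datei\0%d Dateien"))
```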
693
<P>
Nothing prevents a MO file from having embedded <KBD>NUL</KBD>s in strings.
However, the program interface currently used already presumes
that strings are <KBD>NUL</KBD> terminated, so embedded <KBD>NUL</KBD>s are
somewhat useless.  But the MO file format is general enough that other
interfaces would be possible later, if for example, we ever want to
implement wide characters right in MO files, where <KBD>NUL</KBD> bytes may
accidentally appear.  (No, we don't want to have wide characters in MO
files.  They would make the file unnecessarily large, and the
<SAMP>`wchar_t&acute;</SAMP> type being platform dependent, MO files would be
platform dependent as well.)
704

    
705
</P>
706
<P>
This particular issue has been strongly debated in the GNU
<CODE>gettext</CODE> development forum, and it is to be expected that the MO file
format will evolve or change over time.  It is even possible that many
formats may later be supported concurrently.  But surely, we have to
start somewhere, and the MO file format described here is a good start.
Nothing is cast in concrete, and the format may later evolve fairly
easily, so we should feel comfortable with the current approach.
714

    
715
</P>
716

    
717
<PRE>
        byte
             +------------------------------------------+
          0  | magic number = 0x950412de                |
             |                                          |
          4  | file format revision = 0                 |
             |                                          |
          8  | number of strings                        |  == N
             |                                          |
         12  | offset of table with original strings    |  == O
             |                                          |
         16  | offset of table with translation strings |  == T
             |                                          |
         20  | size of hashing table                    |  == S
             |                                          |
         24  | offset of hashing table                  |  == H
             |                                          |
             .                                          .
             .    (possibly more entries later)         .
             .                                          .
             |                                          |
          O  | length &#38; offset 0th string  ----------------.
      O + 8  | length &#38; offset 1st string  ------------------.
              ...                                    ...   | |
O + ((N-1)*8)| length &#38; offset (N-1)th string           |  | |
             |                                          |  | |
          T  | length &#38; offset 0th translation  ---------------.
      T + 8  | length &#38; offset 1st translation  -----------------.
              ...                                    ...   | | | |
T + ((N-1)*8)| length &#38; offset (N-1)th translation      |  | | | |
             |                                          |  | | | |
          H  | start hash table                         |  | | | |
              ...                                    ...   | | | |
  H + S * 4  | end hash table                           |  | | | |
             |                                          |  | | | |
             | NUL terminated 0th string  &#60;----------------' | | |
             |                                          |    | | |
             | NUL terminated 1st string  &#60;------------------' | |
             |                                          |      | |
              ...                                    ...       | |
             |                                          |      | |
             | NUL terminated 0th translation  &#60;---------------' |
             |                                          |        |
             | NUL terminated 1st translation  &#60;-----------------'
             |                                          |
              ...                                    ...
             |                                          |
             +------------------------------------------+
</PRE>
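<P>
The fixed part of the picture, the seven words at byte offsets 0 to 24,
can be decoded as in the following Python sketch.  The function name
<CODE>parse_header</CODE> and the dictionary keys are hypothetical; the
caller is assumed to know the byte order of the file, and the magic
number is checked against that assumption.
</P>

```python
FIELDS = ("magic", "revision", "N", "O", "T", "S", "H")


def parse_header(data, byteorder="little"):
    """Decode the seven 32-bit words at byte offsets 0 to 24 into a dict.

    Hypothetical helper; the keys N, O, T, S, H follow the picture.
    """
    if len(data) < 28:
        raise ValueError("file too short for an MO header")
    words = [int.from_bytes(data[i:i + 4], byteorder) for i in range(0, 28, 4)]
    header = dict(zip(FIELDS, words))
    if header["magic"] != 0x950412DE:
        raise ValueError("bad magic number for this byte order")
    return header
```

<P>
The values <VAR>O</VAR>, <VAR>T</VAR> and <VAR>H</VAR> are byte offsets from
the start of the file, so the tables can be located without assuming
that the header is exactly seven words long.
</P>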
766

    
767
<P><HR><P>
768
Go to the <A HREF="gettext_1.html">first</A>, <A HREF="gettext_7.html">previous</A>, <A HREF="gettext_9.html">next</A>, <A HREF="gettext_22.html">last</A> section, <A HREF="gettext_toc.html">table of contents</A>.
769
</BODY>
770
</HTML>