%begin{latexonly}
\newif\ifpdf
\ifx\pdfoutput\undefined
\pdffalse
\else
\pdfoutput=1
\pdftrue
\fi

% Change this as needed :
%   - a4paper to your paper format
%   - the document class to your need (book, article, ...)
\ifpdf
\documentclass[a4paper, 12pt, pdftex]{report}
\else
%end{latexonly}
\documentclass[a4paper, 12pt, dvips]{report}
%begin{latexonly}
\fi
%end{latexonly}

% The packages we need
\usepackage{verbatim}
\usepackage{moreverb}
\usepackage{url}
\usepackage{tabularx}
\usepackage[final]{graphicx}
\usepackage[hyperindex,breaklinks=true,pdfborder={0 0 0}]{hyperref}
%begin{latexonly}
\ifpdf
\hypersetup{colorlinks=true,linkcolor=blue,urlcolor=blue,citecolor=red}
\fi
%end{latexonly}
\usepackage{html}
\begin{htmlonly}
\newcommand{\href}[2]{\htmladdnormallink{#2}{#1}}
\end{htmlonly}

% block style paragraphs tend to look better in technical docs
\parindent=0in
\parskip=10pt

\begin{document}

% Split Jar Specification

\appendix

\chapter{Split Jars and MANIFEST Extensions}

The jar file specification allows archiving and packaging classes and
resources, but jar files are limited in size. To overcome this
limitation with minimal changes to the jar format, we create a set of
jar files and add new key-value attributes to the jar MANIFEST. These
attributes indicate how many jars are in the set, which files, if any,
are split across multiple jars, and which jars contain them.

\section{Motivations and Limitations}

Java's zip implementation is limited to about 2GB. Zip64 extensions
(which allow larger files) do not solve the problem when media
limitations restrict the jar size. There must be a way to split the
archive into multiple files, and individual entries may themselves
need to be split across jars.

A ``Split Jar'' is a set of normal jar files, one being the
\emph{primary} jar, and zero or more \emph{secondary} jars. The
primary jar file has additional manifest attributes to help
reconstruct the data. Entries may or may not be split across multiple
jars; those that are split need to be spliced back together upon
extraction. Secondary jar names derive from the basename of the
primary jar, and each segment of a split entry shares a basename
derived from the original entry name.

Segments of a split entry need not be in separate jars. Thus if a jar
is split to deal with media limitations, all the resulting jars may be
combined into a single primary jar, as long as the manifest is
correctly updated.

A major benefit of this format is that the split archive contents can
be recovered manually by extracting the contents of all jar files in
the set, and simply concatenating the segments of the split entries.

Entry names in the primary and secondary jars must not conflict, so
that together they represent a single archive. This includes the
generated names of split entry segments. This ensures that each jar in
a split archive may be extracted to the same location without risk of
losing data. Split file segments are then concatenated, manually or
by automation, to recover the original data set. The manifest should
always be consulted to confirm that files which look like split entry
segments should actually be spliced together. It is possible that the
files were intended to be part of the archive as they are (see
``Naming Conventions'' for name conflict resolution).

All segments of a split entry are given generated names so that normal
jar tools will never unpack a file under the original entry name. This
ensures that no unsuspecting user mistakenly uses a truncated, partial
file.

\subsection{Warnings}

Signing jar entries which have been split has not been addressed.

Files cannot be compressed directly into streams when there are
potential name conflicts with the generated segment names. Robust
tools must therefore collect the list of files to be added and
determine any conflicts first (see ``Naming Conventions'', Jar Entry
Names).

Adding files to existing split jars may also cause problems with name
conflicts.

\section{Naming Conventions}

A primary goal for this design is to allow split jars to be created
and unpacked manually with minimal problems. This is accomplished by
using a naming convention which lends itself to visual reconstruction.
When a jar file must be split into multiple jars, there is a primary
jar, and one or more secondary jars with a common name. When an entry
within the set of jars must be split, \emph{each} segment is given
a numbered suffix.

\subsection{Jar File Names}

For the primary jar \texttt{\emph{basename}.jar}, the names of
secondary jars must always be \texttt{\emph{basename}.split\#.jar}
where \texttt{\#} is an integer \emph{secondary jar ID} starting at
\texttt{1}. Left-padded zeros in the ID are ignored, and are
encouraged because they allow lexicographical sorting. The jars can be
renamed, as long as the \emph{basename} is the same for all, and the
suffixes (\texttt{.split\#.jar}) remain the same. All entry names
within the set must be unique.

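
As a rough illustration of the jar naming rule above, the following
Java sketch builds a secondary jar name and reads the ID back from one.
The class and method names are hypothetical and not part of IzPack, and
the default \texttt{.split\#} suffix is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of IzPack; assumes the default ".split#" suffix.
final class SecondaryJarNames {
    // Matches "<basename>.split<digits>.jar".
    private static final Pattern SECONDARY = Pattern.compile("(.+)\\.split(\\d+)\\.jar");

    /** Builds the name of secondary jar number id, zero-padded to at least width digits. */
    static String secondaryName(String primaryJar, int id, int width) {
        if (!primaryJar.endsWith(".jar") || id < 1 || width < 1) {
            throw new IllegalArgumentException("need a .jar name, an ID >= 1 and a width >= 1");
        }
        String base = primaryJar.substring(0, primaryJar.length() - ".jar".length());
        return base + ".split" + String.format("%0" + width + "d", id) + ".jar";
    }

    /** Returns the secondary jar ID (left-padded zeros ignored), or -1 for any other name. */
    static int secondaryId(String jarName) {
        Matcher m = SECONDARY.matcher(jarName);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(secondaryName("example.jar", 1, 1)); // example.split1.jar
        System.out.println(secondaryId("example.split007.jar")); // 7
    }
}
```

Parsing the ID as an integer is what makes left-padded zeros harmless:
\texttt{example.split007.jar} and \texttt{example.split7.jar} both
yield ID 7.
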
\subsection{Jar Entry Names}

For the split entry named \texttt{\emph{basename}} (including
suffixes), all segments are named using the template
\texttt{\emph{basename}}\texttt{.---\#.\~{}}, where \texttt{\#} is an integer
\emph{segment ID} starting at \texttt{0}. The segments are
rejoined by concatenating them in numeric order into a file
named \texttt{\emph{basename}}. The template is recorded in the
\emph{main} section of the manifest.

In the rare case where an entry is split and the name of a real entry
may conflict with a generated segment name, a non-default suffix
template is used. In our case, all of the generated segments will have
'\texttt{\~{}}' characters appended, as needed, to eliminate potential
conflicts. This non-default template is recorded in the
\emph{per-entry} section of the manifest for the split entry.

Non-default suffixes are used for all \emph{potential} conflicts, even in
cases where there is no actual conflict:

\begin{itemize}
  \item When the split entry does not generate enough segments to
        conflict, but the suffix of a real entry matches the default
        template.
  \item When the conflicting real entry must also be split, so that
        its actual entries use generated suffixes.
\end{itemize}

Examples are given below.

Other tools implementing split jars may (though are not encouraged to)
use different suffixes, but the numeric segment ID must be replaced
by '\#' in the template recorded in the manifest. Tools must sort
segments numerically, not lexicographically, as ``2'' is generally
greater than ``10'' lexicographically. However, tools are encouraged
to zero-pad names, as needed, so that lexicographic sorting is also
correct.

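
The segment naming and ordering rules above can be sketched in Java as
follows. This is only an illustration: \texttt{SegmentNames} and its
methods are hypothetical names, and the template is assumed to contain
exactly one \texttt{\#}.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of IzPack; the template must contain one '#'.
final class SegmentNames {
    private static int hashIndex(String template) {
        int hash = template.indexOf('#');
        if (hash < 0) {
            throw new IllegalArgumentException("template has no '#': " + template);
        }
        return hash;
    }

    /** Expands a suffix template such as ".---#.~" for one segment ID. */
    static String segmentName(String entryName, String template, int segmentId) {
        int hash = hashIndex(template);
        return entryName + template.substring(0, hash) + segmentId + template.substring(hash + 1);
    }

    /** Extracts the segment ID from a segment name, or -1 if the name does not fit the template. */
    static int segmentId(String name, String entryName, String template) {
        int hash = hashIndex(template);
        String prefix = entryName + template.substring(0, hash);
        String tail = template.substring(hash + 1);
        if (!name.startsWith(prefix) || !name.endsWith(tail)
                || name.length() <= prefix.length() + tail.length()) {
            return -1;
        }
        String digits = name.substring(prefix.length(), name.length() - tail.length());
        boolean numeric = digits.chars().allMatch(c -> c >= '0' && c <= '9');
        return numeric ? Integer.parseInt(digits) : -1;
    }

    /** Returns the segment names ordered numerically by ID, the order used for joining. */
    static List<String> inJoinOrder(List<String> names, String entryName, String template) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingInt((String n) -> segmentId(n, entryName, template)));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(segmentName("movie.mpeg", ".---#.~", 2)); // movie.mpeg.---2.~
    }
}
```

Sorting on the parsed integer rather than on the name keeps segment 2
ahead of segment 10 whether or not the names are zero-padded.
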
\section{Manifest Attributes}

To minimize the changes needed to implement the split jar, we simply
add attributes to the manifest. Additional attributes are ignored by
other jar tools, so the only consequence is that split files, and
files located entirely in secondary jars, will not be available to
them.

To avoid adding too much space overhead, and to allow jar files to be
renamed, the entries are kept minimal.

\subsection{Main Section Attributes}

Three attributes are added to indicate the number of secondary jars,
the suffix template used to name them, and the default suffix added to
the segments of split files.

% TODO: make this like an html <dl><dd>... <dt> ...</dl>
\begin{itemize}
  \item \texttt{Split-Jar-Secondary-Count}: the number of secondary jars
        in the set.
  \item \texttt{Split-Jar-Secondary-Suffix}: the suffix template
        inserted prior to the \texttt{.jar} suffix typical of jar
        files, to make the names of the secondary jar files in the set;
        typically \texttt{.split\#}.
  \item \texttt{Split-Entry-Suffix}: the suffix template appended to
        an entry name, to name each of the entry's constituent parts;
        typically \texttt{.---\#.\~{}}. The \# char indicates the
        location of the numeric value.  This cannot currently be
        changed.
\end{itemize}

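
A minimal sketch of how a tool might use the main section attributes to
list the jars in a set is given below. It relies on the standard
\texttt{java.util.jar.Manifest} API; the class \texttt{JarSet} is
hypothetical, and falling back to \texttt{.split\#} when
\texttt{Split-Jar-Secondary-Suffix} is absent is an assumption of the
sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of IzPack.
final class JarSet {
    /** Lists the primary jar followed by every secondary jar announced in the main section. */
    static List<String> jarNames(Manifest manifest, String primaryJar) {
        Attributes main = manifest.getMainAttributes();
        String countText = main.getValue("Split-Jar-Secondary-Count");
        int count = countText == null ? 0 : Integer.parseInt(countText.trim());
        String template = main.getValue("Split-Jar-Secondary-Suffix");
        if (template == null) {
            template = ".split#"; // assumed default when the attribute is absent
        }
        String base = primaryJar.endsWith(".jar")
                ? primaryJar.substring(0, primaryJar.length() - ".jar".length())
                : primaryJar;
        List<String> names = new ArrayList<>();
        names.add(primaryJar);
        for (int id = 1; id <= count; id++) {
            names.add(base + template.replace("#", Integer.toString(id)) + ".jar");
        }
        return names;
    }

    public static void main(String[] args) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Split-Jar-Secondary-Count", "2");
        // Prints [example.jar, example.split1.jar, example.split2.jar]
        System.out.println(jarNames(manifest, "example.jar"));
    }
}
```

Because only the count and the template are stored, the whole set can
be renamed without touching the manifest, as long as the suffixes are
preserved.
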
\subsection{Per-Entry Section Attributes}

Only files which are split require attributes in the manifest. A
space-separated list of integers is recorded, one for each jar
containing a segment of the entry. Entries which have a segment in the
primary jar file indicate this with the ID \texttt{0}.

No restriction is placed on the order of the entries, or on the IDs of
the jars in which any segment is contained.

% TODO: make this like an html <dl><dd>... <dt> ...</dl>
\begin{itemize}
  \item \texttt{Split-Entry-Jar-IDs}: a space-separated list of the
        IDs of the jars which contain the segments of the entry
        (\texttt{0} for the primary jar). Essentially a list of
        integers.
  \item \texttt{Split-Entry-Suffix}: overrides the default
        \texttt{Split-Entry-Suffix} specified in the main section.
        Needed when one (or more) '\texttt{\~{}}' chars are appended
        due to a name conflict with real entries.  This is not strictly
        necessary, as simply knowing the basename and unpacking all
        jars would allow the suffix to be determined, but it is
        included to save processing. This is currently not user
        configurable.
\end{itemize}

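
The per-entry attributes can be read with the standard
\texttt{java.util.jar.Manifest} class, as in the following sketch. The
class \texttt{SplitManifest} and its methods are hypothetical. Note
that \texttt{java.util.jar.Manifest} expects every per-entry section to
begin with a \texttt{Name:} header, so the sample text used here
includes one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of IzPack.
final class SplitManifest {
    /** Parses manifest text held in memory. */
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the jar IDs holding segments of the entry (0 is the primary jar); empty if not split. */
    static List<Integer> jarIds(Manifest manifest, String entryName) {
        List<Integer> ids = new ArrayList<>();
        Attributes attrs = manifest.getAttributes(entryName);
        String value = attrs == null ? null : attrs.getValue("Split-Entry-Jar-IDs");
        if (value == null) {
            return ids;
        }
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                ids.add(Integer.parseInt(token));
            }
        }
        return ids;
    }

    /** Returns the per-entry suffix template if present, otherwise the main section default. */
    static String suffixTemplate(Manifest manifest, String entryName) {
        Attributes attrs = manifest.getAttributes(entryName);
        String own = attrs == null ? null : attrs.getValue("Split-Entry-Suffix");
        return own != null ? own : manifest.getMainAttributes().getValue("Split-Entry-Suffix");
    }

    public static void main(String[] args) {
        Manifest manifest = parse("Manifest-Version: 1.0\n"
                + "Split-Entry-Suffix: .---#.~\n\n"
                + "Name: foo.dat\n"
                + "Split-Entry-Jar-IDs: 0 1\n"
                + "Split-Entry-Suffix: .---#.~~\n\n");
        System.out.println(jarIds(manifest, "foo.dat") + " " + suffixTemplate(manifest, "foo.dat"));
    }
}
```
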
\section{Examples}

Two examples follow: one simple, and another cluttered with
pathological cases. Notice that the jar ID number and the segment ID
of a split entry have no correlation. In most applications, there will
seldom be more than two segments in a single jar: the end of the last
entry of the previous jar, and maybe the start of the last entry of
this jar, which is continued in the next. The examples aren't so well
organized though. :-)

\subsection{Basic Example}

\begin{verbatim}

Jar to Create
-------------
    example.jar

Files to Compress
-----------------
    movie.mpeg
    README
    song.mp3
    text.txt

Entries in Jars
---------------
    example.jar           movie.mpeg.---0.~
                          README

    example.split1.jar    movie.mpeg.---1.~
                          song.mp3.---0.~

    example.split2.jar    movie.mpeg.---2.~
                          song.mp3.---1.~
                          text.txt

MANIFEST (primary jar only)
---------------------------
    Manifest-Version: 1.0
    Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
    Built-By: IzPack 1.6.0
    Main-Class: com.izforge.izpack.installer.Installer
    Split-Jar-Secondary-Count: 2
    Split-Entry-Suffix: .---#.~

    movie.mpeg
    Split-Entry-Jar-IDs: 0 1 2

    song.mp3
    Split-Entry-Jar-IDs: 1 2
\end{verbatim}

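
Once every jar of such a set has been extracted into one directory,
the split entries can be rebuilt by plain concatenation. The Java
sketch below illustrates this for a single entry. \texttt{SegmentJoiner}
is a hypothetical name, and the sketch assumes that segment IDs are
contiguous from 0 and are not zero-padded.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of IzPack.
// Assumes contiguous, unpadded segment IDs starting at 0.
final class SegmentJoiner {
    /** Concatenates the segments found in dir into dir/entryName; returns how many were joined. */
    static int join(Path dir, String entryName, String template) {
        int joined = 0;
        try (OutputStream out = Files.newOutputStream(dir.resolve(entryName))) {
            while (true) {
                String segmentName = entryName + template.replace("#", Integer.toString(joined));
                Path segment = dir.resolve(segmentName);
                if (!Files.isRegularFile(segment)) {
                    break; // no segment with this ID, so the entry is complete
                }
                Files.copy(segment, out); // append the bytes of this segment
                joined++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return joined;
    }

    public static void main(String[] args) {
        if (args.length == 3) {
            System.out.println(join(Paths.get(args[0]), args[1], args[2]) + " segments joined");
        }
    }
}
```

A real tool would only call this for entries listed as split in the
manifest, since files that merely look like segments must be left
alone.
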
\subsection{Name Conflicts}

A pathological example showing name conflict resolution. It includes:

\begin{itemize}
  \item Direct conflict with real archive file
        (\texttt{foo...}).

  \item Indirect conflict with file by suffix template only
        (\texttt{bar...}).

  \item Conflict with real archive file that is also split. Due to
        both being split, there would be no name conflict amongst jar
        entries; however, the default suffix is not used anyway
        (\texttt{yin...}).

  \item A \emph{near} conflict, just to be annoying. Normal behavior
        (\texttt{chi...}).

  \item Files which look like segments of a split file, but are not,
        requiring the manifest to know the difference (\texttt{zig...}).
\end{itemize}

\begin{verbatim}

Jar to Create
-------------
    example.jar

Files to Compress
-----------------
    foo.dat
    foo.dat.---0.~ ...... Extremely unlikely that these would exist,
                          much less need to be archived. Provided as an
                          example.
    bar.dat
    bar.dat.---555.~ .... Another unlikely case which would not conflict
                          (assume bar.dat is split into only 2 segments)
                          except for the suffix template.
    yin.dat
    yin.dat.---2.~ ...... Yet another template-only conflict; the
                          conflicting file also needs to be split.
    chi.dat
    chi.dat.---0.~~ ..... No potential conflict.
    zig.dat.---0.~ ...... Files to be archived as they are, but not
    zig.dat.---1.~        intended to be spliced back together.

Entries in Jars
---------------
    example.jar           foo.dat.---0.~
                          foo.dat.---0.~~
                          bar.dat.---555.~
                          bar.dat.---0.~~
                          yin.dat.---0.~~
                          yin.dat.---2.~.---0.~
                          chi.dat.---0.~
                          chi.dat.---0.~~
                          zig.dat.---0.~
                          zig.dat.---1.~

    example.split1.jar    foo.dat.---1.~~
                          bar.dat.---1.~~
                          yin.dat.---1.~~
                          yin.dat.---2.~.---1.~
                          chi.dat.---1.~

MANIFEST (primary jar only)
---------------------------
    Manifest-Version: 1.0
    IzPack-Version: X.X.X
    Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
    Built-By: IzPack
    Class-Path:
    Main-Class: com.izforge.izpack.installer.Installer
    Split-Jar-Secondary-Count: 1
    Split-Entry-Suffix: .---#.~

    foo.dat
    Split-Entry-Jar-IDs: 0 1
    Split-Entry-Suffix: .---#.~~

    bar.dat
    Split-Entry-Jar-IDs: 0 1
    Split-Entry-Suffix: .---#.~~

    yin.dat
    Split-Entry-Jar-IDs: 0 1
    Split-Entry-Suffix: .---#.~~

    yin.dat.---2.~
    Split-Entry-Jar-IDs: 0 1

    chi.dat
    Split-Entry-Jar-IDs: 0 1
\end{verbatim}

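
One way a tool could arrive at the suffixes used in this example is
sketched below in Java. It appends '\texttt{\~{}}' characters to the
default template until no name in the file list could be mistaken for a
generated segment, regardless of how many segments are actually
produced. The class \texttt{SuffixChooser} is hypothetical; this is an
illustration of the rule described under ``Naming Conventions'', not
the IzPack implementation.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of IzPack.
final class SuffixChooser {
    /** Returns the first template, starting from the default, with no potential conflict. */
    static String chooseTemplate(String entryName, Collection<String> allNames, String defaultTemplate) {
        String template = defaultTemplate;
        while (conflicts(entryName, allNames, template)) {
            template += "~"; // lengthen the suffix and check again
        }
        return template;
    }

    /** True if any name looks like a segment of entryName under the template, for any segment ID. */
    private static boolean conflicts(String entryName, Collection<String> allNames, String template) {
        int hash = template.indexOf('#');
        if (hash < 0) {
            throw new IllegalArgumentException("template has no '#': " + template);
        }
        Pattern segment = Pattern.compile(
                Pattern.quote(entryName + template.substring(0, hash))
                + "\\d+"
                + Pattern.quote(template.substring(hash + 1)));
        for (String name : allNames) {
            if (segment.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("bar.dat", "bar.dat.---555.~", "chi.dat", "chi.dat.---0.~~");
        System.out.println(chooseTemplate("bar.dat", names, ".---#.~")); // .---#.~~
        System.out.println(chooseTemplate("chi.dat", names, ".---#.~")); // .---#.~
    }
}
```

Matching any run of digits, rather than only the segment IDs that will
really be generated, is what makes \texttt{bar.dat} receive a
non-default suffix even though it is split into only two segments.
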
\end{document}