%begin{latexonly}
\newif\ifpdf
\ifx\pdfoutput\undefined
\pdffalse
\else
\pdfoutput=1
\pdftrue
\fi

% Change this as needed :
% - a4paper to your paper format
% - the document class to your need (book, article, ...)
\ifpdf
\documentclass[a4paper, 12pt, pdftex]{report}
\else
%end{latexonly}
\documentclass[a4paper, 12pt, dvips]{report}
%begin{latexonly}
\fi
%end{latexonly}

% The packages we need
\usepackage{verbatim}
\usepackage{moreverb}
\usepackage{url}
\usepackage{tabularx}
\usepackage[final]{graphicx}
\usepackage[hyperindex,breaklinks=true,pdfborder={0 0 0}]{hyperref}
%begin{latexonly}
\ifpdf
\hypersetup{colorlinks=true,linkcolor=blue,urlcolor=blue,citecolor=red}
\fi
%end{latexonly}
\usepackage{html}
\begin{htmlonly}
\newcommand{\href}[2]{\htmladdnormallink{#2}{#1}}
\end{htmlonly}

% block style paragraphs tend to look better in technical docs
\parindent=0in
\parskip=10pt

\begin{document}

% Split Jar Specification

\appendix

\chapter{Split Jars and MANIFEST Extensions}

The jar file specification allows archiving and packaging classes and
resources, but jar files are limited in size. To overcome this limitation with
minimal changes to the jar format, we create a set of jar files and add
new key-value attributes to the jar MANIFEST. These attributes
indicate how many jars are in the set, which files, if any, are
split across multiple jars, and which jars contain their segments.

\section{Motivations and Limitations}

Java's zip implementation is limited to \~{}2GB. The zip64 extensions
(which will allow larger files) do not solve the problem when media
limitations restrict the jar size. There must be a way to split the
archive into multiple files, and individual entries must themselves be
splittable across jars.

A ``Split Jar'' is a set of normal jar files, one being the
\emph{primary} jar, and zero or more \emph{secondary} jars. The
primary jar file has additional manifest attributes to help
reconstruct the data. Entries may or may not be split across multiple
jars; those that are split need to be spliced back together upon
extraction. Secondary jar names derive from the basename of the primary
jar, and each segment of a split entry shares a basename derived from
the original entry name.

Segments of a split entry need not be in separate jars. Thus if a jar
is split to deal with media limitations, all the resulting jars may be
combined into a single primary jar, as long as the manifest is
correctly updated.

A major benefit of this format is that the split archive contents can
be recovered manually by extracting the contents of all jar files in
the set, and simply concatenating the segments of the split entries.

Entry names in the primary and secondary jars must not conflict, so
that together they represent a single archive. This includes the
generated names of split entry segments. This ensures that each jar in
a split archive may be extracted to the same location without risk of
losing data. Split entry segments are then concatenated, manually or
by automation, to recover the original data set. The manifest should
always be consulted to confirm that files which look like split entry
segments should actually be spliced together, since such files may have
been intended to be ordinary members of the archive (see ``Naming
Conventions'' for name conflict resolution).

All segments of a split entry are given generated names, so that normal
jar tools will never unpack a file under the original name. This ensures
that no unsuspecting user mistakenly uses a truncated, partial file.
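
The manual recovery described in this section can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the function name \texttt{splice\_entry} is hypothetical, every jar in
the set is assumed to have been extracted into one directory already,
and the entry name and suffix template (default
\texttt{.---\#.\~{}}) are assumed to have been read from the manifest.

```python
import os
import re
import shutil


def splice_entry(directory, entry_name, template=".---#.~"):
    """Rebuild `entry_name` inside `directory` from its extracted segments.

    Hypothetical helper: assumes all jars of the set were extracted into
    `directory` and that the manifest lists `entry_name` as a split entry.
    """
    # '#' in the template marks where the numeric segment ID goes.
    before, after = template.split("#")
    pattern = re.compile(
        re.escape(entry_name + before) + r"(\d+)" + re.escape(after))
    segments = []
    for name in os.listdir(directory):
        match = pattern.fullmatch(name)
        if match:
            segments.append((int(match.group(1)), name))
    if not segments:
        raise FileNotFoundError("no segments found for " + entry_name)
    target = os.path.join(directory, entry_name)
    with open(target, "wb") as out:
        # Sort by the integer segment ID, not by the file name string.
        for _, name in sorted(segments):
            with open(os.path.join(directory, name), "rb") as segment:
                shutil.copyfileobj(segment, out)
    return target
```

The segment files are left in place, so the result can be checked
before anything is deleted.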

\subsection{Warnings}

Signing jar entries which have been split has not been addressed.

Files cannot be compressed directly into streams when there are
potential name conflicts with the generated segment names. Robust
tools must therefore collect the list of files to be added, and
determine any conflicts first (see Naming Conventions: Jar Entry
Names).

Adding files to an existing split jar may also cause name
conflicts.

\section{Naming Conventions}

A primary goal of this design is to allow split jars to be created
and unpacked manually with minimal problems. This is accomplished by
using a naming convention which lends itself to visual reconstruction.
When a jar file must be split into multiple files, there is a primary
jar, and one or more secondary jars sharing a common basename. When an
entry within the set of jars must be split, \emph{each} segment is given
a numbered suffix.

\subsection{Jar File Names}

For the primary jar \texttt{\emph{basename}.jar}, the names of
secondary jars must always be \texttt{\emph{basename}.split\#.jar}
where \texttt{\#} is an integer \emph{secondary jar ID} starting at
\texttt{1}. Left-padded zeros in the ID are ignored, and are encouraged
so that lexicographical sorting gives the correct order. The jars can be
renamed, as long as the \emph{basename} is the same for all, and the
suffixes (\texttt{.split\#.jar}) remain the same. All entry names within
the set must be unique.
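
As an illustration of this rule, the following Python sketch picks the
secondary jars of a set out of a list of file names and orders them by
jar ID. The function name \texttt{find\_secondary\_jars} is
hypothetical, and the sketch assumes the default
\texttt{.split\#.jar} naming described above.

```python
import re


def find_secondary_jars(primary_name, file_names):
    """Return (jar_id, file_name) pairs for the secondary jars of a set,
    ordered by numeric jar ID. Hypothetical helper for illustration."""
    if not primary_name.endswith(".jar"):
        raise ValueError("primary jar name must end in .jar")
    basename = primary_name[:-len(".jar")]
    pattern = re.compile(re.escape(basename) + r"\.split(\d+)\.jar")
    found = []
    for name in file_names:
        match = pattern.fullmatch(name)
        if match:
            # int() drops left-padded zeros, so "007" and "7" are the same ID.
            jar_id = int(match.group(1))
            if jar_id >= 1:  # secondary jar IDs start at 1
                found.append((jar_id, name))
    return sorted(found)
```

Because the IDs are compared as integers, padded and unpadded names can
be mixed without affecting the order.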

\subsection{Jar Entry Names}

For a split entry named \texttt{\emph{basename}} (including any
suffixes), all segments are named using the template
\texttt{\emph{basename}}\texttt{.---\#.\~{}}, where \texttt{\#} is an integer
\emph{segment ID} starting at \texttt{0}. The segments are
rejoined by concatenating them in numeric order into a file
named \texttt{\emph{basename}}. The template is recorded in the \emph{main}
section of the manifest.

In the rare case where an entry is split, and the name of a real entry
may conflict with a generated segment name, a non-default suffix
template is used. In our case, all of the generated segments will have
'\texttt{\~{}}' characters appended, as many as needed, to eliminate
potential conflicts. This non-default template is recorded in the
\emph{per-entry} section of the manifest for the split entry.

Non-default suffixes are used for all \emph{potential} conflicts, even in
cases where there is no actual conflict:

\begin{itemize}
\item When the split entry does not generate enough segments to
conflict, but the name of the real entry matches the default template.
\item When the conflicting real entry must also be split, so that it is
stored only under generated segment names.
\end{itemize}

Examples are given below.

Other tools implementing split jars may (though are not encouraged to)
use different suffixes, but the numeric segment ID must be replaced
by '\#' in the template recorded in the manifest. Tools must sort
segments numerically, not lexicographically, as ``2'' is generally
greater than ``10'' lexicographically. However, tools are encouraged to
zero-pad segment IDs, as needed, so that lexicographic sorting is also
correct.
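
The conflict rule described in this subsection can be sketched as
follows. This is a minimal illustration, not the IzPack implementation:
the function name \texttt{choose\_suffix\_template} is hypothetical, and
it assumes the complete list of real entry names is known before any
entry is split. Any real name that matches the candidate template with
\emph{any} segment number counts as a potential conflict, and one more
'\texttt{\~{}}' is appended until no such name remains.

```python
import re

DEFAULT_TEMPLATE = ".---#.~"


def choose_suffix_template(entry_name, all_entry_names):
    """Pick a suffix template for a split entry, appending '~' characters
    until no real entry name could be mistaken for one of its segments.

    Hypothetical helper for illustration only.
    """
    template = DEFAULT_TEMPLATE
    while True:
        # '#' stands for the segment ID, i.e. any run of digits.
        before, after = template.split("#")
        pattern = re.compile(
            re.escape(entry_name + before) + r"\d+" + re.escape(after))
        if not any(pattern.fullmatch(name) for name in all_entry_names):
            return template
        template += "~"
```

A template other than the default would then be recorded in the
per-entry section of the manifest for that entry.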

\section{Manifest Attributes}

To minimize the changes needed to implement split jars, we simply add
attributes to the manifest. Additional attributes are ignored by other
jar tools, so the only consequence is that split files, and
files located entirely in secondary jars, will not be available to
them.

To limit space overhead, and to allow jar files to be
renamed, the attributes are kept minimal.

\subsection{Main Section Attributes}

Three attributes are added to indicate the number of secondary jars,
the suffix used to name them, and the default suffix added to the
segments of split files.

% TODO: make this like an html <dl><dd>... <dt> ...</dl>
\begin{itemize}
\item \texttt{Split-Jar-Secondary-Count}: The number of secondary jars
in the set.
\item \texttt{Split-Jar-Secondary-Suffix}: The suffix template
inserted before the \texttt{.jar} suffix typical of jar
files, to form the names of the secondary jar files in the set;
typically \texttt{.split\#}.
\item \texttt{Split-Entry-Suffix}: The suffix template appended to
an entry name to name each of the entry's constituent parts;
typically \texttt{.---\#.\~{}}. The \texttt{\#} character indicates the
location of the numeric value. This cannot currently be
changed.
\end{itemize}
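
As a sketch of how a tool might use these attributes, the following
Python function lists the jar files expected in a set. The function
name \texttt{jar\_set\_names} is hypothetical, the main section is
assumed to be available as a dictionary of strings, and treating a
missing \texttt{Split-Jar-Secondary-Count} as zero is an assumption of
this example rather than part of the specification.

```python
def jar_set_names(primary_name, main_attrs):
    """List the jar file names of a split set, primary jar first.

    Hypothetical helper; `main_attrs` maps main-section attribute names
    to their string values.
    """
    if not primary_name.endswith(".jar"):
        raise ValueError("primary jar name must end in .jar")
    basename = primary_name[:-len(".jar")]
    # Assumption: a missing count means there are no secondary jars.
    count = int(main_attrs.get("Split-Jar-Secondary-Count", "0"))
    # ".split#" is the typical value named above, used here as a fallback.
    template = main_attrs.get("Split-Jar-Secondary-Suffix", ".split#")
    names = [primary_name]
    for jar_id in range(1, count + 1):
        names.append(basename + template.replace("#", str(jar_id)) + ".jar")
    return names
```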

\subsection{Per-Entry Section Attributes}

Only files which are split require attributes in the manifest. A
space-separated list of integers is recorded, one for each jar
containing a segment of the entry. Entries which have a segment in the
primary jar file indicate this with the ID \texttt{0}.

No restriction is placed on the order of the entries, or on the IDs of
the jars in which any segment is contained.

% TODO: make this like an html <dl><dd>... <dt> ...</dl>
\begin{itemize}
\item \texttt{Split-Entry-Jar-IDs}: A space-separated list of the
IDs of the jars which contain the segments of the
entry, with \texttt{0} denoting the primary jar. Essentially a list of
integers.
\item \texttt{Split-Entry-Suffix}: Overrides the default
\texttt{Split-Entry-Suffix} specified in the main section. Needed
when one (or more) '\texttt{\~{}}' characters are appended due to a name
conflict with real entries. This is not strictly necessary,
as simply knowing the basename and unpacking all jars would
allow the suffix to be determined, but it is included to save
processing. This is currently not user configurable.
\end{itemize}
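
Reading these attributes back can be sketched in Python as below. The
function names are hypothetical and the parser is deliberately minimal:
it assumes sections are separated by blank lines, accepts a per-entry
section that starts either with a standard \texttt{Name:} attribute or
with a bare entry name as written in the examples of this appendix, and
does not handle the continuation lines that real manifests use for long
values.

```python
def parse_split_manifest(text):
    """Return (main_attributes, per_entry_attributes) from manifest text.

    Minimal, hypothetical parser for illustration only.
    """
    main, entries = {}, {}
    sections = [s for s in text.replace("\r\n", "\n").split("\n\n") if s.strip()]
    for index, section in enumerate(sections):
        attrs, name = {}, None
        for line in section.strip("\n").split("\n"):
            key, sep, value = line.partition(":")
            if sep:
                attrs[key.strip()] = value.strip()
            elif line.strip():
                name = line.strip()  # bare entry name, as in the examples
        if index == 0:
            main = attrs
        else:
            entries[attrs.pop("Name", name)] = attrs
    return main, entries


def jar_ids(entry_attrs):
    """Jar IDs holding segments of a split entry; 0 is the primary jar."""
    return [int(token) for token in entry_attrs["Split-Entry-Jar-IDs"].split()]


def entry_suffix(main, entry_attrs):
    """Per-entry suffix template if present, else the main-section default."""
    return entry_attrs.get("Split-Entry-Suffix", main.get("Split-Entry-Suffix"))
```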

\section{Examples}

Two examples follow: one simple, and another cluttered with pathological
cases. Notice that the jar ID number and the segment ID of a split entry
have no correlation. In most applications, there will seldom be more than
two segments in a single jar: the end of an entry continued from the
previous jar, and perhaps the start of the last entry of this jar, which
is continued in the next. The examples aren't so well organized though. :-)

\subsection{Basic Example}

\begin{verbatim}

Jar to Create
-------------
example.jar

Files to Compress
-----------------
movie.mpeg
README
song.mp3
text.txt

Entries in Jars
---------------
example.jar           movie.mpeg.---0.~
                      README

example.split1.jar    movie.mpeg.---1.~
                      song.mp3.---0.~

example.split2.jar    movie.mpeg.---2.~
                      song.mp3.---1.~
                      text.txt

MANIFEST (primary jar only)
---------------------------
Manifest-Version: 1.0
Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
Built-By: IzPack 1.6.0
Main-Class: com.izforge.izpack.installer.Installer
Split-Jar-Secondary-Count: 2
Split-Entry-Suffix: .---#.~

movie.mpeg
Split-Entry-Jar-IDs: 0 1 2

song.mp3
Split-Entry-Jar-IDs: 1 2
\end{verbatim}

\subsection{Name Conflicts}

A pathological example showing name conflict resolution. It includes:

\begin{itemize}
\item Direct conflict with a real archive file
(\texttt{foo...}).

\item Indirect conflict with a file by suffix template only
(\texttt{bar...}).

\item Conflict with a real archive file that is also split. Because
both are split, there would be no name conflict amongst the jar
entries; however, the default suffix is not used anyway
(\texttt{yin...}).

\item A \emph{near} conflict, just to be annoying. Normal behavior
(\texttt{chi...}).

\item Files which look like segments of a split file, but are not,
requiring the manifest to tell the difference (\texttt{zig...}).
\end{itemize}

\begin{verbatim}

Jar to Create
-------------
example.jar

Files to Compress
-----------------
foo.dat
foo.dat.---0.~ ...... Extremely unlikely that these would exist,
                      much less need to be archived. Provided as an
                      example.
bar.dat
bar.dat.---555.~ .... Another unlikely case which would not conflict
                      (assume bar.dat is split into only 2 segments)
                      except for the suffix template.
yin.dat
yin.dat.---2.~ ...... Yet another template-only conflict; the
                      conflicting file also needs to be split.
chi.dat
chi.dat.---0.~~ ..... No potential conflict.
zig.dat.---0.~ ...... Files to be archived as they are, but not
zig.dat.---1.~        intended to be spliced back together.

Entries in Jars
---------------
example.jar           foo.dat.---0.~
                      foo.dat.---0.~~
                      bar.dat.---555.~
                      bar.dat.---0.~~
                      yin.dat.---0.~~
                      yin.dat.---2.~.---0.~
                      chi.dat.---0.~
                      chi.dat.---0.~~
                      zig.dat.---0.~
                      zig.dat.---1.~

example.split1.jar    foo.dat.---1.~~
                      bar.dat.---1.~~
                      yin.dat.---1.~~
                      yin.dat.---2.~.---1.~
                      chi.dat.---1.~

MANIFEST (primary jar only)
---------------------------
Manifest-Version: 1.0
IzPack-Version: X.X.X
Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
Built-By: IzPack
Class-Path:
Main-Class: com.izforge.izpack.installer.Installer
Split-Jar-Secondary-Count: 1
Split-Entry-Suffix: .---#.~

foo.dat
Split-Entry-Jar-IDs: 0 1
Split-Entry-Suffix: .---#.~~

bar.dat
Split-Entry-Jar-IDs: 0 1
Split-Entry-Suffix: .---#.~~

yin.dat
Split-Entry-Jar-IDs: 0 1
Split-Entry-Suffix: .---#.~~

yin.dat.---2.~
Split-Entry-Jar-IDs: 0 1

chi.dat
Split-Entry-Jar-IDs: 0 1
\end{verbatim}

\end{document}