%begin{latexonly}
\newif\ifpdf
\ifx\pdfoutput\undefined
\pdffalse
\else
\pdfoutput=1
\pdftrue
\fi

% Change this as needed :
%   - a4paper to your paper format
%   - the document class to your need (book, article, ...)
\ifpdf
\documentclass[a4paper, 12pt, pdftex]{report}
\else
%end{latexonly}
\documentclass[a4paper, 12pt, dvips]{report}
%begin{latexonly}
\fi
%end{latexonly}

% The packages we need
\usepackage{verbatim}
\usepackage{moreverb}
\usepackage{url}
\usepackage{tabularx}
\usepackage[final]{graphicx}
\usepackage[hyperindex,breaklinks=true,pdfborder={0 0 0}]{hyperref}
%begin{latexonly}
\ifpdf
\hypersetup{colorlinks=true,linkcolor=blue,urlcolor=blue,citecolor=red}
\fi
%end{latexonly}
\usepackage{html}
\begin{htmlonly}
\newcommand{\href}[2]{\htmladdnormallink{#2}{#1}}
\end{htmlonly}

% block style paragraphs tend to look better in technical docs
\parindent=0in
\parskip=10pt

\begin{document}

% Split Jar Specification

\appendix

\chapter{Split Jars and MANIFEST Extensions}

The jar file specification allows classes and resources to be archived
and packaged together, but a single jar is limited in size. To overcome
this limitation with minimal changes to the jar format, we create a set
of jar files and add new key-value attributes to the jar MANIFEST.
These attributes indicate how many jars are in the set, which files (if
any) are split across multiple jars, and which jars contain them.

\section{Motivations and Limitations}

Java's zip implementation is limited to about 2\,GB. Zip64 extensions,
which allow larger files, do not solve the problem when limitations of
the distribution medium restrict the jar size. There must be a way to
split the archive into multiple files, and individual entries must
also be splittable across jars.

A ``Split Jar'' is a set of normal jar files: one \emph{primary} jar
and zero or more \emph{secondary} jars. The primary jar has additional
manifest attributes that help reconstruct the data. Entries may or may
not be split across multiple jars; split entries need to be spliced
back together upon extraction. Secondary jar names derive from the
basename of the primary jar, and the segments of a split entry share a
basename derived from the original entry name.

Segments of a split entry need not be in separate jars. Thus, if a jar
is split to deal with media limitations, all of the resulting jars may
later be combined into a single primary jar, as long as the manifest is
updated accordingly.

A major benefit of this format is that the contents of a split archive
can be recovered manually by extracting all jar files in the set and
simply concatenating the segments of each split entry.
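
The manual recovery described above amounts to writing the segments of
one entry back to back, in segment-ID order. The following Java sketch
assumes Java 7 or later, and assumes the segments have already been
extracted to disk and sorted by the caller; the class
\texttt{SegmentJoiner} and its method are hypothetical illustrations,
not part of IzPack.

\begin{verbatim}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper (not part of IzPack): splices extracted
// segments back into the original file.
final class SegmentJoiner {

    private SegmentJoiner() {
    }

    /**
     * Appends each segment file to target, in the order given.
     * The caller must already have sorted the segments by numeric
     * segment ID; this method does not reorder them.
     */
    static void join(List<Path> segmentsInOrder, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path segment : segmentsInOrder) {
                Files.copy(segment, out); // append this segment's bytes
            }
        }
    }
}
\end{verbatim}

Sorting is deliberately left to the caller, since the correct order
comes from the numeric segment IDs rather than from file names.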

Entry names in the primary and secondary jars must not conflict, so
that together they represent a single archive. This includes the
generated names of split entry segments, and ensures that each jar in
a split archive may be extracted to the same location without risk of
losing data. Split file segments can then be concatenated manually or
by automation to obtain the original data set. The manifest should
always be consulted to confirm that files which look like split entry
segments are actually meant to be spliced together: such files may
instead be genuine members of the archive (see ``Naming Conventions''
for name conflict resolution).

All segments of a split entry are given generated names, so normal jar
tools never unpack a file under the original entry name. This ensures
that no unsuspecting user mistakenly uses a truncated, partial file.

\subsection{Warnings}

Signing jar entries which have been split has not been addressed.

Files cannot be compressed directly into the output stream when their
names may conflict with the generated segment names. Robust tools must
therefore collect the complete list of files to be added and detect any
conflicts before writing (see ``Naming Conventions: Jar Entry Names'').

Adding files to an existing split jar may also cause name conflicts.

\section{Naming Conventions}

A primary goal of this design is to allow split jars to be created and
unpacked manually with minimal problems. This is accomplished with a
naming convention that lends itself to visual reconstruction. When a
jar file must be split into multiple parts, there is a primary jar and
one or more secondary jars sharing a common basename. When an entry
within the set of jars must be split, \emph{each} segment is given a
numbered suffix.

\subsection{Jar File Names}

For the primary jar \texttt{\emph{basename}.jar}, the names of
secondary jars must always be \texttt{\emph{basename}.split\#.jar},
where \texttt{\#} is an integer \emph{secondary jar ID} starting at
\texttt{1}. Leading zeros in the ID are ignored, and are encouraged
because they allow correct lexicographic sorting. The jars can be
renamed, as long as the \emph{basename} stays the same for all of them
and the suffixes (\texttt{.split\#.jar}) are unchanged. All entry names
within the set must be unique.
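
The file naming rule can be expressed as a pair of small functions: one
builds a secondary jar name, the other recovers the ID from a name and
checks that it belongs to the set. This is a sketch assuming Java 8 or
later; \texttt{SecondaryJarNames}, \texttt{secondaryName} and
\texttt{secondaryId} are hypothetical names chosen for illustration.

\begin{verbatim}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of IzPack) for the naming rule
// basename.jar -> basename.split#.jar, with # starting at 1.
final class SecondaryJarNames {

    // basename, ".split", a decimal ID (possibly zero-padded), ".jar"
    private static final Pattern SECONDARY =
            Pattern.compile("(.+)\\.split(\\d+)\\.jar");

    private SecondaryJarNames() {
    }

    /** Name of secondary jar id, zero-padded to at least width digits. */
    static String secondaryName(String basename, int id, int width) {
        return basename + ".split"
                + String.format("%0" + Math.max(1, width) + "d", id) + ".jar";
    }

    /**
     * Secondary jar ID encoded in fileName, or -1 if fileName is not a
     * secondary jar of the set whose primary jar is basename + ".jar".
     * Leading zeros are ignored because the digits are parsed as a number.
     */
    static int secondaryId(String basename, String fileName) {
        Matcher m = SECONDARY.matcher(fileName);
        if (!m.matches() || !m.group(1).equals(basename)) {
            return -1;
        }
        int id = Integer.parseInt(m.group(2));
        return id >= 1 ? id : -1; // IDs start at 1; 0 denotes the primary jar
    }
}
\end{verbatim}

For example, \texttt{secondaryName("example", 7, 3)} yields
\texttt{example.split007.jar}, and \texttt{secondaryId} maps that name
back to \texttt{7}.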

\subsection{Jar Entry Names}

For a split entry named \texttt{\emph{basename}} (including any file
extension), all segments are named using the template
\texttt{\emph{basename}.---\#.\~{}}, where \texttt{\#} is an integer
\emph{segment ID} starting at \texttt{0}. The segments are rejoined by
concatenating them in numeric order of segment ID into a file named
\texttt{\emph{basename}}. The template is recorded in the \emph{main}
section of the manifest.
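
Template substitution and its inverse can be sketched in Java as
follows. The class \texttt{SegmentNames} is hypothetical; it assumes the
template contains exactly one \texttt{\#}, as the default
\texttt{.---\#.\~{}} does.

\begin{verbatim}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of IzPack) for segment names built
// from a suffix template such as ".---#.~".
final class SegmentNames {

    private SegmentNames() {
    }

    /** Replaces the '#' placeholder in the template with the segment ID. */
    static String segmentName(String entryName, String template, int segmentId) {
        return entryName + template.replace("#", Integer.toString(segmentId));
    }

    /** Segment ID if candidate is a segment of entryName under template, else -1. */
    static int segmentId(String entryName, String template, String candidate) {
        int hash = template.indexOf('#');
        if (hash < 0) {
            throw new IllegalArgumentException("template has no '#': " + template);
        }
        // Literal text before and after '#', with one or more digits between.
        Pattern p = Pattern.compile(
                Pattern.quote(entryName + template.substring(0, hash))
                + "(\\d+)"
                + Pattern.quote(template.substring(hash + 1)));
        Matcher m = p.matcher(candidate);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }
}
\end{verbatim}

Because \texttt{matches()} requires the whole name to match, a name with
an extra trailing \texttt{\~{}} is not mistaken for a segment under the
shorter template.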

In the rare case where an entry is split and the name of a real entry
may conflict with a generated segment name, a non-default suffix
template is used. In our implementation, additional \texttt{\~{}}
characters are appended to the suffix of all generated segments, as
needed, to eliminate potential conflicts. This non-default template is
recorded in the \emph{per-entry} section of the manifest for the split
entry.

Non-default suffixes are used for all \emph{potential} conflicts, even
in cases where there is no actual conflict:

\begin{itemize}
  \item when the split entry does not generate enough segments to
        actually conflict, but the real entry's name matches the
        default template;
  \item when the conflicting real entry must itself be split, so its
        actual jar entries use generated suffixes.
\end{itemize}
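
The rule that \emph{potential} conflicts trigger a non-default suffix
can be sketched as a loop that appends \texttt{\~{}} until no real entry
name matches the template for any segment ID. This is a hypothetical
sketch, not IzPack's implementation; it assumes the template contains
exactly one \texttt{\#} and that the complete list of real entry names
is known in advance, as recommended under ``Warnings''.

\begin{verbatim}
import java.util.Collection;
import java.util.regex.Pattern;

// Hypothetical sketch (not IzPack's code): keep appending '~' to the
// template until no real entry name could collide with any generated
// segment name of entryName.
final class SuffixChooser {

    private SuffixChooser() {
    }

    static String chooseTemplate(String entryName, String defaultTemplate,
                                 Collection<String> realEntryNames) {
        String template = defaultTemplate;
        while (hasPotentialConflict(entryName, template, realEntryNames)) {
            template = template + "~";
        }
        return template;
    }

    // A potential conflict exists if a real name matches the template for
    // *any* segment ID, not only the IDs this entry will actually use.
    private static boolean hasPotentialConflict(String entryName, String template,
                                                Collection<String> realEntryNames) {
        int hash = template.indexOf('#');
        Pattern p = Pattern.compile(
                Pattern.quote(entryName + template.substring(0, hash))
                + "\\d+"
                + Pattern.quote(template.substring(hash + 1)));
        for (String name : realEntryNames) {
            if (p.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }
}
\end{verbatim}

Testing against every segment ID, rather than only the IDs that will be
generated, is what makes a name such as \texttt{bar.dat.---555.\~{}}
force a longer suffix even when \texttt{bar.dat} has only two segments.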

Examples are given below.

Other tools implementing split jars may (though they are not encouraged
to) use different suffixes, but the numeric segment ID must be replaced
by \texttt{\#} in the manifest. Tools must sort segments numerically,
not lexicographically, because the string ``2'' sorts after ``10''
lexicographically. However, tools are encouraged to zero-pad the
numbers as needed, so that lexicographic sorting is also correct.
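
The difference between numeric and lexicographic order, and how zero
padding reconciles them, can be shown in a few lines of Java (8 or
later). The class \texttt{SegmentOrder} is a hypothetical illustration.

\begin{verbatim}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: numeric versus lexicographic ordering of
// segment IDs taken from entry names.
final class SegmentOrder {

    private SegmentOrder() {
    }

    /** Sorts ID strings by numeric value, so "2" comes before "10". */
    static List<String> numeric(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparingLong(Long::parseLong));
        return sorted;
    }

    /** Plain string order: "10" sorts before "2" unless IDs are zero-padded. */
    static List<String> lexicographic(List<String> ids) {
        List<String> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);
        return sorted;
    }

    /** Zero-pads an ID so that lexicographic and numeric order agree. */
    static String pad(int id, int width) {
        return String.format("%0" + Math.max(1, width) + "d", id);
    }
}
\end{verbatim}

Sorting \texttt{"10"}, \texttt{"2"}, \texttt{"1"} numerically gives
\texttt{1, 2, 10}, while plain string order gives \texttt{1, 10, 2};
padded to two digits, string order gives \texttt{01, 02, 10}.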

\section{Manifest Attributes}

To minimize the changes needed to implement split jars, we simply add
attributes to the manifest. Other jar tools ignore additional
attributes, so the only consequence is that split files, and files
located entirely in secondary jars, will not be available to them.

To limit space overhead and allow jar files to be renamed, the manifest
entries are kept minimal.

\subsection{Main Section Attributes}

The following attributes are added to the main section to record the
number of secondary jars, the naming template for secondary jars, and
the default suffix appended to the segments of split files.

\begin{description}
  \item[\texttt{Split-Jar-Secondary-Count}] The number of secondary
        jars in the set.
  \item[\texttt{Split-Jar-Secondary-Suffix}] The suffix template
        inserted before the \texttt{.jar} extension to form the names
        of the secondary jar files in the set; typically
        \texttt{.split\#}.
  \item[\texttt{Split-Entry-Suffix}] The suffix template appended to an
        entry name to name each of the entry's constituent segments;
        typically \texttt{.---\#.\~{}}. The \texttt{\#} character marks
        the position of the numeric segment ID. This cannot currently
        be changed.
\end{description}
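
With the standard \texttt{java.util.jar} API, these main-section
attributes are ordinary custom attributes. The sketch below assumes
Java 7 or later; the class \texttt{SplitManifestWriter} is hypothetical,
and the values shown are the typical defaults from this section.

\begin{verbatim}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical sketch (not IzPack's code): adds the split-jar
// main-section attributes using java.util.jar.Manifest.
final class SplitManifestWriter {

    // Custom attribute names from this specification.
    static final Attributes.Name SECONDARY_COUNT =
            new Attributes.Name("Split-Jar-Secondary-Count");
    static final Attributes.Name SECONDARY_SUFFIX =
            new Attributes.Name("Split-Jar-Secondary-Suffix");
    static final Attributes.Name ENTRY_SUFFIX =
            new Attributes.Name("Split-Entry-Suffix");

    private SplitManifestWriter() {
    }

    static Manifest mainSection(int secondaryCount) {
        Manifest mf = new Manifest();
        Attributes main = mf.getMainAttributes();
        // The JAR format expects Manifest-Version as the first main attribute.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(SECONDARY_COUNT, Integer.toString(secondaryCount));
        main.put(SECONDARY_SUFFIX, ".split#");
        main.put(ENTRY_SUFFIX, ".---#.~");
        return mf;
    }

    /** Serializes the manifest exactly as it would be stored in the jar. */
    static String render(Manifest mf) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        mf.write(buf);
        return buf.toString("UTF-8");
    }
}
\end{verbatim}

\texttt{Manifest.write} emits \texttt{Manifest-Version} first and
terminates each line with CR LF; the order of the remaining attributes
should not be relied upon.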

\subsection{Per-Entry Section Attributes}

Only files which are split require attributes in the manifest. For each
such file, a space-separated list of integers is recorded, one for each
jar containing a segment of the entry. A segment stored in the primary
jar is indicated by the ID \texttt{0}.

No restriction is placed on the order of the IDs in this list, or on
which jar contains any given segment.

\begin{description}
  \item[\texttt{Split-Entry-Jar-IDs}] A space-separated list of the IDs
        of the jars that contain segments of the entry (\texttt{0} for
        the primary jar).
  \item[\texttt{Split-Entry-Suffix}] Overrides the default
        \texttt{Split-Entry-Suffix} specified in the main section.
        Needed when one or more \texttt{\~{}} characters are appended
        because of a name conflict with real entries. This is not
        strictly necessary, since knowing the basename and unpacking
        all jars would allow the suffix to be determined, but it is
        included to save processing. This is currently not user
        configurable.
\end{description}
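
Reading these attributes back is a matter of walking the per-entry
sections, parsing the ID list, and falling back to the main-section
suffix when no override is present. This is a sketch assuming Java 7 or
later; \texttt{SplitManifestReader} and \texttt{SplitEntry} are
hypothetical names.

\begin{verbatim}
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical reader (not part of IzPack) for the per-entry attributes.
final class SplitManifestReader {

    /** Split information for one entry. */
    static final class SplitEntry {
        final String name;
        final List<Integer> jarIds; // 0 = primary jar, 1..n = secondary jars
        final String suffix;        // effective suffix template for this entry

        SplitEntry(String name, List<Integer> jarIds, String suffix) {
            this.name = name;
            this.jarIds = Collections.unmodifiableList(jarIds);
            this.suffix = suffix;
        }
    }

    private SplitManifestReader() {
    }

    /** Parses a manifest and returns its split entries, keyed by entry name. */
    static Map<String, SplitEntry> read(InputStream in) throws IOException {
        Manifest mf = new Manifest(in);
        String defaultSuffix = mf.getMainAttributes().getValue("Split-Entry-Suffix");
        Map<String, SplitEntry> result = new LinkedHashMap<>();
        for (Map.Entry<String, Attributes> section : mf.getEntries().entrySet()) {
            Attributes attrs = section.getValue();
            String ids = attrs.getValue("Split-Entry-Jar-IDs");
            if (ids == null) {
                continue; // only split entries carry this attribute
            }
            List<Integer> jarIds = new ArrayList<>();
            for (String token : ids.trim().split("\\s+")) {
                jarIds.add(Integer.parseInt(token));
            }
            // A per-entry Split-Entry-Suffix overrides the main-section default.
            String override = attrs.getValue("Split-Entry-Suffix");
            String suffix = override != null ? override : defaultSuffix;
            result.put(section.getKey(), new SplitEntry(section.getKey(), jarIds, suffix));
        }
        return result;
    }
}
\end{verbatim}

Note that \texttt{java.util.jar.Manifest} ignores a final line that is
not terminated by a newline, so manifests built by hand should end with
one.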

\section{Examples}

Two examples follow: a simple one, and another cluttered with
pathological cases. Note that the jar ID and the segment ID of a split
entry are not correlated. In most applications a single jar will seldom
contain more than two segments: the end of the last entry from the
previous jar, and possibly the beginning of its own last entry, which
continues in the next jar. The examples are not so well organized,
though. :-)

\subsection{Basic Example}

\begin{verbatim}
Jar to Create
-------------
    example.jar

Files to Compress
-----------------
    movie.mpeg
    README
    song.mp3
    text.txt

Entries in Jars
---------------
    example.jar           movie.mpeg.---0.~
                          README

    example.split1.jar    movie.mpeg.---1.~
                          song.mp3.---0.~

    example.split2.jar    movie.mpeg.---2.~
                          song.mp3.---1.~
                          text.txt

MANIFEST (primary jar only)
---------------------------
    Manifest-Version: 1.0
    Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
    Built-By: IzPack 1.6.0
    Main-Class: com.izforge.izpack.installer.Installer
    Split-Jar-Secondary-Count: 2
    Split-Entry-Suffix: .---#.~

    Name: movie.mpeg
    Split-Entry-Jar-IDs: 0 1 2

    Name: song.mp3
    Split-Entry-Jar-IDs: 1 2
\end{verbatim}
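
In the Basic Example, a reader can locate the segments of a split entry
by scanning only the jars listed in its \texttt{Split-Entry-Jar-IDs}
attribute and ordering the matches by segment ID. The sketch below is
hypothetical: \texttt{SegmentLocator} and the map of entry names per jar
ID are assumptions for illustration, and the template is assumed to
contain exactly one \texttt{\#}.

\begin{verbatim}
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not part of IzPack): find every segment of one
// split entry, looking only in the jars named by Split-Entry-Jar-IDs.
final class SegmentLocator {

    private SegmentLocator() {
    }

    /** Returns segment ID -> ID of the jar holding that segment. */
    static TreeMap<Integer, Integer> locate(String entryName, String template,
                                            List<Integer> jarIds,
                                            Map<Integer, List<String>> entriesByJar) {
        int hash = template.indexOf('#');
        Pattern segment = Pattern.compile(
                Pattern.quote(entryName + template.substring(0, hash))
                + "(\\d+)"
                + Pattern.quote(template.substring(hash + 1)));
        TreeMap<Integer, Integer> found = new TreeMap<>(); // numeric key order
        for (int jarId : jarIds) {
            for (String name : entriesByJar.getOrDefault(jarId, Collections.emptyList())) {
                Matcher m = segment.matcher(name);
                if (m.matches()) {
                    found.put(Integer.parseInt(m.group(1)), jarId);
                }
            }
        }
        // Segment IDs must run 0, 1, ..., n-1 with no gaps.
        if (!found.isEmpty() && found.lastKey() != found.size() - 1) {
            throw new IllegalStateException("missing segment of " + entryName);
        }
        return found;
    }
}
\end{verbatim}

For \texttt{movie.mpeg} with IDs \texttt{0 1 2}, this finds segment 0 in
\texttt{example.jar}, segment 1 in \texttt{example.split1.jar} and
segment 2 in \texttt{example.split2.jar}.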

\subsection{Name Conflicts}

A pathological example showing name conflict resolution. It includes:

\begin{itemize}
  \item A direct conflict with a real archive file
        (\texttt{foo...}).
  \item An indirect conflict with a real file through the suffix
        template only (\texttt{bar...}).
  \item A conflict with a real archive file that is itself split.
        Because both are split, there would be no actual name conflict
        among jar entries; however, the default suffix is not used
        anyway (\texttt{yin...}).
  \item A \emph{near} conflict, just to be annoying; normal behavior
        applies (\texttt{chi...}).
  \item Files which look like segments of a split file but are not, so
        the manifest is needed to tell the difference
        (\texttt{zig...}).
\end{itemize}

\begin{verbatim}
Jar to Create
-------------
    example.jar

Files to Compress
-----------------
    foo.dat
    foo.dat.---0.~ ...... Extremely unlikely that these would exist,
                          much less need to be archived. Provided as an
                          example.
    bar.dat
    bar.dat.---555.~ .... Another unlikely case, which would not conflict
                          (assume bar.dat is split into only 2 segments)
                          except through the suffix template.
    yin.dat
    yin.dat.---2.~ ...... Yet another template-only conflict; here the
                          conflicting file itself needs to be split.
    chi.dat
    chi.dat.---0.~~ ..... No potential conflict.
    zig.dat.---0.~ ...... Files to be archived as they are, but not
    zig.dat.---1.~        intended to be spliced back together.

Entries in Jars
---------------
    example.jar           foo.dat.---0.~
                          foo.dat.---0.~~
                          bar.dat.---555.~
                          bar.dat.---0.~~
                          yin.dat.---0.~~
                          yin.dat.---2.~.---0.~
                          chi.dat.---0.~
                          chi.dat.---0.~~
                          zig.dat.---0.~
                          zig.dat.---1.~

    example.split1.jar    foo.dat.---1.~~
                          bar.dat.---1.~~
                          yin.dat.---1.~~
                          yin.dat.---2.~.---1.~
                          chi.dat.---1.~

MANIFEST (primary jar only)
---------------------------
    Manifest-Version: 1.0
    IzPack-Version: X.X.X
    Created-By: 1.4.2_04-b05 (Sun Microsystems Inc.)
    Built-By: IzPack
    Class-Path:
    Main-Class: com.izforge.izpack.installer.Installer
    Split-Jar-Secondary-Count: 1
    Split-Entry-Suffix: .---#.~

    Name: foo.dat
    Split-Entry-Jar-IDs: 0 1
    Split-Entry-Suffix: .---#.~~

    Name: bar.dat
    Split-Entry-Jar-IDs: 0 1
    Split-Entry-Suffix: .---#.~~

    Name: yin.dat
    Split-Entry-Jar-IDs: 0 1
    Split-Entry-Suffix: .---#.~~

    Name: yin.dat.---2.~
    Split-Entry-Jar-IDs: 0 1

    Name: chi.dat
    Split-Entry-Jar-IDs: 0 1
\end{verbatim}
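
The rule that the manifest, not the file name, decides what gets
spliced can be sketched as follows: only names matching the template of
an entry declared in the manifest are treated as segments, so files such
as \texttt{zig.dat.---0.\~{}} are extracted as ordinary files because no
manifest section names \texttt{zig.dat}. The class
\texttt{SegmentClassifier} is hypothetical, and assumes the effective
suffix of each split entry (per-entry override, else the main-section
default) has already been read from the manifest.

\begin{verbatim}
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch (not part of IzPack): decide which archive entries
// are generated segments, using only split entries declared in the
// manifest. Anything else, however segment-like, is a real file.
final class SegmentClassifier {

    private SegmentClassifier() {
    }

    /** splitEntries maps each split entry name to its effective suffix template. */
    static Set<String> segments(Map<String, String> splitEntries,
                                Collection<String> allEntryNames) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> split : splitEntries.entrySet()) {
            String template = split.getValue();
            int hash = template.indexOf('#');
            Pattern p = Pattern.compile(
                    Pattern.quote(split.getKey() + template.substring(0, hash))
                    + "\\d+"
                    + Pattern.quote(template.substring(hash + 1)));
            for (String name : allEntryNames) {
                if (p.matcher(name).matches()) {
                    result.add(name);
                }
            }
        }
        return result;
    }
}
\end{verbatim}

Every entry not returned by \texttt{segments} is extracted under its own
name; the returned names are grouped by split entry and concatenated.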

\end{document}