Arc unpack as a debian package

First, we need to define a few properties in the pom file
{code:language=xml|title=Pom file properties}<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.deb.name>${groupId}.${artifactId}</maven.deb.name>
<maven.deb.libfolder>/usr/share/${maven.deb.name}</maven.deb.libfolder>
<maven.deb.docfolder>/usr/share/doc/${maven.deb.name}</maven.deb.docfolder>
<maven.deb.manfolder>/usr/share/man</maven.deb.manfolder>
<maven.deb.binfolder>/usr/bin</maven.deb.binfolder>
<maven.deb.assembly>debian-prepare</maven.deb.assembly>
<maven.deb.description>unpacker application for arc and warc files, based on the heritrix crawler.</maven.deb.description>
<maven.deb.extendedDescription>This is the extended Description</maven.deb.extendedDescription>
<maven.deb.maintainer><![CDATA[Asger Askov Blekinge <abr@statsbiblioteket.dk>]]></maven.deb.maintainer>
<maven.deb.copyright>Copyright (C) 2012 by State and University Library, Denmark</maven.deb.copyright>
<maven.deb.license>Apache-2.0</maven.deb.license>
{code}
Basically, these just allow us to use consistent names in the plugin configurations that follow.
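
For orientation, property definitions like these belong inside the <properties> element of the POM. The following is a minimal sketch of the placement only, repeating two of the properties above; the other project elements are elided rather than prescribed.
{code:language=xml|title=Placement of the properties (sketch)}<project>
<!-- modelVersion, groupId, artifactId, version and so on go here -->
<properties>
<maven.deb.name>${groupId}.${artifactId}</maven.deb.name>
<maven.deb.libfolder>/usr/share/${maven.deb.name}</maven.deb.libfolder>
<!-- the remaining maven.deb.* properties follow in the same way -->
</properties>
</project>
{code}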

Then we come to the <build> section. The first thing to do is to enable filtering on the resources. This is done like this:
{code:language=xml}<build>
<resources>
<resource>
<filtering>true</filtering>
<directory>src/main/resources</directory>
</resource>
</resources>{code}
The effect of this is that you can use $\{maven.deb.name\} and the like in the config files, and have them replaced with the property values during the build.
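
Filtering rewrites every file in the resource directory, which can damage binary resources. If the project contains such files, one common arrangement is to declare the directory twice and filter only the files that need substitution. The sketch below is not part of the original setup; it assumes that the files needing substitution live under debian/ and scripts/.
{code:language=xml|title=Filtering only the packaging resources (sketch)}<resources>
<resource>
<!-- filtered: property placeholders are substituted -->
<directory>src/main/resources</directory>
<filtering>true</filtering>
<includes>
<include>debian/**</include>
<include>scripts/**</include>
</includes>
</resource>
<resource>
<!-- everything else is copied unchanged -->
<directory>src/main/resources</directory>
<filtering>false</filtering>
<excludes>
<exclude>debian/**</exclude>
<exclude>scripts/**</exclude>
</excludes>
</resource>
</resources>
{code}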

First, we want the jar file to be as directly executable as we can easily make it. To do this, we must generate a manifest. This is done with the maven-jar-plugin:
{code:language=xml}<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<archive>
<manifest>
<mainClass>dk.statsbiblioteket.scape.arcunpacker.Archive</mainClass>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<addClasspath>true</addClasspath>
<classpathPrefix>${maven.deb.libfolder}</classpathPrefix>
</manifest>
<manifestEntries>
<mode>development</mode>
<url>${pom.url}</url>
</manifestEntries>
</archive>
<excludes>
<exclude>debian/**</exclude>
<exclude>scripts/**</exclude>
</excludes>
</configuration>
</plugin>{code}
In the manifest, I declare the main class, and the classpath. Notice that I declare the classpathPrefix to be the maven.deb.libfolder, in order for the system to be able to resolve the dependencies when installed as a debian package.

Okay, the jar file is now generated correctly. Let's start on generating the package. First, we need to use maven to collect all the dependencies. This is done with the assembly plugin:
{code:language=xml}<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.2-beta-5</version>
<configuration>
<descriptors>
<descriptor>src/main/assembly/${maven.deb.assembly}.xml</descriptor>
</descriptors>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
{code}
This plugin uses the assembly descriptor named by the maven.deb.assembly property, that is, src/main/assembly/debian-prepare.xml. So, what does that descriptor do? Let's look at it.
{code:language=xml}<assembly>
<id>debian-prepare</id>

<formats>
<format>dir</format>
</formats>

<dependencySets>

<dependencySet>
<outputDirectory>jars</outputDirectory>
<useTransitiveDependencies>true</useTransitiveDependencies>
<useTransitiveFiltering>true</useTransitiveFiltering>
<useProjectArtifact>false</useProjectArtifact>
</dependencySet>

</dependencySets>

</assembly>
{code}
Basically, it puts all the jar dependencies of the project in one directory, ready for inclusion in the debian package.

Time to make the first version of the deb. We need another plugin in the pom file to do that.
{code:language=xml}<plugin>
<artifactId>jdeb</artifactId>
<groupId>org.vafer</groupId>
<version>0.9</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>jdeb</goal>
</goals>
<configuration>
<deb>${project.build.directory}/${artifactId}${version}-java.deb</deb>
<controlDir>${project.build.directory}/classes/debian/control/</controlDir>
<dataSet>
<data>
<src>${project.build.directory}/${project.build.finalName}-${maven.deb.assembly}/${project.build.finalName}/jars</src>
<type>directory</type>
<mapper>
<type>perm</type>
<prefix>${maven.deb.libfolder}</prefix>
</mapper>
</data>
<data>
<src>${project.build.directory}/${project.build.finalName}.jar</src>
<type>file</type>
<mapper>
<type>perm</type>
<prefix>${maven.deb.libfolder}</prefix>
</mapper>
</data>
<!--More data blocks will be added here, as we proceed-->
</dataSet>
</configuration>
</execution>
</executions>
</plugin>
{code}
Note that it is important that this plugin is placed below the assembly plugin, as they both execute in the package phase, and jdeb depends on the debian assembly having been run.
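
The ordering requirement can be pictured as the following skeleton of the <plugins> list. Configuration is elided; only the relative position of the two plugins matters here.
{code:language=xml|title=Plugin order (sketch)}<build>
<plugins>
<!-- maven-jar-plugin configuration as shown earlier -->
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<!-- bound to package: collects the dependency jars -->
</plugin>
<plugin>
<groupId>org.vafer</groupId>
<artifactId>jdeb</artifactId>
<!-- also bound to package; listed after the assembly plugin so that it runs second -->
</plugin>
</plugins>
</build>
{code}
This relies on Maven running plugins that are bound to the same phase in the order they are declared in the POM, which should hold for current Maven 3 releases.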

So, what will this plugin do?

Well, it will make a debian package, with the filename $\{project.build.directory\}/$\{artifactId\}$\{version\}-java.deb.

The ever-important control file should be found in $\{project.build.directory\}/classes/debian/control/. More about that file later.

The lib folder $\{maven.deb.libfolder\} should contain the jars from the debian-prepare assembly detailed above. Oh, and the project jar should also go there, as it was not included in the assembly.

So, now we just need to write the control file, and we have the first barebones deb file.

Make a file "control" in resources/debian/control, with this content
{code:language=none}Package: ${groupId}.${artifactId}
Version: ${version}
Section: java
Priority: optional
Architecture: all
Depends: default-jre (>= 1.5)
Maintainer: ${maven.deb.maintainer}
Description: ${maven.deb.description}
 ${maven.deb.extendedDescription}
{code}Now we should generate the deb, to check that what we have done so far works. Run mvn package.

Then we should check the deb. cd to the target directory, and run the lintian tool (lintian \*.deb). Install it first if you do not have it (sudo apt-get install lintian).

The tool should report this error:
{code}E: dk.statsbiblioteket.scape.arc-unpacker: no-copyright-file{code}
So we should include a copyright file. Make a file "copyright" with the following content in "resources/debian/copyright"
{code}Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Files: *
Copyright:
    ${maven.deb.copyright}
License: ${maven.deb.license}
    /usr/share/common-licenses/${maven.deb.license}
{code}Now the jdeb plugin needs to be told to include this file. Add the following section to the plugin's list of datasets
{code}<data>
<src>${project.build.directory}/classes/debian/copyright</src>
<type>directory</type>
<mapper>
<type>perm</type>
<prefix>${maven.deb.docfolder}</prefix>
</mapper>
</data>
{code}Run mvn clean package to generate the package once more, and run lintian on it again. The error should now be
{code}E: dk.statsbiblioteket.scape.arc-unpacker: changelog-file-missing-in-native-package{code}
Okay, so we need to make a changelog file for debian.

Here is some content for a changelog file that we can use:
{code}${maven.deb.name} (0.3) unstable; urgency=low
  * Added the archive.org maven repository, so we can build without local caches.
  * Added the debian package functionality

 -- Asger Askov Blekinge <abr@statsbiblioteket.dk>  Mon, 23 Apr 2012 15:00:00 +0100

${maven.deb.name} (0.2) unstable; urgency=low
  * Added the command line interface
  * Added the min/max response code restriction
  * Added the multiple naming of extracted resources

 -- Asger Askov Blekinge <abr@statsbiblioteket.dk>  Fri, 2 Dec 2011 17:34:00 +0100

${maven.deb.name} (0.1) unstable; urgency=low
  * Updated maven structure for better git support
  * Stuff actually works
  * First release

 -- Asger Askov Blekinge <abr@statsbiblioteket.dk>  Wed, 30 Nov 2011 15:21:00 +0100
{code}Put this content in "resources/debian/changelog/changelog".

Regenerate the package and validate it once more. The error should now be
{code}E: dk.statsbiblioteket.scape.arc-unpacker: changelog-file-not-compressed changelog
{code}
So, the changelog should be compressed. I have found that the antrun plugin seems to be the simplest way of doing this
{code:language=xml}<plugin>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.7</version>
<executions>
<execution>
<phase>prepare-package</phase>
<configuration>
<target>
<exec executable="gzip">
<arg value="-9"/>
<arg value="-r"/>
<arg value="${project.build.directory}/classes/debian/changelog/"/>
</exec>
</target>
</configuration>
<goals>
<goal>run</goal>
</goals>
</execution>
</executions>
</plugin>
{code}So, after this change, the deb package should validate. But we have forgotten something. The most important thing, actually. The executable file.

Since this is a java program, we use a shell script to start the program. Create the file "resources/scripts/arc-unpack" with the following content
{code:language=bash}#!/bin/sh
java -jar ${maven.deb.libfolder}/${project.build.finalName}.jar "$@"
{code}Remember, this file is in resources, so Maven will substitute the property placeholders during the build. Now, we need to tell jdeb to include this file. This is done by adding the following definition to the set of datasets
{code:language=xml}<data>
<src>${project.build.directory}/classes/scripts</src>
<type>directory</type>
<mapper>
<type>perm</type>
<prefix>${maven.deb.binfolder}</prefix>
<filemode>755</filemode>
</mapper>
</data>
{code}Regenerate and validate the deb file. You should now receive the following warning:
{code}W: dk.statsbiblioteket.scape.arc-unpacker: binary-without-manpage usr/bin/arc-unpack{code}So, we need to add man pages to the project.

Create the file "arc-unpack.8" in resources/debian/man/man8 (the jdeb entry below reads from classes/debian/man), and give it this content:
{code:language=none|title=sample manpage}.TH arc-unpack 8  "April 23, 2012" "version ${project.version}" "USER COMMANDS"
.SH NAME
arc-unpack \- ${maven.deb.description}
.SH SYNOPSIS
.B arc-unpack
\-f dataFile [\-o outputDir] [\-minResp number] [\-maxResp number] [\-naming [MD5,OFFSET,URL]]
.SH DESCRIPTION
${maven.deb.extendedDescription}
.SH OPTIONS
.TP
\-f dataFile
Data file to extract. Can be an arc or warc file, and can be compressed.
.TP
\-o outputDir
extracts the resources to this dir. Defaults to the current dir.
.TP
\-minResp number
Ignore resources, if the http return code is lower than minResp. Useful to filter out returns below the 200 range
.TP
\-maxResp number
Ignore resources, if the http return code is higher than maxResp. Useful to filter out returns above the 200 range
.TP
\-naming [MD5,OFFSET,URL]
Naming scheme for the extracted resources. Each resource in the archive is identified by a URL, but URLs do not map
neatly to filenames.
The URL scheme tries to map the resource URLs to files, but can fail.
The OFFSET scheme uses offsets into the arc file as filenames.
The MD5 scheme uses the MD5 hash of each resource URL to ensure valid, unique filenames.
.SH EXIT STATUS
arc-unpack returns zero if the extraction succeeded
.SH AUTHOR
${maven.deb.maintainer}
{code}
It is important that the file starts with ".TH arc-unpack 8", but the rest seems optional.

Now we need to tell jdeb to include this file as well. Add this section to the plugin's list of datasets:
{code:language=xml}<data>
<src>${project.build.directory}/classes/debian/man</src>
<type>directory</type>
<mapper>
<type>perm</type>
<prefix>${maven.deb.manfolder}</prefix>
</mapper>
</data>
{code}Clean, regenerate and validate the package.

The error received now should be
{code}E: dk.statsbiblioteket.scape.arc-unpacker: manpage-not-compressed usr/share/man/man8/arc-unpack.8{code}

So, we need to compress this file as well. Change the antrun plugin configuration to include the man folder too, like this:

{code}<plugin>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.7</version>
<executions>
<execution>
<phase>prepare-package</phase>
<configuration>
<target>
<exec executable="gzip">
<arg value="-9"/>
<arg value="-r"/>
<arg value="${project.build.directory}/classes/debian/man/"/>
</exec>
<exec executable="gzip">
<arg value="-9"/>
<arg value="-r"/>
<arg value="${project.build.directory}/classes/debian/changelog/"/>
</exec>
</target>
</configuration>
<goals>
<goal>run</goal>
</goals>
</execution>
</executions>
</plugin>
{code}The package should now be complete and validate. Let's try to install it.

The package should install without a problem. Play around, and try "man arc-unpack" to see the manpage work.
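
For reference, the <data> entries added over the course of this walkthrough can be gathered into one <dataSet>, as sketched below. Nothing new is introduced; the entries are the ones shown in the individual steps, in the order they were added.
{code:language=xml|title=Complete jdeb dataSet (sketch)}<dataSet>
<!-- dependency jars collected by the debian-prepare assembly -->
<data>
<src>${project.build.directory}/${project.build.finalName}-${maven.deb.assembly}/${project.build.finalName}/jars</src>
<type>directory</type>
<mapper>
<type>perm</type>
<prefix>${maven.deb.libfolder}</prefix>
</mapper>
</data>
<!-- the project jar itself -->
<data>
<src>${project.build.directory}/${project.build.finalName}.jar</src>
<type>file</type>
<mapper>
<type>perm</type>
<prefix>${maven.deb.libfolder}</prefix>
</mapper>
</data>
<!-- copyright file -->
<data>
<src>${project.build.directory}/classes/debian/copyright</src>
<type>directory</type>
<mapper>
<type>perm</type>
<prefix>${maven.deb.docfolder}</prefix>
</mapper>
</data>
<!-- launcher script, installed as executable -->
<data>
<src>${project.build.directory}/classes/scripts</src>
<type>directory</type>
<mapper>
<type>perm</type>
<prefix>${maven.deb.binfolder}</prefix>
<filemode>755</filemode>
</mapper>
</data>
<!-- compressed man pages -->
<data>
<src>${project.build.directory}/classes/debian/man</src>
<type>directory</type>
<mapper>
<type>perm</type>
<prefix>${maven.deb.manfolder}</prefix>
</mapper>
</data>
</dataSet>
{code}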