Este es mi esfuerzo de tratar de hacer un tarball determinista a partir de un repositorio SVN.
Requiere GNU tar 1.27 o posterior.
caracteristicas:
- Tarball determinista
- Precisión de marca de tiempo en microsegundos de SVN, gracias al
pax
encabezado extendido
- La ID de revisión almacenada en los comentarios del archivo tal como
git archive
lo haría
- Demuestro ambos
.tar.gz
y .tar.xz
compresiones. Para gzip, es posible optimizar la compresión adicional mediante el uso advdef
de AdvanceCOMP ( advdef
usos zopfli biblioteca).
Como ejemplo, utilizo el repositorio de subversión como fuente para el pago y la creación del tarball. Tenga en cuenta que esta no es la forma en que SVN empaqueta sus tarballs y no estoy relacionado con el desarrollo de SVN. Es solo un ejemplo , después de todo.
# URL of repository to export
url="https://svn.apache.org/repos/asf/subversion/tags/1.9.7/"
# Name of distribution sub-directory
dist_name=subversion-1.9.7-test
# ---------------------------------------------------------------------
info=$(svn info --xml "$url" | tr -- '\t\n' ' ')
revision=$(echo "$info" |
sed 's|.*<commit[^>]* revision="\{0,1\}\([^">]*\)"\{0,1\}>.*|\1|')
tar_name=${dist_name}-r${revision}
# Subversion's commit timestamps can be as precise as 0.000001 seconds,
# but sub-second precision is only available through --xml output
# format.
date=$(echo "$info" |
sed 's|.*<commit[^>]*>.*<date>\([^<]*\)</date>.*</commit>.*|\1|')
# Factors that would make tarball non-deterministic include:
# - umask
# - Ordering of file names
# - Timestamps of directories ("svn export" doesn't update them)
# - User and group names and IDs
# - Format of tar (gnu, ustar or pax)
# - For pax format, the name and contents of extended header blocks
umask u=rwx,go=rx
svn export -r "$revision" "$url" "$dist_name"
# "svn export" will update file modification time to latest time of
# commit that modifies the file, but won't do so on directories.
find . -type d | xargs touch -c -m -d "$date" --
trap 's=$?; rm -f "${tar_name}.tar" || : ; exit $s' 1 2 3 15
# pax extended header allows sub-second precision on modification time.
# The default extended header name pattern ("%d/PaxHeaders.%p/%f")
# would contain a process ID that makes tarball non-deterministic.
# "git archive" would store a commit ID in pax global header (named
# "pax_global_header"). We can do similar.
# GNU tar (<=1.30) has a bug that it rejects globexthdr.mtime that
# contains fraction of seconds.
pax_options=$(printf '%s%s%s%s%s%s' \
"globexthdr.name=pax_global_header," \
"globexthdr.mtime={$(echo ${date}|sed -e 's/\.[0-9]*Z/Z/g')}," \
"comment=${revision}," \
"exthdr.name=%d/PaxHeaders/%f," \
"delete=atime," \
"delete=ctime")
find "$dist_name" \
\( -type d -exec printf '%s/\n' '{}' \; \) -o -print |
LC_ALL=C sort |
tar -c --no-recursion --format=pax --owner=root:0 --group=root:0 \
--pax-option="$pax_options" -f "${tar_name}.tar" -T -
# Compression (gzip/xz) can add additional non-deterministic factors.
# xz format does not store file name or timestamp...
trap 's=$?; rm -f "${tar_name}.tar.xz" || : ; exit $s' 1 2 3 15
xz -9e -k "${tar_name}.tar"
# ...but for gzip, you need either --no-name option or feed the input
# from stdin. This example uses former, and also tries advdef to
# optimize compression if available.
trap 's=$?; rm -f "${tar_name}.tar.gz" || : ; exit $s' 1 2 3 15
gzip --no-name -9 -k "${tar_name}.tar" &&
{ advdef -4 -z "${tar_name}.tar.gz" || : ; }