{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Image Format vs Fossil Repository Size\n",
"\n",
"## Prerequisites\n",
"\n",
"This notebook was originally developed with standalone [JupyterLab] and Python 2 but was later moved to JupyterLab under [Anaconda] with Python 3. Backporting to Python 2 may require manual adjustment. Getting it running under stock JupyterLab or plain-old-Jupyter should be straightforward for one familiar with the tools. We will assume you're following in my footsteps, using Anaconda.\n",
"\n",
"One of the reasons we switched to Anaconda is that it comes with all but one of this notebook's prerequisites, that last remaining one of which you install so:\n",
"\n",
" $ pip install wand\n",
"\n",
"That should be done in a shell where \"`pip`\" is the version that came with Anaconda. Otherwise, the package will likely end up in some *other* Python package tree, which Anaconda's Python kernel may not be smart enough to find on its own.\n",
"\n",
"Note that you do *not* use `conda` for this: as of this writing, [Wand] is not available in a form that installs via `conda`.\n",
"\n",
"This notebook was written and tested on a macOS system where `/tmp` exists. Other platforms may require adjustments to the scripts below.\n",
"\n",
"[Anaconda]: https://www.anaconda.com/distribution/\n",
"[JupyterLab]: https://github.com/jupyterlab/\n",
"[Wand]: http://wand-py.org/\n",
"\n",
"\n",
"## Running\n",
"\n",
"The next cell generates the test repositories. This takes about 3 seconds to run on my machine. If you have to uncomment the \"`sleep`\" call in the inner loop, this will go up to about 45 seconds.\n",
"\n",
"The next cell produces the bar chart from the collected data, all but instantaneously.\n",
"\n",
"This split allows you to generate the expensive experimental data in a single pass, then play as many games as you like with the generated data.\n",
"\n",
"\n",
"## Discussion\n",
"\n",
"That is kept in [a separate document](image-format-vs-repo-size.md) so we can share that document with Fossil's Markdown renderer."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Created test directory /tmp/image-format-vs-repo-size\n",
"Created ../test-jpeg.fossil for format JPEG.\n",
"Created ../test-bmp.fossil for format BMP.\n",
"Created ../test-tiff.fossil for format TIFF.\n",
"Created ../test-png.fossil for format PNG.\n",
"Experiment completed in 3.0627901554107666 seconds.\n"
]
}
],
"source": [
"import os\n",
"import random\n",
"import subprocess\n",
"import time\n",
"\n",
"from wand.color import Color\n",
"from wand.drawing import Drawing\n",
"from wand.image import Image\n",
"\n",
"import pandas as pd\n",
"\n",
"size = 256\n",
"iterations = 10\n",
"start = time.time()\n",
"repo_sizes = []\n",
"fossil = '/usr/local/bin/fossil'\n",
"\n",
"if not os.path.isfile(fossil): raise RuntimeError(\"No such executable \" + fossil)\n",
"if not os.access(fossil, os.X_OK): raise RuntimeError(\"Cannot execute \" + fossil)\n",
"\n",
"tdir = os.path.join('/tmp', 'image-format-vs-repo-size')\n",
"if not os.path.isdir(tdir): os.mkdir(tdir, 0o700)\n",
"print(\"Created test directory \" + tdir)\n",
" \n",
"formats = ['JPEG', 'BMP', 'TIFF', 'PNG']\n",
"for f in formats:\n",
" ext = f.lower()\n",
" wdir = os.path.join(tdir, 'work-' + ext)\n",
" if not os.path.isdir(wdir): os.mkdir(wdir, 0o700)\n",
" os.chdir(wdir)\n",
" repo = '../test-' + ext + '.fossil'\n",
" ifn = 'test.' + ext\n",
" ipath = os.path.join(wdir, ifn)\n",
" rs = []\n",
" \n",
" def add_repo_size():\n",
" rs.append(os.path.getsize(repo) / 1024.0 / 1024.0)\n",
" \n",
" def set_repo_page_size(n):\n",
" subprocess.run([\n",
" fossil,\n",
" 'rebuild',\n",
" '--compress',\n",
" '--pagesize',\n",
" str(n),\n",
" '--vacuum'\n",
" ])\n",
"\n",
" try:\n",
" # Create test repo\n",
" subprocess.run([fossil, 'init', repo])\n",
" subprocess.run([fossil, 'open', repo])\n",
" subprocess.run([fossil, 'set', 'binary-glob', \"*.{0}\".format(ext)])\n",
" set_repo_page_size(512) # minimum\n",
" add_repo_size()\n",
" set_repo_page_size(8192) # default\n",
" print(\"Created \" + repo + \" for format \" + f + \".\")\n",
"\n",
" # Create test image and add it to the repo\n",
" img = Image(width = size, height = size, depth = 8,\n",
" background = 'white')\n",
" img.alpha_channel = 'remove'\n",
" img.evaluate('gaussiannoise', 1.0)\n",
" img.save(filename = ipath)\n",
" subprocess.run([fossil, 'add', ifn])\n",
" subprocess.run([fossil, 'ci', '-m', 'initial'])\n",
" #print(\"Added initial \" + f + \" image.\")\n",
" add_repo_size()\n",
"\n",
" # Change a random pixel to a random RGB value and check it in\n",
" # $iterations times.\n",
" for i in range(iterations - 1):\n",
" with Drawing() as draw:\n",
" x = random.randint(0, size - 1)\n",
" y = random.randint(0, size - 1)\n",
"\n",
" r = random.randint(0, 255)\n",
" g = random.randint(0, 255)\n",
" b = random.randint(0, 255)\n",
" \n",
" draw.fill_color = Color('rgb({0},{1},{2})'.format(\n",
" r, g, b\n",
" ))\n",
" draw.color(x, y, 'point')\n",
" draw(img)\n",
" img.save(filename = ipath)\n",
" \n",
" # You might need to uncomment the next line if you find that\n",
" # the repo size doesn't change as expected. In some versions\n",
" # of Wand (or is it the ImageMagick underneath?) we have seen\n",
" # what appear to be asynchronous saves, with a zero-length file\n",
" # here if you don't wait for the save to complete.\n",
" #time.sleep(1.0)\n",
" \n",
" subprocess.run([fossil, 'ci', '-m', '\"change {0} step {1}'.format(\n",
" f, i\n",
" )])\n",
" add_repo_size()\n",
" \n",
" # Repo complete for this format\n",
" repo_sizes.append(pd.Series(rs, name=f))\n",
"\n",
" finally:\n",
" if os.path.exists(ipath): os.remove(ipath)\n",
" if os.path.exists(tdir):\n",
" if os.path.isfile(repo):\n",
" subprocess.run([fossil, 'close', '-f'])\n",
" os.unlink(repo)\n",
" os.chdir(tdir);\n",
" os.rmdir(wdir)\n",
" if os.path.exists(repo): os.remove(repo)\n",
" \n",
"print(\"Experiment completed in \" + str(time.time() - start) + \" seconds.\")"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"