<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <title>Justin's Ramblings</title>
    <link>http://www.bouncybouncy.net/blog</link>
    <description>Justin's Ramblings</description>
    <pubDate>Sun, 13 May 2012 21:41:28 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <item>
      <title>Getting tornadio2 working on heroku</title>
      <link>http://www.bouncybouncy.net/blog/2012/04/30/getting-tornadio2-working-on-heroku</link>
      <pubDate>Mon, 30 Apr 2012 20:49:49 EDT</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[heroku]]></category>
      <category><![CDATA[tech]]></category>
      <guid isPermaLink="true">http://www.bouncybouncy.net/blog/2012/04/30/getting-tornadio2-working-on-heroku</guid>
      <description>Getting tornadio2 working on heroku</description>
      <content:encoded><![CDATA[<p>I spent a while the other day figuring out how to get websockets working on
heroku, so I thought I'd write it up.</p>
<p>First, Heroku doesn't actually support websockets, so you must use something
like socket.io, which can fall back to various long polling mechanisms.</p>
<h2>Step 1, disable websocket support in socket.io</h2>
<p>Without this, socket.io tries to connect first using websockets and it takes a
while to time out before switching to long polling.</p>
<div class="pygments_colorful"><pre>// remove websocket for heroku
var options = {transports:[&quot;flashsocket&quot;, &quot;htmlfile&quot;, &quot;xhr-polling&quot;, &quot;jsonp-polling&quot;]};
var socket = io.connect(&quot;http://.../&quot;, options);
</pre></div>

<h2>Step 2, configure tornadio to use xheaders</h2>
<p>If you don't tell tornadio to use xheaders it will think heroku is trying to
hijack sessions and nothing will work. You will get 401 unauthorized messages
back from tornado and the error from this statement in your logs:</p>
<div class="pygments_colorful"><pre># If IP address don&#39;t match - refuse connection
if handler.request.remote_ip != self.remote_ip:
    logging.error(&#39;Attempted to attach to session %s (%s) from different IP (%s)&#39;   % (
                  self.session_id,
                  self.remote_ip,
                  handler.request.remote_ip
                  ))
</pre></div>

<p>Enabling xheaders is a good idea when deploying to heroku in general and is not tornadio specific.</p>
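<p>To make the role of xheaders concrete, here is a minimal Python sketch of what a proxy-aware server does with the forwarding headers. It is not tornado's or tornadio's actual code: the function name <code>client_ip</code> is hypothetical, and trusting <code>X-Real-Ip</code> first and then the last <code>X-Forwarded-For</code> entry is an assumption that depends on how your proxy fills in those headers.</p>

```python
def client_ip(headers, socket_ip, xheaders=False):
    """Return the address a server would treat as the client's IP.

    headers   -- dict of request headers (hypothetical input for this sketch)
    socket_ip -- address of the TCP peer, which behind a proxy is the proxy
    """
    if not xheaders:
        # Without xheaders every request appears to come from whichever
        # proxy forwarded it, so the address can change between requests.
        return socket_ip
    real_ip = headers.get("X-Real-Ip")
    if real_ip:
        return real_ip.strip()
    forwarded = headers.get("X-Forwarded-For", "")
    # Assumption: the nearest proxy appends the address it saw as the last entry.
    entries = [part.strip() for part in forwarded.split(",") if part.strip()]
    return entries[-1] if entries else socket_ip
```

<p>With the flag off, requests relayed by different proxies would report different addresses, which is the kind of mismatch the session check above rejects.</p>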
<p>Add the xheaders option to the main SocketServer initialization, and everything is happy.</p>
<div class="pygments_colorful"><pre>SocketServer(application,xheaders=True)
</pre></div>]]></content:encoded>
    </item>
    <item>
      <title>How not to program in python</title>
      <link>http://www.bouncybouncy.net/blog/2011/07/13/how-not-to-program-in-python</link>
      <pubDate>Wed, 13 Jul 2011 21:45:10 EDT</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[tech]]></category>
      <guid isPermaLink="true">http://www.bouncybouncy.net/blog/2011/07/13/how-not-to-program-in-python</guid>
      <description>How not to program in python</description>
      <content:encoded><![CDATA[<h2>TL;DR</h2>
<p>Whatever you do, make sure you are using versioned python packages, even for simple tasks. And use pip+virtualenv.</p>
<h2>So you want to program in python..</h2>
<p>It seems like only yesterday, and not 7 years ago, that I decided to learn
python.  I may not be the best python programmer, but I <em>have</em> made probably
every mistake you can, so here are a bunch of things <em>not</em> to do, and a few
things you should be doing. </p>
<h2>Don't: write python 'scripts'</h2>
<p>Don't write programs like this:</p>
<div class="pygments_colorful"><pre><span class="n">temp</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s">&quot;C: &quot;</span><span class="p">)</span>
<span class="k">print</span> <span class="n">temp</span><span class="o">*</span><span class="mi">9</span><span class="o">/</span><span class="mi">5</span><span class="o">+</span><span class="mi">32</span>
</pre></div>

<p>The way you fix that is not by writing the following:</p>
<div class="pygments_colorful"><pre><span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="n">temp</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s">&quot;C: &quot;</span><span class="p">)</span>
    <span class="k">print</span> <span class="n">temp</span><span class="o">*</span><span class="mi">9</span><span class="o">/</span><span class="mi">5</span><span class="o">+</span><span class="mi">32</span>
</pre></div>

<p>And don't write this either:</p>
<div class="pygments_colorful"><pre><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">temp</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s">&quot;C: &quot;</span><span class="p">)</span>
    <span class="k">print</span> <span class="n">temp</span><span class="o">*</span><span class="mi">9</span><span class="o">/</span><span class="mi">5</span><span class="o">+</span><span class="mi">32</span>
<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</pre></div>

<p>No matter how good your logic is, if you couple the logic with your input
and output you are painting yourself into a corner.  I've seen people write
scripts like this, and then have other scripts call them using os.system.  In a
loop. Then they wonder why python is so slow.</p>
<h2>Do: Write python modules and packages</h2>
<p>Minimally this could look something like:</p>
<div class="pygments_colorful"><pre><span class="k">def</span> <span class="nf">ctof</span><span class="p">(</span><span class="n">temp</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">temp</span><span class="o">*</span><span class="mi">9</span><span class="o">/</span><span class="mi">5</span><span class="o">+</span><span class="mi">32</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">temp</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s">&quot;C: &quot;</span><span class="p">)</span>
    <span class="k">print</span> <span class="n">ctof</span><span class="p">(</span><span class="n">temp</span><span class="p">)</span>
<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">&quot;__main__&quot;</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</pre></div>

<p>Even better would be to have <code>main</code> parse sys.argv rather than working
interactively.  For simple interactive tools it is hard to beat <a href="http://docs.python.org/library/cmd.html">the cmd
module</a>.</p>
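<p>As a rough sketch of the sys.argv approach, here is the same toy module written for Python 3 (the other examples in this post are Python 2) using the standard argparse module. Accepting <code>argv</code> as a parameter is a choice made for this sketch so that <code>main</code> can be called from tests.</p>

```python
import argparse


def ctof(temp):
    """Convert a temperature from Celsius to Fahrenheit."""
    return temp * 9 / 5 + 32


def main(argv=None):
    parser = argparse.ArgumentParser(description="Convert Celsius to Fahrenheit")
    parser.add_argument("celsius", type=float, help="temperature in degrees Celsius")
    # argv=None makes argparse read sys.argv[1:]
    args = parser.parse_args(argv)
    print(ctof(args.celsius))
    return 0
```

<p>The logic stays importable as <code>ctof</code>, and <code>main</code> only deals with input and output.</p>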
<p>Now you have a (albeit poorly named) python module that can properly be
imported from a larger program:</p>
<div class="pygments_colorful"><pre><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">temp</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span> <span class="n">temp</span><span class="o">.</span><span class="n">ctof</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
<span class="mi">212</span>
</pre></div>

<h2>Don't: mess with PYTHONPATH</h2>
<p>Now that you have a module you can import, what do you do with it? For
years my development/production environment consisted of the following: a <code>lib</code>
directory containing modules and packages and a <code>util</code> directory containing
scripts that used those modules.  This worked fine for a long time, especially
when I only had one machine.  When I got more systems, I used the high tech
method of <code>rsync</code>'ing the entire directory tree to <code>/srv/python</code> or <code>~/python/</code>
and mucking with the python path.  This system worked, but had a number of
problems:</p>
<ul>
<li>If I wanted to run a program on a new system, I had to rsync the entire
   directory tree.</li>
<li>Since there was no dependency information, the first time I wanted to
   share a program I wrote, I had to figure out the dependencies manually.</li>
<li>I had no idea what modules were being used, and which were obsolete.</li>
<li>When I started writing test code and documentation, I did not have a good
   place to store them.  I used a single directory for all my tiny modules
   because one directory per module seemed like overkill at the time.</li>
<li>When the version of python on the system was upgraded, bad things happened.</li>
</ul>
<p>It's very tempting to simply throw all of your python code into a single
directory tree, but that method only causes problems later on.</p>
<h2>Do: Create python modules</h2>
<p>For the example above, we can write a simple <code>setup.py</code> file:</p>
<div class="pygments_colorful"><pre><span class="kn">from</span> <span class="nn">setuptools</span> <span class="kn">import</span> <span class="n">setup</span>

<span class="n">setup</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">&quot;temp&quot;</span><span class="p">,</span>
    <span class="n">version</span><span class="o">=</span><span class="s">&quot;1.0&quot;</span><span class="p">,</span>
    <span class="n">py_modules</span> <span class="o">=</span> <span class="p">[</span><span class="s">&quot;temp&quot;</span><span class="p">],</span> 
    <span class="n">entry_points</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s">&#39;console_scripts&#39;</span><span class="p">:</span> <span class="p">[</span>
            <span class="s">&#39;ctof   = temp:main&#39;</span><span class="p">,</span>
        <span class="p">]</span>
    <span class="p">},</span>
<span class="p">)</span>
</pre></div>

<p>If you have a full package instead of a single file module, you should use
<code>packages</code> and not <code>py_modules</code>.  <a href="http://docs.python.org/distutils/setupscript.html">The official
documentation</a> should be
read if you are doing anything more complicated.  There are fields for your
name, short and long descriptions, licensing information, etc.  This
example was kept purposely short to make it clear that there is not much
you actually have to do to get started.  Even a barebones <code>setup.py</code> is
better than no <code>setup.py</code>.</p>
<h2>Don't: use 'scripts' in setup.py (Do: Use entry points)</h2>
<p><code>console_scripts</code> <code>entry_points</code> should be preferred over the 'scripts' that
setup.py can install.  The last time I tried, <code>scripts</code> did not get
correctly installed on Windows systems, but <code>console_scripts</code> did.
Additionally, the more code you have in scripts, the less testable code you
have in your modules.  When you use scripts, eventually you will get to the
point where they all contain something similar to:</p>
<div class="pygments_colorful"><pre><span class="kn">from</span> <span class="nn">mypackage.commands</span> <span class="kn">import</span> <span class="n">frob</span>
<span class="n">frob</span><span class="p">()</span>
</pre></div>

<p>and at that point, you are just re-implementing what <code>console_scripts</code> does for you.</p>
<h2>Do: Version your packages and depend on specific versions.</h2>
<p>So, after years of doing-the-wrong-thing, I finally created proper packages for
each of my libraries and tools.  Shortly after that I started having problems
again.  While I had been versioning all of my packages, any package that
required another package simply depended on the package name and not any
specific version of it.  This created problems any time I would add new
features.  I would install the latest version of a utility package on a server,
and it would crash since I had forgotten to upgrade the library it depended on.
Since I wasn't syncing the entire directory tree anymore, libraries were
becoming out of date.</p>
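<p>A minimal sketch of what depending on a version looks like, assuming setuptools. The names <code>frobutil</code> and <code>froblib</code> and the version numbers are hypothetical; the arguments are kept in a plain dict here so the fragment can be read on its own.</p>

```python
# Hypothetical names: "frobutil" is the tool, "froblib" the library it needs.
SETUP_ARGS = dict(
    name="frobutil",
    version="2.1",
    py_modules=["frobutil"],
    # Depend on a release that has the features this tool relies on,
    # not on the bare package name.
    install_requires=["froblib>=1.4"],
)

# In a real setup.py this would end with:
#     from setuptools import setup
#     setup(**SETUP_ARGS)
```

<p>With a requirement like this, installing the new tool on a server that still has an old copy of the library causes the library to be upgraded instead of crashing at runtime.</p>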
<h2>Don't install packages system wide. (Do: Use virtualenv and pip)</h2>
<p>Once you get to the point where you are using versioned packages, you'll
want to be able to install different versions of modules under different
python versions.  When I was simply sticking everything under <code>/srv/python</code> it
was next to impossible to have multiple versions of python.  I could change
<code>PYTHONPATH</code> to point somewhere else, but there was no easy way to maintain two
completely different trees of modules.</p>
<p>It is extremely simple to get started using pip and virtual environments.
pip's <code>-E</code> option installs a package into a virtual environment,
creating the environment first if it doesn't already exist, so both steps
happen in one command:</p>
<div class="pygments_colorful"><pre>justin@eee:~/tmp$ pip  -E python_env install bottle
Creating new virtualenv environment in python_env
  New python executable in python_env/bin/python
  Installing distribute...done........................
Downloading/unpacking bottle
  Downloading bottle-0.9.5.tar.gz (45Kb): 45Kb downloaded
  Running setup.py egg_info for package bottle

Installing collected packages: bottle
  Running setup.py install for bottle

Successfully installed bottle
Cleaning up...
justin@eee:~/tmp$ ./python_env/bin/python 
&gt;&gt;&gt; import bottle
&gt;&gt;&gt; bottle.__file__
&#39;/home/justin/tmp/python_env/lib/python2.7/site-packages/bottle.pyc&#39;
&gt;&gt;&gt; 
</pre></div>

<p>I can use that same method to install the toy module I wrote for this post as
well:</p>
<pre><code>justin@eee:~/tmp$ pip  -E python_env install ~/tmp/post/temp_mod/
Unpacking ./post/temp_mod
  Running setup.py egg_info for package from file:///home/justin/tmp/post/temp_mod

Installing collected packages: temp
  Running setup.py install for temp

    Installing ctof script to /home/justin/tmp/python_env/bin

Successfully installed temp
Cleaning up...
</code></pre>
<p>pip was also nice enough to install my <code>console_script</code>:</p>
<pre><code>justin@eee:~/tmp$ ./python_env/bin/ctof 
C: 34
93
</code></pre>
<h2>Too long; Did read</h2>
<p>The barrier to entry for python is a lot lower compared to a language like java or c++.
It's true that helloworld is simply:</p>
<pre><code>print("Hello, World")
</code></pre>
<p>However, if you plan on using python for anything more complicated, you will
want to learn how to take advantage of modules and packages.  Python doesn't
force you to do this, but not doing so can quickly turn into a maintenance
nightmare.</p>]]></content:encoded>
    </item>
    <item>
      <title>os.popen considered harmful</title>
      <link>http://www.bouncybouncy.net/blog/2011/04/22/os.popen-considered-harmful</link>
      <pubDate>Fri, 22 Apr 2011 22:25:00 EDT</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[tech]]></category>
      <guid isPermaLink="true">http://www.bouncybouncy.net/blog/2011/04/22/os.popen-considered-harmful</guid>
      <description>os.popen considered harmful</description>
      <content:encoded><![CDATA[<p>os.popen uses the shell by default, and unlike subprocess.Popen, has
no way of disabling it.  Problems can occur when the program you are
trying to run does not exist or cannot be run due to a
permissions issue.</p>
<p>Consider the following example function:</p>
<div class="pygments_colorful"><pre><span class="k">def</span> <span class="nf">logged_in_users</span><span class="p">():</span>
    <span class="n">users</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">popen</span><span class="p">(</span><span class="s">&quot;who&quot;</span><span class="p">):</span>
        <span class="n">users</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">()[</span><span class="mi">0</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">users</span>
</pre></div>

<p>This runs just fine when everything is working:</p>
<pre><code>In [4]: logged_in_users()
Out[4]: set(['justin'])
</code></pre>
<p>But if there is a problem running the command (for this example, let's
change the 'who' to 'whom'):</p>
<pre><code>In [6]: logged_in_users()
sh: whom: not found
Out[6]: set()
</code></pre>
<p>What happened was os.popen ran</p>
<pre><code>"sh -c whom"
</code></pre>
<p>While sh started fine, the actual command could not be run.  Since
os.popen also does not pass the exit code back to the parent process
there is no easy method to use to see if anything went wrong.</p>
<p>If we switch over to subprocess.Popen, everything works fine:</p>
<div class="pygments_colorful"><pre>    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">([</span><span class="s">&quot;whom&quot;</span><span class="p">],</span> <span class="n">stdout</span><span class="o">=</span><span class="n">subprocess</span><span class="o">.</span><span class="n">PIPE</span><span class="p">)</span><span class="o">.</span><span class="n">stdout</span><span class="p">:</span>
</pre></div>

<p>This call will instead immediately raise an exception:</p>
<pre><code>OSError: [Errno 2] No such file or directory
</code></pre>
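<p>For reference, a sketch of the whole function on top of subprocess, written for Python 3. The <code>command</code> parameter and the <code>parse_users</code> helper are additions made for this sketch so the failure case is easy to exercise; they are not part of the original function.</p>

```python
import subprocess


def parse_users(who_output):
    """Return the set of user names (first column) from `who`-style output."""
    return {line.split()[0] for line in who_output.splitlines() if line.strip()}


def logged_in_users(command=("who",)):
    # No shell is involved: a missing program raises FileNotFoundError
    # (a subclass of OSError), and check=True turns a non-zero exit
    # status into subprocess.CalledProcessError.
    result = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return parse_users(result.stdout)
```

<p>Either failure now reaches the caller as an exception instead of an empty set.</p>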
<p>So using subprocess.Popen and not using os.popen has the following
benefits:</p>
<ul>
<li>Is more secure against potential command injection</li>
<li>Does not waste a process</li>
<li>Returns a better error message to the parent process</li>
</ul>]]></content:encoded>
    </item>
    <item>
      <title>normalizing ipv6 addresses</title>
      <link>http://www.bouncybouncy.net/blog/2011/04/19/normalizing-ipv6-addresses</link>
      <pubDate>Tue, 19 Apr 2011 16:34:30 EDT</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[tech]]></category>
      <category><![CDATA[ipv6]]></category>
      <guid isPermaLink="true">http://www.bouncybouncy.net/blog/2011/04/19/normalizing-ipv6-addresses</guid>
      <description>normalizing ipv6 addresses</description>
      <content:encoded><![CDATA[<p>One of the first steps in grokking ipv6 is getting a handle on ipv6 addresses.</p>
<p>The 'dotted quad' notation for ipv4 is fairly simple, and other than possible
zero padding issues, they all look the same.  ipv6 addresses are a bit
different.  Rather than a dotted quad they are 8 hex groups, and there are a lot
of <a href="http://en.wikipedia.org/wiki/IPv6_address#Presentation">rules</a> for
displaying the addresses.  For working with ipv6 addresses there are two
options:</p>
<ul>
<li>Convert them to a 16 byte string</li>
<li>Normalize them</li>
</ul>
<p>There are some very nice libraries for working with ip addresses, but the low
level socket functions can be used to convert and normalize:</p>
<div class="pygments_colorful"><pre><span class="o">&gt;&gt;&gt;</span> <span class="kn">import</span> <span class="nn">socket</span>
<span class="o">&gt;&gt;&gt;</span> <span class="nb">bytes</span><span class="o">=</span><span class="n">socket</span><span class="o">.</span><span class="n">inet_pton</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET6</span><span class="p">,</span> <span class="s">&quot;2001:4860:800f:0:0:0:0:0063&quot;</span><span class="p">)</span>
<span class="o">&gt;&gt;&gt;</span> <span class="nb">bytes</span>
<span class="s">&#39; </span><span class="se">\x01</span><span class="s">H`</span><span class="se">\x80\x0f\x00\x00\x00\x00\x00\x00\x00\x00\x00</span><span class="s">c&#39;</span>
<span class="o">&gt;&gt;&gt;</span> <span class="s">&#39;we can see that the data is the same:&#39;</span>
<span class="o">&gt;&gt;&gt;</span> <span class="n">binascii</span><span class="o">.</span><span class="n">hexlify</span><span class="p">(</span><span class="nb">bytes</span><span class="p">)</span>
<span class="s">&#39;20014860800f00000000000000000063&#39;</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span> <span class="n">socket</span><span class="o">.</span><span class="n">inet_ntop</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET6</span><span class="p">,</span> <span class="nb">bytes</span><span class="p">)</span>
<span class="mi">2001</span><span class="p">:</span><span class="mi">4860</span><span class="p">:</span><span class="mi">800</span><span class="n">f</span><span class="p">::</span><span class="mi">63</span>
</pre></div>

<p>We can make a simple function to do that:</p>
<div class="pygments_colorful"><pre><span class="k">def</span> <span class="nf">normalize</span><span class="p">(</span><span class="n">ip</span><span class="p">):</span>
    <span class="nb">bytes</span><span class="o">=</span><span class="n">socket</span><span class="o">.</span><span class="n">inet_pton</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET6</span><span class="p">,</span> <span class="n">ip</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">socket</span><span class="o">.</span><span class="n">inet_ntop</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET6</span><span class="p">,</span> <span class="nb">bytes</span><span class="p">)</span>
</pre></div>
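<p>On Python 3 the standard library's <code>ipaddress</code> module covers both options without going through the socket functions. This is a different technique from the one above, shown only as a sketch; the function names are made up for this example.</p>

```python
import ipaddress


def normalize_with_ipaddress(ip):
    """Return the compressed text form of an IPv6 address."""
    return ipaddress.IPv6Address(ip).compressed


def to_bytes(ip):
    """Return the 16 byte packed form of an IPv6 address."""
    return ipaddress.IPv6Address(ip).packed
```

<p>The text form it produces for IPv4-mapped addresses may differ from what <code>inet_ntop</code> prints, depending on the Python version.</p>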

<p>You can see some of the weird normalization rules in action:</p>
<div class="pygments_colorful"><pre><span class="o">&gt;&gt;&gt;</span> <span class="k">print</span> <span class="n">normalize</span><span class="p">(</span><span class="s">&quot;2001:4860:800f:0:0:0:0:0063&quot;</span><span class="p">)</span>
<span class="mi">2001</span><span class="p">:</span><span class="mi">4860</span><span class="p">:</span><span class="mi">800</span><span class="n">f</span><span class="p">::</span><span class="mi">63</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span> <span class="n">normalize</span><span class="p">(</span><span class="s">&quot;::ffff:c000:280&quot;</span><span class="p">)</span>
<span class="p">::</span><span class="n">ffff</span><span class="p">:</span><span class="mf">192.0</span><span class="o">.</span><span class="mf">2.128</span>
<span class="o">&gt;&gt;&gt;</span> <span class="k">print</span> <span class="n">normalize</span><span class="p">(</span><span class="s">&quot;ff02:0:0:0:0:0:0:1&quot;</span><span class="p">)</span>
<span class="n">ff02</span><span class="p">::</span><span class="mi">1</span>
</pre></div>]]></content:encoded>
    </item>
    <item>
      <title>Debian/kFreeBSD</title>
      <link>http://www.bouncybouncy.net/blog/2011/04/17/debian/kfreebsd</link>
      <pubDate>Sun, 17 Apr 2011 21:04:38 EDT</pubDate>
      <category><![CDATA[tech]]></category>
      <category><![CDATA[debian]]></category>
      <category><![CDATA[freebsd]]></category>
      <guid isPermaLink="true">http://www.bouncybouncy.net/blog/2011/04/17/debian/kfreebsd</guid>
      <description>Debian/kFreeBSD</description>
      <content:encoded><![CDATA[<p>A few days ago I installed Debian/kFreeBSD on my home server.  It had
been running opensolaris for years, but doing just about anything on
that system was a complete pain in the ass.  I had been meaning to
give Debian/kFreeBSD a try, but had been putting it off thinking the
changeover would break a lot of things, or I would have trouble
importing the ZFS pools.</p>
<p>The other day I had some free time so I gave it a go.</p>
<p>I downloaded the <a href="ftp://ftp.debian.org/debian/dists/squeeze/main/installer-kfreebsd-i386/current/images/netboot/mini.iso">mini.iso</a>
and dd'd it to a spare usb stick.  The kFreeBSD ISOs support both cd and hard
disk booting like the linux images.   The install took about 40
minutes (including the time taken to download everything).</p>
<p>After that I expected to have a few problems.. but everything worked.
I was able to install zfsutils and import the zfs pools.
Debian/kFreeBSD doesn't currently support nfs, but it was easy enough
to install samba.</p>
<p>I'm left with a speedy, lightweight system, with thousands of packages and
full security support:</p>
<pre><code>root@pip:~# df -h /
Filesystem            Size  Used Avail Use% Mounted on
/dev/ad0s1             35G  596M   32G   2% /

root@pip:~# free -m
             total       used       free     shared    buffers     cached
Mem:          2026        222       1804         17          0          0
-/+ buffers/cache:        222       1804
Swap:            0          0          0

root@pip:~# apt-cache search ""|wc -l
26258
</code></pre>
<p>Other than a few utilities working a little differently (the main one
I noticed was netstat not taking the same flags) it feels exactly like
a debian/linux system.  But with ZFS.</p>]]></content:encoded>
    </item>
    <item>
      <title>Playing with blogofile</title>
      <link>http://www.bouncybouncy.net/blog/2011/04/17/playing-with-blogofile</link>
      <pubDate>Sun, 17 Apr 2011 20:34:08 EDT</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[blogofile]]></category>
      <category><![CDATA[tech]]></category>
      <guid isPermaLink="true">http://www.bouncybouncy.net/blog/2011/04/17/playing-with-blogofile</guid>
      <description>Playing with blogofile</description>
      <content:encoded><![CDATA[<p>Reworking my much neglected website with <a href="http://www.blogofile.com/">blogofile</a>.
Ikiwiki is great, but I never really did anything with it.  Blogofile
is written in python and uses mako, two things I use in almost every
project.</p>
<h2>Nice page titles</h2>
<p>The first thing I wanted to do was fix the page titles.  Blog posts
should automatically have their page title set. This was a trivial
change to head.mako:</p>
<div class="pygments_colorful"><pre>-<span class="nt">&lt;title&gt;</span>${bf.config.blog.name}<span class="nt">&lt;/title&gt;</span>
+<span class="nt">&lt;title&gt;</span>
+    BB.Net
+%if post and post.title:
+- ${post.title}
+%endif
+<span class="nt">&lt;/title&gt;</span>
</pre></div>

<h2>Easy blogging</h2>
<p>The second thing I needed to do was write a script for easily adding
a new post.  <a href="/newblog.py">newblog.py</a> was the result:</p>
<pre><code>justin@eee:~/projects/bbdotnet$ ./newblog.py
Title: Playing with blogofile
cats: tech,python,blogofile
</code></pre>
<p>This drops me into a vim session with the following contents:</p>
<pre><code>---
categories: tech,python,blogofile
date: 2011/04/17 20:34:08
title: Playing with blogofile
---
</code></pre>
<p>All I have to do when I'm done is 'git commit'.</p>
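<p>The linked script is not reproduced here. As a rough sketch of the part that builds the header shown above, assuming nothing about how the real newblog.py is written (the function name <code>front_matter</code> is hypothetical):</p>

```python
import datetime


def front_matter(title, categories, now=None):
    """Build the header block for a new post from a title and a comma separated category list."""
    now = now or datetime.datetime.now()
    cats = ",".join(c.strip() for c in categories.split(",") if c.strip())
    return "\n".join([
        "---",
        "categories: " + cats,
        "date: " + now.strftime("%Y/%m/%d %H:%M:%S"),
        "title: " + title,
        "---",
        "",
    ])
```

<p>A full script would prompt for the two values, write this text to a new file in the posts directory and start the editor on it.</p>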
<h2>Makefile</h2>
<p>Finally, I wrote a stupid simple <a href="/Makefile">Makefile</a> so that I can just kick
off a :make inside of vim.</p>
<pre><code>all: build

build:
    blogofile build
</code></pre>]]></content:encoded>
    </item>
    <item>
      <title>Shared HTTP Caching</title>
      <link>http://www.bouncybouncy.net/blog/2009/06/12/shared-http-caching</link>
      <pubDate>Fri, 12 Jun 2009 12:14:16 EDT</pubDate>
      <category><![CDATA[tech]]></category>
      <guid isPermaLink="true">http://www.bouncybouncy.net/blog/2009/06/12/shared-http-caching</guid>
      <description>Shared HTTP Caching</description>
      <content:encoded><![CDATA[<p>I've been wondering why the web doesn't have a mechanism for uniquely
identifying a resource by a means other than its URL.  I think if such a thing
existed, then HTTP caches for common files could be shared between sites.</p>
<p>There has been a <a href="http://www.google.com/search?q=google+host+jquery">push</a>
lately to let <a href="http://code.google.com/apis/ajaxlibs/">Google</a> host common JS
libraries for you.  The main reason for this is increased performance; there
are two cases where this helps:</p>
<ul>
<li>The user has never loaded jQuery before - They get to download it from fast servers</li>
<li>The user has visited another site that also hosted jQuery on Google - They don't have to download it at all.</li>
</ul>
<p>However, there are issues with this:</p>
<ul>
<li>This will not work on a restricted intranet</li>
<li>If the copy of jQuery on Google was somehow compromised, a large number of sites would be affected.</li>
<li>If Google is unreachable (it happens!), the site will fail to function properly</li>
</ul>
<p>There should be a way to include a checksum like so:</p>
<div class="pygments_colorful"><pre>&lt;script type=&quot;text/javascript&quot;
    src=&quot;/js/jquery-1.3.2.min.js&quot;
    sha1=&quot;3dc9f7c2642efff4482e68c9d9df874bf98f5bcb&quot;&gt;
&lt;/script&gt;
</pre></div>

<p>(sha1 usage here is just an example, a more secure method could easily be used instead)</p>
<p>This would have two benefits:</p>
<ul>
<li>If the copy of jQuery was maliciously modified, or simply corrupted, the browser would refuse to load it.</li>
<li>The browser may be able to use a cached copy of jQuery from another site with the same checksum.</li>
</ul>
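<p>Both benefits can be sketched in a few lines of Python. This is only an illustration of the idea: no browser works this way as far as this post is concerned, and the <code>ChecksumCache</code> class and its <code>download</code> callback are hypothetical.</p>

```python
import hashlib


class ChecksumCache:
    """Cache resources by content checksum instead of by URL."""

    def __init__(self):
        self._by_digest = {}  # sha1 hex digest -> response body (bytes)

    def fetch(self, url, expected_sha1, download):
        # A copy with the same checksum can be reused no matter which
        # site it was originally downloaded from.
        cached = self._by_digest.get(expected_sha1)
        if cached is not None:
            return cached
        body = download(url)
        actual = hashlib.sha1(body).hexdigest()
        if actual != expected_sha1:
            # Modified or corrupted: refuse to use it.
            raise ValueError("checksum mismatch for %s" % url)
        self._by_digest[expected_sha1] = body
        return body
```

<p>The URL only matters on a cache miss; after that, any page naming the same checksum gets the stored copy.</p>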
<p>This sort of fits in with one of the ideas in the <a href="http://video.google.com/videoplay?docid=-6972678839686672840">A New Way to look at Networking talk by Van Jacobson</a>.</p>]]></content:encoded>
    </item>
    <item>
      <title>Remap capslock to z</title>
      <link>http://www.bouncybouncy.net/blog/2009/04/16/remap-capslock-to-z</link>
      <pubDate>Thu, 16 Apr 2009 19:52:25 EDT</pubDate>
      <category><![CDATA[tech]]></category>
      <guid isPermaLink="true">http://www.bouncybouncy.net/blog/2009/04/16/remap-capslock-to-z</guid>
      <description>Remap capslock to z</description>
      <content:encoded><![CDATA[<p>My 'z' key has been (physically) broken for a while now.  Generally this isn't
a problem because there aren't that many places where I need to type a 'z'
that I can't autocomplete it.  Between tab completion in the shell, and the
irssi dictcomplete plugin, it hasn't bothered me that much.</p>
<p>I finally got around to figuring out how to remap Caps lock to 'z', the
magic lines to add to ~/.Xmodmap are</p>
<pre><code>remove Lock = Caps_Lock
keycode 66 = z
</code></pre>
<p>Most of the examples I found are for swapping capslock with control or
escape (which are mostly obsolete now that you can use the Keyboard prefs
thing in Gnome and swap keys around with a single click).  Remapping caps
lock to z is still too obscure to be in the nice GUI.</p>
<p>Now, if only two lines in a config file could fix the battery :-)</p>]]></content:encoded>
    </item>
    <item>
      <title>How My Dupe Finding Program Works</title>
      <link>http://www.bouncybouncy.net/blog/2008/02/21/how-my-dupe-finding-program-works</link>
      <pubDate>Thu, 21 Feb 2008 11:59:18 EST</pubDate>
      <category><![CDATA[python]]></category>
      <category><![CDATA[tech]]></category>
      <guid isPermaLink="true">http://www.bouncybouncy.net/blog/2008/02/21/how-my-dupe-finding-program-works</guid>
      <description>How My Dupe Finding Program Works</description>
      <content:encoded><![CDATA[<h2>finding duplicate files</h2>
<p>This post is about my duplicate finding program available under <a href="/programs">programs</a>.
The program is a little bare, and needs a nicer API, but the method it uses is
the most efficient one that I am aware of.</p>
<p>There are a couple of different ways you can find duplicate files:</p>
<h3>Compute the hash of all the files, and look for duplicates</h3>
<p>This method works well if the files on disk are mostly static, and files are
added infrequently.  In this case you can compute the hashes once, and keep them
around for later scans.  However, if you are only running the scan once, this
method is not ideal since it requires you to read the full contents of every
file.</p>
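<p>As a rough illustration of this first method (not code from my program), a
minimal sketch might look like the following. The function names and the choice
of SHA-1 are assumptions made for the example.</p>

```python
import hashlib
import os
from collections import defaultdict


def hash_file(path, blocksize=65536):
    """Return the SHA-1 hex digest of a file, reading it block by block."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()


def dupes_by_hash(root):
    """Hash every file under root and return the groups that share a digest."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Every file is read in full, which is the cost described above.
            by_hash[hash_file(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

<p>Saving the path-to-digest mapping between runs is what would make later
scans cheap; the sketch leaves that part out.</p>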
<h3>Compute the hash of files with the same size</h3>
<p>This is the method that I think fdupes still uses. It first builds a candidate
list of files that are the same size, and computes the checksum of each.  This
method works well if most of the files that are the same size are really
duplicates, but otherwise triggers too much unneeded IO.</p>
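<p>A sketch of the size-then-hash idea, under the same caveats: this is not
fdupes' actual code, the function names are made up for the example, and it
hashes whole files in one read to keep things short.</p>

```python
import hashlib
import os
from collections import defaultdict


def group_by_size(root):
    """Return lists of paths under root that share the same file size."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)
    return [paths for paths in by_size.values() if len(paths) > 1]


def dupes_by_size_then_hash(root):
    """Hash only the files whose size collides with another file."""
    dupes = []
    for candidates in group_by_size(root):
        by_hash = defaultdict(list)
        for path in candidates:
            with open(path, "rb") as f:
                # Reads the whole file even if it differs in the first byte.
                by_hash[hashlib.sha1(f.read()).hexdigest()].append(path)
        dupes.extend(sorted(p) for p in by_hash.values() if len(p) > 1)
    return dupes
```

<p>Files with a unique size are never opened, but every size collision still
costs a full read of each file involved.</p>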
<h3>Compare all files with the same size in parallel</h3>
<p>This is the method that my program uses.  Like fdupes, it first builds up a
candidate list of files with the same size. Instead of hashing the files,
it simply reads all of them at the same time, comparing block by block.
This is just like what the <em>cmp(1)</em> program does, but for multiple files at the
same time.  The benefit of this over calculating each file's hash is that
as soon as the files differ, you can stop reading.</p>
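<p>The early exit is the whole point, so here is a small sketch of just that
part. It is a simplified stand-in rather than the code in dupes.py: the
function name and the bytes-read counter are assumptions, and it expects at
least one path.</p>

```python
def all_identical(paths, blocksize=4096):
    """Compare same-size files block by block.

    Returns (identical, bytes_read) and stops at the first differing block.
    """
    files = [open(p, "rb") for p in paths]
    bytes_read = 0
    try:
        while True:
            blocks = [f.read(blocksize) for f in files]
            bytes_read += sum(len(b) for b in blocks)
            if any(b != blocks[0] for b in blocks[1:]):
                return False, bytes_read   # files differ, stop reading
            if not blocks[0]:
                return True, bytes_read    # every file reached EOF together
    finally:
        for f in files:
            f.close()
```

<p>Two large files that differ in their first block cost one block of IO each,
where a hash-based approach would have read both files to the end.</p>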
<h2>Implementation</h2>
<p>There are a couple of things you need to keep in mind to implement this method.</p>
<h3>Don't open too many files.</h3>
<p>You have to be careful not to try and open too many files at once.  If the user
has 5,000 files that all have the same size, the program shouldn't try and open
all 5,000 at once.  My program uses a simple helper class to handle opening and
closing files.  The default blocksize in my program would probably waste a bit
of memory in this case, but that is easily changed.</p>
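<p>One way such a helper could work (a sketch, not the class from my program;
<code>FilePool</code> and its methods are names invented here) is to keep a
bounded set of open handles and close the least recently used one when the
limit is reached:</p>

```python
from collections import OrderedDict


class FilePool:
    """Read blocks from many files while keeping at most max_open handles."""

    def __init__(self, max_open=100):
        self.max_open = max_open
        self._open = OrderedDict()   # path -> file, least recently used first

    def read_block(self, path, offset, blocksize):
        f = self._open.pop(path, None)
        if f is None:
            if len(self._open) >= self.max_open:
                # Pool is full: close the least recently used handle.
                _old_path, old_file = self._open.popitem(last=False)
                old_file.close()
            f = open(path, "rb")
        self._open[path] = f         # re-insert as most recently used
        f.seek(offset)
        return f.read(blocksize)

    def close(self):
        for f in self._open.values():
            f.close()
        self._open.clear()
```

<p>Because every read seeks to an explicit offset, a file that was closed to
make room can be reopened later without losing its place.</p>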
<h3>Correctly handle diverging sets.</h3>
<p>Imagine the filesystem contains 4 files of the same size, 'a', 'b', 'c', and 'd',
where a==c, and b==d.  While reading through the files, it will become clear
that a!=b, a==c, and a!=d.  It is important that at this step the program
continues searching using (a,c) and (b,d) as possible duplicates.  This is
implemented using recursion: the sets (a,c) and (b,d) are fed back into the
duplicate finding function.</p>
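<p>A sketch of that recursion, assuming the input is a list of paths that all
have the same size. The function name is hypothetical, and for brevity it
reopens each file for every block instead of using a helper class to manage
open files.</p>

```python
from collections import defaultdict


def find_dupes(paths, offset=0, blocksize=4096):
    """Return lists of identical files among same-size candidate paths."""
    while len(paths) > 1:
        # Bucket the files by the content of their next block.
        by_block = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                f.seek(offset)
                by_block[f.read(blocksize)].append(path)
        offset += blocksize
        if len(by_block) == 1:
            if b"" in by_block:
                return [sorted(paths)]   # all hit EOF together: duplicates
            continue                     # still identical, keep reading
        # The set diverged: feed each subset back into this function.
        groups = []
        for subset in by_block.values():
            groups.extend(find_dupes(subset, offset, blocksize))
        return groups
    return []                            # a lone file has nothing to match
```

<p>With the four files above, the first differing block splits the set into
(a,c) and (b,d), and each pair is then compared on its own from that offset
onward. The function only recurses when a set splits, so the recursion depth is
bounded by the number of files rather than by the file size.</p>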
<h2>Example run, compared to fdupes.</h2>
<p>Here is dupes.py running against fdupes on a modestly sized directory.
Notice how dupes.py only needs to read 600K (not counting metadata).</p>
<p>According to iofileb.d from the dtrace toolkit, dupes.py reads 10M of data (which
I think includes python), and fdupes reads 517M.  This alone explains the roughly 20x speedup
seen in dupes.py.</p>
<div class="pygments_colorful"><pre>justin@pip:~$ du -hs $DIR
15G   $DIR

justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168

real    0m1.224s
user    0m0.234s
sys     0m0.494s

justin@pip:~$ time fdupes -r $DIR
real    0m41.694s
user    0m13.612s
sys     0m7.491s

justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168

real    0m3.662s
user    0m0.256s
sys     0m0.568s

justin@pip:~$ time fdupes -r $DIR
real    0m55.473s
user    0m11.383s
sys     0m6.433s
</pre></div>]]></content:encoded>
    </item>
    <item>
      <title>Xen live migration without shared storage</title>
      <link>http://www.bouncybouncy.net/blog/2008/02/18/xen-live-migration-without-shared-storage</link>
      <pubDate>Mon, 18 Feb 2008 10:51:14 EST</pubDate>
      <category><![CDATA[tech]]></category>
      <guid isPermaLink="true">http://www.bouncybouncy.net/blog/2008/02/18/xen-live-migration-without-shared-storage</guid>
      <description>Xen live migration without shared storage</description>
      <content:encoded><![CDATA[<h2>The problem</h2>
<p>The <a href="http://www.cl.cam.ac.uk/research/srg/netos/xen/readmes/user/user.html#SECTION03520000000000000000">Xen documentation on live migration states</a>:</p>
<blockquote>
<p>Currently, there is no support for providing automatic remote access to
filesystems stored on local disk when a domain is migrated. Administrators
should choose an appropriate storage solution (i.e. SAN, NAS, etc.) to ensure
that domain filesystems are also available on their destination node. GNBD is a
good method for exporting a volume from one machine to another. iSCSI can do a
similar job, but is more complex to set up. </p>
</blockquote>
<p>This does not mean that it is impossible though. Live migration is just a more
efficient form of migration, and a migration can be seen as a save on one node
and a restore on another. Normally, if you save a VM on one machine, and try to
restore it on another machine, it will fail when it is unable to read its
filesystems. But what would happen if you copied the filesystem to the other
node between the save and restore? If done right, it works pretty well.</p>
<h2>The solution?</h2>
<p>The solution is simple:</p>
<ul>
<li>Save running image</li>
<li>Sync disks</li>
<li>Copy image to other node, restore</li>
</ul>
<p>This can be somewhat sped up by syncing the disks twice:</p>
<ul>
<li>Sync disks</li>
<li>Save running image</li>
<li>Sync disks again, only having to copy any changes from the last few seconds</li>
<li>Copy image to other node, restore</li>
</ul>
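<p>To make the ordering concrete, here is a sketch that only builds the list of
commands for the double-sync variant and runs nothing. The function name is
made up; the commands themselves are the ones that appear in the example
migration log in this post.</p>

```python
def migration_plan(domain, dst_host, devices):
    """Return the commands for a sync / save / resync / copy / restore run."""
    dump = domain + ".dump"
    sync = [["blocksync.py", dev, dst_host] for dev in devices]
    plan = []
    plan += sync                                  # first sync, VM still running
    plan.append(["xm", "save", domain, dump])     # downtime starts here
    plan += sync                                  # resync only recent changes
    plan.append(["scp", dump, dst_host + ":"])
    plan.append(["ssh", dst_host, "xm restore %s && rm %s" % (dump, dump)])
    plan.append(["rm", dump])                     # clean up the local dump
    return plan
```

<p>Each entry is an argument list, so something like
<code>subprocess.check_call</code> could run them in order and stop at the
first failure.</p>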
<h3>Synchronizing block devices</h3>
<h4>File backed</h4>
<p>If you are using plain files as vbds, you can sync the disks using rsync.</p>
<h4>Raw devices</h4>
<p>If you are using raw devices, rsync cannot be used. I wrote a small utility
called <a href="/programs/blocksync.py">blocksync</a> which can synchronize 2 block
devices over the network. In my testing it was easily able to max out the
network on an initial sync, and max out the disk read speed on a resync.</p>
<pre><code>$ blocksync.py /dev/xen/vm-root 1.2.3.4
</code></pre>
<p>This will sync /dev/xen/vm-root onto 1.2.3.4. The device should already exist on the destination and be the same size.</p>
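<p>The general idea behind a block-level sync can be sketched locally. This is
not blocksync.py itself: the function name is invented, both "devices" are
plain local paths, and it assumes the destination already exists with the same
size. Over a network, comparing hashes is what lets the two sides skip sending
blocks that already match.</p>

```python
import hashlib


def sync_blocks(src_path, dst_path, blocksize=1048576):
    """Overwrite only the blocks of dst that differ from src.

    Returns (same, diff) block counts.
    """
    same = diff = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        while True:
            src_block = src.read(blocksize)
            if not src_block:
                break
            offset = dst.tell()
            dst_block = dst.read(len(src_block))
            # Compare digests, as a remote peer would, not the raw blocks.
            if hashlib.sha1(src_block).digest() == hashlib.sha1(dst_block).digest():
                same += 1
            else:
                dst.seek(offset)
                dst.write(src_block)
                diff += 1
    return same, diff
```

<p>A second run right after the first should report every block as the same,
which is why the resync step in a migration only has to move whatever changed
in between.</p>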
<h4>Solaris ZFS</h4>
<p>If you are using ZFS, it should be possible to use <code>zfs send</code> to sync the block
devices before migration.  This would give an almost instantaneous sync time.</p>
<h2>Automation</h2>
<p>A simple script, <a href="/programs/xen_migrate.sh">xen_migrate.sh</a>, and its helper, <a href="/programs/xen_vbds.py">xen_vbds.py</a>, will migrate a domain to another host.
File and raw vbds are supported.  <code>zfs send</code> support is not yet implemented.</p>
<h3>Example migration</h3>
<div class="pygments_colorful"><pre>#migrating a 1G / + 128M swap over the network
#physical machines are 350MHz with 64M of RAM,
#total downtime is about 3 minutes

xen1:~# time ./migrate.sh test 192.168.1.2
+ &#39;[&#39; 2 -ne 2 &#39;]&#39;
+ DOMID=test
+ DSTHOST=192.168.1.2
++ xen_vbds.py test
+ FILES=/dev/xen/test-root
/dev/xen/test-swap
+ main
+ check_running
+ xm list test
Name              Id  Mem(MB)  CPU  State  Time(s)  Console
test              87       15    0  -b---      0.0    9687
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 942, diff: 82, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ save_image
+ xm save test test.dump
+ sync_disk
+ blocksync.py /dev/xen/test-root 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-root -b 1048576
same: 1019, diff: 5, 1024/1024
+ blocksync.py /dev/xen/test-swap 192.168.1.2
ssh -c blowfish 192.168.1.2 blocksync.py server /dev/xen/test-swap -b 1048576
same: 128, diff: 0, 128/128
+ copy_image
+ scp test.dump 192.168.1.2:
test.dump                                       100%   16MB   3.2MB/s   00:05
+ restore_image
+ ssh 192.168.1.2 &#39;xm restore test.dump &amp;&amp; rm test.dump&#39;
(domain
    (id 89)
    [domain info stuff cut out]
)
+ rm test.dump

real    6m6.272s
user    1m29.610s
sys     0m30.930s
</pre></div>]]></content:encoded>
    </item>
  </channel>
</rss>
