<?xml version="1.0"?>
<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:dcterms="http://purl.org/dc/terms/" >
<channel>
<title>Python - Justin&#x27;s Ramblings</title>
<link>http://bouncybouncy.net//ramblings/tags/python/</link>
<description>BB.Net</description>
<item>
	
	<title>json vs thrift and protocol buffers round 2</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/json_vs_thrift_and_protocol_buffers_round_2/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/json_vs_thrift_and_protocol_buffers_round_2/</link>
	
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Mon, 02 Mar 2009 21:54:57 -0500</pubDate>
	<dcterms:modified>2009-03-03T03:06:44Z</dcterms:modified>
	
	<description><![CDATA[<p>Following up on the <a href="http://bouncybouncy.net//ramblings/tags/python/../../posts/thrift_and_protocol_buffers/">previous</a> <a href="http://bouncybouncy.net//ramblings/tags/python/../../posts/more_on_json_vs_thrift_and_protocol_buffers/">posts</a>:
a few comments around the internet mentioned that my first tests
weren't very fair to thrift and protocol buffers because they were
mostly serializing strings. I gutted the test code and rewrote the
IDL files to use this structure:</p>
<div class="syntax">
<pre>
message DnsRecord {
  required fixed32 sip  = 1;
  required fixed32 dip  = 2;
  required uint32 sport = 3;
  required uint32 dport = 4;
}

</pre></div>
<p>Nothing fancy, basically the standard IPv4 4-tuple.</p>
<p>I also replaced the random record generation with this:</p>
<div class="syntax">
<pre>
<span class="synPreProc">import</span> random

<span class="synStatement">def</span> <span class="synIdentifier">get_random_records</span>(num=10000):
    data = []
    <span class="synStatement">for</span> x <span class="synStatement">in</span> xrange(num):
        data.append({
            <span class="synComment"># 192.168.1.0/24: the fixed 192.168.1 prefix plus a random last octet</span>
            '<span class="synConstant">sip</span>':     192*256**3+168*256**2+1*256+random.randrange(0,256),
            '<span class="synConstant">dip</span>':     random.randrange(1,256**4),
            '<span class="synConstant">sport</span>':   random.randrange(1024,2048),
            '<span class="synConstant">dport</span>':   random.choice([21,22,25,80,110,443])
        })
    <span class="synStatement">return</span> data

</pre></div>
<p>This will generate 10000 records with:</p>
<ul>
<li>a random source IP on the 192.168.1.0/24 network</li>
<li>a completely random destination IP</li>
<li>a source port from 1024 to 2047</li>
<li>a destination port chosen from six common ports.</li>
</ul>
<p>The raw size of this data using fixed-length 32-bit ints would be
10000*(4+4+4+4) = 160,000 bytes. The variable-length (varint) encoding
that protocol buffers uses should be able to save some space when
storing the smaller port numbers.</p>
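<p>To see where that saving comes from, here is a small sketch (Python 3) of the base-128 varint encoding that protocol buffers uses for <code>uint32</code> fields: each byte carries 7 bits of the value, and the high bit is set on every byte except the last. The <code>varint</code> and <code>record_size</code> helpers are written here for illustration and are not part of the protocol buffers library. The per-record estimate assumes each <code>DnsRecord</code> is stored as an element of a repeated field in an outer message, as <code>DnsResponse</code> was in the first post, which adds a one-byte tag and a one-byte length per record.</p>

```python
def varint(value):
    """Encode a non-negative int as a protocol buffers base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def record_size(sport, dport):
    """Estimated encoded size of one DnsRecord inside a repeated field."""
    body = (1 + 4) + (1 + 4)             # sip, dip: 1-byte tag + fixed32
    body += 1 + len(varint(sport))       # sport: 1-byte tag + varint
    body += 1 + len(varint(dport))       # dport: 1-byte tag + varint
    return 1 + len(varint(body)) + body  # outer tag + length prefix + body


for port in (21, 80, 110, 443, 1024, 2047):
    print(port, len(varint(port)))       # 1 byte below 128, 2 bytes up to 16383

print(record_size(1500, 80), record_size(1500, 443))
```

<p>If that layout is right, every record costs 17 bytes, plus one more when <code>dport</code> is 443, so the measured 171,650 bytes would correspond to 1,650 of the 10,000 records drawing port 443, close to the one in six you would expect from <code>random.choice</code>.</p>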
<p>Running the test code produces the following output:</p>
<div class="syntax">
<pre>
10000 total records (0.280s)

get_thrift          (0.060s)
get_pb              (0.950s)

ser_thrift          (0.560s)  370009 bytes
ser_pb              (4.850s)  171650 bytes
ser_json            (0.080s)  680680 bytes
ser_cjson           (0.120s)  680680 bytes
ser_yaml            (17.330s) 610680 bytes

ser_thrift_compressed (0.620s)  111326 bytes
ser_pb_compressed     (3.980s)   98571 bytes
ser_json_compressed   (0.110s)  124919 bytes
ser_cjson_compressed  (0.120s)  124919 bytes
ser_yaml_compressed   (17.160s) 121065 bytes

serde_thrift        (2.130s)
serde_pb            (7.550s)
serde_json          (0.130s)
serde_cjson         (0.110s)
serde_yaml          (56.740s)

</pre></div>
<p>These results show that protocol buffers and thrift do indeed
excel at serializing numeric values. The uncompressed output from
protocol buffers is considerably smaller than that of the other
serialization methods, with thrift ending up somewhere in the
middle. In fact, the protocol buffers output is barely larger than
the original data would be in compact binary form (171,650 vs
160,000 bytes). Since JSON and YAML serialize numbers as strings,
their output ends up being about four times bigger.</p>
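<p>However, once you add in compression, all this fancy extra work
to save space only modestly improves on JSON: the compressed protocol
buffers output is about 21% smaller than compressed JSON (98,571 vs
124,919 bytes), but takes roughly 36 times as long to produce
(3.98s vs 0.11s). The speed and simplicity of the JSON+zlib
approach cannot be ignored...</p>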
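<p>For comparison, the whole JSON+zlib approach fits in a few lines. This is a sketch using the standard library <code>json</code> and <code>zlib</code> modules (Python 3) rather than the simplejson and cjson modules from the test code; <code>ser_json_compressed</code> and <code>de_json_compressed</code> are illustrative names, and zlib's default compression level is assumed.</p>

```python
import json
import random
import zlib


def ser_json_compressed(records):
    """Serialize to a JSON string, then compress the UTF-8 bytes."""
    return zlib.compress(json.dumps(records).encode("utf-8"))


def de_json_compressed(blob):
    """Reverse of ser_json_compressed."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Same record layout as the generator above
records = [
    {"sip": 0xC0A80100 | random.randrange(256),    # 192.168.1.x
     "dip": random.randrange(1, 2**32),
     "sport": random.randrange(1024, 2048),
     "dport": random.choice([21, 22, 25, 80, 110, 443])}
    for _ in range(10000)
]

raw = json.dumps(records).encode("utf-8")
packed = ser_json_compressed(records)
print(len(raw), len(packed))
assert de_json_compressed(packed) == records       # lossless round trip
```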
<p>The protocol buffers speed issues are still there, but I'm sure
that over time things will improve. If the C extension for
simplejson can speed up serialization by an order of magnitude, I
have no doubt that similar improvements can be made to protocol
buffers and thrift.</p>
<p>If you want to run these tests for yourself, the code is
available from <a href="http://bouncybouncy.net//ramblings/tags/python/../../files/sertest2.tgz">sertest2.tgz</a></p>
<p>Another thing to try would be to give dport a default value of
80 and see how that affects serialization size and speed.</p>

]]></description>
	
</item>
<item>
	
	<title>more on json vs thrift and protocol buffers</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/more_on_json_vs_thrift_and_protocol_buffers/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/more_on_json_vs_thrift_and_protocol_buffers/</link>
	
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Sun, 01 Mar 2009 13:19:58 -0500</pubDate>
	<dcterms:modified>2009-03-01T18:49:44Z</dcterms:modified>
	
	<description><![CDATA[<p>Following up on the <a href="http://bouncybouncy.net//ramblings/tags/python/../../posts/thrift_and_protocol_buffers/">previous post</a>, a
couple of people pointed out to me that the cjson library is faster
than simplejson, and that the latest simplejson has a small C
extension.</p>
<p>Rerunning the tests with simplejson 2.0.9 and cjson yields the
following results:</p>
<div class="syntax">
<pre>
5000 total records (0.730s)

get_thrift          (0.040s)
get_pb              (0.620s)

ser_thrift          (0.550s) 555125 bytes
ser_pb              (2.980s) 415125 bytes
ser_json            (0.030s) 718455 bytes
ser_cjson           (0.040s) 718455 bytes
ser_yaml            (12.770s) 623455 bytes

ser_thrift_compressed (0.630s) 287621 bytes
ser_pb_compressed     (3.020s) 284441 bytes
ser_json_compressed   (0.090s) 293073 bytes
ser_cjson_compressed  (0.080s) 293073 bytes
ser_yaml_compressed   (13.260s) 291106 bytes

serde_thrift        (1.460s)
serde_pb            (5.250s)
serde_json          (0.070s)
serde_cjson         (0.060s)
serde_yaml          (44.110s)

</pre></div>
<p>There doesn't seem to be much doubt about it: if you need to
serialize basic Python data structures and don't need the extra
features of thrift or protocol buffers, it is hard to beat
JSON.</p>
<p>The test code at <a href="http://bouncybouncy.net//ramblings/tags/python/../../files/sertest.tgz">sertest.tgz</a> has also been updated to
use time.clock instead of time.time.</p>
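<p>On Unix, <code>time.clock</code> returned processor time rather than wall-clock time, so other activity on the machine skews the numbers less. <code>time.clock</code> was removed in Python 3.8; on a modern Python, a minimal sketch of an equivalent timing helper could use <code>time.process_time</code> instead. The <code>timed</code> helper and its output format, which imitates the tables above, are illustrative and not taken from the test code.</p>

```python
import json
import time


def timed(label, func, *args):
    """Run func(*args), print its CPU time in the tables' style, return the result."""
    start = time.process_time()    # CPU time of this process, like time.clock on Unix
    result = func(*args)
    elapsed = time.process_time() - start
    # Only report a size when the result is a serialized string
    size = " %d bytes" % len(result) if isinstance(result, (str, bytes)) else ""
    print("%-19s (%.3fs)%s" % (label, elapsed, size))
    return result


data = [{"key": "example.com", "value": "192.0.2.%d" % (i % 256)} for i in range(5000)]
out = timed("ser_json", json.dumps, data)
timed("serde_json", lambda d: json.loads(json.dumps(d)), data)
```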

]]></description>
	
</item>
<item>
	
	<title>thrift and protocol buffers</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/thrift_and_protocol_buffers/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/thrift_and_protocol_buffers/</link>
	
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Sat, 28 Feb 2009 11:53:08 -0500</pubDate>
	<dcterms:modified>2009-02-28T18:08:47Z</dcterms:modified>
	
	<description><![CDATA[<p>I've been experimenting with thrift and protocol buffers
recently. For the most part when I need to serialize something I've
been using JSON or compressed JSON. Thrift and protocol buffers
have a couple of advantages, and are also supposedly faster and
produce smaller output.</p>
<p>The test I've been using is a simple list of hashes, nothing too
complicated. Here is the protocol buffers file; the thrift file is
pretty much the same thing.</p>
<div class="syntax">
<pre>
package passive_dns;

message DnsRecord {
  required string key = 1;
  required string value = 2;
  required string first = 3;
  required string last = 4;
  optional string type = 5 [default = "A"];
  optional int32  ttl = 6 [default = 86400];
}

message DnsResponse {
  repeated DnsRecord records = 1;
}

</pre></div>
<p>Optional fields with default values are one of the benefits of both
serialization libraries. An optional field whose value matches the
default can simply be left unset, and then it is not included in the
serialized output.</p>
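<p>The same trick can be borrowed for JSON: drop any key whose value equals its default before serializing, and fill the defaults back in after loading. A minimal sketch (Python 3), with a hypothetical <code>DEFAULTS</code> table mirroring the <code>type</code> and <code>ttl</code> defaults in the .proto file above; the helper names are illustrative.</p>

```python
import json

# Hypothetical defaults, mirroring the optional fields in the .proto file
DEFAULTS = {"type": "A", "ttl": 86400}


def strip_defaults(record):
    """Return a copy of record without keys that equal their default."""
    return {k: v for k, v in record.items()
            if not (k in DEFAULTS and DEFAULTS[k] == v)}


def fill_defaults(record):
    """Inverse of strip_defaults: restore any missing default values."""
    full = dict(DEFAULTS)
    full.update(record)
    return full


rec = {"key": "example.com", "value": "192.0.2.1",
       "first": "2009-02-01", "last": "2009-02-28", "type": "A", "ttl": 86400}
packed = json.dumps(strip_defaults(rec))
print(packed)
assert fill_defaults(json.loads(packed)) == rec
```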
<p>I wrote up a simple test program to compare thrift, protocol
buffers, json, and compressed json for size and speed. The results,
at least for the type of data I use, are very interesting:</p>
<div class="syntax">
<pre>
5000 total records (0.745s)

get_thrift          (0.044s)
get_pb              (0.608s)

ser_thrift          (0.474s) 554953 bytes
ser_pb              (3.087s) 414862 bytes
ser_json            (0.273s) 718191 bytes
ser_yaml            (13.121s) 623191 bytes

ser_thrift_compressed (0.545s) 287617 bytes
ser_pb_compressed     (3.150s) 284297 bytes
ser_json_compressed   (0.326s) 292904 bytes
ser_yaml_compressed   (13.665s) 290993 bytes

serde_thrift        (1.289s)
serde_pb            (5.411s)
serde_json          (1.474s)
serde_yaml          (45.637s)

</pre></div>
<p>EDIT: Updated to include yaml results</p>
<p>The get_* functions are the times needed to convert the Python
data structure into the classes that the library needs.</p>
<p>The ser_* functions are the times needed to get and serialize
the python data structure to a string.</p>
<p>The ser_*_compressed functions are the times needed to get,
serialize, and compress the python data structure.</p>
<p>The serde_* functions are the times needed to get, serialize,
and de-serialize the python data structure to and from a
string.</p>
<p>The results show that serializing to compressed JSON is both
smaller and faster than plain thrift (292,904 bytes in 0.326s vs
554,953 bytes in 0.474s), and serializing+de-serializing is only
slightly slower (1.474s vs 1.289s). If I converted the Python data
to (header, rows), like a CSV file, rather than a flat list of dicts,
the JSON output would be smaller, and likely faster to serialize.</p>
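<p>The (header, rows) idea might look something like this sketch (Python 3): the dictionary keys are written once as a header, and each record becomes a plain list of values. The function names are illustrative, and the sketch assumes every record has the same keys.</p>

```python
import json


def to_table(records):
    """Convert a list of dicts (all with the same keys) to (header, rows)."""
    header = sorted(records[0]) if records else []
    rows = [[rec[name] for name in header] for rec in records]
    return header, rows


def from_table(header, rows):
    """Rebuild the list of dicts from (header, rows)."""
    return [dict(zip(header, row)) for row in rows]


records = [
    {"key": "example.com", "value": "192.0.2.%d" % i,
     "first": "2009-02-01", "last": "2009-02-28"}
    for i in range(100)
]
header, rows = to_table(records)
flat = json.dumps(records)
tabular = json.dumps([header, rows])
print(len(flat), len(tabular))      # the tabular form never repeats the keys
assert from_table(*json.loads(tabular)) == records
```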
<p>The totally unexpected result was that protocol buffers clocked
in at over 4 times slower than thrift. I find it hard to believe
that protocol buffers could be that slow, so I will have to run
some more tests to make sure that I am using the library
correctly.</p>
<p>If you want to run my tests for yourself, the code is available
from <a href="http://bouncybouncy.net//ramblings/tags/python/../../files/sertest.tgz">sertest.tgz</a></p>

]]></description>
	
</item>
<item>
	
	<title>Python Evolution: From Script To Program</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/python_evolution_from_script_to_program/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/python_evolution_from_script_to_program/</link>
	
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Sat, 21 Jun 2008 23:18:12 -0400</pubDate>
	<dcterms:modified>2008-06-22T15:12:26Z</dcterms:modified>
	
	<description><![CDATA[<p><a href=
"http://forums.thedailywtf.com/forums/p/6978/132159.aspx">The
Evolution of a Python Programmer</a> is funny, but it only covers
one aspect of programming. Many times I will see code that is fine
from a CS point of view, but absolutely horrible when it comes to
program structure and module organization.</p>
<p>You often see people saying things like "Hello World in python
is just 'print "Hello World"'", and that is true. It is very easy
to get started writing python, but if you don't structure your
modules correctly, you will be in a world of pain later on. It is
something that can be hard to explain, since the results in the
short term are the same, and it may not be clear at first why one
way of doing things is better than the other.</p>
<p>Instead of Hello World, let's take the example of a program to
get stock quotes. The actual implementation here is not relevant,
pretend it contacts a web service or database or something.</p>
<p>A common case is the "python script". I HATE python scripts. A
"script" almost always ends up being a single file with no entry
points, no main function, and IO mixed in with the logic.</p>
<div class="syntax">
<pre>
s = raw_input("<span class="synConstant">symbol:</span>")
<span class="synStatement">if</span> s == '<span class=
"synConstant">MSFT</span>':
    <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 28.23
<span class="synStatement">elif</span> s == '<span class=
"synConstant">GOOG</span>':
    <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 546.43

</pre></div>
<p>The first step in fixing this is to define an actual function.
Now you can import the module and run get_price().</p>
<div class="syntax">
<pre>
<span class="synStatement">def</span> <span class=
"synIdentifier">get_price</span>():
    s = raw_input("<span class="synConstant">symbol:</span>")
    <span class="synStatement">if</span> s == '<span class=
"synConstant">MSFT</span>':
        <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 28.23
    <span class="synStatement">elif</span> s == '<span class=
"synConstant">GOOG</span>':
        <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 546.43

</pre></div>
<p>The (hopefully) obvious problem with this is that the IO is
mixed in with the logic. What if you wanted to get the stock price
for 1000 stocks and output a nice summary? This next version is
slightly better: the input is now a proper parameter, but you
still have no control over the output. You could get your 1000
quotes, but you would have no way to report on them. Again,
this should be obvious, but I come across code that does this way
too often.</p>
<div class="syntax">
<pre>
<span class="synStatement">def</span> <span class=
"synIdentifier">get_price</span>(s):
    <span class="synStatement">if</span> s == '<span class=
"synConstant">MSFT</span>':
        <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 28.23
    <span class="synStatement">elif</span> s == '<span class=
"synConstant">GOOG</span>':
        <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', 546.43
<span class="synComment">###</span>
<span class="synStatement">if</span> __name__ == "<span class=
"synConstant">__main__</span>":
    s = raw_input("<span class="synConstant">symbol:</span>")
    get_price(s)

</pre></div>
<p>The first respectable version adds a main() function that
handles the input and output. The main function should also get the
stock from the command line arguments, rather than interactively. I
think you tend to see things like this more often from Windows
users, who like to double click on things rather than run them from
a shell. You could probably write a whole book on this subject
though <img src="http://bouncybouncy.net//ramblings/tags/python/../../../smileys/smile.png" alt=":-)" /></p>
<div class="syntax">
<pre>
<span class="synStatement">def</span> <span class=
"synIdentifier">get_price</span>(s):
    <span class="synStatement">if</span> s == '<span class=
"synConstant">MSFT</span>':
        <span class="synStatement">return</span> 28.23
    <span class="synStatement">elif</span> s == '<span class=
"synConstant">GOOG</span>':
        <span class="synStatement">return</span> 546.43
<span class="synComment">###</span>
<span class="synStatement">def</span> <span class=
"synIdentifier">main</span>():
    s = raw_input("<span class="synConstant">symbol:</span>")
    <span class="synStatement">print</span> '<span class=
"synConstant">price=</span>', get_price(s)

<span class="synStatement">if</span> __name__ == "<span class=
"synConstant">__main__</span>":
    main()

</pre></div>
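<p>Following that advice one step further, here is a sketch where <code>main()</code> takes the symbols from the command line instead of prompting, and prints one line per symbol, which also covers the 1000-stocks case from earlier. It is written for Python 3 (so <code>print</code> is a function); the dictionary lookup standing in for the real quote service and the hypothetical <code>format_quotes</code> helper are assumptions. As in the version above, <code>get_price</code> returns <code>None</code> for unknown symbols.</p>

```python
import sys


def get_price(symbol):
    """Pure logic: no input, no output, just a return value."""
    prices = {"MSFT": 28.23, "GOOG": 546.43}  # stand-in for a real lookup
    return prices.get(symbol)


def format_quotes(symbols):
    """Build the report lines for a list of symbols (still no IO)."""
    lines = []
    for symbol in symbols:
        price = get_price(symbol)
        if price is None:
            lines.append("%-6s unknown" % symbol)
        else:
            lines.append("%-6s price=%.2f" % (symbol, price))
    return lines


def main(argv=None):
    """All IO lives here: symbols come from the command line, not a prompt."""
    symbols = sys.argv[1:] if argv is None else argv
    for line in format_quotes(symbols):
        print(line)


if __name__ == "__main__":
    main()
```

<p>Because <code>get_price</code> and <code>format_quotes</code> never touch stdin or stdout, they can be imported and tested directly, and <code>main(["MSFT", "GOOG"])</code> works the same as running the program with those two arguments.</p>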
<p>The final steps are to make a proper python package out of this
module, but I'll save that for a later post.</p>

]]></description>
	
</item>
<item>
	
	<title>how my dupe finding program works</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/how_my_dupe_finding_program_works/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/how_my_dupe_finding_program_works/</link>
	
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Thu, 21 Feb 2008 23:41:03 -0500</pubDate>
	<dcterms:modified>2008-02-22T04:59:18Z</dcterms:modified>
	
	<description><![CDATA[<h2>finding duplicate files</h2>
<p>This post is about my duplicate finding program available under
<a href="http://bouncybouncy.net//ramblings/tags/python/../../../programs/">Programs</a>. The program is a little
bare, and needs a nicer API, but the method it uses is the most
efficient one that I am aware of.</p>
<p>There are a couple of different ways you can find duplicate
files:</p>
<h3>Compute the hash of all the files, and look for duplicates</h3>
<p>This method works well if the files on disk are mostly static,
and files are added infrequently. In this case you can compute the
hashes once, and keep them around for later scans. However, if you
are only running the scan once, this method is not ideal since it
requires you to read the full contents of every file.</p>
<h3>Compute the hash of files with the same size</h3>
<p>This is the method that I think fdupes still uses. It first
builds a candidate list of files that are the same size, and
computes the checksum of each. This method works well if most of
the files that are the same size are really duplicates, but
otherwise triggers too much unneeded IO.</p>
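<p>Both this method and the next one start from the same candidate list. A minimal sketch (Python 3) of building it with <code>os.walk</code>, grouping regular files by size and keeping only sizes shared by more than one file; the function name is illustrative, and skipping symlinks is an assumption.</p>

```python
import os
import tempfile
from collections import defaultdict


def size_collisions(top):
    """Map file size -> list of paths, keeping only sizes seen more than once."""
    by_size = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, fifos, and similar
            by_size[os.path.getsize(path)].append(path)
    return {size: paths for size, paths in by_size.items() if len(paths) > 1}


# Usage: two 2-byte files collide, the 3-byte file does not
with tempfile.TemporaryDirectory() as demo:
    for name, data in [("a", b"xx"), ("b", b"yy"), ("c", b"zzz")]:
        with open(os.path.join(demo, name), "wb") as f:
            f.write(data)
    collisions = size_collisions(demo)
print(collisions)
```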
<h3>Compare all files with the same size in parallel</h3>
<p>This is the method that my program uses. Like fdupes, it first
builds up a candidate list of files with the same size. Instead of
hashing the files, it simply reads each file at the same time,
comparing block by block. This is just like what the
<em>cmp(1)</em> program does, but for multiple files at the same
time. The benefit of this over hashing each file is that
as soon as the files differ, you can stop reading.</p>
<h2>Implementation</h2>
<p>There are a couple of things you need to keep in mind to
implement this method.</p>
<h3>Don't open too many files.</h3>
<p>You have to be careful not to try to open too many files at
once. If the user has 5,000 files that all have the same size, the
program shouldn't try to open all 5,000 at once. My program uses a
simple helper class to handle opening and closing files. The
default blocksize in my program would probably waste a bit of
memory in this case, but that is easily changed.</p>
<h3>Correctly handle diverging sets.</h3>
<p>Imagine the filesystem contains 4 files of the same size, 'a',
'b', 'c', and 'd', where a==c and b==d. While reading through the
files, it will become clear that a!=b, a==c, and a!=d. It is
important that at this step the program continues searching using
(a,c) and (b,d) as possible duplicates. This is implemented using
recursion: the sets (a,c) and (b,d) are fed back into the duplicate
finding function.</p>
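<p>A minimal sketch (Python 3) of the parallel comparison for one group of same-size files, including the diverging-set case: read the next block of every file in the group, split the group by block contents, and keep going with each subgroup that still has more than one member. The function names and default block size are illustrative, not taken from dupes.py. It reopens each file for every block so that only one file is open at a time, a simple (if not the fastest) way to respect the open-file limit, at the cost of holding one block per file in memory. It also uses an explicit work list instead of recursion, so very large files cannot hit Python's recursion limit.</p>

```python
import os
import tempfile
from collections import defaultdict


def read_block(path, offset, blocksize):
    """Open, seek, read, close: keeps at most one file open at a time."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(blocksize)


def find_dupes(paths, blocksize=64 * 1024):
    """Split a group of same-size files into sets of identical files."""
    dupes = []
    work = [(list(paths), 0)]               # (candidate group, byte offset)
    while work:
        group, offset = work.pop()
        by_block = defaultdict(list)
        for path in group:
            by_block[read_block(path, offset, blocksize)].append(path)
        for block, subgroup in by_block.items():
            if len(subgroup) < 2:
                continue                    # unique so far: stop reading it
            if not block:
                dupes.append(sorted(subgroup))  # reached EOF together: duplicates
            else:
                work.append((subgroup, offset + len(block)))  # diverging set
    return dupes


# Usage: a==c and b==d, e matches nothing; all five files have the same size
with tempfile.TemporaryDirectory() as demo:
    contents = {"a": b"x" * 100 + b"1", "b": b"x" * 100 + b"2",
                "c": b"x" * 100 + b"1", "d": b"x" * 100 + b"2",
                "e": b"x" * 100 + b"3"}
    for name, data in contents.items():
        with open(os.path.join(demo, name), "wb") as f:
            f.write(data)
    paths = [os.path.join(demo, n) for n in sorted(contents)]
    groups = find_dupes(paths, blocksize=16)
    names = sorted([os.path.basename(p) for p in g] for g in groups)
print(names)  # [['a', 'c'], ['b', 'd']]
```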
<h2>Example run, compared to fdupes.</h2>
<p>Here is dupes.py running against fdupes on a modestly sized
directory. Notice how dupes.py only needs to read 647,168 bytes (not
counting metadata).</p>
<p>According to iofileb.d from the DTrace toolkit, dupes.py reads
10M of data (which I think includes the Python interpreter itself), and fdupes reads 517M.
This alone explains the roughly 15x to 35x speedup dupes.py shows in the runs below.</p>
<div class="syntax">
<pre>
justin@pip:~$ du -hs $DIR
15G   $DIR

justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168

real    0m1.224s
user    0m0.234s
sys     0m0.494s

justin@pip:~$ time fdupes -r $DIR
real    0m41.694s
user    0m13.612s
sys     0m7.491s

justin@pip:~$ time python code/dupes.py $DIR
2896 total files
35 size collisions, max of length 5
bytes read 647168

real    0m3.662s
user    0m0.256s
sys     0m0.568s

justin@pip:~$ time fdupes -r $DIR
real    0m55.473s
user    0m11.383s
sys     0m6.433s

</pre></div>

]]></description>
	
</item>
<item>
	
	<title>regex with named groups</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/regex_with_named_groups/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/regex_with_named_groups/</link>
	
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Wed, 20 Feb 2008 11:42:21 -0500</pubDate>
	<dcterms:modified>2008-02-20T17:00:38Z</dcterms:modified>
	
	<description><![CDATA[<p>As I mentioned in a comment at <a href=
"http://handyfloss.wordpress.com/2008/02/19/some-more-tweaks-to-my-python-script/">
Some more tweaks to my Python script</a>, there are a lot of ways
you can use the re module. If you need to match multiple
expressions against each line, you can build up a single regular
expression that includes all the patterns, and use named groups to
tell them apart.</p>
<div class="syntax">
<pre>

<span class="synPreProc">import</span> re
<span class=
"synComment">#if you were matching many of these it would be a good idea</span>
<span class=
"synComment">#to make a function that simply fills in '%s&gt;(?P&lt;%s&gt;[^&lt;]+)&lt;'</span>
cpattern    = '<span class=
"synConstant">total_credit&gt;(?P&lt;credit&gt;[^&lt;]+)&lt;</span>'
opattern    = '<span class=
"synConstant">os_name&gt;(?P&lt;os&gt;[^&lt;]+)&lt;</span>'
pattern     = '<span class=
"synConstant">(%s)|(%s)</span>' % (cpattern, opattern)

search = re.compile(pattern).search

lines = [
    '<span class=
"synConstant">blah blah blah total_credit&gt;10&lt; blah blah</span>',
    '<span class=
"synConstant">hkfhsd klfjhs dfkljsdfsl fds</span>',
    '<span class=
"synConstant">hkashflksd os_name&gt;win&lt; hhkjhdflksj d</span>',
    '<span class=
"synConstant">hkfhsd klfjhs dfkljsdfsl fds</span>',
    '<span class=
"synConstant">blah blah blah total_credit&gt;20&lt; blah blah</span>',
]

<span class="synStatement">for</span> line <span class=
"synStatement">in</span> lines:
    r = search(line)
    <span class="synStatement">if</span> r:
        <span class="synStatement">print</span> r.groupdict()

</pre></div>
<p>Running this gives</p>
<div class="syntax">
<pre>
{'<span class="synConstant">credit</span>': '<span class=
"synConstant">10</span>', '<span class=
"synConstant">os</span>': None}
{'<span class="synConstant">credit</span>': None, '<span class=
"synConstant">os</span>': '<span class="synConstant">win</span>'}
{'<span class="synConstant">credit</span>': '<span class=
"synConstant">20</span>', '<span class=
"synConstant">os</span>': None}

</pre></div>
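<p>The helper suggested in the comment might look like this sketch (Python 3; <code>build_search</code> is an illustrative name): it takes a mapping of tag name to group name and joins one alternative per tag. <code>re.escape</code> is applied to each tag as a precaution that the original hand-written patterns did not need.</p>

```python
import re


def build_search(fields):
    """fields maps tag name -> group name, e.g. {"total_credit": "credit"}."""
    parts = ["(%s>(?P<%s>[^<]+)<)" % (re.escape(tag), group)
             for tag, group in fields.items()]
    return re.compile("|".join(parts)).search


search = build_search({"total_credit": "credit", "os_name": "os"})
for line in ["blah total_credit>10< blah", "nothing here", "x os_name>win< y"]:
    r = search(line)
    if r:
        print(r.groupdict())
```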
<p>In this case you could even generalize the regular expression
further, like so:</p>
<div class="syntax">
<pre>
pattern     = '<span class=
"synConstant">\s(?P&lt;key&gt;[^\s&gt;]+)&gt;(?P&lt;value&gt;[^&lt;]+)&lt;</span>'

</pre></div>
<p>Running that (probably less than optimal) regular expression
over the input gives</p>
<div class="syntax">
<pre>
{'<span class="synConstant">key</span>': '<span class=
"synConstant">total_credit</span>', '<span class=
"synConstant">value</span>': '<span class="synConstant">10</span>'}
{'<span class="synConstant">key</span>': '<span class=
"synConstant">os_name</span>', '<span class=
"synConstant">value</span>': '<span class=
"synConstant">win</span>'}
{'<span class="synConstant">key</span>': '<span class=
"synConstant">total_credit</span>', '<span class=
"synConstant">value</span>': '<span class="synConstant">20</span>'}

</pre></div>
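<p>Since <code>search</code> only returns the first match, a line containing several such pairs would lose all but one of them. A sketch (Python 3) using <code>re.finditer</code> with the same generalized pattern collects every pair on a line into a dictionary; the sample line is made up for illustration.</p>

```python
import re

pattern = r"\s(?P<key>[^\s>]+)>(?P<value>[^<]+)<"

line = "blah total_credit>10< blah os_name>win< blah"
pairs = {m.group("key"): m.group("value") for m in re.finditer(pattern, line)}
print(pairs)  # {'total_credit': '10', 'os_name': 'win'}
```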

]]></description>
	
</item>
<item>
	
	<title>dynamic ikiwiki pages</title>
	
	<guid>http://bouncybouncy.net//ramblings/posts/dynamic_ikiwiki_pages/</guid>
	<link>http://bouncybouncy.net//ramblings/posts/dynamic_ikiwiki_pages/</link>
	
	
	<category>tags/ikiwiki</category>
	
	<category>tags/meta</category>
	
	<category>tags/pylons</category>
	
	<category>tags/python</category>
	
	<category>tags/tech</category>
	
	
	<pubDate>Fri, 15 Feb 2008 20:57:58 -0500</pubDate>
	<dcterms:modified>2008-02-16T18:36:26Z</dcterms:modified>
	
	<description><![CDATA[<p>The static pages that <a href="http://ikiwiki.info">ikiwiki</a>
generates are great, but I want to have some dynamic content here
as well.</p>
<p>If this works, this page should include the server's uptime.</p>
<!--# include virtual="/dyn/demo/uptime" -->
<p>yay <img src="http://bouncybouncy.net//ramblings/tags/python/../../../smileys/smile.png" alt=":-)" /></p>
<p>So how does that work?</p>
<p>First, configure nginx as follows:</p>
<div class="syntax">
<pre>
server {
    listen       80;
    server_name  bouncybouncy.net  *.bouncybouncy.net web;

    location / {
        root   /home/justin/bbdotnet/static/;
        index  index.html index.htm;
        ssi on;
    }
    location /dyn {
        # All POST requests go to pylons directly
        include /usr/local/nginx/conf/proxy.conf;
        proxy_redirect  default; 
        if ($request_method = POST) {
            proxy_pass  http://127.0.0.1:5000;
            break;
        }
        default_type text/html; 

        # include the query string so the key matches the url stored by Pylons
        set $memcached_key "$uri$is_args$args";
        memcached_pass localhost:11211;

        proxy_intercept_errors  on;

        # If the key is not found in memcache, or memcache is down, go to the real dynamic location
        error_page 404 502 = @dynamic_request;
    }
    location @dynamic_request{
        # This means, that we can't get to this location from outside - only by internal redirect
        internal;

        include /usr/local/nginx/conf/proxy.conf;
        proxy_redirect  default; 
        proxy_pass  http://127.0.0.1:5000;
    }

}

</pre></div>
<p>Pylons is set up to run on port 5000 as usual, nothing fancy
there.</p>
<p>Then, anywhere we want some dynamic content, we can simply use:</p>
<div class="syntax">
<pre>
&lt;!--# include virtual="/dyn/demo/uptime" --&gt;

</pre></div>
<p>For now, you have to disable the htmlscrubber plugin for this to
work. There is probably a better solution. I think this would
simply involve a plugin that could run after htmlscrubber to insert
the include, then you would only need to have something like
[[include virtual="/dyn/demo/uptime"]] in your pages.</p>
<p>If you did not mind requiring JavaScript, you could use <a href=
"http://www.mnot.net/javascript/hinclude/">HInclude</a> instead of
SSI.</p>
<p>To keep things running fast, we enable caching on the Pylons
controller using a modified version of the <code>beaker_cache</code>
decorator. The following lines are inserted at the end of the
<code>create_func</code> method, which causes the page result to be cached
in memcache as well as in beaker.</p>
<div class="syntax">
<pre>
url = pylons.request.path_info
<span class="synStatement">if</span> pylons.request.params:
    url += "<span class=
"synConstant">?</span>" + pylons.request.environ['<span class=
"synConstant">QUERY_STRING</span>']

mc = memcache.Client(['<span class=
"synConstant">localhost</span>'])
mc.set(url, result, cache_expire)

</pre></div>
<p>The only remaining problem I see is a small race condition. If
the cache expires, and 20 concurrent requests all come in for the
page, most of them will end up hitting python instead of waiting
for the memcache key to appear. This might actually work better
using varnish or apache2 with <code>mod_disk_cache</code>, but the
last time I tried I could not get varnish to work at all, and
apache2 (I think) still does not support PURGE.</p>
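<p>One common way to narrow that race is a short-lived lock key: the first request to see a miss uses memcache's <code>add</code> operation (which only succeeds if the key does not already exist) to claim the right to regenerate the page, while the others wait briefly for the real key to appear. This is a sketch under assumptions, not what this site runs: <code>DictCache</code> is a hypothetical in-process stand-in exposing the same <code>get</code>/<code>set</code>/<code>add</code>/<code>delete</code> methods as the python-memcached client (it ignores expiry), and <code>get_or_render</code>, the lock timeout, and the polling interval are made up for illustration.</p>

```python
import time


class DictCache:
    """Hypothetical in-process stand-in for memcache.Client (expiry ignored)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, exptime=0):
        self.data[key] = value
        return True

    def add(self, key, value, exptime=0):
        if key in self.data:
            return False          # like memcache: add fails if the key exists
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)
        return True


def get_or_render(mc, url, render, expire=60, lock_time=10, wait=2.0):
    """Serve url from cache; on a miss, let only one caller regenerate it."""
    page = mc.get(url)
    if page is not None:
        return page
    # lock_time lets a real memcache drop the lock if the renderer dies
    if mc.add(url + ":lock", 1, lock_time):
        try:
            page = render()
            mc.set(url, page, expire)
        finally:
            mc.delete(url + ":lock")
        return page
    deadline = time.time() + wait              # someone else is rendering
    while time.time() < deadline:
        time.sleep(0.05)
        page = mc.get(url)
        if page is not None:
            return page
    return render()                            # gave up waiting; render anyway


calls = []


def render_uptime():
    calls.append(1)
    return "up 3 days"


mc = DictCache()
first = get_or_render(mc, "/dyn/demo/uptime", render_uptime)
second = get_or_render(mc, "/dyn/demo/uptime", render_uptime)
print(first, second, len(calls))  # up 3 days up 3 days 1
```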

]]></description>
	
</item>

</channel>
</rss>
