This feed contains pages in the "tech" category.
shared http caching
Posted Fri Jun 12 12:12:45 2009
I've been wondering why the web doesn't have a mechanism for uniquely identifying a resource by a means other than its URL. I think if such a thing existed, then HTTP caches for common files could be shared between sites.
There has been a push lately to let Google host common JS libraries for you. The main reason is increased performance, and there are two cases where this helps:
- The user has never loaded jQuery before - They get to download it from fast servers
- The user has visited another site that also loaded jQuery from Google - They don't have to download it at all.
However, there are issues with this:
- This will not work on a restricted intranet
- If the copy of jQuery on Google was somehow compromised, a large number of sites would be affected.
- If Google is unreachable (it happens!), the site will fail to function properly
There should be a way to include a checksum like so:
<script type="text/javascript"
src="/js/jquery-1.3.2.min.js"
sha1="3dc9f7c2642efff4482e68c9d9df874bf98f5bcb">
</script>
(sha1 usage here is just an example; a more secure hash could easily be used instead)
This would have two benefits:
- If the copy of jQuery was maliciously modified, or simply corrupted, the browser would refuse to load it.
- The browser may be able to use a cached copy of jQuery from another site with the same checksum.
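The two benefits above can be sketched as a cache keyed by checksum rather than by URL. This is only an illustration in Python, not how any browser works: the `ChecksumCache` class and its methods are hypothetical names, and SHA-1 is used only because the example above uses it.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the hex SHA-1 digest of a resource body."""
    return hashlib.sha1(data).hexdigest()


class ChecksumCache:
    """Hypothetical cache that stores resource bodies by checksum, not URL."""

    def __init__(self):
        self._by_digest = {}

    def load(self, expected_digest, fetch):
        """Return the body matching expected_digest.

        fetch is a zero-argument callable that downloads the resource. It is
        only called on a cache miss, so a copy cached on behalf of one site
        can be reused by any other site that asks for the same digest.
        A downloaded body whose digest does not match is rejected.
        """
        cached = self._by_digest.get(expected_digest)
        if cached is not None:
            return cached
        body = fetch()
        if sha1_hex(body) != expected_digest:
            raise ValueError("checksum mismatch, refusing to load resource")
        self._by_digest[expected_digest] = body
        return body
```

The second site to request the same digest never triggers a download, and a modified or corrupted body raises instead of being returned.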
This sort of fits in with one of the ideas in the A New Way to look at Networking talk by Van Jacobson.
remap capslock to z
Posted Thu Apr 16 19:52:25 2009
My 'z' key has been (physically) broken for a while now. Generally this isn't a problem because there aren't that many places where I need to type a 'z' and can't autocomplete it. Between tab completion in the shell and the irssi dictcomplete plugin, it hasn't bothered me that much.
I finally got around to figuring out how to remap Caps Lock to 'z'. The magic lines to add to ~/.Xmodmap are:
remove Lock = Caps_Lock
keycode 66 = z
Most of the examples I found are for swapping Caps Lock with control or escape (which are mostly obsolete now that you can use the Keyboard prefs thing in Gnome and swap keys around with a single click). Remapping Caps Lock to z is still too obscure to be in the nice GUI.
Now, if only two lines in a config file could fix the battery.

json vs thrift and protocol buffers round 2
Posted Mon Mar 2 21:54:57 2009
Following up on the previous posts: a few comments out on the internet mentioned that my first tests weren't very fair to thrift and protocol buffers because they were mostly serializing strings. I gutted the test code and rewrote the IDL files to use this structure:
message DnsRecord {
required fixed32 sip = 1;
required fixed32 dip = 2;
required uint32 sport = 3;
required uint32 dport = 4;
}
Nothing fancy, basically the standard ipv4 4-tuple.
I also replaced the random record generation with this:
import random

def get_random_records(num=10000):
    data = []
    for x in xrange(num):
        data.append({
            # note: 255 is used as the base here; a true dotted-quad encoding would use 256
            'sip': 192*255**3+168*255**2+255+random.randrange(0,255),
            'dip': random.randrange(1,255**4),
            'sport': random.randrange(1024,2048),
            'dport': random.choice([21,22,25,80,110,443])
        })
    return data
This will generate 10000 records with:
- a random source IP on the 192.168.1.0/24 network
- a completely random destination IP
- a source port between 1024 and 2048
- a destination port chosen from six common ports.
The raw size of this data using fixed length ints would be 10000*(4+4+4+4) = 160,000 bytes. The variable length encoding that protocol buffers does should be able to save some space when storing the smaller port numbers.
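To make the size argument concrete, here is a rough sketch of how the base-128 varint encoding used by protocol buffers sizes the port numbers. The helper names `varint_size` and `record_size` are made up for this example. The estimate assumes one tag byte per field, and the wrapper overhead (one tag byte plus one length byte per record inside a repeated field) is also an assumption about how the records were stored.

```python
def varint_size(value):
    """Number of bytes a non-negative int occupies as a base-128 varint."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


FIXED32 = 4  # fixed32 fields always take 4 bytes
TAG = 1      # field numbers 1-15 fit in a one-byte tag


def record_size(sport, dport):
    """Estimated encoded size of one DnsRecord, excluding any wrapper."""
    fixed_fields = (TAG + FIXED32) * 2          # sip and dip
    varint_fields = TAG + varint_size(sport) + TAG + varint_size(dport)
    return fixed_fields + varint_fields


if __name__ == "__main__":
    for port in (21, 80, 443, 1024, 2047):
        print(port, varint_size(port))
    # Assumed wrapper overhead: one tag byte plus one length byte per record.
    print(record_size(1024, 80) + 2)
```

Under these assumptions a record comes to 17 or 18 bytes (only dport 443 needs a second varint byte), which is in line with the 171650 bytes reported below for 10000 records.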
Running the test code produces the following output:
10000 total records (0.280s)
get_thrift (0.060s)
get_pb (0.950s)
ser_thrift (0.560s) 370009 bytes
ser_pb (4.850s) 171650 bytes
ser_json (0.080s) 680680 bytes
ser_cjson (0.120s) 680680 bytes
ser_yaml (17.330s) 610680 bytes
ser_thrift_compressed (0.620s) 111326 bytes
ser_pb_compressed (3.980s) 98571 bytes
ser_json_compressed (0.110s) 124919 bytes
ser_cjson_compressed (0.120s) 124919 bytes
ser_yaml_compressed (17.160s) 121065 bytes
serde_thrift (2.130s)
serde_pb (7.550s)
serde_json (0.130s)
serde_cjson (0.110s)
serde_yaml (56.740s)
These results show that protocol buffers and thrift do indeed excel at serializing numeric values. The uncompressed output from protocol buffers is considerably smaller than the other serialization methods, with thrift ending up somewhere in the middle. In fact, the protocol buffers output is barely larger than the original data would be in compact binary form. Since JSON and YAML serialize numbers to strings, their output ends up being roughly 4 times bigger than the raw data.
However, once you add in compression, all this fancy extra work to save space only slightly improves on JSON. The speed and simplicity of the JSON+zlib approach cannot be ignored...
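For reference, the JSON+zlib approach is only a few lines. This is a minimal sketch using the standard library `json` module (not simplejson or cjson) and Python 3 syntax, so it is not the code from the test archive; the function names and the sample record are made up for illustration.

```python
import json
import zlib


def dumps_compressed(records):
    """Serialize to JSON text, then compress the UTF-8 bytes with zlib."""
    return zlib.compress(json.dumps(records).encode("utf-8"))


def loads_compressed(blob):
    """Inverse of dumps_compressed."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample data with the same four fields as DnsRecord.
    records = [{"sip": 3232235777, "dip": 167772161, "sport": 1024, "dport": 80}] * 1000
    plain = json.dumps(records).encode("utf-8")
    packed = dumps_compressed(records)
    print(len(plain), len(packed))
    assert loads_compressed(packed) == records
```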
The protocol buffers speed issues are still there, but I'm sure that over time things will improve. If the C extension for simplejson can speed up serialization by an order of magnitude, I have no doubt that similar improvements can be made to protocol buffers and thrift.
If you want to run these tests for yourself, the code is available from sertest2.tgz.
Another thing to try would be to set the default dport to 80 and see how that affects serialization size and speed.
more on json vs thrift and protocol buffers
Posted Sun Mar 1 13:19:58 2009
Following up to the previous post, a couple of people pointed out to me that the cjson library is faster than simplejson, and that the latest simplejson has a small C extension.
Rerunning the tests with simplejson 2.0.9 and cjson yields the following results:
5000 total records (0.730s)
get_thrift (0.040s)
get_pb (0.620s)
ser_thrift (0.550s) 555125 bytes
ser_pb (2.980s) 415125 bytes
ser_json (0.030s) 718455 bytes
ser_cjson (0.040s) 718455 bytes
ser_yaml (12.770s) 623455 bytes
ser_thrift_compressed (0.630s) 287621 bytes
ser_pb_compressed (3.020s) 284441 bytes
ser_json_compressed (0.090s) 293073 bytes
ser_cjson_compressed (0.080s) 293073 bytes
ser_yaml_compressed (13.260s) 291106 bytes
serde_thrift (1.460s)
serde_pb (5.250s)
serde_json (0.070s)
serde_cjson (0.060s)
serde_yaml (44.110s)
There doesn't seem to be much doubt about it: if you need to serialize basic python data structures and don't need the extra features of thrift or protocol buffers, it is hard to beat JSON.
The test code at sertest.tgz has also been updated to use time.clock instead of time.time.
thrift and protocol buffers
Posted Sat Feb 28 11:53:08 2009
I've been experimenting with thrift and protocol buffers recently. For the most part when I need to serialize something I've been using JSON or compressed JSON. Thrift and protocol buffers have a couple of advantages, and are also supposedly faster and produce smaller output.
The test I've been using is a simple list of hashes, nothing too complicated. Here is the protocol buffers file; the thrift file is pretty much the same thing.
package passive_dns;
message DnsRecord {
required string key = 1;
required string value = 2;
required string first = 3;
required string last = 4;
optional string type = 5 [default = "A"];
optional int32 ttl = 6 [default = 86400];
}
message DnsResponse {
repeated DnsRecord records = 1;
}
The optional and default values are one of the benefits of both serialization libraries. A field that matches its default value does not need to be included in the serialized output.
I wrote up a simple test program to compare thrift, protocol buffers, json, and compressed json for size and speed. The results, at least for the type of data I use, are very interesting:
5000 total records (0.745s)
get_thrift (0.044s)
get_pb (0.608s)
ser_thrift (0.474s) 554953 bytes
ser_pb (3.087s) 414862 bytes
ser_json (0.273s) 718191 bytes
ser_yaml (13.121s) 623191 bytes
ser_thrift_compressed (0.545s) 287617 bytes
ser_pb_compressed (3.150s) 284297 bytes
ser_json_compressed (0.326s) 292904 bytes
ser_yaml_compressed (13.665s) 290993 bytes
serde_thrift (1.289s)
serde_pb (5.411s)
serde_json (1.474s)
serde_yaml (45.637s)
EDIT: Updated to include yaml results
The get_* functions are the times needed to convert the python data structure into the classes that the library needs.
The ser_* functions are the times needed to get and serialize the python data structure to a string.
The ser_*_compressed functions are the times needed to get, serialize, and compress the python data structure.
The serde_* functions are the times needed to get, serialize, and de-serialize the python data structure to and from a string.
The results show that serializing to compressed JSON is both smaller and faster than uncompressed thrift, and serializing+de-serializing is only slightly slower. If I converted the python data to be (header, rows) like a csv file, rather than a flat list of dicts, the json output would be smaller, and likely faster to serialize.
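The (header, rows) idea can be sketched as follows. This is an illustration rather than code from the test program: the function names and the sample values are hypothetical, and it assumes every record has the same keys.

```python
import json


def to_header_rows(records):
    """Convert a list of dicts sharing the same keys into (header, rows)."""
    if not records:
        return [], []
    header = sorted(records[0])
    rows = [[rec[key] for key in header] for rec in records]
    return header, rows


def from_header_rows(header, rows):
    """Rebuild the flat list of dicts from (header, rows)."""
    return [dict(zip(header, row)) for row in rows]


if __name__ == "__main__":
    # Hypothetical records using the field names from DnsRecord.
    records = [
        {"key": "example.com", "value": "192.0.2.1", "first": "2009-01-01", "last": "2009-02-01"},
        {"key": "example.org", "value": "192.0.2.2", "first": "2009-01-05", "last": "2009-02-10"},
    ]
    header, rows = to_header_rows(records)
    print(len(json.dumps(records)), len(json.dumps([header, rows])))
    assert from_header_rows(header, rows) == records
```

The saving comes from writing each key name once instead of once per record.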
The totally unexpected result was that protocol buffers clocked in at over 4 times slower than thrift. I find it hard to believe that protocol buffers could be that slow, so I will have to run some more tests to make sure that I am using the library correctly.
If you want to run my tests for yourself, the code is available from sertest.tgz.
setInterval and setTimeout
Posted Sat Jan 3 17:35:42 2009
What I learned today: setInterval is not the same as setTimeout. It seems you can still easily forkbomb Firefox with some trivial JS code.
<html>
Ran <span id="num">0</span> times
<script>
var f = function() {
    var e = document.getElementById("num");
    e.innerHTML++;
    setInterval(f, 100);
}
f();
</script>
</html>
download file "/crash.html"
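The reason this explodes is that a setInterval timer keeps firing, and every firing registers yet another interval, whereas a setTimeout timer fires once and is gone. The following is a small Python model of that difference, not browser code: `simulate` is a hypothetical name, and the model assumes every pending timer fires exactly once per 100 ms tick.

```python
def simulate(ticks, use_interval=True):
    """Return the number of pending timers after the given number of ticks.

    Each tick, every pending timer fires and calls f once. With setInterval
    the fired timer stays registered and f registers one more; with
    setTimeout the fired timer is gone and f registers its replacement.
    """
    timers = 1  # the initial f() call registers one timer
    for _ in range(ticks):
        fired = timers
        if use_interval:
            timers += fired  # old intervals persist, each call adds one
        else:
            timers = fired   # each timeout is replaced one for one
    return timers


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, simulate(n), simulate(n, use_interval=False))
```

After 10 ticks (one second at 100 ms) the interval version already has 1024 timers in this model, while the setTimeout version stays at one.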
JQuery Dotspinner
Posted Sat Jan 3 15:39:10 2009
A while back I got fed up with trying to find a decent spinner gif for some AJAX pages. I gave up and just wrote one using ascii. This has the benefit of always fitting in with the page style and the user's font settings.
do_dotspinner = function (scope) {
    // append a dot to the element, starting over once it reaches max dots
    var spin_element = function (el) {
        var max = 5;
        var text = el.innerHTML;
        if (text.length == max) text = "";
        text += ".";
        el.innerHTML = text;
    };
    var els = $(".dotspinner", scope);
    // stop rescheduling once no spinner elements are left
    if (!els.length) return;
    els.each(function () {
        spin_element(this);
    });
    var repeat = function () {
        do_dotspinner(scope);
    };
    setTimeout(repeat, 200);
}
download file "/ramblings/files/js/dotspinner.js"
And since I doubt it will work if I inline it into a blog post, here is a demo.
I've been trying to make it work like an actual plugin, but I haven't been able to get that to work yet.
I should be able to make something like this work:
$(".dotspinner").dotspinner();
But I need to figure out how to tell when an element goes away.
For example, if I have
p=$("#foo");
//and do this:
$("body").html("Hi");
// then these still work like nothing happened:
p.text();
$(p).text();
//even though if I do this again, it is gone:
p=$("#foo");
//The one thing I know will work, but is probably wrong, is this:
p = $('#' + p.attr("id"));
web development would be a lot easier if
Posted Thu Jan 1 21:27:43 2009
Web development would be a lot easier if the W3C would standardize some new widgets that browsers have to support. I've noticed that many JavaScript frameworks these days exist primarily to implement standard GUI widgets for the web. Boring GUI widgets like:
- Combo Boxes (see example)
- Tabs
- Trees
- Grids
- Dialogs
- Menus
- Tool tips
- etc etc
My problem isn't that these widgets aren't available in web applications, it is that a hundred different projects are all re-implementing the same thing. At what point should something as basic as a combo box just be added to HTML?
I guess there is always HTML6.
tech vision
Posted Thu Dec 4 20:14:46 2008
One thing I've noticed recently is a complete lack of vision when it comes to technology in the future. This is really obvious when you read those "What the world will be like in 2000" articles from 1950.
One common theme is simply making things bigger or smaller.. Bigger when it comes to machines, buildings, airplanes. Smaller when it comes to computers and other gadgets.
I remember one that talked about a revolutionary mail/news delivery system.. something like pneumatic tubes. What I've noticed is that when people have tried to predict the future, they tend to come up with new ways for using existing technology, but not new technology. Another example: every future prediction story from 100 years ago always talks about flying cars, but never something like teleportation. A flying car is something we might see in the future, but to me, that is not futuristic. A system that delivers a newspaper to your door using pneumatic tubes might have been cool in 1950, but today I read the news from around the world on my cell phone.
Instead of thinking of new ways to solve the same problems (delivering a physical newspaper to someone) we should be taking one more step back and thinking of new problems to solve. The problem shouldn't be "How to deliver a newspaper to someone's door", it should be "How to deliver the news to someone". If you think about it, this is also the same problem the recording industry ran into. Too much time spent thinking about ways to sell people shiny disks, not enough time spent thinking about how to sell people music.
Not to pick on Russell Coker, but I think he falls into the same trap:
In 2020 a device the size of an iPaQ H39xx with USB (for keyboard and mouse) and the new DisplayPort [5] digital display interface (that is used in recent Lenovo laptops [6]) would make a great PDA/desktop
While I think such a device would be cool, I hope that in 2020 we are not still using keyboards and mice and hardwired displays.
Here are some ideas for future portable computers (if such things even exist; we may just wirelessly tap into a global grid.. who knows):
There have been a lot of attempts at voice input, and I don't think it will ever work. I can type a lot faster than I can talk, and I can think even faster. I think the future in input devices is going to involve some kind of biofeedback input.. at least to replace mice.
We are already starting to see the development of tiny embedded projectors. I think all portable devices are going to have them within a few years. Even more futuristic would be something like a wireless high resolution heads up display embedded in a contact lens. Actually, that isn't nearly futuristic enough.. I'm sure someone will invent a device that meshes directly with your brain so you do not even have to read a display.
odd nmap timings
Posted Fri Aug 22 22:02:33 2008
Back story
A section of a web application I have pings (using a background AJAX request) a list of IP addresses. Most of the time all of these addresses are up; sometimes one or two of them are down. One day I noticed that if all of them were down, nmap would take much longer to ping them all.
The odd part
Let's ping 19 addresses on my home network, none of which exist.
justin@latitude:~$ time nmap -sP 192.168.5.2-20
Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-22 21:56 EDT
Nmap done: 19 IP addresses (0 hosts up) scanned in 4.072 seconds
real 0m4.081s
user 0m0.068s
sys 0m0.004s
Ok... now let's add the router's address, which is pingable.
justin@latitude:~$ time nmap -sP 192.168.5.1-19
Starting Nmap 4.53 ( http://insecure.org ) at 2008-08-22 21:58 EDT
Host router (192.168.5.1) appears to be up.
Nmap done: 19 IP addresses (1 host up) scanned in 2.258 seconds
real 0m2.259s
user 0m0.048s
sys 0m0.008s
Notice anything odd?
I have experimented with the usual host timeout and max RTT options, but I am not sure what the problem is. As soon as I get a chance I will look into the code. I am not sure if it is a bug or just user error. A simple strace of the two commands shows very different 'select' behaviour.
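For comparing more runs than these two, the summary line can be pulled out of the nmap output with a small helper. This is a sketch only: `parse_summary` is a hypothetical name, and the pattern is based solely on the Nmap 4.53 output shown above, so other versions may word the line differently.

```python
import re

# Matches the summary line as printed by the Nmap 4.53 runs above.
SUMMARY_RE = re.compile(
    r"Nmap done: (\d+) IP address(?:es)? \((\d+) hosts? up\) "
    r"scanned in ([\d.]+) seconds"
)


def parse_summary(output):
    """Return (addresses, hosts_up, seconds) from the summary line, or None."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


if __name__ == "__main__":
    all_down = "Nmap done: 19 IP addresses (0 hosts up) scanned in 4.072 seconds"
    one_up = "Nmap done: 19 IP addresses (1 host up) scanned in 2.258 seconds"
    print(parse_summary(all_down))
    print(parse_summary(one_up))
```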