Packrat parser for Python

Wed, 5 Oct 2011

I found this in my files. I can't remember what I wanted to do with it.

Get it here: nthp.py

The great idea is someone else's, and I can't find the link. Really! Google and Wikipedia are not my friends today. The keywords were "packrat parser" and "memoization".

See Bryan Ford's "Packrat Parsing: a Practical Linear-Time Algorithm with Backtracking".

The great problem is this: bottom-up parsing generates too many results, and top-down parsing with backtracking generates the same results too many times.

The great solution is to parse top-down and remember (memoize) the result of each rule at each input position.
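To make the idea concrete, here is a minimal Python sketch of packrat-style memoization. It is not the code in nthp.py; the toy grammar and the names `make_parser`, `num` and `sum_` are made up for illustration. Each rule is a function from a start position to `(value, end)` or `None`, and `functools.lru_cache` remembers every result, so backtracking into a rule at a position it has already tried costs a table lookup instead of a re-parse.

```python
from functools import lru_cache

def make_parser(text):
    """Packrat-style parser for a hypothetical toy PEG:

        Sum <- Num '+' Sum / Num
        Num <- [0-9]+

    Every rule maps a start position to (value, end), or None on failure.
    """

    @lru_cache(maxsize=None)  # memo table: one entry per start position
    def num(pos):
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        if end == pos:
            return None
        return int(text[pos:end]), end

    @lru_cache(maxsize=None)
    def sum_(pos):
        # First alternative: Num '+' Sum
        first = num(pos)
        if first is not None:
            value, p = first
            if p < len(text) and text[p] == "+":
                rest = sum_(p + 1)
                if rest is not None:
                    return value + rest[0], rest[1]
        # Ordered choice falls back to the second alternative: Num.
        # num(pos) was already computed above, so this is a cache hit.
        return num(pos)

    return sum_

print(make_parser("1+2+3")(0))  # (6, 5): value 6, input consumed up to index 5
print(make_parser("1+")(0))     # (1, 1): the dangling '+' is left unparsed
```

With every rule memoized at every position, the total work is bounded by the number of rules times the input length, which is where the linear-time claim comes from; the price is the memory for the table.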

The remaining problem is how to handle left recursion.
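One known way to handle direct left recursion in a memoizing parser is to "grow a seed": record a failure for the rule at that position, parse once so that only the non-left-recursive alternative can succeed, then re-parse repeatedly, each time letting the left-recursive call return the previous result, until the match stops getting longer. The Python sketch below applies that idea to a hypothetical rule `Diff <- Diff '-' Num / Num`. It is not necessarily what nthp.py does, and it does not handle indirect left recursion.

```python
def make_parser(text):
    """Memoizing parser for a hypothetical, directly left-recursive rule:

        Diff <- Diff '-' Num / Num
        Num  <- [0-9]+
    """
    memo = {}  # start position -> (value, end) or None

    def num(pos):
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    def diff_body(pos):
        # First alternative: Diff '-' Num (the left-recursive call hits the memo)
        left = diff(pos)
        if left is not None:
            value, p = left
            if p < len(text) and text[p] == "-":
                right = num(p + 1)
                if right is not None:
                    return value - right[0], right[1]
        # Second alternative: Num
        return num(pos)

    def diff(pos):
        if pos in memo:
            return memo[pos]
        memo[pos] = None  # seed: the left-recursive call fails at first
        while True:
            result = diff_body(pos)
            best = memo[pos]
            # Stop growing once a re-parse no longer consumes more input.
            if result is None or (best is not None and result[1] <= best[1]):
                return best
            memo[pos] = result

    return diff

print(make_parser("9-3-2")(0))  # (4, 5): (9 - 3) - 2, i.e. left-associative
```

A naive top-down parser would recurse forever on this rule, and the right-recursive rewrite `Num '-' Diff` would compute 9 - (3 - 2) = 8 instead.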

The code solves some problems, but not all.

robots.txt parser for Erlang

Mon, 4 Apr 2011

I couldn't find a robots.txt parser for Erlang, so I wrote my own. If it sucks, it's because it's my "Hello world" in Erlang. :)

Get it here: robots_txt.erl

It requires mochiweb_util.erl for parsing URLs.

Use it like this:

Eshell V5.7.4  (abort with ^G)
1> O = robots_txt:parse("# comment\n\nUser-agent: foo\nDisallow: /\n\nUser-agent: *\nDisallow:").
[{'User-agent',"foo"},
 {'Disallow',"/"},
 {'User-agent',"*"},
 {'Disallow',[]}]
2> robots_txt:is_allowed("foo", "/bar", O).
false
3> robots_txt:is_allowed("baz", "/bar", O).
true
4>

It handles only the "User-agent" and "Disallow" directives. It ignores the "Allow" directive.

It decides based on the first matching directive, so it assumes that specific rules precede the generic ones. Wikipedia's and Facebook's robots.txt files make the same assumption. Google's own robots.txt doesn't.
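Here is a Python sketch of that first-match decision. It is an analogue written for illustration, not a port of robots_txt.erl. It takes the same list of (directive, value) pairs that `robots_txt:parse` returns in the shell session above, assumes that consecutive User-agent lines followed by their Disallow lines form one group, and lets the first group naming the agent (or `*`) decide. Matching the agent name case-insensitively as a substring is also an assumption, borrowed from Python's robotparser.

```python
def groups(directives):
    """Split (key, value) pairs into (agents, disallowed_prefixes) groups."""
    agents, rules = [], []
    for key, value in directives:
        if key == "User-agent":
            if rules:  # a User-agent after rules starts a new group
                yield agents, rules
                agents, rules = [], []
            agents.append(value)
        elif key == "Disallow":
            rules.append(value)
    if agents:
        yield agents, rules

def is_allowed(agent, path, directives):
    """First matching group decides; assumes specific groups come before '*'."""
    for agents, rules in groups(directives):
        if any(a == "*" or a.lower() in agent.lower() for a in agents):
            # An empty Disallow value disallows nothing.
            return not any(r and path.startswith(r) for r in rules)
    return True  # no group applies: everything is allowed

records = [("User-agent", "foo"), ("Disallow", "/"),
           ("User-agent", "*"), ("Disallow", "")]
print(is_allowed("foo", "/bar", records))  # False
print(is_allowed("baz", "/bar", records))  # True
```

If the `*` group came first, every agent would match it and the `foo` group would never be consulted, which is exactly the ordering assumption described above.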

It shouldn't crash on junk.
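A tolerant parser in the same spirit can be sketched in Python, again as an illustration rather than the Erlang code: strip comments, split each line at the first colon, keep only User-agent and Disallow, and silently skip everything else, including Allow lines and junk without a colon. Normalizing directive names so that `user-agent` and `User-agent` compare equal is an assumption of this sketch.

```python
def parse(text):
    """Return (directive, value) pairs for User-agent and Disallow lines only."""
    known = {"user-agent": "User-agent", "disallow": "Disallow"}
    directives = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]          # drop comments
        key, sep, value = line.partition(":")
        if not sep:
            continue                          # blank or junk line: skip, don't crash
        name = known.get(key.strip().lower())
        if name is not None:                  # Allow and unknown directives are ignored
            directives.append((name, value.strip()))
    return directives

print(parse("# comment\n\nUser-agent: foo\nDisallow: /\n\nUser-agent: *\nDisallow:"))
# [('User-agent', 'foo'), ('Disallow', '/'), ('User-agent', '*'), ('Disallow', '')]
```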

It is inspired by Python's robotparser, but it doesn't insist on fetching the file itself.

Circadian clock for Arduino

Sun, 20 Mar 2011

The circadian clock for Arduino sets its time by observing dusk and dawn. Once it is set, bad weather shouldn't throw it off.
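The arithmetic behind setting the time from the sky can be sketched in a few lines of Python (the Arduino code itself is not shown here). The sketch assumes the clock measures time with a free-running counter in seconds, and that the midpoint between detected dawn and dusk is solar noon, which holds only if the light thresholds for dawn and dusk are symmetric. The function names are hypothetical.

```python
DAY = 24 * 60 * 60  # seconds per day

def solar_noon(dawn, dusk):
    """Counter value of solar noon: the midpoint of the daylight interval."""
    if dusk <= dawn:
        raise ValueError("dusk must come after dawn")
    return (dawn + dusk) / 2

def solar_time(now, noon):
    """Apparent solar time of day, in seconds after midnight, at counter value now."""
    return (DAY // 2 + (now - noon)) % DAY

# Dawn seen 1000 s after power-up, dusk 12 hours later.
noon = solar_noon(1000, 1000 + 12 * 3600)
print(noon)                    # 22600.0
print(solar_time(1000, noon))  # 21600.0, i.e. 06:00
```

Because only the midpoint is used, the length of the day doesn't matter; a long summer day and a short winter day both put noon at 12:00 on this clock.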

The clock can trigger alarms at a certain time or relative to the time of dawn or dusk. The alarms are triggered according to the last synchronized state, so bad weather shouldn't affect them. Because the clock knows when it isn't synchronized, we can choose to base decisions on criteria other than time in that case.
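A minimal Python sketch of alarms scheduled from the last synchronized state: the alarm time is extrapolated from the last trusted dawn in 24-hour steps, so a cloudy morning doesn't move it, and a separate check reports whether the last synchronization is recent enough to trust. The names and the two-day staleness limit are hypothetical, and the sketch ignores the few minutes by which dawn drifts from one day to the next.

```python
DAY = 24 * 60 * 60  # seconds per day

def next_alarm(last_dawn, offset, now):
    """Next counter value that is `offset` seconds after a dawn, at or after now.

    Dawns are extrapolated from the last trusted one in whole-day steps.
    """
    t = last_dawn + offset
    if t < now:
        t += -((t - now) // DAY) * DAY  # ceiling division: days still to skip
    return t

def is_synchronized(last_sync, now, max_age=2 * DAY):
    """True if the last dawn/dusk fix is recent enough to trust."""
    return last_sync is not None and now - last_sync <= max_age

# Alarm one hour after dawn; last good dawn seen at counter value 1000.
print(next_alarm(1000, 3600, now=2000))           # 4600
print(next_alarm(1000, 3600, now=4601))           # 91000, same time next day
print(is_synchronized(1000, now=1000 + 3 * DAY))  # False
```

When `is_synchronized` returns False, the caller can fall back on other criteria instead of trusting the extrapolated time.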

The clock's time will be close to apparent solar time. Most of the time it won't match standard time.

The clock is useful for controlling events in your garden without ever having to set the time, whether manually, via NTP, or via GPS.
