S.E.A.N.I.C.U.S.

Monday, July 17, 2006

Ruby as Enterprise "Glue"

Any experienced Rubyist knows that the language can be used to create short, powerful shell scripts and integrate disparate systems, even if the community is just now realizing how powerful Ruby can be in the enterprise.

For example, today I ran into two problems. Because email addresses appear in cleartext on the KCKCC website, many people at the college are receiving lots of spam. We are in the process of obfuscating all of these addresses using a pretty standard JavaScript function that composes the address from its parts. However, we have around 8000 items in our web directory structure, only some of which contain email addresses, and only some of those contain cleartext ones (we've already made one pass at obfuscating some). I thought immediately, "Ruby and regular expressions are the answer!" Around 30 lines later (including comments), I had a powerful Ruby script that could run from the shell and change all exposed addresses to their obfuscated counterparts.
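The idea can be sketched in a few lines of Ruby. This is a minimal sketch under stated assumptions, not the script described above: the JavaScript helper name `mailto` and its two-argument form are hypothetical, and the pattern only catches plain `<a href="mailto:...">` links, so bare addresses in the text would need a second pass.

```ruby
# Hypothetical sketch: the JavaScript helper `mailto(user, domain)` is an
# assumed name, not the function used on the real site.

# Matches a whole anchor whose href is a cleartext mailto: link and captures
# the user and domain parts of the address.
MAILTO_LINK = %r{<a\s+href="mailto:([^"@\s]+)@([^"\s]+)"[^>]*>.*?</a>}im

# Replaces each cleartext mailto link in +html+ with a script call that
# assembles the address from its parts in the browser.
def obfuscate(html)
  html.gsub(MAILTO_LINK) do
    user, domain = $1, $2
    %Q[<script type="text/javascript">mailto("#{user}", "#{domain}");</script>]
  end
end

# Rewrites every .html/.htm file under +root+ in place, touching only the
# files whose contents actually changed. Files are read as raw bytes so
# that pages in older encodings do not raise encoding errors.
def obfuscate_tree(root)
  Dir.glob(File.join(root, "**", "*.{html,htm}")).each do |path|
    original = File.binread(path)
    updated = obfuscate(original)
    File.binwrite(path, updated) unless updated == original
  end
end
```

Calling `obfuscate_tree("/path/to/webroot")` would process a whole tree; running it against a backup copy first, as described below, is the safer choice.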

However, several tests revealed that not all of the cases were covered. Luckily we have our webserver in a nightly-rsync backup configuration with another box, so I was able to test the script on that box without breaking the website. As Eric and I tested the script and watched its output, we realized that even if the script works 90% of the time, there are occasionally going to be problems. We decided that it was long overdue to back up the website into a Subversion repository.

So I ran an svn import on the root web folder into a fresh repository. Little did I know that Apache/mod_dav_svn has a 2GB limit on commits! In the middle of adding files, it hung.

Ruby script to the rescue!

My first attempted solution was to add each top-level directory individually. Unfortunately, even some of those were over 2GB in size! So I modified the script to add and commit each file one by one, which quickly resulted in 900+ commits, and it wasn't even finished. Then I had an epiphany: I could keep track of how big each file was and, once I had accrued a large enough commit size, commit multiple files at once, saving on both the number of commits and transfer time (commit operations are atomic, thus expensive). I reached the end of the day before I realized that I was committing the whole directory and not just the accrued files, so svn had to recurse the whole directory structure on every commit to figure out what had been added. I just left it running, since it seemed to be doing fine, albeit with slow commits. Another caveat I found is that svn will not return an error code when something can't be added, only a warning.

Here's the code, roughly, for the final script (that I left running at the end of the day):
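#!/usr/bin/env ruby
# usage: svnimport.rb [directory] [commitsize]
Dir.chdir(ARGV.shift)
@maxsize = ARGV.shift.to_i   # commit threshold in bytes
@size = 0                    # bytes added since the last commit
Dir["./**/*"].sort.each do |file|
  # Add one path at a time, without recursing into directories.
  fork do
    exec("svn add -N \"#{file}\"")
  end
  Process.wait
  @size += File.size(file) if $?.success?
  if @size >= @maxsize
    # No paths are passed here, so svn commits the whole working copy
    # and has to crawl it to find what was added.
    fork do
      exec("svn commit -m \"Initial commit\"")
    end
    @size = 0
    Process.wait
  end
end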
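The fix for the slow commits would be to hand svn only the paths accrued since the last commit. What follows is a minimal sketch of that idea, not the script that was left running. It assumes that `svn commit` accepts `-N` (non-recursive) the same way `svn add` does and that svn prints its warnings on stderr; the names `SVN` and `import_in_batches` are made up for illustration.

```ruby
require "open3"

# Runs `svn` with the given arguments. Returns false on a non-zero exit
# status or when svn printed a warning on stderr (assumed behavior), since
# the exit status alone does not flag paths that could not be added.
SVN = lambda do |*args|
  _out, err, status = Open3.capture3("svn", *args)
  status.success? && err !~ /warning/i
end

# Adds each path, then commits only the paths accrued since the last commit
# once their combined file size reaches +max_bytes+. The +run+ parameter is
# a hypothetical hook so the svn calls can be swapped out for a dry run.
def import_in_batches(paths, max_bytes, run = SVN)
  pending = []
  size = 0
  paths.each do |path|
    next unless run.call("add", "-N", path)
    pending << path
    size += File.file?(path) ? File.size(path) : 0
    next if size < max_bytes
    run.call("commit", "-N", "-m", "Initial commit", *pending)
    pending = []
    size = 0
  end
  # Commit whatever is left over after the last full batch.
  run.call("commit", "-N", "-m", "Initial commit", *pending) unless pending.empty?
end
```

From inside the working copy, something like `import_in_batches(Dir["./**/*"].sort, 500_000_000)` would drive it. Sorting keeps each directory ahead of its contents, so a parent is always added no later than its children.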

Thursday, July 13, 2006

Potential DSLs

Sidebar: Yes, I haven't posted here in a while -- I've been trying to get RadiantCMS to bend to my needs in terms of blogging, but that may be a ways off. We now return you to your regularly scheduled blog post...

One of the recent patterns with Rails has been to move most things into Ruby. Most recently, developers are encouraged (but not required) to move their database definitions into ActiveRecord::Migration format, both through a change in the default environment.rb file, and through automatic migration creation when generating a model. "Cool!" I said when I saw that.

While I was listening to this week's Ruby on Rails Podcast, the issue of YAML came up, and it got me thinking. Yes, YAML is very simple and cleanly defines both the database configuration and fixtures, but couldn't the same thing be accomplished in Ruby? Jamis and others have encouraged the use of DSLs to simplify and clarify code. Since Django uses Python to describe its database configuration, why not use Ruby for Rails'?

I'm not sure whether I know how to accomplish it, but I'm going to try! Here's a sample of my proposed database configuration DSL:
common {
  driver "mysql"
  username "root"
  password ""
  host "example.com"
  port "3306"
}

development {
  << "common"
  database "app_development"
}

production {
  << "common"
  database "app_production"
}

test {
  << "common"
  database "app_test"
}

Since specifying common attributes and "including" them in other configurations seems popular, I decided to allow that with the "<<" operator, similar to the way it's done in YAML.
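One way such a DSL might be prototyped is with `instance_eval` and `method_missing`. This is only a sketch under stated assumptions: `ConfigSection` and `DatabaseConfig` are made-up names, not part of Rails. As far as I can tell, a bare `<< "common"` at the start of a line does not parse in Ruby, so the sketch writes it as `self << "common"` instead. Both classes inherit from `BasicObject` so that words like `test`, which is otherwise a Kernel method, reach `method_missing`.

```ruby
# Hypothetical sketch: ConfigSection and DatabaseConfig are made-up names,
# not part of Rails or ActiveRecord.

# Collects the settings of one block such as `development { ... }`.
class ConfigSection < BasicObject
  def initialize(known)
    @known = known      # sections defined so far, keyed by name
    @settings = {}
  end

  # Exposes the collected settings hash to DatabaseConfig.
  def __settings
    @settings
  end

  # `self << "common"` merges in the settings of an earlier section.
  def <<(name)
    @settings.update(@known.fetch(name.to_s))
    self
  end

  # `driver "mysql"` stores "driver" => "mysql".
  def method_missing(key, *args)
    return super unless args.size == 1
    @settings[key.to_s] = args.first
  end
end

# Evaluates the whole configuration and returns a hash of section hashes.
class DatabaseConfig < BasicObject
  def self.define(&block)
    config = new
    config.instance_eval(&block)
    config.__sections
  end

  def initialize
    @sections = {}
  end

  def __sections
    @sections
  end

  # `development { ... }` evaluates the block inside a fresh ConfigSection.
  def method_missing(name, *args, &block)
    return super unless block && args.empty?
    section = ::ConfigSection.new(@sections)
    section.instance_eval(&block)
    @sections[name.to_s] = section.__settings
  end
end

CONFIG = DatabaseConfig.define do
  common {
    driver "mysql"
    username "root"
    password ""
    host "example.com"
    port "3306"
  }

  development {
    self << "common"
    database "app_development"
  }

  test {
    self << "common"
    database "app_test"
  }
end
```

After this runs, `CONFIG["development"]` holds the common settings plus its own `database` entry, which is roughly the shape of hash that a parsed database.yml produces.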

The fixtures DSL would follow a pretty similar pattern, with the fixture name, followed with a block and corresponding attributes.

I'll probably post the code here when I finish prototyping it.

Your thoughts?