Saturday, March 15, 2008

Scriptella - Like a Bat Out of ETL!

The Scriptella project has been around for a while now. If you haven't checked it out, it's really worth a look.

Prerequisites:
1. You hate data crap and you always seem to be asked to do it.
2. Brainiac biz guy needs data imported from 3rd party losers: X.
3. You are out of time since you've been browsing articles on prostitution and Eliot Spitzer all week.
Well fret no more!

Scriptella may be a little light on examples, but an hour or so of coding leaves you pretty darn impressed.

You basically set up a few connections in a syntax somewhat like a properties file and type only what's necessary to get the job done.

I estimate it takes about 5 minutes of work to have you downloading XML/CSV files, parsing them, and inserting the results into the database. Each input is configured in a few seconds, and the beauty of the technology is that it lets you reference variables created by other datasources. Available datasources include csv, xml, the usual JDBC suspects, jexl, janino and a generic 'script' driver.

The script driver is great, as it allows the .etl file to use a scripting language to either invoke Java code or simply create data. By default this is the JavaScript Rhino implementation. Badass! Now if I could only figure out how to use it to edit the data on the way in... That would be wicked cool.

Put this all together and this is what I churned out in about 5 minutes of work.

1. Downloaded an XML file
2. Parsed it and inserted the records using prepared statements
3. Output a simple log line every X records
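
For anyone who wants to see roughly what that looks like, here is a minimal sketch of an .etl file covering the three steps above. It is not my original file. The feed URL, the /items/item path, the item table and its columns, the HSQLDB settings and the choice of 100 for X are all made up for illustration, and the details of the xpath driver, the rownum variable and the if attribute are as I understand them from the Scriptella reference, so double check them against the docs.

```xml
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <!-- Hypothetical XML feed to download and parse -->
    <connection id="feed" driver="xpath" url="http://example.com/feed.xml"/>
    <!-- Hypothetical target database; any JDBC driver should work the same way -->
    <connection id="db" driver="hsqldb" url="jdbc:hsqldb:file:demo"
                user="sa" password="" classpath="hsqldb.jar"/>
    <!-- Text connection with no url, assumed to write to the console -->
    <connection id="log" driver="text"/>

    <!-- Steps 1 and 2: select every item element from the feed -->
    <query connection-id="feed">
        /items/item
        <!-- ?id and ?name are bound as prepared statement parameters,
             assuming each item has id and name child elements -->
        <script connection-id="db">
            INSERT INTO item (id, name) VALUES (?id, ?name);
        </script>
        <!-- Step 3: log a line on every 100th row of the enclosing query -->
        <script connection-id="log" if="rownum % 100 == 0">
            Imported $rownum records
        </script>
    </query>
</etl>
```

The nested scripts run once per row returned by the enclosing query, which is how the values parsed from the XML end up as variables in the SQL and in the log line.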

Brilliant!

Keep up the good work Scriptella team!