Tuesday, May 08, 2007

OpenKapow Fun: Scraping an AJAX Site

I'm interested in web services that represent operations which make a persistent change in the non-web world, like ordering food or checking in for a flight. These capabilities exist today as human-powered web workflows, but rarely as remixable web services. Sooner or later, that's going to change.

Many of these services resist initial attempts at scraping and remixing because they contain AJAX elements such as scripts that rewrite the DOM. Scraping such a site is more than a little challenging. You either need to analyze exactly what the scripts do and then grab and run just the ones you want, or you need to emulate a JavaScript-enabled browser and scrape the screen once the script has finished and the DOM is in the "right" state.
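To make the problem concrete, here is a minimal Python sketch. The page below is made up for illustration (it is not the markup of any real site): the results table is empty in the HTML the server sends, and a script fills it in later. A scraper that only parses the static HTML never sees a single cell.

```python
from html.parser import HTMLParser

# Hypothetical page: the table is empty as served and is only
# populated by the script once a browser runs it.
PAGE = """
<html><body>
<table id="results"></table>
<script>
  var row = document.getElementById("results").insertRow();
  row.insertCell().textContent = "UA 123";
</script>
</body></html>
"""


class CellCounter(HTMLParser):
    """Counts <td> start tags found in the static markup."""

    def __init__(self):
        super().__init__()
        self.cells = 0

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells += 1


def count_static_cells(html):
    """Return how many table cells a script-less parser can see."""
    counter = CellCounter()
    counter.feed(html)
    counter.close()
    return counter.cells


print(count_static_cells(PAGE))
```

The parser treats the script body as opaque text, so the count stays at zero until something actually executes the script.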

One tool that seems promising in the quest to tame these sites is the openkapow service host and its RoboMaker IDE, both of which work by hosting a JavaScript-enabled browsing engine. The IDE combines this engine with a set of tag-finding and flow-control tools so that you can point-and-click your way to a script that automates the target site.

I've been looking for an excuse to build something transactional with RoboMaker, so I hacked up REST services that execute a check-in or an offload ("un-check-in") for a passenger on a United Airlines flight. They worked for me on a couple of flights. But without a large sample of itineraries that I could abuse in testing, I was a little uncomfortable with how the robot might handle some multisegment flights. The source (".robot" XML files) is available here, though, if you want to try it out or tinker. Seat selection would be an interesting and nontrivial feature to add...

So I reined in my ambitions a little and created a service to get flight schedules, in order to add automation to a small Office 2007 tool I'm working on.

The Airwise flight schedule page seems simple enough at first glance, but turns out to be one of those pages that uses script to write the DOM data that the user ultimately sees. That makes it a perfect candidate for ... RoboMaker!

It was straightforward to configure my robot to enter travel dates and airports into the form, click "go," and find the table with the results. What I wanted to do was look through the table rows, create XML snippets for the data, and package them up into a response.

But here's where my inexperience with RoboMaker and impatience got the best of me: once I got beyond the point-and-click part of automating the web page, I wanted to just write some imperative code to pack up my XML. Since RoboMaker is a Java app, I wished I could just write a micro-plugin for this stage of my robot in Java. I found the deep spelunking in dialog boxes, squirrelly regular expressions, and mediocre help docs to be frustrating.
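For what it's worth, the packing step I had in mind is only a few lines of imperative code. Here is a rough sketch in Python (not Java, and not anything RoboMaker supports as far as I know). The element names `flights` and `flight` and the field names in the usage line are made up for illustration.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows):
    """Pack a list of row dicts into one XML string.

    Each dict maps a (hypothetical) field name to its cell text.
    """
    root = ET.Element("flights")
    for row in rows:
        flight = ET.SubElement(root, "flight")
        for name, value in row.items():
            # One child element per table cell
            ET.SubElement(flight, name).text = value
    return ET.tostring(root, encoding="unicode")


print(rows_to_xml([{"number": "UA 123", "departs": "08:00"}]))
```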

Maybe the idea is to give me a taste of Kapow so I'll license their enterprise technology, which isn't free. Not sure about that. But I did know that I could dump the entire flight schedule table as the service response and deal with it on the client. So I chose to return the whole table content as "advanced structured text" (which basically translates into plaintext with a newline between every table cell).

Although on the client it's trivial to parse the resulting data set and find what I want, I feel a little guilty about the ugliness of the XML response. You can view/use/download/edit the REST robot on the openkapow portal, and if you run it with the defaults you can see the mildly embarrassing chunk of XML produced (in the browser, the newlines in XML are ignored, so it looks even worse). A REST call looks like this.
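Here is a minimal sketch of that client-side parsing, in Python. It assumes the text holds one cell per line and that you know how many columns the table has. The column count and the sample values are invented for the example, and the real response may include blank lines for empty cells, which this sketch would skip and so misalign.

```python
def cells_to_rows(text, columns):
    """Regroup newline-separated cell text into rows of `columns` cells.

    Assumes no cell is empty; blank lines are dropped, and a trailing
    partial row is discarded.
    """
    cells = [line.strip() for line in text.splitlines() if line.strip()]
    full = len(cells) - len(cells) % columns
    return [cells[i:i + columns] for i in range(0, full, columns)]


# Made-up sample: flight number, departure time, arrival time
sample = "UA 123\n08:00\n11:05\nUA 456\n09:30\n12:40\n"
for row in cells_to_rows(sample, 3):
    print(row)
```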

I got over my issues with the service, though, and moved on to using the data for my Office plugin, which I'll post about soon.

