Monday, January 14, 2008

Looking For: a Reverse-Search Web Service (aka Semantic Analysis the Easy Way)

For one of my projects, I'm looking for a web service that does search "backwards" to reveal page semantics. If anyone can point me in the right direction, I'm all ears!

What do I mean by backwards? A normal web search service (Google/Yahoo/MSLive API) takes a set of search terms and other conditions and returns the web pages that best match.

A reverse search takes a URL and returns the search terms for which that page scores well. I call this Semantic Analysis the Easy Way because, strictly speaking, it doesn't require actually understanding the content of the page -- yet you can still get semantic data out. Of course, the better your content analysis engine understands the content, the better your search engine will work, so the hard problem figures in a little bit too.
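To make the idea concrete, here is a toy sketch of what a reverse search could look like if you owned the index yourself. Everything in it is an assumption for illustration: the `reverse_search` function, the tiny in-memory `corpus`, and the example.com URLs are all made up, and plain TF-IDF stands in for whatever ranking a real search engine uses. It is not how any of the big engines work, just the shape of the input and output I have in mind.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def reverse_search(url, corpus, top_n=3):
    """Return the top_n terms for which the page at `url` scores highest.

    `corpus` is a hypothetical in-memory index mapping URL -> page text.
    Scoring is plain TF-IDF, a stand-in for a real engine's ranking.
    """
    docs = {u: Counter(tokenize(t)) for u, t in corpus.items()}
    n_docs = len(docs)

    # Document frequency: in how many pages does each term appear?
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    page = docs[url]
    total = sum(page.values())
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in page.items()
    }

    # Highest score first; ties broken alphabetically so output is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


# Made-up pages, purely for illustration.
corpus = {
    "http://example.com/espresso": "espresso espresso grinder tips for the home",
    "http://example.com/tea": "tea brewing tips for the home",
    "http://example.com/bikes": "bike repair tips for the commuter",
}

print(reverse_search("http://example.com/espresso", corpus, top_n=2))
# prints ['espresso', 'grinder']
```

Words that show up on every page ("tips", "for", "the") score zero, so what floats to the top is whatever makes this page different from the rest of the index. No understanding of the content is involved, which is the whole point.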

The big search engines certainly have the data -- I'm sure it plays a big role in ad-placement mechanisms like AdSense. Just as the search engines expose their search APIs, confident you can't (or won't be allowed to) steal their search results and pretend they're your own indefinitely, they could theoretically expose the reverse search data too. But sadly, I haven't found one yet that does.

I did the requisite scan of the major players, looking through programmableweb, etc. No dice. (Although programmableweb has a link to an interesting "hard way" semantics service that did a nice job of analyzing the text I threw at it from real pages.) The closest I've come so far is the URL API, which will tell you the top tags associated with a specific URL -- valuable data indeed, but not the same thing.

Little help? Anyone?


