Pipes + SQL = Structured Web Query Language

21 Feb 2007

Let’s remix two original observations:

In Yahoo! Pipes, what used to be a table in the relational database is now: a web page, an RSS feed, etc. The current list of sources includes: Yahoo! Search, Yahoo! Local, Fetch (RSS feeds), Google Base and Flickr. Each source can be searched or queried using either pre-defined or user-defined parameters. For example, there can be a search of all French restaurants in Chicago via Yahoo! Local. The data source and the searches can be mixed together (think emergence), using a rich set of operators. Among them is the iterator (which lets the user loop through the results), a counter and many other functions that facilitate cleaning, manipulating and recombining the information.
Yahoo! Pipes and The Web As Database via PoorButHappy

and this one:

Command line interfaces. Once that was all we had. Then they disappeared, replaced by what we thought was a great advance: GUIs. GUIs were – and still are – valuable, but they fail to scale to the demands of today’s systems. So now command line interfaces are back again, hiding under the name of search. Now you see them, now you don’t. Now you see them again. And they will get better and better with time: mark my words, that is my prediction for the future of interfaces.
jnd.org

Pipes + SQL = SWQL

Imagine Yahoo! Pipes had a command-line interface too:

An RSS or Atom feed acts like a small table. The columns for each item are: title, link, description, date, author, categories, enclosure, geo:coordinates. The object “feed” itself also has properties like title, link, description. To get a list of feed items sorted by title, and filtered on existence of an enclosure:
```
SELECT title, description, enclosure
FROM rss:http://podcast.example.com/feed/ as rss_feed
WHERE len(rss_feed.enclosure) > 0
ORDER BY rss_feed.title
```
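For comparison, here is a minimal Python sketch of the same idea: feed items become rows, the WHERE clause becomes a filter and the ORDER BY becomes a sort. The sample feed, `feed_rows` and `select_with_enclosure` are made up for illustration, and the feed is inlined so that nothing is fetched from the network.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example podcast</title>
    <link>http://podcast.example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>Zebra episode</title>
      <description>With audio</description>
      <enclosure url="http://podcast.example.com/zebra.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Text-only post</title>
      <description>No audio here</description>
    </item>
    <item>
      <title>Aardvark episode</title>
      <description>Also with audio</description>
      <enclosure url="http://podcast.example.com/aardvark.mp3" type="audio/mpeg" length="1"/>
    </item>
  </channel>
</rss>"""


def feed_rows(rss_text):
    """Turn each <item> of an RSS 2.0 document into a dict (one 'row')."""
    root = ET.fromstring(rss_text)
    rows = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        rows.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            # Use the enclosure URL as the column value; empty string if absent.
            "enclosure": enclosure.get("url", "") if enclosure is not None else "",
        })
    return rows


def select_with_enclosure(rows):
    """Equivalent of: WHERE len(enclosure) > 0 ORDER BY title."""
    kept = [row for row in rows if len(row["enclosure"]) > 0]
    return sorted(kept, key=lambda row: row["title"])


if __name__ == "__main__":
    for row in select_with_enclosure(feed_rows(SAMPLE_RSS)):
        print(row["title"], row["enclosure"])
```

It works, but it is a lot more typing than four lines of SQL, which is rather the point.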


A web page is like a one-record table, with a title, body and date (if given by the server). Taking this further, any URL can be an object with a MIME type (text/html for web pages, audio/mpeg for MP3 files, application/pdf for PDF files …), a title (empty except for web pages, feeds, PDF), a body (readable for HTML, XML … or just a blob for MP3, MPEG, FLV, …). XML files can also easily be accessed. Wouldn’t this be nice:
```
SELECT trailers.title, trailers.description, trailers.enclosure
FROM xml:http://apple-trailers.example.com/hd/trailers.xml as trailers
JOIN ( SELECT TOP 100 title, release_date
FROM xml:http://www.imdb.com/chart/top/top250.xml as imdb
WHERE imdb.release_date > '1 Jan 2000'
ORDER BY imdb.score DESC ) as top100
ON trailers.title = top100.title
ORDER BY top100.release_date
```
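Once both documents are loaded into tables, the cross-source query above is ordinary SQL. Here is a rough sketch with Python's built-in `sqlite3` module. The film titles, dates and scores are invented placeholders, the table and function names are my own, and SQLite spells `TOP 100` as `LIMIT 100`. The query is written as a join so that the final ORDER BY can see the chart's release date.

```python
import sqlite3

# Hypothetical rows standing in for the two remote XML documents.
TRAILERS = [
    ("Film A", "Trailer for Film A", "http://trailers.example.com/a.mov"),
    ("Film B", "Trailer for Film B", "http://trailers.example.com/b.mov"),
    ("Film C", "Trailer for Film C", "http://trailers.example.com/c.mov"),
]
CHART = [
    # (title, release_date as ISO 8601 text, score)
    ("Film A", "2003-05-01", 8.1),
    ("Film B", "1994-09-10", 9.0),
    ("Film C", "2001-12-19", 8.7),
    ("Film D", "2005-06-15", 8.3),
]


def trailers_for_top_films(limit=100, since="2000-01-01"):
    """Trailers for the best-scored films released after `since`, oldest first."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE trailers (title TEXT, description TEXT, enclosure TEXT)")
    db.execute("CREATE TABLE chart (title TEXT, release_date TEXT, score REAL)")
    db.executemany("INSERT INTO trailers VALUES (?, ?, ?)", TRAILERS)
    db.executemany("INSERT INTO chart VALUES (?, ?, ?)", CHART)
    rows = db.execute(
        """
        SELECT t.title, t.description, t.enclosure
        FROM trailers AS t
        JOIN (SELECT title, release_date
              FROM chart
              WHERE release_date > ?
              ORDER BY score DESC
              LIMIT ?) AS best ON best.title = t.title
        ORDER BY best.release_date
        """,
        (since, limit),
    ).fetchall()
    db.close()
    return rows


if __name__ == "__main__":
    for title, description, enclosure in trailers_for_top_films():
        print(title, enclosure)
```

ISO 8601 date strings are used because they compare correctly as plain text; a real SWQL engine would have to parse dates like '1 Jan 2000' itself.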


Almost reads like English, doesn't it?

There should be operators for comparing values, for parsing and iterating over comma-separated lists (like the categories in a feed), and for parsing HTML. Try to guess what the following would do:
```
FOR blogpost IN rss:http://blog.example.com/feed/
LOOP
  INSERT INTO links (href, title, inner_html, date)
  SELECT href, title, inner_html, blogpost.date
  FROM htmlparse(blogpost.description, "<a>")
END LOOP

SELECT title, href as link, inner_html as description, date
INTO special:output_rss
FROM links
```
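To make the guess concrete: the loop collects every link found in every blog post, keeping the date of the post it came from. Below is a small Python sketch of what an `htmlparse(…, "<a>")` operator might do, built on the standard library's `html.parser`. `LinkExtractor` and `links_from_posts` are hypothetical names, and this version keeps only the link text rather than the full inner HTML.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, inner text) for every <a> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # None means "not currently inside an <a>"
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_from_posts(posts):
    """posts: iterable of (date, description_html) pairs; returns link rows."""
    rows = []
    for date, description in posts:
        parser = LinkExtractor()
        parser.feed(description)
        parser.close()
        rows.extend({"href": href, "text": text, "date": date}
                    for href, text in parser.links)
    return rows


if __name__ == "__main__":
    sample = [("2007-02-21", '<p>See <a href="http://a.example.com/">site A</a>.</p>')]
    for row in links_from_posts(sample):
        print(row)
```

Writing the resulting rows back out as an RSS feed, the job of `special:output_rss`, is left out of the sketch.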

Because, sometimes, a GUI is too much.

Peter Forret
