Google is listening: searching audio
04 Oct 2004
There are rumours that Google will be rolling out search functionality for audio files. It is true that currently no sound files (.wav, .mp3, .wma, .mov, .ogg, …) can be found by Google’s “normal search”, except for the odd mis-indexed mp3 file.
Multimedia search is a fascinating topic, so let’s talk about audio for a moment:
INDEXING
- target = WAV, MP3, WMA, RAM, AIFF, MOV, OGG, … files. Playlist files (M3U, ASX, SMIL, …) would be essential to include too, since they are often the link between a web page and the audio it publishes, and they also link ‘related’ audio items to each other. Maybe also MIDI files.
- 1st level indexing: no file content is indexed, only the URL and the content of the HTML page that referred to it. This is what Google Images does.
- 2nd level indexing: many audio formats carry meta-data (artist, song, album, date, …) near the beginning of the file, as MP3 does with ID3v2 tags (the older ID3v1 tag sits in the last 128 bytes instead). In those cases the meta-data can often be retrieved by fetching only the first KB or so of the audio file, so one does not need to download the whole file.
- 3rd level indexing: the full content of the file is also converted to a text format and indexed. This is what Google currently does with PDF and DOC files. How does one convert audio to text? For spoken word, speech recognition comes to mind. An impressive example is HP Speechbot, which has converted 14,928 hours of radio programs into searchable text. For music files, one could use the lyrics as a text representation.
- Now that PODcasting (publishing MP3s of discussions, conferences, radio programs, … which are then included in RSS feeds and automatically downloaded by subscribed users) is becoming something of a hype (e.g. Adam Curry’s Daily Sourcecode), there will be a lot of information in audio-only format (I haven’t seen too many people also providing transcriptions; it’s a dirty job). For this kind of content, indexing must be done on the 3rd level, because 1st and 2nd level indexing reveal nothing about what is actually said in the recording.
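To make 2nd level indexing concrete, here is a minimal sketch of pulling artist, title and album out of the first bytes of an MP3 file. It assumes the file starts with an ID3v2.3 or ID3v2.4 tag without an extended header, and that the text frames fit inside the bytes that were fetched. The function name `read_id3v2` and the small frame table are illustrative choices, not part of any existing indexer.

```python
# Sketch only: reads a few ID3v2 text frames from the head of a file.
# Assumes ID3v2.3/2.4, no extended header, no unsynchronisation.

TEXT_FRAMES = {"TIT2": "title", "TPE1": "artist", "TALB": "album"}
ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def synchsafe(b: bytes) -> int:
    """Decode a 4-byte 'synchsafe' integer (7 useful bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def read_id3v2(head: bytes) -> dict:
    """Return the text frames found in `head`, the first bytes of a file."""
    if len(head) < 10 or head[:3] != b"ID3":
        return {}                      # no ID3v2 tag at the start
    major = head[3]
    if major not in (3, 4):
        return {}                      # other versions use another layout
    tag_end = min(10 + synchsafe(head[6:10]), len(head))
    pos, tags = 10, {}
    while pos + 10 <= tag_end:
        frame_id = head[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":
            break                      # reached the padding area
        raw_size = head[pos + 4:pos + 8]
        # v2.4 stores frame sizes synchsafe, v2.3 as a plain big-endian int
        size = synchsafe(raw_size) if major == 4 else int.from_bytes(raw_size, "big")
        body = head[pos + 10:pos + 10 + size]
        if len(body) < size:
            break                      # frame runs past the fetched bytes
        name = TEXT_FRAMES.get(frame_id.decode("latin-1"))
        if name and size > 0:
            encoding = ENCODINGS.get(body[0], "latin-1")
            tags[name] = body[1:].decode(encoding, errors="replace").rstrip("\x00")
        pos += 10 + size
    return tags
```

A crawler could obtain `head` with an HTTP request carrying a `Range: bytes=0-1023` header, on servers that honour range requests. Tags that embed cover art can be much larger than 1 KB, so a real indexer would probably need to fetch more bytes when the declared tag size says so.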
SEARCHING
- preview: what should the results of a search look like? Just text (artist, song, album, year, filesize) or also audio samples? To make the analogy with Google Images, provide a 30-second 28 Kbps MP3 preview file (which would be approx. 105 KB, since 30 s × 28 Kbit/s = 840 Kbit) for every audio file found.
- ranking: which search hits are shown first? Something like an audio SoundRank could be invented. My guess is that this is easier than the real Google PageRank: audio files are typically an endpoint and do not ‘cast votes’ for other URLs, so there is no recursion in calculating this SoundRank. Every page/playlist referring to an audio file would represent a vote. Maybe votes from high-PageRank sites should carry a heavier weight.
- Currently existing audio search engines:
  - Search on “Donna Summer”
    - AllTheWeb Audio Search: 104 results
    - Altavista Audio Search: 245 results
    - Espew.com: 53 results
    - Lycos Multimedia Search: 350 results
    - Singing Fish: 1867(!) results
  - More audio search engines can be found on Google Directory: MP3 Search Engines
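The SoundRank idea from the ranking item can be sketched in a few lines. This is only one possible reading of it: every distinct page or playlist that links to an audio file casts one vote, optionally weighted by that referrer's PageRank. The function name `sound_rank`, the `(referrer, audio_url)` input shape and the default weight of 1.0 are all assumptions made for illustration.

```python
from collections import defaultdict


def sound_rank(links, page_rank=None, default_weight=1.0):
    """Rank audio URLs by the (weighted) number of referrers.

    links      -- iterable of (referrer_url, audio_url) pairs
    page_rank  -- optional dict: referrer_url -> weight of its vote
    Returns a list of (audio_url, score), best first.
    """
    page_rank = page_rank or {}
    scores = defaultdict(float)
    # set() makes sure a referrer votes only once per audio file
    for referrer, audio_url in set(links):
        scores[audio_url] += page_rank.get(referrer, default_weight)
    # a single pass is enough: audio files cast no votes themselves
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


links = [
    ("a.example/page", "x.mp3"),
    ("b.example/list.m3u", "x.mp3"),
    ("a.example/page", "y.mp3"),
    ("a.example/page", "x.mp3"),   # duplicate link, counted once
]
ranking = sound_rank(links, page_rank={"a.example/page": 3.0})
```

Because the scores need no iteration to converge, they can be computed in one pass over the link data, which is the simplification compared to PageRank that the ranking item points at.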
SIDE REMARKS
- What are the legal implications of pointing people to content that you don’t know is legal? I don’t see Google setting up BitTorrent/ISO search soon.
- There is a technology called Shazam Music Recognition that lets you call a number, point your phone at the speakers while a song is playing and get the name/title/ringtone of that song on your mobile. Or this technology by the Fraunhofer Institute: query by humming.
- Judging by the PODcasts I’ve already heard, any speech recognition used on them would have to be extremely robust, because many are recorded over VoIP and suffer from its audio issues: low bandwidth, delay, glitches. Hopefully this will improve in the future. Skype, the PODcast edition?
Inspired by: oristus.com via Google Blogoscoped