Archive for the 'bandwidth' Category

Podcast hosting: cheap or free?


Podcasting is a fun hobby, but leaves you with several tens to hundreds of megabytes of MP3 files to host. If your podcast turns out to be popular, you might also have over 20GB of file downloads per month (‘bandwidth’). This rules out any free hosting option like Geocities or even your local ISP. What are the other options?

CCPublisher:

free
Creative Commons, together with Archive.org, offers to host your content for free. This is aimed at CC-licensed or open-source audio, so your own speech or your own music. Don’t use it to host illegal or copyright-troubled content.

idisk.mac.com:

$100/year (or about $8.33/month)
if you’re already a subscriber to Apple’s .Mac program, this is an easy option. It is not the fastest or most reliable option.

libsyn.com:

starts at $5/month (up to $30)
built for podcasting: based on the #MB you add per month, not on the #GB downloaded per month (so the cost is predictable). Has detailed statistics (although some graphics would be nice). “Liberated Syndication is podcasting made easy”

bluehost.com:

$6.95/month (2-year subscription)
2 GB storage, 75 GB/month bandwidth. Is a general purpose hoster, so if you want to add the actual podcast blog to it, you can (you can add a WordPress blog through the Fantastico interface)

EV1Servers VPS:

$39/month
for the bigger fish: 10GB of storage, 100GB/month bandwidth. If even this is not enough, you can go up to a $99/month fully dedicated server: 60GB storage, 1000GB bandwidth.

For up-to-date information, keep an eye on the podcasters Yahoo! group.

CD-to-MP3 ripping speed estimation

Like every sensible car owner in Brussels, I rip my CDs to MP3 so I can keep copies of them in my car. Like every self-respecting geek, I have multiple PCs at home. Which brings me to the following observation: not all PCs rip alike. On one PC the CPU maxes out at 100% for the whole ripping procedure, and on the other, I never get above 75%. So I started wondering: which factors determine the maximum ripping speed you can get on a PC?
My hunch:

the CD-ROM drive speed:

the original CD audio specification required a constant data rate. This was implemented by running the CD at 500 rpm for the first/inner tracks on the CD (ø 48mm) and at 200 rpm for the outer tracks (ø 118mm). If the CD would have been played at a constant 500 rpm, the data rate at the end would have been 500/200 = 2,5X. (cf Devnulled: Ripping speed)
With CD-ROM the data should be delivered as fast as possible, so the rotation speed is turned up as far as possible. The physical limits are the vibrations and the centrifugal forces that occur at high speeds. Maxwell claims the maximum safe speed is 48X. Since the “48X” is marketing speak, this speed is only obtained at the outer edge of the CD: this means that the rotation speed would be 48 x 200 = 9600 rpm. Some CDs seem to explode above 10.000 rpm.
To convert this speed into a data rate: at 9600 rpm, the outer tracks would deliver 48x the data rate of an audio CD: 67,74 Mbps or 8,47 MB/s. The first tracks, at ø 48mm, deliver data about 2,5 times slower: 27,52 Mbps or 3,44 MB/s.
Real-life tests of a whole bunch of drives on DAE speed results.
For the exact sizes: CD-R/CD-RW technical specifications
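The arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes the rated speed is reached at the outer edge and that the data rate scales linearly with the track diameter, and the function names are made up for this example.

```python
# An audio CD (1X) delivers 44,100 samples/s * 2 channels * 16 bits.
AUDIO_CD_BPS = 44_100 * 2 * 16   # 1,411,200 bit/s
INNER_DIAMETER_MM = 48           # diameters as quoted in the post
OUTER_DIAMETER_MM = 118


def x_speed_at(diameter_mm: float, rated_x: float) -> float:
    """X-speed of a constant-rpm drive at a given track diameter.

    Assumption: the rated speed applies at the outer edge and the data
    rate is proportional to the track diameter.
    """
    return rated_x * diameter_mm / OUTER_DIAMETER_MM


def mbps(x_speed: float) -> float:
    """Convert an X-speed into megabits per second (1 Mbit = 10**6 bits)."""
    return x_speed * AUDIO_CD_BPS / 1e6


for label, diameter in (("inner", INNER_DIAMETER_MM), ("outer", OUTER_DIAMETER_MM)):
    x = x_speed_at(diameter, 48)
    print(f"{label}: {x:.1f}X, {mbps(x):.2f} Mbps, {mbps(x) / 8:.2f} MB/s")
```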

the bus speeds:

the CD-ROM drive is connected to the PC by an ATAPI, SCSI, FireWire or USB connection. In theory there could also be a network in between (e.g. when using an Ethernet-connected CD jukebox).
The slowest ATA-33 has a theoretical max throughput of 33MB/s. Most modern SCSIs go above 20MB/s and FireWire gives 50 MB/s. So they would not be the bottleneck in the ripping process.
USB1.1 is limited to 1,5 MB/s (in practice even lower). Most common networks would be a bottleneck too: even Fast Ethernet, at a theoretical 12,5 MB/s, realistically tops out around 7 MB/s, certainly if the network is used for other stuff too. The same goes for WiFi standards: 802.11g’s advertised “54Mbps” will in real life never translate into an actual 6,75MB/s throughput.
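As a quick check, the figures quoted above can be compared against the 8,47 MB/s peak of a 48X drive. The dictionary below only restates numbers from this post; the labels and the function name are illustrative.

```python
DRIVE_PEAK_MB_S = 8.47  # outer-edge data rate of a 48X drive, from the post

# Throughput figures in MB/s as quoted in the post (not measured here).
LINKS_MB_S = {
    "ATA-33": 33.0,
    "SCSI (modern)": 20.0,
    "FireWire": 50.0,
    "USB 1.1": 1.5,
    "Fast Ethernet (realistic)": 7.0,
    "802.11g (advertised)": 6.75,
}


def bottlenecks(links: dict, required_mb_s: float) -> list:
    """Return the links that cannot sustain the required data rate."""
    return sorted(name for name, rate in links.items() if rate < required_mb_s)


print(bottlenecks(LINKS_MB_S, DRIVE_PEAK_MB_S))
```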

the CPU speed:

encoding raw audio data to MP3 is CPU intensive. Main parameter will be the clock speed – which I would expect to scale linearly: a 2GHz processor does it twice as fast as a 1GHz. Extra influences: brand of processor (Intel/AMD), model (Celeron/Pentium4/Athlon/Athlon64), number of processors (or HyperThreading). Also, the software you use to encode (LAME/GOGO/RealPlayer/Windows Media Player/…) will have an impact.
Some data can be found on GamePC.com: an Intel P4 3.06 GHz encodes 200MB of raw data into 160 kbps MP3 in 57 seconds: 3,5 MB/s or 20X. The AMD AthlonXP 2700+: 3,28 MB/s or 18,6X. More info on GamePC.com confirms our hunch that performance scales linearly with clock speed. For the Pentium4: 1,15 MB/s per GHz or 6,5X per GHz.

the MP3 bitrate:

the above numbers are for 160 kbps, but what about 192 kbps and 64 kbps? Is encoding faster or slower? I found no data on the net, and I haven’t tested it myself. So no hunch here. Also, the output of the encoding process, even at a very high quality 320kbps, is well within the capacity of any output, even Bluetooth, god forbid. So I don’t take that parameter into account.


So in the following situation:

  • a 24X CD-ROM drive
  • a Pentium 4 2,8GHz processor
  • ripping with the LAME encoder to 160 kbps

Your ripping will start at about 9,8X and speed up until your CPU is saturated at 18,2X, which gives the graphic at the right. Now there’s a rule of thumb.
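The rule can be written down as a small model: the ripping speed is whichever is lower, the drive speed at the current position or the CPU ceiling. This is a rough sketch under the post's own assumptions (6,5X per GHz for a Pentium 4, a drive whose speed grows linearly with track diameter); the function and parameter names are hypothetical.

```python
def rip_speed(position: float, drive_x: float, cpu_ghz: float,
              x_per_ghz: float = 6.5,
              inner_mm: float = 48, outer_mm: float = 118) -> float:
    """Estimated ripping speed (in X) at a position on the disc.

    position: 0.0 is the innermost track, 1.0 the outer edge. It is
    measured along the diameter, not in playing time (a simplification).
    """
    diameter = inner_mm + position * (outer_mm - inner_mm)
    drive_limit = drive_x * diameter / outer_mm   # drive gets faster outwards
    cpu_limit = cpu_ghz * x_per_ghz               # encoder ceiling
    return min(drive_limit, cpu_limit)


# The situation from the post: 24X drive, Pentium 4 at 2,8 GHz.
for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"position {pos:.2f}: {rip_speed(pos, 24, 2.8):.1f}X")
```

With these inputs the curve starts at the drive limit and flattens once the CPU limit is reached.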

Remark: looking at the benchmarks, adding a second processor (or HyperThreading) does not enhance the ripping speed (probably because the MP3 encoding code is not parallelised). But if you have 2 CPUs, only one CPU will go to 100% and you keep some breathing room while your PC is creating the MP3s.

Binary confusion: kilobytes and kibibytes

When I created my Bandwidth Calculator, easily the most popular web tool I ever made, I came across the following problem: in computer technology there is a habit of using kilobyte (KB) as 1024 bytes and megabyte (MB) as 1024*1024 (1.048.576) bytes. Most of you might think this is correct, but it’s not. The International System of Units (SI) (that defines the kilo, mega, giga, … and milli, micro, nano prefixes) uses only base 10 values. A kilo is always 1000, even for bytes. In order to find a solution for the IT ‘contamination’ of using kilo for 2^10 instead of 10^3, the IEC introduced new units in 1998:

In 1999, the International Electrotechnical Commission (IEC) published Amendment 2 to “IEC 60027-2: Letter symbols to be used in electrical technology – Part 2: Telecommunications and electronics”. This standard, which had been approved in 1998, introduced the prefixes kibi-, mebi-, gibi-, tebi-, pebi-, exbi-, to be used in specifying binary multiples of a quantity. The names come from the first two letters of the original SI prefixes followed by bi which is short for “binary”. It also clarifies that, from the point of view of the IEC, the SI prefixes only have their base-10 meaning and never have a base-2 meaning.
(from en.wikipedia.org)

So this is the correct usage for file, disk, memory size:

Decimal unit     Bytes        Binary unit      Bytes
Kilobyte (KB)    1.000        Kibibyte (KiB)   1024
Megabyte (MB)    1.000 ^ 2    Mebibyte (MiB)   1024 ^ 2
Gigabyte (GB)    1.000 ^ 3    Gibibyte (GiB)   1024 ^ 3
Terabyte (TB)    1.000 ^ 4    Tebibyte (TiB)   1024 ^ 4
Petabyte (PB)    1.000 ^ 5    Pebibyte (PiB)   1024 ^ 5

The problem is: the industry has not adopted these standards. If Windows shows the size of a disk, it converts 28.735.078.400 bytes to “26.7 GB”. It should be either 28.7 GB, or 26.7 GiB. Remember the 1.44MB floppy? It actually never existed: it is either 1.40MiB or 1.47MB.
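To make the difference concrete, here is a minimal sketch of a formatter that can print a byte count either way. The function name and the two-decimal rounding are choices made for this example; note that it writes the SI kilo prefix as a lower-case k.

```python
SI_PREFIXES = ["", "k", "M", "G", "T", "P"]        # powers of 1000
IEC_PREFIXES = ["", "Ki", "Mi", "Gi", "Ti", "Pi"]  # powers of 1024


def format_bytes(n: int, binary: bool = False) -> str:
    """Format a byte count with SI (base 1000) or IEC (base 1024) prefixes."""
    base = 1024 if binary else 1000
    prefixes = IEC_PREFIXES if binary else SI_PREFIXES
    value = float(n)
    index = 0
    # Step up one prefix at a time until the value fits below the base.
    while value >= base and index < len(prefixes) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {prefixes[index]}B"


disk = 28_735_078_400   # the disk size from the Windows example
floppy = 1_474_560      # the "1.44MB" floppy
for size in (disk, floppy):
    print(format_bytes(size), "=", format_bytes(size, binary=True))
```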

On September 18 2003 Reuters has reported that Apple, Dell, Gateway, Hewlett-Packard, IBM, Sharp, Sony and Toshiba have been sued in a class-action suit in Los Angeles Superior Court for “deceiving” the true capacity of their hard drives. This of course was due to ambiguity of “GB” when used by software and hardware vendors. This precedent might prompt Apple to adapt binary prefixes in its Mac OS, as well as other companies to put pressure on Microsoft to adapt them in its Windows operating systems.
from members.optus.net

One could argue: people have always used the MB = 1024*1024 for disk drives, why change now? Well, clarity and unambiguity are good reasons. NASA lost the Mars Climate Orbiter because one team’s software reported thruster impulse in English units (pound-force seconds) while the navigation software expected metric units (newton-seconds). Don’t even get me started on miles per gallon.

So: a disk of 160GB should have 160.000.000.000 bytes. And it is about 149GiB. Get over it.

How do you move a terabyte?


I recently discovered Brewster Kahle’s speech on the NotCon ‘04 podcast about the ambition of The Internet Archive to archive absolutely everything (all books, all movies, all music, …). (There is an excellent transcript on www.hotales.org .) They are currently setting up a second datacentre in Amsterdam, as an off-site copy of the original archive.org. They use massively parallel storage nodes grouped together in a PetaBox rack. You actually need 10 PetaBoxes to get to 1 Petabyte (1 rack = 80 servers x 4 disks x 300 GB/disk = 96 TB, so roughly 100 TB). Since the rack uses node-to-node replication (every node has a sister node that holds a copy of all its data, so that if one of the two nodes crashes, the data is still available), the net storage is 50TB.
So this got me thinking: how do you ‘copy’ the contents of PetaBox A to PetaBox B, how do you move 50TB?
Let’s try some numbers from my bandwidth calculator:

  • ISP: a regular ADSL/cable throughput is 1,3 TB/month. 50TB would take 38 months, about 3,2 years, which is a bit slow.
  • LAN: a 100Mbps connection can theoretically deliver 32 TB/month, but let’s take 15 TB/month as a reasonable real throughput: 50TB would take 3,3 months. Gigabit Ethernet can go to 10TB/day max, so might realistically enable a full transfer in 7 days.
  • WAN: a dedicated OC12 optical line delivers 6,72 TB/day. Even if in practice this would be only 3TB/day, this cuts the copy time to 17 days. With OC48, this goes up to a theoretical 26,4 TB/day, so the transfer is possible in something like 3-4 days.
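The figures in this list come from my bandwidth calculator, but the underlying arithmetic is simple enough to sketch. The line rates below are nominal values in Mbps (OC-12 and OC-48 rounded to 622 and 2488), and the efficiency parameter is only a guess at real-world throughput, not a measurement.

```python
SECONDS_PER_DAY = 86_400


def tb_per_day(link_mbps: float) -> float:
    """Theoretical throughput of a link in terabytes (10**12 bytes) per day."""
    bytes_per_second = link_mbps * 1e6 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e12


def days_to_move(terabytes: float, link_mbps: float,
                 efficiency: float = 1.0) -> float:
    """Days needed to move a payload over a link.

    efficiency is the assumed usable fraction of the line rate.
    """
    return terabytes / (tb_per_day(link_mbps) * efficiency)


for name, rate in (("Fast Ethernet", 100), ("Gigabit Ethernet", 1000),
                   ("OC-12", 622), ("OC-48", 2488)):
    print(f"{name}: {tb_per_day(rate):.2f} TB/day, "
          f"{days_to_move(50, rate):.1f} days for 50 TB at full line rate")
```

Passing something like efficiency=0.5 gives the more pessimistic estimates used in the list.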

Can this be done without a network connection? Can we tap the data out of one system, put it on some kind of transport and reload it at the new location? (see Microsoft’s Jim Gray who calls this a kind of ‘sneakernet‘)

  • use a third PetaBox C, set it up next to PetaBox A, connect them via Gigabit Ethernet, let them synchronize for 7 days, put PetaBox C on a truck/boat/plane, hope not too many disks are damaged during transport (this is the tricky bit: if both copies of a file are lost, you can start again), set it up next to PetaBox B and again let them replicate for a week. If the total procedure takes three weeks, you’ve just moved data at 2,38 TB/day or about 220 Mbps.
  • copy everything to Apple Xserve RAID systems. These have 14 disks of 400 GB, which is 5,6 TB/system unprotected storage, or (using RAID-5 per set of 7 disks) 4,8 TB. Since it uses a 400MB/s (3.2 Gbps) Fibre-Channel interface, the disk speed should not be a bottleneck. A system is filled over Gigabit Ethernet in a bit more than half a day (let’s say 1 week for all data), and you need 11 systems to store all data. Luckily, you can start shipping the first Apple RAID right after it’s filled, while the 2nd Xserve is still busy being filled. A fully equipped Xserve RAID weighs 45kg/110pounds, so you’ll need more than an envelope, but let’s say you could ship it anywhere in 2 days. Then the whole procedure will take 7 + 2 + 1 days = 10 days, which is a 5 TB/day or 463 Mbps transfer rate.
  • You could do something similar with Lacie Bigger Disk Extreme 1.6TB disks (although in my experience, this type of disk does not support continuous writing very well). Their bottleneck is probably the FireWire-800 write speed, which can be estimated at 25 MB/s or 90GB/hour. This means that it takes about 18 hours to fill a Bigger Disk 1.6TB. You could probably fill several disks at the same time, since the Gigabit Ethernet can easily deliver that. In total you would need at least 32 full disks, but since there is no redundancy on the disks, you would need a system to check if all objects were copied correctly on the target system. This you could do by exchanging lists of object identifiers, file sizes and hashes, probably in files that are ‘only’ megabytes. So let’s say you need 40 disks (some objects will be transferred a 2nd or 3rd time if they arrived in bad state). We can ship them in packages of 5 disks – that’s 8TB at a time. These 5 disks take something like 30 hours to load (if we can always load 3 disks simultaneously). Total procedure: 8 * 30hrs + 2 days shipping + 30hrs to load the last pack: 13,25 days or 3,7 TB/day (350 Mbps).
  • There is the Sun StorEdge L500 Tape Library that could back up the complete 50TB, using up to 400 LTO cartridges. Its speed is 126 GB/hour or about 3 TB/day. So it would easily take over a month to back up PetaBox A, ship the StorEdge and restore the data to PetaBox B. That’s less than 150 Mbps.
  • just for fun: you would need over 60.000 CD-ROMs to pack those 50 TB. Don’t even think about how long it would take to actually write them, or who would write their unique number on the sleeves. There are double-sided writable DVDs of 8,75 GB each. With about 5800 of them, you could do the job.
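Each of these ‘sneakernet’ scenarios boils down to the same conversion: total payload divided by total elapsed time. A minimal sketch follows, using the durations estimated in the list above; the scenario labels and the function name are mine.

```python
def effective_mbps(terabytes: float, total_days: float) -> float:
    """Average transfer rate in Mbps for a payload moved in total_days."""
    bits = terabytes * 1e12 * 8
    return bits / (total_days * 86_400) / 1e6


# Total durations in days, as estimated in the list above.
SCENARIOS = {
    "spare PetaBox (three weeks)": 21,
    "Xserve RAID relay": 10,
    "Lacie disks in packs of 5": 13.25,
}

for name, days in SCENARIOS.items():
    print(f"{name}: {50 / days:.2f} TB/day, {effective_mbps(50, days):.0f} Mbps")
```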

This exercise is only half of the picture, of course. I did not take into account bandwidth, system, media and shipping prices. But since the PetaBox has no public pricetag, I didn’t bother searching for the other ones. Maybe later.

It’s the latency, stupid!

While working on some bandwidth-related stuff (my bandwidth calculator), I came across an excellent article on “latency vs. bandwidth” by Stuart Cheshire. It was originally written in 1996, so it focuses a lot on modems, but Facts 1, 2 and 4 are still valid.

His points:

Fact One: Making more bandwidth is easy

You can just put enough slow connections in parallel to get a fast one.

Fact Two: Once you have bad latency you’re stuck with it

Parallel devices, compression, … nothing helps!

Fact Three: Current consumer devices have appallingly bad latency

Modems are evil (but now, with cable and ADSL, this is less of an issue)

Fact Four: Making limited bandwidth go further is easy

Compression and caching help a lot. (This article was written about the time MP3 was invented, but long before it became hugely popular. DivX came later, in 1999)

The following calculation is eye-opening:

# The distance from Stanford to Boston is 4320km.
# The speed of light in vacuum is 300 x 10^6 m/s.
# The speed of light in fibre is roughly 66% of the speed of light in vacuum.
# The speed of light in fibre is 300 x 10^6 m/s * 0.66 = 200 x 10^6 m/s.
# The one-way delay to Boston is 4320 km / 200 x 10^6 m/s = 21.6ms.
# The round-trip time to Boston and back is 43.2ms.
# The current ping time from Stanford to Boston over today’s Internet is about 85ms:
[cheshire@nitro]$ ping -c 1 lcs.mit.edu
PING lcs.mit.edu (18.26.0.36): 56 data bytes
64 bytes from 18.26.0.36: icmp_seq=0 ttl=238 time=84.5 ms

# So: the hardware of the Internet can currently achieve within a factor of two of the speed of light.
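The same calculation can be reproduced in a few lines. The sketch below uses the rounded 200 x 10^6 m/s figure and the single ping measurement from the quote; the function name is illustrative.

```python
FIBRE_SPEED_M_S = 200e6  # rounded speed of light in fibre, as in the quote


def round_trip_ms(distance_km: float, speed_m_s: float = FIBRE_SPEED_M_S) -> float:
    """Best-case round-trip time in milliseconds over a straight fibre path."""
    one_way_s = distance_km * 1000 / speed_m_s
    return 2 * one_way_s * 1000


ideal = round_trip_ms(4320)   # Stanford to Boston and back
measured = 84.5               # ping time quoted above
print(f"ideal {ideal:.1f} ms, measured {measured} ms, "
      f"ratio {measured / ideal:.2f}")
```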

Definitions of latency:

Latency, a synonym for delay, is an expression of how much time it takes for a packet of data to get from one designated point to another

techtarget.com

Latency is the time a message takes to traverse a system

wikipedia.org