Benchmark: cut characters in bash

Post #1 in this bash benchmark series, measuring the speed of common bash text manipulations.

Update [2022-04-08]: I have since developed bash benchmarking for both throughput (MB/s) and invocation speed (ops/sec) in my project, along with various other improvements, and the content of this article has been updated accordingly.

Cutting the first 20 characters from each line

Bash benchmarks

Imagine you want just the first 20 characters of a string, or of every line of a file. How would you do this in bash?

Using cut

cut -c1-20

Using awk

awk '{print substr($0,1,20)}'

Using bash variable

echo "${line:0:20}"
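To make the comparison concrete, here is a minimal sketch that runs all three methods on the same input and shows that they give the same result. The sample string and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample input (36 characters), not taken from the benchmark data.
line="abcdefghijklmnopqrstuvwxyz0123456789"

# 1. cut: keep characters 1 through 20 of each input line
with_cut=$(printf '%s\n' "$line" | cut -c1-20)

# 2. awk: substr() is 1-based, so start at 1 and take 20 characters
with_awk=$(printf '%s\n' "$line" | awk '{print substr($0,1,20)}')

# 3. bash parameter expansion: offset is 0-based, length is 20, no extra process
with_bash=${line:0:20}

# All three lines print: abcdefghijklmnopqrst
printf '%s\n' "$with_cut" "$with_awk" "$with_bash"
```

Note that the bash offset starts at 0, while `cut` and `awk` count from 1.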

Benchmark via pforret/bash_benchmarks

I will focus here on the speeds relative to each other. The absolute speeds depend on your machine, and my 2021 MacBook Pro M1 16” is quite fast. I’ve also run these benchmarks in an Ubuntu-on-Windows WSL1 environment, and that is way, way slower.

| method       | throughput   | invocation        |
|--------------|--------------|-------------------|
| awk          | 328 MB/s (!) | 247 ops/sec       |
| cut          | 43 MB/s      | 903 ops/sec       |
| ${line:0:20} | 8 MB/s       | 10638 ops/sec (!) |
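The two columns measure different workloads: throughput is one run over a large input, while invocation is many small calls on a single string. The following is a rough sketch of that difference using bash's `time` keyword. It is not the harness used by pforret/bash_benchmarks, and the function names, sample data, and loop counts are assumptions chosen only to keep it short.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the harness used by pforret/bash_benchmarks.

# Throughput-style workload: one awk process streams the whole file.
first20_awk_file() { awk '{print substr($0,1,20)}' "$1"; }

# Same job in pure bash: no external process, but a per-line read loop.
first20_bash_file() {
  local line
  while IFS= read -r line || [[ -n "$line" ]]; do
    printf '%s\n' "${line:0:20}"
  done < "$1"
}

# Hypothetical test data: 5000 lines of 36 characters each.
sample=$(mktemp)
awk 'BEGIN { for (i = 0; i < 5000; i++) print "abcdefghijklmnopqrstuvwxyz0123456789" }' > "$sample"

TIMEFORMAT='  %R s'   # report elapsed wall-clock time only
echo "whole file, awk:";  time first20_awk_file "$sample" > /dev/null
echo "whole file, bash:"; time first20_bash_file "$sample" > /dev/null

# Invocation-style workload: many tiny calls on a single short string.
line="abcdefghijklmnopqrstuvwxyz0123456789"
echo "200 calls, awk (one new process per call):"
time for ((i = 0; i < 200; i++)); do
  printf '%s\n' "$line" | awk '{print substr($0,1,20)}' > /dev/null
done
echo "200 calls, bash expansion (no new process):"
time for ((i = 0; i < 200; i++)); do
  printf '%s\n' "${line:0:20}" > /dev/null
done
```

The timings will differ per machine, so only compare the numbers from one run against each other.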

Some lessons from these benchmarks:

So what is my recommendation for cutting N chars off?

More info: pforret.github.io/bash_benchmarks/chars.html
