Benchmark: cut characters in bash

25 Mar 2022

Post #1 in this bash benchmark series, measuring the speed of common bash text manipulations.

Update: I have since developed bash benchmarking for both throughput (MB/s) and invocation speed (ops/sec) in my project, along with all kinds of other improvements, so the content of this article was updated on 2022-04-08.

Cutting the first 20 characters from each line

Bash benchmarks

Imagine you want just the first 20 characters of a string, or of every line of a file. How would you do this in bash?

Using `cut`

The most common way to do this is probably the cut program:

cut -c1-20
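As a minimal sketch of how that one-liner is typically fed, the example below pipes a single string through it and then runs it over a file. The helper name `first20_cut`, the sample text, and the temporary file are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep characters 1-20 of every line read from stdin.
first20_cut() {
  cut -c1-20
}

# One string: pipe it in.
printf '%s\n' 'abcdefghijklmnopqrstuvwxyz' | first20_cut

# Every line of a file: cut also accepts file names as arguments.
sample=$(mktemp)   # throwaway file, only for this example
printf '%s\n' 'abcdefghijklmnopqrstuvwxyz' 'short line' > "$sample"
cut -c1-20 "$sample"
rm -f "$sample"
```

Lines shorter than 20 characters pass through unchanged.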

Using `awk`

Another way is to use awk:

awk '{print substr($0,1,20)}'
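If the length should not be hard-coded, one option is to pass it in with awk's `-v` flag. This is only a sketch: the function name `first_n_awk` and the variable `n` are my own choices for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first N characters of every stdin line.
# N is handed to awk with -v, so the awk program itself stays generic.
first_n_awk() {
  awk -v n="$1" '{ print substr($0, 1, n) }'
}

printf '%s\n' 'abcdefghijklmnopqrstuvwxyz' 'short line' | first_n_awk 20
```

Note that `substr` counts from 1 in awk, so `substr($0, 1, n)` starts at the first character.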

Using bash variable

You could also cut the characters during variable expansion:

echo "${line:0:20}"
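For a single string the expansion is all you need. To apply it to every line of a file or stream, it has to sit inside a `while read -r line` loop, roughly like the sketch below. The function name `first20_bash` is hypothetical, and the `IFS=` and `|| [ -n "$line" ]` details are my own additions to keep whitespace and a last line without a trailing newline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: first 20 characters of each stdin line, in pure bash.
first20_bash() {
  local line
  # IFS= preserves leading/trailing whitespace; -r keeps backslashes literal.
  # The || test also handles a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "${line:0:20}"
  done
}

# Single string: no loop and no external program needed.
line='abcdefghijklmnopqrstuvwxyz'
echo "${line:0:20}"

# Multiple lines: the loop runs once per input line.
printf '%s\n' 'abcdefghijklmnopqrstuvwxyz' 'short line' | first20_bash
```

Unlike awk's `substr`, the offset in `${line:0:20}` counts from 0.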

Benchmark via pforret/bash_benchmarks

I will focus here on the speeds relative to each other. The absolute speeds depend on your machine, and my 2021 MacBook Pro M1 16” is quite fast. I’ve also run these benchmarks in an Ubuntu-on-Windows WSL1 environment, and that is wayyyyyy slower.

method	throughput	invocation
awk	328 MB/s (!)	247 ops/sec
cut	43 MB/s	903 ops/sec
${line:0:20}	8 MB/s	10638 ops/sec (!)

Some lessons from these benchmarks:

${line:0:20} has low throughput because of the whole while read -r line loop it needs, but a single invocation takes less than 100 microseconds (about 10,600 ops/sec), which makes it the fastest method by far when you have just one line of input.
awk’s throughput is about 7.6 times that of cut (328 MB/s vs 43 MB/s).
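To get a feel for the invocation numbers on your own machine, a rough timing loop like the one below can help. It is not the harness from pforret/bash_benchmarks, just a sketch: the function names, the run count of 100, and the sample string are all assumptions. A plausible reading of the gap is that cut and awk start a new process on every call while the expansion stays inside the shell, but I have not measured that separately.

```shell
#!/usr/bin/env bash
# Three hypothetical wrappers, each trimming one string to 20 characters.
via_cut()       { printf '%s\n' "$1" | cut -c1-20; }
via_awk()       { printf '%s\n' "$1" | awk '{ print substr($0, 1, 20) }'; }
via_expansion() { local s=$1; printf '%s\n' "${s:0:20}"; }

# Make bash's time keyword print only the elapsed wall-clock seconds.
TIMEFORMAT='%R seconds'

# Call one method `runs` times on the same input and report the total time.
# The label and the timing both go to stderr.
bench() {
  local method=$1 runs=$2 input=$3 i
  printf '%s, %d runs: ' "$method" "$runs" >&2
  time for ((i = 0; i < runs; i++)); do
    "$method" "$input" > /dev/null
  done
}

for m in via_cut via_awk via_expansion; do
  bench "$m" 100 'abcdefghijklmnopqrstuvwxyz'
done
```

Dividing the run count by the reported seconds gives an approximate ops/sec figure for each method.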

So what is my recommendation for keeping just the first N chars?

If you need the first chars of a small string, use ${string:0:<length>}.
If you need the first chars of each line in a larger txt/csv file, use awk.

More info: pforret.github.io/bash_benchmarks/chars.html

Peter Forret
