Benchmark: cut characters in bash
25 Mar 2022Post #1 in this bash benchmark series, measuring the speed of common bash text manipulations.
Update: I now have developed bash benchmarking for both throughput (MB/s) and invocation (ops/sec) speed in my project, combined with all kinds of other improvements, so the content in this article was updated [2022-04-08]
Cutting the first 20 characters from each line
Imagine you want just the first 20 characters of a string , or of every line of a file. How would you do this in bash
?
Using cut
- the most common way to do this, is probably with the program
cut
cut -c1-20
Using awk
- another way is to use
awk
: `
awk '{print substr($0,1,20)}'`
Using bash variable
- you could also do character cutting during variable expansion:
echo ${line:0:20}`
Benchmark via pforret/bash_benchmarks
I will focus here on the relative speeds compared to each other, the absolute speeds depend on your machine, and my 2021 MacBookPro M1 16” is quite fast. I’ve tested these benchmarks on a Ubuntu-on-Windows WSL1 environment, and that is wayyyyyy slower.
method | throughput | invocation |
---|---|---|
awk | 328 MB/s (!) | 247 ops/sec |
cut | 43 MB/s | 903 ops/sec |
${line:0:20} | 8 MB/s | 10638 ops/sec (!) |
Some lessons from these benchmarks:
${line:0:20}
has low throughput because of the wholewhile read -r line
construction, but the invocation speed is less than 100 nanoseconds (+-10600 ops/sec), which makes it the fastest method by far when you have just 1 line as input.awk
’s throughput speed is almost 10 times faster thancut
.
So what is my recommendation for cutting N chars off?
- if you need to cut the first chars of a small string, use
${string:0:<length>}
. - if you need to cut the first chars of each line in a larger txt/csv file, use
awk
More info: pforret.github.io/bash_benchmarks/chars.html