Benchmark: trimming whitespace in bash
25 Mar 2022
Post #3 in this bash benchmark series, measuring the speed of common bash text manipulations.
Update [2022-04-08]: my benchmark project now measures both throughput (MB/s) and invocation speed (ops/sec), along with all kinds of other improvements, so the content of this article has been updated.
Trimming leading/trailing whitespace
This is about removing whitespace at the beginning and at the end of each line, within a bash script.
using awk
This has been my go-to for trimming strings since I picked it up from stackoverflow.com. The long version there:
awk '
function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
function trim(s) { return rtrim(ltrim(s)); }
{ print trim($0) }
'
can be condensed to a one-liner:
Command: awk '{sub(/^[ \t\r\n]+/, ""); sub(/[ \t\r\n]+$/, ""); print}'
Before: ' This is sentence #1
And this is #2 '
After : 'This is sentence #1
And this is #2'
using sed
For the sed version, we need it to perform two substitutions in a row:
Command: sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
Before: ' This is sentence #1
And this is #2 '
After : 'This is sentence #1
And this is #2'
using xargs
Trimming is not the main purpose of xargs, but one of its side effects can be used to remove whitespace. However, it does not work well on multi-line input, as it joins all lines into one.
Command: xargs
Before: ' This is sentence #1
And this is #2 '
After : 'This is sentence #1 And this is #2' ## on 1 line instead of 2
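A minimal sketch of the side effect at work, using a made-up sample string: with no command given, xargs splits its input into words and passes them to echo, which joins them with single spaces. That also means runs of whitespace inside the line are collapsed, and because xargs treats quotes and backslashes as special, a line containing an unbalanced quote (an apostrophe, for instance) makes it fail.

```shell
# Sample input (hypothetical): padded on both sides, extra spaces in the middle
input="   This is   sentence #1   "

# xargs re-splits the line into words and echoes them back, single-spaced
trimmed="$(printf '%s\n' "$input" | xargs)"

printf '[%s]\n' "$trimmed"
# prints: [This is sentence #1]
```

Note that the three spaces between "is" and "sentence" are gone too, so this only fits when inner whitespace does not matter.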
using php
PHP has a trim() function built in, so we can use it in a while loop:
Command: php -r 'while($f = fgets(STDIN)){ printf("%s\n", trim($f)) ; }'
Before: ' This is sentence #1
And this is #2 '
After : 'This is sentence #1
And this is #2'
using variable expansion
There is a way to do this in pure bash via variable expansion.
trim() {
  local var="$*"
  # remove leading whitespace characters
  var="${var#"${var%%[![:space:]]*}"}"
  # remove trailing whitespace characters
  var="${var%"${var##*[![:space:]]}"}"
  printf '%s' "$var"
}
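The nested expansions are easier to follow when split into steps. The sketch below does the same work with two intermediate variables (`leading` and `trailing` are names made up for this illustration) on a sample string.

```shell
var=$'  \t hello world  '

# %% removes the longest suffix that starts with a non-whitespace character,
# so only the leading whitespace is left over
leading="${var%%[![:space:]]*}"
# strip that exact prefix (quoted, so it is matched literally, not as a pattern)
var="${var#"$leading"}"

# ## removes the longest prefix that ends with a non-whitespace character,
# so only the trailing whitespace is left over
trailing="${var##*[![:space:]]}"
# strip that exact suffix
var="${var%"$trailing"}"

printf '[%s]\n' "$var"
# prints: [hello world]
```

Whitespace inside the string is left untouched, unlike with xargs.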
I’ve added this to my benchmark as follows:
Command: '$(line="${line#"${line%%[![:space:]]*}"}"; echo "${line%"${line##*[![:space:]]}"}")'
Before: ' This is sentence #1
And this is #2 '
After : 'This is sentence #1 And this is #2' ## also this method cannot manage multiline inputs
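One possible way to apply the expansion to multi-line input is to loop over the lines with read. This is only a sketch, not part of the benchmark: `trim_lines` is a hypothetical helper name, and since every line is handled inside the shell it is likely to be slow on large inputs.

```shell
# Hypothetical helper: trim every line read from stdin, in pure bash
trim_lines() {
  local line
  # IFS= and -r stop read from trimming or unescaping the line itself;
  # the || test keeps a final line that lacks a trailing newline
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line#"${line%%[![:space:]]*}"}"   # drop leading whitespace
    line="${line%"${line##*[![:space:]]}"}"   # drop trailing whitespace
    printf '%s\n' "$line"
  done
}

printf '  This is sentence #1  \n\tAnd this is #2 \n' | trim_lines
# prints:
# This is sentence #1
# And this is #2
```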
Benchmark via pforret/bash_benchmarks
I will focus here on the speeds relative to each other; the absolute speeds depend on your machine, and my 2021 MacBook Pro M1 16” is quite fast. I’ve also run these benchmarks in an Ubuntu-on-Windows WSL1 environment, and that is wayyyyyy slower.
method | throughput | invocation |
---|---|---|
awk | 312 MB/s | 255 ops/sec |
sed | 56 MB/s | 903 ops/sec |
xargs | 21 MB/s | 623 ops/sec |
php | 267 MB/s | 61 ops/sec |
${line#…} | 5 MB/s | 1650 ops/sec |
Some lessons from these benchmarks:
- invocation speed for awk (about 250 ops/sec), sed (about 900 ops/sec) and php (about 60 ops/sec) remains stable across most of these bash benchmarks. It will be interesting to see if this changes when e.g. awk gets a lot of instructions, as we will see with romanization of text (soon).
- the last method, variable expansion, might be fast to execute for 1 line, but no one remembers such a complex command. It’s hard to beat the simplicity of $(<<< $input xargs)
So what is my recommendation for trimming spaces from lines?
- if you need to trim a single line of text, use xargs. It’s just so easy.
- if you need to trim all lines in a file, use awk.
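As a sketch, the two recommendations could be wrapped in small helpers for use in a script. The function names `trim_line` and `trim_file` are made up for this example and do not come from the benchmark project.

```shell
# Hypothetical helper: trim one line of text with xargs.
# Caveat: xargs also collapses inner whitespace and fails on unbalanced quotes.
trim_line() {
  xargs <<< "$1"
}

# Hypothetical helper: trim every line of a file with the awk one-liner
trim_file() {
  awk '{ sub(/^[ \t\r\n]+/, ""); sub(/[ \t\r\n]+$/, ""); print }' "$1"
}

name="$(trim_line "   John Smith   ")"
printf '[%s]\n' "$name"
# prints: [John Smith]
```

For a file, something like `trim_file input.txt > output.txt` would write the trimmed lines to a new file (both file names are placeholders).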