Benchmark: trimming whitespace in bash

25 Mar 2022

Post #3 in this bash benchmark series, measuring the speed of common bash text manipulations.

Update: I now have developed bash benchmarking for both throughput (MB/s) and invocation (ops/sec) speed in my project, combined with all kinds of other improvements, so the content in this article was updated [2022-04-08]

Trimming leading/trailing whitespace

Bash benchmarks

This is about the removing of spaces in the beginning and at the end of a line, within a bash script.

using `awk`

This has been my go-to for trimming strings, after I picked it up from stackoverflow.com. The long version there

awk '
    function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
    function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
    function trim(s) { return rtrim(ltrim(s)); }
    {return trim($0)}
'

can be condensed to a one-liner:

Command: awk '{sub(/^[ \t\r\n]+/, ""); sub(/[ \t\r\n]+$/, ""); print}'
Before: '   This is sentence #1
       And this is #2    '
After : 'This is sentence #1
And this is #2'

using `sed`

for the sed version, we need to it to perform 2 substitute operations in a row:

Command: sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
Before: '   This is sentence #1
       And this is #2    '
After : 'This is sentence #1
And this is #2'

using `xargs`

It is not the main function of xargs, but one side effect it has can be used to remove whitespace. However, it does not work well on multi-line inputs, as it will join all lines in 1.

Command: xargs
Before: '   This is sentence #1
       And this is #2    '
After : 'This is sentence #1 And this is #2' ## on 1 line instead of 2

using `php`

PHP has a trim() function built in, so we can use it in a while loop:

Command: php -r 'while($f = fgets(STDIN)){ printf("%s\n", trim($f)) ; }'
Before: '   This is sentence #1
       And this is #2    '
After : 'This is sentence #1
And this is #2'

using variable expansion

There is a way to do this in pure bash via variable expansion.

trim() {
    local var="$*"
    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"
    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"   
    printf '%s' "$var"
}

I’ve added this to my benchmark as follows:

Command: '$(line="${line#"${line%%[![:space:]]*}"}"; echo "${line%"${line##*[![:space:]]}"}")'
Before: '   This is sentence #1
       And this is #2    '
After : 'This is sentence #1        And this is #2' ## also this method cannot manage multiline inputs

Benchmark via pforret/bash_benchmarks

I will focus here on the relative speeds compared to each other, the absolute speeds depend on your machine, and my 2021 MacBookPro M1 16” is quite fast. I’ve tested these benchmarks on a Ubuntu-on-Windows WSL1 environment, and that is wayyyyyy slower.

method	throughput	invocation
awk	312 MB/s	255 ops/sec
sed	56 MB/S	903 ops/sec
xargs	21 MB/s	623 ops/sec
php	267 MB/s	61 op/sec
${line#…}	5 MB/s	1650 ops/sec

Some lessons from these benchmarks:

invocation speed for awk (+- 250 ops/sec), sed (+- 900 ops/sec) and php (+- 60 ops/sec) remains stable for most of these bash benchmarks. It will be interesting to see if this changes if e.g. awk get a lot of instructions, like we will see with romanization of text (soon).
the last variable expansion method might be fast to execute for 1 line, but no one remembers such a complex command. It’s hard to beat the simplicity of $(<<< $input xargs)

So what is my recommendation for trimming spaces from lines?

if you need to trim a single line of text, use xargs. It’s just so easy.
if you need to trim all lines in a file, use awk

Peter Forret

Benchmark: trimming whitespace in bash

Trimming leading/trailing whitespace

using awk

using sed

using xargs

using php