Benchmark: trimming whitespace in bash

Post #3 in this bash benchmark series, measuring the speed of common bash text manipulations.

Update: I now have developed bash benchmarking for both throughput (MB/s) and invocation (ops/sec) speed in my project, combined with all kinds of other improvements, so the content in this article was updated [2022-04-08]

Trimming leading/trailing whitespace

Bash benchmarks

This is about the removing of spaces in the beginning and at the end of a line, within a bash script.

using awk

This has been my go-to for trimming strings, after I picked it up from stackoverflow.com. The long version there

awk '
    function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
    function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
    function trim(s) { return rtrim(ltrim(s)); }
    {return trim($0)}
'

can be condensed to a one-liner:

Command: awk '{sub(/^[ \t\r\n]+/, ""); sub(/[ \t\r\n]+$/, ""); print}'
Before: '   This is sentence #1
       And this is #2    '
After : 'This is sentence #1
And this is #2'

using sed

for the sed version, we need to it to perform 2 substitute operations in a row:

Command: sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
Before: '   This is sentence #1
       And this is #2    '
After : 'This is sentence #1
And this is #2'

using xargs

It is not the main function of xargs, but one side effect it has can be used to remove whitespace. However, it does not work well on multi-line inputs, as it will join all lines in 1.

Command: xargs
Before: '   This is sentence #1
       And this is #2    '
After : 'This is sentence #1 And this is #2' ## on 1 line instead of 2

using php

PHP has a trim() function built in, so we can use it in a while loop:

Command: php -r 'while($f = fgets(STDIN)){ printf("%s\n", trim($f)) ; }'
Before: '   This is sentence #1
       And this is #2    '
After : 'This is sentence #1
And this is #2'

using variable expansion

There is a way to do this in pure bash via variable expansion.

trim() {
    local var="$*"
    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"
    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"   
    printf '%s' "$var"
}

I’ve added this to my benchmark as follows:

Command: '$(line="${line#"${line%%[![:space:]]*}"}"; echo "${line%"${line##*[![:space:]]}"}")'
Before: '   This is sentence #1
       And this is #2    '
After : 'This is sentence #1        And this is #2' ## also this method cannot manage multiline inputs

Benchmark via pforret/bash_benchmarks

I will focus here on the relative speeds compared to each other, the absolute speeds depend on your machine, and my 2021 MacBookPro M1 16” is quite fast. I’ve tested these benchmarks on a Ubuntu-on-Windows WSL1 environment, and that is wayyyyyy slower.

method throughput invocation
awk 312 MB/s 255 ops/sec
sed 56 MB/S 903 ops/sec
xargs 21 MB/s 623 ops/sec
php 267 MB/s 61 op/sec
${line#…} 5 MB/s 1650 ops/sec

Some lessons from these benchmarks:

So what is my recommendation for trimming spaces from lines?

💬 bash 🏷 benchmark 🏷 sed 🏷 tr 🏷 awk 🏷 bash-benchmark