Benchmark: MacOS tr vs GNU gtr

Post #6 in this bash benchmark series, measuring the speed of common bash text manipulations.

This one is a bit different. I will specifically be looking at the difference in performance and behaviour between the factory installed tr (/usr/bin/tr) and the optionally installable GNU version of the same tool, gtr.

If you want to try this on your Mac, install with brew install coreutils (and maybe also gnu-sed, gawk … )

Bash benchmarks

I installed the GNU coreutils on my Mac because I noticed that there were some –flags I couldn’t use with the native version. The first comparison I did between tr and gtr was very promising.

using tr

Command: 'tr [:upper:] [:lower:]'
Before: 'LOREM IPSUM DOLOR SIT AMET, CONSECTETUR ADIPISCING ELIT'
After : 'lorem ipsum dolor sit amet, consectetur adipiscing elit'  (LANG = en_US.UTF-8)

using gtr

Command: 'gtr [:upper:] [:lower:]'
Before: 'LOREM IPSUM DOLOR SIT AMET, CONSECTETUR ADIPISCING ELIT'
After : 'lorem ipsum dolor sit amet, consectetur adipiscing elit'  (LANG = en_US.UTF-8)

This means gtr is 40x faster in throughput than the standard tr!

However, my enthusiasm quickly subsided when I did the following test.

Command: 'gtr [:upper:] [:lower:]'
Before: 'ŁORÈM ÎPSÙM DÔLÕR SIT AMÉT ŒßÞ'
After : '�or�m �ps�m d�l�r sit am�t ���'  (LANG = en_US.UTF-8)

It seems the GNU version of tr does not support accented characters and diacritics. In fact, the GNU tr page explicitly says that “GNU tr fully supports only safe single-byte locales, where each possible input byte represents a single character”. No Multi-byte, so no UTF-8. This makes it unusable for most of my bash benchmarks.

Benchmark via pforret/bash_benchmarks

method throughput invocation
tr 25 MB/s (!) 838 ops/sec
gtr 1020 MB/s 750 ops/sec

So what is my recommendation for tr or gtr?

💬 bash 🏷 benchmark 🏷 tr 🏷 bash-benchmark