tl;dr

On my machine the GPU-powered terminals do not live up to their performance claims: xterm generally outperforms both kitty and alacritty in the adjusted benchmarks, and bash is substantially faster than fish in the reference benchmarks.

Introduction

Recently I’ve been reading some buzz about GPU-powered terminal emulators (“terminals”), in particular kitty and alacritty. Their developers claim that these terminals perform significantly better than traditional counterparts such as xterm, konsole, and st. Unfortunately the justification and characterisation of these claims is rather lacking: I don’t know how kitty is benchmarked, and I believe that alacritty is benchmarked using the developers’ vtebench tool, which is markedly lacking in rigour. Hence here I present a scheme for benchmarking terminals, together with results obtained on my rig.

The performance claims are poorly characterised. Alacritty, for example, claims

…it should be faster than any other terminal emulator available.

which is rather vague. I infer that the claimed performance lies in the responsiveness of the terminal: on a high-performing terminal, commands execute quickly, and output appears on screen and scrolls rapidly. Consider a low-performing terminal, such as the TTY, which can take a second or more to refresh the screen when new output is produced. Hence, to benchmark a terminal, essentially all I need to do is record how long it takes to output some fixed amount of data to it. In order to be rigorous I need to:

  1. Produce output many times, in order to fill up the screen;
  2. Subtract the cost of simply running an executable; and
  3. Repeat the measurement many times and take some simple statistics.

A brief perusal of vtebench shows that it lacks this rigour. One could naturally extend this scheme with more complicated printing recipes and more powerful statistics. Additionally, I suspect that one’s choice of shell may be significant. Hence I implement this scheme as both fish and bash scripts, and compare the results.
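The bash variant of the scheme can be sketched roughly as follows. This is a minimal illustration of the idea, not the actual benchmark scripts: the function name, the 80-character test line, and the timing method are my own choices.

```shell
#!/usr/bin/env bash
# One benchmark: print a fixed line to the target many times, recording
# the wall-clock time of each print in milliseconds (one sample per line).
bench() {
    local runs=$1 target=$2
    local line start end
    line=$(printf 'x%.0s' {1..80})    # a fixed 80-character line
    for ((i = 0; i < runs; i++)); do
        start=$(date +%s%N)           # nanoseconds since the epoch (GNU date)
        echo "$line" > "$target"
        end=$(date +%s%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", (e - s) / 1e6 }'
    done
}

# Reference case: output discarded. Swapping /dev/null for /dev/stderr
# gives the unadjusted case, where output actually reaches the terminal.
bench 500 /dev/null
```

Running `bench 500 /dev/stderr` inside the terminal under test then yields the unadjusted samples.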

Results

Here I present plots showing the time each benchmark takes over many runs. There are six benchmarks, in two cases (groups): in one case I use the built-in echo to produce output, and in the other I use the external lessecho. For each case I first run a reference benchmark of the form echo > /dev/null, i.e. how long does it take to execute the command while ignoring output? This benchmark is run 500 times. I then run a benchmark of the form echo > /dev/stderr (I use stderr for technical reasons, so that I can record times; see the appendix), again 500 times. Finally I subtract the median execution time of the reference benchmark from each measurement of the second, producing an adjusted benchmark which captures only the time to print to the terminal. All benchmarks are then repeated for each terminal. Here I compare xterm, kitty, and alacritty, and all benchmarks are run in tmux (hence the benchmarks closely mirror actual use).
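The adjustment step can be sketched as a small shell pipeline. This is my own illustration of the subtraction, with made-up sample numbers; the real scripts and data are linked in the appendix.

```shell
# Median of newline-separated numbers on stdin.
median() {
    sort -n | awk '
        { a[NR] = $1 }
        END {
            if (NR % 2) print a[(NR + 1) / 2]
            else        print (a[NR / 2] + a[NR / 2 + 1]) / 2
        }'
}

# Reference samples (ms): cost of running echo with output discarded.
m=$(printf '12.1\n11.9\n12.4\n' | median)

# Adjusted samples: subtract the reference median from each
# unadjusted measurement (ms), leaving only the cost of printing.
printf '26.0\n25.5\n27.1\n' | awk -v m="$m" '{ printf "%.1f\n", $1 - m }'
# prints 13.9, 13.4, 15.0
```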

xterm

My primary terminal is the venerable xterm. It’s a bit of a pain in the arse to configure, but works quite well and subjectively feels fast. First consider the performance of the built-in echo.

Reference benchmark for executing built-in echo on xterm, in fish and bash. Note the significant performance difference: it would be interesting to compare to the latest (v3) fish version; there may be some work to be done in optimising fish.
Unadjusted benchmark for executing built-in echo on xterm, in fish and bash. Note the times are a little higher than in the reference, so it appears there is some cost to printing to the terminal.
Adjusted benchmark for executing built-in echo. Both fish and bash have very similar performance, although the spread for fish is markedly greater. This may indicate room for optimisation in fish. The similar performance also indicates that the approach is reasonable; intuitively, the time to print to the terminal should be independent of shell. Note the median time of ~14 ms is close to one 60 Hz frame period, and well below the 100 ms threshold suggested for UI performance.[1]

Now consider the external lessecho.

Reference benchmark for executing the external lessecho on xterm, in fish and bash. Note the similar performance: this seems an obvious result given the nature of executables and shells. fish does appear slightly faster.
Unadjusted benchmark for executing the external lessecho on xterm, in fish and bash. Again, note that the times are a little higher than in the reference. It seems unusual to me that fish is noticeably slower too.
Adjusted benchmark for executing the external lessecho. The performance difference in the unadjusted benchmark is reflected here. This is unsettling: perhaps there is a flaw in the benchmark, or some unusual behaviour in the implementation of lessecho. Note the median time is much closer to 100 ms, so here the performance is not so high.

alacritty

First consider the performance of the built-in echo.

Reference benchmark for executing built-in echo on alacritty, in fish and bash. Note the significant performance difference: it would be interesting to compare to the latest (v3) fish version; there may be some work to be done in optimising fish.
Unadjusted benchmark for executing built-in echo on alacritty, in fish and bash. Note the times are a little higher than in the reference, so it appears there is some cost to printing to the terminal. Again, there is also significant spread in the fish results.
Adjusted benchmark for executing built-in echo. Both fish and bash have similar performance, and again the spread for fish is markedly greater. From a UX perspective, this spread is a little concerning: consistent performance matters too. Note the bash median time of ~20 ms is close to one 50 Hz frame period, and well below the 100 ms threshold suggested for UI performance. Given the spread of the fish results, no such frame-rate conversion is justifiable for fish; its median is strictly below the 100 ms threshold, but not by much. Importantly, alacritty is slower than xterm in this benchmark.

Now consider the external lessecho.

Reference benchmark for executing the external lessecho on alacritty, in fish and bash. Note the similar performance: this seems an obvious result given the nature of executables and shells. bash does appear slightly faster, inverting the xterm result.
Unadjusted benchmark for executing the external lessecho on alacritty, in fish and bash. Again, note that the times are a little higher than in the reference. It seems unusual to me that fish is noticeably slower too. Additionally, there is some slight drift in the bash results. Perhaps some background process is influencing these results?
Adjusted benchmark for executing the external lessecho. The performance difference in the unadjusted benchmark is reflected here. This is unsettling: perhaps there is a flaw in the benchmark, or some unusual behaviour in the implementation of lessecho. Note the median time is much closer to 100 ms, so here the performance is not so high. Again, alacritty is slower than xterm in this benchmark.

kitty

First consider the performance of the built-in echo.

Reference benchmark for executing built-in echo on kitty, in fish and bash. Note the significant performance difference: it would be interesting to compare to the latest (v3) fish version; there may be some work to be done in optimising fish.
Unadjusted benchmark for executing built-in echo on kitty, in fish and bash. Note the times are a little higher than in the reference, so it appears there is some cost to printing to the terminal. The spread in the fish results is less significant here.
Adjusted benchmark for executing built-in echo. Both fish and bash have similar performance, and again the spread for fish is markedly greater. From a UX perspective, this spread is a little concerning: consistent performance matters too. Note the bash median time of ~20 ms is close to one 50 Hz frame period, and well below the 100 ms threshold suggested for UI performance. Given the spread of the fish results, no such frame-rate conversion is justifiable for fish, though its median is strictly below the 100 ms threshold, and significantly so. Importantly, kitty has similar performance to xterm in this benchmark.

Now consider the external lessecho.

Reference benchmark for executing the external lessecho on kitty, in fish and bash. Note the similar performance: this seems an obvious result given the nature of executables and shells. fish does appear slightly faster, similar to the xterm result.
Unadjusted benchmark for executing the external lessecho on kitty, in fish and bash. Again, note that the times are a little higher than in the reference. It seems unusual to me that fish is noticeably slower too. Additionally, there is some slight drift in both the bash and fish results. Perhaps some background process is influencing these results? The spread is also significant in both cases.
Adjusted benchmark for executing the external lessecho. The performance difference, spread, and drift in the unadjusted benchmark are reflected here. This is unsettling: perhaps there is a flaw in the benchmark, or some unusual behaviour in the implementation of lessecho. Note the median time is relatively poor: this is the worst-performing case. Again, kitty is slower than xterm in this benchmark.

Discussion

I’m particularly surprised by the difference between fish and bash performance. This is something to look into further; I’d be interested in feedback on this point. The rest of the results are fairly reasonable, and there may be some value in repeating the benchmarks on other systems. Note that a single benchmark may take more than 15 minutes to run.

Summary

Table: Median adjusted benchmark times (ms) for each terminal (an adjusted benchmark is the time to print a line to the terminal, 500 times, minus the time to execute echo)

       xterm         kitty         alacritty
bash   14.09±2.639   20.51±3.768   18.9±2.547
fish   15.32±10.84   0.1433±10.8   45.44±15.8


Table: Median adjusted benchmark times (ms) for each terminal (an adjusted benchmark is the time to print a line to the terminal, 500 times, minus the time to execute lessecho)

       xterm        kitty         alacritty
bash   70.6±14.87   111.2±29.14   136.1±15.59
fish   183.9±15.6   243.2±21.73   243.8±23.99

Appendices

Environment

All benchmarks were taken on my ThinkPad T410 (NVidia GT218M/NVS 3100M, i5 520M). All benchmarks were run in tmux with no other interactive processes running. Each benchmark was run 1000 times. xterm has only cosmetic configuration; alacritty and kitty have default configuration. I run fish v2.7.1, bash 4.4.18, xterm 337, kitty v0.12.3, and alacritty 0.2.2. I run Funtoo with kernel v4.19.1; my kernel is configured very close to Debian’s.

Scripts and data

All scripts and data are here.


  1. See, for example: https://stackoverflow.com/q/536300, https://stackoverflow.com/q/6880856, https://www.reddit.com/r/askscience/comments/ue3mm/whats_the_shortest_amount_of_time_we_can_percieve/