Do GPU powered terminal emulators perform better than traditional terminal emulators?

Recently I’ve been reading some buzz about GPU powered terminal emulators (“terminals”) — in particular kitty and alacritty. The developers claim that these terminals perform significantly better than traditional counterparts, such as xterm, konsole, st etc. Unfortunately the justification and characterisation of these claims is rather lacking; I don’t know how kitty is benchmarked, and I believe that alacritty is benchmarked using the developers vtebench tool. The latter is markedly lacking in rigour. Hence here I present a scheme for benchmarking terminals and present some results calculated on my rig.

which is rather vague. I infer that the claimed performance is in the responsiveness of the terminal. I expect that on a high-performing terminal commands are executed quickly, and output is produced on screen and scrolls rapidly. Consider a low-performing terminal, such as the TTY: this terminal takes a second or more to refresh the screen when new output is produced. Hence to benchmark a terminal essentially all I need to do is record how long it takes to output some fixed amount of data to the terminal. In order to be rigorous I need to:

A brief perusal of vtebench shows that it lacks this rigour. One can naturally extend this scheme with more complicated printing recipes, and more powerful statistics. Additionally, I intuit that ones choice of shell may be significant. Hence I implement this scheme as both fish and bash scripts, and compare the results.

Results

Here I present plots showing the time it takes for a benchmark to run, for many runs. There are six benchmarks, in two cases (groups): in one case I use the built-in echo to produce output, and in the other I use the external lessecho. For each case I run a reference benchmark of the form echo > /dev/null i.e. how long does it take to execute the command, ignoring output? This benchmark is run 500 times. I then run a benchmark of the form echo > /dev/stderr (I use stderr for technical reasons, so I can record times; see appendix). I run this benchmark 500 times. Finally I take the median execution time (from the first benchmark) from the second, producing an adjusted benchmark which considers only the time to print to the terminal. All benchmarks are then repeated for each terminal. Here I compare xterm, kitty, and alacrityy, and all benchmarks are run in tmux (hence the benchmarks closely mirror actual use).

xterm

My primary terminal is the venerable xterm. It’s a bit of a pain in the arse to configure, but works quite well and subjectively has high-performance. First consider the performance of the built-in echo.

alacritty

kitty

Discussion

I’m particularly surprised by the difference in fish and bash performance. This is something to look further into; I’d be interested in getting some feedback on this point. The rest of the results are fairly reasonable, and there may be some value in repeating the benchmarks on other systems. Note that it may take >15 min to run a benchmark.

Summary

Median adjusted benchmark times for each terminal (an adjusted benchmark is the time to print a line to the terminal, 500 times, minus time to execute `echo`)
	xterm	kitty	alacritty
bash	14.09±2.639	20.51±3.768	18.9±2.547
fish	15.32±10.84	0.1433±10.8	45.44±15.8

Table: Median adjusted benchmark times for each terminal (an adjusted benchmark is the time to print a line to the terminal, 500 times, minus time to execute lessecho)

Appendices

Environment

All benchmarks were taken on my ThinkPad T410 (NVidia GT218M/NVS 3100M, i5 520M). All benchmarks were run in tmux with no other interactive processes running. Each benchmark was run 1000 times. xterm has only cosmetic configuration; alacritty and kitty have default configuration. I run fish v2.7.1, bash 4.4.18, xterm 337, kitty v0.12.3, alacritty 0.2.2. I run Funtoo with kernel v4.19.1. My kernel is configure very close to Debian’s.

Scripts and data

	xterm	kitty	alacritty
bash	70.6±14.87	111.2±29.14	136.1±15.59
fish	183.9±15.6	243.2±21.73	243.8±23.99

See, for example: https://stackoverflow.com/q/536300, https://stackoverflow.com/q/6880856, https://www.reddit.com/r/askscience/comments/ue3mm/whats_the_shortest_amount_of_time_we_can_percieve/↩