7-Zip LZMA Benchmark

ISA CPU Threads Frequency (MHz) Compressing (MIPS) Decompressing (MIPS)
ARM
Apple M1
8 cores
7-Zip 21.03
clang-12 -O2
1 3200 7091 8068
2 3100 20728 15293
4 3000 36372 29051
8 4x 3000
4x 2064
50365 45009
Apple A12Z
8 cores
p7zip 16.02
1 2500 4410 3640
2 2380 10530 6860
4 2325 19930 13270
8 4x 2325
4x 1538
29510 20310
Apple A9 (Twister)
2 cores
1850 MHz (1 core)
1800 MHz (2 cores) ?
clang-4.0 -O3
1 arm32-thumb2 1940 1900
arm32 2100 1960
arm64-lsl 2500 2510
1 arm64 2480 2430
2 5760 4690
4 6380 4570
HiSilicon SD5113 (ARM11) 530 113 283
Marvell XScale PXA270 520 85 270
Marvell Kirkwood 88F6281 (SheevaPlug) 1200 385 710
Qualcomm QSD8250 (Snapdragon) 1000 430 700
Qualcomm Krait 300
MSM8230AB
Snapdragon 400
2 cores
1 1728 850 1480
2 1350 2900
4 1620 2900
Qualcomm Krait 400
MSM8974
Snapdragon 800
4 cores
1 2260 1120 2060
2 1800 3900
4 3100 7870
8 3500 8030
Qualcomm Snapdragon 835
8 cores
aarch64
1 2450 1935 2470
8 9500 13800
Freescale i.MX515 (Cortex-A8) 800 325 645
TI DM3730 (Cortex-A8) 1000 480 900
Samsung Hummingbird (Cortex-A8) 1000 560 930
Allwinner A20
Cortex-A7
2 cores
1 1000 470 810
2 750 1560
4 880 1560
Samsung Exynos 4210 (Cortex-A9)
2 cores
1 1200 790 1080
2 1180 2140
4 1380 2130
Samsung Exynos 4412 (Cortex-A9)
4 cores
Android
1 (v5) 1400 660 1180
1 (v5 Thumb) 530 745
1 (v7 Thumb-2) 710 1010
1 740 1210
2 1200 2400
4 1700 4700
Samsung Exynos 4412 (Cortex-A9)
4 cores, Linux
1 1600 900 1350
4 2460 5200
Samsung Exynos 5250 (Cortex-A15)
2 cores
1 1700 1350 1830
2 2270 3560
4 2450 3540
Rockchip RK3288 (Cortex-A17)
4 cores
7-Zip 21.06
arm32
1 1800 1889 2026
2 2907 3981
4 4921 7777
NVIDIA Tegra K1
(Cortex-A15)
4 cores
1 2200 1680 2320
2 2880 4600
4 5130 9100
8 5360 9100
NVIDIA Tegra K1
(Denver)
2 cores, 32-bit
1 2500 2220 2940
Amlogic S905
(Cortex-A53)
4 cores
1 1536
32-bit
arm32
880 1600
2 1430 3150
4 2560 5940
6 2820 5940
1 1536
32-bit
Thumb2
890 1370
2 1450 2700
4 2630 5130
6 2820 5150
1 1536
64-bit
860 1420
2 1450 2800
4 2600 5250
6 2850 5300
Snapdragon 855
Cortex-A55
4 cores
arm64, gcc-6 -O2, twrp
1 1780 1120 1830
2 2310 3490
4 4500 7080
Snapdragon 855
Cortex-A76
4 cores
gcc-6 -O2
arm64
twrp
1, Thumb2 2840 2640 3600
1, arm32 2690 4000
1 1x A76 2840
3x A76 2420
2830 3360
2 6680 5800
4 12440 11720
Snapdragon 855
8 cores
gcc-6 -O2, arm64, twrp
8 1x A76 2840
3x A76 2420
4x A53 1780
15550 18970
AMD Opteron A1170 (Cortex-A57)
8 cores
32-bit (ARMv7-A)
64-bit (aarch64)
1 2000
32-bit
2030 2650
8 12700 19700
1 2000
64-bit
2160 2040
8 13200 15800
APM X-Gene1
8 cores
32-bit (ARMv7-A)
64-bit (aarch64)
1 2400
32-bit
1620 2270
8 9600 17000
1 2400
64-bit
1770 1980
8 10500 14900
1 (2 MB pages) 2080 1980
8 (2 MB pages) 11400 14900
16 (2 MB pages) 13300 14900
Cavium ThunderX
12 virtual cores
64-bit (aarch64)
1 2000 1230 1970
12 13700 22200
RISC-V
SiFive FU740 (U74)
4 cores
THP (2 MB pages)
1 1200 844 1108
2 1498 2169
4 2526 4070
MIPS
Cavium Octeon II
2 cores
THP (1 MB pages), n32 ABI
1 1000 750 880
2 1070 1700
4 1330 1700
TI AR7 (MIPS 4K) 150 53 107
Broadcom BCM6338 (MIPS32) 240 64 110
Atheros QCA9533 (MIPS 24Kc) 650 229 447
Broadcom BCM4718 (MIPS32 74K) 480 200 300
Atheros AR9344 (MIPS32 74K) (4 KB pages) 560 220 360
(16 KB pages) 253 363
Ingenic JZ4780
2 cores
1 1200 360 690
2 535 1300
4 657 1300
ICT Loongson 2F 800 440 570
ICT Loongson 3A
4 cores
1 (32MB pages) 900 645 650
1 476 650
2 900 1260
4 1400 2400
6 1440 2400
Broadcom BCM7356 (BRCM 5000 / Zephyr)
1 core, 2 threads
1 1300 454 808
2 688 1230
Baikal-T1 (MIPS P5600)
2 cores
1 1200 870 990
2 1460 1920
LoongArch
Loongson 3A5000
4 cores
v21.06
1 2300 3764 2639
2 7408 5259
4 13783 10380
PowerPC
IBM Cell PPE
1 core, 2 threads
1 3200 720 1060
2 900 1500
4 1000 1500
IBM PowerPC 970FX (G5) 1800 750 1330
IBM PowerPC 970MP (G5)
4 cores
1 2500 1230 2050
2 2500 4000
4 4400 8000
IBM POWER7
8 cores
4 threads per core
32 threads per CPU
1 (T0) 3550 2700 3350
1 (T3) 2200 2870
2 (T0/T1) 4100 5030
2 (T0/T3) 3770 5000
2 (T2/T3) 3250 3900
4 (1 core) 4800 7100
6 (1 core) 5200 7100
32 (8 cores) 35000 56000
40 (8 cores) 37000 56000
IBM POWER8
2 chips * 5 cores * 8 threads
10 cores
80 threads
1 (1 core) 3690 3200 3100
2 (1 core) 4400 5000
4 (1 core) 5900 6900
8 (1 core) 6900 7900
10 (1 core) 7300 7900
10 (5 cores) 15000 19000
20 (5 cores) 25000 32500
40 (5 cores) 30000 39000
80 (10 cores) 57000 74000
IBM POWER9
2 chips * 16 cores * 4 threads
32 cores
128 threads
1 3800
1 core
4090 3140
2 5540 4560
4 7260 7070
8 8370 7270
16 3300
1 socket
42300 41500
32 67000 61400
64 3200
1 socket
87900 92200
128 93600 90400
128 3200
2 sockets
159000 177000
SPARC
Sun UltraSPARC II
6 cores
1 400 280 290
6 1130 1700
10 1400 1700
Sun UltraSPARC IIe 520 300 365
Sun UltraSPARC IIIi 1000 600 780
Sun UltraSPARC T1
8 cores, 32 threads
1 1000 344 426
8 1740 2600
32 3000 6100
64 4000 6000
Fujitsu SPARC64_VII
4 cores
8 threads
-m32 -O3
1 2520 -m64 1460 1940
1 2520 -m32 1580 2190
2 2180 2880
8 6430 11070
Oracle SPARC T5
1 core
8 threads
1 3600 2240 2100
2 3600 3230
4 4320 4570
8 4600 5460
MCST R1000
4 CPUs, 16 cores
64-bit
1 1000 577 793
4 2130 3059
8 3527 5915
16 5269 11332
MCST R2000
4 CPUs, 32 cores
64-bit
1 2000 826 1528
4 2962 5872
8 4933 11752
16 8392 22353
32 12441 41212
Elbrus
MCST Elbrus-2C+
2 cores, 32-bit
1 500 675 644
2 937 1262
MCST Elbrus-4C
Elbrus 401-PC
4 cores
32-bit
1 800 1024 1038
2 1593 2069
4 3130 3954
MCST Elbrus-4C
Elbrus 404
4 sockets, 16 cores
64-bit
1 800 1085 1045
4 2936 4025
8 5891 7841
16 11048 14643
MCST Elbrus-1C+
Elbrus 101-PC
1 core, 32-bit
1 1000 1301 1254
MCST Elbrus-8C
Elbrus 801-PC
8 cores
1 1300
e2k, 32-bit
1732 1689
4 5008 6587
8 9625 12857
1 1300
x86-64 RTC
GCC
1673 1680
4 5073 6468
8 9400 12424
MCST Elbrus-8C
Elbrus 804
4 sockets, 32 cores
e2k
64-bit
1 1200 1538 1536
4 4313 5868
8 8483 11366
16 16202 21952
32 28449 39894
MCST Elbrus-8SV
Elbrus 901
8 cores
1 1500
e2k, 32-bit
1922 1873
4 5632 7293
8 10768 14297
1 1500
e2k, 64-bit
1845 1791
4 5356 7013
8 10306 13754
1 1500
x86-64 RTC
GCC
1886 1813
4 5641 7283
8 10829 14370
MCST Elbrus-2S3
2 cores
2.0 GHz
1 7-Zip 16.02
32-bit
2349 2429
2 3681 4842
1 7-Zip 21.07
64-bit
2344 2560
2 3877 5106
MCST Elbrus-16S
2 sockets * 16 cores
32 cores
e2k, 64-bit
7-Zip 16.02
1 2000 2301 2391
8 12377 17218
16 23373 32186
32 41145 65514
MCST Elbrus-16S
16 cores
2.0 GHz
7-Zip 21.07
1 e2k, 64-bit 2553 2520
8 14741 20127
16 28176 39802
1 x86-64 RTC
(no ASM)
2873 2247
8 16577 17498
16 31544 34291
PA-RISC
HP PA-8600
2 CPUs
1 552 400 327
2 780 645
IA-64
Intel Itanium 2
2 cores
1 1300 1210 1220
2 1500 2430
4 2230 2400
x86
VIA C7 1500 470 730
VIA L4700E (VIA Nano)
4 cores
1 1200 1060 1010
4 3400 3900
AMD Am386DX 40 6 6
AMD Am486 80 19 19
Cyrix 486 dx2 66 13 23
AMD K5 75 69 81
AMD Geode LX800 500 280 270
AMD K6-2 500 260 440
AMD E-350 (Bobcat) 1 1600 1120 1480
2 2080 2900
AMD A8-6410 (Jaguar)
(4 cores)
@ 1800 MHz
1 1800 1450 1650
2 2600 3300
4 4400 6530
AMD Athlon 64 X2 (K8) 1 2000 1800 2080
2 3400 4170
AMD Phenom II X4 965 (K10) 4 3400 11900 13500
AMD Phenom II X6 1100T (K10) 6 3300 16200 19600
AMD FX-8350 (Piledriver) 8 4000 22800 24900
AMD FX-8300 (Piledriver)
4 modules, 8 threads
7-Zip 20.00
1 4200 4780 5390
4 3900 17500 19000
8 3800 27000 34300
Ryzen 1400
4 cores, 8 threads
Linux (THP off)
gcc-6 -m64 -O3
1 3450 3750 3420
8 3200 18700 20400
Ryzen 1700X
8 cores, 16 threads
Linux (THP off)
gcc-6 -m64 -O3
1 3900 4210 3900
2 (1 core) 6400 6300
8 (one CCX) 3500 20200 21600
8 30700 27000
16 35200 43700
Ryzen 3950X (Zen2)
16 cores, 32 threads
7-Zip 20.00
1 4400- 5930 8390
16 76900 113700
32 84600 182400
Ryzen 5600G (Zen3)
6 cores, 12 threads
7-Zip 21.07 (Linux)
1 4450 6775 9393
6 40913 48622
12 55507 83299
Intel Pentium 100 64 62
Intel Pentium MMX 200 130 120
Intel Atom N270
(1 core, 2 threads)
1 1600 700 900
2 1000 1500
Intel Atom N2800
(2 cores, 4 threads)
1 1862 870 1160
2 1640 2260
4 2540 3530
8 2700 3500
Intel Atom Z2760 (2 cores) (Cloverview) 4 1800 2100 3400
Intel Celeron N2840
Silvermont
2 cores
1 2580- 1620 2070
2 2420 4050
4 3080 4080
Intel Atom Z3740 (4 cores) (Silvermont) 4 1860- 3900 5900
Intel Atom Z3770 (4 cores) (Silvermont) 4 2400- 4500 7000
Intel Pentium N4200 (Goldmont)
4 cores
1 2500- 1600 2200
Intel Pentium 4 (180 nm) 1700 760 760
Intel Pentium 4 (130 nm)
1 core, 2 threads
1 2400 1220 1080
2 1500 1780
Intel Pentium 4 (65 nm)
1 core, 2 threads
1 3000 1500 1530
2 2000 2330
Intel Pentium II
2 CPUs
1 350 290 300
2 410 600
4 520 590
Intel Celeron (P6) 1200 760 980
Intel Pentium III-S
2 CPUs
1 1400 980 1250
2 1600 2380
Intel Core 2 (1 core) 2000 2000 2000
Intel Core 2 Quad Q9550 4 2833 9340 11100
Intel i5-650 (Westmere)
2 cores, 4 threads
Turbo Boost disabled
1 3200 3150 3180
2 6150 6200
4 8200 9460
Intel Xeon X5650 (Westmere)
1 CPU, 6 cores, 12 threads
Turbo Boost disabled
1 2670 3100 2600
2 (1 core) 4360 3800
12 (1 cpu) 16200 22700
Intel i7 920 (4 cores) 8 2666 15700 16800
Intel i7 875K (4 cores) 8 2933+ 19000 19700
Intel i7 980X (6 cores) 12 3333 29000 30800
Intel i3-2120 (Sandy Bridge)
2 cores, 4 threads
1 3300 3800 3450
2 7200 6800
4 9000 9400
6 10100 9300
Intel i7 2600K (4 cores) 8 3400+ 20100 20700
Intel i7 3960X (6 cores) 12 3300+ 31900 31500
Intel i7 3770 (Ivy Bridge)
4 cores, 8 threads
Turbo Boost disabled
Single RAM channel
Linux (THP off)
GCC-4.6.3 -O3
1 3400 4200 3760
2 (1 core) 6300 5000
2 8500 7400
4 15100 14700
8 21500 20300
Intel i7 3770K (4 cores) 8 3500+ 23700 22300
Intel i7 4770 (Haswell)
4 cores, 8 threads
Turbo Boost disabled
1 3400 4000 4000
8 20500 21000
Intel Core i7-5960X (8 cores, 16 threads) 16 3000+ 39600 40900
Intel E5-2697 v2 (2 sockets, 24 cores) 48 2700+ 85000 102000
Intel E5-2699 v3 (2 sockets, 36 cores) 72 2300+ 124000 141000
Intel E7-8890 v3 (4 sockets, 72 cores) 144 2500+ 228000 255000
Intel i7-6900K (Broadwell)
ver 9.22
1 4000 4900 4800
Intel Xeon E5-2699 v4 (Broadwell)
2 cpus, 44 cores, 88 threads
THP on
ver. 16.02
gcc-6 -O3
1 3600 5100 3900
2 (1 core) 3600 7350 5170
44 (1 cpu) 2800 107000 87000
88 2800 186000 174000
Intel i7-7700K (Kaby Lake)
ver 9.22
1 4000 4900 4700
Intel i7-6700 (Skylake)
4 cores, 8 threads
ver 9.22
1 4000 4640 4640
8 3400-4000 24700 22900
Intel i7-7820X (Skylake X)
8 cores, 16 threads
1 (9.22) 4300 4950 5080
1 (17.01) 4300 5300 4600
2 (1 core) (17.01) 4300 8270 6150
16 (17.01) 4000 49900 46700
Intel i7-1065G7 (Ice Lake)
4 cores, 8 threads
ver 9.22
1 3900- 3800 4020
8 3500- 23300 21200
Intel i7-1065G7 (Ice Lake)
4 cores, 8 threads
ver 19.02
1 3900- 5000 7000


LZMA Benchmark Description

The LZMA benchmark reports a rating in MIPS (million instructions per second). The rating is calculated from the measured speed and normalized to the results of an Intel Core 2 CPU with the multi-threading option switched off, measured with an old version of 7-Zip.
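Because of this normalization, a single-threaded Intel Core 2 scores roughly its clock frequency: the table lists 2000 MIPS at 2000 MHz. Dividing a single-thread rating by the clock therefore gives a rough per-clock figure relative to Core 2. The sketch below does only that arithmetic on rows copied from the table; `rating_per_mhz` is a hypothetical helper, not part of 7-Zip, and rows measured with different 7-Zip versions are not directly comparable.

```python
# Hypothetical helper: single-thread rating (MIPS) divided by clock (MHz).
# Since ratings are normalized to a single-threaded Intel Core 2, a Core 2
# row (2000 MIPS at 2000 MHz) gives a ratio of 1.0.

def rating_per_mhz(rating_mips: float, freq_mhz: float) -> float:
    """Return benchmark MIPS per MHz, a rough per-clock efficiency figure."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    return rating_mips / freq_mhz


# Single-thread rows copied from the table: (name, MHz, compress, decompress).
# Note: the Apple M1 row was measured with 7-Zip 21.03, the others with older versions.
rows = [
    ("Intel Core 2 (1 core)", 2000, 2000, 2000),
    ("Apple M1 (7-Zip 21.03)", 3200, 7091, 8068),
    ("Intel Pentium 4 (65 nm)", 3000, 1500, 1530),
]

for name, mhz, comp, decomp in rows:
    print(f"{name:26s} compress {rating_per_mhz(comp, mhz):.2f}"
          f"  decompress {rating_per_mhz(decomp, mhz):.2f}")
```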

The test data used for compression in this test is produced by a special algorithm that creates a data stream with some properties of real data, such as text or executable code. Note that LZMA speed on real data can differ slightly: the benchmark data is artificial and more random than real-world data.

Compression

Compression speed depends strongly on memory (RAM) latency, data cache size and speed, and the TLB. The compression code also uses simple 32-bit integer instructions such as shift, add, and multiply. The CPU's out-of-order execution capability is also important for this test.
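The latency sensitivity comes from the match finder, which looks up earlier occurrences of the current bytes in large tables. The sketch below is a deliberately simplified, hypothetical hash-table match finder (7-Zip's real match finders, such as its binary-tree variants, are more elaborate); it only shows that each lookup reads a table slot and a data position chosen by the input itself, which is effectively a random memory access once the tables outgrow the caches. `HASH_BITS` and the multiplicative hash constant are illustrative choices.

```python
HASH_BITS = 16          # illustrative table size: 2**16 slots
MAX_LEN = 273           # LZMA's maximum match length


def hash3(data: bytes, pos: int) -> int:
    """Hash the 3 bytes starting at pos into a HASH_BITS-bit slot index."""
    v = data[pos] | (data[pos + 1] << 8) | (data[pos + 2] << 16)
    return ((v * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


def find_matches(data: bytes, min_len: int = 3, max_len: int = MAX_LEN):
    """Greedy single-candidate match finder (a sketch, not 7-Zip's algorithm).

    head[h] holds the most recent position whose 3-byte prefix hashed to h.
    Both head[h] and data[cand] are data-dependent reads, which is why match
    finding is sensitive to cache size, TLB reach, and RAM latency.
    """
    head = [-1] * (1 << HASH_BITS)
    matches = []                      # (position, distance, length)
    pos = 0
    while pos + min_len <= len(data):
        h = hash3(data, pos)
        cand = head[h]                # random access into the hash table
        head[h] = pos
        length = 0
        if cand >= 0:                 # compare bytes at the candidate position
            while (length < max_len and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
        if length >= min_len:
            matches.append((pos, pos - cand, length))
            pos += length
        else:
            pos += 1
    return matches
```

For example, `find_matches(b"abcabcabcabc")` reports one match at position 3 that copies 9 bytes from distance 3.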

Decompression

Decompression speed depends strongly on the CPU's integer performance. The most important factors for this test are the branch misprediction penalty (the length of the pipeline) and the latencies of 32-bit instructions such as multiply, shift, and add.
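Both factors are visible in the core step of the LZMA range decoder. The sketch below follows the structure of the published LZMA range coder (11-bit adaptive probabilities, adaptation shift of 5, renormalization below 2^24): each decoded bit needs a multiply feeding a comparison, and the comparison decides an if/else whose outcome is close to random for compressed data. The encoder and the `encode_bits`/`decode_bits` helpers with a previous-bit context are hypothetical additions so the decoder can be checked end to end; this is an illustration in Python, not 7-Zip's C or assembler code.

```python
K_TOTAL_BITS = 11                     # probabilities are 11-bit values
K_MOVE_BITS = 5                       # adaptation speed of each probability
K_TOP = 1 << 24                       # renormalize when range < 2**24
MASK32 = 0xFFFFFFFF
PROB_INIT = 1 << (K_TOTAL_BITS - 1)   # p = 0.5


class RangeEncoder:
    """Binary range encoder, included only to produce test input."""

    def __init__(self):
        self.low = 0                  # may carry into bit 32
        self.range = MASK32
        self.cache, self.cache_size = 0, 1
        self.out = bytearray()

    def _shift_low(self):
        if self.low < 0xFF000000 or self.low > MASK32:
            carry = self.low >> 32
            temp = self.cache
            while True:               # flush cached byte plus pending 0xFF bytes
                self.out.append((temp + carry) & 0xFF)
                temp = 0xFF
                self.cache_size -= 1
                if self.cache_size == 0:
                    break
            self.cache = (self.low >> 24) & 0xFF
        self.cache_size += 1
        self.low = (self.low & 0x00FFFFFF) << 8

    def encode_bit(self, probs, i, bit):
        bound = (self.range >> K_TOTAL_BITS) * probs[i]
        if bit == 0:
            self.range = bound
            probs[i] += ((1 << K_TOTAL_BITS) - probs[i]) >> K_MOVE_BITS
        else:
            self.low += bound
            self.range -= bound
            probs[i] -= probs[i] >> K_MOVE_BITS
        if self.range < K_TOP:
            self.range <<= 8
            self._shift_low()

    def finish(self):
        for _ in range(5):
            self._shift_low()
        return bytes(self.out)


class RangeDecoder:
    """Bit decoder with the same structure as the LZMA range decoder."""

    def __init__(self, data):
        if len(data) < 5 or data[0] != 0:
            raise ValueError("range-coded stream must start with a zero byte")
        self.data, self.pos = data, 5
        self.range = MASK32
        self.code = int.from_bytes(data[1:5], "big")

    def decode_bit(self, probs, i):
        prob = probs[i]
        bound = (self.range >> K_TOTAL_BITS) * prob   # 32-bit multiply on the critical path
        if self.code < bound:                         # data-dependent, hard-to-predict branch
            self.range = bound
            probs[i] = prob + (((1 << K_TOTAL_BITS) - prob) >> K_MOVE_BITS)
            bit = 0
        else:
            self.range -= bound
            self.code -= bound
            probs[i] = prob - (prob >> K_MOVE_BITS)
            bit = 1
        if self.range < K_TOP:                        # renormalize: shift in one byte
            nxt = self.data[self.pos] if self.pos < len(self.data) else 0
            self.pos += 1
            self.range <<= 8
            self.code = (self.code << 8) | nxt
        return bit


def encode_bits(bits):
    enc, probs, prev = RangeEncoder(), [PROB_INIT] * 2, 0
    for b in bits:
        enc.encode_bit(probs, prev, b)   # context = previous bit (hypothetical model)
        prev = b
    return enc.finish()


def decode_bits(data, n):
    dec, probs, prev, out = RangeDecoder(data), [PROB_INIT] * 2, 0, []
    for _ in range(n):
        prev = dec.decode_bit(probs, prev)
        out.append(prev)
    return out
```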

The decompression test has a very high number of unpredictable branches. Note that some CPU architectures (for example, 32-bit ARM) support conditionally executed instructions, so in many cases such CPUs can run the LZMA decompression code without branches (and without pipeline flushes). Such CPUs can have a speed advantage over architectures that don't support complex conditional execution.

Note: the latest versions of 7-Zip for x64 and arm64 contain an optimized LZMA decoder that uses conditional move instructions instead of some of the unpredictable branches. This optimized code increases LZMA decompression speed in the benchmark by up to 1.7 times.
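The idea behind branch-free decoding can be shown on a single bit-decode step: compute both possible outcomes, then select one with a mask derived from the comparison, so there is no jump whose direction the CPU must guess. The sketch below contrasts an if/else version with a compute-and-select version of the same step (renormalization omitted). It only illustrates the data-flow transformation; it is not 7-Zip's assembler decoder, and Python itself does not emit conditional-move instructions. In C or assembly, the mask selection is the pattern that maps to conditional moves (x86 `cmov`, arm64 `csel`) or to ARM32 conditional execution.

```python
K_TOTAL_BITS = 11
K_MOVE_BITS = 5
MASK32 = 0xFFFFFFFF


def decode_bit_branchy(rng, code, prob):
    """Reference: one LZMA-style bit-decode step using if/else."""
    bound = (rng >> K_TOTAL_BITS) * prob
    if code < bound:
        return 0, bound, code, prob + (((1 << K_TOTAL_BITS) - prob) >> K_MOVE_BITS)
    return 1, rng - bound, code - bound, prob - (prob >> K_MOVE_BITS)


def decode_bit_branchless(rng, code, prob):
    """Same step written as compute-both-then-select with a 32-bit mask."""
    bound = (rng >> K_TOTAL_BITS) * prob
    bit = int(code >= bound)                  # a flag, no jump needed in C/asm
    mask = (-bit) & MASK32                    # 0x00000000 or 0xFFFFFFFF
    keep = ~mask & MASK32                     # inverse mask
    new_rng = (bound & keep) | ((rng - bound) & mask)
    new_code = code - (bound & mask)
    p0 = prob + (((1 << K_TOTAL_BITS) - prob) >> K_MOVE_BITS)  # update if bit == 0
    p1 = prob - (prob >> K_MOVE_BITS)                         # update if bit == 1
    new_prob = (p0 & keep) | (p1 & mask)
    return bit, new_rng, new_code, new_prob
```

Both functions return `(bit, range, code, prob)`, so they can be compared state by state.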

Out-of-order execution capability is less important for LZMA decompression.

ISA instructions, memory latency/bandwidth, IPC, SMT

The benchmark code doesn't use the FPU or SSE. Most of the code is 32-bit integer code; only a minor part of the compression code also uses 64-bit integers.

RAM and cache latencies are very important for compression speed.

RAM bandwidth is not very important for single-threaded compression or decompression, or when a small number of worker threads is used. However, RAM bandwidth can become the main limiting factor for LZMA compression speed when a large number of worker threads is used.

The CPU's IPC (instructions per cycle) is not very high for the benchmark workloads: the estimated IPC is 1-2 on modern CPUs. The compression test performs a large number of random accesses to RAM and the data cache, so for a large part of the execution time the CPU waits for data from the data cache or RAM. The decompression test has many pipeline flushes after mispredicted branches, and it waits on long dependency chains of instructions such as 32-bit multiply. Such low IPC means that some CPU resources stay unused, and a CPU with SMT (Hyper-Threading) can use these free resources by running two threads. So SMT (Hyper-Threading) provides a fairly large improvement in these tests.
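The table contains a few CPUs measured with one thread and with two threads on one core, which makes the SMT gain easy to quantify. The sketch below computes the two-thread to one-thread ratio from those rows (all three were run at the same clock for both measurements); `smt_gain` is a hypothetical helper name.

```python
# One core, 1 thread vs 2 threads: ((compress, decompress), (compress, decompress)).
# Values copied from the table above.
rows = {
    "Intel Xeon X5650 (Westmere)": ((3100, 2600), (4360, 3800)),
    "Intel i7 3770 (Ivy Bridge)": ((4200, 3760), (6300, 5000)),
    "Intel i7-7820X (Skylake X), 17.01": ((5300, 4600), (8270, 6150)),
}


def smt_gain(one_thread, two_threads):
    """Return (compression, decompression) speedup of 2 threads over 1 on one core."""
    return tuple(two / one for one, two in zip(one_thread, two_threads))


for cpu, (single, smt) in rows.items():
    comp, decomp = smt_gain(single, smt)
    print(f"{cpu:36s} compression x{comp:.2f}  decompression x{decomp:.2f}")
```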

LZMA in multithreading mode

When you specify N*2 threads for the test, the program creates N copies of the LZMA encoder, and each encoder instance compresses a separate block of test data. Each LZMA encoder instance creates 3 asymmetric threads: two big threads and one small thread. The total CPU load of these 3 threads can vary from 140% to 200%. To achieve better CPU load during compression, we can also test a mode in which the number of benchmark threads is larger than the number of hardware threads.
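The thread arithmetic above, and the "independent encoder per block" structure, can be sketched as follows. `encoder_layout` just restates the rule from the text (N*2 benchmark threads, N instances, 3 threads and 140% to 200% load each) and, as an assumption, rejects odd counts. `compress_blocks` gives each block its own encoder through Python's standard `lzma` module (liblzma, legacy `.lzma` format); note that this encoder is not 7-Zip's and does not reproduce its internal 3-thread split.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def encoder_layout(bench_threads: int) -> dict:
    """Map a benchmark thread count (N*2) to encoder instances and threads."""
    if bench_threads < 2 or bench_threads % 2:
        raise ValueError("this sketch only handles even thread counts N*2")
    n = bench_threads // 2
    return {
        "instances": n,                      # one encoder per data block
        "os_threads": 3 * n,                 # 3 threads per encoder instance
        "cpu_load_pct": (140 * n, 200 * n),  # 140%..200% per instance
    }


def compress_blocks(blocks, dict_size=1 << 20):
    """Compress every block with its own independent LZMA encoder."""
    filters = [{"id": lzma.FILTER_LZMA1, "dict_size": dict_size}]

    def one(block):
        return lzma.compress(block, format=lzma.FORMAT_ALONE, filters=filters)

    with ThreadPoolExecutor(max_workers=max(1, len(blocks))) as pool:
        return list(pool.map(one, blocks))
```

For example, `encoder_layout(8)` describes 4 encoder instances running 12 threads in total.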

In multithreading mode, each LZMA encoder instance divides the compression task into 3 different tasks, each executed in a separate thread. Each of these tasks is simpler than the original task and uses less memory, so each thread uses the data cache and TLB more effectively. As a result, the LZMA encoder is slightly more efficient in multithreading mode when measured as speed divided by CPU usage.
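The general shape of such a split is a pipeline: each stage owns a smaller piece of state and hands its output to the next stage through a queue. The sketch below is a generic, hypothetical three-stage pipeline (hash positions, look up earlier occurrences, tally results); the stage names and the work they do are illustrative and are not 7-Zip's actual division of the encoder. In CPython the threads mainly show the structure, since the interpreter lock prevents these Python stages from running truly in parallel.

```python
import queue
import threading

DONE = object()  # sentinel that tells the next stage to stop


def stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward results to outbox."""
    while (item := inbox.get()) is not DONE:
        outbox.put(fn(item))
    outbox.put(DONE)


def run_pipeline(data: bytes, chunk: int = 64) -> int:
    """Count positions whose 3-byte prefix occurred earlier, using 3 threads."""
    seen = {}  # touched only by stage 2: 3-byte prefix -> last position

    def hash_chunk(start):                 # stage 1: small, stateless task
        end = min(start + chunk, len(data) - 2)
        return [(p, data[p:p + 3]) for p in range(start, end)]

    def find_repeats(items):               # stage 2: owns the lookup table
        hits = 0
        for pos, key in items:
            if key in seen:
                hits += 1
            seen[key] = pos
        return hits

    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=stage, args=(hash_chunk, q1, q2)),
               threading.Thread(target=stage, args=(find_repeats, q2, q3))]
    for w in workers:
        w.start()
    for start in range(0, max(len(data) - 2, 0), chunk):
        q1.put(start)
    q1.put(DONE)
    total = 0                              # stage 3 runs on the calling thread
    while (hits := q3.get()) is not DONE:
        total += hits
    for w in workers:
        w.join()
    return total
```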

Note that there is some data traffic between the 3 threads of an LZMA encoder, so the bandwidth of data exchange between CPU threads via memory can also be important, especially in systems with a large number of cores or CPUs.

All LZMA decoder threads are symmetric and independent. So the decompression test can load all hardware threads when the number of benchmark threads equals the number of hardware threads.
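Symmetric, independent decoding maps naturally onto a worker pool with one identical job per block and one worker per hardware thread. The sketch below uses Python's standard `lzma` module and `os.cpu_count()` as a stand-in for the hardware thread count; whether the workers actually run in parallel depends on CPython's `lzma` module releasing the interpreter lock while decompressing, which is an assumption here. `decompress_blocks` is a hypothetical helper.

```python
import lzma
import os
from concurrent.futures import ThreadPoolExecutor


def decompress_blocks(compressed_blocks, threads=None):
    """Decode independent .lzma blocks; every worker runs the same job."""
    threads = threads or os.cpu_count() or 1   # default: one worker per hardware thread
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(
            lambda c: lzma.decompress(c, format=lzma.FORMAT_ALONE),
            compressed_blocks))


# Usage: build independent blocks (compressed serially), then decode them in parallel.
blocks = [(b"block %d " % i) * 2000 for i in range(8)]
packed = [lzma.compress(b, format=lzma.FORMAT_ALONE) for b in blocks]
restored = decompress_blocks(packed)
```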

LZMA results

We use benchmark results for the 32 MB dictionary (the "25:" line in the results of the console version). If 32 MB dictionary results are not available, we use the results for a smaller dictionary. Most x86 tests were performed on Windows with official 7-Zip binaries, and some tests were performed in 64-bit mode. Most tests for other platforms were performed with p7zip compiled by GCC with speed optimization.
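The "25:" label corresponds to a dictionary of 2^25 bytes, which is 32 MB. The selection rule in the paragraph above (use the 32 MB row, otherwise fall back to a smaller dictionary) can be written as a small helper. Both function names are hypothetical, and the sketch assumes that the label is the base-2 logarithm of the dictionary size and that "a smaller dictionary" means the largest available one below 32 MB.

```python
def dict_label_to_bytes(label: str) -> int:
    """'25:' -> 2**25 bytes (32 MB), assuming the label is log2 of the size."""
    return 1 << int(label.rstrip(":"))


def pick_result_row(available_logs):
    """Prefer the 32 MB (2**25) row; otherwise take the largest smaller one."""
    if 25 in available_logs:
        return 25
    smaller = [d for d in available_logs if d < 25]
    if not smaller:
        raise ValueError("no row with a dictionary of 32 MB or smaller")
    return max(smaller)
```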

Note: new versions of 7-Zip provide improved performance. For example, the latest versions of 7-Zip for x64 and arm64 use optimized decompression code written in assembler, so the rating can be up to 1.7 times higher than with previous versions of 7-Zip. However, most results in the table were measured with old versions of 7-Zip, before these optimizations. If a CPU was tested with a new version of 7-Zip, the table notes the 7-Zip version number. If there is no version mark, the CPU was tested with a version whose performance is similar to 7-Zip (p7zip) 16.02.
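The Intel i7-1065G7 (Ice Lake) was measured single-threaded with both 7-Zip 9.22 and 19.02, so its two rows show the size of the version effect directly. The sketch below computes the new-to-old ratios from those rows; `version_ratio` is a hypothetical helper. The decompression ratio of about 1.74 is in line with the "up to 1.7 times" figure, which is why unmarked rows should not be compared directly with rows marked with a newer version.

```python
# Intel i7-1065G7 (Ice Lake), 1 thread, values copied from the table.
ice_lake = {
    "9.22": {"compress": 3800, "decompress": 4020},
    "19.02": {"compress": 5000, "decompress": 7000},
}


def version_ratio(results, old, new, kind):
    """Rating of the new version divided by the rating of the old one."""
    return results[new][kind] / results[old][kind]


for kind in ("compress", "decompress"):
    print(kind, round(version_ratio(ice_lake, "9.22", "19.02", kind), 2))
```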


You can download binaries and source code of 7-Zip benchmark here:

7-Zip

LZMA SDK

7-Benchmark (Memlat and Pipelen)

If you have interesting new results, post them on the 7-max forum:

7-max forum