7-Zip LZMA Benchmark

ISA  CPU  Threads  Frequency (MHz)  Compressing (MIPS)  Decompressing (MIPS)
ARM
Hisilicon SD5113 (ARM11) 530 113 283
Marvell XScale PXA270 520 85 270
Marvell Kirkwood 88F6281 (SheevaPlug) 1200 385 710
Qualcomm QSD8250 (Snapdragon) 1000 430 700
Qualcomm Krait 300
MSM8230AB
Snapdragon 400
2 cores
1 1728 850 1480
2 1350 2900
4 1620 2900
Freescale i.MX515 (Cortex-A8) 800 325 645
TI DM3730 (Cortex-A8) 1000 480 900
Samsung Hummingbird (Cortex-A8) 1000 560 930
Allwinner A20
Cortex-A7
2 cores
1 1000 470 810
2 750 1560
4 880 1560
Samsung Exynos 4210 (Cortex-A9)
2 cores
1 1200 790 1080
2 1180 2140
4 1380 2130
Samsung Exynos 4412 (Cortex-A9)
4 cores
Android
1 (v5) 1400 660 1180
1 (v5 Thumb) 530 745
1 (v7 Thumb-2) 710 1010
1 740 1210
2 1200 2400
4 1700 4700
Samsung Exynos 4412 (Cortex-A9)
4 cores, Linux
1 1600 900 1350
4 2460 5200
Samsung Exynos 5250 (Cortex-A15)
2 cores
1 1700 1350 1830
2 2270 3560
4 2450 3540
Unknown
(Cortex-A15)
4 cores
1 1900 1470 2000
2 2500 3830
4 4200 7300
8 4660 7300
MIPS
TI AR7 (MIPS 4K) 150 53 107
Broadcom BCM6338 (MIPS32) 240 64 110
Broadcom BCM4718 (MIPS32 74K) 480 200 300
Atheros AR9344 (MIPS32 74K) (4 KB pages) 560 220 360
(16 KB pages) 253 363
ICT Loongson 2F 800 440 570
ICT Loongson 3A
4 cores
1 (32MB pages) 900 645 650
1 476 650
2 900 1260
4 1400 2400
6 1440 2400
PowerPC
IBM Cell PPE
1 core, 2 threads
1 3200 720 1060
2 900 1500
4 1000 1500
IBM PowerPC 970FX (G5) 1800 750 1330
IBM PowerPC 970MP (G5)
4 cores
1 2500 1230 2050
2 2500 4000
4 4400 8000
IBM POWER7
8 cores, 32 threads
1 (T0) 3550 2700 3350
1 (T3) 2200 2870
2 (T0/T1) 4100 5030
2 (T0/T3) 3770 5000
2 (T2/T3) 3250 3900
4 (T0-T3) 4800 7100
6 (T0-T3) 5200 7100
32 (T0-T31) 35000 56000
40 (T0-T31) 37000 56000
SPARC
Sun UltraSPARC II
6 cores
1 400 280 290
6 1130 1700
10 1400 1700
Sun UltraSPARC IIe 520 300 365
Sun UltraSPARC IIIi 1000 600 780
Sun UltraSPARC T1
8 cores, 32 threads
1 1000 344 426
8 1740 2600
32 3000 6100
64 4000 6000
MCST R1000
4 CPUs, 16 cores
1 800 420 620
4 1400 2450
8 2400 4760
16 2750 7170
Elbrus
MCST Elbrus-2C+
2 cores
1 500 600 585
2 830 1170
4 1050 1140
PA-RISC
HP PA-8600
2 CPUs
1 552 400 327
2 780 645
IA-64
Intel Itanium 2
2 cores
1 1300 1210 1220
2 1500 2430
4 2230 2400
x86
VIA C7 1500 470 730
VIA L4700E (VIA Nano)
4 cores
1 1200 1060 1010
4 3400 3900
AMD Am386DX 40 6 6
Cyrix 486 dx2 66 13 23
AMD K5 75 69 81
AMD Geode LX800 500 230 260
AMD K6-2 500 260 440
AMD E-350 (Bobcat) 1 1600 1120 1480
2 2080 2900
AMD A4-5000 (Jaguar) 4 1500 3800 5400
AMD Athlon 64 X2 (K8) 1 2000 1800 2080
2 3400 4170
AMD Phenom II X4 965 (K10) 4 3400 11900 13500
AMD Phenom II X6 1100T (K10) 6 3300 16200 19600
AMD FX-8350 (Piledriver) 8 4000 22800 24900
Intel Pentium 100 64 62
Intel Pentium MMX 200 130 120
Intel Atom N270
(1 core, 2 threads)
1 1600 700 900
2 1000 1500
Intel Atom N2800
(2 cores, 4 threads)
1 1862 870 1160
2 1640 2260
4 2540 3530
8 2700 3500
Intel Atom Z2760 (2 cores) (Cloverview) 4 1800 2100 3400
Intel Atom Z3740 (4 cores) (Silvermont) 4 1860- 3900 5900
Intel Atom Z3770 (4 cores) (Silvermont) 4 2400- 4500 7000
Intel Atom C2750 (8 cores) (Silvermont) 8 2400+ 13500
Intel Pentium 4 (180 nm) 1700 760 760
Intel Pentium 4 (130 nm)
1 core, 2 threads
1 2400 1220 1080
2 1500 1780
Intel Pentium 4 (65 nm)
1 core, 2 threads
1 3000 1500 1530
2 2000 2330
Intel Pentium II
2 CPUs
1 350 290 300
2 410 600
4 520 590
Intel Celeron (P6) 1200 760 980
Intel Pentium III-S
2 CPUs
1 1400 980 1250
2 1600 2380
Intel Core 2 (1 core) 2000 2000 2000
Intel Core 2 Quad Q9550 4 2833 9340 11100
Intel i5-650 (Westmere)
2 cores, 4 threads
Turbo Boost disabled
1 3200 3150 3180
2 6150 6200
4 8200 9460
Intel Xeon x5650 (Westmere)
1 CPU, 6 cores, 12 threads
Turbo Boost disabled
1 2670 3100 2600
2 (1 core) 4360 3800
12 (1 cpu) 16200 22700
Intel i7 920 (4 cores) 8 2666 15700 16800
Intel i7 875K (4 cores) 8 2933+ 19000 19700
Intel i7 980X (6 cores) 12 3333 29000 30800
Intel i3-2120 (Sandy Bridge)
2 cores, 4 threads
1 3300 3800 3450
2 7200 6800
4 9000 9400
6 10100 9300
Intel i7 2600K (4 cores) 8 3400+ 20100 20700
Intel i7 3960X (6 cores) 12 3300+ 31900 31500
Intel i7 3770 (Ivy Bridge)
4 cores, 8 threads
Turbo Boost disabled
Single RAM channel
Linux (THP off)
GCC-4.6.3 -O3
1 3400 4200 3760
2 (1 core) 6300 5000
2 8500 7400
4 15100 14700
8 21500 20300
Intel i7 3770K (4 cores) 8 3500+ 23700 22300
Intel i7 4770 (Haswell)
4 cores, 8 threads
Turbo Boost disabled
1 3400 4000 4000
8 20500 21000
Intel E5-2697v2 2x (2 sockets, 24 cores) 48 ? 2700+ 83000 95600


LZMA Benchmark Description

The LZMA benchmark reports a rating in MIPS (million instructions per second). The rating is calculated from the measured speed and normalized against the results of an Intel Core 2 CPU with the multithreading option switched off. So on a modern Intel or AMD CPU, the single-threaded rating should be close to the real CPU frequency.
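The page does not give the exact formula that 7-Zip uses. The sketch below therefore assumes a simple linear scale: the rating grows in proportion to measured speed, and the scale is calibrated so that a reference run maps to the reference rating. The actual benchmark may also account for dictionary size and other factors. REF_SPEED_KB_S is a made-up placeholder. The 2000 MIPS reference and the i7 3770 figures come from the table above.

```python
# Hypothetical sketch of speed -> rating normalization (not 7-Zip's actual formula).
REF_RATING_MIPS = 2000.0   # Intel Core 2 at 2000 MHz scores 2000 in the table
REF_SPEED_KB_S = 4000.0    # hypothetical measured speed of that reference run

def rating_from_speed(speed_kb_s: float,
                      ref_speed: float = REF_SPEED_KB_S,
                      ref_rating: float = REF_RATING_MIPS) -> float:
    """Linear scale: a run at ref_speed gets exactly ref_rating."""
    return speed_kb_s * (ref_rating / ref_speed)

def rating_per_mhz(rating_mips: float, freq_mhz: float) -> float:
    """A ratio near 1.0 means 'rating close to the real CPU frequency'."""
    return rating_mips / freq_mhz

if __name__ == "__main__":
    print(rating_from_speed(4000.0))             # reference speed -> 2000.0
    print(rating_from_speed(6000.0))             # 1.5x faster -> 3000.0
    # Intel i7 3770 single-thread row: 3400 MHz, 4200 MIPS compressing
    print(round(rating_per_mhz(4200, 3400), 2))  # 1.24
```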

The test data used for compression is produced by a special algorithm that creates a data stream with some properties of real data, such as text or executable code. Note that LZMA speed on real data can be slightly different.
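This is not 7-Zip's generator, which is not reproduced on this page. It is a hypothetical generator that shows the general idea: emit random literals from a small, text-like alphabet and mix in copies of earlier bytes (back-references). The result is a reproducible stream containing the kind of repeats an LZ77-style coder such as LZMA exploits. The alphabet, the copy probability, and the distance and length ranges are arbitrary assumptions.

```python
import random
import zlib

def make_test_data(size: int, seed: int = 1) -> bytes:
    """Hypothetical generator: random literals mixed with back-reference copies."""
    rng = random.Random(seed)            # fixed seed -> identical data every run
    alphabet = b"etaoin shrdlu\n"        # small alphabet, loosely text-like
    out = bytearray()
    while len(out) < size:
        if len(out) > 16 and rng.random() < 0.6:
            # Back-reference: copy `length` bytes starting `dist` bytes back.
            dist = rng.randint(1, min(len(out), 4096))
            length = rng.randint(3, 32)
            start = len(out) - dist
            for i in range(length):
                out.append(out[start + i])   # byte by byte, so overlapping copies work
        else:
            out.append(rng.choice(alphabet)) # literal byte
    return bytes(out[:size])

if __name__ == "__main__":
    data = make_test_data(100_000)
    print(len(data), len(zlib.compress(data)))  # highly compressible stream
```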

Compression speed depends strongly on memory (RAM) latency, data cache size and speed, and the TLB. The CPU's out-of-order execution capability is also important for this test.
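Latency matters because an LZ-style match finder keeps jumping to earlier positions in a large dictionary, and those positions are data-dependent. The sketch below is a simplified hash-chain match finder written for illustration. It is not 7-Zip's code (LZMA also offers binary-tree match finders), and a Python dict stands in for a fixed-size hash table. Each step along the `prev` chain is effectively a random read into the window. With a 32 MB dictionary, such reads often miss the caches and the TLB.

```python
from typing import Dict, List, Tuple

MIN_MATCH = 3

def key3(data: bytes, i: int) -> int:
    """Exact key for the 3 bytes at i (a real coder hashes into a fixed-size table)."""
    return (data[i] << 16) | (data[i + 1] << 8) | data[i + 2]

def find_matches(data: bytes, max_chain: int = 32,
                 max_len: int = 273) -> List[Tuple[int, int, int]]:
    """Return (pos, distance, length) of the best match found at each position."""
    head: Dict[int, int] = {}          # key -> most recent position with that key
    prev: List[int] = [-1] * len(data) # position -> previous position with same key
    results = []
    for pos in range(len(data) - MIN_MATCH + 1):
        key = key3(data, pos)
        cand = head.get(key, -1)
        best_len, best_dist, steps = 0, 0, 0
        while cand >= 0 and steps < max_chain:
            # Walking the chain: scattered, data-dependent memory accesses.
            limit = min(max_len, len(data) - pos)
            length = 0
            while length < limit and data[cand + length] == data[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
            cand = prev[cand]
            steps += 1
        if best_len >= MIN_MATCH:
            results.append((pos, best_dist, best_len))
        prev[pos] = head.get(key, -1)  # link this position into the chain
        head[key] = pos
    return results

if __name__ == "__main__":
    print(find_matches(b"abcabcabcX")[0])  # (3, 3, 6)
```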

Decompression speed depends strongly on the CPU's integer performance. The most important factors for this test are the branch misprediction penalty (the length of the pipeline) and the latencies of 32-bit instructions (multiply, shift, add, and others). The decompression test has a very high number of unpredictable branches. Note that some CPU architectures (for example, ARM) support conditionally executed instructions, so in many cases such CPUs can run the LZMA decompression code without branches (and without pipeline flushes). These CPUs can have some speed advantage over architectures that don't support complex conditional execution. Out-of-order execution capability is not so important for LZMA decompression.
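The unpredictable branch sits in the range decoder's bit decision, which LZMA runs for almost every coded bit. The sketch below follows the bit-decoding step of the LZMA specification (11-bit probabilities, adaptation shift of 5, normalization below 2^24). It shows the decision two ways: with an `if`, and in a branch-free form that uses a 0/all-ones mask. The masked form is the shape of code that compilers can map to conditional moves or to ARM predicated instructions. Python itself still branches internally, so this only illustrates the data flow. Both methods must produce identical bits and probability updates.

```python
K_NUM_BIT_MODEL_TOTAL_BITS = 11
K_BIT_MODEL_TOTAL = 1 << K_NUM_BIT_MODEL_TOTAL_BITS  # 2048
K_NUM_MOVE_BITS = 5
K_TOP_VALUE = 1 << 24

class RangeDecoder:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 1                  # byte 0 of an LZMA range-coded stream is 0
        self.range = 0xFFFFFFFF
        self.code = 0
        for _ in range(4):
            self.code = (self.code << 8) | self._next_byte()

    def _next_byte(self) -> int:
        b = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return b

    def _normalize(self) -> None:
        if self.range < K_TOP_VALUE:
            self.range <<= 8
            self.code = (self.code << 8) | self._next_byte()

    def decode_bit_branchy(self, probs: list, i: int) -> int:
        prob = probs[i]
        bound = (self.range >> K_NUM_BIT_MODEL_TOTAL_BITS) * prob
        if self.code < bound:         # data-dependent, hard-to-predict branch
            self.range = bound
            probs[i] = prob + ((K_BIT_MODEL_TOTAL - prob) >> K_NUM_MOVE_BITS)
            bit = 0
        else:
            self.range -= bound
            self.code -= bound
            probs[i] = prob - (prob >> K_NUM_MOVE_BITS)
            bit = 1
        self._normalize()
        return bit

    def decode_bit_branchless(self, probs: list, i: int) -> int:
        prob = probs[i]
        bound = (self.range >> K_NUM_BIT_MODEL_TOTAL_BITS) * prob
        bit = int(self.code >= bound) # 0 or 1
        mask = -bit                   # 0 or all ones
        self.range = (bound & ~mask) | ((self.range - bound) & mask)
        self.code -= bound & mask
        inc = prob + ((K_BIT_MODEL_TOTAL - prob) >> K_NUM_MOVE_BITS)
        dec = prob - (prob >> K_NUM_MOVE_BITS)
        probs[i] = (inc & ~mask) | (dec & mask)
        self._normalize()             # normalization still uses a branch here
        return bit

if __name__ == "__main__":
    stream = bytes([0]) + bytes((i * 73 + 41) % 256 for i in range(1, 4001))
    a, b = RangeDecoder(stream), RangeDecoder(stream)
    pa, pb = [K_BIT_MODEL_TOTAL // 2] * 4, [K_BIT_MODEL_TOTAL // 2] * 4
    bits_a = [a.decode_bit_branchy(pa, k % 4) for k in range(20000)]
    bits_b = [b.decode_bit_branchless(pb, k % 4) for k in range(20000)]
    print(bits_a == bits_b and pa == pb)  # True
```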

The test code doesn't use the FPU or SSE. Most of the code is 32-bit integer code; only a minor part of the compression code also uses 64-bit integers. RAM and cache bandwidth are not very important for these tests; latencies matter much more.

The CPU's IPC (instructions per cycle) is not very high in these tests: the estimated IPC is about 1 on a modern CPU. The compression test makes a large number of random accesses to RAM and the data cache, so for a large part of the execution time the CPU waits for data from the data cache or RAM. The decompression test has a large number of pipeline flushes after mispredicted branches. Such low IPC means that some CPU resources stay idle, and a CPU with Hyper-Threading can use those resources by running two threads. So Hyper-Threading provides a fairly large improvement in these tests.
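The size of that improvement can be checked against the table above. The sketch takes the rows where two threads run on a single physical core (Intel i7 3770 "2 (1 core)" and Intel Atom N270 with 1 core and 2 threads) and divides the two-thread ratings by the one-thread ratings. The numbers are copied from the table; the helper name is only illustrative.

```python
# Rows from the table: threads -> (compressing MIPS, decompressing MIPS)
I7_3770_ONE_CORE = {1: (4200, 3760), 2: (6300, 5000)}  # "2 (1 core)" row
ATOM_N270 = {1: (700, 900), 2: (1000, 1500)}           # 1 core, 2 threads

def ht_gain(rows: dict) -> tuple:
    """Speedup from running a second hardware thread on the same core."""
    (c1, d1), (c2, d2) = rows[1], rows[2]
    return round(c2 / c1, 2), round(d2 / d1, 2)

if __name__ == "__main__":
    print(ht_gain(I7_3770_ONE_CORE))  # (1.5, 1.33)
    print(ht_gain(ATOM_N270))         # (1.43, 1.67)
```

On both CPUs, a second thread on the same core adds roughly 30% to 70%, which is consistent with the low-IPC explanation.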

LZMA in multithreading mode

When you specify N*2 threads for the test, the program creates N copies of the LZMA encoder, and each encoder instance compresses a separate block of test data. Each LZMA encoder instance creates 3 asymmetric threads: two big threads and one small thread. The total CPU load of these 3 threads can vary from 140% to 200%. To load the CPU better during compression, we also test modes where the number of benchmark threads is larger than the number of hardware threads.
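The thread arithmetic can be worked through as follows. The sketch uses only the figures stated above: N*2 benchmark threads give N encoders, each encoder has 3 threads, and each encoder's load is 140% to 200% of one hardware thread. Load is expressed in "hardware threads' worth" of CPU time. The sketch covers only even thread counts; how odd counts are split is not described here.

```python
THREADS_PER_ENCODER = 3          # two big threads + one small thread
LOAD_PER_ENCODER = (1.4, 2.0)    # 140% .. 200% of one hardware thread

def encoder_plan(benchmark_threads: int) -> dict:
    """Encoders, OS threads, and expected CPU load for N*2 benchmark threads."""
    if benchmark_threads < 2 or benchmark_threads % 2:
        raise ValueError("this sketch only handles even thread counts (N*2)")
    n = benchmark_threads // 2
    lo, hi = LOAD_PER_ENCODER
    return {
        "encoders": n,
        "os_threads": n * THREADS_PER_ENCODER,
        "load_min": round(n * lo, 2),
        "load_max": round(n * hi, 2),
    }

if __name__ == "__main__":
    print(encoder_plan(8))  # 4 encoders, 12 threads, 5.6 .. 8.0 hardware threads
    print(encoder_plan(6))  # 3 encoders, 9 threads, 4.2 .. 6.0 hardware threads
```

For example, with 8 benchmark threads on an 8-thread CPU, the expected load can fall as low as 5.6 hardware threads. This is why runs with more benchmark threads than hardware threads are also listed.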

In multithreading mode, each LZMA encoder instance divides compression into 3 different tasks, each executed in a separate thread. Each of these tasks is simpler than the original task and uses less memory, so each thread uses the data cache and TLB more effectively. As a result, the LZMA encoder is slightly more efficient in multithreading mode when measured as speed divided by CPU usage.

Note that there is some data traffic between the 3 threads of the LZMA encoder, so the bandwidth for exchanging data through memory between CPU threads can also be important, especially in systems with a large number of cores or CPUs.
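The two paragraphs above describe splitting the encoder into 3 threads that hand data to one another. Here is a minimal sketch of that kind of pipeline, assuming a hypothetical split: a small hashing stage, a bigger matching stage, and an "encoding" stage in the calling thread (here it only counts positions that have a match candidate). This is not 7-Zip's multithreaded match finder. Because of CPython's GIL, the point is the structure and the queue hand-offs (the inter-thread data traffic), not real parallel speedup. A sequential version is included to check the result.

```python
import queue
import threading

CHUNK = 1024  # positions per message; larger chunks mean fewer hand-offs

def hash_stage(data: bytes, out_q: queue.Queue) -> None:
    """Small stage: compute a 3-byte key for every position."""
    n = max(len(data) - 2, 0)
    for start in range(0, n, CHUNK):
        keys = [(data[i] << 16) | (data[i + 1] << 8) | data[i + 2]
                for i in range(start, min(start + CHUNK, n))]
        out_q.put((start, keys))
    out_q.put(None)                          # end-of-stream marker

def match_stage(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Bigger stage: latest earlier position with the same key, or -1."""
    last_seen = {}
    while (item := in_q.get()) is not None:
        start, keys = item
        cands = []
        for offset, key in enumerate(keys):
            cands.append(last_seen.get(key, -1))
            last_seen[key] = start + offset
        out_q.put(cands)
    out_q.put(None)

def encode_stage(in_q: queue.Queue) -> int:
    """Stand-in for the encoder: count positions that have a match candidate."""
    found = 0
    while (cands := in_q.get()) is not None:
        found += sum(1 for c in cands if c >= 0)
    return found

def pipelined_count(data: bytes) -> int:
    q1: queue.Queue = queue.Queue(maxsize=8)  # bounded queues between threads
    q2: queue.Queue = queue.Queue(maxsize=8)
    workers = [threading.Thread(target=hash_stage, args=(data, q1)),
               threading.Thread(target=match_stage, args=(q1, q2))]
    for t in workers:
        t.start()
    result = encode_stage(q2)                 # runs in the calling thread
    for t in workers:
        t.join()
    return result

def sequential_count(data: bytes) -> int:
    seen, found = set(), 0
    for i in range(max(len(data) - 2, 0)):
        key = data[i:i + 3]
        found += key in seen
        seen.add(key)
    return found

if __name__ == "__main__":
    sample = b"abcabcabc" * 1000 + bytes(range(256))
    print(pipelined_count(sample) == sequential_count(sample))  # True
```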

All LZMA decoder threads are symmetric and independent, so the decompression test loads all hardware threads when the number of benchmark threads equals the number of hardware threads.
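Independent decoding can be illustrated with Python's standard `lzma` module. That module is a binding to liblzma and writes the .xz (LZMA2) format, so it is not 7-Zip's benchmark code. Each block is compressed into its own self-contained stream, so the decoder threads share no state and can take blocks in any order. Whether the threads actually run at the same time depends on the binding releasing the GIL during decompression. The block contents and the worker count are arbitrary assumptions.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(blocks: list) -> list:
    """Compress each block into its own self-contained .xz stream."""
    return [lzma.compress(b, preset=1) for b in blocks]

def decompress_parallel(packed: list, workers: int = 4) -> list:
    """Decode every stream independently; map() keeps the original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lzma.decompress, packed))

if __name__ == "__main__":
    blocks = [bytes([i]) * 5000 + b"tail" for i in range(8)]
    print(decompress_parallel(compress_blocks(blocks)) == blocks)  # True
```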

LZMA results

We use the benchmark results for a 32 MB dictionary (the "25:" line in the output of the console version). If results for a 32 MB dictionary are not available, we use the results for a smaller dictionary. Most x86 tests were performed on Windows with the official 7-Zip binaries, and some of them in 64-bit mode. Most tests for other platforms were performed with p7zip compiled by GCC with speed optimization.
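The "25:" label matches a 32 MB dictionary because 2^25 bytes = 33,554,432 bytes = 32 MB (binary megabytes). This sketch assumes that the number before the colon is the base-2 logarithm of the dictionary size. The helper parses only that prefix and ignores the rest of the console line, whose exact layout is not described here.

```python
def dict_size_from_label(label: str) -> int:
    """'25:' -> 2**25 bytes. Only the 'NN:' prefix is parsed."""
    log2 = int(label.split(":", 1)[0].strip())
    return 1 << log2

def to_mb(n_bytes: int) -> float:
    """Bytes -> binary megabytes (2**20 bytes)."""
    return n_bytes / (1 << 20)

if __name__ == "__main__":
    print(dict_size_from_label("25:"))                  # 33554432
    print(to_mb(dict_size_from_label("25:")))           # 32.0
    print(to_mb(dict_size_from_label("22:")))           # 4.0 (a smaller dictionary)
```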


You can download the binaries and source code of the 7-Zip benchmark here:

7-Zip

LZMA SDK

7-Benchmark (Memlat and Pipelen)

If you have interesting new results, post them on the 7-max forum:

7-max forum