IBM POWER9

Configuration: IBM Power System LC922: IBM POWER9 3800 MHz (Nimbus), 2 sockets, 16 cores per chip, 4 threads per core, 256 GB (DDR4-2667).

Notes:

The POWER9 system now runs with mitigations for the Spectre/Meltdown vulnerabilities. One of these mitigations enables an "L1D private per thread" mode. Each cache line in L1D carries a 2-bit thread ID, and in "L1D private" mode only one thread can read data from a given L1D line. If another thread reads data from that line, the line is reloaded from the L2 cache, so for the other 3 of the 4 SMT threads an access to shared data in L1D takes 16 cycles instead of 4. Performance therefore degrades when threads use the same data, for example when all threads read from common tables of constants.
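
As a rough illustration of that cost, the Python sketch below averages the read latency of one shared L1D line over the SMT threads. It is a simplified model and not a measurement: it assumes that exactly one thread hits in L1D at 4 cycles while every other thread pays the 16-cycle reload from L2 on each read. The function name and its parameters are hypothetical.

```python
def avg_shared_read_cycles(threads=4, l1_hit=4, l2_reload=16):
    """Average cycles per read of one shared line in "L1D private" mode.

    Simplified model: one thread owns the line and hits in L1D,
    the remaining threads reload the line from L2 on every read.
    """
    others = threads - 1
    return (l1_hit + others * l2_reload) / threads


print(avg_shared_read_cycles())           # 4 SMT threads share the line
print(avg_shared_read_cycles(threads=1))  # a single thread is unaffected
```

Under these assumptions the average rises from 4 cycles for a single thread to 13 cycles when all 4 SMT threads read the same line.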

2 MiB pages mode (64-bit Linux, Radix translation)

  Size        Latency       Increase   Description

  32 K     4                          
  64 K    10               6           + 12 (L2)        
 128 K    13               3           
 256 K    14               1
 512 K    15               1           
   1 M    26              11           + 21 (L3)
   2 M    31               5
   4 M    34               3           
   8 M    36               2 +    ns
  16 M    37 +  28 ns      1 + 28 ns   + 64 ns (RAM) 
  32 M    37 +  29 ns        +  1 ns
  64 M    37 +  46 ns        + 17 ns
 128 M    47 +  58 ns    10  + 12 ns   + 20 (ERAT miss)
 256 M    52 +  62 ns     5  +  4 ns   
 512 M    55 +  63 ns     3  +  1 ns 
   1 G    56 +  64 ns     1  +  1 ns 
   2 G    75 +  64 ns    19  +    ns   + 36 (TLB miss)
   4 G    84 +  64 ns     9  +    ns   
   8 G    89 +  64 ns     5  +    ns   
  16 G    91 +  64 ns     2  +    ns
  32 G    94 +  64 ns     3  +    ns   
  64 G    98 +  64 ns     4  +     
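
The entries above mix clock cycles and nanoseconds. The Python sketch below shows one way to read them, assuming the 3800 MHz clock from the configuration line and assuming that the "+ N" values in the Description column accumulate from one level to the next. The helper name latency_ns is hypothetical.

```python
FREQ_MHZ = 3800  # clock frequency from the configuration line


def latency_ns(cycles, extra_ns=0.0, freq_mhz=FREQ_MHZ):
    """Convert a "cycles + ns" table entry to nanoseconds."""
    return cycles * 1000.0 / freq_mhz + extra_ns


# Cumulative cycle counts implied by the Description column.
l1 = 4          # L1D hit
l2 = l1 + 12    # "+ 12 (L2)"
l3 = l2 + 21    # "+ 21 (L3)"

print("L2:", l2, "cycles, L3:", l3, "cycles")
print("L1 hit:", round(latency_ns(l1), 2), "ns")
print("16 M row:", round(latency_ns(37, 28), 2), "ns")
```

With these assumptions L2 comes out at 16 cycles and L3 at 37 cycles, which matches the 37-cycle plateau in the 16 M to 64 M rows.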

64 KiB pages mode (64-bit Linux, Radix translation)

  Size        Latency       Increase   Description

  32 K     4                          
  64 K    10               6           + 12 (L2)        
 128 K    13               3           
 256 K    14               1
 512 K    15               1           
   1 M    26              11           + 21 (L3)
   2 M    31               5
   4 M    34               3           
   8 M    46 +   1 ns     12 +  1 ns   + 20 (ERAT miss)
  16 M    52 +  25 ns      6 + 24 ns   + 64 ns (RAM) 
  32 M    55 +  29 ns      3 +  4 ns
  64 M    74 +  44 ns     19 + 15 ns   + 36 (TLB miss)
 128 M    84 +  57 ns     10 + 13 ns   
 256 M    89 +  62 ns      5 +  5 ns   
 512 M    97 +  63 ns      8 +  1 ns 
   1 G   117 +  64 ns     20 +  1 ns 
   2 G   129 +  64 ns     12 +    ns   
   4 G   135 +  64 ns      6 +    ns   
   8 G   140 +  64 ns      5 +    ns   
  16 G   162 +  64 ns     22 +    ns
  32 G   265 +  64 ns    103 +    ns   
  64 G   351 +  64 ns     86      
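
To put the two page modes side by side, the Python sketch below subtracts the cycle part of the latency for a few sizes copied from the two tables (the ns part is 64 ns in both modes for these rows). The conversion to nanoseconds assumes the 3800 MHz clock. The dictionary and function names are hypothetical.

```python
FREQ_MHZ = 3800  # clock frequency from the configuration line

# Cycle part of the latency for a few sizes, copied from the two tables.
cycles_2m_pages = {"1 G": 56, "16 G": 91, "64 G": 98}
cycles_64k_pages = {"1 G": 117, "16 G": 162, "64 G": 351}


def extra_cycles(size):
    """Extra cycles paid with 64 KiB pages compared to 2 MiB pages."""
    return cycles_64k_pages[size] - cycles_2m_pages[size]


for size in cycles_2m_pages:
    extra = extra_cycles(size)
    print(size, extra, "cycles =", round(extra * 1000 / FREQ_MHZ, 1), "ns")
```

For the 64 G row the difference is 253 cycles, which is about as much time as the 64 ns RAM access itself at this clock.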

7-Zip Benchmark

Notes:

7z b -mm=* : MIPS and Effectiveness values are normalized to the AMD K8 CPU.

The "L1D private per thread" mitigation affects the AES256CBC and CRC multi-threading results. We also test a special version of p7zip in which each thread has its own copy of the common data tables.



### clang-7 -O3
## THP off, MY_CPU_LE_UNALIGN
# numactl -m0 -C28-31


freq= 3800
LE
CPU Freq:  1892  1893  1894  1893  1893  1894  1894  1894  1894

RAM size:  257614 MB,  # CPU hardware threads:   4
RAM usage:    225 MB,  # Benchmark threads:      1


Method           Speed Usage    R/U Rating   E/U Effec
                 KiB/s     %   MIPS   MIPS     %     %

CPU                      100   1893   1891    50    50
CPU                      100   1888   1890    50    50
CPU                      100   1893   1893    50    50

LZMA:x1          13448   100   4930   4916   130   129
                 36461   100   2964   2969    78    78
LZMA:x5:mt1       3632   100   4538   4538   119   119
                 36500   100   3078   3079    81    81
LZMA:x5:mt2       4785   163   3675   5979    97   157
                 36536   100   3085   3082    81    81
Deflate:x1       38635   100   4903   4906   129   129
                101874   100   3168   3165    83    83
Deflate:x5       11605   100   4458   4468   117   118
                102011   100   3172   3167    83    83
Deflate:x7        4085   100   4527   4526   119   119
                102711   100   3184   3187    84    84
Deflate64:x5     10844   100   4689   4686   123   123
                102011   100   3196   3191    84    84
BZip2:x1          6207   100   3764   3751    99    99
                 27636   100   2991   2996    79    79
BZip2:x5          5697   100   4748   4755   125   125
                 25145   100   4947   4935   130   130
BZip2:x5:mt2      7673   196   3271   6404    86   169
                 29969   141   4176   5882   110   155
BZip2:x7          1783   100   4619   4620   122   122
                 25316   100   4962   4965   131   131
PPMD:x1           4530   100   4681   4685   123   123
                  3535   100   4167   4163   110   110
PPMD:x5           3526   100   5984   5976   157   157
                  2724   100   5100   5105   134   134
Delta:4         575429   100   3535   3535    93    93
                563837   100   3470   3464    91    91
BCJ            1327653   100   5429   5438   143   143
               1331109   100   5468   5452   144   143
AES256CBC:1     188021   100   4620   4621   122   122
                207201   100   5084   5092   134   134
AES256CBC:2 

CRC32:1         307853   100   2242   2241    59    59
CRC32:4         879880   100   1964   1964    52    52
CRC32:8        1194732   100   1620   1620    43    43
CRC64           805290   100   1650   1649    43    43
SHA256          206928   100   4222   4221   111   111
SHA1            429681   100   4021   4022   106   106
BLAKE2sp        280716   100   6178   6176   163   163

CPU                      100   1891   1892    50    50
------------------------------------------------------
Tot:                     109   3840   4161   101   110
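
The derived columns can be checked against the raw ones. The Python sketch below uses the usual 7-Zip meaning of R/U (Rating divided by Usage). The effectiveness formula is an assumption inferred from the printed values, which look consistent with MIPS divided by the "freq= 3800" value; it is not taken from the benchmark source. Both function names are hypothetical.

```python
FREQ_MHZ = 3800  # the "freq= 3800" value printed in the log


def rating_per_usage(rating_mips, usage_percent):
    """R/U: rating scaled to 100% usage of one hardware thread."""
    return rating_mips * 100.0 / usage_percent


def effectiveness_percent(mips, freq_mhz=FREQ_MHZ):
    """Assumed normalization: MIPS per MHz, expressed in percent."""
    return mips * 100.0 / freq_mhz


# LZMA:x5:mt2 compression row: Rating 5979 MIPS at 163% usage.
print(round(rating_per_usage(5979, 163)))

# LZMA:x1 compression row and the first CPU row.
print(round(effectiveness_percent(4916)))
print(round(effectiveness_percent(1891)))
```

The R/U result is close to the printed 3675 rather than equal to it, presumably because the printed Usage value is rounded. The two effectiveness results agree with the printed 129 and 50.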



RAM usage:    901 MB,  # Benchmark threads:      4


Method           Speed Usage    R/U Rating   E/U Effec
                 KiB/s     %   MIPS   MIPS     %     %

CPU                      395   1727   6822    45   180
CPU                      392   1715   6721    45   177
CPU                      396   1736   6874    46   181

LZMA:x1          29939   399   2743  10945    72   288
                 88856   399   1812   7236    48   190
LZMA:x5:mt1       7027   398   2204   8779    58   231
                 86137   399   1820   7264    48   191
LZMA:x5:mt2       7276   396   2295   9091    60   239
                 85371   399   1804   7199    47   189
Deflate:x1       72464   399   2305   9201    61   242
                213091   399   1658   6621    44   174
Deflate:x5       21824   398   2111   8403    56   221
                213482   399   1660   6627    44   174
Deflate:x7        7258   400   2013   8042    53   212
                214428   399   1669   6654    44   175
Deflate64:x5     20815   398   2260   8995    59   237
                213066   400   1668   6665    44   175
BZip2:x1         11327   400   1711   6844    45   180
                 55482   400   1505   6015    40   158
BZip2:x5          9783   399   2048   8165    54   215
                 49215   398   2425   9660    64   254
BZip2:x5:mt2      9870   396   2082   8238    55   217
                 48835   384   2498   9585    66   252
BZip2:x7          3066   399   1991   7945    52   209
                 49382   397   2439   9684    64   255
PPMD:x1           7374   399   1911   7627    50   201
                  6165   399   1818   7260    48   191
PPMD:x5           5428   400   2303   9201    61   242
                  4636   399   2177   8689    57   229
Delta:4         708191   400   1089   4351    29   115
                686198   399   1056   4216    28   111
BCJ            1672737   400   1714   6852    45   180
               1753549   400   1797   7183    47   189
AES256CBC:1     117317   399    723   2883    19    76
                123355   399    761   3032    20    80
AES256CBC:2 

CRC32:1         625149   399   1142   4551    30   120
CRC32:4        1444809   400    807   3225    21    85
CRC32:8        1673850   399    570   2270    15    60
CRC64          1477231   400    757   3025    20    80
SHA256          261507   399   1337   5335    35   140
SHA1            666711   400   1561   6240    41   164
BLAKE2sp        426756   400   2349   9389    62   247

CPU                      395   1756   6934    46   182
------------------------------------------------------
Tot:                     398   1941   7728    51   203
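
The effect of the mitigation shows up when the 4-thread speeds are divided by the 1-thread speeds. The Python sketch below does this for the first speed row of a few methods, with the numbers copied from the two tables above. The dictionary and function names are hypothetical.

```python
# First-row speed in KiB/s: (1 thread, 4 threads), copied from the tables.
speeds = {
    "LZMA:x1": (13448, 29939),
    "BZip2:x5": (5697, 9783),
    "AES256CBC:1": (188021, 117317),
    "CRC32:1": (307853, 625149),
}


def scaling(method):
    """Speed with 4 benchmark threads relative to 1 benchmark thread."""
    one, four = speeds[method]
    return four / one


for method in speeds:
    print(f"{method:12s} {scaling(method):.2f}x")
```

LZMA:x1 gains more than 2x from the 4 SMT threads, while AES256CBC:1 becomes slower than with a single thread.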



### clang-7 -O3
## THP off, thread-local storage hack (aes, blake2sp, crc, deflate, sha256) + MY_CPU_LE_UNALIGN
# numactl -m0 -C28-31



RAM usage:    901 MB,  # Benchmark threads:      4


Method           Speed Usage    R/U Rating   E/U Effec
                 KiB/s     %   MIPS   MIPS     %     %

CPU                      396   1729   6855    46   180
CPU                      393   1713   6723    45   177
CPU                      396   1726   6829    45   180

LZMA:x1          29998   398   2754  10966    72   289
                 88450   399   1805   7203    47   190
LZMA:x5:mt1       6443   399   2019   8049    53   212
                 86521   399   1827   7296    48   192
LZMA:x5:mt2       7136   387   2306   8916    61   235
                 86474   399   1828   7292    48   192
Deflate:x1       72690   399   2315   9230    61   243
                219825   399   1713   6830    45   180
Deflate:x5       22128   399   2134   8520    56   224
                220260   399   1715   6838    45   180
Deflate:x7        7336   400   2035   8129    54   214
                221478   399   1723   6873    45   181
Deflate64:x5     21106   399   2285   9121    60   240
                219763   399   1723   6874    45   181
BZip2:x1         11345   400   1714   6854    45   180
                 55400   400   1503   6006    40   158
BZip2:x5         10036   396   2117   8376    56   220
                 49172   398   2424   9652    64   254
BZip2:x5:mt2      9768   400   2039   8152    54   215
                 48410   383   2480   9502    65   250
BZip2:x7          3057   400   1983   7921    52   208
                 49573   399   2439   9722    64   256
PPMD:x1           7351   399   1904   7603    50   200
                  6106   399   1801   7191    47   189
PPMD:x5           5408   399   2295   9166    60   241
                  4522   393   2154   8475    57   223
Delta:4         713635   400   1097   4385    29   115
                702532   400   1080   4316    28   114
BCJ            1755966   400   1799   7192    47   189
               1748352   399   1794   7161    47   188
AES256CBC:1     254067   397   1573   6244    41   164
                271973   400   1673   6684    44   176
AES256CBC:2 

CRC32:1        1136614   400   2071   8275    54   218
CRC32:4        2859582   400   1597   6383    42   168
CRC32:8        2981125   400   1011   4042    27   106
CRC64          2429159   399   1246   4975    33   131
SHA256          262536   400   1340   5356    35   141
SHA1            665986   400   1559   6234    41   164
BLAKE2sp        449079   400   2471   9880    65   260

CPU                      395   1752   6916    46   182
------------------------------------------------------
Tot:                     397   1981   7862    52   207
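
To quantify what the per-thread table copies change, the Python sketch below divides the 4-thread speeds of this run by those of the previous 4-thread run without the thread-local storage hack. The numbers are copied from the first speed row of each method in the two tables. The dictionary and function names are hypothetical.

```python
# Speed in KiB/s at 4 threads: (shared tables, per-thread tables).
speeds = {
    "AES256CBC:1": (117317, 254067),
    "CRC32:1": (625149, 1136614),
    "CRC32:4": (1444809, 2859582),
    "CRC32:8": (1673850, 2981125),
    "CRC64": (1477231, 2429159),
    "SHA256": (261507, 262536),
}


def speedup(method):
    """Speed with per-thread tables relative to shared tables."""
    shared, private = speeds[method]
    return private / shared


for method in speeds:
    print(f"{method:12s} {speedup(method):.2f}x")
```

AES256CBC and the CRC methods speed up by roughly 1.6x to 2.2x, while SHA256 stays within about 1% of its previous speed.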



Links

Power9 at Wikipedia