Tuesday, January 23, 2018

Fine-grained Impact of KPTI on HPC Node I/O Performance

In the previous post, I investigated the impact of a RedHat kernel patched for Meltdown on the IO-500 results. Since the differences were not significant, I now investigate the fine-grained timing behavior of individual operations using the MD-Real-IO benchmark.
Again tmpfs is used, but this time both runs were conducted on the same physical node (btc2 of the Mistral test system) with the same kernel; in one case KPTI was disabled via the debug interface.
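
The post does not name the debug interface. As a minimal sketch, assuming the RedHat-patched kernel exposes the switch as the debugfs file /sys/kernel/debug/x86/pti_enabled (this path and the helper name kpti_enabled are assumptions, not taken from the measurements above), the current state could be recorded before each run like this:

```python
from pathlib import Path

# Assumed location of the KPTI switch on a RedHat-patched kernel (debugfs).
# Reading it usually requires root and a mounted debugfs.
PTI_FLAG = Path("/sys/kernel/debug/x86/pti_enabled")


def kpti_enabled(flag_path=PTI_FLAG):
    """Return True/False according to the flag file, or None if it cannot be read."""
    try:
        text = Path(flag_path).read_text().strip()
    except OSError:
        # File missing, debugfs not mounted, or insufficient permissions.
        return None
    return text == "1"


if __name__ == "__main__":
    print("KPTI enabled:", kpti_enabled())
```

Logging this value next to each latency file makes it harder to mix up the two configurations later.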

MD-Real-IO was run with the following parameters:
-O=1 -I=10000 -D=1 -P=10 -R=10 --process-reports -S=3901 --latency-all -- -D=/dev/shm/test
It was run with either 1 or 10 processes.
The 10 latency files produced by a run were merged so that the timings of 100k individual I/Os could be assessed.
Note that the analyzed file therefore contains the measurements of all processes!

Understanding latency for 1 process

First, let's look at the mean latency per operation and the relative performance loss with KPTI enabled for 1 process, as this case is expected to show the highest impact:

Operation   Disabled KPTI (s)   With KPTI enabled (s)   Relative speed with KPTI
Create      3.84E-06            4.33E-06                0.89
Read        2.96E-06            3.65E-06                0.81
Delete      2.47E-06            2.73E-06                0.91
Stat        1.80E-06            1.98E-06                0.91
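
The relative speed column can be reproduced from the two mean columns. The following is a minimal sketch, assuming the means are in seconds and the column is the ratio of the mean latency without KPTI to the mean latency with KPTI (this matches the published values; the function names are made up for illustration):

```python
# Mean latencies per operation, copied from the 1-process table above:
# (KPTI disabled, KPTI enabled), assumed to be in seconds.
MEANS_1_PROC = {
    "Create": (3.84e-06, 4.33e-06),
    "Read": (2.96e-06, 3.65e-06),
    "Delete": (2.47e-06, 2.73e-06),
    "Stat": (1.80e-06, 1.98e-06),
}


def relative_speed(disabled, enabled):
    """Speed with KPTI relative to speed without it (speed ~ 1 / latency)."""
    return disabled / enabled


def latency_increase(disabled, enabled):
    """Fractional increase in mean latency caused by enabling KPTI."""
    return enabled / disabled - 1.0


if __name__ == "__main__":
    for op, (off, on) in MEANS_1_PROC.items():
        print(f"{op:7s} relative speed {relative_speed(off, on):.2f}, "
              f"latency +{latency_increase(off, on) * 100:.1f}%, "
              f"difference {(on - off) * 1e6:.2f} us")
```

Because the inputs are the rounded table values, the last digit can differ from the table (Delete comes out as 0.90 rather than 0.91). Note also that a relative speed of 0.81 for reads corresponds to a mean latency that is about 23% higher.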

It can be seen that there is indeed some performance loss; reads in particular are now 19% slower than without KPTI. Still, the absolute degradation stays below one microsecond per operation. The exact distribution is shown in the density plots:

Fig 1: Without KPTI

Fig 2: With KPTI enabled
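
Density plots like Fig 1 and Fig 2 can be approximated by a normalized histogram of the per-operation latencies. How the figures were actually produced is not stated here; the following is only a sketch, assuming the latencies of one operation type are available as a list of floats. The function name and the default bin count are illustrative choices.

```python
def density_histogram(latencies, bins=50):
    """Return (bin_edges, densities) such that sum(density * bin_width) == 1."""
    if not latencies:
        raise ValueError("need at least one latency value")
    lo, hi = min(latencies), max(latencies)
    if hi == lo:
        # All values identical: widen the range slightly to avoid zero-width bins.
        hi = lo + max(abs(lo), 1.0) * 1e-9
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in latencies:
        # The maximum value would fall into bin index `bins`; clamp it into the last bin.
        idx = min(int((value - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(latencies)
    edges = [lo + i * width for i in range(bins + 1)]
    densities = [c / (n * width) for c in counts]
    return edges, densities
```

Computing the histogram for the run with and without KPTI using the same bin edges makes the two distributions directly comparable.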

Understanding latency for 10 processes

The same experiment was run with 10 processes, producing a comparable table:

Operation   Disabled KPTI (s)   With KPTI enabled (s)   Relative speed with KPTI
Create      1.31E-05            1.33E-05                0.99
Read        1.13E-05            1.13E-05                0.99
Delete      1.09E-05            1.06E-05                1.03
Stat        8.74E-06            8.35E-06                1.05

Huh, that is surprising, isn't it? While the mean latency of a single process increased with KPTI enabled, with 10 processes the mean latency actually improved by 3% for delete and 5% for stat, and create and read are nearly unchanged.

The exact distribution is shown in the density plots:

Fig 3: 10 Processes, without KPTI enabled

Fig 4: 10 Processes, with KPTI enabled

As expected, the density distributions are a bit smoother and wider compared to a single process.
This indeed explains the previously reported, counterintuitive result that performance improved for some IO-500 benchmarks with the KPTI patch enabled.


The KPTI patch has an impact on the latency of a single process, which is in the order of 10-20% for operations that take about 2-4 microseconds on our system. This is far below the Lustre latency, which is at least in the order of 100 microseconds when running the same benchmark, and thus will not influence our operational setup, except for cached cases, but we have a cache issue on our system anyhow. With multiple processes per node, the impact is negligible and KPTI actually improves overall performance slightly; the reason should be investigated.
