Optimizations

Some optimizations I'm applying to my hardware and BIOS





8/1/2025 – Semtex

I’ve known for a while that I’m not using my hardware to its full potential, so I started looking into optimizations I can make to my hardware and BIOS settings.

For context, I replaced my CPU, going from a 13900KF to a 14900KF, as the older one fried because of Intel’s microcode issue. At the time I was getting a lot of BSODs and kernel panics, until the system wouldn’t even boot anymore. So I tweaked the BIOS as much as I could while waiting for the new CPU to arrive. That left my computer working, and working well to be fair, but not as well as it could.

References

Component  Name                                         Temperature Idle  Temperature Load  Frequency OCCT  Frequency Idle
CPU        Intel Core i9-14900KF                        35-40°C           90-96°C           4.7GHz          ~0.8GHz
GPU        NVIDIA RTX 4090 Gainward Phantom             44°C              60°C              2.7GHz
RAM        Corsair Vengeance Black 5600MHz 4x32GB DDR5  30°C              40°C              4200MHz         4200MHz
SSD 1      Crucial P5 Plus NVMe SSD 2TB                 39°C              40°C              N/A             N/A
SSD 2      Crucial P5 Plus NVMe SSD 2TB                 47°C              40°C              N/A             N/A

Load tests were performed using OCCT for around 5 minutes once the peak stabilized.

CPU

I’m not using a CPU contact frame, and I’m on air cooling (Noctua NH-D15). Installing a contact frame is usually a good idea for long-term stability. I’m not overclocking, as I value silence and stability more, but a contact frame usually lowers load temperatures by 8 to 10°C, so I’ll give it a try. I went for this model, which costs around 10€.

Thermal Grizzly

RAM

I’ve disabled XMP Optimization in the BIOS, so I know my RAM is underperforming. I did that because, for some reason, it reduced the crashes a lot while my older CPU was dying. I need to re-enable it and see how it goes.

Benchmarks

Tests were run with UnixBench. To be fair, I didn’t let the system sit completely idle while running the benchmarks, as I wanted to monitor temperatures and frequencies under load. The results are therefore not 100% accurate, but they should give a good idea of the performance before and after the optimizations.

I ran the tests in pretty much the simplest way, without parameters:

$ ubench

XMP

Disabled

Benchmark Run: ven. aout 01 2025 10:26:08 - 10:54:29
32 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables              0.1 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    13981.3 MWIPS (10.0 s, 7 samples)
Execl Throughput                               2755.5 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       3022110.3 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          825910.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       7758706.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                             4895982.5 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 243221.3 lps   (10.0 s, 7 samples)
Process Creation                               2614.4 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2889.9 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   5765.0 lpm   (60.0 s, 2 samples)
System Call Overhead                        3486949.4 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0          0.1      0.0
Double-Precision Whetstone                       55.0      13981.3   2542.1
Execl Throughput                                 43.0       2755.5    640.8
File Copy 1024 bufsize 2000 maxblocks          3960.0    3022110.3   7631.6
File Copy 256 bufsize 500 maxblocks            1655.0     825910.9   4990.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    7758706.8  13377.1
Pipe Throughput                               12440.0    4895982.5   3935.7
Pipe-based Context Switching                   4000.0     243221.3    608.1
Process Creation                                126.0       2614.4    207.5
Shell Scripts (1 concurrent)                     42.4       2889.9    681.6
Shell Scripts (8 concurrent)                      6.0       5765.0   9608.3
System Call Overhead                          15000.0    3486949.4   2324.6
                                                                   ========
System Benchmarks Index Score                                        1411.9

------------------------------------------------------------------------
Benchmark Run: ven. aout 01 2025 10:54:29 - 11:22:46
32 CPUs in system; running 32 parallel copies of tests

Dhrystone 2 using register variables              3.2 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                   328609.9 MWIPS (10.0 s, 7 samples)
Execl Throughput                              84181.9 lps   (29.7 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks      21700319.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks        14042561.7 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks      22207296.9 KBps  (30.0 s, 2 samples)
Pipe Throughput                            92051572.0 lps   (10.0 s, 7 samples)
Pipe-based Context Switching               12354386.9 lps   (10.0 s, 7 samples)
Process Creation                             235537.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                 230575.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                  31589.4 lpm   (60.0 s, 2 samples)
System Call Overhead                       63752430.3 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0          3.2      0.0
Double-Precision Whetstone                       55.0     328609.9  59747.2
Execl Throughput                                 43.0      84181.9  19577.2
File Copy 1024 bufsize 2000 maxblocks          3960.0   21700319.8  54798.8
File Copy 256 bufsize 500 maxblocks            1655.0   14042561.7  84849.3
File Copy 4096 bufsize 8000 maxblocks          5800.0   22207296.9  38288.4
Pipe Throughput                               12440.0   92051572.0  73996.4
Pipe-based Context Switching                   4000.0   12354386.9  30886.0
Process Creation                                126.0     235537.5  18693.5
Shell Scripts (1 concurrent)                     42.4     230575.5  54381.0
Shell Scripts (8 concurrent)                      6.0      31589.4  52649.0
System Call Overhead                          15000.0   63752430.3  42501.6
                                                                   ========
System Benchmarks Index Score                                        9037.2

(XMP1) Enabled

Benchmark Run: ven. aout 01 2025 14:06:23 - 14:34:44
32 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables              0.1 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    14002.0 MWIPS (10.0 s, 7 samples)
Execl Throughput                               2837.6 lps   (29.7 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       3119013.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          854635.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       8549343.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                             4955957.0 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 395560.4 lps   (10.0 s, 7 samples)
Process Creation                               2886.2 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2715.4 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   4907.3 lpm   (60.0 s, 2 samples)
System Call Overhead                        3496417.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0          0.1      0.0
Double-Precision Whetstone                       55.0      14002.0   2545.8
Execl Throughput                                 43.0       2837.6    659.9
File Copy 1024 bufsize 2000 maxblocks          3960.0    3119013.8   7876.3
File Copy 256 bufsize 500 maxblocks            1655.0     854635.3   5164.0
File Copy 4096 bufsize 8000 maxblocks          5800.0    8549343.6  14740.2
Pipe Throughput                               12440.0    4955957.0   3983.9
Pipe-based Context Switching                   4000.0     395560.4    988.9
Process Creation                                126.0       2886.2    229.1
Shell Scripts (1 concurrent)                     42.4       2715.4    640.4
Shell Scripts (8 concurrent)                      6.0       4907.3   8178.8
System Call Overhead                          15000.0    3496417.7   2330.9
                                                                   ========
System Benchmarks Index Score                                        1480.6

------------------------------------------------------------------------
Benchmark Run: ven. aout 01 2025 14:34:44 - 15:03:01
32 CPUs in system; running 32 parallel copies of tests

Dhrystone 2 using register variables              3.2 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                   326370.3 MWIPS (10.0 s, 7 samples)
Execl Throughput                              85059.4 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks      28254307.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks        13817825.1 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks      28816103.7 KBps  (30.0 s, 2 samples)
Pipe Throughput                            90499015.8 lps   (10.0 s, 7 samples)
Pipe-based Context Switching               12136097.5 lps   (10.0 s, 7 samples)
Process Creation                             232925.2 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                 240292.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                  32868.8 lpm   (60.0 s, 2 samples)
System Call Overhead                       63164501.1 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0          3.2      0.0
Double-Precision Whetstone                       55.0     326370.3  59340.1
Execl Throughput                                 43.0      85059.4  19781.2
File Copy 1024 bufsize 2000 maxblocks          3960.0   28254307.8  71349.3
File Copy 256 bufsize 500 maxblocks            1655.0   13817825.1  83491.4
File Copy 4096 bufsize 8000 maxblocks          5800.0   28816103.7  49682.9
Pipe Throughput                               12440.0   90499015.8  72748.4
Pipe-based Context Switching                   4000.0   12136097.5  30340.2
Process Creation                                126.0     232925.2  18486.1
Shell Scripts (1 concurrent)                     42.4     240292.0  56672.6
Shell Scripts (8 concurrent)                      6.0      32868.8  54781.3
System Call Overhead                          15000.0   63164501.1  42109.7
                                                                   ========
System Benchmarks Index Score                                        9451.1
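For anyone reading the tables above, each value in the INDEX column appears to be the RESULT divided by the BASELINE, multiplied by 10. The rows above are consistent with that. Here is a minimal Python sketch of the relationship, using three rows from the single-copy run with XMP disabled. The function name `index_value` is my own and is not part of UnixBench.

```python
def index_value(result: float, baseline: float) -> float:
    """Per-test index as it appears in the tables: RESULT / BASELINE * 10."""
    return result / baseline * 10


# (baseline, result, index printed in the report), copied from the
# "XMP Disabled", 1 parallel copy run.
rows = {
    "Double-Precision Whetstone": (55.0, 13981.3, 2542.1),
    "Execl Throughput": (43.0, 2755.5, 640.8),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 3022110.3, 7631.6),
}

for name, (baseline, result, reported) in rows.items():
    computed = round(index_value(result, baseline), 1)
    print(f"{name}: computed {computed}, reported {reported}")
```

The final "System Benchmarks Index Score" is an aggregate of these per-test indexes. I have not tried to reproduce it here.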

Insights

There are some notable outliers that may or may not have an impact on how the system behaves.

XMP on/off chart
  • Index Score (1 thread): The overall performance score for single-threaded tasks improved by 4.9% with XMP enabled.
  • Index Score (32 threads): The overall performance score for multi-threaded tasks improved by 4.6% with XMP enabled.

Raw computation (CPU) shows no major change (under ±1%), which is background noise. This is expected, as the CPU is not overclocked and the XMP profile does not change its base frequency.

  • File Copy 1024 (1 thread): The speed of copying files with a buffer size of 1024 bytes increased by 3.2%.
  • File Copy 1024 (32 threads): The speed of copying files with a buffer size of 1024 bytes in a multi-threaded environment increased significantly by 30%.
  • File Copy 4096 (32 threads): The speed of copying files with a buffer size of 4096 bytes in a multi-threaded environment also increased significantly by 29.8%.

These are good indicators of how the XMP profile affected system performance. It should get more out of my two NVMe M.2 SSDs.

Single-threaded context switching is where the impact is biggest, at around 62%, which is quite significant. It matters mostly for Node.js/Electron apps, Angular compilation, or anything else running on a JavaScript runtime, as these are single-threaded.
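The percentages quoted in this section are plain relative changes between the "XMP Disabled" and "(XMP1) Enabled" results. A minimal Python sketch of that arithmetic follows, with values copied from the benchmark output above. The helper name `pct_change` is my own.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (XMP disabled, XMP enabled) pairs taken from the UnixBench reports.
results = {
    "Index Score (1 copy)": (1411.9, 1480.6),
    "Index Score (32 copies)": (9037.2, 9451.1),
    "File Copy 1024 (1 copy)": (3022110.3, 3119013.8),
    "File Copy 1024 (32 copies)": (21700319.8, 28254307.8),
    "File Copy 4096 (32 copies)": (22207296.9, 28816103.7),
    "Pipe-based Context Switching (1 copy)": (243221.3, 395560.4),
}

for name, (xmp_off, xmp_on) in results.items():
    print(f"{name}: {pct_change(xmp_off, xmp_on):+.1f}%")
```

Each configuration was only run once, so small differences should be read as noise rather than as real gains or losses.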

Conclusion on XMP

Enabling XMP and benchmarking shows that RAM was a bottleneck in my system in its current state. I’m pretty sure it still is, as a jump from 4200MHz to 5600MHz, while not negligible, isn’t as big as switching to 6400MHz or 7200MHz. But given that I already own four 32GB sticks, it would be a shame to replace them all when I can just boost them right now for free.
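To put those speeds in perspective, here is a rough Python sketch of theoretical peak memory bandwidth at each one. It rests on several assumptions: the MHz figures are treated as transfer rates in MT/s (which is how DDR5 kits are marketed), each channel moves 8 bytes per transfer, and the platform runs in dual channel. Real-world throughput is lower than these peaks, and the function name `peak_bandwidth_gbs` is my own.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data bus per memory channel
CHANNELS = 2            # assumption: dual-channel desktop platform


def peak_bandwidth_gbs(mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s for a given transfer rate in MT/s."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * CHANNELS / 1000


baseline = peak_bandwidth_gbs(4200)
for speed in (4200, 5600, 6400, 7200):
    bandwidth = peak_bandwidth_gbs(speed)
    gain = (bandwidth / baseline - 1) * 100
    print(f"DDR5-{speed}: {bandwidth:.1f} GB/s ({gain:+.1f}% vs DDR5-4200)")
```

These are upper bounds only. As the benchmark results above show, the measured gains depend heavily on the workload.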

A quick update on XMP

After being fine with XMP for desktop usage, I ended up disabling it, as it made some games crash (Unreal Engine ones). I’ll have to dig deeper into it. For now, I’ll probably just set the frequency manually and leave the other settings on automatic with XMP disabled.

I ran a 10-minute OCCT RAM test with XMP2 enabled (the RAM’s stock settings; XMP1 was the Asus ones) without much trouble. I’ll update this post later on if I find any issues.

A second update

I ran into issues while playing with XMP profiles that led me to disable XMP again. My understanding is that this comes from running 4x32GB sticks: with two DIMMs on each memory channel, the memory bus is more heavily loaded and higher frequencies become hard to keep stable. I’m staying at 4200MHz as long as I don’t change the RAM kit.