Optimizations
Some optimizations I’m applying to my hardware and BIOS
I’ve known for a while that I’m not using my hardware to its full potential, so I started looking into optimizations I can make to my hardware and BIOS settings.
For context, I replaced my CPU, going from an i9-13900KF to an i9-14900KF, because the older one fried due to Intel’s microcode issue. At the time I was getting a lot of BSODs and kernel panics, until the system wouldn’t boot at all. So I tweaked the BIOS as much as I could while waiting for the new CPU to arrive. That left my computer working well, to be fair, but not as well as it could.
References
| Component | Name | Temperature Idle | Temperature Load | Frequency OCCT | Frequency Idle |
|---|---|---|---|---|---|
| CPU | Intel Core i9-14900KF | 35-40°C | 90-96°C | 4.7GHz | ~0.8GHz |
| GPU | NVIDIA RTX 4090 Gainward Phantom | 44°C | 60°C | 2.7GHz | |
| RAM | Corsair Vengeance Black 5600MHz 4x32GB DDR5 | 30°C | 40°C | 4200MHz | 4200MHz |
| SSD 1 | Crucial P5 Plus NVMe 2TB | 39°C | 40°C | N/A | N/A |
| SSD 2 | Crucial P5 Plus NVMe 2TB | 47°C | 40°C | N/A | N/A |
Load tests were performed using OCCT for around 5 minutes once the peak stabilized.
CPU
I’m using air cooling (Noctua NH-D15) without a CPU contact frame. Installing a contact frame is usually a good idea for long-term stability. I’m not doing any overclocking, as I value silence and stability more, but a contact frame usually lowers temperatures under load by 8 to 10°C. I’ll give it a try. I went for this model, which costs around 10€.
RAM
I’ve disabled XMP Optimization in the BIOS, so I know my RAM is underperforming. I did that because, for some reason, it reduced the crashes a lot while my older CPU was dying. I need to re-enable it and see how it goes.
Benchmarks
The tests were run with UnixBench. To be fair, I didn’t let the system sit completely idle while running the benchmarks, as I wanted to monitor temperatures and frequencies under load. The results are therefore not 100% accurate, but they should give a good idea of the performance before and after the optimizations.
I ran the tests in pretty much the simplest way, without parameters:
$ ubench
XMP
Disabled
Benchmark Run: ven. aout 01 2025 10:26:08 - 10:54:29
32 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 0.1 lps (10.0 s, 7 samples)
Double-Precision Whetstone 13981.3 MWIPS (10.0 s, 7 samples)
Execl Throughput 2755.5 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 3022110.3 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 825910.9 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 7758706.8 KBps (30.0 s, 2 samples)
Pipe Throughput 4895982.5 lps (10.0 s, 7 samples)
Pipe-based Context Switching 243221.3 lps (10.0 s, 7 samples)
Process Creation 2614.4 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 2889.9 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 5765.0 lpm (60.0 s, 2 samples)
System Call Overhead 3486949.4 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 0.1 0.0
Double-Precision Whetstone 55.0 13981.3 2542.1
Execl Throughput 43.0 2755.5 640.8
File Copy 1024 bufsize 2000 maxblocks 3960.0 3022110.3 7631.6
File Copy 256 bufsize 500 maxblocks 1655.0 825910.9 4990.4
File Copy 4096 bufsize 8000 maxblocks 5800.0 7758706.8 13377.1
Pipe Throughput 12440.0 4895982.5 3935.7
Pipe-based Context Switching 4000.0 243221.3 608.1
Process Creation 126.0 2614.4 207.5
Shell Scripts (1 concurrent) 42.4 2889.9 681.6
Shell Scripts (8 concurrent) 6.0 5765.0 9608.3
System Call Overhead 15000.0 3486949.4 2324.6
========
System Benchmarks Index Score 1411.9
------------------------------------------------------------------------
Benchmark Run: ven. aout 01 2025 10:54:29 - 11:22:46
32 CPUs in system; running 32 parallel copies of tests
Dhrystone 2 using register variables 3.2 lps (10.0 s, 7 samples)
Double-Precision Whetstone 328609.9 MWIPS (10.0 s, 7 samples)
Execl Throughput 84181.9 lps (29.7 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 21700319.8 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 14042561.7 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 22207296.9 KBps (30.0 s, 2 samples)
Pipe Throughput 92051572.0 lps (10.0 s, 7 samples)
Pipe-based Context Switching 12354386.9 lps (10.0 s, 7 samples)
Process Creation 235537.5 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 230575.5 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 31589.4 lpm (60.0 s, 2 samples)
System Call Overhead 63752430.3 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 3.2 0.0
Double-Precision Whetstone 55.0 328609.9 59747.2
Execl Throughput 43.0 84181.9 19577.2
File Copy 1024 bufsize 2000 maxblocks 3960.0 21700319.8 54798.8
File Copy 256 bufsize 500 maxblocks 1655.0 14042561.7 84849.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 22207296.9 38288.4
Pipe Throughput 12440.0 92051572.0 73996.4
Pipe-based Context Switching 4000.0 12354386.9 30886.0
Process Creation 126.0 235537.5 18693.5
Shell Scripts (1 concurrent) 42.4 230575.5 54381.0
Shell Scripts (8 concurrent) 6.0 31589.4 52649.0
System Call Overhead 15000.0 63752430.3 42501.6
========
System Benchmarks Index Score 9037.2
(XMP1) Enabled
Benchmark Run: ven. aout 01 2025 14:06:23 - 14:34:44
32 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 0.1 lps (10.0 s, 7 samples)
Double-Precision Whetstone 14002.0 MWIPS (10.0 s, 7 samples)
Execl Throughput 2837.6 lps (29.7 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 3119013.8 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 854635.3 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 8549343.6 KBps (30.0 s, 2 samples)
Pipe Throughput 4955957.0 lps (10.0 s, 7 samples)
Pipe-based Context Switching 395560.4 lps (10.0 s, 7 samples)
Process Creation 2886.2 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 2715.4 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 4907.3 lpm (60.0 s, 2 samples)
System Call Overhead 3496417.7 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 0.1 0.0
Double-Precision Whetstone 55.0 14002.0 2545.8
Execl Throughput 43.0 2837.6 659.9
File Copy 1024 bufsize 2000 maxblocks 3960.0 3119013.8 7876.3
File Copy 256 bufsize 500 maxblocks 1655.0 854635.3 5164.0
File Copy 4096 bufsize 8000 maxblocks 5800.0 8549343.6 14740.2
Pipe Throughput 12440.0 4955957.0 3983.9
Pipe-based Context Switching 4000.0 395560.4 988.9
Process Creation 126.0 2886.2 229.1
Shell Scripts (1 concurrent) 42.4 2715.4 640.4
Shell Scripts (8 concurrent) 6.0 4907.3 8178.8
System Call Overhead 15000.0 3496417.7 2330.9
========
System Benchmarks Index Score 1480.6
------------------------------------------------------------------------
Benchmark Run: ven. aout 01 2025 14:34:44 - 15:03:01
32 CPUs in system; running 32 parallel copies of tests
Dhrystone 2 using register variables 3.2 lps (10.0 s, 7 samples)
Double-Precision Whetstone 326370.3 MWIPS (10.0 s, 7 samples)
Execl Throughput 85059.4 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 28254307.8 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 13817825.1 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 28816103.7 KBps (30.0 s, 2 samples)
Pipe Throughput 90499015.8 lps (10.0 s, 7 samples)
Pipe-based Context Switching 12136097.5 lps (10.0 s, 7 samples)
Process Creation 232925.2 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 240292.0 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 32868.8 lpm (60.0 s, 2 samples)
System Call Overhead 63164501.1 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 3.2 0.0
Double-Precision Whetstone 55.0 326370.3 59340.1
Execl Throughput 43.0 85059.4 19781.2
File Copy 1024 bufsize 2000 maxblocks 3960.0 28254307.8 71349.3
File Copy 256 bufsize 500 maxblocks 1655.0 13817825.1 83491.4
File Copy 4096 bufsize 8000 maxblocks 5800.0 28816103.7 49682.9
Pipe Throughput 12440.0 90499015.8 72748.4
Pipe-based Context Switching 4000.0 12136097.5 30340.2
Process Creation 126.0 232925.2 18486.1
Shell Scripts (1 concurrent) 42.4 240292.0 56672.6
Shell Scripts (8 concurrent) 6.0 32868.8 54781.3
System Call Overhead 15000.0 63164501.1 42109.7
========
System Benchmarks Index Score 9451.1
Insights
A few results stand out, and they may or may not affect how the system behaves in practice.
- Index Score (1 thread): The overall performance score for single-threaded tasks improved by 4.9% with XMP enabled.
- Index Score (32 threads): The overall performance score for multi-threaded tasks improved by 4.6% with XMP enabled.
Raw CPU computation shows no meaningful change (under 1% either way), which is background noise. That is expected, as the CPU is not overclocked and the XMP profile does not change its base frequency.
- File Copy 1024 (1 thread): The speed of copying files with a buffer size of 1024 bytes increased by 3.2%.
- File Copy 1024 (32 threads): The speed of copying files with a buffer size of 1024 bytes in a multi-threaded environment increased significantly by 30%.
- File Copy 4096 (32 threads): The speed of copying files with a buffer size of 4096 bytes in a multi-threaded environment also increased significantly by 29.8%.
These are good indicators of how the XMP profile affected system performance. It should help me get more out of my two NVMe M.2 SSDs.
Single-threaded context switching shows the biggest change, around 62%, which is quite significant. It matters mostly for Node.js/Electron apps, Angular compilation, or anything else tied to JavaScript runtimes, as these are largely single-threaded.
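For reference, the percentages above can be reproduced from the raw UnixBench results. This is a small sketch and not part of the benchmark itself; the `RESULTS` table and the `percent_change` helper are names made up for this post, and the numbers are copied from the output above.

```python
# Results copied from the UnixBench output above: (XMP disabled, XMP enabled).
# RESULTS and percent_change are illustrative names, not part of UnixBench.
RESULTS = {
    "Index Score (1 copy)": (1411.9, 1480.6),
    "Index Score (32 copies)": (9037.2, 9451.1),
    "File Copy 1024 (1 copy)": (3022110.3, 3119013.8),
    "File Copy 1024 (32 copies)": (21700319.8, 28254307.8),
    "File Copy 4096 (32 copies)": (22207296.9, 28816103.7),
    "Pipe-based Context Switching (1 copy)": (243221.3, 395560.4),
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (disabled, enabled) in RESULTS.items():
        print(f"{name}: {percent_change(disabled, enabled):+.1f}%")
```

Any other row of the output can be compared by adding its two values to the table.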
Conclusion on XMP
Enabling XMP and benchmarking shows that RAM was a bottleneck in my system in its current state. I’m pretty sure it still is, as a jump from 4200MHz to 5600MHz, while not negligible, isn’t as big as switching to 6400MHz or 7200MHz. But given that I already own four 32GB sticks, it would be a shame to replace them all when I can boost them right now for free.
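To put those frequencies in perspective, here is a rough sketch of the theoretical peak memory bandwidth at each speed. It assumes a dual-channel setup with a 64-bit data bus per channel and treats the "MHz" figures as mega-transfers per second. These are theoretical ceilings, not measurements from my machine, and `peak_bandwidth_gbs` is just an illustrative helper.

```python
BYTES_PER_TRANSFER = 8  # 64-bit data bus per memory channel
CHANNELS = 2            # assumption: dual-channel desktop platform


def peak_bandwidth_gbs(mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s for a given DDR transfer rate (MT/s)."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * CHANNELS / 1000


if __name__ == "__main__":
    baseline = peak_bandwidth_gbs(4200)
    for speed in (4200, 5600, 6400, 7200):
        bandwidth = peak_bandwidth_gbs(speed)
        gain = (bandwidth / baseline - 1) * 100
        print(f"{speed} MT/s: {bandwidth:.1f} GB/s ({gain:+.1f}% vs 4200)")
```

Real-world gains are smaller than these ceilings, since timings and the rest of the system also matter.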
A quick update on XMP
After being fine with XMP for desktop usage, I ended up disabling it because it made some games crash (Unreal Engine ones). I’ll have to dig deeper into it; for now I’ll probably just set the frequency manually and leave the other settings on automatic with XMP disabled.
I’ve run a 10-minute OCCT RAM test with XMP2 enabled (the RAM’s stock profile; XMP1 was the Asus one) without much trouble. I’ll update this post later if I find any issues.
A second update
I ran into issues while playing with XMP profiles that led me to disable XMP again. My understanding is that this comes from running 4x32GB sticks: with two DIMMs per channel, the memory controller is loaded more heavily, which makes higher frequencies hard to keep stable. I’m back at 4200MHz for as long as I keep this RAM kit.