Question System stability issues?

May 20, 2025
Hello,

I have been struggling with an issue for the past couple months that's left me stumped. My PC on Windows 11 will randomly crash with no BSOD at seemingly random times. I can be playing a game or running a stress test, leaving it on idle with no programs open, watching a YouTube video, checking emails, etc. and it will crash at a random time. I took it to a repair shop that did a stress test on all components and replaced the motherboard and RAM, but the issue still persists (pc was running fine for 2 weeks after receiving it back). I also just replaced the motherboard 2 days ago and it was running fine, but today I had the same issue again. I have also replaced the PSU and have a UPS so I don't think it's a power issue. My question is do I just have a CPU issue, and what other types of tests can I perform to see what component is faulty?

Crashing symptoms: my PC completely freezes and is unresponsive to all inputs for 1-5 seconds, then the screen turns off (fans, RGB, and other components stay on) and it attempts to restart. Sometimes it will crash <10 minutes after booting up and logging into Windows; other times I can be using it for 2 hours before it crashes. In the best case it just restarts automatically, but sometimes I have to reseat the CPU/RAM or do a BIOS flashback to get back into Windows. The worst crashes don't let me get into BIOS even after a flashback.

For some of the crashes, the DRAM light on the motherboard comes on afterwards, but that can usually be fixed by holding down the power button and restarting. With the first two motherboards, the DRAM and CPU lights eventually both came on, so the boards had to be replaced. I checked the pins/connections: there are no bent pins on the socket or any other visible issues on the RAM or CPU. Event Viewer always shows a Kernel-Power 41 error, but there are no other crash logs.

Specs:
CPU: AMD Ryzen 9 7900X 4.7 GHz 12-Core Processor
CPU Cooler: NZXT Kraken 360 78.02 CFM Liquid
GPU: Gigabyte GAMING OC Rev. 2.0 GeForce RTX 3060 12GB
Motherboard: MSI MAG B650 Tomahawk WIFI ATX --> Replaced with another MSI MAG B650 --> ASUS TUF GAMING B650-PLUS WIFI
RAM: G.SKILL Flare X5 DDR5 64GB 6000MT/s CL30 --> Corsair Vengeance RGB 32GB DDR5-6000 CL36
PSU: NZXT C850 Gold ATX3.1 --> CORSAIR RM1000x Fully Modular ATX 3.1
Storage: WD_BLACK SN7100 2 TB M.2-2280 PCIe 4.0 X4 NVME SSD

What I've tried so far:
CPU:

  • Reseating it with new thermal paste (does not fix issue but helps a bit when I can't even get into BIOS)
  • Monitoring temps while idle (crashes when it's <60°C, never goes above 90°C)
  • Running stress tests with OCCT (no errors or crashes)
GPU:
  • Monitoring temps while idle (crashes when it's <60°C, doesn't go above 83°C)
  • Updating to the latest drivers through the NVIDIA app, but also using DDU and installing the latest driver manually
  • Running stress tests with OCCT (no errors or crashes)
Motherboard:
  • Updating BIOS to newest versions with all 3 motherboards (did not solve issue but helped with stability a tiny bit)
  • Turning off PBO
  • Running stress tests with OCCT (no errors or crashes)
  • Disabling fast boot
  • Enabling/disabling any overclocking for CPU and RAM
RAM:
  • Reseating or running with 1 stick (does not fix issue but helps a bit when I can't even get into BIOS)
  • Running memtest + windows memory diagnostic for RAM (passed both, did not find any issues)
Misc.
  • Checking all the PSU cable connections and making sure all fans spin
  • Completely reinstalling windows 11 with all the default apps (crashed within an hour with just keyboard + mouse plugged in)
  • Removing all software and components so there's nothing other than the base OS running (still crashes randomly)
  • Running SFC and DISM commands (no issues found), in addition to windows update and windows troubleshooters
  • Checking the status of the M.2 SSD (healthy 100%)
  • Running antivirus on EVERYTHING (Malwarebytes), nothing found
  • Changing power settings to balanced, ultra performance
  • Switching monitor ports, running off just 1 monitor
 
I had the same problem, caused by the CPU VDDCR_SOC voltage (SVI3 TFN) being too low; setting it manually to 1.250 V fixed it.
 
RAM: G.SKILL Flare X5 DDR5 64GB 6000MT/s CL30 --> Corsair Vengeance RGB 32GB DDR5-6000 CL36
If you're running with XMP/EXPO/DOCP enabled at 6000MT/s, switch off the memory overclock and return to the JEDEC default of 4800MT/s. The BIOS might be memory training at POST and failing to find a set of compatible timings at 6000MT/s.

Are you running 2 DIMMs or 4 DIMMs? 2 DIMMs should be more stable. Slots A2/B2 are usually the best for a 2 DIMM setup.

Run a full pass (several hours) of MemTest86 on each RAM speed setting to check stability. Any errors indicate a problem. Your RAM should be OK when tested with MemTest86 at 4800MT/s, but might generate errors at 6000MT/s.
https://www.memtest86.com/

If/when you get the system(s) running at 4800MT/s, you can start to overclock RAM using manual timings if Auto 6000MT/s causes crashes.
 
Does the Windows Event Viewer turn up any critical events?
The only critical in Event Viewer is Kernel-Power 41 (63) with these details.
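
For anyone wanting to pull those Kernel-Power 41 entries out of the System log in bulk, here is a minimal Python sketch. It assumes Windows with the built-in `wevtutil` tool available; the helper names (`build_command`, `parse_events`, `describe`, `fetch_events`) are made up for illustration. It reads the `BugcheckCode` field of each event, where 0 generally means Windows did not record a bugcheck before the reset, which would match a crash with no BSOD.

```python
import subprocess
import xml.etree.ElementTree as ET

# XPath filter for Kernel-Power Event ID 41 in the System log.
QUERY = "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=41)]]"
# XML namespace used by Windows event records.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def build_command(count=10):
    """Return the wevtutil argument list for the newest `count` matching events."""
    return ["wevtutil", "qe", "System", f"/q:{QUERY}", f"/c:{count}", "/rd:true", "/f:xml"]


def parse_events(xml_text):
    """Extract (timestamp, BugcheckCode) pairs from wevtutil XML output."""
    # The output is a run of sibling <Event> elements, so wrap them in one root.
    root = ET.fromstring(f"<Events>{xml_text}</Events>")
    results = []
    for event in root.findall("e:Event", NS):
        created = event.find("e:System/e:TimeCreated", NS).get("SystemTime")
        code = None
        for data in event.findall("e:EventData/e:Data", NS):
            if data.get("Name") == "BugcheckCode":
                code = int(data.text)
        results.append((created, code))
    return results


def describe(code):
    """Turn a BugcheckCode value into a short human-readable note."""
    if code is None:
        return "BugcheckCode field missing"
    if code == 0:
        return "no bugcheck recorded"
    return f"bugcheck 0x{code:X}"


def fetch_events(count=10):
    """Run wevtutil (Windows only) and return the parsed events."""
    output = subprocess.run(
        build_command(count), capture_output=True, text=True, check=True
    ).stdout
    return parse_events(output)
```

On the affected machine, something like `for created, code in fetch_events(): print(created, describe(code))` would list the most recent unexpected resets. If every entry reports no bugcheck, that points more towards a hard hang or power-level reset than a driver fault, but it does not identify the component on its own.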

I had the same problem, caused by the CPU VDDCR_SOC voltage (SVI3 TFN) being too low; setting it manually to 1.250 V fixed it.
Mine is at 1.015V Min / 1.054V Max from HWMonitor. Did you set it to constant 1.250V in BIOS?

If you're running with XMP/EXPO/DOCP enabled at 6000MT/s, switch off the memory overclock and return to the JEDEC default of 4800MT/s. The BIOS might be memory training at POST and failing to find a set of compatible timings at 6000MT/s.

Are you running 2 DIMMs or 4 DIMMs? 2 DIMMs should be more stable. Slots A2/B2 are usually the best for a 2 DIMM setup.

Run a full pass (several hours) of MemTest86 on each RAM speed setting to check stability. Any errors indicate a problem. Your RAM should be OK when tested with MemTest86 at 4800MT/s, but might generate errors at 6000MT/s.
https://www.memtest86.com/

If/when you get the system(s) running at 4800MT/s, you can start to overclock RAM using manual timings if Auto 6000MT/s causes crashes.
I am running 2 DIMMs on A2/B2 with XMP/EXPO/DOCP disabled (4800MT/s currently which I believe are JEDEC speeds). These are new ram sticks and they both passed memtest86, but I can always test again. Should I test both at the same time or try with one stick at a time?

Have you considered a software related issue?
Look at your installed programs and remove anything you do not recognize, or that is not essential.
I did a fresh install of Windows with NO software installed and it still crashed, so I do not believe it is a software issue.

Also update: did another 30 min CPU + RAM and GPU stress test and it did not crash, so it's definitely not an overheating issue. I also turned OFF Hardware-accelerated GPU scheduling (Settings -> Display -> Graphics -> Advanced) to see if that helps with stability, because frequent alt-tabbing between windows sometimes triggers the crash.
 
My RAM is Kingston (Hynix.a) 6000MHz CL32 running at 6200MHz CL30, so that may be the reason it needs a higher VDDCR_SOC voltage, but it should be a minimum of 1.1-1.2 V depending on memory controller quality, which can vary a lot. I did set it manually, but it still varies +/- 10% depending on load.
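
To compare logged SOC voltage against that kind of floor, a rough Python sketch follows. It assumes the sensor readings were exported to a CSV log (tools such as HWiNFO can log sensors to CSV); the column name `VDDCR_SOC [V]` and the function names are hypothetical, and the two sample readings are simply the min/max values reported earlier in the thread.

```python
import csv
import io


def summarize_voltage(csv_text, column, floor_volts):
    """Return (min, max, samples_below_floor) for one voltage column of a CSV log."""
    readings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value:
            readings.append(float(value))
    if not readings:
        raise ValueError(f"no readings found in column {column!r}")
    below = sum(1 for v in readings if v < floor_volts)
    return min(readings), max(readings), below


def tolerance_band(set_volts, fraction):
    """Return the (low, high) range for a set voltage that drifts by +/- fraction."""
    low = round(set_volts * (1 - fraction), 3)
    high = round(set_volts * (1 + fraction), 3)
    return low, high


# Hypothetical log using the 1.015 V min / 1.054 V max readings from the thread.
sample_log = "Time,VDDCR_SOC [V]\n0,1.015\n1,1.054\n"
print(summarize_voltage(sample_log, "VDDCR_SOC [V]", 1.1))  # (1.015, 1.054, 2)
print(tolerance_band(1.250, 0.10))  # (1.125, 1.375)
```

Both sample readings fall below a 1.1 V floor, and a manual 1.250 V setting that drifts by 10% would span roughly 1.125 V to 1.375 V, so it is worth checking the logged maximum as well as the minimum after changing the setting.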
 