Random reboots increasing in frequency

joga
Posts: 20
Joined: 2021-09-27 16:23

#1 Post by joga »

I'm running Debian 12 with MATE on a Ryzen 9 7900 on a PRIME B650M-A WIFI II motherboard with 32 GB of RAM. When I built the system 1.5 years ago, I had stability issues, which resolved when I lowered the RAM clock (from spec to under spec). But recently I've had some spontaneous reboots, at first about once a week, and now it's every few hours.

I have checked the voltages and the AC component (ripple), and the power behaves just fine. I ran memtest86 for a few hours, which reported no problems. I've tested Prime95: tests 1 and 2 (Smallest and Small FFT) work just fine, while the other tests are killed by the kernel for allocating too much memory if I run them on more than one core. One core runs just fine. I've also tested "memtester" (an in-OS CLI tool for testing part of the memory), no problem. I've also stress tested with S-TUI, no problem.

The reboots are always random and not under load. Most guides on the internet about checking logs and such were written before the days of journalctl, and I have a hard time finding anything in any logs (suggestions would be helpful!). "last reboot" doesn't say much apart from "crash". I've tried "journalctl | grep [something]" for things like kernel panic, temperature, crash, etc., and not found anything related. I also installed "kdump-tools" and "kexec-tools", but there is simply no dump in /var/crash after a reboot. So I can't tell if it's a HW issue or a SW one. High load and memory issues don't seem to be causing the problem, and the power supply is delivering nice and clean DC at the specified voltages. I've reseated the RAM just in case, and reseated all other connectors I could find (SATA, etc.). I have no cards installed and am using the built-in graphics. I haven't tested all the buck regulators on the motherboard of course, so I only know that the ATX voltages are good. But again, reboots occur during normal use and stress testing does not cause a reboot.
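For the journalctl question, a minimal sketch of one way to look at what was logged just before a reset, instead of grepping the whole journal. It assumes the journal is persistent (a /var/log/journal directory exists; otherwise journald only keeps the current boot under /run). The function names and the pattern list are made up for illustration, and the patterns are only guesses at what a hardware fault might leave behind.

```python
import re
import subprocess

# Hypothetical search terms; adjust freely. "mce" / "Hardware Error" are what
# the kernel prints for machine check exceptions, the others are common suspects.
PATTERNS = [r"\bmce\b", r"hardware error", r"thermal", r"out of memory",
            r"oom-kill", r"watchdog", r"panic"]


def journal_cmd(boot=-1, kernel_only=True, priority="warning"):
    """Build a journalctl command for one boot (-1 = the boot before this one)."""
    cmd = ["journalctl", "--no-pager", "-b", str(boot), "-p", priority]
    if kernel_only:
        cmd.append("-k")  # kernel messages only, like dmesg for that boot
    return cmd


def suspicious(lines, patterns=PATTERNS):
    """Return the lines that match any of the patterns, case-insensitively."""
    rx = re.compile("|".join(patterns), re.IGNORECASE)
    return [ln for ln in lines if rx.search(ln)]


def tail_of_boot(boot=-1, n=50):
    """Return the last n journal lines of a boot, i.e. what came right before the reset."""
    result = subprocess.run(
        ["journalctl", "--no-pager", "-b", str(boot), "-n", str(n)],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.splitlines()
```

After a crash, `tail_of_boot()` gives the end of the previous boot and `suspicious(tail_of_boot(n=2000))` narrows it down. `journalctl --list-boots` shows which boots the journal still holds. If the log simply stops mid-activity with no shutdown messages, that is consistent with a hard reset the kernel never saw, which points more toward hardware or firmware than software.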

What should I do next? This is infuriating, as stability is all I want. Nothing is overclocked (and never was), and right now my RAM is underclocked as a test. At random intervals (today set the record: five times) the screen goes blank and the system restarts. I had hoped that I could get a clue from some logs, such as a sensor reading or something. But I don't really know what to look for.
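On the empty /var/crash: one thing to look for is whether a crash kernel is actually armed, since kdump can only write a dump if memory was reserved at boot and the capture kernel was loaded. A rough sketch of that check, assuming the standard /proc/cmdline and /sys/kernel/kexec_crash_loaded files; the function names are made up for illustration.

```python
from pathlib import Path


def has_crashkernel(cmdline: str) -> bool:
    """True if the kernel command line reserves memory for a crash kernel."""
    return any(tok.startswith("crashkernel=") for tok in cmdline.split())


def crash_kernel_loaded(sysfs_file="/sys/kernel/kexec_crash_loaded") -> bool:
    """True if the kernel reports a capture kernel as loaded (file contains 1)."""
    p = Path(sysfs_file)
    return p.exists() and p.read_text().strip() == "1"


def kdump_status():
    """Collect both checks from the running system."""
    cmdline = Path("/proc/cmdline").read_text()
    return {
        "crashkernel_on_cmdline": has_crashkernel(cmdline),
        "crash_kernel_loaded": crash_kernel_loaded(),
    }
```

If `kdump_status()` reports False for either entry, kdump was never in a position to catch anything. If both are True and /var/crash still stays empty, the reset probably happened below the kernel (power, firmware, CPU), because kdump only triggers on a kernel panic.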

Edit: I'm someone who typically buys a new computer once every decade. This UEFI BS is new to me and very confusing, and seems like a grift to push insecure proprietary BS into a system. But if someone knows of anything to check there, let me know.

Bulkley
Posts: 6409
Joined: 2006-02-11 18:35
Has thanked: 5 times
Been thanked: 46 times

Re: Random reboots increasing in frequency

#2 Post by Bulkley »

I'd look for a hardware problem. With the cover off, use a thin non-conducting probe to gently tap items and connections while the machine works hard (stream a movie?). A cold solder joint can be very difficult to find.

These problems can drive you nuts so get yourself a cup of coffee, a pencil and note pad. Use the pad to record what you were doing when the machine rebooted. After a week or two you may find a pattern.

joga
Posts: 20
Joined: 2021-09-27 16:23

Re: Random reboots increasing in frequency

#3 Post by joga »

Yeah, I also think it feels like a HW problem, and I've already shaken, tapped, pressed and knocked on various components and connectors without causing a reboot. I've stalled fans and checked for hotspots with a thermal camera. Of course I could do more of that, but it is rather tiresome, and since I could not provoke a reboot that way, I hoped for something software-wise: either a way to provoke a reboot (such as a more specific stress test) or any kind of logs that could indicate something. Even if it's a HW issue, it's not something trivial such as plain old bad RAM, unstable power or overheating. I need my computer to work and cannot spend a week or two just to maybe catch some pattern. I will do more "poking tests" and tap items and connectors, but I'm not hopeful. I can flex the entire motherboard without a reboot.

Edit: Maybe I'm too HW focused, by the way. I mean, random reboots with nothing to find in the logs has all the hallmarks of a HW issue. Buuuut, "nothing in the logs" is kind of a PEBKAC, since I really don't know where to look or what to look for, AND all the obvious HW tests come back negative. Maybe it is a SW issue. Not that Linux is exactly known for random reboots without a trace, but I may have messed things up. I haven't changed any configuration recently, but something could be badly configured. Bluetooth often bugs out and requires manual reboots and toggling in BIOS/UEFI to get it working again, and I have messed with the swap partition (the reboots don't seem to relate to high memory usage, though). I saw something somewhere on the internet about Prime95 allocating too much memory and crashing, something about hugepages that I didn't understand (Prime95 gets killed after a few seconds with some settings, as described in my first post, but doesn't cause reboots).

So, I still haven't been able to exclude software problems. But it would have to be a special one. I mean, nothing should be able to f--k up Linux so much.

Bulkley
Posts: 6409
Joined: 2006-02-11 18:35
Has thanked: 5 times
Been thanked: 46 times

Re: Random reboots increasing in frequency

#4 Post by Bulkley »

I once had a motherboard that was responsible for random crashes. After weeks of troubleshooting I put the mobo part number into Google and got several hits about exactly my problem. Stressing a computer might help.

Edited to add this thread: "MAG B650M MORTAR WIFI - 7D76vAC unstable and random reboot". It's a bit confusing to me but it might help you.

joga
Posts: 20
Joined: 2021-09-27 16:23

Re: Random reboots increasing in frequency

#5 Post by joga »

Yeah, thanks for the tip! That problem is not identical, but similar. I've updated the BIOS; my version was rather old. Still, problems that show up after 1.5 years and increase in frequency don't smell like a BIOS problem. Time to sleep for me, I'll report back tomorrow!

joga
Posts: 20
Joined: 2021-09-27 16:23

Re: Random reboots increasing in frequency

#6 Post by joga »

Okay, an update...

After I flashed a new BIOS, the stability seems to be fully restored. No particular settings in the BIOS, just "optimized defaults". I haven't had a restart today or yesterday. I didn't think of the BIOS at first, since my computer had been running fine for 1.5 years. Silly modern tech, it's near impossible to know if or how something has slowly degraded.

Stability is hard to "measure" of course; only time will tell, really. I'll report back again if something changes.

joga
Posts: 20
Joined: 2021-09-27 16:23

Re: Random reboots increasing in frequency

#7 Post by joga »

Okay, it happened again. There were no reboots from when I last wrote until now. I still don't have any clue about any journalctl/log stuff I can use. It's utterly frustrating. I don't know how to check sensors, under- or overvolting, temperatures, etc. Searching the net only gives ways to check those things in /var/log, but that's apparently not how things work anymore (with systemd?).

arzgi
Posts: 1614
Joined: 2008-02-21 17:03
Location: Finland
Has thanked: 1 time
Been thanked: 85 times

Re: Random reboots increasing in frequency

#8 Post by arzgi »

There are many options. I use gkrellm, which shows a lot; conky is also in the repo and does the same.

Use lm-sensors if you want to do it on the command line.
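The `sensors` command from lm-sensors reads the kernel's hwmon interface, and the same values can be read directly if you want to log them yourself. A minimal sketch, assuming the usual /sys/class/hwmon layout where each chip directory has a `name` file and `temp*_input` files in millidegrees Celsius. The function name is made up, and which chips show up depends on the loaded drivers (on a Ryzen the CPU sensor usually comes from k10temp).

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip_name, label): degrees_C} for every temp*_input under root."""
    temps = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        name = name_file.read_text().strip() if name_file.exists() else chip.name
        for inp in sorted(chip.glob("temp*_input")):
            # Use the matching temp*_label file if the driver provides one.
            label_file = inp.with_name(inp.name.replace("_input", "_label"))
            label = label_file.read_text().strip() if label_file.exists() else inp.name
            try:
                temps[(name, label)] = int(inp.read_text()) / 1000.0  # millidegrees C
            except (OSError, ValueError):
                continue  # some sensors fail on read; skip them
    return temps
```

Calling this every few seconds and appending the result to a file that is flushed to disk would leave a record of the last readings before a reboot. That is only useful if heat is actually involved, which the thermal camera checks earlier in the thread make less likely.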
