?howto troubleshoot? systems dies

te36
Posts: 12
Joined: 2018-01-29 17:36

?howto troubleshoot? systems dies

#1 Post by te36 »

I am out of ideas for how to troubleshoot a situation where, under a certain condition, the system just dies/hangs.
I see nothing on the console, nor in any log files after the reboot.

Are there any options to enable more kernel-level diagnostics or the like?

The specific condition under which this happens is when hd-idle (SCSI stop unit ioctl()) is executed on one disk while other disks perform a lot of I/O (e.g.: copy from disk 1 to disk 2, switch off disk 3). If disks do not have a lot of I/O activity, it works.

I first saw this on a Banana Pi with 5 SATA HDDs connected via a SATA port multiplier, but I attributed the problem to the old Bananian release or to port multiplier issues.

Now I have upgraded to a Rock Pi 4B, where the 5 SATA HDDs are connected via a 5x SATA miniPCIe card, with the system running the standard Debian stretch image (4.4 kernel) from Radxa's web page. Same effect.

Of course, I could blame the non-amd64 hardware and try putting the miniPCIe card into an amd64 system, or blame the outdated kernel and try updating it, but what would I do if I had the same problem on amd64 with the latest kernel? Nobody could figure out the problem with the complete lack of diagnostics that I have right now.
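
As a rough sketch of what such kernel options might look like (not tested on this board): Linux exposes several sysctl knobs that turn a detected lockup or oops into a panic with a stack trace on the console. The script below only prints the `sysctl -w` commands it would suggest and skips any key the running kernel does not expose under /proc/sys. The function names `diag_settings`, `sysctl_path`, and `suggest_diag_settings` are made up for this illustration, and whether the vendor 4.4 kernel was built with the hung-task and soft-lockup detectors is an assumption.

```shell
#!/bin/sh
# Candidate diagnostic settings (assumption: availability depends on kernel config).
diag_settings() {
    cat <<'EOF'
kernel.sysrq=1
kernel.panic_on_oops=1
kernel.hung_task_panic=1
kernel.softlockup_panic=1
kernel.panic=0
kernel.printk=7 4 1 7
EOF
}

# Map a sysctl key such as kernel.sysrq to its path under /proc/sys.
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr . /)"
}

# Print one suggested command per setting; nothing is changed on the system.
suggest_diag_settings() {
    diag_settings | while IFS='=' read -r key value; do
        if [ -e "$(sysctl_path "$key")" ]; then
            echo "sysctl -w $key=\"$value\""
        else
            echo "# skipped $key (not exposed by this kernel)"
        fi
    done
}

suggest_diag_settings
```

Here `kernel.panic=0` keeps the machine halted after a panic instead of rebooting, so the trace stays visible, and the first field of `kernel.printk` raises the console log level. If the hang is total and no interrupts are serviced at all, these detectors may never fire, so capturing the serial console during the test is probably still worthwhile.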

ComputerBob
Posts: 1181
Joined: 2007-11-30 04:49
Location: The Mountains of the Sunshine State
Been thanked: 1 time

Re: ?howto troubleshoot? systems dies

#2 Post by ComputerBob »

Many years ago, I ran into a similar problem. Sudden, random freezes, with nothing in the log files to indicate what had gone wrong.

In my case, the problem ended up being just ONE of the drive connector "tubes" fitting onto one of the drive pins too loosely (apparently, microscopically), so that, when the drive had to do a lot of work, that one pin vibrated enough (again, apparently microscopically) to cause it to intermittently disconnect. Pinching it a tiny bit, to "tighten it", solved the problem to this day.

It sounds like that may NOT be your problem, but maybe it could give you some ideas of things to check on your system. Good luck -- I know it can be very frustrating.

Bulkley
Posts: 6383
Joined: 2006-02-11 18:35
Has thanked: 2 times
Been thanked: 39 times

Re: ?howto troubleshoot? systems dies

#3 Post by Bulkley »

Intermittent problems can drive one crazy. Keep a pencil and pad next to your machine and note every detail when it crashes.

Intermittent freezes are frequently hardware related, such as the bad connection ComputerBob found. A weak power supply can cause unpredictable failures. Resource-hungry software can stress components that normally cruise along.

Get the covers off and blow out the dust. Make sure all fans are loose and spin freely. Disconnect and reconnect every plug/jack, including auxiliary circuit boards and memory sticks. Do a memory stress test.

As to software, have you installed anything not from an official Debian repository appropriate for your version?

eor2004
Posts: 251
Joined: 2013-10-01 22:49
Location: Puerto Rico
Has thanked: 4 times
Been thanked: 5 times

Re: ?howto troubleshoot? systems dies

#4 Post by eor2004 »

Hi, in my experience system hangs are related to either CPU damage, high CPU temperatures, or an HDD going bad. Check your CPU cooling system for dust and dirt buildup or dried-out thermal grease. Also check your HDD for "SMART" warnings, and check whether the HDD activity LED stays steadily lit and doesn't blink at all while you're having the issue; this can indicate HDD problems. To test whether the power supply is delivering enough power, disconnect any device that is not necessary to boot the system and see if the issue comes back. Good luck!

P.S. I would also make a visual inspection of the motherboard for any damaged components like capacitors and connectors, etc.
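
The SMART check mentioned above could be sketched like this, assuming the smartmontools package (`smartctl`) is installed. The device list is hypothetical and the helper names `smart_commands` and `suspicious_attributes` are invented for illustration. The first function only prints the commands to run as root; the second filters `smartctl -A` output read from stdin for a few attributes that often point at failing media or a bad cable or connector.

```shell
#!/bin/sh
# Hypothetical device list; replace with the real disks (see lsblk).
DISKS="${DISKS:-/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde}"

# Print the smartctl invocation for each disk without executing it.
smart_commands() {
    for disk in $DISKS; do
        # -H: overall health self-assessment, -A: vendor attribute table
        echo "smartctl -H -A $disk"
    done
}

# Read `smartctl -A` output on stdin and print name and raw value of
# selected attributes whose raw value (10th column) is non-zero.
suspicious_attributes() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ && $10 != 0 { print $2, $10 }'
}

smart_commands
```

A possible use would be `smartctl -A /dev/sda | suspicious_attributes`; a rising `UDMA_CRC_Error_Count` in particular tends to suggest a cabling or connector problem, not a failing platter.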

te36
Posts: 12
Joined: 2018-01-29 17:36

Re: ?howto troubleshoot? systems dies

#5 Post by te36 »

ComputerBob wrote: In my case, the problem ended up being just ONE of the drive connector "tubes" fitting onto one of the drive pins too loosely (apparently, microscopically), so that, when the drive had to do a lot of work, that one pin vibrated enough (again, apparently microscopically) to cause it to intermittently disconnect. Pinching it a tiny bit, to "tighten it", solved the problem to this day.
Thanks. My issue only ever happens under reproducible circumstances: high load on two disks, then triggering hd-idle on another.
Running hd-idle all day long when there is no load on the other disks works fine.

Of course it could still be an electrical issue, with some timing or signal on the PCIe x4 link to the host going awry; one never knows. But I was hoping that anything like this should never cause the CPU to freeze up, especially given that I can run the 5 disks in parallel perfectly well at a sustained 430 MByte/sec, which is a lot more load than the test where the SCSI stop causes the issue.

Head_on_a_Stick
Posts: 14114
Joined: 2014-06-01 17:46
Location: London, England
Has thanked: 81 times
Been thanked: 132 times

Re: ?howto troubleshoot? systems dies

#6 Post by Head_on_a_Stick »

Enable persistent logging:


# mkdir -p /var/log/journal
Then use the systemd journal to investigate.
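
A slightly fuller sketch of those two steps, for illustration only: the wrapper functions `run`, `persist_journal`, and `show_previous_boot` are invented names, and by default the script just prints the commands instead of executing them (set `DRY_RUN=0` and run as root to actually apply them). It assumes journald is left at its default `Storage=auto` setting, where the existence of /var/log/journal is what switches on persistent storage.

```shell
#!/bin/sh
# Print a command (default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the journal directory and make journald start using it.
persist_journal() {
    run mkdir -p /var/log/journal
    # Apply the expected ownership and ACLs to the new directory
    run systemd-tmpfiles --create --prefix /var/log/journal
    # Restart journald so it switches from /run to persistent storage
    run systemctl restart systemd-journald
}

# After a crash and reboot: list known boots, then show kernel
# messages from the previous boot (-b -1).
show_previous_boot() {
    run journalctl --list-boots
    run journalctl -k -b -1 --no-pager
}

persist_journal
```

After the next hang and reboot, `DRY_RUN=0 show_previous_boot` would show whatever kernel messages reached the disk. Messages from the last moments before a hard freeze may be missing if they were never written out.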