Diagnosing lock up with Debian
I have an older HP tower to run Linux things. I generally run this headless with XRDP, but currently have a KVM hooked up because I am trying to debug a periodic lockup.
The machine has been having lock-ups for a long time and I have managed it by turning it off when I am done and back on when I need it. If I leave it running, it locks up within a couple of days.
By lock up I mean I can't connect via Remote Desktop. It does not respond to Ping. The keyboard lights do not change when you press caps lock. The display is also black, but this is probably because the screen goes black after some time to prevent burn in and then the lockup happens.
This is a stock install of the latest Debian with some libraries added. (eclipse, opencv and some related things.) I don't have any particular process running when the lockups hit. It is essentially idle waiting for me to log in.
The syslog and several other logs just show normal activity right up to the lock up. Then there is a long gap till I reboot.
I have tried a memtest. I left it running nearly 24 hours giving me several complete passes. No failures.
I have tried logging the temperatures via a cron task every minute. The CPU temperatures right before what I presume was a lockup were 29 and 35 C. (This is all the temp data I get. It is an older machine and I guess doesn't have as much data available.)
The weirdest clue is garbage characters in the log files. Both my temperature log and the syslog had repeating ^@ in reverse color, which I believe means there are unprintable characters in the log files.
I am guessing that the next thing to check is the power supply, but I am hesitant to simply shotgun this by replacing the power supply without some idea whether that is the problem. (How many parts do I replace before I get a new machine?)
Any ideas? Are the garbage characters a useful clue?
Well, another clue occurs to me. Before I put the temperature test in cron, the syslog did not have any funny characters. It just ended prior to the lockup. Now there are funny characters, and they come right after the line reporting that a cron job ran. The log that my temperature logging script creates has the same funny characters. So these are probably related.
Thanks!
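For anyone wanting to check the garbage characters themselves: ^@ is how pagers and editors usually display a NUL byte (0x00). A run of NULs at the end of a log often just means the file was being written when the machine stopped, so the run marks the moment of the lockup rather than explaining it, but that is an assumption worth verifying. Below is a minimal Python sketch, not anything from the original setup, that finds runs of NUL bytes and shows the last text line before each one. The function names and the 8-byte threshold are made up for illustration.

```python
def find_nul_runs(data: bytes, min_len: int = 8):
    """Return (offset, length) for every run of at least min_len NUL bytes."""
    runs = []
    start = None
    for i, byte in enumerate(data):
        if byte == 0:
            if start is None:
                start = i  # a new run of NUL bytes begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # Handle a run that reaches the end of the data.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data) - start))
    return runs


def last_text_line_before(data: bytes, offset: int) -> str:
    """Return the last text line that precedes the given byte offset."""
    lines = data[:offset].rstrip(b"\x00").splitlines()
    return lines[-1].decode("utf-8", errors="replace") if lines else ""


def report(path: str) -> None:
    """Print every NUL run in the file together with the line before it."""
    with open(path, "rb") as handle:
        data = handle.read()
    for offset, length in find_nul_runs(data):
        line = last_text_line_before(data, offset)
        print(f"{length} NUL bytes at offset {offset}, after: {line!r}")
```

Calling `report("/var/log/syslog")` (or the path of the temperature log) as root should list each run. If every run sits directly after the last entry before a reboot, the NULs are most likely a side effect of the crash.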
- Head_on_a_Stick
Re: Diagnosing lock up with Debian
Use systemd's journal to analyse your problem: http://forums.debian.net/viewtopic.php?p=659961#p659961
See also https://www.digitalocean.com/community/ ... stemd-logs
deadbang
Re: Diagnosing lock up with Debian
I have set up journald to persist log data. Now just to wait for the next lockup.
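Once the journal is persistent (typically `Storage=persistent` in /etc/systemd/journald.conf, though the exact setup used here is not stated), the entries from the boot that ended in the lockup can be read back after the reboot. The following is a rough Python sketch of that step. The helper names and defaults are hypothetical; the `journalctl` options used (`--boot`, `--priority`, `--lines`, `--no-pager`) are standard ones.

```python
import subprocess


def journalctl_args(boot=-1, priority=None, lines=200):
    """Build a journalctl command line for one boot.

    boot=-1 selects the previous boot, i.e. the one that ended in the lockup
    if the machine has been rebooted exactly once since.
    """
    args = ["journalctl", f"--boot={boot}", f"--lines={lines}", "--no-pager"]
    if priority is not None:
        # Optional filter, e.g. "warning" shows warnings and anything worse.
        args.append(f"--priority={priority}")
    return args


def tail_of_boot(boot=-1, priority=None, lines=200):
    """Return the last journal lines of the selected boot as text."""
    result = subprocess.run(
        journalctl_args(boot, priority, lines),
        capture_output=True,
        text=True,
        check=False,  # journalctl fails if no persistent journal exists yet
    )
    return result.stdout
```

Running `print(tail_of_boot())` after the next lockup and reboot should show the final 200 journal lines from before the freeze. Leaving `priority` unset is deliberate here, since the useful clue may not be logged as a warning or error.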
Re: Diagnosing lock up with Debian
Got another lock up. That was quick.
The journalctl listing does not show anything obvious. The only thing I saw was persistent attempts to do something with the wireless card. Since a WLAN is not configured this always fails. I have disabled the WLAN so now this won't keep coming up, but since it comes up all the time, not just before a lockup, I am skeptical that is it.
I have attached the log of activity from an hour before until the lock-up.
- GarryRicketson
Re: Diagnosing lock up with Debian
The forum does not accept attachments; you will need to upload the file to a storage site and post a link to it. Many members use:
http://paste.debian.net/
Re: Diagnosing lock up with Debian
Lockups and freezes are frequently caused by hardware failure. I'd take the covers off and blow the dust out. Reseat all electrical connections including cards and memory. Check for sticking fans.
Consider that the power supply might be failing. Transistors and microprocessors don't like unstable voltages. A failing power supply can drive you to pulling your hair out.
Do a web search about your motherboard. Look for failure issues.
Re: Diagnosing lock up with Debian
Hi Bulkley
I am waiting to see if disabling the WiFi works. (I don't know why it would.)
I will try cleaning it out next. I think you are right that the PS is the next logical step.
Hadn't thought about googling the MoBo. Thanks!
Re: Diagnosing lock up with Debian
what is this:
Dec 04 02:08:01 blackhole CRON[5461]: pam_unix(cron:session): session closed for user root
Dec 04 02:09:01 blackhole CRON[5469]: pam_unix(cron:session): session opened for user root by (uid=0)
Dec 04 02:09:01 blackhole CRON[5470]: (root) CMD (/root/templog.sh)
Dec 04 02:09:01 blackhole CRON[5469]: pam_unix(cron:session): session closed for user root
it seems to be running ALL the time, look at the timestamps, opening & closing in less than a second.
look at /root/templog.sh.
Re: Diagnosing lock up with Debian
Something recently went wrong causing lockups related to gdm3 on my primary workstation. About 2 weeks ago it started randomly (2-3 times a day) halting and kicking me out to the login screen. After a few days of troubleshooting I switched to lightdm and the problem stopped. I never found any evidence logged anywhere I looked.
Nobody would ever ask questions If everyone possessed encyclopedic knowledge of the man pages.
Re: Diagnosing lock up with Debian
Hi debiman,
debiman wrote:
what is this:
Dec 04 02:08:01 blackhole CRON[5461]: pam_unix(cron:session): session closed for user root
Dec 04 02:09:01 blackhole CRON[5469]: pam_unix(cron:session): session opened for user root by (uid=0)
Dec 04 02:09:01 blackhole CRON[5470]: (root) CMD (/root/templog.sh)
Dec 04 02:09:01 blackhole CRON[5469]: pam_unix(cron:session): session closed for user root
it seems to be running ALL the time, look at the timestamps, opening & closing in less than a second.
look at /root/templog.sh.
This is the cron job I put in to log the temperature. The script just appends the output of lm-sensors to a log file and timestamps each entry. I set the script to run once a minute, so each session opening and closing within the same second is one short run of the script.
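The contents of /root/templog.sh are not shown in the thread, so the following is only a sketch of an equivalent logger in Python, not the actual script. It assumes the `sensors` command from lm-sensors is installed. The function name, the entry format and the log path in the usage note are hypothetical. The one deliberate addition is the `fsync` call, which asks the kernel to push each entry to disk before the script exits. That may make it more likely that the last reading before a lockup survives intact, though it cannot guarantee it.

```python
import os
import subprocess
from datetime import datetime


def append_reading(log_path, command=("sensors",)):
    """Append one timestamped block of command output and flush it to disk."""
    # Run the sensor command; its stdout becomes the body of the log entry.
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    stamp = datetime.now().isoformat(timespec="seconds")
    entry = f"=== {stamp} ===\n{result.stdout}"
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry)
        log.flush()             # move Python's buffer into the kernel
        os.fsync(log.fileno())  # ask the kernel to write the data to disk
    return entry
```

A crontab line that runs a script calling `append_reading("/root/templog.log")` once a minute would reproduce the behaviour described above, one short cron session per reading.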