mce: [Hardware Error] emerg level entries in dmesg

Postby kmph » 2020-02-13 18:56

Code:
root@delldebian:~# dmesg --level=emerg
[    0.341027] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: ae0000000040110a
[    0.341027] mce: [Hardware Error]: TSC 0 ADDR ffb070c0 MISC 38a0000086
[    0.341027] mce: [Hardware Error]: PROCESSOR 0:40651 TIME 1581621039 SOCKET 0 APIC 0 microcode 25


Such messages are also shown during boot.

Where does this come from?

AFAIK the emerg level is supposed to be reserved for conditions that make the system unusable. Interestingly, though, the system seems to work fine, or at least I was able to connect to the internet, run apt install udisks2, mount a pendrive and save the above output to it. I have also installed intel-microcode. (Whether the system can handle more complex tasks remains to be seen.)

Additional info, if this is relevant:

Code:
root@delldebian:~# dmesg | grep mce
[    0.341027] mce: [Hardware Error]: Machine check events logged
[    0.341027] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: ae0000000040110a
[    0.341027] mce: [Hardware Error]: TSC 0 ADDR ffb070c0 MISC 38a0000086
[    0.341027] mce: [Hardware Error]: PROCESSOR 0:40651 TIME 1581621039 SOCKET 0 APIC 0 microcode 25
[    1.799985] mce: Using 7 MCE banks


Code:
root@delldebian:~# dmesg | grep microcode
[    0.000000] microcode: microcode updated early to revision 0x25, date = 2019-02-26
[    0.341027] mce: [Hardware Error]: PROCESSOR 0:40651 TIME 1581621039 SOCKET 0 APIC 0 microcode 25
[    1.800050] microcode: sig=0x40651, pf=0x40, revision=0x25
[    1.800133] microcode: Microcode Update Driver: v2.2.
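
For reference, the sig=0x40651 value (also printed as PROCESSOR 0:40651 in the mce lines) packs the CPU family, model and stepping into one number. A minimal sketch of splitting it, assuming the standard CPUID leaf 1 EAX layout; decode_cpu_signature is a hypothetical helper name made up for this example, not part of any existing tool:

```python
def decode_cpu_signature(sig: int) -> dict:
    """Split a CPUID signature (leaf 1, EAX) into family/model/stepping.

    Assumes the standard x86 field layout; the function name is
    illustrative only.
    """
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


if __name__ == "__main__":
    info = decode_cpu_signature(0x40651)
    print(f"family {info['family']:#x}, model {info['model']:#x}, "
          f"stepping {info['stepping']:#x}")
```

Having the family and model to hand may help when looking up model-specific machine check codes, since many of them are documented per CPU model.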


What is going on here, and how do I fix it?

EDIT: Conforming to the requirement to STFW before posting: Googling revealed two somewhat similar threads:

  • https://forum.manjaro.org/t/hardware-error-cpu-0-tsc-0-and-processor-0-on-boot/70200 - but the 'solution' posted there is, IIUC, to silence the alarms without diagnosing or eliminating their cause :( Is that really the correct course of action?
  • https://software.intel.com/en-us/forums/intel-c-compiler/topic/779705 - this might be enlightening: if the issue is indeed similar, then this is a bug in the BIOS? In that case I'm screwed, because sadly I cannot change the BIOS settings :( (I set a password a few years ago and promptly forgot it.) Maybe a BIOS update is available, but how do I flash the BIOS without Windows? This is likely a different issue, though, because the error codes do not match.

Re: mce: [Hardware Error] emerg level entries in dmesg

Postby Dai_trying » 2020-02-14 07:39

If you have a hardware error, there is very little anyone can advise without the hardware information (I use inxi -Fxx to get as much information as is available).

Re: mce: [Hardware Error] emerg level entries in dmesg

Postby Head_on_a_Stick » 2020-02-15 11:17

Use the rasdaemon package to analyse the failure; see the relevant man pages for more detail.
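
For a rough first look before reaching for rasdaemon, the status word from the log (ae0000000040110a) can be split into its architectural fields by hand. This is only a sketch: it assumes the generic IA32_MCi_STATUS bit layout from the Intel SDM, decode_mci_status is a hypothetical helper name, and it does not interpret the model-specific bits, which is where a proper decoder is needed.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed generic layout;
# model-specific fields are deliberately left undecoded).
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine check status word into basic fields.

    The function name is illustrative only.
    """
    flags = {name: bool((status >> bit) & 1)
             for bit, name in STATUS_FLAGS.items()}
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    decoded = decode_mci_status(0xAE0000000040110A)
    for name, is_set in decoded["flags"].items():
        print(f"{name:5} = {int(is_set)}")
    print(f"MCA error code:      {decoded['mca_error_code']:#06x}")
    print(f"model-specific code: {decoded['model_specific_code']:#06x}")
```

For the value in the log this reports MISCV and ADDRV as set, which matches the ADDR and MISC values printed on the second line. It does not say which component raised the error or why.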

To reset the firmware ("BIOS") password try removing the battery pack and motherboard CMOS battery for a minute or so. My ThinkPad's firmware can be updated using the ISO image provided by Lenovo, no Windows required. Otherwise perhaps try FreeDOS to run .exe files.

