mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

LE_746F6D617A7A69
Posts: 521
Joined: 2020-05-03 14:16

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#21 Post by LE_746F6D617A7A69 »

Thermal throttling means insufficient cooling - this is a fact, not a guess.
Bill Gates: "(...) In my case, I went to the garbage cans at the Computer Science Center and I fished out listings of their operating system."
The_full_story and Nothing_have_changed

CwF
Posts: 1187
Joined: 2018-06-20 15:16
Has thanked: 2 times
Been thanked: 6 times

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#22 Post by CwF »

GeNe64 wrote:Any tips?
...get your money back.
First off, that isn't exactly server grade hardware.
Most server grade stuff would never throttle due to core temp. The socket temp would be the trigger in a proper setup.

LE_746F6D617A7A69
Posts: 521
Joined: 2020-05-03 14:16

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#23 Post by LE_746F6D617A7A69 »

CwF wrote:
GeNe64 wrote:Any tips?
...get your money back.
First off, that isn't exactly server grade hardware.
Most server grade stuff would never throttle due to core temp. The socket temp would be the trigger in a proper setup.
+1 ;)

GeNe64
Posts: 10
Joined: 2020-07-24 07:05

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#24 Post by GeNe64 »

Finally, I've solved the issue by adding intel_idle.max_cstate=1 to the file /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 intel_idle.max_cstate=1"

then running

# update-grub

and rebooting.
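
To confirm after the reboot that the parameter actually reached the kernel, something like the following sketch could be used. The function name get_max_cstate is made up for this illustration; it only parses a command-line string (by default the one in /proc/cmdline) and reports the last intel_idle.max_cstate value it finds.

```shell
# Hypothetical helper: print the value of intel_idle.max_cstate found in a
# kernel command line. With no argument it reads the running kernel's
# /proc/cmdline. If the parameter appears more than once, the last one is
# reported (a choice made for this sketch).
get_max_cstate() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    value=""
    for arg in $cmdline; do
        case "$arg" in
            intel_idle.max_cstate=*) value="${arg#intel_idle.max_cstate=}" ;;
        esac
    done
    # Succeeds and prints the value only if the parameter was present.
    [ -n "$value" ] && echo "$value"
}
```

Running get_max_cstate with no argument on the rebooted machine should print 1 if the GRUB change took effect. On systems where the intel_idle driver is loaded, the same value should also be readable from /sys/module/intel_idle/parameters/max_cstate.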

LE_746F6D617A7A69
Posts: 521
Joined: 2020-05-03 14:16

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#25 Post by LE_746F6D617A7A69 »

GeNe64 wrote:Finally, I've solved the issue by adding intel_idle.max_cstate=1
This makes no sense at all.
In your previous post you said that the connection was lost after a series of warnings saying that the critical temperature had been reached. In practice this means that the CPU nearly melted down (the maximum Tjunction is 100 °C, and the threshold is 95 °C).
The kernel parameter intel_idle.max_cstate=1 restricts the CPU to the shallowest idle state (C1), which effectively disables idle power saving - how could it help in this situation?
Yes, there was a problem with hard lockups in BayTrail CPUs, where this option was used as a workaround - but this is not the case here.

I think that some other factors could have come into play here - e.g. someone has "fixed" a problem with the air conditioning system by opening all the doors and windows in that not-so-cold room ;)
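
For anyone wanting to check the temperatures against the 95 °C threshold mentioned above, here is a rough sketch. It assumes the standard Linux thermal sysfs layout (/sys/class/thermal/thermal_zone*/temp, in millidegrees Celsius); the function names over_threshold and check_zones are invented for this example, and which zones exist depends on the hardware and drivers.

```shell
# Hypothetical helper: succeed if a temperature given in millidegrees C
# is at or above a threshold given in whole degrees C.
over_threshold() {
    [ "$1" -ge $(( $2 * 1000 )) ]
}

# Hypothetical helper: list thermal zones at or above the threshold
# (default 95, the value quoted in the post above).
check_zones() {
    threshold="${1:-95}"
    for zone in /sys/class/thermal/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        # Some zones refuse reads; skip those instead of failing.
        temp=$(cat "$zone/temp" 2>/dev/null) || continue
        if over_threshold "$temp" "$threshold"; then
            echo "$(cat "$zone/type"): $((temp / 1000)) C (at or above $threshold C)"
        fi
    done
}
```

Calling check_zones prints nothing while every readable zone is below the threshold; check_zones 80 would use a lower limit instead.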

GeNe64
Posts: 10
Joined: 2020-07-24 07:05

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#26 Post by GeNe64 »

LE_746F6D617A7A69 wrote:This makes no sense at all.
In your previous post you said that the connection was lost after a series of warnings saying that the critical temperature had been reached. In practice this means that the CPU nearly melted down (the maximum Tjunction is 100 °C, and the threshold is 95 °C).
The kernel parameter intel_idle.max_cstate=1 restricts the CPU to the shallowest idle state (C1), which effectively disables idle power saving - how could it help in this situation?
Yes, there was a problem with hard lockups in BayTrail CPUs, where this option was used as a workaround - but this is not the case here.

I think that some other factors could have come into play here - e.g. someone has "fixed" a problem with the air conditioning system by opening all the doors and windows in that not-so-cold room ;)
Yep, that's the problem. The bug is very strange and described here https://forum.proxmox.com/threads/rando ... 597/page-3
There is nothing useful in the logs, yet the server crashes all the time.
I tried to correlate the odd log messages (temperature, mce, etc.) with the crashes, but could not find a connection.
It's a bug in Intel CPUs that can be worked around by adding intel_idle.max_cstate=1

LE_746F6D617A7A69
Posts: 521
Joined: 2020-05-03 14:16

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#27 Post by LE_746F6D617A7A69 »

GeNe64 wrote:It's a bug in Intel CPUs that can be worked around by adding intel_idle.max_cstate=1
So it happened again?! :shock:
Anyway, it's good to know...
