mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

LE_746F6D617A7A69
Posts: 932
Joined: 2020-05-03 14:16
Has thanked: 7 times
Been thanked: 65 times

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#21 Post by LE_746F6D617A7A69 »

Thermal throttling means insufficient cooling - this is a fact, not a guess.
Bill Gates: "(...) In my case, I went to the garbage cans at the Computer Science Center and I fished out listings of their operating system."
The_full_story and Nothing_have_changed

CwF
Global Moderator
Posts: 2638
Joined: 2018-06-20 15:16
Location: Colorado
Has thanked: 41 times
Been thanked: 192 times

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#22 Post by CwF »

GeNe64 wrote:Any tips?
...get your money back.
First off, that isn't exactly server grade hardware.
Most server grade stuff would never throttle due to core temp. The socket temp would be the trigger in a proper setup.

LE_746F6D617A7A69
Posts: 932
Joined: 2020-05-03 14:16
Has thanked: 7 times
Been thanked: 65 times

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#23 Post by LE_746F6D617A7A69 »

CwF wrote:
GeNe64 wrote:Any tips?
...get your money back.
First off, that isn't exactly server grade hardware.
Most server grade stuff would never throttle due to core temp. The socket temp would be the trigger in a proper setup.
+1 ;)

GeNe64
Posts: 10
Joined: 2020-07-24 07:05

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#24 Post by GeNe64 »

Finally, I've solved the issue by adding intel_idle.max_cstate=1 to GRUB_CMDLINE_LINUX_DEFAULT in the file /etc/default/grub:

Code:

GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 intel_idle.max_cstate=1"

then running

Code:

# update-grub

and rebooting.
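
To confirm after the reboot that the parameter actually reached the kernel, something like the following could be used. This is only a minimal sketch and not part of the original fix: the helper name cmdline_max_cstate is made up for illustration, and it assumes a Linux system where /proc/cmdline is readable. If the parameter appears more than once, the sketch reports the last occurrence.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): print the value of
# intel_idle.max_cstate found in a kernel command line string.
# Prints nothing and returns non-zero if the parameter is absent.
cmdline_max_cstate() {
    found=
    # Unquoted on purpose: split the command line on whitespace.
    for arg in $1; do
        case "$arg" in
            intel_idle.max_cstate=*)
                # Strip the "name=" prefix, keep the value.
                found=${arg#intel_idle.max_cstate=}
                ;;
        esac
    done
    [ -n "$found" ] && printf '%s\n' "$found"
}

# Check the command line of the running kernel, if it is readable.
if [ -r /proc/cmdline ]; then
    if value=$(cmdline_max_cstate "$(cat /proc/cmdline)"); then
        echo "intel_idle.max_cstate=$value is active"
    else
        echo "intel_idle.max_cstate is not set on the running kernel"
    fi
fi
```

If it reports that the parameter is not set, the usual suspects are a typo in /etc/default/grub or a forgotten update-grub before rebooting.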

LE_746F6D617A7A69
Posts: 932
Joined: 2020-05-03 14:16
Has thanked: 7 times
Been thanked: 65 times

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#25 Post by LE_746F6D617A7A69 »

GeNe64 wrote:Finally, I've solved the issue by adding intel_idle.max_cstate=1
This makes no sense at all.
In your previous post you said that the connection was lost after a series of warnings that the critical temperature had been reached. In practice this means that the CPU nearly melted down (max Tjunction is 100 °C, and the threshold is 95 °C).
The kernel parameter intel_idle.max_cstate=1 limits the CPU to the shallowest idle state (C1), which disables most of the power saving - how could it help in this situation?
Yes, there was a problem with hard lockups in Bay Trail CPUs, where this option was used as a workaround - but this is not the case here.

I think some other factors could have come into play here - e.g. someone "fixed" a problem with the air conditioning system by opening all the doors and windows in that not-so-cold room ;)

GeNe64
Posts: 10
Joined: 2020-07-24 07:05

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#26 Post by GeNe64 »

LE_746F6D617A7A69 wrote:This makes no sense at all.
In your previous post you said that the connection was lost after a series of warnings that the critical temperature had been reached. In practice this means that the CPU nearly melted down (max Tjunction is 100 °C, and the threshold is 95 °C).
The kernel parameter intel_idle.max_cstate=1 limits the CPU to the shallowest idle state (C1), which disables most of the power saving - how could it help in this situation?
Yes, there was a problem with hard lockups in Bay Trail CPUs, where this option was used as a workaround - but this is not the case here.

I think some other factors could have come into play here - e.g. someone "fixed" a problem with the air conditioning system by opening all the doors and windows in that not-so-cold room ;)
Yep, that's the problem. The bug is very strange and described here https://forum.proxmox.com/threads/rando ... 597/page-3
Nothing useful shows up in the logs, but the server crashes all the time.
I tried to link the odd log messages (temp, mce, etc.) to the crashes, but couldn't find a connection.
It's a bug of Intel CPUs that can be fixed by adding intel_idle.max_cstate=1

LE_746F6D617A7A69
Posts: 932
Joined: 2020-05-03 14:16
Has thanked: 7 times
Been thanked: 65 times

Re: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

#27 Post by LE_746F6D617A7A69 »

GeNe64 wrote:It's a bug of Intel CPUs that can be fixed by adding intel_idle.max_cstate=1
So it happened Again?! :shock:
Anyway, it's good to know...
