Kernel panic with "Multi-processor" Enabled in BIOS

Postby dbaron » 2017-03-13 08:55

Hey,

I just replaced my office computer with a new HP Workstation Z240 and am having massive problems getting it stable.
I already had random hangs while installing Jessie, but after many tries I managed to get it installed. The freezes continued though, and after a while I pinned it down: when "Multi-processor" is disabled in the BIOS, everything works as intended.

This is of course a far from optimal workaround, since I really need all the power I can get from my shiny i7-6700.

lscpu with multi-processor disabled in BIOS
> lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Stepping:              3
CPU MHz:               800.000
CPU max MHz:           3401.0000
CPU min MHz:           800.0000
BogoMIPS:              6812.98
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0,1
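
As a cross-check of the listing above, the logical CPU count can be derived from the socket, core and thread fields and compared with the reported "CPU(s)" value. This is only a minimal sketch: the function names are made up for illustration, and the sample text is a subset of the lscpu output posted above.

```python
# Sketch: derive the logical CPU count from lscpu's "Key: value" output.
# SAMPLE is a subset of the listing above (Multi-processor disabled).
SAMPLE = """\
CPU(s):                2
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
"""


def parse_lscpu(text):
    """Turn lscpu's 'Key:   value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def expected_logical_cpus(fields):
    """Sockets x cores per socket x threads per core."""
    return (
        int(fields["Socket(s)"])
        * int(fields["Core(s) per socket"])
        * int(fields["Thread(s) per core"])
    )


fields = parse_lscpu(SAMPLE)
print("derived:", expected_logical_cpus(fields), "reported:", fields["CPU(s)"])
```

On a live system the same functions could be fed the text returned by running lscpu through subprocess. With all four cores and hyperthreading active, an i7-6700 would be expected to report 8 logical CPUs.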


On one occasion I happened to be on tty1 during a kernel crash, which at least gave me some crash info:
http://storage7.static.itmages.com/i/17/0313/h_1489397189_3734387_ca167641b1.jpeg
But I'm very unsure what to make of it.

I have tried running the latest kernel from jessie-backports (I think it was 4.8), but with no success, so I changed back to the latest stable one.

Computer info
CPU: Intel i7-6700 Quad Core @ 3.4/4.0 GHz
RAM: 32GB (2x PC4-17000 16GB non-ECC)
HDD: Samsung NVMe SSD 512GB
Chassis/Mobo: HP Workstation Z240

I have ruled out a hardware error, but have not tested it extensively. It's a completely new computer, and although I did not run Windows on it for more than about 30 minutes (I just booted the pre-installed Windows and ran the setup to update the BIOS and some other firmware from within Windows) before formatting and installing Debian, all the problems started as soon as Debian came into the picture. I have not tried other distros yet, since it's Debian I want to run, but maybe a live distro from USB could be run to rule out hardware issues.

Does anyone have an idea how to proceed to get this machine fully running?

Postby Segfault » 2017-03-13 10:33

dbaron wrote:I have ruled out a hardware error,

Then rule it back in, because that's what it is. Most likely the motherboard, although an out-of-spec power supply can also be the culprit.

Postby dbaron » 2017-03-13 11:29

Segfault wrote:
I have ruled out a hardware error,

Then rule it back in, because that's what it is. Most likely the motherboard, although an out-of-spec power supply can also be the culprit.


Hi, thanks for the quick reply. I ruled it out mostly since it never happened in Windows (it happens within 0-5 minutes in Linux), and that still makes me a bit skeptical. I just tried a live image of the latest Xubuntu and got exactly the same error, though.
But I hear what you are saying, Segfault. I sell HP Workstations by the 100s yearly (but with Windows OS) and have never had any problems like this before.
But of course it happens when I buy one for myself. My old workstation, now 7 years old, has run Debian with something like 10 reboots in total, most of them for some kind of hardware upgrade.

Mobos are hard to troubleshoot, but I agree that (if it's HW) that could be the case; I will run memtest just for the sake of it, though. The PSU could possibly be defective, but it should not be wrongly dimensioned, since this is a "standard configured machine from HP". In my 15 years in the IT business I have never seen a CPU that gives these kinds of errors; they usually either work or don't, as long as temperatures are good.

Even though I've run Linux for more than a decade on both servers and my own desktops/laptops, I rarely do much troubleshooting. I always run stable distros on high-quality hardware, so I'm a bit out in the wild here. Are there no suggestions for last-resort tests I can do to rule out a software error before I open a case with HP to get a new mobo?

Postby dasein » 2017-03-13 13:45

dbaron wrote:Ruled [hardware] out mostly since it never happened in Windows (happens within 0-5 minutes in Linux)

That's not nearly as informative/definitive as you imagine it to be, especially since you say you ran Windows only briefly.

dbaron wrote:I sell HP Workstation by the 100s yearly... and never had any problems like this before.

Irrelevant.

dbaron wrote:There's no suggestions of last resort tests I can do to rule out a software error before I open a case with HP to get a new mobo?

A much better question is how to isolate this hardware problem to the make/model, or to your specific machine.

There are a couple of things you could try. The process will be quite the timesink, but hey, your time is yours to waste as you see fit. And the probability of success is quite low, because what you have is almost certainly a hardware problem.

In approximate order in which I would try them...

- Do what the error message tells you to do and run the error output through mcelog. Consult the manpage for mcelog to learn more. (Never used mcelog myself, so can't provide additional guidance.)

- Use a CPU monitoring tool to verify your untested assumption that only one CPU core is being used.

- Try backing out the firmware update you mentioned. HP explicitly says that the Z240 is "Linux ready," so one imagines that they tested that claim before the original RTM, but maybe not after the latest firmware change(s).

- Your initial post makes no mention of trying either of the distros explicitly mentioned in the Z240 FAQ. Not likely to make a difference, but nothing software-related is particularly likely to make a difference.

- If you still have time to spare, you could try to reproduce the problem in Stretch (currently Debian Testing, but on the fast-track to become the new Stable). Consider also trying to reproduce the problem in Wheezy (currently OLDSTABLE); the Z240 was bleeding-edge hardware two years ago, but regressions do happen.

- Believe the error messages from the kernel panic (along with the basic laws of probability) and call HP to get an RMA number :razz:

(If I think of any other "wild-hairs," I'll add 'em to the list.)
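
To give a rough idea of what a decoder such as mcelog reports, the sketch below picks apart the architectural flag bits of a machine-check status value. It is an illustration only: the bit positions follow the IA32_MCi_STATUS layout in Intel's Software Developer's Manual, the function name is hypothetical, and the example value is made up rather than taken from the crash screenshot in this thread. The real tool does far more than this.

```python
# Sketch: decode the architectural flag bits of an IA32_MCi_STATUS value.
# Bit positions per Intel's Software Developer's Manual (machine-check chapter).
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds an address
    57: "PCC",    # processor context may be corrupted
}


def decode_status(status):
    """Return the set flag names plus the two 16-bit error-code fields."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "mca_code": status & 0xFFFF,                    # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# Made-up value for illustration only, NOT read from the Z240's panic output.
EXAMPLE_STATUS = 0xBE00000000800400

decoded = decode_status(EXAMPLE_STATUS)
print(decoded["flags"], hex(decoded["mca_code"]))
```

A status with UC and PCC set means the hardware could not correct the error and the CPU state can no longer be trusted, which is the kind of result that points toward an RMA rather than a software fix.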
Last edited by dasein on 2017-03-13 13:56, edited 5 times in total.

Postby stevepusser » 2017-03-13 13:47

Maybe that option is for multiple physical processors, instead of just multiple cores in one? Your lscpu shows two cores with two threads per core (hyperthreading).

Postby dbaron » 2017-03-13 14:18

dasein wrote:
dbaron wrote:Ruled [hardware] out mostly since it never happened in Windows (happens within 0-5 minutes in Linux)

That's not nearly as informative/definitive as you imagine it to be, especially since you say you ran Windows only briefly.

dbaron wrote:I sell HP Workstation by the 100s yearly... and never had any problems like this before.

Irrelevant.

dbaron wrote:There's no suggestions of last resort tests I can do to rule out a software error before I open a case with HP to get a new mobo?

A much better question is how to isolate this hardware problem to the make/model, or to your specific machine.

There are a couple of things you could try. The process will be quite the timesink, but hey, your time is yours to waste as you see fit. And the probability of success is quite low, because what you have is almost certainly a hardware problem.

- Do what the error message tells you to do and run the error output through mcelog. Consult the manpage for mcelog to learn more. (Never used mcelog myself, so can't provide additional guidance.)

- Try backing out the firmware update you mentioned. HP explicitly says that the Z240 is "Linux ready," so one imagines that they tested that claim before the original RTM, but maybe not after the latest firmware change(s).

- Your initial post makes no mention of trying either of the distros explicitly mentioned in the Z240 FAQ. Not likely to make a difference, but nothing software-related is particularly likely to make a difference.

- If you still have time to spare, you could try to reproduce the problem in Stretch (currently Debian Testing, but on the fast-track to become the new Stable). Consider also trying to reproduce the problem in Wheezy (currently OLDSTABLE); the Z240 was bleeding-edge hardware two years ago, but regressions do happen.

- Believe the error messages from the kernel panic (along with the basic laws of probability) and call HP to get an RMA number :razz:

(If I think of any other "wild-hairs," I'll add 'em to the list.)


Well, dasein, I don't have to read between the lines to get your point: broken HW, period :roll:
I got stuck on the idea of the HW being fine; I'll change my focus.
I have already started the RMA process. HP support technicians are usually a little bit meh when you call and claim a HW error with Linux. They still want info from the BSOD :cry: Dammit.
Thank you, Segfault and dasein, for your input here. I'll report back when it's solved.

stevepusser wrote:Maybe that option is for multiple physical processors, instead of just multiple cores in one? Your lscpu shows two cores with two threads per core (hyperthreading).

Nah, the option "Multi-processor" has nothing to do with how many CPU sockets are populated (this model only has one socket anyway). The setting controls whether the CPU should use all the cores or just one.
With it disabled, lscpu lists 1 socket and 1 core per socket. With it enabled, lscpu lists 1 socket and 4 cores per socket.

Postby dasein » 2017-03-13 14:31

dbaron wrote:Nah, the option "Multi-processor" has nothing to do with how many CPU sockets are populated (this model only has one socket anyway). The setting controls whether the CPU should use all the cores or just one.
With it disabled, lscpu lists 1 socket and 1 core per socket. With it enabled, lscpu lists 1 socket and 4 cores per socket.

I take your point, but IMO it'd still be worth running something like htop for a few seconds. Way quicker/easier than packing up your rig, shipping it to HP, waiting for a replacement, etc.

And backing out the firmware change(s) is also probably worth doing. It'd suck to go through the whole return process, flash the BIOS on your replacement rig, only to discover that the flash was/is the root cause of your problem.
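
A tool like htop shows per-core load interactively. Roughly the same picture can be had from two snapshots of /proc/stat, as in the sketch below. The function names are hypothetical and the two snapshots are invented sample data, not measurements from the machine in this thread.

```python
# Sketch: per-CPU busy fraction between two /proc/stat snapshots.
# BEFORE and AFTER are invented sample data in /proc/stat format.
BEFORE = """\
cpu  150 0 150 1700 0 0 0 0 0 0
cpu0 100 0 100 800 0 0 0 0 0 0
cpu1 50 0 50 900 0 0 0 0 0 0
"""
AFTER = """\
cpu  235 0 175 1790 0 0 0 0 0 0
cpu0 180 0 120 800 0 0 0 0 0 0
cpu1 55 0 55 990 0 0 0 0 0 0
"""


def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each per-CPU line."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep "cpu0", "cpu1", ...; skip the aggregate "cpu" line and others.
        if len(parts) < 6 or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        # Fields user..steal; guest time is already counted inside user.
        ticks = [int(p) for p in parts[1:9]]
        idle = ticks[3] + ticks[4]  # idle + iowait
        total = sum(ticks)
        result[parts[0]] = (total - idle, total)
    return result


def busy_fraction(before, after):
    """Fraction of elapsed ticks each CPU spent doing work."""
    usage = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        usage[cpu] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return usage


usage = busy_fraction(parse_proc_stat(BEFORE), parse_proc_stat(AFTER))
for cpu, fraction in sorted(usage.items()):
    print(f"{cpu}: {fraction:.0%} busy")
```

On a real system the two snapshots would come from reading /proc/stat twice with a short sleep in between. The number of cpuN lines that appear, and whether more than one of them ever shows load, answers the question of how many cores the kernel is actually using.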

Postby Segfault » 2017-03-13 15:15

Your rig is under warranty; otherwise I'd take a voltmeter and check all the voltages on the ATX connector, which takes a few minutes ... (I'm the type who grabs a soldering iron when some electronics fail).

Postby dbaron » 2017-03-13 15:18

dasein wrote:
dbaron wrote:Nah, the option "Multi-processor" has nothing to do with how many CPU sockets are populated (this model only has one socket anyway). The setting controls whether the CPU should use all the cores or just one.
With it disabled, lscpu lists 1 socket and 1 core per socket. With it enabled, lscpu lists 1 socket and 4 cores per socket.

I take your point, but IMO it'd still be worth running something like htop for a few seconds. Way quicker/easier than packing up your rig, shipping it to HP, waiting for a replacement, etc.

And backing out the firmware change(s) is also probably worth doing. It'd suck to go through the whole return process, flash the BIOS on your replacement rig, only to discover that the flash was/is the root cause of your problem.


Yeah, the thing is, I did upgrade the Intel Management Engine and the BIOS from Windows, and I might have been a bit quick to do that. One of the first things I tried after the problems arose was to revert to the older BIOS version, but HP does not allow a downgrade from the currently installed 1.50 Rev.A. After all, my 30 or so minutes in Windows were with the older BIOS version. But the new BIOS has a built-in restriction on downgrading: stupid, but fact.
My 7-year-old rig is still rocking, so I will not be too handicapped. And for these enterprise machines I think HP actually sends a technician with new HW to do the repair at my shop.

Your point in your first reply has some truth to it. I have already spent too much time getting this to work, time that I could have spent working instead.
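
When firmware versions matter this much, it can help to record what is installed before and after any flash or repair. On Linux the kernel exposes the DMI strings under /sys/class/dmi/id, and the sketch below reads three of them. The function name is hypothetical, and how HP formats the version string on this model is not something the thread states, so the output is simply printed as found.

```python
# Sketch: read the BIOS vendor, version and date that Linux exposes via DMI.
from pathlib import Path

DMI_DIR = Path("/sys/class/dmi/id")  # standard sysfs location on Linux


def read_firmware_info(dmi_dir=DMI_DIR):
    """Return the DMI BIOS strings, or None for any that cannot be read."""
    info = {}
    for name in ("bios_vendor", "bios_version", "bios_date"):
        try:
            info[name] = (Path(dmi_dir) / name).read_text().strip()
        except OSError:
            # Missing file, non-Linux system, or insufficient permissions.
            info[name] = None
    return info


print(read_firmware_info())
```

Saving this output next to the RMA paperwork makes it easy to tell later whether a replacement board arrived with the older or the newer BIOS.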

Postby dasein » 2017-03-13 17:38

dbaron wrote:After all, my 30 or so minutes in Windows were with the older BIOS version.

Indeed. (And I notice that the date on that latest BIOS update is ~1 month ago. All the more reason to be at least a little suspicious/cautious.)

dbaron wrote:the new BIOS has a built-in restriction on downgrading: stupid, but fact.

Lesson learned for when the replacement arrives, then. Might be fun to try to repro this issue on the old BIOS/new mobo combination, just for lulz.

(P.S. Speaking of lessons, please learn to avoid full quotes in your replies.)

Postby dbaron » 2017-03-14 08:35

Well, well, some news.

The computer I ordered apparently is a "Renew Silver" unit. For those of you who aren't familiar with these retailer programs, it basically means that this computer was returned to HP as DOA and then repaired to be sold again. I will leave it unsaid whether I missed this when ordering or my distributor failed to give me this information, because I don't know at the moment, but it should be clearly stated, and it usually brings the price of the machine down by 10-20%.
After talking to HP support, we found out that this computer had been returned to HP and that they had then replaced the CPU before selling it again.
Given that background, it doesn't take a genius to understand that they "repaired" it by replacing the wrong component. Which component is defective is still not 100% clear, but the mobo, as suggested earlier in the thread, should be a very good guess, even though the PSU and RAM (despite extensive memtesting) can't be totally ruled out. The GPU or SSD seems unlikely.

Thanks again for steering me in the right direction.

dasein wrote:Lesson learned for when the replacement arrives, then. Might be fun to try to repro this issue on the old BIOS/new mobo combination, just for lulz.

Believe me when I say I won't try to reproduce the issue again, especially by flashing a possibly problematic BIOS without a way back. Because it won't bring any lulz, at least not here :wink: .

Postby dasein » 2017-03-14 13:26

Huh. Things seem to have changed a lot since my brief acquaintance with the computer manufacturing world. Back in my day, refurb/rework units actually got more love and attention than "factory new."

The reason was simple: profit margins on computer boxen are (or were) so razor-thin that the very act of having to futz with a unit a second time meant that the company basically "broke even" on any unit that they had to rework. Needless to say, they definitely didn't want it coming back for a second rework. So they doted on each refurb unit as if it were the only one they were working on, cooing and coddling, inspecting and reinspecting, testing and retesting. The predictable result was a measurably higher-quality product.

Admittedly (and probably obviously), that was a very long time ago.

dbaron wrote:
dasein wrote:Lesson learned for when the replacement arrives, then. Might be fun to try to repro this issue on the old BIOS/new mobo combination, just for lulz.

Believe me when I say I won't try to reproduce the issue again, especially by flashing a possibly problematic BIOS without a way back. Because it won't bring any lulz, at least not here :wink: .

Apologies for my lack of clarity. I was suggesting trying to repro the problem without the flash. But your news regarding the refurb status of your unit makes the whole point moot anyway.