Problems with a Kingston A2000 1TB (NVMe SSD)

negora
Posts: 15
Joined: 2016-09-15 06:28
Has thanked: 3 times
Been thanked: 1 time

Problems with a Kingston A2000 1TB (NVMe SSD)

#1 Post by negora »

Hi:

I have a Dell Inspiron 7577 laptop, with Debian 10.4 (Buster) installed on an NVMe SSD. I'm using LUKS 1 encryption and, on top of it, a Btrfs file system. It works OK.

Some days ago, I purchased a new NVMe SSD for my laptop: a Kingston A2000 with 1 TB capacity. I migrated the OS and all my data from my old SSD to the new one. However, when I turned my laptop on, it hung after some time of use. This occurred more often when I did massive copies of big files, such as virtual machines. These are the messages that I could catch:

Code:

nvme nvme0: I/O 502 QID 7 timeout, aborting
nvme nvme0: I/O 503 QID 7 timeout, aborting
nvme nvme0: I/O 504 QID 7 timeout, aborting
nvme nvme0: I/O 505 QID 7 timeout, aborting
nvme nvme0: I/O 506 QID 7 timeout, aborting
nvme nvme0: I/O 502 QID 7 timeout, reset controller

Just to try, while the computer wasn't hanging, I installed version 5.6.0 of the kernel from the Backports repository. But it didn't make a difference.

I also checked the S.M.A.R.T. logs, and everything looked fine.

This error occurred only when the OS was running from the affected drive. If I ran the OS from another hard disk, or from a live image on a USB pen drive, I couldn't reproduce it.

Does anybody have a clue about what's happening?
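When comparing runs (for example, booting from the affected drive versus another disk), it can help to tally the timeout messages rather than read them by eye. The following is only a minimal sketch, assuming the kernel log lines have the same shape as the ones quoted above; the function name `summarize_timeouts` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like "nvme nvme0: I/O 502 QID 7 timeout, aborting".
# A leading dmesg timestamp is tolerated because search() is used.
TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) "
    r"timeout, (?P<action>.+)$"
)


def summarize_timeouts(lines):
    """Count timeouts per (controller, queue) and per follow-up action."""
    per_queue = Counter()
    actions = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line.strip())
        if match is None:
            continue  # not an NVMe timeout message
        per_queue[(match["ctrl"], int(match["qid"]))] += 1
        actions[match["action"]] += 1
    return per_queue, actions


# The messages quoted in this post, used as sample input.
sample = """\
nvme nvme0: I/O 502 QID 7 timeout, aborting
nvme nvme0: I/O 503 QID 7 timeout, aborting
nvme nvme0: I/O 504 QID 7 timeout, aborting
nvme nvme0: I/O 505 QID 7 timeout, aborting
nvme nvme0: I/O 506 QID 7 timeout, aborting
nvme nvme0: I/O 502 QID 7 timeout, reset controller
"""

per_queue, actions = summarize_timeouts(sample.splitlines())
for (ctrl, qid), count in per_queue.items():
    print(f"{ctrl} queue {qid}: {count} timeouts")
for action, count in actions.items():
    print(f"{action}: {count}")
```

On a real system the input could be the saved output of `dmesg` or `journalctl -k` instead of the sample string.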

Thank you.

Update: I've tried the A2000 in a desktop computer, which has an ASRock Z97 Pro4 motherboard, and it has worked without issues. I've tried to copy a relatively big virtual machine, and no problems at all. Could it be an incompatibility between my laptop model and this hard drive?

LE_746F6D617A7A69
Posts: 932
Joined: 2020-05-03 14:16
Has thanked: 7 times
Been thanked: 68 times

Re: Problems with a Kingston A2000 1TB (NVMe SSD)

#2 Post by LE_746F6D617A7A69 »

negora wrote:Could it be an incompatibility between my laptop model and this hard drive?
It could be, but it could also be a faulty NVMe SSD.
There's a significant difference between the chipsets of your laptop and your motherboard: the Dell Inspiron 7577 has the HM175 chipset, which has PCI-E 3.0 M.2 slots, while the ASRock Z97 Pro4 has only PCI-E 2.0 M.2.
Bill Gates: "(...) In my case, I went to the garbage cans at the Computer Science Center and I fished out listings of their operating system."
The_full_story and Nothing_have_changed

Re: Problems with a Kingston A2000 1TB (NVMe SSD)

#3 Post by negora »

LE_746F6D617A7A69 wrote:It could be, but it could also be a faulty NVMe SSD.
First of all, thank you for your answer.

If this were a faulty SSD, shouldn't it also fail in the desktop computer? In that computer the SSD seems to work OK, though.
LE_746F6D617A7A69 wrote:There's a significant difference between the chipsets of your laptop and your motherboard: the Dell Inspiron 7577 has the HM175 chipset, which has PCI-E 3.0 M.2 slots, while the ASRock Z97 Pro4 has only PCI-E 2.0 M.2.
Yes, the hardware of the two computers is very different. I guess that fact may be contributing to this error.

Fortunately, I've found a few threads in other forums that describe errors similar to mine, and a possible solution. The latter consists of disabling certain power states of the drive by setting the value of the nvme_core.default_ps_max_latency_us kernel parameter to 5500. Some threads even recommend disabling all states by setting its value to 0 (at the cost of much more power usage, of course).
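A rough sketch of how such a limit would pick power states, assuming the behaviour described in those threads: a non-operational state is used as an idle target only if its exit latency does not exceed the limit, and a limit of 0 switches the feature off. The power-state table below is hypothetical (it is not the A2000's real table), and `PowerState` and `apst_idle_targets` are names made up for this illustration. The real kernel logic also applies per-device quirks that are ignored here.

```python
from dataclasses import dataclass


@dataclass
class PowerState:
    index: int
    operational: bool      # False for the low-power "non-operational" states
    exit_latency_us: int   # time needed to wake up from this state


def apst_idle_targets(states, ps_max_latency_us):
    """Return the indices of the states the drive may idle into.

    Assumed rule: only non-operational states whose exit latency is within
    the limit qualify, and a limit of 0 disables the feature altogether.
    """
    if ps_max_latency_us == 0:
        return []
    return [
        s.index
        for s in states
        if not s.operational and s.exit_latency_us <= ps_max_latency_us
    ]


# Hypothetical table for illustration only. The real values of a drive can
# be read from the "ps" lines printed by `nvme id-ctrl /dev/nvme0`.
states = [
    PowerState(0, True, 0),
    PowerState(1, True, 0),
    PowerState(2, True, 0),
    PowerState(3, False, 3000),
    PowerState(4, False, 10000),
]

for limit in (100000, 5500, 0):
    print(limit, apst_idle_targets(states, limit))
```

With this made-up table, a limit of 5500 keeps state 3 and drops the deepest state 4, while 0 leaves no idle target at all.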

The thread that gave me the first hint was Sudden problems when starting VMs, at Unraid. Today, I've read EXT4-fs error after Ubuntu 17.04 upgrade nvme_core, at AskUbuntu. And, finally, Solid state drive/NVMe, at the Arch Wiki.

I'll give it a try this weekend. I'm keeping my fingers crossed, he, he, he.

Re: Problems with a Kingston A2000 1TB (NVMe SSD)

#4 Post by LE_746F6D617A7A69 »

negora wrote:The thread that gave me the first hint was Sudden problems when starting VMs
Yes, overheating seems to be the right explanation, especially since your case is very similar: PCI-E 3.0 uses a ~2 times higher frequency for data transfer, and this means ~2 times more power consumed by the data bus drivers -> higher temperatures.
Have you checked the temps reported by SMART?

Re: Problems with a Kingston A2000 1TB (NVMe SSD)

#5 Post by negora »

LE_746F6D617A7A69 wrote:Yes, overheating seems to be the right explanation, especially since your case is very similar: PCI-E 3.0 uses a ~2 times higher frequency for data transfer, and this means ~2 times more power consumed by the data bus drivers -> higher temperatures.
Have you checked the temps reported by SMART?
When I made the second attempt at migrating my data, I checked the temperature of the new drive and I believe that it wasn't too high... But I don't remember well.

So, some days ago, I made another attempt with the "nvme_core.default_ps_max_latency_us" parameter set to "0". And I monitored the temperatures. To my surprise, the Kingston A2000 was cooler while writing than my Toshiba KXG50ZNV256G while reading: 49 °C vs. 60 °C. So temperatures don't seem to be a problem. The Toshiba drive is the one that came with the laptop, and it has a small removable plate that acts as a heat sink. So I put it on the new drive, of course.

Since that day, I've been using my laptop with the new drive regularly, and it has never hung. So that parameter seems to have done the trick. I've run this command:

Code:

nvme get-feature /dev/nvme0 -f 0x0c -H

And, indeed, APST is disabled now. That's what was causing the hangs.
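Besides querying the drive, it may be worth confirming that the parameter really reached the kernel. A minimal sketch, assuming the usual locations `/proc/cmdline` and `/sys/module/nvme_core/parameters/default_ps_max_latency_us`; the helper names `cmdline_value` and `read_text_or_none` are made up for illustration.

```python
from pathlib import Path

PARAM = "nvme_core.default_ps_max_latency_us"
# Assumed sysfs location of the module parameter on a running system.
SYSFS_PARAM = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")


def cmdline_value(cmdline, name=PARAM):
    """Return the value given for name=value on a kernel command line.

    If the parameter appears several times, the last occurrence is returned.
    Returns None when the parameter is absent.
    """
    found = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name and sep:
            found = value
    return found


def read_text_or_none(path):
    """Read a small text file, returning None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    boot_cmdline = read_text_or_none("/proc/cmdline") or ""
    print("value on boot command line:", cmdline_value(boot_cmdline))
    print("value seen by nvme_core:   ", read_text_or_none(SYSFS_PARAM))
```

If both values agree with what was configured in the boot loader, the setting took effect at boot.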

Now it's time to experiment with higher values of "nvme_core.default_ps_max_latency_us". Although I'm not sure if it makes any sense, because the drive, when APST was enabled, showed only 2 available power states. If I increase the value of that parameter to disable the deepest state, that will leave me with only 1 state. And I guess that's the one I'm currently using now that APST is disabled. Am I right?

Re: Problems with a Kingston A2000 1TB (NVMe SSD)

#6 Post by LE_746F6D617A7A69 »

It works a little bit differently: the "nvme_core.default_ps_max_latency_us" tells the kernel how long it should wait for the device to respond before reporting an I/O timeout. So theoretically this period should be set to a value which is higher than the latency reported by the SSD - and APST stays enabled. Alternatively, you can disable APST using nvme set-feature, but this makes no sense unless the low-power state is causing problems.

Side note: don't trust the temp. sensors too much - the sensors used in HDDs/SSDs/CPUs are of very low quality (linearity/offset) - e.g. I have 2 identical Kingston SSDs with serial numbers that differ by one, and they report completely different temps (8 degrees difference) ;)

Re: Problems with a Kingston A2000 1TB (NVMe SSD)

#7 Post by negora »

LE_746F6D617A7A69 wrote:It works a little bit differently: the "nvme_core.default_ps_max_latency_us" tells the kernel how long it should wait for the device to respond before reporting an I/O timeout. So theoretically this period should be set to a value which is higher than the latency reported by the SSD - and APST stays enabled.
OK, I see. Thanks. I thought that by increasing the maximum latency you could disable certain power states that are problematic.

When I have some spare time, I'll try to re-enable APST with a proper value. I hope not to corrupt my file system.

LE_746F6D617A7A69 wrote:Alternatively, you can disable APST using nvme set-feature, but this makes no sense unless the low-power state is causing problems.
Thank you. I'll take a look at that feature. Although I need to disable it right at kernel boot.
