SSD (hynix BC501) failure using debian

Postby paratrap » 2020-07-20 12:36

I'm not sure where I should report this, but my SSD fails under Debian and works fine under Ubuntu. I've tested it for a while now: Ubuntu has given me no issues for about a week, whereas Debian gives me filesystem corruption about once a week (corrupted files and a corrupted filesystem itself: apt reports broken /var/lib/dpkg/*.list files, the filesystem starts an automatic check, or games fail to start and report corrupted game data files). I suspect an issue with the SSD + kernel combination. According to hw-probe my hardware has no issues: https://linux-hardware.org/?probe=8e5e5de5c3

HFM512GDJTNG-8310A
Device: 02:00.0 Non-Volatile memory controller [0108]: SK hynix BC501 NVMe Solid State Drive 512GB [1c5c:1327]

My config is:

Code: Select all
axet@axet-laptop:~$ lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex [1022:15d0]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 IOMMU [1022:15d1]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0] [1022:15d3]
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0] [1022:15d3]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A [1022:15db]
00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B [1022:15dc]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0 [1022:15e8]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1 [1022:15e9]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2 [1022:15ea]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3 [1022:15eb]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4 [1022:15ec]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5 [1022:15ed]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6 [1022:15ee]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7 [1022:15ef]
01:00.0 Network controller [0280]: Intel Corporation Wireless 8265 / 8275 [8086:24fd] (rev 78)
02:00.0 Non-Volatile memory controller [0108]: SK hynix BC501 NVMe Solid State Drive 512GB [1c5c:1327]
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Picasso [1002:15d8] (rev c2)
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller [1002:15de]
03:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
03:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1 [1022:15e0]
03:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1 [1022:15e1]
03:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor [1022:15e2]
03:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller [1022:15e3]
04:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 61)
axet@axet-laptop:~$ cat /sys/devices/virtual/dmi/id/product_name
ZenBook UX431DA_UM431DA


Debian 10:
vmlinuz-5.6.0-0.bpo.2-amd64

Ubuntu 20.04:
axet@axet-laptop:~$ uname -r
5.4.0-29-generic
Last edited by paratrap on 2020-07-25 16:00, edited 2 times in total.
paratrap
 
Posts: 34
Joined: 2010-09-05 13:08

Re: SSD (hynix BC501) failure using debian

Postby Head_on_a_Stick » 2020-07-20 16:44

Have you checked the filesystem integrity? Does the problem still occur with the stock Debian stable kernel?
Black Lives Matter

Debian buster-backports ISO image: for new hardware support
Head_on_a_Stick
 
Posts: 12316
Joined: 2014-06-01 17:46
Location: /dev/chair

Re: SSD (hynix BC501) failure using debian

Postby RU55EL » 2020-07-20 17:10

paratrap,

Please use the 'Code' button to post information such as command output (for example, the lspci -nn listing from your first post).

It helps make your posts easier to read.
RU55EL
 
Posts: 471
Joined: 2014-04-07 03:42
Location: /home/russel

Re: SSD (hynix BC501) failure using debian

Postby paratrap » 2020-07-21 12:48

I'm testing (using) Ubuntu for the next few weeks to see whether the problem is 100% gone with Ubuntu. I'm 100% positive the issue is with the Debian kernel (?). Can anyone suggest a good and simple filesystem stress test I can run on Debian to speed up testing? The test should create a lot of files and directories, delete them in random order, and read them back in random order for integrity checks.

It started when I got a random filesystem integrity failure during boot. Then I started to notice apt failing with "unrecoverable fatal error, aborting: files list file for package 'PACKAGENAME' is missing final newline" or "dpkg: unrecoverable fatal error, aborting: files list file for package 'PACKAGENAME' contains empty filename". Or apt's files were completely empty. Or my games (Path of Exile, 23 GB in size) failed to start with an integrity check failure.

So, something is writing random data to random sectors. At the same time, the filesystem integrity itself is just fine (it does not always corrupt index files).
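
A minimal sketch of the kind of write-then-verify check asked about above, deliberately bounded to a fixed number of files so it does not add meaningful wear. The function names make_testset and verify_testset and the file naming are illustrative assumptions, not an existing tool. Note that freshly written files are normally served from the page cache, so the verification is only meaningful after a reboot (or after dropping caches).

```shell
#!/bin/sh
# Write COUNT files of SIZE_KB KiB of random data into DIR
# and record their SHA-256 sums in DIR/SHA256SUMS.
make_testset() {
    dir=$1; count=$2; size_kb=$3
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        head -c "$((size_kb * 1024))" /dev/urandom > "$dir/file_$i.bin" || return 1
        i=$((i + 1))
    done
    # Relative file names keep the checksum list valid if DIR is moved.
    (cd "$dir" && sha256sum file_*.bin > SHA256SUMS)
}

# Re-read every file and compare it against the recorded sums.
# Returns 0 if all files match, non-zero otherwise.
verify_testset() {
    (cd "$1" && sha256sum -c SHA256SUMS)
}
```

For example, make_testset /var/tmp/ssdcheck 200 1024 writes about 200 MiB; after a sync and a reboot, verify_testset /var/tmp/ssdcheck re-reads the files and exits non-zero if any checksum no longer matches.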
paratrap
 
Posts: 34
Joined: 2010-09-05 13:08

Re: SSD (hynix BC501) failure using debian

Postby LE_746F6D617A7A69 » 2020-07-21 13:32

1. Have You tested the Debian stable stock kernel, as suggested by HOAS?
2. You should check the kernel logs for entries related to the filesystem and the drive itself.
3. Check the SMART report for the SSD.

Are You dual-booting Debian and Ubuntu? I mean, do You have 2 partitions on that SSD, one for each OS?
What file system are You using? If it's ext4, then please post the output of:
Code: Select all
tune2fs -l </dev/sdXN>  # the partition (or mapped device) that holds the ext4 filesystem

Do You have TRIM enabled? (the discard option for mount)
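
The checks asked for above could be run roughly as follows. This is a sketch rather than a definitive procedure: filter_storage_errors is a hypothetical helper name, and its pattern list is an assumption about which kernel messages are relevant. journalctl, smartctl and lsblk are the standard tools; the device name is the one that appears later in this thread.

```shell
#!/bin/sh
# Keep only kernel log lines that mention NVMe, ext4, or block-layer I/O errors.
filter_storage_errors() {
    grep -i -E 'nvme|ext4|i/o error|blk_update_request'
}

# Typical use on the affected machine (needs root; not executed here):
#   journalctl -k -b -1 | filter_storage_errors   # kernel log of the previous boot
#   smartctl -a /dev/nvme0n1                      # SMART / health log
#   lsblk --discard                               # non-zero DISC-GRAN/DISC-MAX: TRIM is supported
```

journalctl -b -1 only works if the journal is persistent; otherwise the same filter can be fed from dmesg or /var/log/kern.log.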

paratrap wrote: Can anyone suggest a good and simple filesystem stress test I can run on Debian to speed up testing?

This is a very bad idea, for many, many reasons, and this is the most important one:
SLC=100K writes/page, PSLC/PMLC/MLC=10..20K, TLC=3..5K, QLC=1..2K, upcoming PLC=~0.5K writes/page <- this is not a joke, unfortunately...
Your SSD is probably built on QLC, just like most SSDs today.
(Hynix decided to hide the specs, including the TBW factor, which is really bad: BC501_(M.2_2280_S3) )
Bill Gates: "(...) In my case, I went to the garbage cans at the Computer Science Center and I fished out listings of their operating system."
The_full_story and Nothing_have_changed
LE_746F6D617A7A69
 
Posts: 280
Joined: 2020-05-03 14:16

Re: SSD (hynix BC501) failure using debian

Postby paratrap » 2020-07-21 13:54

1. No. For two reasons: 1) I need a recent kernel for my AMD video card to play games 2) I'm testing Ubuntu right now.
2. Kind of. I've seen the system journal complain about damaged logs and truncate them as a result. But I haven't seen any related log entries. I guess there are none.
3. SMART is quite empty for SSD devices:

Code: Select all
axet@axet-laptop:~$ sudo smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-29-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       HFM512GDJTNG-8310A
Serial Number:                      CD9AN74501410AR62
Firmware Version:                   80001C00
PCI Vendor/Subsystem ID:            0x1c5c
IEEE OUI Identifier:                0xace42e
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512 110 190 592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Jul 21 17:39:43 2020 MSK
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0016):     Wr_Unc DS_Mngmt Sav/Sel_Feat
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     81 Celsius
Critical Comp. Temp. Threshold:     82 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +   3.5000W       -        -    0  0  0  0        5       5
 1 +   2.4000W       -        -    1  1  1  1       30      30
 2 +   1.9000W       -        -    2  2  2  2      100     100
 3 -   0.0350W       -        -    3  3  3  3     1000    1000
 4 -   0.0035W       -        -    3  3  3  3     1000    5000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        42 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    26%
Data Units Read:                    32 295 828 [16,5 TB]
Data Units Written:                 24 013 318 [12,2 TB]
Host Read Commands:                 397 387 625
Host Write Commands:                379 587 224
Controller Busy Time:               3 415
Power Cycles:                       408
Power On Hours:                     568
Unsafe Shutdowns:                   46
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               42 Celsius
Temperature Sensor 2:               42 Celsius

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged




4. I'm using one partition. Actually, I'm hot-swapping the OSes (Debian and Ubuntu) between / and /old. I wrote a static 'swap' binary for that purpose. Before you say anything: it works!

https://gitlab.com/axet/swap

Code: Select all
axet@axet-laptop:~$ sudo tune2fs -l /dev/nvme0n1p3
tune2fs 1.45.5 (07-Jan-2020)
tune2fs: Bad magic number in super-block while trying to open /dev/nvme0n1p3
/dev/nvme0n1p3 contains a BitLocker file system
axet@axet-laptop:~$ sudo tune2fs -l /dev/mapper/vg0-root
tune2fs 1.45.5 (07-Jan-2020)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          fed361a4-7c82-4e60-8fd6-ed7585b72417
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              28049408
Block count:              112174080
Reserved block count:     5480241
Free blocks:              13855741
Free inodes:              25545033
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1018
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Sun Feb 16 12:21:12 2020
Last mount time:          Tue Jul 21 13:17:47 2020
Last write time:          Tue Jul 21 13:17:47 2020
Mount count:              12
Maximum mount count:      30
Last checked:             Sat Jul 18 17:36:45 2020
Check interval:           15552000 (6 months)
Next check after:         Thu Jan 14 17:36:45 2021
Lifetime writes:          7582 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:             256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
First orphan inode:       14695367
Default directory hash:   half_md4
Directory Hash Seed:      be47dc56-e8d5-447c-9238-26b3e79cc025
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xb809db46
axet@axet-laptop:~$


I see. You mean it will burn out very soon?

> Do You have TRIM enabled? (the discard option for mount)

I guess not. My fstab is pretty clean. Simple mount options: "errors=remount-ro 0 1"

...

I just enabled it. Looks good. I had to change /etc/crypttab and /etc/fstab, adding the "discard" option in both files. After:

Code: Select all
root@axet-laptop:/home/axet# fstrim -v /
/: 52,7 GiB (56585904128 bytes) trimmed
root@axet-laptop:/home/axet#
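
To confirm that the option took effect after such a change, the active mount options can be inspected. A small sketch; has_discard is a hypothetical helper name, and it only checks the option string it is given. As an alternative to the online discard mount option, util-linux also ships an fstrim.timer unit that trims periodically.

```shell
#!/bin/sh
# Succeed if the comma-separated mount option string contains "discard".
# The surrounding commas prevent a false match on "nodiscard".
has_discard() {
    case ",$1," in
        *,discard,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use (not executed here):
#   has_discard "$(findmnt -n -o OPTIONS /)" && echo "online discard is active on /"
```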
paratrap
 
Posts: 34
Joined: 2010-09-05 13:08

Re: SSD (hynix BC501) failure using debian

Postby negora » 2020-07-21 14:42

Could it be a problem similar to mine? See: Problems with a Kingston A2000 1TB (NVMe SSD). My drive does not seem to work properly while Autonomous Power State Transition (APST) is on.
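
If APST is the suspect, the current limit can be inspected and APST switched off for a test boot. A sketch, assuming the in-kernel nvme driver: the sysfs path and the nvme_core.default_ps_max_latency_us boot parameter belong to the kernel, while the helper name apst_latency_limit is made up for illustration.

```shell
#!/bin/sh
# Print the APST latency limit in microseconds; 0 means APST is disabled.
# The optional argument overrides the sysfs path (useful for testing).
apst_latency_limit() {
    cat "${1:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}"
}

# To disable APST for a test boot, add this to the kernel command line
# (on Debian: GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then run update-grub):
#   nvme_core.default_ps_max_latency_us=0
```

If the corruption disappears with APST disabled, that would point at the drive's low-power states rather than at the filesystem.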
negora
 
Posts: 13
Joined: 2016-09-15 06:28

Re: SSD (hynix BC501) failure using debian

Postby LE_746F6D617A7A69 » 2020-07-21 15:58

paratrap wrote: 2. Kind of. I've seen the system journal complain about damaged logs and truncate them as a result. But I haven't seen any related log entries. I guess there are none.
What about kern.log?
I mean, if the logs are truncated due to filesystem errors, then You can't know whether there were any NVMe I/O errors.
I would suggest mounting a USB drive as /var/log to keep the log data consistent.
paratrap wrote: 3. SMART is quite empty for SSD devices:
It depends on the manufacturer - Hynix indeed doesn't show much...
paratrap wrote: 4. I'm using one partition. Actually, I'm hot-swapping the OSes (Debian and Ubuntu) between / and /old. I wrote a static 'swap' binary for that purpose. Before you say anything: it works!
Your "swap" method is very clever: it lets both OSes share the free drive space. I like it :)
paratrap wrote: I see. You mean it will burn out very soon?
In the case of stress testing, yes, especially since the usable write endurance is much lower than the declared theoretical maximum.
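
For a rough feel of the numbers, the SMART counters posted earlier can be turned into an endurance estimate. This is a back-of-the-envelope sketch, not a vendor figure: it assumes "Percentage Used" grows linearly with host writes, which ignores write amplification, and the helper names are hypothetical. The only hard fact used is that the NVMe specification defines one data unit as 1000 x 512 bytes.

```shell
#!/bin/sh
# Convert an NVMe "Data Units" counter to terabytes (1 unit = 1000 * 512 bytes).
data_units_to_tb() {
    LC_ALL=C awk -v u="$1" 'BEGIN { printf "%.2f\n", u * 512000 / 1000000000000 }'
}

# Naive linear extrapolation of total writable TB.
# Arguments: data units written, "Percentage Used" value.
implied_endurance_tb() {
    LC_ALL=C awk -v u="$1" -v p="$2" \
        'BEGIN { printf "%.2f\n", (u * 512000 / 1000000000000) / (p / 100) }'
}
```

With the values from the SMART log in this thread (24 013 318 data units written, 26% used), data_units_to_tb 24013318 prints 12.29, in line with smartctl's own figure, and implied_endurance_tb 24013318 26 prints 47.29, i.e. roughly 47 TB under this naive assumption.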

To summarize: it's very unlikely that the file system gets corrupted without any I/O errors. That would mean the device is faulty, but apparently it isn't, since it works correctly with the v5.4 kernel.
LE_746F6D617A7A69
 
Posts: 280
Joined: 2020-05-03 14:16

Re: SSD (hynix BC501) failure using debian

Postby paratrap » 2020-07-25 18:16

I've finished testing Ubuntu 20.04: it works fine and I saw no SSD issues. Moving back to Debian 10 with kernel 5.4.
paratrap
 
Posts: 34
Joined: 2010-09-05 13:08

