[Hardware] [Solved] SMART error (CurrentPendingSector) detected on host (SSD disk)

nimmis
Posts: 5
Joined: 2023-03-20 11:18

[Hardware] [Solved] SMART error (CurrentPendingSector) detected on host (SSD disk)

#1 Post by nimmis »

I have a problem on my Debian system (Proxmox 7.2, Debian 11) regarding a SMART error.

I have googled and read a lot about this problem, but I can't understand the impact, or whether there are any further actions I need to take to acknowledge the error (complete the reallocation?).

The question is: is this an error, or an indication that 2 sectors were reallocated to new space on the SSD?

If it's an error, how do I fix it without destroying data?
If it's not an error, how do I stop smartd from reporting it as one?

I get this message each day from the system

Code: Select all

This message was generated by the smartd daemon running on:

   host name:  ****
   DNS domain: ****.**

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], 2 Currently unreadable (pending) sectors

Device info:
INTENSO, S/N:AA000000000000001088, FW:U0803A0, 1.02 TB

smartctl -A /dev/sda gives the following:

Code: Select all

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.85-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   050    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       2
  9 Power_On_Hours          0x0032   100   100   050    Old_age   Always       -       7957
 12 Power_Cycle_Count       0x0032   100   100   050    Old_age   Always       -       32
160 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       0
161 Unknown_Attribute       0x0033   100   100   050    Pre-fail  Always       -       97
163 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       21
164 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       31453
165 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       152
166 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       1
167 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       22
168 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       5050
169 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       100
175 Program_Fail_Count_Chip 0x0032   100   100   050    Old_age   Always       -       0
176 Erase_Fail_Count_Chip   0x0032   100   100   050    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0032   100   100   050    Old_age   Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       2
181 Program_Fail_Cnt_Total  0x0032   100   100   050    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   050    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       18
194 Temperature_Celsius     0x0022   100   100   050    Old_age   Always       -       40
195 Hardware_ECC_Recovered  0x0032   100   100   050    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   050    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   050    Old_age   Always       -       0
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       97
241 Total_LBAs_Written      0x0030   100   100   050    Old_age   Offline      -       116263
242 Total_LBAs_Read         0x0030   100   100   050    Old_age   Offline      -       163128
245 Unknown_Attribute       0x0032   100   100   050    Old_age   Always       -       85464
See ID 197, which is where smartd gets its error from.
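For anyone who wants to watch these counters from a script instead of reading the table by eye, here is a minimal Python sketch. It assumes the column layout shown in the output above (raw value in the last column); the function name `parse_smart_attributes` and the `SAMPLE` text are made up for illustration and are not part of smartmontools.

```python
import re

# A few rows copied from the smartctl -A output in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       2
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0032   100   100   050    Old_age   Always       -       0
"""

# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_smart_attributes(text):
    """Return {id: (name, normalized_value, raw_value)} for each table row."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attr_id, name, value, _worst, _thresh, raw = m.groups()
            attrs[int(attr_id)] = (name, int(value), int(raw))
    return attrs

attrs = parse_smart_attributes(SAMPLE)
for attr_id in (5, 197, 198):
    name, value, raw = attrs[attr_id]
    print(f"{attr_id:>3} {name}: normalized={value} raw={raw}")
```

Lines that do not start with a numeric ID, such as the header row, are simply skipped.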


Running dd if=/dev/sda of=/dev/null completes without any error and gives the following output:

Code: Select all

2000409264+0 records in
2000409264+0 records out
1024209543168 bytes (1.0 TB, 954 GiB) copied, 4605.12 s, 222 MB/s
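The dd figures can be cross-checked with a little arithmetic. This is only a sketch; it assumes dd ran with its default 512-byte block size, which matches the logical sector size fdisk reports for this disk.

```python
records = 2000409264   # "records in" reported by dd
block_size = 512       # dd's default block size in bytes (assumed, no bs= was given)
seconds = 4605.12      # elapsed time reported by dd

total_bytes = records * block_size
gib = total_bytes / 2**30
mb_per_s = total_bytes / seconds / 1e6

print(total_bytes)      # 1024209543168, the byte count dd printed
print(round(gib))       # 954
print(round(mb_per_s))  # 222
```

The numbers agree with dd's own summary line, so the read covered the whole device in 512-byte units without a short read.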
Layout of /dev/sda:

Code: Select all

fdisk -l /dev/sda
Disk model: INTENSO
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 69617857-682C-49D5-8E64-4F0774C39FC1

Device       Start        End    Sectors   Size Type
/dev/sda1       34       2047       2014  1007K BIOS boot
/dev/sda2     2048    1050623    1048576   512M EFI System
/dev/sda3  1050624 2000409230 1999358607 953.4G Linux LVM
------------------------------------------------------------------
and lvm properties

Code: Select all

/dev/sda3 is a lvm - pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda3
  VG Name               pve
  PV Size               <953.37 GiB / not usable <1.32 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              244062
  Free PE               4095
  Allocated PE          239967
  PV UUID               v7GP0v-xw6D-N8Cc-2plH-njY6-GsA7-dL9JIG
Last edited by nimmis on 2023-03-21 10:25, edited 2 times in total.

sunrat
Administrator
Posts: 6412
Joined: 2006-08-29 09:12
Location: Melbourne, Australia
Has thanked: 116 times
Been thanked: 462 times

[Hardware] Re: [Hardware] SMART error (CurrentPendingSector) detected on host

#2 Post by sunrat »

Please use CODE tags for terminal text output.

For a drive with only 7k hours it's a bad sign. It may still work for some time, but be watchful if the count increases. The drive in my main system has 22k hours with no reallocated sectors.
Make sure all the data is backed up.
Read some of these search links and decide for yourself - https://www.startpage.com/do/dsearch?qu ... _Sector_Ct
“ computer users can be divided into 2 categories:
Those who have lost data
...and those who have not lost data YET ”
Remember to BACKUP!

nimmis
Posts: 5
Joined: 2023-03-20 11:18

Re: [Hardware] SMART error (CurrentPendingSector) detected on host

#3 Post by nimmis »

Updated the post (when approved)

steve_v
df -h | grep > 20TiB
Posts: 1400
Joined: 2012-10-06 05:31
Location: /dev/chair
Has thanked: 79 times
Been thanked: 175 times

Re: [Hardware] Re: [Hardware] SMART error (CurrentPendingSector) detected on host

#4 Post by steve_v »

nimmis wrote: 2023-03-20 11:52
is this an error or an indication that 2 sectors were reallocated to new space on the SSD disk?
It's an indication that the drive currently has 2 sectors it knows are unreliable, and it will reallocate them if the next write-read cycle at that location fails. That's how it usually works at any rate, but exact behaviour depends on the device firmware.
That's slightly different from Reallocated_Sector_Ct and Used_Rsvd_Blk_Cnt, which in your case indicate 2 other sectors were already reallocated.
nimmis wrote: 2023-03-20 11:52
if it's an error, how to fix it w/o destroying data
Run an "offline" self-test (probably), try to write to those sectors, or run something like hdrecover (read the documentation and heed the warnings). Again, exactly what will provoke the drive into making up its mind about the reliability of those sectors is up to its firmware.
In any case, the only safe way to avoid destroying data is to have a backup. Even if those sectors are still readable (or located in unused space), reallocations are, at least IMO, cause to be very suspicious of a drive's reliability. Everyone should have backups anyway.
nimmis wrote: 2023-03-20 11:52
how to stop smartd reporting it as an error
Read the smartd.conf manual, specifically WRT the '-i' parameter... But you probably don't want to ignore such things.
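If the aim is to hear about the pending count only when it grows, rather than every day, the decision itself is easy to sketch. The Python below is a hypothetical stand-alone illustration and not how smartd is implemented; `should_alert` and the sample readings are invented for the example.

```python
def should_alert(previous_raw, current_raw):
    """Alert only when the pending-sector count has grown since the last check."""
    if previous_raw is None:
        # First reading: alert if anything is pending at all.
        return current_raw > 0
    return current_raw > previous_raw

# Hypothetical daily readings of attribute 197's raw value.
readings = [2, 2, 2, 3, 3, 0]
previous = None
alerts = []
for day, raw in enumerate(readings, start=1):
    if should_alert(previous, raw):
        alerts.append(day)
        print(f"day {day}: pending sectors now {raw}")
    previous = raw
```

With these readings only day 1 and day 4 produce a message. The smartd.conf man page also describes a `-C ID+` form of the pending-sector directive that reports only increases, so it is worth checking there before writing anything custom.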

Aside, I've never heard of "Intenso" SSDs, and they appear to be suspiciously cheap... Again, back up your data.
Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.

nimmis
Posts: 5
Joined: 2023-03-20 11:18

Re: [Hardware] Re: [Hardware] SMART error (CurrentPendingSector) detected on host

#5 Post by nimmis »

steve_v wrote: 2023-03-20 13:13
nimmis wrote: 2023-03-20 11:52
is this an error or an indication that 2 sectors were reallocated to new space on the SSD disk?
It's an indication that the drive currently has 2 sectors it knows are unreliable, and it will reallocate them if the next write-read cycle at that location fails. That's how it usually works at any rate, but exact behaviour depends on the device firmware.
That's slightly different from Reallocated_Sector_Ct and Used_Rsvd_Blk_Cnt, which in your case indicate 2 other sectors were already reallocated.

nimmis wrote: 2023-03-20 11:52
if it's an error, how to fix it w/o destroying data
Run an "offline" self-test (probably), try to write to those sectors, or run something like hdrecover (read the documentation for more). Again, exactly what will provoke the drive into making up its mind about the reliability of those sectors is up to its firmware.
In any case, the only safe way to avoid destroying data is to have a backup. Even if those sectors are still readable, reallocations are (at least IMO) cause to be very suspicious of a drive's reliability. Everyone should have backups anyway.
nimmis wrote: 2023-03-20 11:52
how to stop smartd reporting it as an error
Read the smartd.conf manual, specifically WRT the '-i' parameter... But you probably don't want to ignore such things.
As I read on other forums:
Reallocated sectors on an SSD are similar to those on an HDD, but they mean slightly different things.
An SSD is expected to develop reallocated sectors; this is normal operation.
On an HDD, reallocated sectors are considered a pre-failure warning.
The problem is that a SMART self-test does not report an error (see the smartctl excerpt I left out earlier), and dd does not produce a read error either.

Code: Select all

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7958         -
# 2  Extended offline    Completed without error       00%      7646         -
And of course I don't want to ignore actual errors; I have a 14 TB local backup and 10 TB off-site. I want to know whether I need to install a new SSD immediately or whether this is just a warning.

SSDs do get blocks that stop working. The difference between a cheap SSD and a server-grade one is the number of spare blocks on the disk. Once they are used up, you will lose data. I assume

Code: Select all

232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       97
this shows how many more blocks can fail before the disk is "bad".

nimmis
Posts: 5
Joined: 2023-03-20 11:18

Re: [Hardware] Re: [Hardware] SMART error (CurrentPendingSector) detected on host

#6 Post by nimmis »

Looking at smart attributes (on this list https://www.cropel.com/library/smart-at ... -list.aspx)

These IDs can serve as a guide:

ID | Name | Description
178 | Used Reserved Block Count | On an SSD, this attribute describes the state of the reserve block pool. The value of the attribute shows the percentage of the pool remaining. The Raw value sometimes contains the actual number of used reserve blocks.
197 | Current Pending Sectors | The number of unstable sectors which are waiting to be re-tested and possibly remapped.
232 | Available Reserved Space | The attribute is used in SSDs to denote the remaining reserved space. The value counts down, typically from 100 to 0. The Raw value is vendor-specific.

Looking at those parameters on my disk gives:

Code: Select all

178 Used_Rsvd_Blk_Cnt_Chip  0x0032   100   100   050    Old_age   Always       -       2
197 Current_Pending_Sector  0x0032   100   100   050    Old_age   Always       -       2
232 Available_Reservd_Space 0x0032   100   100   050    Old_age   Always       -       97
In my mind, ID 197 is present because the system has not written to these sectors again, and I would have 97% of the reallocatable sectors left.

Am I wrong in my conclusion?
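The reading above can be written down as a small check. This is only a sketch in Python: the `Attr` class and `summarize` function are invented for illustration, and treating the raw value of ID 232 as a percentage is an assumption, since the attribute list itself says the raw value is vendor-specific.

```python
from dataclasses import dataclass

@dataclass
class Attr:
    value: int   # normalized VALUE column
    thresh: int  # THRESH column
    raw: int     # RAW_VALUE column

def summarize(used_reserved, pending, available_reserved):
    """Turn the three attributes into short human-readable notes."""
    notes = []
    if pending.raw > 0:
        notes.append(f"{pending.raw} sector(s) pending re-test or remap")
    if used_reserved.raw > 0:
        notes.append(f"{used_reserved.raw} reserve block(s) already used")
    # ASSUMPTION: this firmware reports the remaining reserve as a percentage in RAW.
    notes.append(f"about {available_reserved.raw}% of reserve space left")
    # SMART treats an attribute as failed when VALUE is at or below THRESH.
    failing = any(a.value <= a.thresh
                  for a in (used_reserved, pending, available_reserved))
    notes.append("below threshold" if failing else "all above threshold")
    return notes

# Values copied from the three rows above (IDs 178, 197, 232).
notes = summarize(Attr(100, 50, 2), Attr(100, 50, 2), Attr(100, 50, 97))
for note in notes:
    print(note)
```

None of the three normalized values is anywhere near its threshold of 50, which fits the view that this is a warning to watch rather than a failure.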

steve_v
df -h | grep > 20TiB
Posts: 1400
Joined: 2012-10-06 05:31
Location: /dev/chair
Has thanked: 79 times
Been thanked: 175 times

Re: [Hardware] Re: [Hardware] SMART error (CurrentPendingSector) detected on host

#7 Post by steve_v »

nimmis wrote: 2023-03-20 14:04
Am I wrong in my conclusion?
Not at all. That's pretty much exactly what I said earlier.

nimmis
Posts: 5
Joined: 2023-03-20 11:18

Re: [Hardware] Re: [Hardware] SMART error (CurrentPendingSector) detected on host

#8 Post by nimmis »

steve_v wrote: 2023-03-20 19:49
nimmis wrote: 2023-03-20 14:04
Am I wrong in my conclusion?
Not at all. That's pretty much exactly what I said earlier.
Sorry about not understanding that, and thank you for clarifying.

laserholle
Posts: 1
Joined: 2023-04-20 11:34

Re: [Hardware] [Solved] SMART error (CurrentPendingSector) detected on host (SSD disk)

#9 Post by laserholle »

Hello nimmis,
sorry for my bad English.
I found your thread because I searched for Intenso FW U0803A0.
I think the firmware has problems. I use about 15 SSDs from Intenso. 7 of them have this firmware, and all of those developed several problems after between 1 week and 4 months of use. The other 8 have a different firmware and are stable.

BR, holle
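To check a set of drives for this firmware string, the "Device info" line from the smartd mail in the first post can be split into fields. A minimal Python sketch follows; `parse_device_info` and `SUSPECT_FIRMWARE` are hypothetical names, and the suspect list rests only on the report in this thread, not on any vendor statement.

```python
def parse_device_info(line):
    """Split a smartd 'Device info' line into model, serial, firmware and size."""
    parts = [p.strip() for p in line.split(",")]
    info = {"model": parts[0]}
    for part in parts[1:]:
        if part.startswith("S/N:"):
            info["serial"] = part[len("S/N:"):]
        elif part.startswith("FW:"):
            info["firmware"] = part[len("FW:"):]
        else:
            info["size"] = part
    return info

# Based only on the anecdotal report in this thread.
SUSPECT_FIRMWARE = {"U0803A0"}

line = "INTENSO, S/N:AA000000000000001088, FW:U0803A0, 1.02 TB"
info = parse_device_info(line)
print(info["model"], info["firmware"], info["firmware"] in SUSPECT_FIRMWARE)
```

The sketch assumes the model name contains no commas; a drive whose model string does would need a more careful split.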
