disk failure when laptop is unplugged

Posted by hussainmkj » 2017-07-31 18:30

I have a Dell Precision M7510 laptop running Debian Stretch (installed August 2016)
with an Intel Pro 2500 Series 2.5" 256 GB SSD (part #SSDSC2BF256A5).
The root partition is EXT4, and the boot partition is EXT2.


Recently (after June 24, 2017), the laptop has started experiencing SSD I/O failures whenever it runs on battery,
which makes the machine unusable.

Below is the dmesg output captured while unplugging the AC adapter and then plugging it back in.

When it is plugged in, there are no disk issues whatsoever: fsck, badblocks, and smartctl all report a completely healthy disk.
The smartctl output is also appended below.

I have tried downgrading the kernel from 4.9.0-3 to 4.9.0-2, 4.7.8-1, and 4.6.4-1 with no effect, so this does not appear to be a recent kernel regression.

The failure occurs whether laptop-mode-tools is enabled or disabled.

I would appreciate any help identifying the cause of this failure and, hopefully, fixing it.
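
To tally the libata failures in a dmesg capture like the one below, a short Python sketch along these lines can help. It assumes one kernel message per line, with or without the "[ seconds ]" timestamp prefix that plain dmesg adds, and it only recognizes the three message shapes that appear in this capture ("failed command", "hard resetting link" and "NCQ disabled"). The function name summarize_ata_errors is hypothetical, not part of any tool.

```python
import re
from collections import Counter

# Optional "[   12.345678] " prefix that plain `dmesg` puts on each line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
FAILED_CMD = re.compile(r"^(ata\d+\.\d+): failed command: (.+)$")
HARD_RESET = re.compile(r"^(ata\d+): hard resetting link$")
NCQ_OFF = re.compile(r"^(ata\d+\.\d+): NCQ disabled due to excessive errors$")


def summarize_ata_errors(lines):
    """Count failed commands, link resets and NCQ shutdowns in dmesg text."""
    failed = Counter()   # command name -> number of failures
    resets = Counter()   # ATA port -> number of hard link resets
    ncq_disabled = []    # devices on which libata switched NCQ off
    for raw in lines:
        line = TIMESTAMP.sub("", raw.strip())
        m = FAILED_CMD.match(line)
        if m:
            failed[m.group(2)] += 1
            continue
        m = HARD_RESET.match(line)
        if m:
            resets[m.group(1)] += 1
            continue
        m = NCQ_OFF.match(line)
        if m:
            ncq_disabled.append(m.group(1))
    return failed, resets, ncq_disabled
```

Counted by hand, the capture below should yield five WRITE FPDMA QUEUED failures, one READ FPDMA QUEUED, one FLUSH CACHE EXT and one WRITE DMA, five hard resets of ata4, and one "NCQ disabled" event on ata4.00.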

====== dmesg ======
ACPI Error: [\_SB_.PCI0.LPCB.H_EC.CHRG] Namespace lookup failure, AE_NOT_FOUND (20160831/psargs-359)
ACPI Error: Method parse/execution failed [\PNOT] (Node ffff9282c10d31b8), AE_NOT_FOUND (20160831/psparse-543)
ACPI Error: Method parse/execution failed [\_SB.AC._PSR] (Node ffff9282c10e62d0), AE_NOT_FOUND (20160831/psparse-543)
ACPI Exception: AE_NOT_FOUND, Error reading AC Adapter state (20160831/ac-128)
ahci 0000:00:17.0: port does not support device sleep
EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,data=ordered,commit=600
ACPI Error: [\_SB_.PCI0.LPCB.H_EC.CHRG] Namespace lookup failure, AE_NOT_FOUND (20160831/psargs-359)
ACPI Error: Method parse/execution failed [\PNOT] (Node ffff9282c10d31b8), AE_NOT_FOUND (20160831/psparse-543)
ACPI Error: Method parse/execution failed [\_SB.AC._PSR] (Node ffff9282c10e62d0), AE_NOT_FOUND (20160831/psparse-543)
ACPI Exception: AE_NOT_FOUND, Error reading AC Adapter state (20160831/ac-128)
ata4.00: exception Emask 0x0 SAct 0x3000000 SErr 0x10000 action 0x6 frozen
ata4: SError: { PHYRdyChg }
ata4.00: failed command: WRITE FPDMA QUEUED
ata4.00: cmd 61/02:c0:02:08:10/00:00:00:00:00/40 tag 24 ncq dma 1024 out
         res 40/00:01:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4.00: failed command: WRITE FPDMA QUEUED
ata4.00: cmd 61/80:c8:d0:0b:dd/00:00:0d:00:00/40 tag 25 ncq dma 65536 out
         res 40/00:01:c8:ff:dd/00:00:1d:00:00/e0 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4.00: configured for UDMA/133
ahci 0000:00:17.0: port does not support device sleep
ata4: EH complete
EXT4-fs (sda2): re-mounted. Opts: block_validity,barrier,user_xattr,acl
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata4.00: failed command: FLUSH CACHE EXT
ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 1
         res 40/00:01:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4.00: configured for UDMA/133
ata4.00: retrying FLUSH 0xea Emask 0x4
ahci 0000:00:17.0: port does not support device sleep
ata4: EH complete
ata4.00: exception Emask 0x0 SAct 0xc00100 SErr 0x0 action 0x6 frozen
ata4.00: failed command: WRITE FPDMA QUEUED
ata4.00: cmd 61/c8:40:00:bb:23/00:00:0d:00:00/40 tag 8 ncq dma 102400 out
         res 40/00:01:e8:ff:dd/00:00:1d:00:00/e0 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4.00: failed command: READ FPDMA QUEUED
ata4.00: cmd 60/18:b0:88:2e:eb/00:00:0c:00:00/40 tag 22 ncq dma 12288 in
         res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4.00: failed command: WRITE FPDMA QUEUED
ata4.00: cmd 61/08:b8:50:0c:dd/00:00:0d:00:00/40 tag 23 ncq dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4.00: configured for UDMA/133
ahci 0000:00:17.0: port does not support device sleep
ata4: EH complete
NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,data=ordered,commit=0
ata2: SATA link down (SStatus 0 SControl 300)
ata5: SATA link down (SStatus 4 SControl 300)
ata4.00: NCQ disabled due to excessive errors
ata4.00: exception Emask 0x0 SAct 0x100000 SErr 0x0 action 0x6 frozen
ata4.00: failed command: WRITE FPDMA QUEUED
ata4.00: cmd 61/10:a0:58:0c:dd/01:00:0d:00:00/40 tag 20 ncq dma 139264 out
         res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4.00: configured for UDMA/133
ahci 0000:00:17.0: port does not support device sleep
ata4: EH complete
EXT4-fs (sda2): re-mounted. Opts: block_validity,barrier,user_xattr,acl
systemd[1]: apt-daily.timer: Adding 8h 23min 44.416390s random time.
ACPI Error: [\_SB_.PCI0.LPCB.H_EC.CHRG] Namespace lookup failure, AE_NOT_FOUND (20160831/psargs-359)
ACPI Error: Method parse/execution failed [\PNOT] (Node ffff9282c10d31b8), AE_NOT_FOUND (20160831/psparse-543)
ACPI Error: Method parse/execution failed [\_SB.AC._PSR] (Node ffff9282c10e62d0), AE_NOT_FOUND (20160831/psparse-543)
ACPI Exception: AE_NOT_FOUND, Error reading AC Adapter state (20160831/ac-128)
ahci 0000:00:17.0: port does not support device sleep
EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,data=ordered,commit=600
EXT4-fs (sda2): re-mounted. Opts: block_validity,barrier,user_xattr,acl
ACPI Error: Cannot release Mutex [PATM], not acquired (20160831/exmutex-393)
ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.ECDV._Q66] (Node ffff9282c10e4d20), AE_AML_MUTEX_NOT_ACQUIRED (20160831/psparse-543)
ACPI Error: [\_SB_.PCI0.LPCB.H_EC.CHRG] Namespace lookup failure, AE_NOT_FOUND (20160831/psargs-359)
ACPI Error: Method parse/execution failed [\PNOT] (Node ffff9282c10d31b8), AE_NOT_FOUND (20160831/psparse-543)
ACPI Error: Method parse/execution failed [\_SB.AC._PSR] (Node ffff9282c10e62d0), AE_NOT_FOUND (20160831/psparse-543)
ACPI Exception: AE_NOT_FOUND, Error reading AC Adapter state (20160831/ac-128)
NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,data=ordered,commit=0
ata2: SATA link down (SStatus 0 SControl 300)
ata5: SATA link down (SStatus 4 SControl 300)
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata4.00: failed command: WRITE DMA
ata4.00: cmd ca/00:30:a0:39:de/00:00:00:00:00/ed tag 4 dma 24576 out
         res 50/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4.00: configured for UDMA/133
ahci 0000:00:17.0: port does not support device sleep
ata4: EH complete
EXT4-fs (sda2): re-mounted. Opts: block_validity,barrier,user_xattr,acl
------------
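
The "cmd" lines in the log above are raw ATA taskfiles, and decoding the NCQ ones shows which tag, length and LBA each timed-out command had. The sketch below assumes libata prints the fields in the order CMD/FEATURE:COUNT:LBA_LOW:LBA_MID:LBA_HIGH/HOB_FEATURE:HOB_COUNT:HOB_LBA_LOW:HOB_LBA_MID:HOB_LBA_HIGH/DEVICE, and follows the ATA NCQ convention that READ/WRITE FPDMA QUEUED carry the sector count in FEATURE and the queue tag in bits 7:3 of COUNT. decode_fpdma is a hypothetical helper; the 512-byte sector size comes from the smartctl output below.

```python
import re

# Assumed libata field order for a failed command's taskfile:
#   cmd CMD/FEAT:COUNT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_COUNT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV
_BYTE = r"([0-9a-f]{2})"
CMD_RE = re.compile(
    r"cmd " + _BYTE + "/"
    + ":".join([_BYTE] * 5) + "/"
    + ":".join([_BYTE] * 5) + "/"
    + _BYTE,
    re.IGNORECASE,
)

FPDMA = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}
SECTOR_SIZE = 512  # logical sector size smartctl reports for this SSD


def decode_fpdma(line):
    """Decode the NCQ taskfile on a libata 'cmd' line (hypothetical helper)."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no taskfile on this line")
    (cmd, feat, count, lbal, lbam, lbah,
     hob_feat, _hob_count, hob_lbal, hob_lbam, hob_lbah, _dev) = (
        int(g, 16) for g in m.groups())
    if cmd not in FPDMA:
        raise ValueError("not a READ/WRITE FPDMA QUEUED command: 0x%02x" % cmd)
    # NCQ commands carry the transfer length in FEATURE (0 means 65536 sectors)...
    sectors = ((hob_feat << 8) | feat) or 0x10000
    # ...and the queue tag in bits 7:3 of COUNT.
    tag = count >> 3
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    return {"command": FPDMA[cmd], "tag": tag, "sectors": sectors,
            "bytes": sectors * SECTOR_SIZE, "lba": lba}
```

For example, the first failure in the log (tag 24) decodes to a 2-sector write at LBA 0x100802. For each NCQ line in this capture, the decoded tag and byte count match the "tag N ncq dma BYTES" values the kernel printed at the end of the same line, which is a useful check on the assumed field layout. Non-NCQ commands such as FLUSH CACHE EXT (0xea) or WRITE DMA (0xca) are rejected, since they encode length and tag differently.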

====== smartctl -a ======
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.7.0-1-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSC2BF256A5 SATA 256GB
Serial Number: CVTR612502VS256HGN
LU WWN Device Id: 5 5cd2e4 14cb45b8c
Firmware Version: LB1i
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Jul 31 15:14:27 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
        was never started.
        Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
        without error or no self-test has ever
        been run.
Total time to complete Offline
data collection: ( 6251) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
        Auto Offline data collection on/off support.
        Suspend Offline collection upon new
        command.
        Offline surface scan supported.
        Self-test supported.
        Conveyance Self-test supported.
        Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
        power-saving mode.
        Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
        General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 58) minutes.
Conveyance self-test routine
recommended polling time: ( 4) minutes.
SCT capabilities: (0x0025) SCT Status supported.
        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3238
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1480
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       19
175 Program_Fail_Count_Chip 0x0033   100   100   010    Pre-fail  Always       -       0
176 Erase_Fail_Count_Chip   0x0033   100   100   010    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0033   097   100   005    Pre-fail  Always       -       4253248
178 Used_Rsvd_Blk_Cnt_Chip  0x0033   100   100   010    Pre-fail  Always       -       0
179 Used_Rsvd_Blk_Cnt_Tot   0x0033   100   100   010    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   100   100   010    Pre-fail  Always       -       4544
181 Program_Fail_Cnt_Total  0x0033   100   100   010    Pre-fail  Always       -       0
182 Erase_Fail_Count_Total  0x0033   100   100   010    Pre-fail  Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       321
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   035   100   000    Old_age   Always       -       35 (Min/Max 9/43)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       19
195 Hardware_ECC_Recovered  0x0032   120   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
225 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       59964
226 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       65535
227 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       24
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       65535
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       59964
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       19866
249 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       14862

SMART Error Log not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Interrupted (host reset)      00%      3238         -
# 2  Short offline       Completed without error       00%      3221         -
# 3  Short offline       Completed without error       00%      3221         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
------
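
To re-check the "completely healthy" reading mechanically rather than by eye, the attribute table above can be parsed and screened. This is a sketch with an illustrative rule set, not smartctl's own logic: it flags any attribute whose normalized VALUE has fallen to a non-zero THRESH (the condition smartctl reports as failing), plus a hypothetical watch list of counters (IDs 5, 187 and 199) whose raw values would normally stay at zero. The function names and the watch list are assumptions chosen for illustration.

```python
def parse_smart_attributes(text):
    """Parse attribute rows of `smartctl -a` output, keyed by attribute ID.

    Rows are recognized by shape: a numeric ID, a hex FLAG in the third
    column and at least ten whitespace-separated fields.  IDs are used as
    keys because several rows share names such as "Unknown_Attribute".
    """
    attrs = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 10 and f[0].isdigit() and f[2].startswith("0x"):
            attrs[int(f[0])] = {
                "name": f[1],
                "value": int(f[3]),
                "worst": int(f[4]),
                "thresh": int(f[5]),
                "when_failed": f[8],
                "raw": f[9],  # first token only, e.g. "35" of "35 (Min/Max 9/43)"
            }
    return attrs


# Hypothetical watch list: Reallocated_Sector_Ct, Reported_Uncorrect,
# UDMA_CRC_Error_Count.  Adjust for other drives and vendors.
ZERO_RAW = {5, 187, 199}


def flag_problems(attrs):
    """Return human-readable warnings; an empty list means nothing tripped."""
    problems = []
    for attr_id in sorted(attrs):
        a = attrs[attr_id]
        # A zero threshold means "never fails", so only non-zero ones count.
        if a["thresh"] and a["value"] <= a["thresh"]:
            problems.append("%d %s: value %d <= threshold %d"
                            % (attr_id, a["name"], a["value"], a["thresh"]))
        if attr_id in ZERO_RAW and a["raw"] != "0":
            problems.append("%d %s: raw count is %s"
                            % (attr_id, a["name"], a["raw"]))
    return problems
```

Applied to the table above, flag_problems should return an empty list, which agrees with the PASSED verdict; in particular, UDMA_CRC_Error_Count (199) showed no interface CRC errors when this output was captured.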

Note that I unplugged the adapter during the most recent self-test (#1 in the log above), which presumably explains its "Interrupted (host reset)" status.