HDD or File System issue?

Postby ghultstrand » 2018-05-07 14:38

I'm still relatively new to the Linux world, especially when it comes to diagnostics.

Last Friday I had a boot problem and my OS wouldn't load. It reported "Unexpected Inconsistency; Run fsck Manually". I ran fsck, p1 was repaired, and the OS loaded, although some apps were corrupted. When I ran smartctl on the drive, it reported the drive as fine. Now nvme0n1p2 is having an issue.


Code:
/dev/nvme0n1p2: Unexpected Inconsistency; Run fsck Manually.

[Buffer I/O Error on dev nvme0n


The line that worries me the most is:
Code:
nvme nvme0: Device not ready; aborting initialisation


Should I replace the drive? It is about 1 year old.
Is it just a cascading error in the file system?
Should I just nuke the machine and start over?
Or is there a good way to stem the problem?

Screenshots of the current error: https://1drv.ms/f/s!AtItXRNac1nKhpZmKneTiOe6xvLWVg

Code:
root@Pixy-PC:~# sudo smartctl -a /dev/nvme0
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-6-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Force MP500
Serial Number:                      17127956000123380135
Firmware Version:                   E7FM02.1
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          120,034,123,776 [120 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Fri May  4 09:14:49 2018 CDT
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x001e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     110 Celsius
Critical Comp. Temp. Threshold:     130 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.90W       -        -    0  0  0  0        0       0
 1 +     2.40W       -        -    1  1  1  1      600     600
 2 +     1.90W       -        -    2  2  2  2      600     600
 3 -   0.1100W       -        -    3  3  3  3      600     600
 4 -   0.0050W       -        -    4  4  4  4   100000  160000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        45 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    2%
Data Units Read:                    3,044,091 [1.55 TB]
Data Units Written:                 6,453,922 [3.30 TB]
Host Read Commands:                 77,758,445
Host Write Commands:                174,001,563
Controller Busy Time:               0
Power Cycles:                       34
Power On Hours:                     7,214
Unsafe Shutdowns:                   10
Media and Data Integrity Errors:    2,874
Error Information Log Entries:      2,874
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 2:               59 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0       2874     1  0x029b  0x4502      -   4294967288     1     -
  1       2873     1  0x029b  0x4502      -   4294967288     1     -
  2       2872     1  0x029b  0x4502      -   4294967288     1     -
  3       2871     1  0x029b  0x4502      -   4294967288     1     -
  4       2870     1  0x029b  0x4502      -   4294967288     1     -
  5       2869     1  0x029b  0x4502      -   4294967288     1     -
  6       2868     1  0x029b  0x4502      -   4294967288     1     -
  7       2867     1  0x029b  0x4502      -   4294967288     1     -
  8       2866     1  0x029b  0x4502      -   4294967288     1     -
  9       2865     1  0x029b  0x4502      -   4294967288     1     -
 10       2864     1  0x029b  0x4502      -   4294967288     1     -
 11       2863     1  0x029b  0x4502      -   4294967288     1     -
 12       2862     1  0x029b  0x4502      -   4294967288     1     -
 13       2861     1  0x029b  0x4502      -   4294967288     1     -
 14       2860     1  0x029b  0x4502      -   4294967288     1     -
 15       2859     1  0x029b  0x4502      -   4294967288     1     -
... (47 entries not shown)
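
Worth noting about the output above: the overall-health line says PASSED, yet the "Media and Data Integrity Errors" counter reads 2,874. As a rough sketch (the function names `media_errors` and `check_media_errors` are made up for illustration, and this only assumes the text layout shown in the output above), the counter could be pulled out and checked on its own like this:

```shell
#!/bin/sh
# Sketch only: read `smartctl -a` text on stdin and extract the
# "Media and Data Integrity Errors" counter. Function names are hypothetical.
media_errors() {
    awk -F: '/^Media and Data Integrity Errors/ {
        gsub(/[ ,]/, "", $2)   # drop padding and thousands separators
        print $2
    }'
}

# Report on the counter, independently of the PASSED/FAILED summary line.
check_media_errors() {
    count=$(media_errors)
    if [ -z "$count" ]; then
        echo "counter not found"
    elif [ "$count" -gt 0 ]; then
        echo "WARNING: $count media/data integrity errors logged"
    else
        echo "OK: no media/data integrity errors logged"
    fi
}

# Usage (needs root and smartmontools installed):
#   smartctl -a /dev/nvme0 | check_media_errors
```

Running it a few days apart and comparing the numbers would show whether the counter is still climbing. Whether a non-zero value means the drive itself is failing is a separate question that this check cannot answer.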
ghultstrand
 
Posts: 2
Joined: 2018-05-07 14:08

Re: HDD or File System issue?

Postby debiman » 2018-05-07 17:46

It does sound like a hardware issue or filesystem corruption, but 1 year is not too old.
So, you must have done something wrong...
Why is the drive called nvme0n1p2? That's not standard for a desktop installation.
debiman
 
Posts: 3064
Joined: 2013-03-12 07:18

Re: HDD or File System issue?

Postby ghultstrand » 2018-05-07 18:24

debiman wrote: why is the drive called nvme0n1p2? that's not standard for a desktop installation.


I don't know. I just assumed it is because it is an M.2 SSD. /dev/nvme0 is the root drive.
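
For reference, Linux names NVMe devices as nvme<controller>n<namespace>p<partition>, so nvme0n1p2 is partition 2 of namespace 1 on controller 0. /dev/nvme0 on its own is the controller device, and /dev/nvme0n1 is the block device that holds the partitions. A small sketch that splits such a name apart (the function name `nvme_name_parts` is hypothetical, and it only handles this simple naming pattern):

```shell
#!/bin/sh
# Sketch only: split a Linux NVMe block-device name into its parts.
# Prints nothing for names that do not follow nvme<N>n<N>[p<N>].
nvme_name_parts() {
    name=${1#/dev/}   # accept the name with or without the /dev/ prefix
    echo "$name" | sed -n -E \
        's/^nvme([0-9]+)n([0-9]+)(p([0-9]+))?$/controller=\1 namespace=\2 partition=\4/p'
}

# Usage:
#   nvme_name_parts /dev/nvme0n1p2
```

The partition field is left empty for a whole-namespace name such as nvme0n1.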
ghultstrand
 
Posts: 2
Joined: 2018-05-07 14:08

Re: HDD or File System issue?

Postby sunrat » 2018-05-08 01:28

Your screen photos show a firmware failure. What hardware do you have? If it's a recent Intel system, you may need the intel-microcode package. That may or may not be significant.
“ computer users can be divided into 2 categories:
Those who have lost data
...and those who have not lost data YET ”
Remember to BACKUP!
sunrat
 
Posts: 2495
Joined: 2006-08-29 09:12
Location: Melbourne, Australia

Re: HDD or File System issue?

Postby pendrachken » 2018-05-08 14:04

So far it's I/O errors, which usually points to a hardware issue. That doesn't necessarily mean the hardware is bad, though.


I would first check the NVMe card seating. Thermal cycling could have caused it to come unseated just enough to cause intermittent disconnections (or you could have bumped the computer, etc.), which would be consistent with your filesystem issues.


Once you have ruled that out and the issue still persists, try an older kernel that you know works / worked with that drive. If the older kernel has no problems, check the kernel bug list and file a regression report if you don't find anything reported for the issue on current kernels.
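
To compare kernels, or to see whether reseating the card changed anything, it may help to pull only the NVMe-related error lines out of the kernel log. A rough sketch (the function name `nvme_kernel_errors` and the keyword list are assumptions based on the messages quoted earlier in this thread, not an exhaustive filter):

```shell
#!/bin/sh
# Sketch only: keep kernel log lines that mention nvme and also contain
# one of a few error-like keywords. The keyword list is a guess.
nvme_kernel_errors() {
    grep -i 'nvme' | grep -iE 'error|not ready|abort|timeout'
}

# Usage (dmesg may need root on some systems):
#   dmesg | nvme_kernel_errors
#   journalctl -k -b -1 | nvme_kernel_errors   # previous boot, if the journal is persistent
```

An empty result after booting the older kernel, with the same messages coming back on the current one, would be useful evidence for a regression report.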
fortune -o
Your love life will be... interesting.
:twisted: How did it know?

The U.S. uses the metric system too, we have tenths, hundredths and thousandths of inches :-P
pendrachken
 
Posts: 1332
Joined: 2007-03-04 21:10
Location: U.S.A. - WI.

