HDD errors

Postby Ltlbkofjim » 2018-11-19 19:30

Hi

Not sure if this is the right place to request help with this, if not please direct me to a more appropriate group.

Had a small incident with my Raspberry Pi running Raspbian the other day: a power interruption left the external HDD (ext4) unable to mount on boot, which dropped the Pi into single-user emergency mode.
After repairing the superblock by replacing it with one of the backups, the drive now mounts and the Pi boots correctly.
However it’s displaying some funny behaviour. The files are still on the drive and they are all still accessible, du reports the mount point being 186GB but df reports the mount point being only 77MB in size?? - I’ve seen df reporting a higher usage than du before due to deleted files being used by running processes but never seen it reporting a practically empty drive that has plenty of data on.
Running fsck on the drive initially reports a file system with errors but gets to “pass 5: checking group summary information” and just reports fsck exited with signal 9, or just killed, but I am not killing the process nor is any other user.
Currently running badblocks on the drive, but it looks like this could take a few days given the size of the drive; no errors found so far at 54%.
Like I said, I still have access to all the files and the data looks intact, but I worry about just carrying on using it if there's something wrong underneath that I'm missing.

Any ideas welcome

Thanks
Ltlbkofjim
 
Posts: 8
Joined: 2018-11-19 19:25

Re: HDD errors

Postby milomak » 2018-11-20 17:32

it is often easier for us reading your posts if you actually post the output of the commands you have run. so we can see the actual results rather than trying to build them in our minds.

you may also find that someone may spot something else you have missed.

also post the dmesg output for when that drive is being mounted
Desktop: iMac Late-2015 27" 5K Retina (17,1 - 3.3GHz) - MacOS and Windows 10 (Bootcamp)/ Debian Sid (External SSD)
Laptop: Lenovo ideapad Y700 [nVidia Optimus] (64-bit) - Debian Sid, Win10,
Kodi Box: AMD Athlon 5150 APU w/Radeon HD 8400 - Debian Sid
milomak
 
Posts: 1855
Joined: 2009-06-09 22:20

Re: HDD errors

Postby Segfault » 2018-11-20 17:46

First, you should check the SMART data. However, SMART may not report errors it is not aware of. To make it aware you need to force write. This is where badblocks -n comes in. After running it run smartctl -t long /dev/<device>. After it finishes run smartctl -a, see if there are any errors. OTOH, if the test errors out and does not finish then it is time to replace the drive.
Next layer is filesystem. There is not much point repairing it if there are bad blocks and I/O errors indeed.
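
As a rough sketch of the order described above, the function below only prints the commands instead of running them. The name smart_plan and the device names are placeholders for illustration, and the umount step is an added assumption: badblocks refuses to run its non-destructive read-write test on a mounted filesystem unless forced.

```shell
# Print the diagnostic sequence for a disk and one of its partitions.
# Nothing is executed; review the output before running any of it as root.
smart_plan() {
    disk=$1   # whole disk, e.g. /dev/sdX
    part=$2   # partition holding the filesystem, e.g. /dev/sdX1
    # The read-write scan needs the filesystem unmounted.
    echo "umount $part"
    # -n: non-destructive read-write test, -s: show progress, -v: verbose.
    echo "badblocks -nsv $part"
    # Start the extended self-test; it runs inside the drive in the background.
    echo "smartctl -t long $disk"
    # After the test has finished, read the attributes and the self-test log.
    echo "smartctl -a $disk"
}
```

Calling smart_plan /dev/sdX /dev/sdX1 lists the four steps in order. Note that the smartctl steps take the whole disk while badblocks takes the partition.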
Segfault
 
Posts: 811
Joined: 2005-09-24 12:24

Re: HDD errors

Postby debiman » 2018-11-20 18:23

i take it the drive does not contain the operating system?
but couldn't that also have been damaged?
maybe you should check the root partition as well.

apart from that the answers to your problem (data loss through power outage) are just a few web searches away.
but we are here to help, regardless, so please do provide what was requested & answer our questions.
debiman
 
Posts: 3064
Joined: 2013-03-12 07:18

Re: HDD errors

Postby milomak » 2018-11-20 18:37

i used to run raspbian (on the original pi) some years ago and this was very common when a power failure happened

my external was an lvm setup (ext4) which required a bit more work to fix. but usually once i had recovered the lvm, an e2fsck would work.
milomak
 
Posts: 1855
Joined: 2009-06-09 22:20

Re: HDD errors

Postby Ltlbkofjim » 2018-11-20 21:06

Thanks for the responses all - hopefully I'll answer all your questions.

milomak wrote:it is often easier for us reading your posts if you actually post the output of the commands you have run. so we can see the actual results rather than trying to build them in our minds.

you may also find that someone may spot something else you have missed.

also post the dmesg output for when that drive is being mounted

Here are the original df and du commands that led me to see something was wrong
Code: Select all
pi@raspberrypi:~ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        29G  1.3G   26G   5% /
devtmpfs        460M     0  460M   0% /dev
tmpfs           464M     0  464M   0% /dev/shm
tmpfs           464M   12M  452M   3% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           464M     0  464M   0% /sys/fs/cgroup
/dev/sda1       916G   77M  916G   1% /mnt/ext1
/dev/mmcblk0p1   43M   22M   21M  52% /boot
tmpfs            93M     0   93M   0% /run/user/1000
pi@raspberrypi:~ $ du -sh /mnt/ext1
186G   /mnt/ext1
pi@raspberrypi:~ $


Output from fsck
Code: Select all
pi@raspberrypi:~ $ sudo fsck -Vt ext4 /dev/sda1
fsck from util-linux 2.29.2
[/sbin/fsck.ext4 (1) -- /mnt/ext1] fsck.ext4 /dev/sda1
e2fsck 1.43.4 (31-Jan-2017)
/dev/sda1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
fsck: Warning... fsck.ext4 for device /dev/sda1 exited with signal 9.


Output from dmesg
Code: Select all
pi@raspberrypi:~ $ sudo dmesg | grep sda1
[    4.741198]  sda: sda1
[    5.378211] EXT4-fs (sda1): warning: mounting unchecked fs, running e2fsck is recommended
[    5.439727] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[  174.839239] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 1: bad block bitmap checksum
[  175.588450] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 2: bad block bitmap checksum
[  175.604480] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 3: bad block bitmap checksum
[  175.638518] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 4: bad block bitmap checksum
[  175.654950] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 5: bad block bitmap checksum
[  175.672891] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 6: bad block bitmap checksum
[  175.691135] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 7: bad block bitmap checksum
[  175.708230] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 8: bad block bitmap checksum
[  175.731107] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 9: bad block bitmap checksum
[  175.748550] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 10: bad block bitmap checksum
[  310.234981] EXT4-fs (sda1): error count since last fsck: 14137
[  310.235027] EXT4-fs (sda1): initial error at time 1542582174: ext4_validate_inode_bitmap:101
[  310.235053] EXT4-fs (sda1): last error at time 1542747678: ext4_validate_block_bitmap:387



debiman wrote:i take it the drive does not contain the operating system?
but couldn't that also have been damaged?
maybe you should check the root partition as well.

apart from that the answers to your problem (data loss through power outage) are just a few web searches away.
but we are here to help, regardless, so please do provide what was requested & answer our questions.


The OS is on a separate flash drive. It is possible that it was damaged as well, and although there don't "seem" to be any issues with the OS at the moment, I will check it soon regardless.
I have done quite a number of web searches on this issue but did not find anything that helped, hence the post to the forum. The confusing part is that there doesn't seem to be any data loss that I can see; I've just pulled a random 5 GB from the drive and there weren't any problems with that sample.

Segfault wrote:First, you should check the SMART data. However, SMART may not report errors it is not aware of. To make it aware you need to force write. This is where badblocks -n comes in. After running it run smartctl -t long /dev/<device>. After it finishes run smartctl -a, see if there are any errors. OTOH, if the test errors out and does not finish then it is time to replace the drive.
Next layer is filesystem. There is not much point repairing it if there are bad blocks and I/O errors indeed.


I have just run the SMART tools and the results weren't what I was expecting; it doesn't look like it can even run the tests.

Code: Select all
pi@raspberrypi:~ $ sudo smartctl -t long /dev/sda1
smartctl 6.6 2016-05-31 r4324 [armv7l-linux-4.14.70-v7+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Extended Background Self Test has begun
scsiModePageOffset: response length too short, resp_len=4 offset=4 bd_len=0
Use smartctl -X to abort test
pi@raspberrypi:~ $ sudo smartctl -a /dev/sda1
smartctl 6.6 2016-05-31 r4324 [armv7l-linux-4.14.70-v7+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               Seagate
Product:              Desktop
Revision:             0130
User Capacity:        1,000,204,886,016 bytes [1.00 TB]
Logical block size:   512 bytes
scsiModePageOffset: response length too short, resp_len=4 offset=4 bd_len=0
scsiModePageOffset: response length too short, resp_len=4 offset=4 bd_len=0
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.


The badblocks results are also in
Code: Select all
pi@raspberrypi:~ $ sudo badblocks -svn /dev/sda1
Checking for bad blocks in non-destructive read-write mode
From block 0 to 976761559
Checking for bad blocks (non-destructive read-write test)
Pass completed , 0 bad blocks found. (0/0/0 errors)


milomak wrote:i used to run raspbian (on the original pi) some years ago and this was very common when a power failure happened

my external was an lvm setup (ext4) which required a bit more work to fix. but usually once i had recovered the lvm, an e2fsck would work.


Yes, the previous iteration of this Pi ran an LVM setup and IMO it was more hassle than it was worth, but after this I'm thinking it may just be the Pi itself.

If I've missed anything helpful please let me know; I think I've covered all the questions.
Ltlbkofjim
 
Posts: 8
Joined: 2018-11-19 19:25

Re: HDD errors

Postby Segfault » 2018-11-20 23:31

RE: smartctl failure. It probably has to do with your USB-SATA adapter, it may not support smartctl commands. I'd say hook it up to a real SATA port for diagnostics.

Edit: Just noticed you are trying to run smartctl on partition sda1, it must be run on raw device sda.
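
To avoid the partition/disk mix-up in scripts, the whole-disk name can be derived from the partition name. This is a minimal string-only sketch; whole_disk is a made-up helper, and it only knows the common naming schemes (sdXN, mmcblkNpM, nvmeNnMpK), so anything else is an untested assumption.

```shell
# Print the whole-disk device for a partition path, using name patterns only.
whole_disk() {
    case "$1" in
        # Disks whose name ends in a digit put a "p" before the partition number.
        /dev/mmcblk*p[0-9]*|/dev/nvme*p[0-9]*) echo "${1%p[0-9]*}" ;;
        # Already a whole mmcblk/nvme disk: return it unchanged.
        /dev/mmcblk*|/dev/nvme*) echo "$1" ;;
        # sdX-style names: drop the trailing partition digits, if any.
        *) echo "$1" | sed 's/[0-9]*$//' ;;
    esac
}
```

For the drive in this thread, whole_disk /dev/sda1 gives /dev/sda, which is what smartctl wants. On a live system, lsblk -no PKNAME /dev/sda1 should report the parent device name without guessing from the string.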
Segfault
 
Posts: 811
Joined: 2005-09-24 12:24

Re: HDD errors

Postby Ltlbkofjim » 2018-11-20 23:42

Segfault wrote:RE: smartctl failure. It probably has to do with your USB-SATA adapter, it may not support smartctl commands. I'd say hook it up to a real SATA port for diagnostics.

Edit: Just noticed you are trying to run smartctl on partition sda1, it must be run on raw device sda.


Thanks, spot on! Sometimes you stare at something for so long you just miss the obvious. It's now running!

EDIT: And the results of the smart test
Code: Select all
pi@raspberrypi:~ $ sudo smartctl -a /dev/sda
smartctl 6.6 2016-05-31 r4324 [armv7l-linux-4.14.70-v7+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST31000528AS
Serial Number:    6VP23DPQ
LU WWN Device Id: 5 000c50 01b7f7a77
Firmware Version: CC38
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Wed Nov 21 07:35:58 2018 UTC

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/213891en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
               was completed without error.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (  25)   The self-test routine was aborted by
               the host.
Total time to complete Offline
data collection:       (  600) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     ( 175) minutes.
Conveyance self-test routine
recommended polling time:     (   2) minutes.
SCT capabilities:           (0x103f)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       81162197
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   095   095   020    Old_age   Always       -       5978
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       4310433297
  9 Power_On_Hours          0x0032   073   071   000    Old_age   Always       -       24166
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       122
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       210456608817
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   078   037   045    Old_age   Always   In_the_past 22 (Min/Max 22/44 #3586)
194 Temperature_Celsius     0x0022   022   063   000    Old_age   Always       -       22 (0 9 0 0 0)
195 Hardware_ECC_Recovered  0x001a   041   023   000    Old_age   Always       -       81162197
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       9732 (113 126 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       617330298
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1866303093

SMART Error Log Version: 1
ATA Error Count: 1
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 22222 hours (925 days + 22 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 00 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  00 00 00 00 00 00 00 ff  10d+03:35:16.783  NOP [Abort queued commands]
  b0 d4 00 82 4f c2 00 00  10d+03:32:41.652  SMART EXECUTE OFF-LINE IMMEDIATE
  b0 d0 01 00 4f c2 00 00  10d+03:32:41.619  SMART READ DATA
  ec 00 01 00 00 00 00 00  10d+03:32:41.611  IDENTIFY DEVICE
  b0 d5 01 09 4f c2 00 00  10d+03:26:12.455  SMART READ LOG

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%     24159         -
# 2  Extended captive    Interrupted (host reset)      90%     22222         -
# 3  Short captive       Completed without error       00%     22222         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Ltlbkofjim
 
Posts: 8
Joined: 2018-11-19 19:25

Re: HDD errors

Postby vbrummond » 2018-11-21 15:35

I thought this line was disconcerting...
Code: Select all
Error 1 occurred at disk power-on lifetime: 22222 hours (925 days + 22 hours)
  When the command that caused the error occurred, the device was in an unknown state.


But that actually seems to be just a failure to read smart data. It might be because of running the command you did above.

Looks like the drive is fine, might be an error reporting space by df. Can you access all of the data just fine?

If you are worried about this I would copy all of the data off, reformat it fresh, and copy it back. Also always have backups.
System: Retina 5K iMac, 27-inch, Late 2015 - Intel i5-6600 3.3ghz, 8gb RAM, AMD Radeon R9 M395 2048 MB
OS: Mac OS 10.12
vbrummond
 
Posts: 4422
Joined: 2010-03-02 01:42

Re: HDD errors

Postby Segfault » 2018-11-21 15:56

Code: Select all
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%     24159         -

The test failed. You can't say the drive is fine until the test passes and you can see the actual results.
Segfault
 
Posts: 811
Joined: 2005-09-24 12:24

Re: HDD errors

Postby llivv » 2018-11-21 16:20

Ltlbkofjim wrote:Hi

After repairing the superblock by replacing it with one of the backups, the drive now mounts and the Pi boots correctly.

Running fsck on the drive initially reports a file system with errors but gets to “pass 5: checking group summary information” and just reports fsck exited with signal 9, or just killed, but I am not killing the process nor is any other user.

Any ideas welcome

brummond - long time - good to see

HDD errors
If I had this issue I'd look at ext4 tools since fsck borked.
ext4
e2fsck - all the current options for ext2 3 and 4 - note read the - see also: section at the bottom of the man page.
e2image - for filesystem, superblock and inode backup
debugfs - might be able to show you more of what's going on and/or why fsck borked.

if all the disk needs is to clean the superblocks and inodes from the restore?

Edit: I see milomak suggested e2fsck back in this post. :wink:
http://forums.debian.net/viewtopic.php?f=10&t=139106#p685285
In memory of Ian Ashley Murdock (1973 - 2015) founder of the Debian project.
llivv
 
Posts: 5704
Joined: 2007-02-14 18:10
Location: cold storage

Re: HDD errors

Postby debiman » 2018-11-22 06:08

i'd say that the filesystem on /dev/sda1 is partially corrupt.
this just seems the most plausible explanation at this moment.
smartmontools look for other, more physical problems, they will not see a broken filesystem.

you need to fix the filesystem:
https://www.startpage.com/do/dsearch?qu ... filesystem
please have a good look at these results.
this problem is not distro-specific, most solutions should apply.

another option is to pull off the data (1% of 1TB - should be possible) - since you say it is accessible, then just reformat the whole thing.
debiman
 
Posts: 3064
Joined: 2013-03-12 07:18

Re: HDD errors

Postby Segfault » 2018-11-22 13:58

There is a reason why filesystem gets corrupted. Often this reason is failing hard drive. As we see the drive is not healthy, it is not passing the test. IMHO repairing the filesystem or reformatting is fixing consequences, not reasons.
Segfault
 
Posts: 811
Joined: 2005-09-24 12:24

Re: HDD errors

Postby Ltlbkofjim » 2018-11-22 18:39

Thanks all for getting back to me

vbrummond wrote:I thought this line was disconcerting...
Code: Select all
Error 1 occurred at disk power-on lifetime: 22222 hours (925 days + 22 hours)
  When the command that caused the error occurred, the device was in an unknown state.


But that actually seems to be just a failure to read smart data. It might be because of running the command you did above.

Looks like the drive is fine, might be an error reporting space by df. Can you access all of the data just fine?

If you are worried about this I would copy all of the data off, reformat it fresh, and copy it back. Also always have backups.


Yes, from the few GB of data I have pulled off and tested, it seems fine (obviously I can't say 100% for the rest of the data, but it seems that way).
I don't know how df works behind the scenes; is it possible that the problem is just in what it's reporting?
I may end up copying all the data and reformatting anyway, especially if it's looking like a difficult fix. But it's always useful to know how to fix these things rather than just restore from a backup, which I do actually have for most of the important data; it's just a bit of a pain to restore it from where it's all stored.
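
On how df works: it does not look at files at all. It asks the kernel for the filesystem's own totals (statfs), which ext4 answers from its free-block accounting, while du walks the directory tree and adds up the blocks of every file it finds. With the "bad block bitmap checksum" errors in the dmesg output, wrong accounting producing a near-empty df while du still sees 186G is at least plausible, though nothing here proves it. A minimal sketch of the two calculations, assuming GNU coreutils stat and du; the function names are made up for illustration:

```shell
# Used space the way df derives it: (total blocks - free blocks) * block size.
fs_used_bytes() {
    # %b = total blocks, %f = free blocks, %S = fundamental block size
    set -- $(stat -f -c '%b %f %S' "$1")
    echo $(( ($1 - $2) * $3 ))
}

# Used space the way du derives it: sum of the blocks allocated to each file.
tree_used_bytes() {
    # -s: one total, -x: stay on this filesystem, -B1: report in bytes
    du -sxB1 "$1" 2>/dev/null | cut -f1
}
```

Comparing fs_used_bytes /mnt/ext1 with tree_used_bytes /mnt/ext1 should show the same gap as the df and du output above. On a healthy filesystem the two are close, with df usually a little higher because of metadata.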

llivv wrote:
Ltlbkofjim wrote:Hi

After repairing the superblock by replacing it with one of the backups, the drive now mounts and the Pi boots correctly.

Running fsck on the drive initially reports a file system with errors but gets to “pass 5: checking group summary information” and just reports fsck exited with signal 9, or just killed, but I am not killing the process nor is any other user.

Any ideas welcome

brummond - long time - good to see

HDD errors
If I had this issue I'd look at ext4 tools since fsck borked.
ext4
e2fsck - all the current options for ext2 3 and 4 - note read the - see also: section at the bottom of the man page.
e2image - for filesystem, superblock and inode backup
debugfs - might be able to show you more of what's going on and/or why fsck borked.

if all the disk needs is to clean the superblocks and inodes from the restore?

Edit: I see milomak suggested e2fsck back in this post. :wink:
http://forums.debian.net/viewtopic.php?f=10&t=139106#p685285


Sorry, I'm not really following what you're suggesting I should do. I have already run e2fsck above, but the process gets killed at some point for some reason. And after reading the man pages of the other two I am clueless where to even start, to be honest - although I'm more than willing to give it a go if someone can point me in the right direction.
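
One possible explanation for the signal 9, offered only as an assumption since nothing in the thread confirms it: when nobody sends the kill, a SIGKILL often comes from the kernel's out-of-memory killer, and e2fsck on a 1 TB filesystem can need a lot of RAM for a Pi. A small sketch for checking the kernel log; oom_lines is a hypothetical helper name.

```shell
# Read kernel log text on stdin and print lines that point to an OOM kill.
oom_lines() {
    grep -iE 'out of memory|oom-killer|oom_reaper'
}
```

On a live system this would be used as sudo dmesg | oom_lines right after fsck dies. If fsck.ext4 shows up in those lines, adding swap for the duration of the check, or the [scratch_files] section described in e2fsck.conf(5), may be worth trying.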

debiman wrote:i'd say that the filesystem on /dev/sda1 is partially corrupt.
this just seems the most plausible explanation at this moment.
smartmontools look for other, more physical problems, they will not see a broken filesystem.

you need to fix the filesystem:
https://www.startpage.com/do/dsearch?qu ... filesystem
please have a good look at these results.
this problem is not distro-specific, most solutions should apply.

another option is to pull off the data (1% of 1TB - should be possible) - since you say it is accessible, then just reformat the whole thing.


Thanks I will have a look through these later on tonight when I get a chance to have a thorough look, but as you say I may just end up pulling the data and reformatting if I don't get much success.

Segfault wrote:There is a reason why filesystem gets corrupted. Often this reason is failing hard drive. As we see the drive is not healthy, it is not passing the test. IMHO repairing the filesystem or reformatting is fixing consequences, not reasons.


I appreciate what you're saying, but as I understand it you only think it's failing because of the previous SMART test result, which didn't in fact say it failed; it just said it was aborted. I did a little digging and it appears it was aborted when the drive merely went to sleep after so many minutes. A small script to keep the drive awake every 60 seconds for the duration of the test produced the following result.

Code: Select all
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     24192         -
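
The keep-alive script itself was not posted. A minimal sketch of what such a script could look like, assuming that a small uncached read is enough to stop the enclosure from spinning the drive down; keep_awake and its defaults are made up for illustration.

```shell
# Read one 4 KiB block from the device every $interval seconds, $count times.
keep_awake() {
    dev=$1
    interval=${2:-60}   # seconds between reads
    count=${3:-180}     # 180 reads at 60 s covers the 175 minute extended test
    i=0
    while [ "$i" -lt "$count" ]; do
        # iflag=direct bypasses the page cache so the read reaches the drive;
        # skip="$i" moves to a new block each time.
        dd if="$dev" of=/dev/null bs=4096 count=1 skip="$i" iflag=direct 2>/dev/null \
            || echo "read from $dev failed" >&2
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Started as sudo sh -c with keep_awake /dev/sda in another terminal just after smartctl -t long /dev/sda, it would run for about three hours. Whether background reads are safe during a self-test depends on the drive, so treat this as a guess at the approach rather than the script actually used.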


If I've missed something and you think the drive is failing for another reason, please let me know; if it's necessary I would rather replace it than have it completely fail.
Ltlbkofjim
 
Posts: 8
Joined: 2018-11-19 19:25

Re: HDD errors

Postby Segfault » 2018-11-22 18:45

Test passed is good, you may want to look at the results, too. Below are the ones to look at.
Code: Select all
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       16
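
To pull just these four rows out of a long report, a small awk filter is enough. This is a sketch that assumes the ten-column attribute table printed by smartctl -A; key_attrs is a made-up name.

```shell
# Read `smartctl -A` output on stdin; print the name and raw value of
# attributes 5, 197, 198 and 199.
key_attrs() {
    # Column 1 is the attribute ID, column 2 the name, column 10 the raw value.
    awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198 || $1 == 199) { print $2, $10 }'
}
```

Typical use would be sudo smartctl -A /dev/sda | key_attrs. In the smartctl -a output posted earlier in the thread, all four raw values are 0.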
Segfault
 
Posts: 811
Joined: 2005-09-24 12:24
