HDD errors
-
- Posts: 8
- Joined: 2018-11-19 19:25
HDD errors
Hi
Not sure if this is the right place to request help with this, if not please direct me to a more appropriate group.
Had a small incident with my Raspberry Pi running Raspbian the other day: a power interruption led to the external HDD (ext4 fs) not mounting on boot, which put the Pi into single-user emergency mode.
After repairing the superblock by replacing it with one of the backups, the drive now mounts and the Pi boots correctly.
However, it’s displaying some funny behaviour. The files are still on the drive and all still accessible; du reports the mount point as holding 186GB but df reports only 77MB used?? I’ve seen df report higher usage than du before, due to deleted files being held open by running processes, but I’ve never seen it report a practically empty drive that has plenty of data on it.
Running fsck on the drive initially reports a file system with errors, but it gets to “Pass 5: Checking group summary information” and then just reports that fsck exited with signal 9, or simply “Killed”, though neither I nor any other user is killing the process.
Currently running badblocks on the drive, but it looks like this could take a few days given the size of the drive; so far no errors found at 54% in.
Like I said, I still have access to all the files and the data looks intact, but I worry about just carrying on using it if there’s something wrong underneath that I’m missing.
Any ideas welcome
Thanks
Re: HDD errors
it is often easier for us reading your posts if you actually post the output of the commands you have run. so we can see the actual results rather than trying to build them in our minds.
you may also find that someone may spot something else you have missed.
also post the dmesg output for when that drive is being mounted
Desktop: A320M-A PRO MAX, AMD Ryzen 5 3600, GALAX GeForce RTX™ 2060 Super EX (1-Click OC) - Sid, Win10, Arch Linux, Gentoo, Solus
Laptop: hp 250 G8 i3 11th Gen - Sid
Kodi: AMD Athlon 5150 APU w/Radeon HD 8400 - Sid
Re: HDD errors
First, you should check the SMART data. However, SMART may not report errors it is not aware of. To make it aware you need to force write. This is where badblocks -n comes in. After running it run smartctl -t long /dev/<device>. After it finishes run smartctl -a, see if there are any errors. OTOH, if the test errors out and does not finish then it is time to replace the drive.
Next layer is filesystem. There is not much point repairing it if there are bad blocks and I/O errors indeed.
Re: HDD errors
i take it the drive does not contain the operating system?
but couldn't that also have been damaged?
maybe you should check the root partition as well.
apart from that the answers to your problem (data loss through power outage) are just a few web searches away.
but we are here to help, regardless, so please do provide what was requested & answer our questions.
Re: HDD errors
i used to run raspbian (on the original pi) some years ago and this was very common when a power failure happened
my external was an lvm setup (ext4) which required a bit more work to fix. but usually once i had recovered the lvm, an e2fsck would work.
Re: HDD errors
Thanks for the responses all - hopefully I'll answer all your questions to assist.
milomak wrote:
it is often easier for us reading your posts if you actually post the output of the commands you have run. so we can see the actual results rather than trying to build them in our minds.
you may also find that someone may spot something else you have missed.
also post the dmesg output for when that drive is being mounted
Here are the original df and du commands that led me to see something was wrong
Code: Select all
pi@raspberrypi:~ $ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 29G 1.3G 26G 5% /
devtmpfs 460M 0 460M 0% /dev
tmpfs 464M 0 464M 0% /dev/shm
tmpfs 464M 12M 452M 3% /run
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 464M 0 464M 0% /sys/fs/cgroup
/dev/sda1 916G 77M 916G 1% /mnt/ext1
/dev/mmcblk0p1 43M 22M 21M 52% /boot
tmpfs 93M 0 93M 0% /run/user/1000
pi@raspberrypi:~ $ du -sh /mnt/ext1
186G /mnt/ext1
pi@raspberrypi:~ $
Output from fsck
Code: Select all
pi@raspberrypi:~ $ sudo fsck -Vt ext4 /dev/sda1
fsck from util-linux 2.29.2
[/sbin/fsck.ext4 (1) -- /mnt/ext1] fsck.ext4 /dev/sda1
e2fsck 1.43.4 (31-Jan-2017)
/dev/sda1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
fsck: Warning... fsck.ext4 for device /dev/sda1 exited with signal 9.
Output from dmesg
Code: Select all
pi@raspberrypi:~ $ sudo dmesg | grep sda1
[ 4.741198] sda: sda1
[ 5.378211] EXT4-fs (sda1): warning: mounting unchecked fs, running e2fsck is recommended
[ 5.439727] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 174.839239] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 1: bad block bitmap checksum
[ 175.588450] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 2: bad block bitmap checksum
[ 175.604480] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 3: bad block bitmap checksum
[ 175.638518] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 4: bad block bitmap checksum
[ 175.654950] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 5: bad block bitmap checksum
[ 175.672891] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 6: bad block bitmap checksum
[ 175.691135] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 7: bad block bitmap checksum
[ 175.708230] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 8: bad block bitmap checksum
[ 175.731107] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 9: bad block bitmap checksum
[ 175.748550] EXT4-fs error (device sda1): ext4_validate_block_bitmap:387: comm kworker/u8:2: bg 10: bad block bitmap checksum
[ 310.234981] EXT4-fs (sda1): error count since last fsck: 14137
[ 310.235027] EXT4-fs (sda1): initial error at time 1542582174: ext4_validate_inode_bitmap:101
[ 310.235053] EXT4-fs (sda1): last error at time 1542747678: ext4_validate_block_bitmap:387
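The "initial error at time" and "last error at time" values in the dmesg output above are Unix epoch seconds. A minimal sketch for turning them into readable dates, assuming GNU date (as shipped with Debian and Raspbian); the helper name epoch_to_utc is made up for illustration:

```shell
# Hypothetical helper: convert epoch seconds (as printed by EXT4-fs) to UTC.
# Assumes GNU date, which accepts "-d @SECONDS".
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 1542582174   # initial error (ext4_validate_inode_bitmap)
epoch_to_utc 1542747678   # last error (ext4_validate_block_bitmap)
```

Read this way, the first recorded error dates from 18 November 2018 and the most recent from 20 November 2018, so the error counter covers roughly two days of mounts and fsck attempts.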
debiman wrote:
i take it the drive does not contain the operating system?
but couldn't that also have been damaged?
maybe you should check the root partition as well.
apart from that the answers to your problem (data loss through power outage) are just a few web searches away.
but we are here to help, regardless, so please do provide what was requested & answer our questions.
The OS is on a separate flash drive, and yes, it is possible that the OS on the flash drive is also damaged. At the moment there don't "seem" to be any issues with the OS, but I will check soon regardless.
I have done quite a number of web searches on this issue but did not find anything at all that helped me, hence the post to the forum. The confusing part is that there doesn't seem to be any data loss at all that I can see; I've just pulled a random 5GB from the drive and there weren't any problems with that sample.
Segfault wrote:
First, you should check the SMART data. However, SMART may not report errors it is not aware of. To make it aware you need to force write. This is where badblocks -n comes in. After running it run smartctl -t long /dev/<device>. After it finishes run smartctl -a, see if there are any errors. OTOH, if the test errors out and does not finish then it is time to replace the drive.
Next layer is filesystem. There is not much point repairing it if there are bad blocks and I/O errors indeed.
I have just run the SMART tools and the results weren't exactly what I was expecting; it doesn't look like it can even run the tests
Code: Select all
pi@raspberrypi:~ $ sudo smartctl -t long /dev/sda1
smartctl 6.6 2016-05-31 r4324 [armv7l-linux-4.14.70-v7+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Extended Background Self Test has begun
scsiModePageOffset: response length too short, resp_len=4 offset=4 bd_len=0
Use smartctl -X to abort test
pi@raspberrypi:~ $ sudo smartctl -a /dev/sda1
smartctl 6.6 2016-05-31 r4324 [armv7l-linux-4.14.70-v7+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: Seagate
Product: Desktop
Revision: 0130
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Logical block size: 512 bytes
scsiModePageOffset: response length too short, resp_len=4 offset=4 bd_len=0
scsiModePageOffset: response length too short, resp_len=4 offset=4 bd_len=0
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
The results of badblocks are also in
Code: Select all
sudo@raspberrypi:~ $ badblocks -svn /dev/sda1
Checking for bad blocks in non-destructive read-write mode
From block 0 to 976761559
Checking for bad blocks (non-destructive read-write test)
Pass completed , 0 bad blocks found. (0/0/0 errors)
milomak wrote:
i used to run raspbian (on the original pi) some years ago and this was very common when a power failure happened
my external was an lvm setup (ext4) which required a bit more work to fix. but usually once i had recovered the lvm, an e2fsck would work.
Yes, the previous iteration of this Pi I used to run as an LVM setup and IMO it was more hassle than it was worth, but now after this I'm thinking it may just be the Pi itself.
If I've missed anything helpful please let me know; I think I've covered all the questions.
Re: HDD errors
RE: smartctl failure. It probably has to do with your USB-SATA adapter, it may not support smartctl commands. I'd say hook it up to a real SATA port for diagnostics.
Edit: Just noticed you are trying to run smartctl on partition sda1, it must be run on raw device sda.
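A minimal sketch of the partition-versus-disk point above: smartctl wants the whole-disk node, so the partition suffix has to be dropped first. The helper name parent_disk is made up for illustration and works on the device name only; it does not query the kernel.

```shell
# Hypothetical helper: derive the whole-disk node from a partition node by name.
# Covers sdXN/hdXN/vdXN, mmcblkNpM and nvmeNnMpK; anything else is echoed unchanged.
parent_disk() {
    printf '%s\n' "$1" | sed -E \
        -e 's#^(/dev/(mmcblk|nvme[0-9]+n)[0-9]+)p[0-9]+$#\1#' \
        -e 's#^(/dev/(sd|hd|vd)[a-z]+)[0-9]+$#\1#'
}

parent_disk /dev/sda1        # /dev/sda
parent_disk /dev/mmcblk0p1   # /dev/mmcblk0

# Intended use (needs root and smartmontools, so left commented out):
#   sudo smartctl -a "$(parent_disk /dev/sda1)"
```

Where util-linux is available, lsblk -no PKNAME /dev/sda1 is another way to get the parent device name without guessing from the naming scheme.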
Re: HDD errors
Segfault wrote:
RE: smartctl failure. It probably has to do with your USB-SATA adapter, it may not support smartctl commands. I'd say hook it up to a real SATA port for diagnostics.
Edit: Just noticed you are trying to run smartctl on partition sda1, it must be run on raw device sda.
Thanks, spot on! Sometimes you stare at something for so long you just miss the obvious; it's now running!
EDIT: And the results of the smart test
Code: Select all
pi@raspberrypi:~ $ sudo smartctl -a /dev/sda
smartctl 6.6 2016-05-31 r4324 [armv7l-linux-4.14.70-v7+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.12
Device Model: ST31000528AS
Serial Number: 6VP23DPQ
LU WWN Device Id: 5 000c50 01b7f7a77
Firmware Version: CC38
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Wed Nov 21 07:35:58 2018 UTC
==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/213891en
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 25) The self-test routine was aborted by
the host.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 175) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 114 099 006 Pre-fail Always - 81162197
3 Spin_Up_Time 0x0003 095 095 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 095 095 020 Old_age Always - 5978
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 4310433297
9 Power_On_Hours 0x0032 073 071 000 Old_age Always - 24166
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 122
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 210456608817
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 078 037 045 Old_age Always In_the_past 22 (Min/Max 22/44 #3586)
194 Temperature_Celsius 0x0022 022 063 000 Old_age Always - 22 (0 9 0 0 0)
195 Hardware_ECC_Recovered 0x001a 041 023 000 Old_age Always - 81162197
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 9732 (113 126 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 617330298
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 1866303093
SMART Error Log Version: 1
ATA Error Count: 1
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1 occurred at disk power-on lifetime: 22222 hours (925 days + 22 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 51 00 00 00 00 00 Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
00 00 00 00 00 00 00 ff 10d+03:35:16.783 NOP [Abort queued commands]
b0 d4 00 82 4f c2 00 00 10d+03:32:41.652 SMART EXECUTE OFF-LINE IMMEDIATE
b0 d0 01 00 4f c2 00 00 10d+03:32:41.619 SMART READ DATA
ec 00 01 00 00 00 00 00 10d+03:32:41.611 IDENTIFY DEVICE
b0 d5 01 09 4f c2 00 00 10d+03:26:12.455 SMART READ LOG
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Aborted by host 90% 24159 -
# 2 Extended captive Interrupted (host reset) 90% 22222 -
# 3 Short captive Completed without error 00% 22222 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Re: HDD errors
I thought this line was disconcerting...
Code: Select all
Error 1 occurred at disk power-on lifetime: 22222 hours (925 days + 22 hours)
When the command that caused the error occurred, the device was in an unknown state.
But that actually seems to be just a failure to read SMART data. It might be because of running the command you did above.
Looks like the drive is fine, might be an error reporting space by df. Can you access all of the data just fine?
If you are worried about this I would copy all of the data off, reformat it fresh, and copy it back. Also always have backups.
Always on Debian Testing
Re: HDD errors
Code: Select all
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Aborted by host 90% 24159 -
Re: HDD errors
brummond - long time - good to see
Ltlbkofjim wrote:
Hi
After repairing the superblock and replacing it with one of the backups the drive now mounts and the it boots correctly.
Running fsck on the drive initially reports a file system with errors but gets to “pass 5: checking group summary information” and just reports fsck exited with signal 9, or just killed, but I am not killing the process nor is any other user.
Any ideas welcome
HDD errors
If I had this issue I'd look at ext4 tools since fsck borked.
ext4
e2fsck - all the current options for ext2 3 and 4 - note read the - see also: section at the bottom of the man page.
e2image - for filesystem, superblock and inode backup
debugfs - might be able to show you more of what's going on and/or why fsck borked.
if all the disk needs is to clean the superblocks and inodes from the restore?
Edit: I see milomak suggested e2fsck back in this post.
http://forums.debian.net/viewtopic.php? ... 06#p685285
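For anyone unsure where to start with the tools listed above, a minimal read-only sketch follows. The function name inspect_ext4 and the file names in the comments are made up for illustration; it assumes e2fsprogs is installed and that the target is unmounted. Trying it on a scratch image first (see the comment lines) is safer than experimenting on the damaged disk.

```shell
# Hypothetical wrapper around real e2fsprogs tools; nothing here writes to the
# filesystem being inspected. Run as root for a real device such as /dev/sda1.
PATH=$PATH:/sbin:/usr/sbin

inspect_ext4() {
    target=$1
    # Superblock summary only (-h): state, block counts, free blocks.
    dumpe2fs -h "$target" 2>/dev/null |
        grep -E '^(Filesystem state|Block count|Free blocks|Block size):'
    # debugfs opens the filesystem read-only unless -w is given;
    # count the per-group lines that "stats" prints.
    debugfs -R 'stats' "$target" 2>/dev/null | grep -c '^ *Group'
    # -f forces a check, -n answers "no" to every question (no changes made).
    e2fsck -fn "$target"
}

# Save the critical metadata to a file before attempting any repair:
#   e2image /dev/sda1 /root/sda1-metadata.e2i
# Practice run on a scratch image:
#   truncate -s 16M /tmp/scratch.img && mke2fs -q -F -t ext4 /tmp/scratch.img
#   inspect_ext4 /tmp/scratch.img
```

Comparing "Free blocks" from dumpe2fs with what df reports is one way to see whether the odd df figure comes straight from the on-disk counters.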
In memory of Ian Ashley Murdock (1973 - 2015) founder of the Debian project.
Re: HDD errors
i'd say that the filesystem on /dev/sda1 is partially corrupt.
this just seems the most plausible explanation at this moment.
smartmontools look for other, more physical problems, they will not see a broken filesystem.
you need to fix the filesystem:
https://www.startpage.com/do/dsearch?qu ... filesystem
please have a good look at these results.
this problem is not distro-specific, most solutions should apply.
another option is to pull off the data (1% of 1TB - should be possible) - since you say it is accessible, then just reformat the whole thing.
Re: HDD errors
There is a reason why filesystem gets corrupted. Often this reason is failing hard drive. As we see the drive is not healthy, it is not passing the test. IMHO repairing the filesystem or reformatting is fixing consequences, not reasons.
Re: HDD errors
Thanks all for getting back to me
If I've missed something and you think the drive is failing for a reason I'm not seeing, please let me know; if it's necessary I would rather replace it than have it completely fail.
vbrummond wrote:
I thought this line was disconcerting...
Code: Select all
Error 1 occurred at disk power-on lifetime: 22222 hours (925 days + 22 hours)
When the command that caused the error occurred, the device was in an unknown state.
But that actually seems to be just a failure to read smart data. It might be because of running the command you did above.
Looks like the drive is fine, might be an error reporting space by df. Can you access all of the data just fine?
If you are worried about this I would copy all of the data off, reformat it fresh, and copy it back. Also always have backups.
Yes, from the few GB of data I have pulled off and tested, it seems fine (obviously I can't say 100% for the rest of the data, but it seems that way).
I don't know how df works behind the scenes; is it possible that the problem is just with what it's reporting?
I may end up copying all the data and reformatting anyway, especially if it's looking like it's going to be a difficult fix. But it's always useful to know how to fix these things rather than just restore from a backup, which I do actually have for most of the important data; it's just a bit of a pain to restore it from where it's all stored.
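On the question of how df works behind the scenes: df does not walk the files. It asks the kernel via statfs(2), and for ext4 the answer comes from the free-block counters kept in the superblock and group descriptors, whereas du adds up the blocks of every file it can reach. If those counters or the block bitmaps are wrong (as the "bad block bitmap checksum" messages suggest), the two can disagree wildly while the files themselves stay readable. A minimal sketch that shows both numbers side by side, assuming GNU stat and du; the function names are made up for illustration:

```shell
# Used space as df computes it: (total blocks - free blocks) * block size,
# taken from statfs(2) via GNU "stat -f".
fs_used_kib() {
    out=$(stat -f -c '%b %f %S' "$1") || return 1
    set -- $out
    echo $(( ($1 - $2) * $3 / 1024 ))
}

# Used space as du computes it: walk the tree (-x stays on one filesystem).
tree_used_kib() {
    du -skx "$1" 2>/dev/null | cut -f1
}

# Example (mount point taken from the df output earlier in the thread):
#   echo "statfs: $(fs_used_kib /mnt/ext1) KiB, du: $(tree_used_kib /mnt/ext1) KiB"
```

A large gap between the two on an otherwise idle filesystem points at the filesystem's own bookkeeping rather than at df.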
llivv wrote:
brummond - long time - good to see
Ltlbkofjim wrote:
Hi
After repairing the superblock and replacing it with one of the backups the drive now mounts and the it boots correctly.
Running fsck on the drive initially reports a file system with errors but gets to “pass 5: checking group summary information” and just reports fsck exited with signal 9, or just killed, but I am not killing the process nor is any other user.
Any ideas welcome
HDD errors
If I had this issue I'd look at ext4 tools since fsck borked.
ext4
e2fsck - all the current options for ext2 3 and 4 - note read the - see also: section at the bottom of the man page.
e2image - for filesystem, superblock and inode backup
debugfs - might be able to show you more of what's going on and/or why fsck borked.
if all the disk needs is to clean the superblocks and inodes from the restore?
Edit: I see milomak suggested e2fsck back in this post.
http://forums.debian.net/viewtopic.php? ... 06#p685285
Sorry, not really following what you're suggesting I should do. I have already run e2fsck above, but the process looks like it gets killed at some point for some reason. And after reading the man pages of the other two I am clueless where to even start, to be honest, although I'm more than willing to give it a go if someone can point me in the right direction.
debiman wrote:
i'd say that the filesystem on /dev/sda1 is partially corrupt.
this just seems the most plausible explanation at this moment.
smartmontools look for other, more physical problems, they will not see a broken filesystem.
you need to fix the filesystem:
https://www.startpage.com/do/dsearch?qu ... filesystem
please have a good look at these results.
this problem is not distro-specific, most solutions should apply.
another option is to pull off the data (1% of 1TB - should be possible) - since you say it is accessible, then just reformat the whole thing.
Thanks, I will have a look through these later on tonight when I get a chance to have a thorough look, but as you say I may just end up pulling the data and reformatting if I don't get much success.
Segfault wrote:
There is a reason why filesystem gets corrupted. Often this reason is failing hard drive. As we see the drive is not healthy, it is not passing the test. IMHO repairing the filesystem or reformatting is fixing consequences, not reasons.
I appreciate what you're saying, but as I understand it you only think it's failing because of the previous SMART test result, which didn't in fact say it failed; it just said it was aborted. I did a little digging and it appears it was aborted when the drive merely went to sleep after so many minutes. A small script to keep the drive alive every 60 seconds for the duration of the test gave the following result.
Code: Select all
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 24192 -
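The keep-alive script itself was not posted. A minimal sketch of that kind of loop, assuming GNU dd and that a small uncached read is enough to stop the drive or USB bridge from spinning down; the function name, defaults and read pattern are all assumptions rather than the poster's actual script:

```shell
# Hypothetical keep-alive loop; the poster's actual script was not shown.
# Usage: keep_awake DEVICE [INTERVAL_SECONDS] [ROUNDS]
keep_awake() {
    dev=$1
    interval=${2:-60}
    rounds=${3:-180}    # 180 x 60 s covers the 175-minute extended test estimate
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Read one 512-byte sector at a changing offset. O_DIRECT bypasses the
        # page cache so the request reaches the drive; fall back to a plain
        # read where O_DIRECT is not supported.
        dd if="$dev" of=/dev/null bs=512 count=1 skip="$i" iflag=direct 2>/dev/null ||
            dd if="$dev" of=/dev/null bs=512 count=1 skip="$i" 2>/dev/null ||
            return 1
        i=$((i + 1))
        sleep "$interval"
    done
}

# Intended use (needs root, so left commented out):
#   sudo smartctl -t long /dev/sda
#   keep_awake /dev/sda 60 180
```

Progress can then be checked with smartctl -a /dev/sda, as in the self-test log above.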
Re: HDD errors
Test passed is good, you may want to look at the results, too. Below are the ones to look at.
Code: Select all
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 16
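A minimal sketch for pulling just those four attributes out of smartctl's attribute table. The function name is made up for illustration; it assumes the ten-column layout shown in the smartctl -a output earlier in the thread, where the raw value is the tenth field:

```shell
# Hypothetical filter: read "smartctl -A" style output on stdin and print
# NAME=RAW_VALUE for attributes 5, 197, 198 and 199.
smart_key_attrs() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198 || $1 == 199) {
        print $2 "=" $10
    }'
}

# Intended use (needs root and smartmontools, so left commented out):
#   sudo smartctl -A /dev/sda | smart_key_attrs
```

In the smartctl -a output posted earlier in the thread, all four of these raw values are 0.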
Re: HDD errors
what is the fsck or e2fsck command you run?
Re: HDD errors
besides milomak inquiry above regarding the full commands used for fsck and e2fsck
(if they were just plain ole # fsck and # e2fsck that's ok because that is usually all a user needs most of the time )
my questions are:
is the 186GB disk one ext4 partition?
what was the badblocks command you used?
example #badblocks -svn /dev/sdb
and did it report any badblocks ?
finally
is the disk mounted when running the commands?
Re: HDD errors
When I first ran fsck on /dev/sda1 it was giving errors about the superblock and therefore would not run (I haven't got the exact wording as I wasn't recording them at the time).
And therefore I ran
either e2fsck -b 32768 /dev/sda or fsck -b 32768 /dev/sda, I can't remember which, but it ended with
Code: Select all
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Killed
But after this it then allowed me to run
fsck -Vt ext4 /dev/sda1
Which went a lot further than it did before but still ended with
Code: Select all
Pass 5: Checking group summary information
fsck: Warning... fsck.ext4 for device /dev/sda1 exited with signal 9.
e2fsck /dev/sda1 also ended the same way as fsck
In answer to your other questions
the 186GB is on a single disk which has just one partition
I used the command badblocks -svn /dev/sda1 which resulted in 0 bad blocks found (0/0/0 errors)
The drive was unmounted for all the above commands
I hope I haven't missed anything
Re: HDD errors
I'd read the man page again for both fsck and e2fsck looking specifically for exit code 9
I don't find it in the debian man pages for those commands.
I also don't see an option -b for fsck
there is an option -b for e2fsck
but there is also a warning in the fsck man page about passing options meant for specific filesystem checkers to generic fsck
saying that such options must not take arguments when running fsck, because fsck has no way to guess which options take arguments, and results may not be what's expected.
after checking the raspbian man page to confirm that the options for e2fsck -pv are p=preen v=verbose
try
Code: Select all
e2fsck -pv /dev/sda1
and post the results
Re: HDD errors
Yeah, I've read the man pages for both, and like the Debian ones, the Raspbian man pages don't mention 9; they seem to cover every other code but 9. I seem to remember someone on a different forum commenting that 9 supposedly means the process was killed externally, but I can't confirm this anywhere reputable.
I checked the man page for e2fsck and -pv is preen and verbose, so after running the command I was given this
Code: Select all
e2fsck -pv /dev/sda1
/dev/sda1 contains a file system with errors, check forced.
Killed
It did take quite a while to do this, say 5-10 mins
I also ran -b with e2fsck rather than against generic fsck for completeness, as I see what you mean about the arguments not being passed along correctly, and this is what I got
Code: Select all
e2fsck -b 32768 /dev/sda1
e2fsck 1.43.4 (31-Jan-2017)
/dev/sda1 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Killed
I have managed to pull all the data off the drive, so it would be quite trivial to just start again from fresh, but at this stage I'm quite interested to find out what happened and how to fix it, in case I have issues in the future where I can't simply pull the data
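On what "signal 9" means: it is SIGKILL, which a process cannot catch or ignore, so it is a signal number and not one of the e2fsck exit codes listed in the man page. The shell prints "Killed" (or reports exit status 137, i.e. 128 + 9) when a child dies from it. When nobody sent it by hand, one common sender is the kernel's out-of-memory killer, which is at least plausible when e2fsck checks a 1 TB filesystem on a Pi with limited RAM. That is an assumption the thread does not confirm; the kernel log would show it. A minimal sketch (the function name oom_lines is made up for illustration):

```shell
# Signal 9 is SIGKILL; a child killed by it yields exit status 128 + 9 = 137.
kill -l 9                      # prints the signal name: KILL
status=0
sh -c 'kill -9 $$' || status=$?
echo "exit status: $status"    # 137

# Hypothetical filter: keep only kernel log lines that point at the OOM killer.
oom_lines() {
    grep -iE 'out of memory|oom-killer|killed process'
}

# Intended use right after a run that ended in "Killed":
#   sudo dmesg | oom_lines
```

If matching lines naming e2fsck show up, memory is the likely culprit, and adding swap before re-running the check would be the usual first thing to try.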