HDD errors

Re: HDD errors

Postby milomak » 2018-11-22 22:30

What fsck or e2fsck command did you run?
Desktop: iMac Late-2015 27" 5K Retina (17,1 - 3.3GHz) - MacOS and Windows 10 (Bootcamp)/ Debian Sid (External SSD)
Laptop: Lenovo ideapad Y700 [nVidia Optimus] (64-bit) - Debian Sid, Win10,
Kodi Box: AMD Athlon 5150 APU w/Radeon HD 8400 - Debian Sid

Re: HDD errors

Postby llivv » 2018-11-23 10:28

besides milomak's inquiry above about the full commands used for fsck and e2fsck
(if they were just plain old # fsck and # e2fsck, that's OK, because that is usually all a user needs)

my questions are:
is the 186GB disk one ext4 partition?

what was the badblocks command you used?
example: # badblocks -svn /dev/sdb

and did it report any bad blocks?

finally
is the disk mounted when running the commands?
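
A quick way to answer the mounted question from a script, before running fsck or badblocks, is to look the device up in /proc/mounts. This is only a sketch: the function name is_mounted and its optional second argument are made up for illustration, and it assumes the device is listed under the same path you pass in (on some systems the root filesystem shows up as /dev/root instead).

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds (exit status 0) if DEVICE is listed as a mount source.
# MOUNTS_FILE defaults to /proc/mounts; the second argument exists only
# so the function can be tried against a sample file.
is_mounted() (
    dev=$1
    mounts=${2:-/proc/mounts}
    # Field 1 of each line in /proc/mounts is the mounted device.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
)
```

For example, `is_mounted /dev/sda1 && echo "unmount it first"` prints the warning only when the partition is currently mounted.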
In memory of Ian Ashley Murdock (1973 - 2015) founder of the Debian project.

Re: HDD errors

Postby Ltlbkofjim » 2018-11-23 19:10

When I first ran fsck on /dev/sda1 it gave errors about the superblock and would not run (I don't have the exact wording as I wasn't recording the output at the time).

So I ran
either e2fsck -b 32768 /dev/sda or fsck -b 32768 /dev/sda, I can't remember which, but it ended with
Code: Select all
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Killed


But after this it then allowed me to run
fsck -Vt ext4 /dev/sda1

Which went a lot further than it did before but still ended with
Code: Select all
Pass 5: Checking group summary information
fsck: Warning... fsck.ext4 for device /dev/sda1 exited with signal 9.


e2fsck /dev/sda1 also ended the same way as fsck

In answer to your other questions
the 186GB is on a single disk which has just one partition
I used the command badblocks -svn /dev/sda1 which resulted in 0 bad blocks found (0/0/0 errors)
The drive was unmounted for all the above commands

I hope I haven't missed anything

Re: HDD errors

Postby llivv » 2018-11-24 19:26

I'd read the man pages for both fsck and e2fsck again, looking specifically for exit code 9.
I don't find it in the Debian man pages for those commands.
I also don't see a -b option for fsck;
there is a -b option for e2fsck.
But there is also a warning in the fsck man page about passing options meant for a specific filesystem checker to generic fsck:
it says such options must not take arguments when running fsck, because fsck has no way to guess which options take arguments, and the results may not be what's expected.

after checking the raspbian man page to see if the options for e2fsck -pv are p=preen v=verbose
try
Code: Select all
e2fsck -pv /dev/sda1
and post the results

Re: HDD errors

Postby Ltlbkofjim » 2018-11-24 20:48

Yeah, I've read the man pages for both, and like the Debian ones, the Raspbian man pages don't mention 9; they seem to list every other value but 9. I seem to remember someone on a different forum commenting that 9 supposedly means the process was killed externally, but I can't confirm this anywhere reputable.

I checked the man page for e2fsck and -pv was preen and verbose, so after running the command I was given this

Code: Select all
e2fsck -pv /dev/sda1
/dev/sda1 contains a file system with errors, check forced.
Killed


It did take quite a while to do this, say 5-10mins

I also ran -b with e2fsck rather than against generic fsck for completeness as I see what you mean about the arguments not being passed along correctly, and this is what I got

Code: Select all
e2fsck -b 32768 /dev/sda1
e2fsck 1.43.4 (31-Jan-2017)
/dev/sda1 was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Killed


I have managed to pull all the data off the drive, so it would be quite trivial to just start again from fresh, but at this stage I'm interested in finding out what happened and how to fix it, in case I have issues in the future where I can't simply pull the data off.

Re: HDD errors

Postby milomak » 2018-11-24 21:48

is there a dmesg or journalctl message?

i feel you when you want to understand a problem rather than run away from it. but sometimes practically that's what you have to do.
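
When a checker dies with a bare "Killed", the kernel log is the first place to look for an out-of-memory kill. A minimal sketch of a filter for that follows; oom_events is a made-up name, and the patterns are simply the phrases the kernel commonly uses for these events, so other kernel versions may word them differently.

```shell
# oom_events: read kernel log text on stdin and print only the lines
# that record an out-of-memory kill.
oom_events() {
    grep -E 'invoked oom-killer|oom_reaper: reaped process|Out of memory'
}

# Typical use (either source works on a systemd-based system; dmesg may
# need root):
#   dmesg | oom_events
#   journalctl -k | oom_events
```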

Re: HDD errors

Postby llivv » 2018-11-25 01:24

Again, read the man page to see what you're doing before doing it.
I've seen A LOT of e2fsck options used together that don't make much sense in data recovery posts

I'll show the results I get on a clean ext4 partition

see if forcing the check with the -f option helps
Code: Select all
me@c10:/# /sbin/e2fsck -vptf /dev/sdb10

27415 inodes used (2.81%, out of 977280)
68 non-contiguous files (0.2%)
33 non-contiguous directories (0.1%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 25275/24
625046 blocks used (16.00%, out of 3905795)
0 bad blocks
1 large file

22026 regular files
3272 directories
12 character device files
25 block device files
2 fifos
21 links
2069 symbolic links (2067 fast symbolic links)
0 sockets
------------
27427 files
Memory used: 876k/0k (192k/685k), time: 1.76/ 0.20/ 0.03
I/O read: 34MB, write: 1MB, rate: 19.33MB/s
-----------------------------------------------------------------------------------------

if the command above gets killed at the same place as it did with the backup superblock at -b 32768,
get the other backup superblock locations (the -n below makes mke2fs only print what it would do, without creating a filesystem)
Code: Select all
me@c10:/# /sbin/mke2fs -n /dev/sdb10

mke2fs 1.44.4 (18-Aug-2018)
/dev/sdb10 contains a ext4 file system
last mounted on / on Sat Nov 24 18:51:03 2018
Proceed anyway? (y,N) y
Creating filesystem with 3905795 4k blocks and 977280 inodes
Filesystem UUID: abcdefgh-ijkl-mnop-qrstuvwxyzAB
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208
---------------------------------------------------------------------------------------------------------

and try restoring from each backup superblock one at a time

if a backup superblock restore is successful
run
Code: Select all
e2fsck -v /dev/sda1

right after the restore
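
For anyone wondering where the numbers in the mke2fs -n list come from: with the sparse_super feature, ext filesystems keep superblock backups in block group 1 and in groups that are powers of 3, 5 and 7. The sketch below reproduces the list from the block count. The function name backup_superblocks is made up, and it assumes sparse_super, a block size above 1 KiB (so the first data block is 0) and 32768 blocks per group unless told otherwise.

```shell
# backup_superblocks TOTAL_BLOCKS [BLOCKS_PER_GROUP]
# Prints one backup superblock location per line, in ascending order.
backup_superblocks() (
    total=$1
    bpg=${2:-32768}
    # Number of block groups, rounding up for a partial last group.
    groups=$(( (total + bpg - 1) / bpg ))
    {
        if [ "$groups" -gt 1 ]; then echo 1; fi
        for base in 3 5 7; do
            g=$base
            while [ "$g" -lt "$groups" ]; do
                echo "$g"
                g=$(( g * base ))
            done
        done
    } | sort -n | while read -r g; do
        # A backup sits at the first block of its group.
        echo $(( g * bpg ))
    done
)
```

Running `backup_superblocks 3905795` gives the same nine locations as the mke2fs output above, from 32768 up to 2654208. The real block count and blocks per group for a given partition are shown by `dumpe2fs -h`.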

regards
edit:
Code: Select all
me@b10:~$ df -hlT
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs  978M     0  978M   0% /dev
tmpfs          tmpfs     198M  660K  197M   1% /run
/dev/sdc10     xfs        13G   11G  1.6G  88% /
tmpfs          tmpfs     5.0M  4.0K  5.0M   1% /run/lock
tmpfs          tmpfs     1.1G   73M  999M   7% /run/shm
/dev/sdc11     jfs        13G  5.7G  6.4G  47% /home/me/11
/dev/sdb14     xfs        14G  1.9G   13G  14% /home/me/14
/dev/sdb10     ext4       15G  2.1G   12G  15% /home/me/10

Code: Select all
me@b10:~$ du -h 10
[...]
du: cannot read directory '10/var/cache/ldconfig': Permission denied
4.0K    10/var/cache/ldconfig
836K    10/var/cache/fontconfig
4.0K    10/var/cache/apt/archives/partial
468M    10/var/cache/apt/archives
516M    10/var/cache/apt
36K     10/var/cache/dictionaries-common
522M    10/var/cache
610M    10/var
19M     10/boot
du: cannot read directory '10/root': Permission denied
4.0K    10/root
2.0G    10

Re: HDD errors

Postby debiman » 2018-11-25 10:15

man e2fsck wrote: EXIT CODE
The exit code returned by e2fsck is the sum of the following conditions:
0 - No errors
1 - File system errors corrected
2 - File system errors corrected, system should be rebooted
4 - File system errors left uncorrected
8 - Operational error
16 - Usage or syntax error
32 - E2fsck canceled by user request
128 - Shared library error

so 9 would be:

File system errors corrected
plus
Operational error

that's somewhat inconclusive.
i'd try to run a read-only fsck and see what sort of return code you get from that.

PS: you do know that you cannot repair a currently mounted filesystem? in fact you should always run fsck on an unmounted filesystem, i think.
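
One thing worth separating here: the fsck output earlier in the thread said "exited with signal 9", and a signal number is a different thing from the exit code table above. Signal 9 is SIGKILL, which is what a process receives when something else kills it, and a shell reports that as status 128 + 9 = 137. The sketch below tells the two cases apart. describe_status is a made-up helper name, and treating every status above 128 as a signal is an assumption based on the usual shell convention.

```shell
# describe_status STATUS
# Explains the value of $? after running e2fsck from a shell.
describe_status() (
    status=$1
    if [ "$status" -gt 128 ]; then
        # Shell convention: 128 + N means "terminated by signal N".
        echo "killed by signal $(( status - 128 ))"
    elif [ "$status" -eq 0 ]; then
        echo "no errors"
    else
        # The e2fsck exit code is a sum of flags, so test each bit.
        for bit in 1 2 4 8 16 32 128; do
            [ $(( status & bit )) -ne 0 ] || continue
            case $bit in
                1)   echo "file system errors corrected" ;;
                2)   echo "file system errors corrected, system should be rebooted" ;;
                4)   echo "file system errors left uncorrected" ;;
                8)   echo "operational error" ;;
                16)  echo "usage or syntax error" ;;
                32)  echo "e2fsck canceled by user request" ;;
                128) echo "shared library error" ;;
            esac
        done
    fi
)

# Typical use:
#   e2fsck -n /dev/sda1; describe_status $?
```

A status of 137 comes out as "killed by signal 9", while a genuine exit code of 9 would print the two lines for 1 and 8.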

Re: HDD errors

Postby Ltlbkofjim » 2018-11-25 17:42

Thanks llivv, your post was very informative and made perfect sense. I did give it a go, but after trying about 8 different superblock locations, each failing with a "Killed" message, I put the rest on the back burner for a while as there are a lot!!

I did also try forcing the check with the -f flag, with no joy, as it still gave me an exit 9 unfortunately.

Which brings me on to debiman's message about trying read-only mode: I tried e2fsck with the -n option (I presume this is what you meant?) and instead of exiting with an exit code of 9, it exited just saying "Killed" at the end, just like it does when I specify a superblock location.
(Just for clarity, yes, all the fsck/e2fsck runs have been on an unmounted fs)

However, it does look like there were some kernel messages implying that fsck is in fact running out of memory and is therefore being killed
Code: Select all
pi@raspberrypi:~ $ sudo cat /var/log/messages | grep 18:08:14
Nov 25 18:08:14 raspberrypi kernel: [76436.728192] kthreadd invoked oom-killer: gfp_mask=0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null),  order=1, oom_score_adj=0
Nov 25 18:08:14 raspberrypi kernel: [76436.728214] kthreadd cpuset=/ mems_allowed=0
Nov 25 18:08:14 raspberrypi kernel: [76436.728235] CPU: 2 PID: 2 Comm: kthreadd Tainted: G         C      4.14.70-v7+ #1144
Nov 25 18:08:14 raspberrypi kernel: [76436.728239] Hardware name: BCM2835
Nov 25 18:08:14 raspberrypi kernel: [76436.728268] [<8010ffd8>] (unwind_backtrace) from [<8010c240>] (show_stack+0x20/0x24)
Nov 25 18:08:14 raspberrypi kernel: [76436.728281] [<8010c240>] (show_stack) from [<80787b24>] (dump_stack+0xd4/0x118)
Nov 25 18:08:14 raspberrypi kernel: [76436.728296] [<80787b24>] (dump_stack) from [<80224564>] (dump_header+0xac/0x208)
Nov 25 18:08:14 raspberrypi kernel: [76436.728309] [<80224564>] (dump_header) from [<802238cc>] (oom_kill_process+0x478/0x584)
Nov 25 18:08:14 raspberrypi kernel: [76436.728321] [<802238cc>] (oom_kill_process) from [<8022422c>] (out_of_memory+0x124/0x334)
Nov 25 18:08:14 raspberrypi kernel: [76436.728334] [<8022422c>] (out_of_memory) from [<80229ce8>] (__alloc_pages_nodemask+0x1060/0x11d8)
Nov 25 18:08:14 raspberrypi kernel: [76436.728350] [<80229ce8>] (__alloc_pages_nodemask) from [<8011b114>] (copy_process.part.5+0xec/0x1858)
Nov 25 18:08:14 raspberrypi kernel: [76436.728363] [<8011b114>] (copy_process.part.5) from [<8011ca10>] (_do_fork+0xc8/0x408)
Nov 25 18:08:14 raspberrypi kernel: [76436.728376] [<8011ca10>] (_do_fork) from [<8011cdc0>] (kernel_thread+0x40/0x48)
Nov 25 18:08:14 raspberrypi kernel: [76436.728389] [<8011cdc0>] (kernel_thread) from [<8013eb94>] (kthreadd+0x1e0/0x268)
Nov 25 18:08:14 raspberrypi kernel: [76436.728403] [<8013eb94>] (kthreadd) from [<8010810c>] (ret_from_fork+0x14/0x28)
Nov 25 18:08:14 raspberrypi kernel: [76436.728408] Mem-Info:
Nov 25 18:08:14 raspberrypi kernel: [76436.728427] active_anon:112944 inactive_anon:112956 isolated_anon:0
Nov 25 18:08:14 raspberrypi kernel: [76436.728427]  active_file:83 inactive_file:192 isolated_file:0
Nov 25 18:08:14 raspberrypi kernel: [76436.728427]  unevictable:0 dirty:0 writeback:0 unstable:0
Nov 25 18:08:14 raspberrypi kernel: [76436.728427]  slab_reclaimable:1550 slab_unreclaimable:2225
Nov 25 18:08:14 raspberrypi kernel: [76436.728427]  mapped:578 shmem:493 pagetables:676 bounce:0
Nov 25 18:08:14 raspberrypi kernel: [76436.728427]  free:4412 free_pcp:0 free_cma:61
Nov 25 18:08:14 raspberrypi kernel: [76436.728439] Node 0 active_anon:451776kB inactive_anon:451824kB active_file:332kB inactive_file:768kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2312kB dirty:0kB writeback:0kB shmem:1972kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Nov 25 18:08:14 raspberrypi kernel: [76436.728456] Normal free:17648kB min:16384kB low:20480kB high:24576kB active_anon:451776kB inactive_anon:451824kB active_file:388kB inactive_file:916kB unevictable:0kB writepending:0kB present:970752kB managed:949448kB mlocked:0kB kernel_stack:848kB pagetables:2704kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:244kB
Nov 25 18:08:14 raspberrypi kernel: [76436.728459] lowmem_reserve[]: 0 0
Nov 25 18:08:14 raspberrypi kernel: [76436.728471] Normal: 495*4kB (UMEC) 263*8kB (UMEHC) 148*16kB (UMEHC) 36*32kB (UMEHC) 43*64kB (UMEH) 27*128kB (UMEH) 6*256kB (UEH) 1*512kB (H) 0*1024kB 1*2048kB (H) 0*4096kB = 17908kB
Nov 25 18:08:14 raspberrypi kernel: [76436.728535] 5120 total pagecache pages
Nov 25 18:08:14 raspberrypi kernel: [76436.728540] 4320 pages in swap cache
Nov 25 18:08:14 raspberrypi kernel: [76436.728545] Swap cache stats: add 395010, delete 390707, find 36559/86180
Nov 25 18:08:14 raspberrypi kernel: [76436.728548] Free swap  = 0kB
Nov 25 18:08:14 raspberrypi kernel: [76436.728551] Total swap = 102396kB
Nov 25 18:08:14 raspberrypi kernel: [76436.728555] 242688 pages RAM
Nov 25 18:08:14 raspberrypi kernel: [76436.728558] 0 pages HighMem/MovableOnly
Nov 25 18:08:14 raspberrypi kernel: [76436.728561] 5326 pages reserved
Nov 25 18:08:14 raspberrypi kernel: [76436.728565] 2048 pages cma reserved
Nov 25 18:08:14 raspberrypi kernel: [76436.728569] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Nov 25 18:08:14 raspberrypi kernel: [76436.728589] [  237]   100   237     4320       14       9       0       95             0 systemd-timesyn
Nov 25 18:08:14 raspberrypi kernel: [76436.728597] [  280]     0   280     1324       16       6       0       40             0 cron
Nov 25 18:08:14 raspberrypi kernel: [76436.728606] [  281]     0   281     5937      102       9       0      127             0 rsyslogd
Nov 25 18:08:14 raspberrypi kernel: [76436.728614] [  282]     0   282     1402       12       8       0      143             0 smartd
Nov 25 18:08:14 raspberrypi kernel: [76436.728622] [  289] 65534   289     1324        4       7       0       56             0 thd
Nov 25 18:08:14 raspberrypi kernel: [76436.728631] [  290]   105   290     1628       32       7       0       70          -900 dbus-daemon
Nov 25 18:08:14 raspberrypi kernel: [76436.728640] [  313]     0   313     1853       28       8       0       80             0 systemd-logind
Nov 25 18:08:14 raspberrypi kernel: [76436.728649] [  314]   108   314     1634       62       8       0       59             0 avahi-daemon
Nov 25 18:08:14 raspberrypi kernel: [76436.728658] [  319]   108   319     1601        0       7       0       77             0 avahi-daemon
Nov 25 18:08:14 raspberrypi kernel: [76436.728667] [  390]     0   390      721       21       6       0       62             0 dhcpcd
Nov 25 18:08:14 raspberrypi kernel: [76436.728675] [  397]     0   397      458       16       5       0       10             0 minissdpd
Nov 25 18:08:14 raspberrypi kernel: [76436.728684] [  409]     0   409     1049        0       6       0       35             0 agetty
Nov 25 18:08:14 raspberrypi kernel: [76436.728692] [  411]     0   411      993        0       7       0       34             0 agetty
Nov 25 18:08:14 raspberrypi kernel: [76436.728701] [  496]     0   496     2552        0       8       0      152         -1000 sshd
Nov 25 18:08:14 raspberrypi kernel: [76436.728709] [  713]   110   713     2698        0       9       0      151             0 exim4
Nov 25 18:08:14 raspberrypi kernel: [76436.728717] [  718]     0   718    11561     4358      25       0      819             0 transmission-rs
Nov 25 18:08:14 raspberrypi kernel: [76436.728727] [16565]     0 16565     2882        2       9       0      182             0 sshd
Nov 25 18:08:14 raspberrypi kernel: [76436.728736] [16570]  1000 16570     2414       17       8       0      153             0 systemd
Nov 25 18:08:14 raspberrypi kernel: [76436.728744] [16573]  1000 16573     2820        0       9       0      307             0 (sd-pam)
Nov 25 18:08:14 raspberrypi kernel: [76436.728752] [16580]  1000 16580     2882       26       9       0      164             0 sshd
Nov 25 18:08:14 raspberrypi kernel: [76436.728761] [16583]  1000 16583     1472       13       6       0      254             0 bash
Nov 25 18:08:14 raspberrypi kernel: [76436.728771] [18662]     0 18662     3198      530       9       0       59             0 systemd-journal
Nov 25 18:08:14 raspberrypi kernel: [76436.728779] [18927]     0 18927     1807        2       7       0       92             0 sudo
Nov 25 18:08:14 raspberrypi kernel: [76436.728788] [18931]     0 18931   234406   216196     462       0    17047             0 e2fsck
Nov 25 18:08:14 raspberrypi kernel: [76436.728797] [19068]     0 19068      450       15       5       0        0             0 dhcpcd-run-hook
Nov 25 18:08:14 raspberrypi kernel: [76436.728805] [19069]     0 19069      457       15       4       0        0             0 modprobe
Nov 25 18:08:14 raspberrypi kernel: [76436.728814] [19072]     0 19072      895       17       4       0        0             0 systemd-cgroups
Nov 25 18:08:14 raspberrypi kernel: [76437.039107] oom_reaper: reaped process 18931 (e2fsck), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
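
To put numbers on the process table in the log above: the rss and swapents columns are counted in pages, so they need multiplying by the page size. A rough sketch follows, assuming 4 KiB pages (consistent with 242688 pages RAM matching present:970752kB in the same log); oom_usage is a made-up name.

```shell
# oom_usage NAME
# Reads an OOM report on stdin and prints resident and swapped-out
# memory for processes called NAME. Columns are counted from the end of
# the line because the "[ pid ]" column may or may not contain spaces.
# Assumption: rss and swapents are in 4 KiB pages.
oom_usage() {
    awk -v name="$1" 'NF > 7 && $NF == name && $(NF-1) ~ /^-?[0-9]+$/ {
        printf "%s rss=%d MiB swapped=%d MiB\n", $NF, $(NF-5) * 4 / 1024, $(NF-2) * 4 / 1024
    }'
}

# Typical use:
#   grep 18:08:14 /var/log/messages | oom_usage e2fsck
```

For the e2fsck row above this works out to roughly 844 MiB resident plus 66 MiB swapped out, on a board with about 948 MiB of RAM and about 100 MiB of swap, all of it in use (Free swap = 0kB).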


I am hoping that I can somehow find a way to get fsck to use a swap file or something other than RAM (specifying scratch files in /etc/e2fsck.conf seems a possibility if I can get it to work) to solve this, as running out of memory would explain why fsck seems to fail at the same spot every time
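
For reference, the scratch files idea corresponds to the [scratch_files] stanza described in the e2fsck.conf man page, where a directory relation tells e2fsck to keep some of its bookkeeping on disk instead of in memory. The sketch below only prints such a stanza; scratch_stanza is a made-up helper and /var/cache/e2fsck is just an example path. Whether this would have been enough on this Pi is not known from the thread, it makes the check much slower, it may depend on how the e2fsprogs package was built, and the directory must exist, be writable and sit on a different filesystem from the one being checked.

```shell
# scratch_stanza DIR
# Prints an e2fsck.conf stanza that points e2fsck at DIR for scratch files.
scratch_stanza() {
    printf '[scratch_files]\ndirectory = %s\n' "$1"
}

# Typical use (appends to the config rather than overwriting it):
#   sudo mkdir -p /var/cache/e2fsck
#   scratch_stanza /var/cache/e2fsck | sudo tee -a /etc/e2fsck.conf
```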


EDIT: sorry, I missed the question about dmesg and journalctl, and no, there was nothing of note immediately after running the commands that hasn't already been posted
EDIT AGAIN: sorry milomak, it turns out that because I can't type properly there was in fact loads in both journalctl and dmesg about e2fsck being out of memory. If only I'd spotted it a week ago!!

Re: HDD errors

Postby llivv » 2018-11-25 19:31

is it possible to plug the usb disk into another linux machine and check it from there?
Then see if it mounts without errors on the pi again.

When you've had your fill of trying to make the pi do what you want it to,
why not see what the pi can do the way it was built?

ie make a few different sized partitions on the disk and see how big a partition can be checked without running out of memory.
embrace the curve

Re: HDD errors

Postby milomak » 2018-11-26 19:07

llivv wrote: is it possible to plug the usb disk into another linux and check it from there?
Then see if it mounts without errors on the pi, again

this has jogged my memory and this is what i used to have to do when i come to think about it

Re: HDD errors

Postby Ltlbkofjim » 2018-11-27 21:00

Yep, thanks guys, that one did the trick: I launched an Ubuntu VM on my Mac, plugged the disk in, ran e2fsck -pv, and waited for what felt like an eternity with no signs of progress until all of a sudden I was greeted with success!! (although it didn't explicitly say it fixed anything)
Plugged it back into the pi and ran e2fsck, which reported no errors, and now df and du both match up.
Going to keep an eye on it just in case it is failing, but it looks all good at the moment.
Also, when I have some time I will do what you said and try a few different partition sizes on the pi to see where the break point is, and post back here for anyone who comes across this thread in the future.

Thank you to everyone for all their input