Badblocks

xur17 · #1 Post by **xur17** » 2005-12-29 05:07

I had some problems with my Debian server crashing, and when I turned it back on after a crash, I got errors about the hard drive (I didn't write them down). Anyway, I ran badblocks, and it gave me some numbers:

49893184
49893192
49893193
49893194
49893195
49893196
49893197
49893198
49893199
49893200
49893201
49893202
49893203
49893204
49893205
49893206
49893207
49893208
49893209
49893210
49893211
49893212
49893213
49893214
49893215
49893216
49893217
49893218
49893219
49893220
49893221
49893222
49893223
49893224
49893225
49893226
49893227
49893228
49893229
49893230
49893231
49893232
49893233
49893234
49893235
49893236
49893237
49893238
49893239

I assume these are bad blocks? Do I have to do something to them (like block them from being used), or does the system fix this on its own? Thanks!
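
The badblocks output above is easier to read once consecutive block numbers are collapsed into ranges. A minimal Python sketch, assuming the list is exactly the 49 numbers posted (the helper name `collapse_ranges` is made up for illustration):

```python
def collapse_ranges(blocks):
    """Collapse a sorted list of block numbers into (first, last) runs."""
    ranges = []
    for block in blocks:
        if ranges and block == ranges[-1][1] + 1:
            # Extends the current run by one block.
            ranges[-1] = (ranges[-1][0], block)
        else:
            # Gap found: start a new run.
            ranges.append((block, block))
    return ranges


# The numbers reported in the post: one isolated block, then a contiguous run.
reported = [49893184] + list(range(49893192, 49893240))

for first, last in collapse_ranges(reported):
    print(f"{first}-{last} ({last - first + 1} blocks)")
```

Seen this way, the report is one isolated block plus a single run of 48 adjacent blocks, which hints at one damaged area rather than errors scattered across the disk.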

domecq · #2 Post by **domecq** » 2005-12-29 19:55

Reboot, open a terminal, run 'dmesg' and paste its output here. Additionally, paste the contents of the most recent system log file.

Domecq

xur17 · #3 Post by **xur17** » 2006-01-04 22:54

dmesg:

Linux version 2.4.27-2-k7 (horms@tabatha.lab.ultramonkey.org) (gcc version 3.3.5 (Debian 1:3.3.5-13)) #1 Wed Aug 17 11:28:09 UTC 2005
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 0000000007ff0000 (usable)
BIOS-e820: 0000000007ff0000 - 0000000007ff3000 (ACPI NVS)
BIOS-e820: 0000000007ff3000 - 0000000008000000 (ACPI data)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
127MB LOWMEM available.
On node 0 totalpages: 32752
zone(0): 4096 pages.
zone(1): 28656 pages.
zone(2): 0 pages.
ACPI: RSDP (v000 AMD750 ) @ 0x000f5c00
ACPI: RSDT (v001 AMD750 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x07ff3000
ACPI: FADT (v001 AMD750 AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x07ff3040
ACPI: DSDT (v001 AMD750 AWRDACPI 0x00001000 MSFT 0x0100000c) @ 0x00000000
Kernel command line: root=/dev/hda1 ro
Local APIC disabled by BIOS -- reenabling.
Found and enabled local APIC!
Initializing CPU#0
Detected 698.673 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1392.64 BogoMIPS
Memory: 122980k/131008k available (1187k kernel code, 7644k reserved, 452k data, 116k init, 0k highmem)
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 4096 (order: 2, 16384 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After generic, caps: 0183fbff c1c7fbff 00000000 00000000
CPU: Common caps: 0183fbff c1c7fbff 00000000 00000000
CPU: AMD Athlon(tm) Processor stepping 02
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 698.6431 MHz.
..... host bus clock speed is 199.6121 MHz.
cpu: 0, clocks: 1996121, slice: 998060
CPU0<T0:1996112,T1:998048,D:4,S:998060,C:1996121>
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
ACPI: Subsystem revision 20040326
ACPI: Interpreter disabled.
PCI: PCI BIOS revision 2.10 entry at 0xfb620, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: ACPI tables contain no PCI IRQ routing entries
PCI: Probing PCI hardware (bus 00)
PCI: Using IRQ router default [1022/7006] at 00:00.0
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
devfs: v1.12c (20020818) Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x0
Detected PS/2 Mouse Port.
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with HUB-6 MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
COMX: driver version 0.85 (C) 1995-1999 ITConsult-Pro Co. <info@itc.hu>
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
Initializing Cryptographic API
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 8192 bind 16384)
Linux IP multicast router 0.06 plus PIM-SM
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 4040 blocks [1 disk] into ram disk... done.
Freeing initrd memory: 4040k freed
VFS: Mounted root (cramfs filesystem).
Freeing unused kernel memory: 116k freed
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ide: late registration of driver.
AMD7409: IDE controller at PCI slot 00:07.1
AMD7409: chipset revision 7
AMD7409: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD7409: 00:07.1 (rev 07) UDMA66 controller
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio
hda: Maxtor 6Y060L0, ATA DISK drive
blk: queue c8829720, I/O limit 4095Mb (mask 0xffffffff)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: attached ide-disk driver.
hda: 120103200 sectors (61493 MB) w/2048KiB Cache, CHS=119150/16/63, UDMA(33)
Partition check:
/dev/ide/host0/bus0/target0/lun0: [PTBL] [7476/255/63] p1 p2 < p5 >
Journalled Block Device driver loaded
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Adding Swap: 377488k swap-space (priority -1)
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
Real Time Clock Driver v1.10f
spurious 8259A interrupt: IRQ7.
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 94M
agpgart: Detected AMD Irongate chipset
agpgart: AGP aperture is 128M @ 0xd0000000
natsemi dp8381x driver, version 1.07+LK1.0.17, Sep 27, 2002
originally by Donald Becker <becker@scyld.com>
http://www.scyld.com/network/natsemi.html
2.4.x kernel port by Jeff Garzik, Tjeerd Mulder
eth0: NatSemi DP8381[56] at 0xc8901000, 00:02:e3:0b:fb:98, IRQ 11.
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-ohci.c: USB OHCI at membase 0xc891a000, IRQ 11
usb-ohci.c: usb-00:07.4, Advanced Micro Devices [AMD] AMD-756 [Viper] USB
usb-ohci.c: AMD756 erratum 4 workaround
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 4 ports detected
hub.c: new USB device 00:07.4-2, assigned address 2
usb.c: USB device 2 (vend/prod 0x51d/0x2) is not claimed by any active driver.
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
shpchp: acpi_shpchprm:get_device PCI ROOT HID fail=0x1001
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
pciehp: acpi_pciehprm:get_device PCI ROOT HID fail=0x1001
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hiddev0: USB HID v1.10 Device [APC Back-UPS ES 350 FW:800.e6.D USB FW:e6] on usb1:2.0
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik <vojtech@suse.cz>
hid-core.c: USB HID support drivers
eth0: link up.
eth0: Setting full-duplex based on negotiated link capability.

-------------------------------------------------------

Syslog:

http://xur17.bravehost.com/syslog.txt

Thank you very much!

domecq · #4 Post by **domecq** » 2006-01-04 23:24

It seems that the bad blocks were isolated.

You can run e2fsck (if your system uses an ext2/ext3 file system) and see the result.

You could have the following:
EXIT CODE
The exit code returned by e2fsck is the sum of the following conditions:
0 - No errors
1 - File system errors corrected
2 - File system errors corrected, system should be rebooted
4 - File system errors left uncorrected
8 - Operational error
16 - Usage or syntax error
32 - E2fsck canceled by user request
128 - Shared library error

For details, enter man 8 e2fsck in a terminal.
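
Because the exit code is a sum of flag values, it can be decoded bit by bit. A short Python sketch of that idea, using only the values from the man page excerpt above; the function name and the sample status are hypothetical:

```python
# Bit values as listed in the e2fsck man page excerpt above.
E2FSCK_FLAGS = {
    1: "File system errors corrected",
    2: "File system errors corrected, system should be rebooted",
    4: "File system errors left uncorrected",
    8: "Operational error",
    16: "Usage or syntax error",
    32: "E2fsck canceled by user request",
    128: "Shared library error",
}


def decode_e2fsck_status(status):
    """Return the list of conditions encoded in an e2fsck exit status."""
    if status == 0:
        return ["No errors"]
    return [text for bit, text in E2FSCK_FLAGS.items() if status & bit]


# Hypothetical status, for illustration only: 1 + 4 = 5.
print(decode_e2fsck_status(5))
```

In a shell, the status of the last command is available as `$?` right after e2fsck finishes.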

Guest · #5 Post by **Guest** » 2006-01-04 23:58

When I run e2fsck, I get this warning:

debian-server:~# e2fsck /dev/hda1
e2fsck 1.37 (21-Mar-2005)
/dev/hda1 is mounted.

WARNING!!! Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)?

---

So was this the cause of my errors?

Is there an easy way to run this command without rebooting the machine and using a CD-ROM?

xur17 · #6 Post by **xur17** » 2006-01-05 00:13

Sorry, that last post was by me. I wasn't logged in.

domecq · #7 Post by **domecq** » 2006-01-05 00:54

That's OK. I figured that out.

You have to run that command with the filesystem unmounted, as the warning told you. The easiest way is to reboot in recovery mode or get a shell from the installation CD.

Keep a printout of the man page close at hand. And, to be on the safe side, back up your data first (nothing should be deleted; it's just a precaution).

Guest · #8 Post by **Guest** » 2006-01-05 01:25

I just checked the machine, and it looked like the IDE cable on the hard drive wasn't fully seated (I just remembered that I had to unplug it to put new RAM in). Could that have caused it? I am running badblocks again now to see how it goes.

xur17 · #9 Post by **xur17** » 2006-01-05 01:33

Never mind, it looks like I just jinxed myself. I just checked again, and I got the same bad blocks. Would any Linux CD work (like the one called rescuecd)?

domecq · #10 Post by **domecq** » 2006-01-05 02:13

The Debian installation CD would do the job. Alternatively, you can boot in recovery mode (the second option of the boot menu).

xur17 · #11 Post by **xur17** » 2006-01-07 18:13

Do you think this is a bad hard drive? I booted a Linux rescue CD and ran e2fsck -c -k /dev/hda1. It scanned, and then about 2/3 of the way through, it gave me this error:
end_request: I/O error, dev 03:01 (hda), sector 99786472
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=99786541, sector=99786472

It then repeated this for sectors 99786474, 99786476, and 99786478.

After it finished scanning, I rebooted into Debian and ran badblocks, and it still found bad blocks.

I came back to my computer later and saw this message:
spurious 8259A interrupt: IRQ7

I am not sure what is wrong. Should I get a new hard drive, and transfer the files over to it?
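
The sector numbers in the kernel errors can be compared with the badblocks output from the first post. A rough Python sketch, under two assumptions that the thread does not confirm: that the kernel's `sector` values are 512-byte sectors counted from the start of the partition, and that badblocks was run on /dev/hda1 with its default 1024-byte block size. The function name is made up for illustration.

```python
SECTOR_SIZE = 512        # bytes per sector in the kernel messages (assumed)
BADBLOCKS_BLOCK = 1024   # badblocks default block size (assumed to be in use)


def sector_to_block(sector):
    """Map a 512-byte sector number to the 1024-byte block that contains it."""
    return sector * SECTOR_SIZE // BADBLOCKS_BLOCK


# Sectors named in the e2fsck -c errors from this post.
failed_sectors = [99786472, 99786474, 99786476, 99786478]

# The contiguous run reported by badblocks in the first post.
reported_run = range(49893192, 49893240)

for sector in failed_sectors:
    block = sector_to_block(sector)
    print(sector, "->", block, "in reported run:", block in reported_run)
```

Under those assumptions, all four sectors land inside the run that badblocks reported, so both tools would be pointing at the same damaged area of the disk.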

xur17 · #12 Post by **xur17** » 2006-01-12 04:49

I just ran df on the server, and got this:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1             58735612  54662524   1089420  99% /
tmpfs                    63568         0     63568   0% /dev/shm

I definitely don't have that much space used on it. Is that a sign of drive failure?
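
The 99% figure can be reproduced from the other columns. A small Python sketch, assuming GNU df's behavior of computing Use% as used / (used + available), rounded up. The idea that the leftover gap is the space ext3 reserves for root (5% by default) is also an assumption, not something the thread confirms.

```python
# Values for /dev/hda1 from the df output above, in 1K blocks.
total = 58735612
used = 54662524
available = 1089420

# Use% relative to the space an ordinary user could ever fill, rounded up.
usable = used + available
use_percent = -(-used * 100 // usable)  # ceiling division with integers

# Space counted in the total but neither used nor available.
unaccounted = total - used - available

print(f"Use%: {use_percent}%")
print(f"Unaccounted: {unaccounted} KB ({unaccounted / total:.1%} of total)")
```

The numbers are at least internally consistent, so the output alone does not show corruption. Running something like `du -sh /*` as root would show which directories hold the space.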

domecq · #13 Post by **domecq** » 2006-01-20 00:43

Yes, it looks like you've got a bad HD.
Since you're considering a new HD, you could first try reinstalling Debian (after backing up your things). Perhaps a fresh install would deal with the bad blocks by isolating them.

xur17 · #14 Post by **xur17** » 2006-01-21 20:56

I think I am probably just going to get a new hard drive. The drive seems to be doing even worse now. I had the machine on, and then I noticed that it was constantly doing something (the hard drive indicator was on). I couldn't access any of its services (Apache, SSH...). How do I go about copying the data to a new hard drive once I purchase one (I found a 100 GB one for $30 at CompUSA tomorrow)? I have the Linux CD called rescue CD. I just don't know how to format the new hard drive and copy the data over to it. Any help would be really appreciated.

Thanks in advance.

Edit: I was looking around the rescuecd site, and it looks like there is a qtparted program. I just don't know how to start it.