machine crashes after upgrade

#1 Post by karen »

A year ago I put testing-sarge on a new server, as Woody was just too old. All security updates over the last year were applied with only the security line in sources.list. Everything worked fine, including backups. The machine was backed up with a tar command to an unpartitioned USB external hard drive at /dev/sdb (similar to backing up to tape).
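
A tape-style backup of this kind can be sketched roughly as follows. This is only an illustration under assumptions: the post does not give the exact tar options used, and the helper names `backup_to` and `list_backup` are made up for the example.

```shell
#!/bin/sh
# Hypothetical helpers; the names and options are illustrative and are not
# taken from the original post.

# backup_to TARGET DIR
# Writes DIR as one tar stream directly onto TARGET. On the server described
# above TARGET would be the whole unpartitioned disk (/dev/sdb); a regular
# file behaves the same way and is safer for a dry run.
backup_to() {
    backup_target=$1
    backup_dir=$2
    # -C changes into DIR first, so member names are stored relative to it.
    tar -cf "$backup_target" -C "$backup_dir" .
}

# list_backup TARGET
# Reads the archive back and prints the member names, a cheap check that
# the stream on the target is still a readable tar archive.
list_backup() {
    tar -tf "$1"
}
```

For a dry run, `backup_to /tmp/test.tar /etc` followed by `list_backup /tmp/test.tar` exercises the same path without touching a disk. Pointing the target at a device overwrites whatever is on it.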

This January I upgraded to stable-sarge. Because I was going from the old testing-sarge to stable-sarge, I didn't use aptitude dist-upgrade. Instead, I replaced the commented-out testing lines from the original install with uncommented sarge lines in sources.list, left the security line in, and ran aptitude update followed by aptitude upgrade. I then installed the 2.6.8-2-686-smp kernel-image from stable-sarge.
Both that kernel-image and the original testing-sarge one (2.6.8-1-686-smp) now crash during backups. A backup may run for anywhere from 25 minutes to an hour before the machine crashes. Trying different USB hard drives has made no difference.
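
One way to double-check the sources.list change described above is to list which suites are still active in the file. The sketch below assumes the classic one-line format (`deb URI suite components`), and `active_suites` is a hypothetical helper name.

```shell
#!/bin/sh
# active_suites FILE
# Prints each distinct suite named on an uncommented deb or deb-src line.
# Assumes the classic "deb URI suite component..." layout, where the suite
# is the third whitespace-separated field. Commented-out lines start with
# "#" and are therefore skipped.
active_suites() {
    awk '$1 == "deb" || $1 == "deb-src" { print $3 }' "$1" | sort -u
}
```

Run as `active_suites /etc/apt/sources.list`. Any line that still names `testing` would show up in the output.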

Any help diagnosing this problem and resolving it would be most appreciated. Some info appears below. Other info can be provided if needed.

Thanks in advance. I have run out of ideas.
Karen


The call trace in the most recent messages displayed on the console includes:
scan_async
ehci_work
ehci_irq
usb_hcd_irq
handle_IRQ
do_IRQ
common_interrupt
default_idle
default_idle
cpu_idle
printk
print_cpu_info

Earlier traces also had the following before default_idle:
ehci_watchdog
run_timer_softirq
do_softirq
smp_apic_timer_interrupt
apic_timer_interrupt

Code and final messages are always the same:
Code : 89 50 04 89 02 c7 41 04 00 02 20 00 c7 46 38 00 01 10 00 8b

<0> Kernel Panic: fatal exception in interrupt
in interrupt handler - not syncing
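
The trace runs through the USB host controller's interrupt handling. A low-risk diagnostic, offered as a sketch and not as a fix, is to look at which IRQ the controller driver is on and what shares it. The helper name `irq_lines_for` is hypothetical, and `ehci_hcd` is used only because that module appears in the lsmod output for this machine.

```shell
#!/bin/sh
# irq_lines_for DRIVER [FILE]
# Prints the lines of an interrupt table (default: /proc/interrupts) that
# mention DRIVER, so the IRQ number and any devices sharing that line are
# visible at a glance.
irq_lines_for() {
    grep -F -- "$1" "${2:-/proc/interrupts}"
}
```

For example, `irq_lines_for ehci_hcd` on the affected machine would show whether the USB 2.0 controller shares its interrupt with other devices.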

The packages upgraded were:
===============================================================================
[HOLD] lilo
[UPGRADE] binutils 2.15-5 -> 2.15-6
[UPGRADE] cpp-3.3 1:3.3.5-12 -> 1:3.3.5-13
[UPGRADE] dash 0.5.2-4 -> 0.5.2-5
[UPGRADE] dictionaries-common 0.25.9 -> 0.25.12
[UPGRADE] dpkg 1.10.27 -> 1.10.28
[UPGRADE] dpkg-dev 1.10.27 -> 1.10.28
[UPGRADE] dselect 1.10.27 -> 1.10.28
[UPGRADE] e2fslibs 1.37-2 -> 1.37-2sarge1
[UPGRADE] e2fsprogs 1.37-2 -> 1.37-2sarge1
[UPGRADE] g++-3.3 1:3.3.5-12 -> 1:3.3.5-13
[UPGRADE] gcc-3.3 1:3.3.5-12 -> 1:3.3.5-13
[UPGRADE] gcc-3.3-base 1:3.3.5-12 -> 1:3.3.5-13
[UPGRADE] gdb 6.3-5 -> 6.3-6
[UPGRADE] gzip 1.3.5-9 -> 1.3.5-10sarge1
[UPGRADE] initrd-tools 0.1.79 -> 0.1.81.1
[UPGRADE] klogd 1.4.1-16 -> 1.4.1-17
[UPGRADE] libblkid1 1.37-2 -> 1.37-2sarge1
[UPGRADE] libbz2-1.0 1.0.2-6 -> 1.0.2-7
[UPGRADE] libc6 2.3.2.ds1-21 -> 2.3.2.ds1-22
[UPGRADE] libc6-dev 2.3.2.ds1-21 -> 2.3.2.ds1-22
[UPGRADE] libcomerr2 1.37-2 -> 1.37-2sarge1
[UPGRADE] libgcc1 1:3.4.3-12 -> 1:3.4.3-13
[UPGRADE] libgnutls11 1.0.16-9 -> 1.0.16-13.1
[UPGRADE] libgpmg1 1.19.6-19 -> 1.19.6-19sarge1
[UPGRADE] libnss-db 2.2-6.2 -> 2.2-6.3
[UPGRADE] libss2 1.37-2 -> 1.37-2sarge1
[UPGRADE] libstdc++5 1:3.3.5-12 -> 1:3.3.5-13
[UPGRADE] libstdc++5-3.3-dev 1:3.3.5-12 -> 1:3.3.5-13
[UPGRADE] libusb-0.1-4 2:0.1.10a-9 -> 2:0.1.10a-9.sarge.1
[UPGRADE] libuuid1 1.37-2 -> 1.37-2sarge1
[UPGRADE] locales 2.3.2.ds1-21 -> 2.3.2.ds1-22
[UPGRADE] logrotate 3.7-3 -> 3.7-5
[UPGRADE] mutt 1.5.9-1 -> 1.5.9-2
[UPGRADE] rmail 8.13.4-1 -> 8.13.4-3
[UPGRADE] sendmail 8.13.4-1 -> 8.13.4-3
[UPGRADE] sendmail-base 8.13.4-1 -> 8.13.4-3
[UPGRADE] sendmail-bin 8.13.4-1 -> 8.13.4-3
[UPGRADE] sendmail-cf 8.13.4-1 -> 8.13.4-3
[UPGRADE] sensible-mda 8.13.4-1 -> 8.13.4-3
[UPGRADE] sysklogd 1.4.1-16 -> 1.4.1-17
[UPGRADE] vim 1:6.3-071+1 -> 1:6.3-071+1sarge1
[UPGRADE] vim-common 1:6.3-071+1 -> 1:6.3-071+1sarge1
[UPGRADE] wget 1.9.1-11 -> 1.9.1-12
===============================================================================
===============================================================================
[INSTALL, DEPENDENCIES] libdevmapper1.01
[UPGRADE] lilo 1:22.6.1-4 -> 1:22.6.1-6.2
===============================================================================
===============================================================================
[INSTALL, DEPENDENCIES] irqbalance
[INSTALL] kernel-image-2.6.8-2-686-smp
===============================================================================

domecq
Moderator Team Member
Posts: 549
Joined: 2005-10-18 00:53
Location: Montréal, Canada

#2 Post by domecq »

I have two points on this issue:
1) I found a reference somewhere on the Debian site, which I used in another thread in this forum (someone asked why the 2.6 kernel, if it is stable, is not the default in the installation).
I answered using that reference, in which the Debian team recommends updating to kernel 2.4, which is the default of the Sarge installation.
The reason they recommend that is to verify that everything runs smoothly, as woody had kernel 2.4.
I know you upgraded (via apt-get, Synaptic or something similar) rather than running a new install of Sarge, but I guess the same kernel 2.4 vs. kernel 2.6 concept could apply in this scenario too.
2) Another thing I noticed myself with Synaptic: when we select the "status" view on its left side, we can see a group categorized as "obsolete". Within that "obsolete" category, I had only the packages installed from non-Debian repositories (for example Marillat's, Skype, or some packages that I "debianized" myself via alien).
I came to the conclusion that, before running any upgrade with apt-get or its graphical tools (like Synaptic), it would be good to remove all the packages categorized as "obsolete" and reinstall (or rebuild) them for the new Debian version.
I say this because I once upgraded to Sarge while keeping packages from non-Debian repositories, and that system was crashing too.
Thus, besides the kernel 2.4 test, I would recommend uninstalling these "obsolete" packages and installing the ones available for the new Debian version (or rebuilding them, if applicable).
Cheers,

domecq

#3 Post by karen »

Thank you for your response.

When I first installed, I used only testing-sarge packages and installed the 2.6.8-1-686-smp kernel at that time. That worked for a year. But after upgrading, I could not even reboot to that kernel and do backups without crashing. I never did use a 2.4 kernel on the system.

If there were obsolete packages in the install, wouldn't they have been removed when the upgrade was done? How do I use aptitude to check for undesirable packages that may still be on the system? How is this kernel (or perhaps its modules) different from the original 2.6.8 kernel that had worked? Something is conflicting somewhere, I think, but I don't know how to locate it.

#4 Post by domecq »

karen wrote: I never did use a 2.4 kernel on the system.
You can give it a try, because that's the version you used to run with Woody, right?
karen wrote: If there were obsolete packages in the install, wouldn't they have been removed when the upgrading was done?
Roughly speaking, no. The upgrade seems to keep whatever is installed and only upgrades the packages that are officially in the Debian lists. No removals are applied, and I think it would not do anything with whatever is not in the Debian lists.

The term obsolete is what Synaptic (a graphical tool that does some trivial apt-get tasks) gives to packages installed from non-Debian repositories or installed manually. It doesn't necessarily mean that they are actually obsolete or undesirable; it's just how Synaptic labels whatever is not from an official Debian repository.
I really don't know how Aptitude treats these kinds of packages.
If you can use X, run Synaptic and click the Status button. In the left panel you will see an entry labelled obsolete. Click on it and Synaptic will list in the right panel all the packages of the kind I described above. If you find something, first take note of the package names listed, then select them all for complete removal (complete meaning with configuration files too; you can do this for each package by right-clicking on it), remove them, reboot your system, and observe whether it crashes.
If everything seems to be OK, go back to your list and try to install them again (or build them via alien or from source, whichever applies).

Cheers,

domecq

#5 Post by karen »

I thank you for listening to my pleas and for your reply.

A year plus ago this was a brand new server. I installed testing-sarge at the time. Woody was never installed as it was too old and I would have had to create a custom kernel before installing. I installed with the 2.6 kernel that came with testing-sarge. This machine doesn't have X installed as it was not needed. Everything that was installed was a debian package from the distribution. The release notes for upgrading from Woody with the 2.4 kernel (which I am not doing on both counts) regarding 2.6 kernel issues, seemed to be concerned with X issues. That shouldn't apply as there is no X.

I am hoping to resolve this problem without having to start over with a fresh install. Could there be package(s) that were in testing-sarge that are not in stable-sarge that might be causing conflicts? I am using a KVM switch. Could that be an issue? Why do backups work for 1/2 - 1 hour before crashing? Small backups of 15 minutes do work. Why isn't the 2.6 kernel working when a 2.6 kernel was working originally? Why can't I go back to the old kernel and have the system work? ...

Part of the problem with testing all the theories is that I only have very small windows of time in which to do so. For the most part the machine is doing what it was designed to do and we depend on it for those things.

Thanks in advance for any further thoughts on this subject. I do appreciate having someone more knowledgeable about Debian to discuss this with. Sorry if I sound argumentative. I am frustrated.

Bulkley
Posts: 6386
Joined: 2006-02-11 18:35
Has thanked: 2 times
Been thanked: 39 times

#6 Post by Bulkley »

I know it's a long shot, but you don't have something in /etc/modules that is interfering with your upgrade, do you?

#7 Post by karen »

Another question or two: Do I have the modules I need? Did any of them change?
I used the default install originally and for the upgrade.

/etc/modules:
ide-cd
ide-generic
sd_mod

lsmod:
Module Size Used by
af_packet 23976 2
ipv6 281764 26
capability 4744 0
commoncap 7552 1 capability
ext3 129704 3
jbd 70584 1 ext3
mbcache 10340 1 ext3
evdev 9824 0
pcspkr 3884 0
aic7xxx 208856 0
hw_random 5844 0
pciehp 99756 0
shpchp 102860 0
pci_hotplug 35708 2 pciehp,shpchp
ehci_hcd 33188 0
ohci_hcd 22596 0
uhci_hcd 34096 0
piix 13824 1
e1000 86884 0
dm_mod 61120 0
ide_generic 1632 0
ide_cd 43232 0
cdrom 41148 1 ide_cd
rtc 14184 0
sd_mod 22144 5
usb_storage 70048 0
usbcore 122116 6 ehci_hcd,ohci_hcd,uhci_hcd,usb_storage
ide_core 142556 4 piix,ide_generic,ide_cd,usb_storage
megaraid 43016 4
aic79xx 311580 0
scsi_mod 127972 5 aic7xxx,sd_mod,usb_storage,megaraid,aic79xx
unix 31156 36
font 8544 0
vesafb 6880 0
cfbcopyarea 4096 1 vesafb
cfbimgblt 3264 1 vesafb
cfbfillrect 4000 1 vesafb
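
One way to approach the "did any of them change" question is to save lsmod output under each kernel and compare the module names. The sketch below assumes two such saved files exist; the helper names `module_names` and `modules_only_in` are hypothetical.

```shell
#!/bin/sh
# module_names FILE
# Prints the sorted module names from saved lsmod output, skipping the
# "Module Size Used by" header line.
module_names() {
    awk 'NR > 1 { print $1 }' "$1" | sort
}

# modules_only_in A B
# Prints modules that are loaded in listing A but absent from listing B.
modules_only_in() {
    cmp_a=$(mktemp)
    cmp_b=$(mktemp)
    module_names "$1" > "$cmp_a"
    module_names "$2" > "$cmp_b"
    # comm -23 keeps only the lines unique to the first (sorted) input.
    comm -23 "$cmp_a" "$cmp_b"
    rm -f "$cmp_a" "$cmp_b"
}
```

With `lsmod > /root/lsmod.new` saved under one kernel and `lsmod > /root/lsmod.old` under the other (both paths are just examples), `modules_only_in /root/lsmod.new /root/lsmod.old` lists what only the first boot loaded.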

#8 Post by Bulkley »

I really can't tell, Karen. I can suggest a couple of things. First, I'd try an earlier kernel. I don't think the kernel is the problem, but kernel-images are easily apt-gettable.

The other suggestion is to change your sources to the next up, (testing? unstable? I can't keep them straight) and dist-upgrade again. That just might clean up the missing link. Your move "from the old testing-sarge to stable-sarge" may have gone in the wrong direction.

#9 Post by karen »

I can now do backups. I rebooted with 'noapic' and 'nolapic' boot options and now the backups are completing without the machine crashing. I am not sure this is the best solution, but it solved that problem.
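
To confirm that such options are really in effect on a running system, the kernel command line can be inspected. A minimal sketch, with a hypothetical helper name `has_boot_option`; the lilo.conf remark in the comments is an assumption about this setup, based only on lilo appearing in the upgrade list.

```shell
#!/bin/sh
# has_boot_option OPTION [FILE]
# Succeeds if OPTION appears as a whole word on the kernel command line
# (default source: /proc/cmdline).
#
# Assumption, not from the post: with lilo as the boot loader, making the
# options permanent would normally mean a line such as
#     append="noapic nolapic"
# in /etc/lilo.conf, followed by rerunning lilo.
has_boot_option() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, `has_boot_option noapic && has_boot_option nolapic && echo "both options active"` prints the message only when both options were passed at boot.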

Thanks for all the suggestions.

Karen
