Debian 7 system hung and released automatically after 4 days


Postby amarnadh.m » 2020-07-23 14:43

Hi, we have a cluster of Debian 7 instances that all hung within a span of 2-3 days. The systems kept running their existing processes but could not fork new ones. They stayed in this state for about 4 days, and then all the instances started working normally again at around the same time.
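When existing processes keep running but new ones cannot be forked, one thing worth ruling out is that the task count has hit a kernel limit. The following is a minimal sketch, not a diagnosis: it assumes a Linux /proc filesystem and compares the total number of tasks (fourth field of /proc/loadavg) against kernel.pid_max and kernel.threads-max. The helper names are made up for illustration. fork can also fail for other reasons (per-user process limits, memory exhaustion), which this does not cover.

```python
from pathlib import Path


def read_int(path):
    """Read a single integer from a procfs file."""
    return int(Path(path).read_text().split()[0])


def parse_task_count(loadavg_text):
    """Return the total number of scheduling entities (processes and threads).

    The fourth field of /proc/loadavg has the form "running/total".
    """
    return int(loadavg_text.split()[3].split("/")[1])


def fork_headroom(total_tasks, pid_max, threads_max):
    """Return how many more tasks fit under the tighter of the two limits."""
    return min(pid_max, threads_max) - total_tasks


if __name__ == "__main__" and Path("/proc/loadavg").exists():
    tasks = parse_task_count(Path("/proc/loadavg").read_text())
    pid_max = read_int("/proc/sys/kernel/pid_max")
    threads_max = read_int("/proc/sys/kernel/threads-max")
    print(f"tasks={tasks} pid_max={pid_max} threads_max={threads_max}")
    print(f"headroom={fork_headroom(tasks, pid_max, threads_max)}")
```

Logging this once a minute from a process that is already running (so it does not itself need to fork) would show whether the headroom was shrinking before the hang.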

The one thing common to all these instances is a repeated install of auditd (apt-get --yes --force-yes --allow-unauthenticated install auditd), which of course does not reinstall the package if it is already installed. All the logs stop at a point, with the following lines in syslog:
audispd: audispd initialized with q_depth=80 and 1 active plugins
auditd[24376]: Init complete, auditd 1.7.18 listening for events (startup state enable)
rsyslogd: [origin software="rsyslogd" swVersion="8.12.4.1." x-pid="32509" x-info="http://www.rsyslog.com"] exiting on signal 15.


This pattern also appears earlier in the logs, before the hang, but on those occasions rsyslogd always came back up.
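To see how often this sequence occurred before the hang, the log can be scanned for an auditd "Init complete" line followed by an rsyslogd "exiting on signal 15" line. This is only a sketch: it matches on the substrings quoted above, the function name is made up, and /var/log/syslog is assumed to be the log path.

```python
import os


def find_restart_pairs(lines):
    """Return (auditd_line_no, rsyslogd_line_no) pairs, 1-indexed.

    A pair is an auditd "Init complete" line followed, before the next
    auditd start, by an rsyslogd "exiting on signal 15" line.
    """
    pairs = []
    pending = None  # line number of the latest unmatched auditd start
    for no, line in enumerate(lines, start=1):
        if "auditd" in line and "Init complete" in line:
            pending = no
        elif (pending is not None and "rsyslogd" in line
              and "exiting on signal 15" in line):
            pairs.append((pending, no))
            pending = None
    return pairs


if __name__ == "__main__":
    log_path = "/var/log/syslog"  # assumed location, adjust as needed
    if os.access(log_path, os.R_OK):
        with open(log_path, errors="replace") as fh:
            for start, stop in find_restart_pairs(fh):
                print(f"auditd start at line {start}, rsyslogd exit at line {stop}")
```

If the last pair in each log is the only one not followed by an rsyslogd start line, that would support the observation that the logs stop right after this sequence.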

Could this be an auditd issue, since this is the pattern we found, or could something else be the cause? We are not able to infer much from the other logs.

System specs: 40 cores, 96 GB RAM
They run at >95% memory utilization, most of it cache.
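Since most of that 95% is cache, the more useful figure is how much memory could be reclaimed without swapping. A rough sketch, assuming a Linux /proc/meminfo: the MemAvailable field only exists on kernel 3.14 and later, and Debian 7's stock 3.2 kernel predates it, so the fallback sums MemFree, Buffers and Cached. The function names are made up for illustration.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def reclaimable_estimate_kb(info):
    """Estimate memory available without swapping, in kB.

    Prefers MemAvailable when the kernel provides it. Otherwise falls back
    to MemFree + Buffers + Cached, which overestimates because some cached
    pages (for example tmpfs/shmem) cannot simply be dropped.
    """
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)


if __name__ == "__main__" and Path("/proc/meminfo").exists():
    meminfo = parse_meminfo(Path("/proc/meminfo").read_text())
    print(f"MemTotal: {meminfo['MemTotal']} kB")
    print(f"Estimated reclaimable: {reclaimable_estimate_kb(meminfo)} kB")
```

A low estimate here, despite a large Cached value, would point at memory pressure rather than plain page cache use.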
amarnadh.m
 
Posts: 3
Joined: 2020-07-23 13:55

Re: Debian 7 system hung and released automatically after 4

Postby CwF » 2020-07-23 22:57

Mysterious storage subsystem log jam?
CwF
 
Posts: 721
Joined: 2018-06-20 15:16

Re: Debian 7 system hung and released automatically after 4

Postby amarnadh.m » 2020-07-24 04:40

CwF wrote:Mysterious storage subsystem log jam?

Could you please elaborate a little
amarnadh.m
 
Posts: 3
Joined: 2020-07-23 13:55

Re: Debian 7 system hung and released automatically after 4

Postby CwF » 2020-07-24 11:31

amarnadh.m wrote:Could you please elaborate a little

No, not really. I've just seen in my own stress tests how system storage can be the culprit in VM interruptions and leave no trace. Out of memory somewhere? Something wanted to shut down, other things wanted to wait... I have nothing helpful other than my observation that host storage seems to be able to say 'wait, I'm doing something', and the system stops time in a sense and doesn't report an error. Only real-time things notice.

My own advice to myself concerning this concept: 'don't use a VM as a stopwatch'.
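One way to look for storage waits from inside a guest is to list processes stuck in uninterruptible sleep (state D in /proc/<pid>/stat). This is a sketch under assumptions: it needs a Linux /proc, the function names are made up, and it checks only each process's main thread. It also cannot see a pause imposed by the host itself, which is the case described above as leaving no trace.

```python
from pathlib import Path


def parse_stat_state(stat_text):
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')'.
    """
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    blocked = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm, state = parse_stat_state((entry / "stat").read_text())
        except (OSError, ValueError):
            continue  # process exited or was unreadable while scanning
        if state == "D":
            blocked.append((int(entry.name), comm))
    return blocked


if __name__ == "__main__" and Path("/proc").is_dir():
    for pid, comm in blocked_tasks():
        print(f"{pid}\t{comm}")
```

A handful of D-state processes that come and go is normal; the same ones staying blocked across repeated runs would suggest I/O that is not completing.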
CwF
 
Posts: 721
Joined: 2018-06-20 15:16

Re: Debian 7 system hung and released automatically after 4

Postby Bulkley » 2020-07-24 15:41

amarnadh.m, have you considered that Debian 7 and the hardware it runs on are getting old? I know that my question is incidental to your immediate problem, but I can't help wondering if your machines are cycling over a function that didn't exist when they were designed.
Bulkley
 
Posts: 5971
Joined: 2006-02-11 18:35

Re: Debian 7 system hung and released automatically after 4

Postby amarnadh.m » 2020-07-25 06:26

Bulkley, it is true that Debian 7 is old, and we might consider upgrading it. I want to know what could have caused such an issue, and not on just one instance but on 270 of them. Were any such hangs recorded previously on Debian 7? Could this be a kernel bug?
amarnadh.m
 
Posts: 3
Joined: 2020-07-23 13:55

