One bad process causes my system to hard lock


Postby kerryhall » 2020-03-02 08:51

Fresh install of Debian 10.

I've noticed generally firefox has a very high cpu and ram usage. I have a good number of tabs open, but most things like wikipedia don't bog down my system very much.

Every now and then though, a site will be full of ads or videos or whatever, and my system will simply freeze. Can't even ssh into it, can't even toggle the numlock on my keyboard. Have to hard reset. This also happens if I just leave firefox running for a few days.

This is generally rare enough that it's not too much of a big deal, but I was able to reproduce this behavior in another way.

I was messing around in python, testing generating Cartesian products. I'm generating some lists with a few million elements, etc.

Everything is fine until I generate a list with too many elements, and my system instantly hard locks. (I upped the number of symbols causing massive memory allocation, without first calculating my set size)

I'm guessing this is an issue with ram usage, not cpu. I tried to peg the cpu with sysbench, but looks like it's deprecated in Debian 10? (Why? How do people benchmark their systems now? Seems crazy to me to deprecate such a crucial tool.) So instead I tried:
time echo "scale=25000; 4*a(1)" | bc -l

But I realized this will only use 1 of my 4 cores. So I used the "parallel" command to run it four times, and that ran fine for 30 min, wasn't causing any issues on my system.

So...basic question: How do I prevent an OOM event from one badly behaving process from causing my system to hard lock up? I would prefer that the OOM killer simply nuke any badly behaving process instantly (say, once the process crosses a 16 gig RAM usage threshold), since the alternative is for the system to simply hard lock.

But should the OOM killer even enter the picture here? Isn't that sort of a last ditch effort to save the system? I would imagine that what should happen is python will try to malloc, the kernel will say "nope, sorry oom" and the python script will die. That certainly does not occur on my system. I would also expect the system to be usable and responsive even during an OOM event. Moving the mouse cursor or toggling num lock on the keyboard doesn't need to malloc right? (I hope?)

My test criteria: I should be able to run a python script that tries to append ints to a list, one at a time, a quadrillion times, until the OOM killer nukes it and my system does not lock up and have to be power cycled.
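
A minimal sketch of that test, assuming the goal is for the script to fail on its own before the machine runs out of memory. It uses Python's standard `resource` module to cap the child's address space with RLIMIT_AS, so an allocation past the cap should fail with MemoryError instead of pushing the whole system into swap. The 512 MiB cap and the names LIMIT_BYTES, CHILD_CODE, cap_address_space and run_capped are illustrative choices, not anything from this thread, and the cap only guards against this one process.

```python
import resource
import subprocess
import sys

# Hypothetical cap for the experiment: 512 MiB of address space.
LIMIT_BYTES = 512 * 1024 * 1024

# Child script: keep appending ints until allocation fails.
CHILD_CODE = """
data = []
try:
    while True:
        start = len(data)
        data.extend(range(start, start + 1_000_000))
except MemoryError:
    count = len(data)
    del data  # free the list so the print below has room to work
    print(f"MemoryError after {count} ints")
"""


def cap_address_space():
    # Runs in the child just before exec; applies only to that process.
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))


def run_capped():
    # Start a fresh interpreter under the cap and return what it printed.
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        preexec_fn=cap_address_space,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Calling `print(run_capped())` should report how many ints fit under the cap. This limits one process by hand; it does not change how the kernel behaves when an uncapped process eats all the RAM.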

Sorry for the wall of text, I just want to get this issue solved. One bad tab in firefox shouldn't send me hunting for the reset button.
kerryhall
 
Posts: 147
Joined: 2008-08-19 11:06

Re: One bad process causes my system to hard lock

Postby sunrat » 2020-03-02 11:19

May help to set up persistent systemd journal and read its logs with journalctl after a naughty event.
Also ~/.xsession-errors may shed some light although it can get rather wordy. dmesg can help too.
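
As a rough sketch of what to look for in those logs: the kernel reports an OOM kill with a line containing "Killed process <pid> (<name>)". The helper below scans saved log text (for example the output of `journalctl -k -b -1`, the kernel messages from the previous boot) for such lines. The function name find_oom_kills and the sample log lines are made up for illustration, and the exact message wording can differ between kernel versions.

```python
import re

# Matches the kernel's "Killed process 1234 (name)" report.
OOM_LINE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) for each OOM kill found."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_LINE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Illustrative sample only, not taken from a real machine.
SAMPLE_LOG = (
    "Mar 02 08:40:01 host kernel: Out of memory: Kill process 4321 (python3) "
    "score 912 or sacrifice child\n"
    "Mar 02 08:40:01 host kernel: Killed process 4321 (python3) "
    "total-vm:17000000kB, anon-rss:15000000kB, file-rss:0kB, shmem-rss:0kB\n"
    "Mar 02 08:40:02 host systemd[1]: Started something.\n"
)
```

If the list comes back empty after a lockup, the OOM killer likely never fired, which is useful information in itself.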
“ computer users can be divided into 2 categories:
Those who have lost data
...and those who have not lost data YET ”
Remember to BACKUP!
sunrat
 
Posts: 2998
Joined: 2006-08-29 09:12
Location: Melbourne, Australia

Re: One bad process causes my system to hard lock

Postby CwF » 2020-03-02 12:59

kerryhall wrote:Every now and then though, a site will be full of ads or videos or whatever, and my system will simply freeze.

Actually it's likely not frozen. I dug into this a while back, and you're exactly right that the culprit can be a single page, a rude one. I say it's not frozen because I would induce these issues, sometimes leave it up and wait, and do so in a virtual machine with memory ballooning available. When the vm 'locks' and is unresponsive, I'd plug in some more memory, sometimes just a few hundred MB, and the machine magically comes to life. Sometimes, with an additional block of memory, the nasty page would suck up to that new limit and re-freeze! Give it some more, and more, then while it's running fine take some notes and shut it down.

Anymore, much of the web sucks.
CwF
 
Posts: 607
Joined: 2018-06-20 15:16

Re: One bad process causes my system to hard lock

Postby kerryhall » 2020-03-02 19:23

sunrat wrote:May help to set up persistent systemd journal and read its logs with journalctl after a naughty event.
Also ~/.xsession-errors may shed some light although it can get rather wordy. dmesg can help too.


Thank you. I'm going to review my logging policy and see if I can get some more data here.

I'm also going to try and see if I can repro on a vm or another system, so I don't have to bork my main system when testing this.
kerryhall
 
Posts: 147
Joined: 2008-08-19 11:06

Re: One bad process causes my system to hard lock

Postby kerryhall » 2020-03-02 19:27

CwF wrote:
kerryhall wrote:Every now and then though, a site will be full of ads or videos or whatever, and my system will simply freeze.

Actually it's likely not frozen. I dug into this a while back, and you're exactly right that the culprit can be a single page, a rude one. I say it's not frozen because I would induce these issues, sometimes leave it up and wait, and do so in a virtual machine with memory ballooning available. When the vm 'locks' and is unresponsive, I'd plug in some more memory, sometimes just a few hundred MB, and the machine magically comes to life. Sometimes, with an additional block of memory, the nasty page would suck up to that new limit and re-freeze! Give it some more, and more, then while it's running fine take some notes and shut it down.

Anymore, much of the web sucks.


Interesting! So you were able to repro in a vm. Sounds like a good approach for me for now.

It's not just web sites causing this issue though, I was able to reproduce this behavior with a few lines of python.
kerryhall
 
Posts: 147
Joined: 2008-08-19 11:06

Re: One bad process causes my system to hard lock

Postby Deb-fan » 2020-03-03 00:30

This is an obvious sign of bad config or, depending, it could just be badly developed software. I wanted to add some quick thoughts on tuning Firefox because you mention it. It can be tweaked to a great extent. For the number of content processes (e10s), my rule is 1 per actual CPU core on the system. It comes with 8 now, and each adds to overhead. There are also things like disabling prefetch, so the browser isn't fetching things you may never even click on. I leave DNS prefetch alone (enabled); I don't mind if Firefox is resolving domain names in the background. Another is the number of tabs stored for back/forward without having to redownload them. I also like NoScript for security, speed and reducing overhead. I don't do all that much fiddling around in about:config now, but left as it comes Firefox is horrible. Tweaked, I can have 80 tabs open while using a fraction of what you're seeing.

I tried Chrome and, due to the stats I was seeing, almost vowed never to launch the thing again under any circumstances. It was that bad, trying to melt my old dual-core. Then I researched what can be done to tune it too, and quickly found out Chrome/Chromium can be easily tweaked to a great extent as well. It seems all modern browsers are horrendous unless tuned.

Edit: Oops, in GNU/Linux there are many ways to limit how many resources a process can use: cgroups, the nice/renice commands and plenty of other approaches you could employ so that an errant process can't trigger the OOM killer. Yeah, that's a last resort for keeping an install running.
Deb-fan
 
Posts: 693
Joined: 2012-08-14 12:27

Re: One bad process causes my system to hard lock

Postby pylkko » 2020-03-03 08:12

I want to point out that memory management on Linux is notoriously bad in low-memory situations, and that there has been discussion about improving it on the kernel mailing list since August 2019. For example:

Artem S Tashkinov wrote:"Once you hit a situation when opening a new tab requires more RAM than is currently available, the system will stall hard. You will barely be able to move the mouse pointer. Your disk LED will be flashing incessantly (I'm not entirely sure why). You will not be able to run new applications or close currently running ones. This little crisis may continue for minutes or even longer. I think that's not how the system should behave in this situation. I believe something must be done about that to avoid this stall."
in https://lkml.org/lkml/2019/8/4/15

After this, some Linux distributions (e.g. Fedora and Clear) have begun to offer EarlyOOM by default:
https://github.com/rfjakob/earlyoom

Also, systemd is working on an OOM daemon, and the 5.4 kernel saw some work to improve the situation. Given that none of this is in Debian (yet), the situation is worse there. The best way to avoid it is to provide large amounts of virtual memory, so that the system never runs out of memory.
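
To make the userspace approach concrete, here is a rough sketch of the kind of check a tool like earlyoom performs: read /proc/meminfo and decide whether both available RAM and free swap have dropped below a threshold. The 10% default mirrors what the earlyoom README describes as its default, but the function names parse_meminfo and memory_is_low are hypothetical, and this is not earlyoom's actual code. It only detects the condition; it does not kill anything.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: first numeric value}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_is_low(info, min_percent=10.0):
    """True when both available RAM and free swap are under min_percent."""
    mem_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    # A system with no swap at all counts as having 0% free swap.
    swap_pct = 100.0 * info.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return mem_pct < min_percent and swap_pct < min_percent


def read_meminfo(path="/proc/meminfo"):
    # Convenience wrapper for a live Linux system.
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

On a live system, `memory_is_low(read_meminfo())` polled a few times a second gives an early warning well before the kernel's own OOM killer would act.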
pylkko
 
Posts: 1659
Joined: 2014-11-06 19:02

Re: One bad process causes my system to hard lock

Postby Deb-fan » 2020-03-03 08:56

It has to be bad config: whichever sysadmin is responsible has not set practical limits, and GNU/Linux is more than up to the task, with several effective ways of doing so. Personally I've never pushed an install to the point of triggering the OOM killer, though I'm sure the OS must be running like pure poop long before that happens too. Definitely a process can crash a system; it's our job (as admins) to control such things by setting up reasonable resource limits. :) Regardless, you can't treat a personal computer like a supercomputer without the specs and expect anything else to happen.

Thou shall ever make all processes behaveth with any of the many tools that they not poop up thine installs.

Great bk of gnu/nix dorkage.

RAmennnnn. :)
Last edited by Deb-fan on 2020-03-03 13:22, edited 2 times in total.
Deb-fan
 
Posts: 693
Joined: 2012-08-14 12:27

Re: One bad process causes my system to hard lock

Postby CwF » 2020-03-03 12:13

pylkko wrote: The best way to avoid it is to provide large amounts of virtual memory

I've mentioned a few times, my answer was zram swap. It gives some headroom, and more importantly, with it the condition is easier to monitor. It could be that zram is a canary. In a browser vm with 256MB (default) of zram full, the canary is not feeling well; a user can watch this and clean it up. The full zram means there are many tabs swapped out and unused. It also means as much as a gig is already corralled and the user should clean up their room before they go play.
CwF
 
Posts: 607
Joined: 2018-06-20 15:16

Re: One bad process causes my system to hard lock

Postby Deb-fan » 2020-03-03 12:21

What's swappiness set to here? Thought of mentioning zram too but would question any benefit on a system already overloaded to the point OOM-killer comes into play. Disk thrashing is also evil incarnate as sayeth the great bk. :)
Deb-fan
 
Posts: 693
Joined: 2012-08-14 12:27

Re: One bad process causes my system to hard lock

Postby CwF » 2020-03-03 12:28

Deb-fan wrote:What's swappiness set to here?

I've settled on 20 so far
CwF
 
Posts: 607
Joined: 2018-06-20 15:16

Re: One bad process causes my system to hard lock

Postby Deb-fan » 2020-03-03 12:38

I'm considering doing the same in hopes of better filesystem performance (file caching), but I meant what it's set to on the OP's system. Again, I've never seen the OOM killer in action, only ever read about it, and I would take seeing it as a sign that I've badly misconfigured something and would try to fix that ASAP.
Deb-fan
 
Posts: 693
Joined: 2012-08-14 12:27

Re: One bad process causes my system to hard lock

Postby trinidad » 2020-03-03 13:08

How do I prevent an OOM event from one badly behaving process from causing my system to hard lock up


https://manpages.debian.org/buster/util ... .1.en.html

Sort out the PID. Change the setting.
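
The linked man page URL is truncated, so it is not clear which util-linux tool is meant. As one illustration of a per-PID setting that does exist on Linux, the sketch below writes /proc/<pid>/oom_score_adj, which ranges from -1000 (never pick this process) to 1000 (pick it first) and biases the OOM killer's choice. That this is the setting intended above is an assumption; the function name set_oom_score_adj and its proc_root parameter are hypothetical. Raising the value for your own process normally works unprivileged, while lowering it generally needs root.

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an OOM score adjustment for pid and return the path written."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    # proc_root is a parameter only so the function can be tried on a fake tree.
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as handle:
        handle.write(str(value))
    return path
```

For example, `set_oom_score_adj(os.getpid(), 1000)` at the top of a memory-hungry experiment volunteers that script as the first victim. It does not make the OOM killer fire any sooner.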

TC
You can't believe your eyes if your imagination is out of focus.
trinidad
 
Posts: 117
Joined: 2016-08-04 14:58

Re: One bad process causes my system to hard lock

Postby pylkko » 2020-03-04 07:56

Yes, except that the oom killer doesn't always manage, as mentioned in the second post:

Vlastimil Babka wrote:Yeah that's a known problem, made worse [by] SSDs in fact, as they are able to keep refaulting the last remaining file pages fast enough, so there is still apparent progress in reclaim and OOM doesn't kick in.

And the OOM killer is a kernel protection mechanism, not really designed for QoS.

Also, when/if it finally works (can take up to 20 min), it will probably kill the process the user is interested in keeping alive (firefox). Which, while better than a hard lock, would also be annoying. Which is why there is so much work going on to develop software that takes corrective action in userspace before an OOM occurs in kernel space. And I believe there will be significant progress in this in the near future (like next year, so maybe for Bullseye).

GNOME low memory API:
http://www.hadess.net/2019/12/gmemorymo ... r-2nd.html
pylkko
 
Posts: 1659
Joined: 2014-11-06 19:02

Re: One bad process causes my system to hard lock

Postby trinidad » 2020-03-18 16:22

This is an interesting subject. Fedora is going with earlyoom on their new release, and Sid seems to have a modified package as well (different from Buster's), though I haven't looked at it yet. I do think that systemd needs to run settings from user space, because systems without swap, SSD trim, certain OOM application presets like Chrome uses, and several other hardware/software topography issues make it something that needs to respond to user configuration. It's too sensitive to user parameters to have any reliable default setting. The problem with the whole idea of OOM handling is that every variable affects every other variable right now, and even though certain settings will reliably kill an OOM process, the settings themselves can handcuff a running system with long delays.

TC
You can't believe your eyes if your imagination is out of focus.
trinidad
 
Posts: 117
Joined: 2016-08-04 14:58

