Skyrim keeping crashing and a hard reset is needed


Postby Lysander » 2020-06-27 15:41

I bought Skyrim a few days ago and I'm a few hours in. It seems to work well in Steam Proton 4.11, when it's not crashing.

Since yesterday I have had about four catastrophic crashes, by which I mean the screen goes grey and I have to restart. I can't even restart through REISUB; it has to be a hard reset.

Does anyone have any idea what could be going on? I have monitored hardware temps while playing and they are all well within limits. No overheating. I have played on low gfx settings and it still happens.

System specs

Intel Q8400
AMD HD 5870
6GB RAM
Debian Linux using GNOME


Some error logs from /var/log/syslog

Code:
Jun 27 11:40:39 psychopig-xxxvii systemd[1]: Started Run anacron jobs.
Jun 27 11:40:39 psychopig-xxxvii kernel: [    0.448657] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 0: f200084000000800
Jun 27 11:40:39 psychopig-xxxvii kernel: [    0.448686] mce: [Hardware Error]: TSC 0
Jun 27 11:40:39 psychopig-xxxvii systemd[1]: Started Daily rotation of log files.
Jun 27 11:40:39 psychopig-xxxvii kernel: [    0.448711] mce: [Hardware Error]: PROCESSOR 0:1067a TIME 1593254432 SOCKET 0 APIC 0 microcode a0b
Jun 27 11:40:39 psychopig-xxxvii kernel: [    0.448737] mce: [Hardware Error]: Machine check events logged
Jun 27 11:40:39 psychopig-xxxvii kernel: [    0.448739] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: f200001034000e0f
Jun 27 11:40:39 psychopig-xxxvii kernel: [    0.448763] mce: [Hardware Error]: TSC 0
Jun 27 11:40:39 psychopig-xxxvii systemd[1]: Started Daily apt download activities.



It says there's a CPU issue, but it doesn't seem to be temperature related: I have monitored temps in-game with the sensors terminal program and there appear to be no spikes. I don't want to keep restarting the game for too long, as amazing as it is, since hard resets are not good for the hardware. Any help appreciated.
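Since a hard reset can lose whatever was on screen, one option is to log the readings to disk and sync after every sample, so the last values before a crash survive. A minimal Python sketch, assuming the kernel exposes the sensors through the usual hwmon sysfs layout (/sys/class/hwmon/hwmon*/temp*_input, in millidegrees Celsius); the function names and the log file name are made up for illustration:

```python
import glob
import os
import time


def read_temps(hwmon_root="/sys/class/hwmon"):
    """Return {"hwmonN:chip/tempM": degrees_C} for every readable sensor."""
    readings = {}
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for path in sorted(glob.glob(pattern)):
        chip_dir = os.path.dirname(path)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = "unknown"
        try:
            with open(path) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor present but not readable right now
        sensor = os.path.basename(path)[: -len("_input")]
        key = f"{os.path.basename(chip_dir)}:{chip}/{sensor}"
        readings[key] = millidegrees / 1000.0
    return readings


def log_temps(logfile, interval_s=2.0, samples=None,
              hwmon_root="/sys/class/hwmon"):
    """Append one timestamped line per sample; samples=None runs until Ctrl+C."""
    taken = 0
    with open(logfile, "a") as out:
        while samples is None or taken < samples:
            temps = read_temps(hwmon_root)
            fields = " ".join(f"{k}={v:.1f}C" for k, v in temps.items())
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write(f"{stamp} {fields}\n")
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before the next sample
            taken += 1
            if samples is None or taken < samples:
                time.sleep(interval_s)
```

Calling log_temps("skyrim-temps.log") in a terminal before starting the game would leave a file whose last line shows the temperatures a couple of seconds before the freeze. Which chips show up, and whether the GPU is among them, depends on the drivers loaded.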

I wonder if I should try running it from Xfce to see if there's any difference.

EDIT: This only happens so far in wide open outdoor spaces. I am ruling out an issue with thermal paste since I replaced that six months ago, and this is the first time this has happened since. I have played Witcher 1 a lot recently, and no such thing happened.
Lysander
 
Posts: 616
Joined: 2017-02-23 10:07
Location: London

Re: Skyrim keeping crashing and a hard reset is needed

Postby Head_on_a_Stick » 2020-06-27 16:16

Do you have the CPU µcode and AMD graphics firmware installed?

Enable persistent logging and check the systemd journal for clues.
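On Debian, persistent logging is usually a matter of creating the /var/log/journal directory and restarting systemd-journald. After the next crash, the kernel errors from the boot that died can be pulled out of the journal. A minimal Python sketch of that query, using only long-form journalctl options; the function names are made up for illustration:

```python
import shutil
import subprocess


def previous_boot_errors_cmd(boots_back=1):
    """Build a journalctl command for kernel errors from an earlier boot."""
    return [
        "journalctl",
        "--no-pager",
        "--dmesg",                 # kernel ring buffer messages only
        f"--boot=-{boots_back}",   # -1 is the boot before the current one
        "--priority=err",          # err and anything more severe
    ]


def previous_boot_errors(boots_back=1):
    """Run the query and return its output as text."""
    if shutil.which("journalctl") is None:
        raise RuntimeError("journalctl not found on this system")
    result = subprocess.run(
        previous_boot_errors_cmd(boots_back),
        capture_output=True,
        text=True,
        check=False,  # a missing boot entry should not raise here
    )
    return result.stdout or result.stderr
```

Without persistent storage the journal only holds the current boot, so asking for the previous one returns an error message instead of log lines.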
Black Lives Matter

Debian buster-backports ISO image: for new hardware support
Head_on_a_Stick
 
Posts: 12194
Joined: 2014-06-01 17:46
Location: /dev/chair

Re: Skyrim keeping crashing and a hard reset is needed

Postby LE_746F6D617A7A69 » 2020-06-27 16:25

It's definitely a hardware bug.
What You can do:
1. Install the collectd-core package, which provides the mcelog utility. It decodes MCE events, so it can help to find the source of the problem (CPU/memory/PCI/other).
2. Independently of the above:
a) Measure the PSU voltages when the game is running, especially the +12V lines, which are used to power almost everything in Your PC. A result below ~11.6V means that the PSU is overloaded / damaged.
b) If You have a non-EFI BIOS, use memtest86 from the Debian repos to test the memory modules. If You have an EFI BIOS, use the free version of memtest.
c) Underclock the CPU (e.g. using cpufrequtils or linux-cpupower).
d) Remove the memory sticks one at a time to find a faulty one.
e) Raise the memory voltage by the smallest step supported by the BIOS -> this can "fix" faulty RAM modules.
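Regarding item 1: before a proper decoder is installed, the status words from the syslog excerpt can be partly read by hand. A minimal Python sketch, assuming the standard x86 IA32_MCi_STATUS layout (flag bits 63 down to 57, MCA error code in bits 15:0, model-specific code in bits 31:16); the function name is made up, and this only splits the word into fields, it does not say which component failed:

```python
import re

# Flag bits of an x86 IA32_MCi_STATUS register.
STATUS_FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # the error was not corrected
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds an address
    57: "PCC",    # processor context may be corrupt
}

MCE_LINE = re.compile(r"Machine Check: \d+ Bank (\d+): ([0-9a-fA-F]{16})")


def decode_mce_line(line):
    """Split one 'Machine Check ... Bank N: <status>' log line into fields."""
    match = MCE_LINE.search(line)
    if match is None:
        return None  # not a bank status line
    status = int(match.group(2), 16)
    return {
        "bank": int(match.group(1)),
        "flags": [name for bit, name in STATUS_FLAGS.items()
                  if (status >> bit) & 1],
        "mca_code": status & 0xFFFF,
        "model_code": (status >> 16) & 0xFFFF,
    }
```

Fed the two bank lines from the log above, both come back with UC and PCC set, i.e. uncorrected errors. What the bank numbers and error codes map to is model-specific, which is what a real decoder is for.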
Bill Gates: "(...) In my case, I went to the garbage cans at the Computer Science Center and I fished out listings of their operating system."
The_full_story and Nothing_have_changed
LE_746F6D617A7A69
 
Posts: 184
Joined: 2020-05-03 14:16

Re: Skyrim keeping crashing and a hard reset is needed

Postby Lysander » 2020-06-28 10:07

LE_746F6D617A7A69 wrote:It's definitely a hardware bug.
What You can do:
1. Install the collectd-core package, which provides the mcelog utility. It decodes MCE events, so it can help to find the source of the problem (CPU/memory/PCI/other).
2. Independently of the above:
a) Measure the PSU voltages when the game is running, especially the +12V lines, which are used to power almost everything in Your PC. A result below ~11.6V means that the PSU is overloaded / damaged.
b) If You have a non-EFI BIOS, use memtest86 from the Debian repos to test the memory modules. If You have an EFI BIOS, use the free version of memtest.
c) Underclock the CPU (e.g. using cpufrequtils or linux-cpupower).
d) Remove the memory sticks one at a time to find a faulty one.
e) Raise the memory voltage by the smallest step supported by the BIOS -> this can "fix" faulty RAM modules.


You are definitely right that it's a hardware bug. I thought this was something specific to Skyrim, and it's the first time in memory that this has happened, but I found out that it's the well-known Grey Screen of Death, which happens with some ATI cards regardless of OS and often regardless of game.

https://www.overclock.net/forum/67-amd/ ... ution.html

Apparently the 5xxx series is particularly prone to it. I had a HD 4870 sitting in a box, so I swapped the 5870 for it. Now the number of crashes has been drastically reduced [I got one crash yesterday in eight hours of playing and I was able to soft reset after it too, compared to one crash every 20 mins or so on the 5870].

The link above says that this was solved by an old Catalyst driver update, but obviously not in the Mesa drivers, even the recent ones in Debian 10.

Re: Skyrim keeping crashing and a hard reset is needed

Postby LE_746F6D617A7A69 » 2020-06-28 12:10

Lysander wrote:https://www.overclock.net/forum/67-amd/650900-gray-screen-explained-5xxx-series-updated-w-solution.html

Apparently the 5xxx series is particularly prone to it.

I'm sorry, but I have to say this: that explanation of the problem is just moronic.
Besides, a lot of people are overclocking the memory on 5xxx - no issues.

I have a better explanation for You:
https://www.techpowerup.com/gpu-specs/radeon-hd-4870.c219
https://www.techpowerup.com/gpu-specs/radeon-hd-5870.c253
HD4870: TDP: 150W, min. PSU: 350W
HD5870: TDP: 188W, min. PSU: 450W

Better start with measuring the PSU voltages.

Re: Skyrim keeping crashing and a hard reset is needed

Postby Lysander » 2020-07-02 15:41

Thanks for your assessment. I am still kind of stumped here. Because...

I bought a new 550W PSU about 18 months ago with my current setup including the 5870. Before I bought the PSU, I did multiple calculations on the setup and the result was, unanimously, that 550W was more than enough for the hardware on this machine.

I have put the 5870 back into the machine and played about 30 mins of Skyrim with no problems. Voltages and temps all seem to be fine, and I have monitored them during play previously.

Still, I will play some more and see if the crashes return. The only difference I can see that I've recently made is lowering the shadow and distant object quality.

Re: Skyrim keeping crashing and a hard reset is needed

Postby LE_746F6D617A7A69 » 2020-07-02 16:38

All of the PSU manufacturers use ugly tricks when calculating power ratings.
Can You tell me the exact PSU model or show the nameplate?

Typically, the usable power is below 60% of the declared value.
Also, You should take into account that TDP means *average* power consumption under 100% load - the peak power is undefined.

Re: Skyrim keeping crashing and a hard reset is needed

Postby Lysander » 2020-07-02 22:20

Yes, this PSU is a Corsair VS550. I had one other GSOD in Skyrim this evening with the 5870, again in an open area.

So I decided to run some tests again and monitor temps while playing. Of course, it didn't crash even though this time I wanted it to, and the most I got the GPU temp up to was 76C, momentarily, just before this:

Image link https://imgur.com/IkUjzhb

Now, the 5870 gives notably better performance than the 4870, and the 4870 never gives a GSOD. So this means an issue with either the card or the PSU.

I am left with the quandary of whether to buy a new GPU or a new 650W PSU. I could get a refurbished second hand GPU, e.g. a 6870, which has lower power consumption than the 5870, along similar lines to the 4870. Money is a little tight, so it's either GPU or PSU. I just haven't decided which.

I do wonder if having the browser open in the background doesn't help things.

Head_on_a_Stick wrote:Do you have the CPU µcode and AMD graphics firmware installed?


Oh, and yes to both these questions.

Re: Skyrim keeping crashing and a hard reset is needed

Postby Lysander » 2020-07-03 11:09

And yet more threads on this:

https://forums.nexusmods.com/index.php? ... world-map/

https://www.overclock.net/forum/67-amd/ ... kyrim.html

The consensus, as far as I am concerned, is overwhelming that this is an issue with the 5xxx series of graphics cards in this game. Neon Drive, for instance, which gives my GPU an intense workout, never gives a GSOD. I have seen many posts and threads on this. I think the only thing I can do is stick with the 4870 for now, which causes no problems. Unfortunately an upgrade to a 4890 is unfeasible without upgrading the PSU.

Re: Skyrim keeping crashing and a hard reset is needed

Postby LE_746F6D617A7A69 » 2020-07-03 13:28

Yes, it seems that this can be a problem with Skyrim.

Anyway, I've checked Your PSU.
The interesting thing is that Corsair is not willing to show the PSU nameplate or the maximum/combined line load on their official site:
https://www.corsair.com/us/en/Categories/Products/Power-Supply-Units/vs-series-config-2018/p/CP-9020171-NA#tab-tech-specs
But I've found the nameplate photo here:
https://www.technokick.com/2015/10/corsair-vs550-review-best-budget-psu/

The usable power on +12V line ranges from (550W-110W-3.6W)=436W to max 504W - it depends on the power draw on the other lines.
Not that bad - so why are they hiding the specs?

Your CPU's TDP of 95W + 188W for the 5870 + ~30W for the motherboard + ~10W for fans/drives gives ~323W average under 100% load, so there's ~113W left in the worst case.
Provided that the PSU is working 100% correctly and the PEG connectors are OK, we can safely exclude the PSU from the set of possible sources of this problem.
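The arithmetic above can be reproduced with a few lines of Python. This is only a sketch using the figures quoted in this post; the variable names are made up, and the TDP values are average figures, so peak draw is not covered:

```python
# Figures quoted in the post for the Corsair VS550 nameplate.
PSU_RATED_W = 550.0
OTHER_LINES_MAX_W = 110.0 + 3.6   # the two amounts subtracted in the post
RAIL_12V_MAX_W = 504.0            # best case stated for the +12V line

# Average load estimates under 100% load, in watts.
loads_w = {
    "cpu_tdp": 95.0,
    "hd5870_tdp": 188.0,
    "motherboard": 30.0,
    "fans_and_drives": 10.0,
}

total_load_w = sum(loads_w.values())

# Worst case: the other lines draw their maximum, leaving least for +12V.
rail_12v_worst_w = PSU_RATED_W - OTHER_LINES_MAX_W
headroom_worst_w = rail_12v_worst_w - total_load_w
headroom_best_w = RAIL_12V_MAX_W - total_load_w

print(f"total load:       {total_load_w:.1f} W")
print(f"+12V worst case:  {rail_12v_worst_w:.1f} W")
print(f"headroom (worst): {headroom_worst_w:.1f} W")
print(f"headroom (best):  {headroom_best_w:.1f} W")
```

Swapping in 150.0 for the HD 4870 shows how much extra margin the older card leaves on the same PSU.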

What about overheating of the GFX memory chips? Will it crash if You remove the side cover from the PC case?

Re: Skyrim keeping crashing and a hard reset is needed

Postby Lysander » 2020-07-03 22:46

LE_746F6D617A7A69 wrote:What about overheating of the GFX memory chips? Will it crash if You remove the side cover from the PC case?


OK, so I ran more tests today on your suggestion. First of all I had to make sure it wasn't overheating or a voltage issue. I ran Neon Drive with the side off the case for about 30 mins. I got the GPU up to 78.5C, which is a record, with the card fan whining away [the 5870 makes notably more noise than the 4870 at this temp, but then the 4870 is apparently designed to run hotter]. Even with Neon Drive taxing the card and the side off the case, no GSOD.

Now, I've noticed that these GSODs happen only in open areas of Skyrim [i.e. roads, mountains, hills, plains etc]. So I decided to play tonight for a bit on high graphics settings [which look gorgeous, and you get a smooth framerate with the 5870] for an hour or so, but it was important not to play in wide open spaces - so that leaves cities, houses, taverns, crypts etc - it's still a lot of the game. No crashes. There was even one quest in the city of Markarth called House of Horrors, which contains a lot of low lighting, reflections and mist, which really taxed the graphics card, I could hear. But still no issues.

I am increasingly whittling this down to not only being a GPU issue, but an issue with rendering open spaces in the game [and maybe even types of open spaces] and draw distance. I have included two screenshots of the graphics settings menu and am wondering which of these I could turn down in order to eliminate these crashes.

https://imgur.com/gallery/LXqJU8y

Of course, I could just turn everything down to its lowest setting and go up from there, but that may not even work and may be something of a protracted process. Still, I may have to give it a try.

Re: Skyrim keeping crashing and a hard reset is needed

Postby LE_746F6D617A7A69 » 2020-07-05 09:39

You should also consider a problem with Proton.
Proton is based on WINE, and WINE is known to have regressions where only selected games are crashing.
Maybe You should report a bug to Valve?

Re: Skyrim keeping crashing and a hard reset is needed

Postby Lysander » 2020-07-05 12:43

Thanks for all your suggestions with this, LE_746F6D617A7A69, but I have decided to put this issue to bed. Last night I got a crash just looking at the main map, so as far as I'm concerned, that's it for the 5870 with this game. It's the 4870 or a new/other second hand card. I've spent enough time trying to diagnose the issue and it's just turning into a wild goose chase.

This post from this thread was particularly telling:

These black screen hard crashes have been a known issue for a considerable amount of people since release. Some people report newer/later drivers and patches fixed it for them, others had hardware faults, while others still have no solution. It's been reported by both ATI and Nvidia users with a very wide range of cards and manufacturers.

What was reasonably determined over about 100 pages of forum threads was that Skyrim is calling the video driver to do *something*, and the driver does not know how to handle the request, so it tries to shut down and restart. That would normally just cause a CTD, but for some reason the driver is unable to restart, causing your whole system to hang until it is hard rebooted. It could be an issue with Skyrim's code, or it could be an issue with the physical hardware, or it could be some obscure driver issue. There's unfortunately no real way to tell outside of a lab setting with some sort of external debugging setup logging the game's calls to the video card when it hangs. As far as I know Bethesda has made no direct acknowledgment of the issue; all we can really hope is that the mysterious "code optimizations" in the patch notes inadvertently solve the issue.

I suffered from the same issue at release, and decided to just shelve the game entirely for a good year so they can hopefully patch up all the awful bugs and the Modding community can take care of the rest. Hopefully the patches or a whole new system eventually fixes the issue.


It would take even more time to diagnose further since people can have a whole host of different reasons for this happening. I often see a poster in a forum jubilantly saying, "solution X fixed it!!!!" only to come back in a few days to say the 'solution' actually hasn't fixed the issue. I'll just continue with the older card.

