Corrupt filesystem - blame the SSD?

Postby dwasi » 2019-11-27 17:13

Your opinions, please.

I have a Stretch system running headless as a server. /dev/sda is a fairly new SSD (less than 6 mo in service). /dev/sda1 is an ext4 partition mounted at / with errors=remount-ro.

Recently I found the partition mounted read-only. e2fsck reported mild filesystem corruption, mainly orphaned inodes. e2fsck was able to fix the partition, and as far as I can tell it was entirely successful; at least I was not able to find any important munged files, and the partition now reports clean. SMART extended test reports the SSD entirely ok. But I replaced the SSD anyway as a precaution, as this is a primary production server.
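For anyone who wants to catch an errors=remount-ro event sooner on a headless box, here is a minimal sketch of a check that could run from cron. It only parses text in the /proc/mounts format; the function names and the sample line are made up for illustration and are not from my system.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not listed."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")  # a later entry overrides an earlier one
    return found


def is_read_only(mounts_text, mount_point="/"):
    """True if mount_point is listed and carries the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


# Illustrative sample only (hypothetical, not captured from the server).
SAMPLE = "/dev/sda1 / ext4 ro,relatime,errors=remount-ro 0 0\n"
root_is_ro = is_read_only(SAMPLE)
print("/ read-only:", root_is_ro)
```

On a real Linux system the text would come from reading /proc/mounts instead of SAMPLE.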

So now I have this nearly-new SSD, and I'm trying to decide whether to blame it for the problem and discard it, or just put it down to cosmic rays or something and use the drive somewhere else. I can't warranty it because I can't prove there's anything wrong with it.

None of the common causes of filesystem corruption apply here. There was no disorderly shutdown. Power is regulated, so no spikes. The server hardware is enterprise, not consumer.

In this situation, would you first suspect the drive, or something else?
dwasi
 
Posts: 3
Joined: 2019-11-27 16:57

Re: Corrupt filesystem - blame the SSD?

Postby Bloom » 2019-11-27 17:35

This looks more like a spontaneous reboot of your server. Is that possible? For instance, doesn't it have a UPS and so a power interruption would reset or power-off the system?
Bloom
 
Posts: 194
Joined: 2017-11-11 12:23

Re: Corrupt filesystem - blame the SSD?

Postby dwasi » 2019-11-27 17:42

Bloom wrote:This looks more like a spontaneous reboot of your server. Is that possible? For instance, doesn't it have a UPS and so a power interruption would reset or power-off the system?

Unlikely. It is on a UPS. Still possible that it spontaneously rebooted, I suppose, but not from power loss. Maybe an internal power supply issue could cause that, but it seems unlikely.

Also, there is another server on the same circuit and that one did not reboot.
dwasi
 
Posts: 3
Joined: 2019-11-27 16:57

Re: Corrupt filesystem - blame the SSD?

Postby pendrachken » 2019-11-28 00:58

It's possible that the drive is fine, but the cable is bad or wasn't completely plugged into the port. Cabling is cheap enough to toss (into the emergency spares bin) when replacing a drive anyway, and a new cable hasn't been heat cycled / cooked in the server. I'd replace the cable too, every time you replace a questionable drive.

Crazy shit can happen with bad cables, especially on enterprise hardware. I had one server that wouldn't cold boot (wouldn't even POST) unless you banged it on the side in a very specific spot, but would warm boot all day long without being touched. I finally traced the problem to a bad EIDE cable, though I have no idea why it wouldn't make it to POST on a cold boot unless the cable was in a specific position. After I replaced the cable the damn thing would cold boot every time.
fortune -o
Your love life will be... interesting.
:twisted: How did it know?

The U.S. uses the metric system too, we have tenths, hundredths and thousandths of inches :-P
pendrachken
 
Posts: 1355
Joined: 2007-03-04 21:10
Location: U.S.A. - WI.

Re: Corrupt filesystem - blame the SSD?

Postby dwasi » 2019-12-02 14:49

After some research I figured out that there actually is something weird about what SMART reports for this drive. SMART reports only 2675 power-on hours, and 622 GiB lifetime writes, but the value for SSD Life Left was down to 2%. That doesn't seem right, so I started an RMA with the manufacturer.
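To put those two figures in perspective, here is the back-of-the-envelope arithmetic as a small sketch. The variable names are mine; the only inputs are the power-on hours and lifetime writes quoted above, and it assumes the drive was written at a roughly even rate.

```python
# Figures as reported by SMART in the post above.
power_on_hours = 2675
lifetime_writes_gib = 622

# Convert power-on time to days, then get an average daily write volume.
power_on_days = power_on_hours / 24
writes_per_day_gib = lifetime_writes_gib / power_on_days

print(f"Power-on time: {power_on_days:.1f} days")
print(f"Average writes: {writes_per_day_gib:.2f} GiB/day")
```

That works out to under four months of power-on time and only a few GiB written per day on average, which is a light workload to pair with 2% life left.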

It took me a while to pick up on that because SSD_Life_Left is attribute 231, and smartmontools reports attribute 231 as Temperature_Celsius for any drive not in its database; plus I wasn't sure whether I should be looking at VALUE (002) or RAW_VALUE (98). Eventually I learned that for this attribute, VALUE is the normalized value and holds the actual figure, i.e. 2% life left.
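In case it helps someone else, here is a rough sketch of pulling the normalized VALUE and the RAW_VALUE for an attribute out of the attribute table that smartctl -A prints. The parser and its names are my own, and the sample row is hypothetical apart from VALUE 002 and RAW_VALUE 98, which are the numbers from my drive; the FLAG, WORST, THRESH and other columns are placeholders.

```python
def parse_smart_attributes(text):
    """Map attribute ID -> dict with name, normalized value, and raw value."""
    attrs = {}
    for line in text.splitlines():
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header row and any non-attribute lines
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),   # normalized VALUE, e.g. "002" -> 2
            "raw": fields[9].strip(),  # RAW_VALUE kept as text
        }
    return attrs


# Hypothetical sample; only VALUE (002) and RAW_VALUE (98) come from the post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
231 Temperature_Celsius     0x0013   002   002   000    Pre-fail  Always       -       98
"""

attrs = parse_smart_attributes(SAMPLE)
life = attrs[231]
print(f"Attribute 231 ({life['name']}): VALUE={life['value']} RAW={life['raw']}")
```

Note that the label still reads Temperature_Celsius here, which is exactly the mislabeling that threw me off; the ID is what matters.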
dwasi
 
Posts: 3
Joined: 2019-11-27 16:57

