smart disk help

lageotakes
Posts: 2
Joined: 2017-12-14 15:27

smart disk help

#1 Post by lageotakes »

I am trying to get SMART to monitor my hard disks. I want to make sure it is enabled at boot and runs the tests.

I have come across the following, but I am not sure whether it will still be enabled after I reboot the server.

I typed this in a terminal and got the following message for each hard drive (sda, sdb, and sdc):

root@server:~# smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda
root@server:~# smartctl --smart=on --offlineauto=on --saveauto=on /dev/sdb
root@server:~# smartctl --smart=on --offlineauto=on --saveauto=on /dev/sdc

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-4-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Enabled.
SMART Automatic Offline Testing Enabled every four hours.

Questions
1. Does the above command take effect automatically after a reboot, or does it need to go into a startup script somewhere?

I added/uncommented the following line in /etc/smartd.conf:
DEVICESCAN -m myemail@mydomain.com -M exec /usr/share/smartmontools/smartd-runner
(FYI, I tested the above with -M test to make sure email works. It works fine.)

I also uncommented "start_smartd=yes" in /etc/default/smartmontools.

I believe SMART provides three basic categories of testing:
a. Online
b. Offline
c. Self-test (which I think includes short and long)

Items a. and b. are taken care of by the first command listed above, but I am not sure whether that command stays in effect after a reboot.

2. How can I tell which tests are going to run?

3. How do I schedule the self-tests?

Thanks in advance. Running latest Debian stable.

bw123
Posts: 4015
Joined: 2011-05-09 06:02
Has thanked: 1 time
Been thanked: 28 times

Re: smart disk help

#2 Post by bw123 »

The answer to 1 is probably "it depends on your disk and BIOS"; on some machines you can turn SMART on/off.
The answer to 2 is that there won't be a test until you run one; use -t long or -t short to choose which to run.
The answer to 3 is, I'm not sure. I don't do this, but I don't have a server that is up all the time. You could schedule by date: maybe once a day or week for the short test, and once every month or two for the long test?

You can use smartctl -c to see what's going on. Some disks don't do automatic offline testing at all; I'm not sure how the data gets collected in that case.

man smartctl has a LOT of info, so if my answers are wrong, it's because I haven't read it closely enough.
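A minimal sketch of starting the self-tests described above on several disks at once, assuming the drives are /dev/sda, /dev/sdb and /dev/sdc as in the first post. The function name start_self_tests and the SMARTCTL override variable are hypothetical, added only so the loop can be dry-run without touching the disks.

```shell
#!/bin/sh
# Hypothetical override hook: set SMARTCTL=echo to print the commands instead of running them.
SMARTCTL=${SMARTCTL:-smartctl}

# Usage: start_self_tests short|long DEVICE...
start_self_tests() {
    test_type=$1   # "short" or "long", passed straight to smartctl -t
    shift
    for dev in "$@"; do
        # smartctl returns right away; the drive runs the test in the background
        "$SMARTCTL" -t "$test_type" "$dev" || echo "could not start $test_type test on $dev" >&2
    done
}
```

For example, run `start_self_tests short /dev/sda /dev/sdb /dev/sdc` as root. `smartctl -c /dev/sda` shows the drive's recommended polling time for each test type, and `smartctl -l selftest /dev/sda` shows the results once the test has finished.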
resigned by AI ChatGPT

Segfault
Posts: 993
Joined: 2005-09-24 12:24
Has thanked: 5 times
Been thanked: 17 times

Re: smart disk help

#3 Post by Segfault »

To answer the last question, you need to configure and run the smartd daemon. You can also set it up to email the results to you.
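A sketch of what that could look like for the setup in the first post, assuming the Debian smartmontools package with smartd started at boot. Because smartd reads /etc/smartd.conf every time it starts, the -o on and -S on directives re-apply automatic offline testing and attribute autosave at each boot, so the smartctl commands from the first post should not need a separate startup script. The -s schedule below (short test daily at 02:00, long test on the 1st of each month at 03:00) is only an example; the fields are T/MM/DD/d/HH, with "." matching anything.

```
# /etc/smartd.conf (sketch): extends the DEVICESCAN line from the first post
#   -a      default monitoring: health status, attributes, error and self-test logs
#   -o on   enable automatic offline testing whenever smartd starts
#   -S on   enable attribute autosave whenever smartd starts
#   -s ...  short self-test every day at 02:00, long self-test on the 1st of each month at 03:00
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../01/./03) -m myemail@mydomain.com -M exec /usr/share/smartmontools/smartd-runner
```

To check which tests a schedule like this would actually start, `smartd -q showtests` should list the planned self-tests and exit without running them; see man smartd for the exact behaviour of the installed version.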

bw123
Posts: 4015
Joined: 2011-05-09 06:02
Has thanked: 1 time
Been thanked: 28 times

Re: smart disk help

#4 Post by bw123 »

BTW, I've killed more than a few disks, and SMART never predicted a single one of those failures. It's a cool thing and I use it, but I wouldn't count on it.

Segfault
Posts: 993
Joined: 2005-09-24 12:24
Has thanked: 5 times
Been thanked: 17 times

Re: smart disk help

#5 Post by Segfault »

That's true. Any hard drive can have "the click of death" without any warning. smartd can warn when bad sectors start to appear.

bw123
Posts: 4015
Joined: 2011-05-09 06:02
Has thanked: 1 time
Been thanked: 28 times

Re: smart disk help

#6 Post by bw123 »

Segfault wrote: That's true. Any hard drive can have "the click of death" without any warning. smartd can warn when bad sectors start to appear.
What exactly are the ones to look for? CRC errors, pending sector count? All my drives have slightly different terminology and SMART setups, which makes it confusing. I have an SSD with a "life curve status" of something like 298324692586928, and I have no clue what it means. I haven't found any help from SanDisk. I guess anything that changes often would be bad, except temperature.

Segfault
Posts: 993
Joined: 2005-09-24 12:24
Has thanked: 5 times
Been thanked: 17 times

Re: smart disk help

#7 Post by Segfault »

Yes, SSDs may have attributes unknown to smartctl. For my WD hard drives I run this:

Code:

smartctl --all /dev/sd$1 | grep -e "Reallocated_Sector_Ct" -e "Current_Pending_Sector" -e "Offline_Uncorrectable" -e "UDMA_CRC_Error_Count" -e "Hardware_ECC_Recovered"
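A sketch of turning that grep into a yes/no check, assuming the usual ten-column `smartctl -A` attribute table with RAW_VALUE as the last column. The function name check_smart_attrs is hypothetical; it flags a drive when the reallocated, pending, offline-uncorrectable or CRC counter has a non-zero raw value. Hardware_ECC_Recovered is left out because its raw value is vendor-specific and often large on healthy drives.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints every
# watched attribute whose RAW_VALUE (10th column) is non-zero.
# Exit status: 1 if anything was printed, 0 if all watched counters are zero.
check_smart_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct"  ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable"  ||
        $2 == "UDMA_CRC_Error_Count" {
            if ($10 + 0 > 0) { print $2 "=" $10; bad = 1 }
        }
        END { exit bad + 0 }
    '
}
```

For example: `smartctl -A /dev/sda | check_smart_attrs || echo "/dev/sda needs a closer look"`. Wrapping that in a loop over all disks gives a quick per-drive summary.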

lageotakes
Posts: 2
Joined: 2017-12-14 15:27

Re: smart disk help

#8 Post by lageotakes »

Segfault wrote: Yes, SSDs may have attributes unknown to smartctl. For my WD hard drives I run this:

Code:

smartctl --all /dev/sd$1 | grep -e "Reallocated_Sector_Ct" -e "Current_Pending_Sector" -e "Offline_Uncorrectable" -e "UDMA_CRC_Error_Count" -e "Hardware_ECC_Recovered"

Where do you put this? In a startup script? Basically, I want the following to run, but I don't know whether I need to put it in a startup script:
smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda

steve_v
df -h | grep > 20TiB
Posts: 1400
Joined: 2012-10-06 05:31
Location: /dev/chair
Has thanked: 79 times
Been thanked: 175 times

Re: smart disk help

#9 Post by steve_v »

That command is something you would run manually to check those attributes on a disk; I have something similar in an alias that prints a quick summary of all 12 drives in my fileserver.
For unattended monitoring, testing and email alerts, you want to configure smartd. /etc/smartd.conf is well commented.

For reference, my (excessive I'm sure) smartd.conf contains the line:

Code:

DEVICESCAN -a -d sat -n standby -o on -S on -s (S/../.././02|L/../../6/03) -W 5,50,60 -C 197+ -U 198+ -I 194 -I 231 -I 9 -I 190 -I 189 -m root -M exec /usr/share/smartmontools/smartd-runner
It's been that way for years, so I'd have to read the manual again to explain what it all means... IIRC the "S/../.././02|L/../../6/03" part schedules the tests.
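For reference, here is one reading of each directive in that line, based on the smartd.conf man page. Treat the comments as a sketch and check them against man smartd.conf for the installed version.

```
# Annotated copy of the DEVICESCAN line above
#   -a                 default checks: health, attributes, error and self-test logs
#   -d sat             treat each device as SATA behind a SCSI-to-ATA translation layer
#   -n standby         skip a check while the disk is in standby/sleep, so smartd does not spin it up
#   -o on  -S on       enable automatic offline testing and attribute autosave at smartd start-up
#   -s (...)           short self-test daily at 02:00, long self-test every Saturday (day 6) at 03:00
#   -W 5,50,60         report 5-degree temperature changes, log at 50 C, warn at 60 C
#   -C 197+  -U 198+   warn only when the pending / offline-uncorrectable sector counts increase
#   -I 194 ... -I 189  ignore changes in the normalised values of these attributes
#   -m root  -M exec   send warnings to root through Debian's smartd-runner script
DEVICESCAN -a -d sat -n standby -o on -S on -s (S/../.././02|L/../../6/03) -W 5,50,60 -C 197+ -U 198+ -I 194 -I 231 -I 9 -I 190 -I 189 -m root -M exec /usr/share/smartmontools/smartd-runner
```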

IMO, the attributes Segfault mentions are the only ones really worth monitoring. SMART is handy for detecting bad media (and possibly spindle motor / bearing issues if you monitor spin retry too), but it won't give you any warning of a sudden electronics failure. IME most drives die of the latter.
Once is happenstance. Twice is coincidence. Three times is enemy action. Four times is Official GNOME Policy.
