I/O Errors using TAR to LTO-5 tape AFTER CLEANING


Postby w4kh » 2019-11-05 16:22

This question involves some "Old School" activity, but as an old dude, I think I'm entitled to ask... And, this ran fine three times before upgrading to Buster and twice since.
My system configuration:
Code:
Linux BigMutt 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u1 (2019-07-19) x86_64 GNU/Linux
Motherboard: Gigabyte 970A-D3P
CPU: AMD FX-8350 8-Core Processor @4000.000 MHz
cache: 2048 KB
RAM: 32GB (4x8GB)  Unbuffered (Unregistered)
HP EH957SB StorageWorks LTO-5 Ultrium 3000 SAS Internal Tape Drive
QUANTUM LTO 5 TAPE CARTRIDGE (MR-L5MQN-01)
Video: GeForce 8400 GS
Monitor: VIZIO E320VA

I clean the tape drive with a cleaning cartridge once a month, and at this point I run tar only seven times a month. The LTO-5 tape cartridges have only been run through the drive 6-10 times...
Is the error with the cartridge, the LTO-5 (SAS) drive, or something else? How do I find the exact cause of an I/O error that is initially reported simply as:
Code:
TAPE=/dev/nst0
tar --create --file $TAPE --verbose --totals ./*
./2019-07-21_SDA1.img
./2019-07-21_SDA2.img
Total bytes written: 39025121280 (37GiB, ?/s)
tar: /dev/nst0: cannot write: Input/output error
tar: /dev/nst0: cannot close: Input/output error
tar: Error is not recoverable: exiting now
/bin/mt: /dev/nst0: rmtopen failed: No such file or directory

I looked in syslog for clues, and it says:
Code:
Jul 27 09:18:17 BigMutt kernel: [39221.766994] st 6:0:3:0: [st0] Block limits 1 - 16777215 bytes.
[39576.563080] st 6:0:3:0: device_block, handle(0x0009)
[39576.563205] st 6:0:3:0: [st0] Error e0000 (driver bt 0x0, host bt 0xe).
[39578.062876] st 6:0:3:0: device_unblock and setting to running, handle(0x0009)
[39578.062963] st 6:0:3:0: [st0] Error 10000 (driver bt 0x0, host bt 0x1).
[39578.062966] st 6:0:3:0: [st0] Error on write filemark.
[39578.064281] mpt2sas_cm0: removing handle(0x0009), sas_addr(0x500110a001622ed0)
[39578.064283] mpt2sas_cm0: enclosure logical id(0x500605b00341cef0), slot(0)
[39582.825144] scsi 6:0:4:0: Sequential-Access HP       Ultrium 5-SCSI   Z6ED PQ: 0 ANSI: 6
[39582.825152] scsi 6:0:4:0: SSP: handle(0x0009), sas_addr(0x500110a001622ed0), phy(3), device_name(0x500110a001622ed2)
[39582.825153] scsi 6:0:4:0: enclosure logical id (0x500605b00341cef0), slot(0)
[39582.827036] scsi 6:0:4:0: TLR Enabled
[39582.829132] st 6:0:4:0: Attached scsi tape st0
[39582.829134] st 6:0:4:0: st0: try direct i/o: yes (alignment 4 B)
[39582.829207] st 6:0:4:0: Attached scsi generic sg2 type 1
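
Those two codes are not drive-specific error codes. In 4.19-era kernels the st driver prints the whole SCSI result word in hex, and the "host bt" part is the host byte, which is filled in on the host side (the HBA driver and SCSI transport layer) rather than by the tape drive. A minimal sketch that splits the word and names the host byte, assuming the layout and DID_* values of include/scsi/scsi.h from that kernel generation (only a subset is listed; decode_st_result is a hypothetical helper):

```shell
# Decode the hex result printed by st, e.g. "[st0] Error e0000".
# Assumed layout (Linux SCSI midlayer, 4.19-era): bits 24-31 driver byte,
# bits 16-23 host byte, bits 8-15 message byte (ignored here), bits 0-7 status byte.
decode_st_result() {
    res=$((0x$1))
    driver=$(( (res >> 24) & 0xff ))
    host=$(( (res >> 16) & 0xff ))
    status=$(( res & 0xff ))
    case $host in
        0)  name=DID_OK ;;
        1)  name=DID_NO_CONNECT ;;
        2)  name=DID_BUS_BUSY ;;
        3)  name=DID_TIME_OUT ;;
        7)  name=DID_ERROR ;;
        8)  name=DID_RESET ;;
        14) name=DID_TRANSPORT_DISRUPTED ;;
        15) name=DID_TRANSPORT_FAILFAST ;;
        *)  name=other ;;
    esac
    printf 'driver=0x%x host=0x%x (%s) status=0x%x\n' "$driver" "$host" "$name" "$status"
}

decode_st_result e0000   # first error in the log
decode_st_result 10000   # error reported on the filemark write
```

Read that way, e0000 is host byte 0x0e, DID_TRANSPORT_DISRUPTED, and 10000 is host byte 0x01, DID_NO_CONNECT. Together with the mpt2sas lines that follow (the device handle is removed and the drive comes back as a new target, 6:0:4:0 instead of 6:0:3:0), this suggests the SAS link to the drive dropped in the middle of the write. The log carries no sense data such as a MEDIUM ERROR, so on this evidence alone the cable, connectors, HBA port, drive interface or drive power look like more likely suspects than the cartridge, although it does not rule out the drive itself.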

I tried looking for "Error Codes" in the support pages, but nothing came up for either "000E0000" or "00010000".
I can put a different LTO-5 cartridge in the drive and try again, but it would be nice not to overwrite good backups already on tape, and more to the point, I want to find the CAUSE of the error so I can fix it.
Code:
# mt -f /dev/nst0 status
drive type = 114
drive status = 1476395008
sense key error = 0
residue count = 0
file number = 0
block number = 0

While the results show a blank tape, that is only because I erased the failed backup on that cartridge, hence the zeros. I need to know how to decode "drive status = 1476395008".
Also, I ran a second command, shown below. I edited out the entries pertaining to "office" and a spreadsheet I had open, which noted a large memory block of almost 4 GB, but nothing except the backup was running during the failed attempt, and the system had been freshly booted beforehand:
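
Assuming /bin/mt is GNU mt (the rmtopen message above comes from GNU mt), its "drive status" line prints the mt_dsreg field of struct mtget from <linux/mtio.h>. The Linux st driver packs the density code into the top 8 bits of that field and the current block size into the low 24 bits. A small sketch that splits it (decode_dsreg is a hypothetical helper, and only a few LTO density codes are listed):

```shell
# Split the "drive status" value (mt_dsreg) the way the Linux st driver
# fills it: bits 24-31 = density code, bits 0-23 = block size
# (block size 0 means variable-block mode).
decode_dsreg() {
    dsreg=$1
    density=$(( (dsreg >> 24) & 0xff ))
    blksize=$(( dsreg & 0xffffff ))
    case $density in
        64) dname=LTO-1 ;;   # 0x40
        66) dname=LTO-2 ;;   # 0x42
        68) dname=LTO-3 ;;   # 0x44
        70) dname=LTO-4 ;;   # 0x46
        88) dname=LTO-5 ;;   # 0x58
        90) dname=LTO-6 ;;   # 0x5a
        *)  dname=unknown ;;
    esac
    printf 'density=0x%02x (%s) blocksize=%d\n' "$density" "$dname" "$blksize"
}

decode_dsreg 1476395008
```

For 1476395008 (0x58000000) this prints density=0x58 (LTO-5) blocksize=0: the drive reports LTO-5 density and variable-block mode, so the number describes drive settings rather than an error. In the same output, drive type = 114 is 0x72, which <linux/mtio.h> defines as MT_ISSCSI2 (generic SCSI-2 tape). The mt from the mt-st package prints these fields already decoded.
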
Code:
dmesg
st 6:0:3:0: [st0] Block limits 1 - 16777215 bytes.
[39576.563080] st 6:0:3:0: device_block, handle(0x0009)
[39576.563205] st 6:0:3:0: [st0] Error e0000 (driver bt 0x0, host bt 0xe).
[39578.062876] st 6:0:3:0: device_unblock and setting to running, handle(0x0009)
[39578.062963] st 6:0:3:0: [st0] Error 10000 (driver bt 0x0, host bt 0x1).
[39578.062966] st 6:0:3:0: [st0] Error on write filemark.
[39578.064281] mpt2sas_cm0: removing handle(0x0009), sas_addr(0x500110a001622ed0)
[39578.064283] mpt2sas_cm0: enclosure logical id(0x500605b00341cef0), slot(0)
[39582.825144] scsi 6:0:4:0: Sequential-Access HP       Ultrium 5-SCSI   Z6ED PQ: 0 ANSI: 6
[39582.825152] scsi 6:0:4:0: SSP: handle(0x0009), sas_addr(0x500110a001622ed0), phy(3), device_name(0x500110a001622ed2)
[39582.825153] scsi 6:0:4:0: enclosure logical id (0x500605b00341cef0), slot(0)
[39582.827036] scsi 6:0:4:0: TLR Enabled
[39582.829132] st 6:0:4:0: Attached scsi tape st0
[39582.829134] st 6:0:4:0: st0: try direct i/o: yes (alignment 4 B)
[39582.829207] st 6:0:4:0: Attached scsi generic sg2 type 1
[122176.617444] st 6:0:4:0: [st0] Block limits 1 - 16777215 bytes

I can easily replace a bad tape cartridge, but the drive costs 15 or more times as much as a cartridge, so I'll have to find a workaround unless I can identify the drive as the problem and swing a new one. At this point I don't really know whether it is a cartridge (easy fix) or a drive issue, so throwing money at this isn't a good solution (for me, since I am old and retired)...

My issue seems very simple - to me, that is...
I use "partclone.[vfat|ext4] -c -s /dev/sda1|4 -o $OUTF" or "dd if=/dev/sda2 of=$OUTF conv=sparse,sync,noerror bs=4096" to create an image of disk partitions, and then use "tar -cvf /dev/st0 /backup_images" to write the images to LTO-5 tape.
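
One way to make each run show immediately whether the tape holds what was written is to read the archive back right after writing it. A minimal sketch, assuming the images sit in /backup_images and using the non-rewinding /dev/nst0 so that explicit mt rewinds control positioning (/dev/st0 rewinds automatically on close); backup_to_tape, rewind_if_tape, TAPE and SRC are names made up for this example:

```shell
# Hypothetical write-then-verify wrapper for the tar step.
TAPE=${TAPE:-/dev/nst0}     # non-rewinding tape device
SRC=${SRC:-/backup_images}  # directory holding the partclone/dd images

# mt only applies to a tape character device; skipping it for a plain file
# lets the function be tried out against an ordinary archive file.
rewind_if_tape() {
    if [ -c "$TAPE" ]; then
        mt -f "$TAPE" rewind
    fi
}

backup_to_tape() {
    rewind_if_tape || return 1
    tar --create --file "$TAPE" --totals -C "$SRC" . || return 1
    rewind_if_tape || return 1
    # --compare (-d) reads the archive back and checks each member against
    # the file on disk; tar exits non-zero if anything differs or fails to read.
    tar --compare --file "$TAPE" -C "$SRC" || return 1
}
```

Called as backup_to_tape after the partclone/dd step, a failure in either the write or the read-back stops with a non-zero status, which is the moment to save the kernel log lines from the same minute.
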

I am trying to determine whether the errors that occur during "tar" are related to the drive, the cartridge, or the cable (some of my searching found instances of a cable fault causing an LTO write to fail)... I can replace the cable or the cartridges, but randomly substituting new parts for older ones in hopes of having it work violates EVERYTHING I have learned in 60+ years of programming and using computers (yes, I am old, and the first computer I wrote a program for, and was paid for it, was a hybrid discrete solid-state and vacuum-tube machine!). I want to know WHAT is causing the error and in WHICH piece of equipment, so I can develop a solution that doesn't involve random pecking in hopes of finding a seed.

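To separate the link from the drive and the media without swapping parts, counters can be read before and after a backup run. The Linux SAS transport class exposes per-PHY link error counters in sysfs; counters that climb during a run point at the link (cable, connectors, HBA port) rather than the cartridge. A sketch, assuming the standard /sys/class/sas_phy layout (show_phy_errors is a hypothetical helper; the directory is a parameter so it can be pointed at a copy):

```shell
# Print the link error counters of every SAS PHY under the given directory
# (default: /sys/class/sas_phy). Missing or unreadable counters are skipped.
show_phy_errors() {
    base=${1:-/sys/class/sas_phy}
    for phy in "$base"/phy-*; do
        [ -d "$phy" ] || continue
        printf '%s:' "${phy##*/}"
        for c in invalid_dword_count running_disparity_error_count \
                 loss_of_dword_sync_count phy_reset_problem_count; do
            if [ -r "$phy/$c" ]; then
                printf ' %s=%s' "$c" "$(cat "$phy/$c")"
            fi
        done
        printf '\n'
    done
}
```

Run show_phy_errors before and after a backup and compare the numbers. For the drive's own view, sg_logs from the sg3_utils package reads its log pages, including TapeAlert and write error counters, through the SCSI generic node (sg2 in the log above), for example sg_logs -a /dev/sg2. A drive or media fault would be expected to show up there, while a pure link fault typically would not.
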
4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11)
MB: Gigabyte 970A-D3P
CPU: AMD FX-8350 @4000.000 MHz cache: 2048 KB
RAM: 32GB (4x8GB) Unbuffered/Unregistered
LTO-5 SAS Tape on LSI SAS9211 controller
Video: GeForce 8400 GS to VIZIO E320VA
w4kh
 
Posts: 83
Joined: 2006-09-09 19:10
Location: Tennessee, USA

Re: I/O Errors using TAR to LTO-5 tape AFTER CLEANING

Postby trinidad » 2019-11-06 14:04

I've followed both of your posts on this subject. A couple of things jump out at me. If SSL did not report a clearer explanation of the code, then there isn't one. 000E0000 and 000e0000 can express a variety of issues: startpoint, mount point, drive offset errors, inaccessible or spurious RAM sectors, API extension errors, an almost endless list of designations. You must realize that not many people would be running your hardware with Debian 10. Because of your I/O errors, RAM permissions and/or RAM failure seem likely, which could also be linked to the board's performance itself if you were dealing with an all-disk setup. However, because you are using a tape drive, several other things can randomly occur, not the least of which are random sector misalignments during copying, and thus startpoint/mountpoint errors. My first question is: do you have all the OEM Ultrium utilities installed and available to you? Secondly, is any firmware up to date and unbroken? If you do have the utilities, you should use them first to completely check the tape drive regularly, preferably right after cooking it a while. You might have better luck with your question on StackExchange. It's difficult for people to address such a question when they have no access to your particular hardware situation. I do wish you good luck, and hope you identify the issue.
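
On the firmware question, the kernel log already shows the installed revision: in the "scsi 6:0:4:0: Sequential-Access HP Ultrium 5-SCSI Z6ED" line, Z6ED is the revision field of the drive's INQUIRY data. The same vendor, model and rev fields can be read from sysfs. A sketch, assuming the st0 class device links to its SCSI device through .../device as on current kernels (tape_firmware is a hypothetical helper):

```shell
# Print vendor, model and firmware revision from the SCSI device's sysfs
# attributes (default: the device behind st0).
tape_firmware() {
    dev=${1:-/sys/class/scsi_tape/st0/device}
    for f in vendor model rev; do
        val=
        # read with the default IFS drops the space padding of INQUIRY fields
        if [ -r "$dev/$f" ]; then
            read -r val < "$dev/$f"
        fi
        printf '%s=%s\n' "$f" "$val"
    done
}
```

Whether Z6ED is the latest release has to be checked against HP's firmware listings for this drive.
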

TC
You can't believe your eyes if your imagination is out of focus.
trinidad
 
Posts: 76
Joined: 2016-08-04 14:58

Re: I/O Errors using TAR to LTO-5 tape AFTER CLEANING

Postby w4kh » 2019-11-06 19:05

Thank you, TC

I will do some more research, looking for Ultrium utilities and firmware updates...
Then I'll check StackExchange...

I appreciate the leads...

And, as an old timer with loads of mainframe time, I did choose tape as my local backup medium because I can do an overnight backup daily (or on some other schedule that I choose) to a medium that is easy to store off-site or in a fire-proof safe. Tape cartridges cost much less than disks of comparable capacity, and their physical size makes storage much easier.