File Size comparing enough to ensure copy ok or we need MD5?

debian121212
Posts: 80
Joined: 2019-01-03 01:34

File Size comparing enough to ensure copy ok or we need MD5?

#1 Post by debian121212 »

I'm copying large amounts of data from my cell phone to my PC and have experienced data corruption before. I'd like to make sure that the copied files will open correctly in the future and experience zero corruption.

Is it necessary to generate and check MD5 checksums for every file, or is just checking the file size enough?

For example:
I copy everything from the android source folder into the destination pc folder.
How do I make sure the copied files will always open fine with no issues in the future?

I figured generating an MD5 checksum with md5deep would be correct, however it seems a little overkill and I'd like to know if it's really worth it. Shouldn't simply checking the file size do the trick if the files open correctly from the source? MD5, or just exact file sizes in bytes?
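
To make the size-versus-checksum question concrete, here is a minimal Python sketch (an illustration, not md5deep itself). It writes a file, writes a second file with one byte flipped, and compares the two by size and by MD5. The helper names (`file_md5`, `same_size`, `same_content`, `demo`) and the file names are hypothetical. What it is meant to show: a matching byte count only confirms the length, while a digest depends on the content.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_size(path_a, path_b):
    """True when both files have the same byte count."""
    return os.path.getsize(path_a) == os.path.getsize(path_b)


def same_content(path_a, path_b):
    """True when both files have the same size and the same MD5 digest."""
    return same_size(path_a, path_b) and file_md5(path_a) == file_md5(path_b)


def demo():
    """Build an original and a 'copy' with one flipped byte, then compare them."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "photo.bin")
        damaged = os.path.join(workdir, "photo_copy.bin")
        payload = bytes(range(256)) * 64
        with open(original, "wb") as handle:
            handle.write(payload)
        corrupted = bytearray(payload)
        corrupted[1000] ^= 0xFF  # flip one byte; the length is unchanged
        with open(damaged, "wb") as handle:
            handle.write(bytes(corrupted))
        return same_size(original, damaged), same_content(original, damaged)


if __name__ == "__main__":
    size_ok, content_ok = demo()
    print("sizes match:", size_ok)
    print("contents match:", content_ok)
```

In this sketch the size check passes while the content check fails, which is the case a byte count alone cannot catch.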

cuckooflew
Posts: 677
Joined: 2018-05-10 19:34
Location: Some where out west
Been thanked: 1 time

Re: File Size comparing enough to ensure copy ok or we need

#2 Post by cuckooflew »

How do you copy ? Since you do not bother to tell us what method or command you use, maybe you are already using this:

Code: Select all

# -r recurse, -n dry run, -c compare by checksum, -i itemize any differences
rsync -rnci original-dir/ copied-dir/

If not, then perhaps some other variation of the 'rsync' command; see 'man rsync'. There are also many other ways to copy or transfer large amounts of data. You might also look at the 'diff' command, which could help you. From 'man diff':
DESCRIPTION
The diff utility compares the contents of file1 and file2 and writes to
the standard output the list of changes necessary to convert one file
into the other. No output is produced if the files are identical
The key words:

Code: Select all

on Linux, File Size comparing enough to ensure copy ok  
Copy/pasted into a search engine, those will give you results that go into detail. I just "scratched" the surface here; there are many ways to transfer files and data from one device to another on Linux. I generally use the above 'rsync' command, but at times another command is needed.
One of many detailed instructions : https://www.networkworld.com/article/31 ... -unix.html
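
For readers who want to see what a checksum-based directory comparison involves, the following Python sketch walks two directory trees and reports files that are missing, extra, or different in content. It only approximates the idea behind a recursive checksum comparison and is not a reimplementation of rsync or diff. The function names and the demo files are hypothetical, and SHA-256 is an arbitrary choice of digest.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(source_root, copy_root):
    """Return (missing, extra, different) as sorted lists of relative paths."""
    source_files = relative_files(source_root)
    copy_files = relative_files(copy_root)
    missing = sorted(source_files - copy_files)
    extra = sorted(copy_files - source_files)
    different = sorted(
        rel
        for rel in source_files & copy_files
        if file_digest(os.path.join(source_root, rel))
        != file_digest(os.path.join(copy_root, rel))
    )
    return missing, extra, different


def write_file(path, data):
    """Write bytes to path (small helper for the demo)."""
    with open(path, "wb") as handle:
        handle.write(data)


def demo():
    """Compare a source tree with a copy that has one missing and one altered file."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "original-dir")
        dst = os.path.join(workdir, "copied-dir")
        os.makedirs(os.path.join(src, "sub"))
        os.makedirs(os.path.join(dst, "sub"))
        write_file(os.path.join(src, "a.txt"), b"alpha")
        write_file(os.path.join(src, "sub", "b.txt"), b"bravo")
        write_file(os.path.join(src, "c.txt"), b"charlie")
        write_file(os.path.join(dst, "a.txt"), b"alpha")        # identical
        write_file(os.path.join(dst, "sub", "b.txt"), b"brav0")  # same size, altered
        return compare_trees(src, dst)                           # c.txt never copied


if __name__ == "__main__":
    missing, extra, different = demo()
    print("missing from copy:", missing)
    print("only in copy:", extra)
    print("content differs:", different)
```

Empty lists for all three categories would indicate that the copy matches the source, as far as this sketch can tell. It does not handle symlinks or unreadable files.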
Please Read What we expect you have already Done
Search Engines know a lot, and
"If God had wanted computers to work all the time, He wouldn't have invented RESET buttons"
and
Just say NO to help vampires!

CwF
Global Moderator
Posts: 2684
Joined: 2018-06-20 15:16
Location: Colorado
Has thanked: 41 times
Been thanked: 196 times

Re: File Size comparing enough to ensure copy ok or we need

#3 Post by CwF »

I generally never check a particular file for corruption but I do check hardware and methods. Once a corruption happens, something will be different by the end of the day! So, get the methodology down, test it, go with it...

Is there a reason why some DE and its GUI file manager doesn't work right?

cuckooflew
Posts: 677
Joined: 2018-05-10 19:34
Location: Some where out west
Been thanked: 1 time

Re: File Size comparing enough to ensure copy ok or we need

#4 Post by cuckooflew »

Ahh, yes, that would be relevant: which DE and file manager? I always forget that some people do not use the CLI... :mrgreen:

debian121212
Posts: 80
Joined: 2019-01-03 01:34

Re: File Size comparing enough to ensure copy ok or we need

#5 Post by debian121212 »

cuckooflew wrote:How do you copy ?
It's full folder copies (and individual files as well), and I figured that the three best options at the moment for simple lossless copying of folders (and individual files) are:

a) md5deep hashing
Seems overkill.

b) rsync -r

Code: Select all

rsync -r /home/USER/compare1/ /home/USER/Pictures/compare1copy/
Seems rsync reliably does all the MD5 checking automatically so I don't have to do it manually with md5deep.

c) exact byte counted regular GTK Copy (XFCE Thunar Ctrl+v or right click copy pasting)
I figured a regular XFCE Thunar copy and paste, followed by checking the exact byte count, should technically be enough, since data issues during transfer should result in a Thunar warning. HOWEVER, I'VE EXPERIENCED CORRUPTION AFTER REGULAR BYTE CHECKING when copying and pasting like this to external hard drives, so I can't really tell if it was the Seagate external hard drive or the PC.

Does regular GTK Copy and exact byte check afterwards ensure a lossless copy or should we still be wary about files not opening after an exact byte count check with regular right click copy paste?
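
As a middle ground between options a) and c), one possible approach is to copy a file and then immediately re-read both sides and compare digests. The Python sketch below does that for a single file. The names `copy_and_verify` and `demo` and the file names are hypothetical, and SHA-256 is an arbitrary choice. One caveat: a file read back right after it was written may be served from the operating system's cache rather than from the drive, so a check like this is more likely to catch a truncated or garbled copy than a failing disk.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy one file, then re-read both sides and compare their digests.

    Returns the digest on success and raises OSError on a mismatch.
    """
    shutil.copy2(source, destination)  # copies data plus timestamps/permissions
    source_digest = file_digest(source)
    destination_digest = file_digest(destination)
    if source_digest != destination_digest:
        raise OSError(f"verification failed for {destination}")
    return destination_digest


def demo():
    """Copy a small file and confirm the verified digest matches the payload."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "compare1.bin")
        destination = os.path.join(workdir, "compare1copy.bin")
        payload = b"holiday pictures" * 4096
        with open(source, "wb") as handle:
            handle.write(payload)
        digest = copy_and_verify(source, destination)
        return digest == hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    print("copy verified:", demo())
```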
Last edited by debian121212 on 2020-07-14 20:36, edited 1 time in total.

debian121212
Posts: 80
Joined: 2019-01-03 01:34

Re: File Size comparing enough to ensure copy ok or we need

#6 Post by debian121212 »

CwF wrote:Is there a reason why some DE and its GUI file manager doesn't work right?
Well to be more specific the exact issue is that I store my data on Seagate External Hard Drives and even though the byte count was right, after some time the files won't open due to data corruption errors.

This is not something I want to keep dealing with, so I'm looking for ways to prevent it in the future. I've lost priceless data like this, and any insight is more than appreciated.

debian121212
Posts: 80
Joined: 2019-01-03 01:34

Re: File Size comparing enough to ensure copy ok or we need

#7 Post by debian121212 »

So it's therefore an issue of preventing backup data corruption, then.

Files open normally at first, and then, after some time stored on every external hard drive I've owned, they just stop opening due to data corruption warnings.

So I figure the corruption happens on the external hard drive, since the byte count was exact at the initial copy and paste. All 3 methodologies should work regardless, as it's an external hard drive issue.

It keeps happening no matter which external hard drive I use. It seems that perfectly kept CD-Rs or multiple copies of the same hard drive are the only way to ensure stuff opens normally after a while?

Having to keep pumping money and effort into new supposed-to-work-seagate external hard drives because they all just suck after a while is so annoying!!!! Haven't been able to ever trust online storage.

Any non basic recommendations on the best way to ensure stuff just opens normally after a while?

debian121212
Posts: 80
Joined: 2019-01-03 01:34

Re: File Size comparing enough to ensure copy ok or we need

#8 Post by debian121212 »

Looks like I'll stick to the happy medium, rsync (I still don't know if it's exactly necessary, though), and it was just an issue of better cold storage all along.

So 100 dollar basic consumer cold storage isn't doing it.

Paying more for better External Hard Drives seems to be the real way out of this data corruption issue.

Any insight more than appreciated.

cuckooflew
Posts: 677
Joined: 2018-05-10 19:34
Location: Some where out west
Been thanked: 1 time

Re: File Size comparing enough to ensure copy ok or we need

#9 Post by cuckooflew »

So, you are saying at first, when you first transfer them to the drive, they do open ok and are good ?
But after a long period of time they become corrupted.
Mostly I use Western Digital; I am not sure if I have any Seagate, but I can't say I have ever had anything like this happen. I have heard of this happening with CD/DVD, but there again, I have not experienced it, and I have some that are 15 years old and still good. How long is a "long period"? E.g. months, years, etc.
The only thing I can think of would be where they are stored, for example if they were close to electric motors, generators, any magnetic fields, that might cause some damage ?
Last edited by cuckooflew on 2020-07-14 21:51, edited 2 times in total.

cuckooflew
Posts: 677
Joined: 2018-05-10 19:34
Location: Some where out west
Been thanked: 1 time

Re: File Size comparing enough to ensure copy ok or we need

#10 Post by cuckooflew »

Maybe read this: https://www.securedatarecovery.com/serv ... corruption
Serious data corruption is more likely with larger files than with smaller files, since larger files take up more physical space on a hard drive's platters. If a hard drive has tracking issues or read/write head problems, corruption may affect several files or folders simultaneously. The physical hard disk issues that contribute to corruption are often caused by poor operating conditions, but all hard drives eventually fail due to mechanical stress and wear.
However, that site is really an ad for a data recovery company, and it seems to be referring to HDs, but really, USB portable hard drives are basically the same; mine are anyway, they do have disks, etc.
============ edited ==========
another says basically the same :
Every big brand has its issues after a long term use, particularly with frequently improper use, such as incompatible bundled software with a newer operating system, a connection on multiple computers, unsafe ejection, physical vibration, etc. As a consequence, the Seagate external hard drive is not working anymore.
Last edited by cuckooflew on 2020-07-14 21:51, edited 1 time in total.

cuckooflew
Posts: 677
Joined: 2018-05-10 19:34
Location: Some where out west
Been thanked: 1 time

Re: File Size comparing enough to ensure copy ok or we need

#11 Post by cuckooflew »

Any non basic recommendations on the best way to ensure stuff just opens normally after a while?
I found this one to be interesting, https://superuser.com/questions/284427/ ... s-its-data
To periodically refresh the data on the drive, simply transfer it to another location, and re-writing it back to the drive. That way, the magnetic domains in the physical disk surface will be renewed with their original strength (because you just re-wrote the files back to the disk). If you're concerned about filesystem corruption, you can also format the disk before transferring the data back.

debian121212
Posts: 80
Joined: 2019-01-03 01:34

Re: File Size comparing enough to ensure copy ok or we need

#12 Post by debian121212 »

cuckooflew wrote:So, you are saying at first, when you first transfer them to the drive, they do open ok and are good ?
But after a long period of time they become corrupted.
Exactly.
cuckooflew wrote: The only thing I can think of would be where they are stored, for example if they were close to electric motors, generators, any magnetic fields, that might cause some damage ?
Thanks for pointing that out, however I don't think this is the case, since it's just a regular room with an AC in one corner and a PC in another. I never place them on top of the PC case, for this reason.
Last edited by debian121212 on 2020-07-14 22:58, edited 1 time in total.

CwF
Global Moderator
Posts: 2684
Joined: 2018-06-20 15:16
Location: Colorado
Has thanked: 41 times
Been thanked: 196 times

Re: File Size comparing enough to ensure copy ok or we need

#13 Post by CwF »

I do put large check files in the mix if concerned with the particular media. I use video files with the sha1 sum as the file name, and a thunar custom action gives a zenity dialog for a quick compare, ie filename=sha1. I keep 1G,4G,10G and 64G files handy.

Other than that, yes, cycle often.
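
The filename = sha1 idea can be sketched in a few lines of Python. This is not the actual Thunar custom action or zenity dialog, which the post does not show. It is a hypothetical stand-in that assumes the check file is named `<sha1>.<extension>` and compares that name against the SHA-1 of the file's current content. The names `file_sha1`, `check_file_ok`, and `demo` are made up for the example.

```python
import hashlib
import os
import tempfile


def file_sha1(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file_ok(path):
    """True when the file name (minus extension) equals the SHA-1 of its content."""
    expected = os.path.splitext(os.path.basename(path))[0].lower()
    return file_sha1(path) == expected


def demo():
    """Create a check file named after its own SHA-1, then damage one byte."""
    with tempfile.TemporaryDirectory() as workdir:
        payload = b"check file payload" * 1000
        name = hashlib.sha1(payload).hexdigest() + ".mkv"
        path = os.path.join(workdir, name)
        with open(path, "wb") as handle:
            handle.write(payload)
        before = check_file_ok(path)
        with open(path, "r+b") as handle:
            handle.seek(10)
            handle.write(b"\x00")  # simulate a single corrupted byte
        after = check_file_ok(path)
        return before, after


if __name__ == "__main__":
    before, after = demo()
    print("fresh check file passes:", before)
    print("damaged check file passes:", after)
```

Because the expected digest travels with the file name, a check file can be verified on any machine without a separate checksum list.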

debian121212
Posts: 80
Joined: 2019-01-03 01:34

Re: File Size comparing enough to ensure copy ok or we need

#14 Post by debian121212 »

cuckooflew wrote:Maybe read this: https://www.securedatarecovery.com/serv ... corruption
Serious data corruption is more likely with larger files than with smaller files, since larger files take up more physical space on a hard drive's platters. If a hard drive has tracking issues or read/write head problems, corruption may affect several files or folders simultaneously. The physical hard disk issues that contribute to corruption are often caused by poor operating conditions, but all hard drives eventually fail due to mechanical stress and wear.
However, that site is really an ad for a data recovery company, and it seems to be referring to HDs, but really, USB portable hard drives are basically the same; mine are anyway, they do have disks, etc.
============ edited ==========
another says basically the same :
Every big brand has its issues after a long term use, particularly with frequently improper use, such as incompatible bundled software with a newer operating system, a connection on multiple computers, unsafe ejection, physical vibration, etc. As a consequence, the Seagate external hard drive is not working anymore.
This looks like the exact case. They ALL EVENTUALLY FAIL WITH TIME; sometimes as short as 5 years as is the case with my "cheap consumer 100 USD" Seagate USB External HD,
To periodically refresh the data on the drive, simply transfer it to another location, and re-writing it back to the drive. That way, the magnetic domains in the physical disk surface will be renewed with their original strength (because you just re-wrote the files back to the disk). If you're concerned about filesystem corruption, you can also format the disk before transferring the data back.
Great to know.

Is it certain that using rsync (thanks to its md5 before and after check with every transfer as per its man page) is *actually necessary* at all just to make sure the file copied correctly and opens normally right after copying? How necessary is it? Really worth it?

Is a regular byte count good enough to make sure files open correctly immediately after a basic Ctrl+C and Ctrl+V copy (GTK copy, assuming the source file is OK and not corrupted itself)?

debian121212
Posts: 80
Joined: 2019-01-03 01:34

Re: File Size comparing enough to ensure copy ok or we need

#15 Post by debian121212 »

CwF wrote:I do put large check files in the mix if concerned with the particular media. I use video files with the sha1 sum as the file name, and a thunar custom action gives a zenity dialog for a quick compare, ie filename=sha1. I keep 1G,4G,10G and 64G files handy.

Other than that, yes, cycle often.
How'd you copy the files? Can you assure that this has been actually necessary ever right after the initial copy to make sure the media opens correctly right after copying? As in, have you ever copied something only to realize the MD5/Sha1 or whatever is off right after a successful copy paste operation is over?

After some time a checksum would be able to reveal that data has gone bad, however that is due to data corruption after a good initial copy. After the file has copied right, we're golden. However, is a checksum on the spot, right after a normal Ctrl+C Ctrl+V (GTK copy and paste), really necessary at all once an exact byte count check shows the byte count to be correct?

Can't decide if rsync is actually worth it or not. For now I'm using it because it seems the safer option, until someone can confirm that it's unnecessary just to make sure the file opens right just after copying (as opposed to data corruption that sets in after a successful regular copy).
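
One way to tell a bad initial copy apart from corruption that develops later is to record a checksum for every file right after the copy and re-check against that record whenever the drive comes off the shelf. The Python sketch below shows the idea with a JSON manifest. The function names, the `manifest.json` file name, and the choice of SHA-256 are all assumptions made for illustration; existing tools such as md5deep, mentioned earlier in the thread, cover the same ground.

```python
import hashlib
import json
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each relative file path under root to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def save_manifest(manifest, manifest_path):
    """Write the manifest as JSON (store it outside the tree it describes)."""
    with open(manifest_path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)


def verify_manifest(root, manifest_path):
    """Return sorted relative paths that are missing or whose digest changed."""
    with open(manifest_path, "r", encoding="utf-8") as handle:
        recorded = json.load(handle)
    bad = []
    for rel, digest in recorded.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != digest:
            bad.append(rel)
    return sorted(bad)


def demo():
    """Record a manifest, verify it, then simulate later corruption of one file."""
    with tempfile.TemporaryDirectory() as workdir:
        backup = os.path.join(workdir, "backup")
        os.makedirs(backup)
        for name, data in (("one.jpg", b"first"), ("two.jpg", b"second")):
            with open(os.path.join(backup, name), "wb") as handle:
                handle.write(data)
        manifest_path = os.path.join(workdir, "manifest.json")
        save_manifest(build_manifest(backup), manifest_path)
        clean = verify_manifest(backup, manifest_path)
        with open(os.path.join(backup, "two.jpg"), "wb") as handle:
            handle.write(b"sec0nd")  # same length, different content
        rotted = verify_manifest(backup, manifest_path)
        return clean, rotted


if __name__ == "__main__":
    clean, rotted = demo()
    print("problems right after copy:", clean)
    print("problems after simulated corruption:", rotted)
```

If the manifest verifies right after the copy but fails months later, the copy itself was fine and the storage is the more likely culprit.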

CwF
Global Moderator
Posts: 2684
Joined: 2018-06-20 15:16
Location: Colorado
Has thanked: 41 times
Been thanked: 196 times

Re: File Size comparing enough to ensure copy ok or we need

#16 Post by CwF »

debian121212 wrote:As in, have you ever copied something only to realize the MD5/Sha1 or whatever is off right after a successful copy paste operation is over?
No, I haven't had such an issue.
debian121212 wrote:How'd you copy the files?
With thunar. I segregate data. It doesn't all get the same treatment.

debian121212
Posts: 80
Joined: 2019-01-03 01:34

Re: File Size comparing enough to ensure copy ok or we need

#17 Post by debian121212 »

CwF wrote:
debian121212 wrote:As in, have you ever copied something only to realize the MD5/Sha1 or whatever is off right after a successful copy paste operation is over?
No, I haven't had such an issue.
debian121212 wrote:How'd you copy the files?
With thunar. I segregate data. It doesn't all get the same treatment.
How do you address possible future data corruption?

Do you stop data corruption by re-copying periodically and changing your cold storage every x years, as I am planning to do, or do you use net storage? What do you use for top-tier data?

Do you use above basic consumer level cold storage external hard drives? What would you use for cold storage? By cold storage I mean something like an external hard drive.

debian121212
Posts: 80
Joined: 2019-01-03 01:34

Re: File Size comparing enough to ensure copy ok or we need

#18 Post by debian121212 »

If regular byte checking after a ctrl c and ctrl v copy paste is enough, then why does rsync even use md5?

CwF
Global Moderator
Posts: 2684
Joined: 2018-06-20 15:16
Location: Colorado
Has thanked: 41 times
Been thanked: 196 times

Re: File Size comparing enough to ensure copy ok or we need

#19 Post by CwF »

I don't pay attention to what I use as much as the pattern of use. First off, storage is stupid cheap. I once forked over the cash for a 6 disc passive backplane scsi array - now, storage is dirt cheap -buy some!

My OSes are imaged, with multiple copies. I image to whatever spinning disk is current, then write the image to a new or recycled disk. That disk goes into use, and the old disk, a known good, does nothing until I retask it. That happens every year or so, all on SSD. Data sets are handled by size: most exist as qcow2 images gigabytes in size and live on SSDs, with copies on the spinner of the moment. Large sets that warrant a device of their own are handled like an OS but without the image-to-file step: there is the current one in use, the last one used, and maybe the one before that. When moving to a new device I'll usually refresh the prior device, yep, two steps back. This is the only time I'd want the bit-for-bit check. When it passes, the current set is deemed good and moved to the new device. Then the two-steps-old device is retasked after some random gestation period.
Small data, i.e. a handful of spreadsheets and cherrytree files that benefit from today's backup, might get backed up to the system's USB drive and may be intentionally trapped in VM snapshots.

I wish higher-end 120GB disks were still common; my OSes will never need more...

In all of that, I've never needed to 'restore'. The point is I don't exactly back up stuff, I move it to the new and put the old on the shelf. The last time I pulled data from a 'shelved' device due to current corruption was back in the IDE days.
