File Size comparing enough to ensure copy ok or we need MD5?
- debian121212
- Posts: 80
- Joined: 2019-01-03 01:34
File Size comparing enough to ensure copy ok or we need MD5?
I'm copying large amounts of data from a cell phone onto my PC and have experienced data corruption before. I'd like to make sure that the copied files will open correctly in the future, with zero corruption.
Is it necessary to generate and check md5 checksums for every file or does just checking the file size work enough for this?
For example:
I copy everything from the Android source folder into the destination PC folder.
How do I make sure the copied files will always open fine with no issues in the future?
I figured generating an md5 checksum with md5deep would be correct; however, it seems a little overkill and I'd like to know if it's really worth it. Shouldn't simply checking the file size do the trick if the files open correctly from the source? MD5, or just exact file byte sizes?
-
- Posts: 677
- Joined: 2018-05-10 19:34
- Location: Some where out west
- Been thanked: 1 time
Re: File Size comparing enough to ensure copy ok or we need
How do you copy? Since you did not tell us what method or command you use, maybe you are already using this:
Code: Select all
rsync -rnci original-dir/ copied-dir/
(-r recurses into the directories, -n makes it a dry run, -c compares checksums instead of size and time, and -i lists the files that differ.)
If not, then perhaps some other variation of the 'rsync' command; see 'man rsync'. There are also many other ways to copy or transfer large amounts of data. You might also look at the 'diff' command; it could help you. From 'man diff', the key words:
DESCRIPTION
The diff utility compares the contents of file1 and file2 and writes to
the standard output the list of changes necessary to convert one file
into the other. No output is produced if the files are identical.
Code: Select all
on Linux, File Size comparing enough to ensure copy ok
Copy/pasted into a search engine, that will give you results that go into detail. I just "scratched" the surface here; there are many ways to transfer files from one device to another on Linux. I generally use the above 'rsync' command, but at times another command is needed.
One of many detailed instructions: https://www.networkworld.com/article/31 ... -unix.html
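A minimal sketch of the 'diff' suggestion, assuming GNU diffutils is installed. The helper name compare_trees and the directory names are made up for this example; it compares two directory trees byte for byte rather than by size.

```shell
#!/bin/sh
# compare_trees ORIGINAL COPY (hypothetical helper name).
# -r walks both trees recursively, -q only reports which files differ.
# Exit status 0 means every file exists on both sides with identical bytes;
# 1 means something differs or is missing; 2 means diff itself had trouble.
compare_trees() {
    diff -rq "$1" "$2"
}

# Example call (directory names are placeholders):
#   compare_trees original-dir/ copied-dir/ && echo "copy matches"
```

Because diff reads every byte on both sides, this takes about as long as the copy itself, which is the price of not trusting the size alone.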
Please Read What we expect you have already Done
Search Engines know a lot, and
"If God had wanted computers to work all the time, He wouldn't have invented RESET buttons"
and
Just say NO to help vampires!
-
- Global Moderator
- Posts: 2713
- Joined: 2018-06-20 15:16
- Location: Colorado
- Has thanked: 41 times
- Been thanked: 201 times
Re: File Size comparing enough to ensure copy ok or we need
I generally never check a particular file for corruption but I do check hardware and methods. Once a corruption happens, something will be different by the end of the day! So, get the methodology down, test it, go with it...
Is there a reason why some DE and its GUI file manager doesn't work right?
-
- Posts: 677
- Joined: 2018-05-10 19:34
- Location: Some where out west
- Been thanked: 1 time
Re: File Size comparing enough to ensure copy ok or we need
Ahh, yes, that would be relevant: which DE and file manager? I always forget that some people do not use the CLI...
- debian121212
- Posts: 80
- Joined: 2019-01-03 01:34
Re: File Size comparing enough to ensure copy ok or we need
cuckooflew wrote: How do you copy?
It's full folder copies (and individual files as well), and I figured that the three best options at the moment for simple lossless folder (and individual file) copying are:
a) md5deep hashing
Seems overkill.
b) rsync -r
Code: Select all
rsync -r /home/USER/compare1/ /home/USER/Pictures/compare1copy/
c) exact byte counted regular GTK Copy (XFCE Thunar Ctrl+v or right click copy pasting)
I figured a regular XFCE Thunar copy paste, followed by checking the exact byte count, should technically be enough, since data issues during transfer should result in a Thunar warning. HOWEVER, I'VE EXPERIENCED CORRUPTION AFTER REGULAR BYTE-CHECKED copy pasting like this using external hard drives, so I can't really tell if it was the Seagate external hard drive or the PC.
Does a regular GTK copy with an exact byte check afterwards ensure a lossless copy, or should we still be wary about files not opening after an exact byte count check with regular right-click copy paste?
Last edited by debian121212 on 2020-07-14 20:36, edited 1 time in total.
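To make the byte-count question concrete, here is a small sketch, assuming standard wc, cmp and mktemp. The helper names same_size and same_content are invented for the example. It builds two files of identical size that differ in one byte, which is roughly what a flipped bit on a bad cable or disk would look like.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# same_size A B: true when both files have the same byte count.
same_size() {
    [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ]
}

# same_content A B: true when both files are byte-for-byte identical.
same_content() {
    cmp -s "$1" "$2"
}

# Demo: change one byte in the "copy"; the size check still passes.
demo=$(mktemp -d)
printf 'AAAA' > "$demo/original"
printf 'AABA' > "$demo/copy"
same_size "$demo/original" "$demo/copy" && echo "sizes match"
same_content "$demo/original" "$demo/copy" || echo "contents differ"
rm -rf "$demo"
```

The demo prints "sizes match" followed by "contents differ": an exact byte count only rules out truncated or padded copies, not altered bytes.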
- debian121212
- Posts: 80
- Joined: 2019-01-03 01:34
Re: File Size comparing enough to ensure copy ok or we need
CwF wrote: Is there a reason why some DE and its GUI file manager doesn't work right?
Well, to be more specific, the exact issue is that I store my data on Seagate external hard drives, and even though the byte count was right, after some time the files won't open due to data corruption errors.
This is not something I want to keep dealing with, so I am looking for ways to prevent it in the future. I have lost priceless data like this, and any insight is more than appreciated.
- debian121212
- Posts: 80
- Joined: 2019-01-03 01:34
Re: File Size comparing enough to ensure copy ok or we need
So it's really an issue of preventing backup data corruption, then.
Files open normally at first, and then, after some time stored on any external hard drive I've owned, they just stop opening due to data corruption warnings.
So I figure this has to do with corruption on the external hard drive, since the byte count was exact at the initial copy paste; all 3 methods should work regardless, as it's an external hard drive issue.
It keeps happening no matter which external hard drive I use, and it seems that perfectly kept CD-Rs or multiple copies of the same hard drive are the only way to go to ensure stuff opens normally after a while?
Having to keep pumping money and effort into new supposed-to-work Seagate external hard drives because they all just suck after a while is so annoying!!!! I haven't ever been able to trust online storage.
Any non-basic recommendations on the best way to ensure stuff just opens normally after a while?
- debian121212
- Posts: 80
- Joined: 2019-01-03 01:34
Re: File Size comparing enough to ensure copy ok or we need
Looks like I'll stick to the happy medium, rsync (I still don't know if it's strictly necessary, though), and it was just an issue of better cold storage all along.
So 100 dollar basic consumer cold storage isn't doing it.
Paying more for better External Hard Drives seems to be the real way out of this data corruption issue.
Any insight more than appreciated.
-
- Posts: 677
- Joined: 2018-05-10 19:34
- Location: Some where out west
- Been thanked: 1 time
Re: File Size comparing enough to ensure copy ok or we need
So, you are saying that at first, when you first transfer them to the drive, they do open OK and are good?
But after a long period of time they become corrupted.
Mostly I use Western Digital. I am not sure if I have any Seagate, but I can't say I have ever had anything like this happen. I have heard of this happening with CDs/DVDs, but there again, I have not experienced it, and I have some that are 15 years old and still good. How long is a "long period"? E.g. months, years, etc.
The only thing I can think of would be where they are stored; for example, if they were close to electric motors, generators, or any magnetic fields, that might cause some damage?
Last edited by cuckooflew on 2020-07-14 21:51, edited 2 times in total.
-
- Posts: 677
- Joined: 2018-05-10 19:34
- Location: Some where out west
- Been thanked: 1 time
Re: File Size comparing enough to ensure copy ok or we need
Maybe read this: https://www.securedatarecovery.com/serv ... corruption
"Serious data corruption is more likely with larger files than with smaller files, since larger files take up more physical space on a hard drive's platters. If a hard drive has tracking issues or read/write head problems, corruption may affect several files or folders simultaneously. The physical hard disk issues that contribute to corruption are often caused by poor operating conditions, but all hard drives eventually fail due to mechanical stress and wear."
However, that site is really an ad for a data recovery company, and it seems to be referring to HDs. But really, USB portable hard drives are basically the same (mine are, anyway): they do have disks, etc.
============ edited ==========
Another says basically the same:
"Every big brand has its issues after a long term use, particularly with frequently improper use, such as incompatible bundled software with a newer operating system, a connection on multiple computers, unsafe ejection, physical vibration, etc. As a consequence, the Seagate external hard drive is not working anymore."
Last edited by cuckooflew on 2020-07-14 21:51, edited 1 time in total.
-
- Posts: 677
- Joined: 2018-05-10 19:34
- Location: Some where out west
- Been thanked: 1 time
Re: File Size comparing enough to ensure copy ok or we need
debian121212 wrote: Any non basic recommendations on the best way to ensure stuff just opens normally after a while?
I found this one to be interesting: https://superuser.com/questions/284427/ ... s-its-data
To periodically refresh the data on the drive, simply transfer it to another location, and re-writing it back to the drive. That way, the magnetic domains in the physical disk surface will be renewed with their original strength (because you just re-wrote the files back to the disk). If you're concerned about filesystem corruption, you can also format the disk before transferring the data back.
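One way to tell whether files have gone bad between refreshes, sketched under assumptions: record checksums while the files are known to be good and re-check them later. This assumes GNU coreutils sha256sum (md5sum works the same way); the names make_manifest, check_manifest and the SHA256SUMS file are choices made for this example and do not come from the linked answer.

```shell
#!/bin/sh
# make_manifest DIR: write DIR/SHA256SUMS covering every regular file in DIR
# (the manifest itself is excluded from the listing).
make_manifest() {
    (cd "$1" && find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS)
}

# check_manifest DIR: re-hash the files and compare with the stored manifest.
# --quiet lists only the files that fail; the exit status is non-zero
# if any listed file changed or can no longer be read.
check_manifest() {
    (cd "$1" && sha256sum --quiet -c SHA256SUMS)
}
```

Run make_manifest once right after a verified copy, then check_manifest every few months. Files added after the manifest was written are not covered until it is regenerated.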
- debian121212
- Posts: 80
- Joined: 2019-01-03 01:34
Re: File Size comparing enough to ensure copy ok or we need
cuckooflew wrote: So, you are saying at first, when you first transfer them to the drive, they do open ok and are good? But after a long period of time they become corrupted.
Exactly.
cuckooflew wrote: The only thing I can think of would be where they are stored, for example if they were close to electric motors, generators, any magnetic fields, that might cause some damage?
Thanks for pointing that out; however, I don't think this is the case, since it's just a regular room with an AC in one corner and a PC in another. I never place the drives on top of the PC case, for this reason.
Last edited by debian121212 on 2020-07-14 22:58, edited 1 time in total.
-
- Global Moderator
- Posts: 2713
- Joined: 2018-06-20 15:16
- Location: Colorado
- Has thanked: 41 times
- Been thanked: 201 times
Re: File Size comparing enough to ensure copy ok or we need
I do put large check files in the mix if concerned with the particular media. I use video files with the sha1 sum as the file name, and a Thunar custom action gives a zenity dialog for a quick compare, i.e. filename=sha1. I keep 1G, 4G, 10G and 64G files handy.
Other than that, yes, cycle often.
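A rough command-line sketch of the filename=sha1 idea, assuming GNU coreutils sha1sum. The helper name check_named_sha1 is invented here, and this is not the actual Thunar custom action described in the post.

```shell
#!/bin/sh
# check_named_sha1 FILE (hypothetical helper): succeed when the SHA-1 of
# FILE's contents equals its base name, ignoring anything after the first dot.
check_named_sha1() {
    file=$1
    name=$(basename "$file")
    expected=${name%%.*}
    # Reading from stdin makes sha1sum print "<hash>  -"; keep only the hash.
    actual=$(sha1sum < "$file" | cut -d ' ' -f 1)
    [ "$expected" = "$actual" ]
}

# Example call (the path is a placeholder):
#   check_named_sha1 /media/usb/<sha1-of-file>.mkv && echo "media looks good"
```

Copying such a file onto new media and running the check exercises the whole path (cable, controller, disk) with a known answer.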
- debian121212
- Posts: 80
- Joined: 2019-01-03 01:34
Re: File Size comparing enough to ensure copy ok or we need
cuckooflew wrote: Maybe read this: https://www.securedatarecovery.com/serv ... corruption "Serious data corruption is more likely with larger files than with smaller files, since larger files take up more physical space on a hard drive's platters. If a hard drive has tracking issues or read/write head problems, corruption may affect several files or folders simultaneously. The physical hard disk issues that contribute to corruption are often caused by poor operating conditions, but all hard drives eventually fail due to mechanical stress and wear." How ever that site is really a add for a data recovery company, and it seems to be referring to HD's, but really, USB portable hd drives are basically the same, mine are anyway, they do have disks, etc.
cuckooflew wrote: another says basically the same: "Every big brand has its issues after a long term use, particularly with frequently improper use, such as incompatible bundled software with a newer operating system, a connection on multiple computers, unsafe ejection, physical vibration, etc. As a consequence, the Seagate external hard drive is not working anymore."
This looks like the exact case. They ALL EVENTUALLY FAIL WITH TIME, sometimes in as little as 5 years, as is the case with my "cheap consumer 100 USD" Seagate USB external HD.
cuckooflew wrote: To periodically refresh the data on the drive, simply transfer it to another location, and re-writing it back to the drive. That way, the magnetic domains in the physical disk surface will be renewed with their original strength (because you just re-wrote the files back to the disk). If you're concerned about filesystem corruption, you can also format the disk before transferring the data back.
Great to know.
Is using rsync (with the MD5 check it does before and after every transfer, as per its man page) *actually necessary* at all just to make sure the file copied correctly and opens normally right after copying? How necessary is it? Is it really worth it?
Is a regular byte count good enough to make sure files open correctly immediately after a basic ctrl+c and ctrl+v copy (GTK copy, assuming the source file is OK and not corrupted itself)?
- debian121212
- Posts: 80
- Joined: 2019-01-03 01:34
Re: File Size comparing enough to ensure copy ok or we need
CwF wrote: I do put large check files in the mix if concerned with the particular media. I use video files with the sha1 sum as the file name, and a thunar custom action gives a zenity dialog for a quick compare, ie filename=sha1. I keep 1G,4G,10G and 64G files handy. Other than that, yes, cycle often.
How'd you copy the files? Can you confirm that this has ever actually been necessary right after the initial copy, to make sure the media opens correctly? As in, have you ever copied something only to realize the MD5/SHA-1 or whatever is off right after a successful copy paste operation?
After some time a checksum would be able to reveal that data has gone bad; however, that is due to data corruption after a good initial copy. Once the file has copied right, we're golden. However, is a checksum on the spot, right after a normal ctrl+c ctrl+v (GTK copy paste), really necessary at all once an exact byte count check shows the byte count to be correct?
Can't decide if rsync is actually worth it or not. For now I'm using it because it seems the safer option, until someone can confirm that it's unnecessary just to make sure the file opens right immediately after copying (as opposed to failing later due to data corruption after a successful regular copy).
-
- Global Moderator
- Posts: 2713
- Joined: 2018-06-20 15:16
- Location: Colorado
- Has thanked: 41 times
- Been thanked: 201 times
Re: File Size comparing enough to ensure copy ok or we need
debian121212 wrote: As in, have you ever copied something only to realize the MD5/Sha1 or whatever is off right after a successful copy paste operation is over?
No, I haven't had such an issue.
debian121212 wrote: How'd you copy the files?
With Thunar. I segregate data. It doesn't all get the same treatment.
- debian121212
- Posts: 80
- Joined: 2019-01-03 01:34
Re: File Size comparing enough to ensure copy ok or we need
CwF wrote: No, I haven't had such an issue. With thunar. I segregate data. It doesn't all get the same treatment.
How do you address possible future data corruption?
Do you prevent data corruption by re-copying periodically and changing your cold storage every x years, as I am planning to do, or do you use net storage? What do you use for top-tier data?
Do you use cold storage external hard drives that are above basic consumer level? What would you use for cold storage? By cold storage I mean something like an external hard drive.
- debian121212
- Posts: 80
- Joined: 2019-01-03 01:34
Re: File Size comparing enough to ensure copy ok or we need
If regular byte checking after a ctrl c and ctrl v copy paste is enough, then why does rsync even use md5?
-
- Global Moderator
- Posts: 2713
- Joined: 2018-06-20 15:16
- Location: Colorado
- Has thanked: 41 times
- Been thanked: 201 times
Re: File Size comparing enough to ensure copy ok or we need
I don't pay attention to what I use as much as the pattern of use. First off, storage is stupid cheap. I once forked over the cash for a 6-disk passive-backplane SCSI array - now, storage is dirt cheap - buy some!
My OSes are imaged, with multiple copies. I image to whatever spinning disk is in use at the moment, then write the image to a new or recycled disk. That disk goes into use, and the old disk, a known good, does nothing until I retask it. That happens every year or so, all on SSD. Data sets are handled by size: most exist as qcow2 images, gigabytes in size, and live on SSDs with copies on the spinner of the moment. Large sets warrant a device of their own and are treated like an OS, without the image-to-file step: there is the current one in use, the last one used, and maybe the one before that. When moving to a new device I'll usually refresh the prior device, yep, two steps back. This is the only time I'd want the bit-for-bit check. When it passes, the current set is deemed good and moved to the new device. Then the two-steps-old device is retasked after some random gestation period.
Small data, i.e. a handful of spreadsheets and cherrytree files that benefit from today's backup, might get backed up to the system's USB drive and may be intentionally trapped in VM snapshots.
I wish higher-end 120GB disks were still common; my OSes will never need more...
In all of that, I've never needed to 'restore'. The point is I don't exactly back up stuff; I move it to the new device and put the old one on the shelf. The last time I pulled data from a 'shelved' device due to current corruption was back in the IDE days.