Upload huge files to cloud (keeping integrity)

Postby bester69 » 2016-01-05 19:10

If we want to upload or back up a large file to Dropbox, MEGA, etc. while keeping data integrity and being able to restore it successfully, we need to add data parity to the process (https://en.wikipedia.org/wiki/Parchive). We can proceed as follows:

UPLOADING/splitting file process
1- First, we compute a checksum of the source file, so we can detect any errors introduced during transmission or storage.
md5sum ubuntu-11.10-dvd-i386.iso
md5sum prints a single line after calculating the hash:
8044d756b7f00b695ab8dce07dce43e5 ubuntu-11.10-dvd-i386.iso

We keep this hash so that, when we later download the big file from the cloud, we can compare the two hashes; if they match, the file was downloaded without any data corruption.
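This checksum step can be scripted. A minimal sketch, using a small stand-in file (the howto's real example is the Ubuntu ISO above):

```shell
#!/bin/sh
set -e
# Stand-in for the real big file (e.g. ubuntu-11.10-dvd-i386.iso)
printf 'example payload\n' > bigfile.bin

# 1. Before uploading: record the checksum next to the file
md5sum bigfile.bin > bigfile.bin.md5

# 2. After downloading: verify it in one step; prints "bigfile.bin: OK"
md5sum -c bigfile.bin.md5
```

Keeping the `.md5` file alongside the pieces in the cloud means the final verification needs no locally stored hash.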

2- We split the huge file into smaller pieces with any tool that can do this (split, 7z, etc.).
I'd recommend slice sizes between 25 MB and 50 MB to limit the impact of data corruption: the bigger each slice, the more likely it is to pick up corruption in transfer.

In this case I split a 4.2 GB ISO image into about 88 pieces of 50 MB each.
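The split-and-rejoin round trip can be sketched with `split` and `cat`. This uses a small dummy file and 64 KB slices so it runs quickly; for the real 4.2 GB ISO you would use `split -b 50M` instead:

```shell
#!/bin/sh
set -e
# Small stand-in for the 4.2 GB ISO from the howto
dd if=/dev/zero of=image.iso bs=1024 count=256 2>/dev/null

# Split into numbered 64 KB pieces: image.iso.part.000, .001, ...
# (-d gives numeric suffixes, -a 3 pads them to three digits)
split -b 64K -d -a 3 image.iso image.iso.part.

# Rejoin later with cat; the shell glob sorts the numeric suffixes in order
cat image.iso.part.* > rejoined.iso
cmp image.iso rejoined.iso && echo "identical"
```

Numeric, zero-padded suffixes matter: they guarantee the glob expands the pieces in the right order when you rejoin them.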

3- We will use a parity archive (https://en.wikipedia.org/wiki/Parchive) to repair data corruption if needed.
The more pieces a huge file is split into (and the bigger the file), the more likely it is that at least one piece gets corrupted before you rejoin them.

3.1 We'll install the par2 command-line tool (https://github.com/Parchive/par2cmdline):
apt-get install par2

3.2 We run par2 on the directory containing the split pieces, to generate the parity files needed to repair the split file if necessary:
par2 c -R archive.par2 /path/to/split/pieces

This creates several parity files that allow recovery from a limited amount of data corruption.

4. We sync the pieces to the cloud, including the parity files.

DOWNLOADING/merging file process

1. We download all the pieces of the huge file from our cloud server.
2. We verify the integrity of all the downloaded pieces with par2:
par2 v archive.par2

If the result is OK, we can merge the pieces.

3. If the check reports data corruption, we repair the pieces:
par2 r archive.par2

4. Once the pieces are repaired, we join them to rebuild the big file.

5. Finally, we compute the checksum again and compare it with the source hash, to make sure no data corruption occurred anywhere in the process. This step should succeed, since the parity repair restores the data intact.
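The whole round trip (upload side, then download side) can be sketched end to end. This is a minimal demo with a small stand-in file and hypothetical names; in real use the par2 create/verify/repair commands from the steps above would run between the split and the join:

```shell
#!/bin/sh
set -e
# Stand-in for the huge file
dd if=/dev/urandom of=huge.bin bs=1024 count=128 2>/dev/null

# Upload side: record the checksum, then split into pieces
md5sum huge.bin > huge.bin.md5
split -b 32K -d huge.bin huge.bin.part.

# ... sync the pieces (and par2 files) to the cloud, download them later ...

# Download side: join the pieces and verify against the original hash
cat huge.bin.part.* > restored.bin
rm huge.bin                 # pretend the original is gone
mv restored.bin huge.bin
md5sum -c huge.bin.md5      # prints "huge.bin: OK" on success
```

If this final `md5sum -c` fails, go back to the par2 repair step before joining again.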