
Sort GBs of data - archive

Posted: 2017-12-18 19:10
by jalisco
Hi,

I have tons of images, having saved many "quick and dirty backups" over the years.

In other words, I've often tried to keep things organized, but over time (a decade) I've ended up just saving the old hard drive and replacing it with a new one.
As a result, I have many "Photos" folders from different systems, predominantly Apple and Debian (Shotwell), spread across multiple directories.

What I would like to do is consolidate all the images, erring on the side of caution -- I don't want to lose any images, if possible.

For example, say I have:
Directory One -> Photos, 75 GB
Directory Two -> Photos, 50 GB
Directory Three -> Photos, 20 GB

so on and so forth.

Programs like iPhoto/Photos use databases and have all kinds of extra crap (like "faces" and thumbnails) that I am not necessarily interested in.

At this point, I just want to organize all my images without having to manually go through 100,000+ images, many of them redundant.

I use Debian predominantly and it has been my main system for some time. There may be 2-3 instances of iPhoto images from my partner's use of a Mac, plus really old images from when I used Apple products.

My plan is:

Well, now that I start to formulate it, I realize I have no plan =/

I know I can use rsync to synchronize the directories.
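
Something like this is roughly what I had in mind with rsync (the paths are just placeholders for my actual drives), though I don't think it solves the duplicates problem on its own:

Code:

# Copy (not move) each old "Photos" tree into one consolidated tree.
# -a preserves timestamps and permissions, -v is verbose; add -n first for a dry run.
# Files with the same relative path get overwritten by whichever sync runs last.
rsync -av /mnt/old_disk1/Photos/ /srv/consolidated/Photos/
rsync -av /mnt/old_disk2/Photos/ /srv/consolidated/Photos/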

But ideally, I really want all the files in one big fat directory. I don't care about the "by year" or other camera-style directory/sub-directory schemes. I just want all the image files.

So then I guess I should do some "bashing": use Bash to recursively gather all the images from the various subdirectories and move them into one big folder.

Neither of these possibilities leaves me completely comfortable, because I am not sure how they handle duplicates.

Thinking out loud, I guess I would want to:

1. Create a Bash script, using the mv command with the --backup option, to recursively gather all the image files from their various folder structures and subdirectories into one massive directory (rough sketch below).
2. Then use fdupes to remove the duplicates.
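
Something like this rough sketch is what I'm imagining for steps 1 and 2 (the paths and extensions are placeholders, and I would test it on a copy first):

Code:

#!/bin/bash
# Rough sketch: flatten image files into one big directory without losing name clashes.
SRC=/mnt/old_disks      # placeholder: where the old "Photos" trees are mounted
DEST=/srv/all_photos    # placeholder: the one big fat directory
mkdir -p "$DEST"

# --backup=numbered keeps file.jpg, file.jpg.~1~, file.jpg.~2~ ... instead of overwriting,
# so nothing is lost when different folders contain files with the same name.
find "$SRC" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
    -exec mv --backup=numbered -t "$DEST" '{}' +

# Then delete byte-identical duplicates: -r recurse, -d delete, -N keep the first copy
# without prompting. I'd run plain "fdupes -r" first just to review what it finds.
fdupes -rdN "$DEST"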


Is there a heavy-duty application that anyone knows of that can take all the images in and manage them, to make this process easier?

Programs like Photos, and even Shotwell, simply can't handle this volume of images very efficiently.

Re: Sort GBs of data - archive

Posted: 2017-12-19 01:10
by acewiza
"Organizing" is always a challenge. Here's a little snippet I use to find dupes:

Code:

find . -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 15 > dupes.txt
Other than that, I think you are pretty much on your own.

Re: Sort GBs of data - archive

Posted: 2017-12-19 08:51
by jalisco
acewiza wrote:"Organizing" is always a challenge. Here's a little snippet I use to find dupes:

Code:

find . -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 15 > dupes.txt
Other than that, I think you are pretty much on your own.

Thanks. That's kind of what I figured. Unfortunately, at some point this went from being a "consumer level" problem to a "professional level" problem =)

Re: Sort GBs of data - archive

Posted: 2017-12-19 11:09
by bw123
The "one massive directory" approach doesn't sound fun to me; unless the file names are all consistent, that would be a mess, wouldn't it?

I don't have nearly that big a problem, but I like arranging things by year, and once the dupes are gone, I use a desktop search app to find things.
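
Something like this quick loop is what I mean by arranging by year (the paths are placeholders, and it goes by modification time rather than the EXIF date, so it's only as good as your timestamps):

Code:

# Move files from the big flat folder into by-year subfolders based on mtime.
for f in /srv/all_photos/*; do
    [ -f "$f" ] || continue            # skip anything that isn't a regular file
    year=$(date -r "$f" +%Y)           # year taken from the file's modification time
    mkdir -p "/srv/by_year/$year"
    mv -n "$f" "/srv/by_year/$year/"   # -n: never overwrite an existing file
done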