Sort GBs of data - archive

Here you can discuss every aspect of Debian. Note: not for support requests!

Sort GBs of data - archive

Postby jalisco » 2017-12-18 19:10

Hi,

I have tons of images, having saved many "quick and dirty backups" over the years.

In other words, I often tried to keep things organized, but over time (a decade) I would just save the old hard drive and replace it with a new one.
During this process, I have many "Photos" folders, from different systems, Apple and Debian (Shotwell) predominantly, in multiple directories.

What I would like to do is consolidate all the images, erring on the side of caution -- I don't want to lose any images, if possible.

For example, I have say
Directory One -> Photos, 75 GB
Directory Two -> Photos, 50 GB
Directory Three -> Photos, 20 GB

so on and so forth.

Programs like iPhoto/Photos use databases and have all kinds of extra crap (like "faces" and thumbnails) that I am not necessarily interested in.

At this point, I just want to organize all my images, without having to manually go through 100,000+ of many redundant images.

I use Debian predominantly, and it has been my main system for some time. There may be 2-3 sets of iPhoto images, from my partner's usage of a Mac, and some really old images from when I used Apple products.

My plan is:

Well, now that I start to formulate it, I realize, I have no plan =/

I know I can use rsync to synchronize the directories.

But, ideally, I really want all the files in one big fat directory. I don't really care about the funny "by year, or some other funny camera directory/sub-directory style". I just want all the image files.

So, then, I guess I should do some "bashing", use BASH to get all the images from the various subdirectories, recursively, and move them into one big folder.

Neither of these possibilities leaves me entirely comfortable, because I am not sure how they handle duplicates.

Thinking out loud, I guess I would want to:

1. Create a Bash script using the mv command with the --backup option, to recursively move all the image files from their various folder structures and subdirectories into one massive directory.
2. Then use fdupes to remove the duplicates.
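The two steps above might look roughly like this (a sketch only; the source path is hypothetical, and fdupes flags should be checked against the man page before deleting anything):

```shell
#!/bin/bash
# Step 1: move every image into one flat directory. --backup=numbered
# keeps name collisions as file.jpg.~1~, file.jpg.~2~, ... instead of
# overwriting, so no file is lost.
# /mnt/old-photos is a placeholder path.
mkdir -p ~/all-photos
find /mnt/old-photos -type f \( -iname '*.jpg' -o -iname '*.png' \) \
    -exec mv --backup=numbered -t ~/all-photos '{}' +

# Step 2: list duplicate files by content. Run it read-only first;
# only add -d (delete, with prompts) once you trust the results.
fdupes -r ~/all-photos
```

The --backup=numbered part is what makes the "one massive directory" plan safe against ten years of files all named IMG_0001.JPG.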


Is there a heavy-duty application anyone knows of that can take all the images in and manage them, to make this process easier?

Programs like Photos, or even Shotwell, simply can't handle this volume of images very efficiently.
jalisco
 
Posts: 73
Joined: 2013-09-01 17:30

Re: Sort GBs of data - archive

Postby acewiza » 2017-12-19 01:10

"Organizing" is always a challenge. Here's a little snippet I use to find dupes:
Code:
find . -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 15 > dupes.txt

Other than that, I think you are pretty much on your own.
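One caveat on that snippet: -w 15 compares only the first 15 hex digits of each 32-character MD5. That is almost always enough, but if truncation is a worry, the same pipeline works unchanged with the full hash:

```shell
# Same idea as above, but compare the full 32-character MD5 hash.
# Groups of identical files end up in dupes.txt, blank-line separated.
find . -type f -exec md5sum '{}' ';' | sort \
    | uniq --all-repeated=separate -w 32 > dupes.txt
```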
Nobody would ever ask questions if everyone possessed encyclopedic knowledge of the man pages.
acewiza
 
Posts: 358
Joined: 2013-05-28 12:38
Location: Out West

Re: Sort GBs of data - archive

Postby jalisco » 2017-12-19 08:51

acewiza wrote:"Organizing" is always a challenge. Here's a little snippet I use to find dupes:
Code:
find . -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 15 > dupes.txt

Other than that, I think you are pretty much on your own.



Thanks. That's kind of what I figured. Unfortunately, at some point I passed from a "consumer level" problem into a "professional level" problem =)
jalisco
 
Posts: 73
Joined: 2013-09-01 17:30

Re: Sort GBs of data - archive

Postby bw123 » 2017-12-19 11:09

The "one massive directory" doesn't sound fun to me; unless the file names are all consistent, that would be a mess, wouldn't it?

I don't have nearly that big a problem, but I like arranging things by year, and once the dupes are gone, I use a desktop search app to find things.
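That by-year layout can be scripted from file modification times. (EXIF capture dates would be more accurate, via a tool like exiftool, but mtime is a dependency-free first pass. The paths here are hypothetical.)

```shell
# Sort files into per-year subdirectories using each file's mtime.
# --backup=numbered again protects against name collisions.
for f in ~/all-photos/*.jpg; do
    year=$(date -r "$f" +%Y)      # year of last modification
    mkdir -p ~/by-year/"$year"
    mv --backup=numbered "$f" ~/by-year/"$year"/
done
```

Be aware that copying files between drives over the years can reset mtimes, which is why the EXIF date inside the image is the more trustworthy source when it exists.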
bw123
 
Posts: 3186
Joined: 2011-05-09 06:02
Location: TN_USA

