[Software] cps - a useful addition to the cp command?

Off-Topic discussions about science, technology, and non Debian specific topics.
DK00
Posts: 4
Joined: 2024-01-11 10:13
Has thanked: 1 time
Been thanked: 1 time

[Software] cps - a useful addition to the cp command?

#1 Post by DK00 »

Hello!

I have just finished version 1.1 of my copying/backup program called cps (short for copy-synchronize). cps compares and synchronizes two directories by copying only the missing files and directories. It can also overwrite files that exist in both directories but differ in size or last modification time, and it can copy or delete any surplus data. After scanning, it prints useful statistics and lets you see the list and size of the files and directories that will be copied before the copy operation starts. It can also write the list of all the files and directories that are about to be copied to a text file without actually copying anything. The program recognizes when the two directories are on different disks, and in that case it reads the contents of both directories simultaneously during scanning.
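The comparison described above can be sketched roughly as follows. This is a minimal Python illustration under assumptions, not cps's actual C code: it walks both trees in parallel, records entries that exist only in the source (missing) or only in the destination (surplus), and flags regular files present in both whose size or modification time differs. The names `SyncPlan` and `compare_dirs` are hypothetical, and entries whose type differs between the two sides are simply ignored here.

```python
import os
import stat
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    missing: list = field(default_factory=list)        # only in source: copy over
    surplus: list = field(default_factory=list)        # only in destination
    size_differs: list = field(default_factory=list)   # in both, sizes differ
    mtime_differs: list = field(default_factory=list)  # in both, mtimes differ


def _entries(path):
    """Map each entry name in one directory to its lstat() result."""
    with os.scandir(path) as it:
        return {e.name: e.stat(follow_symlinks=False) for e in it}


def compare_dirs(src, dst, rel="", plan=None):
    """Recursively compare src/rel with dst/rel; paths in the plan are relative."""
    if plan is None:
        plan = SyncPlan()
    a = _entries(os.path.join(src, rel))
    b = _entries(os.path.join(dst, rel))
    for name in sorted(a):
        path = os.path.join(rel, name)
        if name not in b:
            plan.missing.append(path)  # a missing directory is copied as a whole
            continue
        mode_a, mode_b = a[name].st_mode, b[name].st_mode
        if stat.S_ISDIR(mode_a) and stat.S_ISDIR(mode_b):
            compare_dirs(src, dst, path, plan)  # descend into common directories
        elif stat.S_ISREG(mode_a) and stat.S_ISREG(mode_b):
            if a[name].st_size != b[name].st_size:
                plan.size_differs.append(path)
            if a[name].st_mtime_ns != b[name].st_mtime_ns:
                plan.mtime_differs.append(path)
        # type mismatches (file vs directory, etc.) are ignored in this sketch
    plan.surplus.extend(os.path.join(rel, n) for n in sorted(b) if n not in a)
    return plan
```

A missing directory is recorded once rather than file by file, which matches the idea of copying whole missing subtrees.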

I've been testing it by deleting files and directories at various points in the file tree, scanning them with the program and copying, and I think it is safe to say that it has been 100% accurate in those tests. I have used it to back up my own data for quite some time, but more people will have to test it to confirm that. So if you don't have anything better to do, or you need such a program, you can provide feedback on bugs and suggest other features that I should try to implement. Advice from native English speakers about any unclear option names or descriptions is also welcome. There is a short tutorial for the program on my GitHub page, where you can also download it: https://github.com/DK0352/cps.
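One way to double-check a sync after such a test run, independently of the program's own scan, is to hash every file in both trees and compare the results. A hypothetical Python sketch (`tree_digest` and `trees_match` are illustrative names, not part of cps); unlike a size or modification-time comparison, it also catches files whose contents differ:

```python
import hashlib
import os


def tree_digest(root):
    """Map relative path -> SHA-256 hex digest (files) or "<dir>" (directories)."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            result[os.path.relpath(os.path.join(dirpath, name), root)] = "<dir>"
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and special files
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
                    h.update(chunk)
            result[os.path.relpath(full, root)] = h.hexdigest()
    return result


def trees_match(src, dst):
    """True if both trees contain the same directories and identical regular files."""
    return tree_digest(src) == tree_digest(dst)
```

Hashing reads every byte, so on the ~600GB directories used later in this thread it is a slow, occasional check rather than something to run after every sync.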

I also plan to implement networking support to enable remote copying, but that will come in the future if I keep working on the program.

I have two questions:

1. The program ignores all special files, such as sockets, device files, and FIFOs. Is it worth adding an option to copy these types of files?

2. Does the term "surplus data" make sense in English, or is there a more appropriate term for data that exists in the secondary/destination directory but not in the main/source directory?
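Regarding the first question: telling regular files apart from special files comes down to checking the file type bits returned by `lstat()`. A minimal Python sketch, assuming regular files, directories and symlinks are handled and everything else is skipped (`classify` and `copyable_entries` are hypothetical helpers; how cps itself treats symlinks is not stated here):

```python
import os
import stat


def classify(path):
    """Return a short type label for path without following symlinks."""
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):
        return "file"
    if stat.S_ISDIR(mode):
        return "dir"
    if stat.S_ISLNK(mode):
        return "symlink"
    # FIFOs, sockets, character and block devices
    return "special"


def copyable_entries(directory):
    """List the entries a sync limited to non-special files would handle."""
    return sorted(n for n in os.listdir(directory)
                  if classify(os.path.join(directory, n)) != "special")
```

If special files were ever supported, FIFOs could be recreated with `os.mkfifo()`, while device nodes need `os.mknod()`, which normally requires root privileges.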

Also, unlike rsync, the program doesn't treat pathname arguments differently depending on whether they end with a slash. Synchronization of the two directories is always implied.
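To make the difference concrete, here is a small Python sketch of where each tool puts the synchronized data. `rsync_target` models rsync's trailing-slash rule; `cps_target` models the behaviour described above, where the destination directory itself is always the synchronization target. Both function names are hypothetical:

```python
import os


def rsync_target(src, dst):
    """Directory that receives src's contents under rsync's trailing-slash rule:
    'src/' copies the contents of src into dst, 'src' creates dst/src."""
    if src.endswith("/"):
        return dst
    return os.path.join(dst, os.path.basename(src))


def cps_target(src, dst):
    """Model of the cps behaviour described above: dst itself is always
    synchronized with src, so trailing slashes make no difference.
    src is unused; it is kept only to mirror rsync_target's signature."""
    return os.path.normpath(dst)
```

So `rsync -r dir1 dir2` ends up in `dir2/dir1`, while `rsync -r dir1/ dir2` fills `dir2` directly; with cps both spellings target `dir2`.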

For the fastest copying or benchmarking, use the -q (--no-questions) option, and possibly also -g (--dont-list-data-to-copy).

Finally, I should mention that this is the first "serious" program I have ever written, so if there are some dumb mistakes, now you know why. Mistakes are also possible because I changed many option letters at some point, and I may have forgotten to update some parts of the code accordingly.

Aki
Global Moderator
Posts: 2979
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 75 times
Been thanked: 407 times

Re: [Software] cps - a useful addition to the cp command?

#2 Post by Aki »

Moved from "General Questions" to "Off-Topic" sub-forum.
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀

Hetzer
Posts: 80
Joined: 2024-01-05 22:30
Location: /etc/fstab
Has thanked: 45 times
Been thanked: 21 times

Re: [Software] cps - a useful addition to the cp command?

#3 Post by Hetzer »

Compiled the 1.1 version; it's fast and copies properly (by that I mean it doesn't create corrupted files or corrupt anything). I really appreciate the fact that it's written in C and has no dependencies.
Haven't tested it entirely yet, though. Simple copying && preserving access/modify times works for sure.
Answerin' to questions:
1. The program ignores all special files like sockets, device files, FIFOS. I'm not sure whether it is worth bothering with adding the option to copy these types of files?
I think it's not worth bothering, since it's designed for directory syncing.
2. Does the "surplus data" term makes sense in English or is there a more appropriate term for the data existing in the secondary/destination directory, but not in the main/source directory?
I'd use the term "extraneous data" instead, like rsync does.
Heave 'er up, and away we'll go...

DK00

Re: [Software] cps - a useful addition to the cp command?

#4 Post by DK00 »

Thank you! Be sure to update to version 1.1.2, because there were some bugs in 1.1.

You are correct that "extraneous data" is used in the rsync manual to describe the --delete option. But it is hard to give up "surplus data", as I have used it for so long. Still, I will definitely consider it.

DK00

Re: [Software] cps - a useful addition to the cp command?

#5 Post by DK00 »

Some people on other forums asked me to post a benchmark comparison with rsync, so here it is:

The tests were done on a Xeon E3-1225v5 with 8GB of DDR4 and 1TB, 3TB and 6TB SATA disks, on Debian 11, Xubuntu 23.10 and Fedora 39. In some of the two-disk syncing tests the disks happened to have different filesystems; I don't know whether that can affect performance in any visible way. I used a small shell script that takes a text document listing the files and directories to delete, so that exactly the same files and directories were deleted each time, and then ran both programs with the time command. I restarted the OS in between each run and did a few runs with each program. The test directories were 596.67GB and 581.18GB in size, and I used several different amounts of data to copy. I've done more tests than what is shown here; the only difference is that the results fluctuate by 10-30 seconds. I've picked the best result I got with each program. I will perform tests with even bigger directories in the future. Also, suggestions for different or better types of benchmarks are welcome, as I don't really have experience with this.
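The deletion step described above could look something like this. It is a hypothetical Python equivalent of such a script, not the one actually used: it reads one path per line, relative to the test directory, removes files and whole directories, skips entries that are already gone, and refuses to touch anything outside the given root.

```python
import os
import shutil


def delete_listed(root, list_file):
    """Delete each path listed in list_file (one per line, relative to root).
    Returns the number of entries actually removed."""
    root_abs = os.path.abspath(root)
    removed = 0
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            rel = line.rstrip("\n")
            if not rel:
                continue  # ignore blank lines
            path = os.path.abspath(os.path.join(root_abs, rel))
            # refuse the root itself and paths such as "../x" that escape it
            if path == root_abs or os.path.commonpath([root_abs, path]) != root_abs:
                raise ValueError(f"refusing to delete outside {root_abs}: {rel}")
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)  # whole directory tree
            elif os.path.lexists(path):
                os.remove(path)      # file or symlink
            else:
                continue             # already gone, e.g. inside a deleted directory
            removed += 1
    return removed
```

Running it with the same list before every run is what keeps the workload identical between cps and rsync.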

commands used:

cps -qgrw directory1 directory2
rsync -rl --ignore-existing --stats directory1/ directory2/

Xubuntu 23.10 (xfs filesystem)

Two directories on the same disk:

directory1: 581.29GB

Size of the data to copy: 11.90GB
Number of files to copy: 572
Number of directories to copy: 74

cps:
real    2m32,243s
user    0m4,093s
sys     0m20,384s

rsync:
real    2m43,732s
user    0m7,533s
sys     0m25,640s

Size of the data to copy: 50.51GB
Number of files to copy: 11175
Number of directories to copy: 589

cps:
real    14m35,204s
user    0m17,513s
sys     1m24,874s

rsync:
real    14m22,347s
user    0m32,287s
sys     1m51,335s

Size of the data to copy: 91.42GB
Number of files to copy: 13432
Number of directories to copy: 703

cps:
real    23m30,800s
user    0m31,796s
sys     2m30,802s

rsync:
real    23m16,686s
user    0m56,191s
sys     3m15,328s

Two directories on different disks (second disk with the xfs filesystem):

Size of the data to copy: 11.90GB
Number of files to copy: 572
Number of directories to copy: 74

cps:
real    2m32,243s
user    0m4,093s
sys     0m20,384s

rsync:
real    2m40,743s
user    0m7,577s
sys     0m25,922s

Size of the data to copy: 63.15GB
Number of files to copy: 12570
Number of directories to copy: 592

cps:
real    8m55,929s
user    0m21,644s
sys     1m26,772s

rsync:
real    9m0,735s
user    0m40,129s
sys     1m52,601s

Size of the data to copy: 91.42GB
Number of files to copy: 13432
Number of directories to copy: 703

cps:
real    12m10,194s
user    0m31,127s
sys     2m2,778s

rsync:
real    12m26,278s
user    0m58,397s
sys     2m40,277s

Debian 11 (ext4 filesystem)

Two directories on the same disk:

directory1: 591.18GB

Size of the data to copy: 12.53GB
Number of files to copy: 712
Number of directories to copy: 81

cps:
real    3m47.284s
user    0m5.292s
sys     0m24.453s

rsync:
real    4m0.605s
user    0m5.271s
sys     0m23.787s

Size of the data to copy: 63.51GB
Number of files to copy: 12638
Number of directories to copy: 598

cps:
real    20m59.690s
user    0m26.864s
sys     2m2.506s

rsync:
real    22m3.747s
user    0m58.654s
sys     2m48.546s

Size of the data to copy: 105.70GB
Number of files to copy: 13637
Number of directories to copy: 729

cps:
real    33m2.035s
user    0m43.929s
sys     3m17.635s

rsync:
real    32m7.915s
user    1m32.557s
sys     4m31.768s

Two directories on different disks (second disk with the xfs filesystem):

Size of the data to copy: 63.51GB
Number of files to copy: 12638
Number of directories to copy: 598

cps:
real    10m12.624s
user    0m31.260s
sys     1m45.261s

rsync:
real    11m11.143s
user    0m57.825s
sys     2m16.927s

Size of the data to copy: 105.70GB
Number of files to copy: 13637
Number of directories to copy: 729

cps:
real    15m.762s
user    0m57.414s
sys     3m7.345s

rsync:
real    16m7.254s
user    1m31.635s
sys     3m39.779s

Fedora 39 (btrfs)

Two directories on the same disk:

directory1: 597.18GB

Size of the data to copy: 12.53GB
Number of files to copy: 712
Number of directories to copy: 81

cps:
real    3m21,448s
user    0m3,566s
sys     0m14,904s

rsync:
real    3m27,852s
user    0m7,215s
sys     0m17,114s

Size of the data to copy: 63.51GB
Number of files to copy: 12638
Number of directories to copy: 598

cps:
real    16m4,211s
user    0m17,393s
sys     1m9,931s

rsync:
real    17m32,169s
user    0m35,373s
sys     1m21,892s

Size of the data to copy: 105.70GB
Number of files to copy: 13637
Number of directories to copy: 729

cps:
real    25m25,975s
user    0m29,050s
sys     1m54,108s

rsync:
real    26m2,359s
user    0m57,765s
sys     2m16,175s

Two directories on different disks (second disk with the xfs filesystem):

directory1: 597.18GB

Size of the data to copy: 12.53GB
Number of files to copy: 712
Number of directories to copy: 81

cps:
real    2m40,859s
user    0m3,700s
sys     0m14,996s

rsync:
real    3m31,256s
user    0m6,989s
sys     0m17,113s

Size of the data to copy: 63.51GB
Number of files to copy: 12638
Number of directories to copy: 598

cps:
real    13m29,253s
user    0m17,796s
sys     1m12,609s

rsync:
real    13m19,804s
user    0m36,075s
sys     1m24,027s

Size of the data to copy: 105.70GB
Number of files to copy: 13637
Number of directories to copy: 729

cps:
real    16m27,128s
user    0m29,237s
sys     1m59,896s

rsync:
real    15m54,411s
user    0m57,473s
sys     2m13,916s
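Since `time` prints `real` values with a locale-dependent decimal separator (`2m32,243s` on Xubuntu and Fedora, `3m47.284s` on Debian), comparing the pairs above by hand is error-prone. A small Python sketch for converting them to seconds and computing the relative difference; `parse_time` and `relative_difference` are hypothetical helpers:

```python
import re

# minutes, whole seconds, fractional seconds; ',' or '.' as decimal separator
_TIME_RE = re.compile(r"^(\d+)m(\d+)[.,](\d+)s$")


def parse_time(value):
    """Convert bash `time` output such as '14m35,204s' or '3m47.284s' to seconds."""
    m = _TIME_RE.match(value.strip())
    if m is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, secs, frac = m.groups()
    return int(minutes) * 60 + int(secs) + int(frac) / 10 ** len(frac)


def relative_difference(cps_real, rsync_real):
    """Percent by which cps is faster (positive) or slower (negative) than rsync."""
    a, b = parse_time(cps_real), parse_time(rsync_real)
    return (b - a) / b * 100
```

For the first Xubuntu same-disk result (2m32,243s vs 2m43,732s), `relative_difference` gives about 7.0% in favour of cps. Values that don't match the expected pattern, such as the `15m.762s` entry in the Debian two-disk results, raise an error instead of being guessed.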

Hetzer

Re: [Software] cps - a useful addition to the cp command?

#6 Post by Hetzer »

Compared it with rsync meself as well - it consumes as "much" resources as rsync does (1% processor usage on a Ryzen 7 5700G, 2,5MB of RAM used), and performance is similar to that of rsync. Both were tested by copying ~50GB worth of data between two disks, both with LUKS-encrypted ext4 filesystems: the first a 1TB HDD, the second a 256GB NVMe SSD. Both programs were set to preserve modification and access times.
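Preserving times during a copy boils down to copying the data and then setting the destination's access and modification times from the source's `stat()` result. A minimal Python sketch of that idea (the function name is hypothetical and this is not how either program is implemented; `shutil.copy2()` does something similar and also copies permission bits):

```python
import os
import shutil


def copy_preserving_times(src, dst):
    """Copy src's data to dst, then give dst the same access/modification times.
    The stat() happens before the copy, since reading src may update its atime."""
    st = os.stat(src)
    shutil.copyfile(src, dst)
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```

Using the nanosecond fields means a later modification-time comparison sees identical values on filesystems that store sub-second timestamps.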

I think it's gonna be a nice replacement for rsync in local (non-network) backups, mainly because of its simplicity, lack of networking overhead and interesting new options.

I've noticed a problem with it, though - I'm somewhat bad at describing, so I'll give ye what I did instead:

pl@ambassador:~/Desktop/cps-1.1.2/src$ ls -l ~/Desktop/temp
total 4
-rw-r--r-- 1 pl pl 1 Jan 20 22:30 grzyb
pl@ambassador:~/Desktop/cps-1.1.2/src$ ls -l ~/Desktop/temp2
total 4
-rw-r--r-- 1 pl pl 1 Jan 20 22:30 grzyb

temp is the first directory, temp2 is the second. Both have the "same" file but with different contents (the one in temp contains "e", the other "a"). The first one was modified later than the second one.

pl@ambassador:~/Desktop/cps-1.1.2/src$ ./cps ~/Desktop/temp/ ~/Desktop/temp2/
Opening: /home/pl/Desktop/temp
/home/pl/Desktop/temp/grzyb
Opening: /home/pl/Desktop/temp2
/home/pl/Desktop/temp2/grzyb

Directories to copy:

directory: temp2
 location: /home/pl/Desktop/temp2
 new location: /home/pl/Desktop/temp2/temp2
 size: 1




SOURCE DIRECTORY

Number of files: 1
Number of directories (excluding the top directory): 0
Size of directory in bytes: 1


DESTINATION DIRECTORY

Number of files: 1
Number of directories (excluding the top directory): 0
Size of directory in bytes: 1


Number of individual files to copy: 1
Size of individual files to copy in bytes: 0
Number of directories to copy: 1
Size of directories to copy in bytes: 1
Files and directories to copy: Number of surplus files: 0
Size of surplus files in bytes: 0
Number of surplus directories: 0
Size of surplus directories in bytes: 0
Same files with different size (main location smaller): 0
Same files with different size (main location larger): 0
Same files with different modification time (main location newer): 0
Same files with different modification time (main location older): 0


Do you want to write the missing files and directories? Type yes or no ...
yes
Directory: /home/pl/Desktop/temp2/temp2
mkdir: Permission denied
read_write_data() 3: /home/pl/Desktop/temp2/temp2/temp2

After that, temp2 gets a non-accessible empty directory of the same name as itself:

pl@ambassador:~/Desktop/cps-1.1.2/src$ ls -l ~/Desktop/temp2
total 8
-rw-r--r-- 1 pl pl    1 Jan 20 22:30 grzyb
d--------- 2 pl pl 4096 Jan 20 22:38 temp2

Is it a bug, or did I miss something?

And also (not bugs):
- cps 1.1.2 declares itself as 1.1.1:

pl@ambassador:~/Desktop/cps-1.1.2/src$ ./cps

Usage: cps OPTIONS directory1 directory2

       directory1 (the main directory)
       directory2 (the secondary directory (directory that you wish to syncronize with the main directory).

OPTIONS: (long option) or (short option) 
[...]

cps 1.1.1

- Typo (?) in help:

       directory2 (the secondary directory (directory that you wish to syncronize with the main directory).

Shouldn't it be "synchronize"?

- It seems that by default it scans directories for files of different sizes, and comparison by modification time can be toggled with -T. I think it's better to scan by modification time by default and make scanning by file size available as an option (for example, --scan-by-filesize). Why? A file may be newer than the one in the second directory but not get updated 'cause it's the same size as the old one.
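The difference between the two modes can be shown with a small sketch. This is hypothetical Python, assuming a "size" mode that only compares sizes and a "time" mode that only compares modification times; the real cps options may combine the checks differently. Applied to the `grzyb` case above (two 1-byte files with different contents and different modification times), the size mode sees nothing to update while the time mode does:

```python
import os


def needs_update(src, dst, mode="time"):
    """Decide whether dst should be overwritten with src.
    mode="size": only a size difference counts.
    mode="time": only a modification-time difference counts."""
    a, b = os.stat(src), os.stat(dst)
    if mode == "size":
        return a.st_size != b.st_size
    if mode == "time":
        return a.st_mtime_ns != b.st_mtime_ns
    raise ValueError(f"unknown mode: {mode!r}")
```

A same-size edit is exactly what the size mode cannot see, which is the argument for making the time mode the default.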

DK00

Re: [Software] cps - a useful addition to the cp command?

#7 Post by DK00 »

Thank you, Hetzer, for testing the program and noticing all these typos and this, luckily, very simple bug! When I released 1.1, I realised that it had a bug related to the files in the top directory, even though 1.0.4 works fine. I was under stress and made some quick modifications, but I obviously did not resolve it. Now I think it should be fine.
Hetzer wrote: 2024-01-20 21:53
- It seems that default scans directories for different file sizes, comparing by modification time can be toggled by -T. I think it's better to have scan by modification time by default, and make scan by file size available by a option (for example, --scan-by-filesize). Why? File may be newer than the one on second dir, but may not be updated 'cause it's of the same size as the old one

For now it defaults to the size-based search simply because the time mode was added later. Setting the time mode as the default seems more logical now that I have released the program to the public, and from a system administrator's point of view. I will definitely consider doing this.
Last edited by DK00 on 2024-01-21 08:59, edited 1 time in total.
