[Software] cps - a useful addition to the cp command?
Hello!
I have just finished version 1.1 of my copying/backup program called cps (short for copy-synchronize). cps compares and synchronizes two directories by copying only the missing files and directories, but it can also overwrite files that exist in both locations with a different size or last modification time, and copy or delete any surplus data. After scanning, it prints statistics and lets you see the list and size of the files and directories that will be copied before the copying starts. It can also write that list to a text file without actually copying anything. The program recognizes when the two directories are on different disks and, in that case, reads their contents simultaneously during the scan.
I've been testing it by deleting files and directories at various points in the file tree, scanning them with the program and copying, and I think it is safe to say that it is 100% accurate. I have used it to back up my own data for quite some time, but more people will have to test it to confirm that. So if you have nothing better to do, or you are in need of such a program, I would welcome feedback on bugs and suggestions for other features that I should try to implement. Advice from native English speakers about any unclear option names or descriptions is also welcome. There is a short tutorial for the program on my GitHub page, where you can also download it: https://github.com/DK0352/cps.
I also plan to implement networking to enable remote copying, but that will come in the future if I keep working on the program.
I have two questions:
1. The program ignores all special files such as sockets, device files and FIFOs. I'm not sure whether it is worth adding an option to copy these types of files?
2. Does the term "surplus data" make sense in English, or is there a more appropriate term for data that exists in the secondary/destination directory but not in the main/source directory?
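To make question 1 concrete, here is a minimal Python sketch of how a scanner might recognize the special files mentioned there. It is only an illustration and not how cps (which is written in C) does it; the names classify_mode and is_special are hypothetical.

```python
import os
import stat

def classify_mode(mode):
    """Map an st_mode value to a coarse file-type label."""
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    return "other"

def is_special(path):
    """True for sockets, FIFOs and device files; lstat() does not follow symlinks."""
    return classify_mode(os.lstat(path).st_mode) in {"fifo", "socket", "device"}
```

A scanner that wants to skip such entries could call is_special(path) on each entry and leave out the ones for which it returns True.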
Also, the program doesn't differentiate pathname arguments based on the last slash character like rsync does. Synchronization of directories is always implied.
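The difference from rsync can be sketched as follows. With rsync -r, a source without a trailing slash is created as a directory inside the destination, while a source with a trailing slash has its contents copied into the destination. The cps behaviour below is taken only from the description above; both function names are hypothetical.

```python
import posixpath

def rsync_target(source, destination):
    """Directory that receives the contents of `source` under rsync -r."""
    if source.endswith("/"):
        # Trailing slash: the contents of source go directly into destination.
        return destination.rstrip("/") or "/"
    # No trailing slash: the source directory itself is created inside destination.
    return posixpath.join(destination, posixpath.basename(source))

def cps_target(source, destination):
    """cps as described in this post: contents are always synced into destination."""
    return destination.rstrip("/") or "/"
```

For example, rsync_target("directory1", "directory2") gives "directory2/directory1", while cps_target gives "directory2" whether or not the arguments end in a slash.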
For the fastest copying/benchmarking, use the -q or --no-questions option, and possibly also -g or --dont-list-data-to-copy.
And finally, I should mention that this is the first "serious" program that I have ever made, so if there are some dumb mistakes, now you know why. Some mistakes may also come from my decision to change many option letters at some point; I may have forgotten to update some parts of the code accordingly.
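For readers who want to see the idea of "missing" versus "surplus" data in code, here is a minimal Python sketch. It is an assumption-laden illustration, not the cps implementation (cps is written in C), and the names list_tree and compare_trees are made up for this example.

```python
import os

def list_tree(root):
    """Return the set of file and directory paths under root, relative to root."""
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries.add(os.path.relpath(full, root))
    return entries

def compare_trees(source, destination):
    """Split entries into those missing from destination and those only in destination."""
    src = list_tree(source)
    dst = list_tree(destination)
    missing = sorted(src - dst)   # candidates for copying to the destination
    surplus = sorted(dst - src)   # "surplus" data: present only in the destination
    return missing, surplus
```

Calling compare_trees("directory1", "directory2") returns two sorted lists of relative paths; a dry run would simply print them instead of copying or deleting anything.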
-
- Global Moderator
- Posts: 2979
- Joined: 2014-07-20 18:12
- Location: Europe
- Has thanked: 75 times
- Been thanked: 407 times
Re: [Software] cps - a useful addition to the cp command?
Moved from "General Questions" to "Off-Topic" sub-forum.
- Hetzer
- Posts: 80
- Joined: 2024-01-05 22:30
- Location: /etc/fstab
- Has thanked: 45 times
- Been thanked: 21 times
Re: [Software] cps - a useful addition to the cp command?
Compiled the 1.1 version; it's fast and copies properly (by that I mean it doesn't create corrupted files or corrupt anything). I really appreciate the fact that it's written in C and has no dependencies.
Haven't tested it entirely yet, though. Simple copying && preserving access/modify times works for sure.
Answerin' to questions:
DK00 wrote:
1. The program ignores all special files like sockets, device files, FIFOS. I'm not sure whether it is worth bothering with adding the option to copy these types of files?
I think it's not worth bothering, since it's designed for directory syncing.
DK00 wrote:
2. Does the "surplus data" term makes sense in English or is there a more appropriate term for the data existing in the secondary/destination directory, but not in the main/source directory?
I'd use the term "extraneous data" instead, like in rsync.
Heave 'er up, and away we'll go...
Re: [Software] cps - a useful addition to the cp command?
Thank you! Be sure to update to version 1.1.2 because there were some bugs in 1.1.
You are correct that the rsync manual uses "extraneous data" to describe the --delete option. It is hard to give up "surplus data" as I have used it for so long, but I will definitely consider it.
Re: [Software] cps - a useful addition to the cp command?
Some people on other forums asked me to post a benchmark comparison with rsync, so here it is:
The tests were done on a Xeon E3-1225v5 with 8GB DDR4 and 1TB, 3TB and 6TB SATA disks, on Debian 11, Xubuntu 23.10 and Fedora 39. Some of the two-disk syncing tests happened to be across different filesystems; I don't know if that can impact performance in any visible way. I used a small shell script that takes a text document with the list of files and directories to delete, so that exactly the same files and directories are deleted each time, and then ran both programs with the time command. I restarted the OS between runs and did a few runs with each program. The test directories were 596.67GB and 581.18GB in size, and I used different levels of copy sizes. I've done more tests than what is shown here, and the only difference is that the results fluctuate by 10-30 seconds. I've picked the best results that I got with each program. I will perform the tests with even bigger directories in the future. Also, suggestions for different/better types of benchmarks are welcome, as I don't really have experience with this.
commands used:
cps -qgrw directory1 directory2
rsync -rl --ignore-existing --stats directory1/ directory2/
Xubuntu 23.10 (xfs filesystem)
Two directories on the same disk:
directory1: 581.29GB
Size of the data to copy: 11.90GB
Number of files to copy: 572
Number of directories to copy: 74
Code: Select all
cps:
real 2m32,243s
user 0m4,093s
sys 0m20,384s
rsync:
real 2m43,732s
user 0m7,533s
sys 0m25,640s
Size of the data to copy: 50.51GB
Number of files to copy: 11175
Number of directories to copy: 589
Code: Select all
cps:
real 14m35,204s
user 0m17,513s
sys 1m24,874s
rsync:
real 14m22,347s
user 0m32,287s
sys 1m51,335s
Size of the data to copy: 91.42GB
Number of files to copy: 13432
Number of directories to copy: 703
Code: Select all
cps:
real 23m30,800s
user 0m31,796s
sys 2m30,802s
rsync:
real 23m16,686s
user 0m56,191s
sys 3m15,328s
Two directories on different disks (second disk with the xfs filesystem):
Size of the data to copy: 11.90GB
Number of files to copy: 572
Number of directories to copy: 74
Code: Select all
cps:
real 2m32,243s
user 0m4,093s
sys 0m20,384s
rsync:
real 2m40,743s
user 0m7,577s
sys 0m25,922s
Size of the data to copy: 63.15GB
Number of files to copy: 12570
Number of directories to copy: 592
Code: Select all
cps:
real 8m55,929s
user 0m21,644s
sys 1m26,772s
rsync:
real 9m0,735s
user 0m40,129s
sys 1m52,601s
Size of the data to copy: 91.42GB
Number of files to copy: 13432
Number of directories to copy: 703
Code: Select all
cps:
real 12m10,194s
user 0m31,127s
sys 2m2,778s
rsync:
real 12m26,278s
user 0m58,397s
sys 2m40,277s
Debian 11 (ext4 filesystem)
Two directories on the same disk:
directory1: 591.18GB
Size of the data to copy: 12.53GB
Number of files to copy: 712
Number of directories to copy: 81
Code: Select all
cps:
real 3m47.284s
user 0m5.292s
sys 0m24.453s
rsync:
real 4m0.605s
user 0m5.271s
sys 0m23.787s
Size of the data to copy: 63.51GB
Number of files to copy: 12638
Number of directories to copy: 598
Code: Select all
cps:
real 20m59.690s
user 0m26.864s
sys 2m2.506s
rsync:
real 22m3.747s
user 0m58.654s
sys 2m48.546s
Size of the data to copy: 105.70GB
Number of files to copy: 13637
Number of directories to copy: 729
Code: Select all
cps:
real 33m2.035s
user 0m43.929s
sys 3m17.635s
rsync:
real 32m7.915s
user 1m32.557s
sys 4m31.768s
Two directories on different disks (second disk with the xfs filesystem):
Size of the data to copy: 63.51GB
Number of files to copy: 12638
Number of directories to copy: 598
Code: Select all
cps:
real 10m12.624s
user 0m31.260s
sys 1m45.261s
rsync:
real 11m11.143s
user 0m57.825s
sys 2m16.927s
Size of the data to copy: 105.70GB
Number of files to copy: 13637
Number of directories to copy: 729
Code: Select all
cps:
real 15m.762s
user 0m57.414s
sys 3m7.345s
rsync:
real 16m7.254s
user 1m31.635s
sys 3m39.779s
Fedora 39 (btrfs)
Two directories on the same disk:
directory1: 597.18GB
Size of the data to copy: 12.53GB
Number of files to copy: 712
Number of directories to copy: 81
Code: Select all
cps:
real 3m21,448s
user 0m3,566s
sys 0m14,904s
rsync:
real 3m27,852s
user 0m7,215s
sys 0m17,114s
Size of the data to copy: 63.51GB
Number of files to copy: 12638
Number of directories to copy: 598
Code: Select all
cps:
real 16m4,211s
user 0m17,393s
sys 1m9,931s
rsync:
real 17m32,169s
user 0m35,373s
sys 1m21,892s
Size of the data to copy: 105.70GB
Number of files to copy: 13637
Number of directories to copy: 729
Code: Select all
cps:
real 25m25,975s
user 0m29,050s
sys 1m54,108s
rsync:
real 26m2,359s
user 0m57,765s
sys 2m16,175s
Two directories on different disks (second disk with the xfs filesystem):
directory1: 597.18GB
Size of the data to copy: 12.53GB
Number of files to copy: 712
Number of directories to copy: 81
Code: Select all
cps:
real 2m40,859s
user 0m3,700s
sys 0m14,996s
rsync:
real 3m31,256s
user 0m6,989s
sys 0m17,113s
Size of the data to copy: 63.51GB
Number of files to copy: 12638
Number of directories to copy: 598
Code: Select all
cps:
real 13m29,253s
user 0m17,796s
sys 1m12,609s
rsync:
real 13m19,804s
user 0m36,075s
sys 1m24,027s
Size of the data to copy: 105.70GB
Number of files to copy: 13637
Number of directories to copy: 729
Code: Select all
cps:
real 16m27,128s
user 0m29,237s
sys 1m59,896s
rsync:
real 15m54,411s
user 0m57,473s
sys 2m13,916s
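The timings above mix comma and dot decimal separators (depending on the locale of each system), which makes them awkward to compare by eye. Below is a small Python sketch for converting the "real" values to seconds and computing the relative difference. The function names are hypothetical, and this only post-processes the numbers already posted; it is not part of the original test procedure.

```python
import re

# Minutes, then seconds with an optional fractional part using "," or ".".
_TIME_RE = re.compile(r"^(\d+)m(\d+(?:[.,]\d+)?)s$")

def parse_real(value):
    """Convert a time-style duration such as '2m32,243s' or '3m47.284s' to seconds."""
    match = _TIME_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds.replace(",", "."))

def relative_difference(cps_real, rsync_real):
    """Percentage by which cps wall-clock time differs from rsync (negative: cps faster)."""
    cps_s = parse_real(cps_real)
    rsync_s = parse_real(rsync_real)
    return (cps_s - rsync_s) / rsync_s * 100.0
```

For the first Xubuntu result, relative_difference("2m32,243s", "2m43,732s") comes out at roughly -7%, meaning cps finished about 7% sooner in that run. An incomplete entry such as "15m.762s" in the Debian results is rejected with a ValueError rather than guessed.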
- Hetzer
- Posts: 80
- Joined: 2024-01-05 22:30
- Location: /etc/fstab
- Has thanked: 45 times
- Been thanked: 21 times
Re: [Software] cps - a useful addition to the cp command?
Compared it with rsync meself as well - it consumes as "much" resources as rsync does (1% processor usage (Ryzen 7 5700G), 2,5MB of RAM used), and performance is similar to that of rsync. Both were tested by copying ~50GB worth of data between two disks - both had a LUKS-encrypted ext4 filesystem, the first being a 1TB HDD and the second a 256GB NVMe SSD. Both programs were set to preserve modification and access times.
I think it's gonna be a nice replacement for rsync in local (non-network) backups - mainly because of its simplicity, lack of networking overhead and interesting new options.
I've noticed a problem with it, though - I'm somewhat bad at describing, so I'll give ye what I did instead:
temp is the first directory, temp2 is the second. Both have the "same" file but with different contents (the one in temp has "e" inside, the one in temp2 has "a"). The first one was modified later than the second one.
Code: Select all
pl@ambassador:~/Desktop/cps-1.1.2/src$ ls -l ~/Desktop/temp
total 4
-rw-r--r-- 1 pl pl 1 Jan 20 22:30 grzyb
pl@ambassador:~/Desktop/cps-1.1.2/src$ ls -l ~/Desktop/temp2
total 4
-rw-r--r-- 1 pl pl 1 Jan 20 22:30 grzyb
Code: Select all
pl@ambassador:~/Desktop/cps-1.1.2/src$ ./cps ~/Desktop/temp/ ~/Desktop/temp2/
Opening: /home/pl/Desktop/temp
/home/pl/Desktop/temp/grzyb
Opening: /home/pl/Desktop/temp2
/home/pl/Desktop/temp2/grzyb
Directories to copy:
directory: temp2
location: /home/pl/Desktop/temp2
new location: /home/pl/Desktop/temp2/temp2
size: 1
SOURCE DIRECTORY
Number of files: 1
Number of directories (excluding the top directory): 0
Size of directory in bytes: 1
DESTINATION DIRECTORY
Number of files: 1
Number of directories (excluding the top directory): 0
Size of directory in bytes: 1
Number of individual files to copy: 1
Size of individual files to copy in bytes: 0
Number of directories to copy: 1
Size of directories to copy in bytes: 1
Files and directories to copy: Number of surplus files: 0
Size of surplus files in bytes: 0
Number of surplus directories: 0
Size of surplus directories in bytes: 0
Same files with different size (main location smaller): 0
Same files with different size (main location larger): 0
Same files with different modification time (main location newer): 0
Same files with different modification time (main location older): 0
Do you want to write the missing files and directories? Type yes or no ...
yes
Directory: /home/pl/Desktop/temp2/temp2
mkdir: Permission denied
read_write_data() 3: /home/pl/Desktop/temp2/temp2/temp2
After that, temp2 gets a non-accessible empty directory of the same name as itself:
Code: Select all
pl@ambassador:~/Desktop/cps-1.1.2/src$ ls -l ~/Desktop/temp2
total 8
-rw-r--r-- 1 pl pl 1 Jan 20 22:30 grzyb
d--------- 2 pl pl 4096 Jan 20 22:38 temp2
Is it a bug or did I miss something?
And also (not bugs):
- cps 1.1.2 declares itself as 1.1.1:
Code: Select all
pl@ambassador:~/Desktop/cps-1.1.2/src$ ./cps
Usage: cps OPTIONS directory1 directory2
directory1 (the main directory)
directory2 (the secondary directory (directory that you wish to syncronize with the main directory).
OPTIONS: (long option) or (short option)
[...]
cps 1.1.1
- Typo (?) in help:
Code: Select all
directory2 (the secondary directory (directory that you wish to syncronize with the main directory).
Shouldn't it be "synchronize"?
- It seems that by default it scans directories for different file sizes, and comparing by modification time can be toggled with -T. I think it's better to scan by modification time by default, and make the scan by file size available through an option (for example, --scan-by-filesize). Why? A file may be newer than the one in the second dir, but may not be updated 'cause it's the same size as the old one.
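The last point can be illustrated with a short Python sketch, assuming a per-file check with two modes. This is not the cps code; needs_update and its `by` parameter are hypothetical names used only to show why a size-only comparison misses an edited file whose size did not change (like the one-byte grzyb files above).

```python
import os

def needs_update(src_path, dst_path, by="mtime"):
    """Decide whether dst_path should be overwritten from src_path.

    by="size"  : overwrite when the sizes differ
    by="mtime" : overwrite when the source was modified more recently
    """
    src = os.stat(src_path)
    dst = os.stat(dst_path)
    if by == "size":
        return src.st_size != dst.st_size
    if by == "mtime":
        return src.st_mtime_ns > dst.st_mtime_ns
    raise ValueError(f"unknown comparison mode: {by!r}")
```

With two one-byte files where the source is newer, needs_update(src, dst, by="size") is False while needs_update(src, dst, by="mtime") is True.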
Heave 'er up, and away we'll go...
Re: [Software] cps - a useful addition to the cp command?
Thank you Hetzer for testing the program and noticing all these typos and this, luckily, very simple bug! When I released 1.1, I realised that it had a bug related to the files in the top directory, even though 1.0.4 works fine. I was under stress and made some quick modifications, but I obviously did not resolve it. Now I think it should be fine.
Hetzer wrote: ↑2024-01-20 21:53
- It seems that default scans directories for different file sizes, comparing by modification time can be toggled by -T. I think it's better to have scan by modification time by default, and make scan by file size available by a option (for example, --scan-by-filesize). Why? File may be newer than the one on second dir, but may not be updated 'cause it's of the same size as the old one
It currently defaults to the size-based search simply because the time mode was added later. Making the time mode the default seems more logical now that I have released it to the public, and from the point of view of a system administrator. I will definitely consider doing this.
Last edited by DK00 on 2024-01-21 08:59, edited 1 time in total.