Rsync Compare

1 Oct, 2021

james@Jamess-MacBook Desktop % mkdir rsync-test
james@Jamess-MacBook Desktop % cd rsync-test 
james@Jamess-MacBook rsync-test % mkdir -p old current diff
james@Jamess-MacBook rsync-test % echo '1' > old/1.txt
james@Jamess-MacBook rsync-test % rsync -aHxv old/ current/ 
building file list ... done
./
1.txt

sent 141 bytes  received 48 bytes  378.00 bytes/sec
total size is 2  speedup is 0.01
james@Jamess-MacBook rsync-test % echo '2' > current/2.txt 
james@Jamess-MacBook rsync-test % echo '3' > old/3.txt     
james@Jamess-MacBook rsync-test % rsync -aHxv --compare-dest=../old/ current/ diff/
building file list ... done
./
2.txt

sent 156 bytes  received 48 bytes  408.00 bytes/sec
total size is 4  speedup is 0.02
james@Jamess-MacBook rsync-test % ls diff 
2.txt
james@Jamess-MacBook rsync-test % ls old 
1.txt	3.txt
james@Jamess-MacBook rsync-test % ls current 
1.txt	2.txt
james@Jamess-MacBook rsync-test % ls diff 
2.txt
james@Jamess-MacBook rsync-test % rm -r diff 
james@Jamess-MacBook rsync-test % rsync -aHxv --compare-dest=../current/ old/ diff/    
building file list ... done
created directory diff
./
3.txt

sent 156 bytes  received 48 bytes  408.00 bytes/sec
total size is 4  speedup is 0.02
james@Jamess-MacBook rsync-test % 

This use of flags leaves lots of empty directories in diff/, so you might expect the --prune-empty-dirs flag to help, but it doesn't, as explained here: https://lists.samba.org/archive/rsync/2009-January/022488.html
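
To see the leftover directories for yourself, here is a minimal sketch. It assumes rsync is on your PATH and works in a throwaway directory from mktemp; the function name demo_empty_dirs and the file names are made up for illustration.

```shell
# Hypothetical helper: reproduce the empty directory that --compare-dest leaves behind.
demo_empty_dirs() {
    root=$(mktemp -d) || return 1
    mkdir -p "$root/old/sub" "$root/current" "$root/diff"
    echo '1' > "$root/old/sub/1.txt"
    # Copy old/ to current/ with -a so sizes and modification times match.
    rsync -a "$root/old/" "$root/current/"
    # One file that exists only in current/.
    echo '2' > "$root/current/2.txt"
    # An absolute --compare-dest path sidesteps the rule that a relative
    # path (like ../old/ in the session above) is resolved from the destination.
    rsync -a --compare-dest="$root/old/" "$root/current/" "$root/diff/"
    # List the directories in diff/ that ended up empty.
    (cd "$root/diff" && find . -type d -empty)
    rm -r "$root"
}
```

Calling demo_empty_dirs should list sub, which rsync created in diff/ even though the only file inside it was unchanged.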

Instead I run these commands to prune diff/ manually afterwards:

WARNING: Make sure you run these in the right directory, otherwise you will delete files and empty directories from somewhere you did not intend.

cd diff
find . -type f -name .DS_Store -delete
find . -type d -empty -delete     

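With find, -delete also implies -depth, so the contents of a directory are processed before the directory itself and nested empty directories are removed in a single pass.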
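
One way to reduce the risk in the warning above is to pass the directory explicitly instead of relying on cd. This is only a sketch: prune_diff is a hypothetical name, and it assumes your find supports -mindepth (GNU and BSD find both do).

```shell
# Hypothetical helper: prune a diff directory without cd'ing into it.
prune_diff() {
    dir=$1
    # Refuse to run unless an existing directory was given.
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "usage: prune_diff DIRECTORY" >&2
        return 1
    fi
    # Remove Finder metadata first, so directories holding only .DS_Store become empty.
    find "$dir" -type f -name .DS_Store -delete
    # -mindepth 1 keeps the top-level directory itself, even if it ends up empty.
    find "$dir" -mindepth 1 -type d -empty -delete
}
```

With this in your shell, prune_diff diff does the same job as the three commands above, and a mistyped path fails with a usage message instead of pruning the current directory.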

If you want to find duplicates from a source directory anywhere in another directory, you can use rmlint:

Use data as the master directory. Find only duplicates in backup that are also in data. Do not delete any files in data:

mkdir -p data backup
echo 'one' > data/1.txt
echo 'one' > backup/1.txt
echo 'two' > backup/2.txt
echo 'two' > backup/2b.txt
rmlint backup // data --keep-all-tagged --must-match-tagged -T 'df' -g
./rmlint.sh -d
tree
.
├── backup
│   ├── 2.txt
│   └── 2b.txt
├── data
│   └── 1.txt
└── rmlint.json

2 directories, 4 files
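
If you want to sanity-check which files should go before running the generated script, a brute-force comparison with cmp gives the same answer on a small tree. This is a sketch, not how rmlint works internally; list_tagged_dupes is a hypothetical name, and it assumes file names without newlines.

```shell
# Hypothetical helper: print files under $1 whose content also exists under $2.
# Brute force with cmp; fine for small trees, slow for large ones.
list_tagged_dupes() {
    candidates=$1
    master=$2
    find "$candidates" -type f | while IFS= read -r f; do
        find "$master" -type f | while IFS= read -r m; do
            # -s keeps cmp quiet; its exit status says whether the files are identical.
            if cmp -s "$f" "$m"; then
                printf '%s\n' "$f"
                break
            fi
        done
    done
}
```

With the sample files above (before running rmlint.sh), list_tagged_dupes backup data prints only backup/1.txt. 2.txt and 2b.txt are duplicates of each other, but neither has a match in data, so they are left alone.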

If you want to do something more complicated, like excluding some of the files in backup from de-duplication (here, anything inside a Photos library), you can select the files with find and pipe them to rmlint:

mkdir -p data backup backup/photos.photoslibrary backup/photos.photolibrary
echo 'one' > data/1.txt
echo 'one' > backup/1.txt
echo 'two' > backup/2.txt
echo 'two' > backup/2b.txt
echo 'three' > data/3.txt
echo 'three' > backup/photos.photolibrary/3.txt
echo 'three' > backup/photos.photoslibrary/3.txt
find backup -type f -not -path '*.photo*library/*' -print0 | rmlint -0 // data --keep-all-tagged --must-match-tagged -T 'df' -g
./rmlint.sh -d
tree
.
├── backup
│   ├── 2.txt
│   ├── 2b.txt
│   ├── photos.photolibrary
│   │   └── 3.txt
│   └── photos.photoslibrary
│       └── 3.txt
├── data
│   ├── 1.txt
│   └── 3.txt
└── rmlint.json

4 directories, 7 files
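
Before piping anything into rmlint, it can help to look at what the find filter actually selects. A small sketch, where preview_candidates is a hypothetical name and the output is newline-separated for reading rather than null-separated:

```shell
# Hypothetical helper: list the files under $1 that the filter would pass on,
# skipping anything inside a directory matching *.photo*library.
preview_candidates() {
    # LC_ALL=C keeps the sort order independent of the current locale.
    find "$1" -type f -not -path '*.photo*library/*' | LC_ALL=C sort
}
```

Run against the sample backup directory before de-duplication, preview_candidates backup lists backup/1.txt, backup/2.txt and backup/2b.txt, and neither copy of 3.txt.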

Copyright James Gardner 1996-2020 All Rights Reserved.