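Sequencing data often runs to tens or hundreds of GB, and it regularly has to be synced from the analysis server to a local machine or backed up to another host. scp cannot resume an interrupted transfer, so a dropped connection means starting over; cp does not work across machines. rsync is built for this kind of large-data sync: incremental transfer sends only what changed, an interrupted transfer can be resumed, and unwanted files can be excluded or the bandwidth capped. This lesson covers rsync's incremental sync, resumable transfer, exclude rules, bandwidth limiting, and --dry-run preview in one pass, with test data to run everything against.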
Concepts at a glance
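rsync decides which files need to be transferred by comparing modification time and size (or checksums, with --checksum), and sends only the parts that differ. -a (archive mode) preserves permissions, timestamps, and symlinks. --dry-run is the safe first step: it previews the transfer without executing it, and once the command looks right you remove the flag.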
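Core command cheat sheet
The commands below are grouped by function and ready to copy. The one with destructive side effects, --delete, should be previewed with --dry-run first: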
#!/bin/bash
# rsync cheat sheet -- incremental sync for large data
SRC=/tmp/l13_src
DST=/tmp/l13_dst
mkdir -p "$SRC/subdir" "$DST"

# Generate test data
for f in sample1.fastq.gz sample2.fastq.gz ref.fa; do
    dd if=/dev/urandom bs=1K count=2 2>/dev/null | gzip > "$SRC/$f"
done
echo "metadata" > "$SRC/subdir/meta.txt"
echo "skip_me" > "$SRC/tmp_skip.log"

# ---- Basic sync ----
rsync -avz "$SRC/" "$DST/"                # -a preserve attributes, -v verbose, -z compress in transit
rsync -avz --dry-run "$SRC/" "$DST/"      # --dry-run previews without executing
rsync -av --progress "$SRC/" "$DST/"      # show transfer progress

# ---- Incremental sync (only what changed) ----
echo "new_data" >> "$SRC/sample1.fastq.gz"
rsync -avz --checksum "$SRC/" "$DST/"     # --checksum forces a content comparison
rsync -avz --update "$SRC/" "$DST/"       # --update skips files that are newer on the destination

# ---- Exclude rules ----
rsync -avz --exclude="*.log" "$SRC/" "$DST/"
rsync -avz --exclude="tmp_*" --exclude="*.bak" "$SRC/" "$DST/"
rsync -avz --exclude-from=<(echo "*.log") "$SRC/" "$DST/"

# ---- Delete extra files on the destination ----
rsync -avz --delete "$SRC/" "$DST/"       # removes destination files that the source lacks

# ---- Resumable transfer ----
rsync -avz --partial "$SRC/" "$DST/"      # keep partially transferred files
rsync -avz --partial --progress "$SRC/" "$DST/"

# ---- Bandwidth limit ----
rsync -avz --bwlimit=5000 "$SRC/" "$DST/" # cap at about 5 MB/s (the value is in KB/s)

# ---- Remote sync (requires SSH) ----
# rsync -avz -e ssh $SRC/ user@server:/path/to/dst/
# rsync -avz -e "ssh -p 2222" $SRC/ user@server:/path/
# rsync -avz user@server:/remote/ $DST/   # pull from the remote host to local

# ---- Sync only one type of file ----
rsync -avz --include="*.gz" --exclude="*" "$SRC/" "$DST/"

# ---- Transfer statistics ----
rsync -avz --stats "$SRC/" "$DST/" | tail -10

rm -rf "$SRC" "$DST"
echo "rsync 演示完成"                      # prints "rsync demo complete"
Sample output log
Running the cheat-sheet script once on the test data produces the output below (paths sanitized):
sending incremental file list
ref.fa
sample1.fastq.gz
sample2.fastq.gz
tmp_skip.log
subdir/
subdir/meta.txt
sent 6,658 bytes received 119 bytes 13,554.00 bytes/sec
total size is 6,230 speedup is 0.92

sending incremental file list
sent 206 bytes received 13 bytes 438.00 bytes/sec
total size is 6,230 speedup is 28.45 (DRY RUN)

sending incremental file list
sent 206 bytes received 13 bytes 438.00 bytes/sec
total size is 6,230 speedup is 28.45

sending incremental file list
sample1.fastq.gz
sent 2,417 bytes received 36 bytes 4,906.00 bytes/sec
total size is 6,239 speedup is 2.54

sending incremental file list
sent 206 bytes received 13 bytes 438.00 bytes/sec
total size is 6,239 speedup is 28.49

sending incremental file list
sent 183 bytes received 13 bytes 392.00 bytes/sec
total size is 6,231 speedup is 31.79

sending incremental file list
sent 183 bytes received 13 bytes 392.00 bytes/sec
total size is 6,231 speedup is 31.79

sending incremental file list
sent 183 bytes received 13 bytes 392.00 bytes/sec
total size is 6,231 speedup is 31.79

sending incremental file list
sent 206 bytes received 20 bytes 452.00 bytes/sec
total size is 6,239 speedup is 27.61

sending incremental file list
sent 206 bytes received 13 bytes 438.00 bytes/sec
total size is 6,239 speedup is 28.49

sending incremental file list
sent 206 bytes received 13 bytes 438.00 bytes/sec
total size is 6,239 speedup is 28.49

sending incremental file list
sent 206 bytes received 13 bytes 438.00 bytes/sec
total size is 6,239 speedup is 28.49

sending incremental file list
sent 102 bytes received 12 bytes 228.00 bytes/sec
total size is 4,151 speedup is 36.41

Literal data: 0 bytes
Matched data: 0 bytes
File list size: 0
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 206
Total bytes received: 13

sent 206 bytes received 13 bytes 438.00 bytes/sec
total size is 6,239 speedup is 28.49
rsync 演示完成
OK
The above is real terminal output, so you can check each command's effect against it directly.
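The --partial lines in the cheat sheet only keep an interrupted file; something still has to run rsync again after the connection drops. One possible way to automate that is a small retry wrapper, sketched below. sync_with_retry and its arguments are hypothetical names for this illustration, not an rsync feature, and the demo runs against throwaway local directories rather than a real server.

```shell
#!/bin/bash
# sync_with_retry SRC DST [MAX_TRIES] [DELAY_SECONDS]
# Hypothetical helper (not part of rsync): rerun rsync until it succeeds.
sync_with_retry() {
    local src=$1 dst=$2 max_tries=${3:-5} delay=${4:-10} try
    for (( try = 1; try <= max_tries; try++ )); do
        # --partial keeps an interrupted file so the next attempt can build on it
        if rsync -a --partial "$src" "$dst"; then
            echo "synced on attempt $try"
            return 0
        fi
        echo "attempt $try failed" >&2
        # Wait before the next attempt, but not after the last one
        if (( try < max_tries )); then
            sleep "$delay"
        fi
    done
    echo "giving up after $max_tries attempts" >&2
    return 1
}

# Local demo; in real use DST would be a remote target such as
# user@server:/path/to/dst/ from the cheat sheet.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/src" "$demo/dst"
head -c 4096 /dev/urandom > "$demo/src/sample.bin"
sync_with_retry "$demo/src/" "$demo/dst/" 3 1
```

A fixed delay keeps the sketch short; for long outages a growing delay between attempts may be a better fit.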
Pitfalls
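- The trailing / matters → rsync src/ dst/ (with the slash, syncs the contents of src) ≠ rsync src dst/ (syncs the directory itself, creating dst/src)
- --delete removed the wrong files → confirm with --dry-run first; --delete only removes files that exist on the destination but not in the source
- SSH port is not 22 → specify the port with -e 'ssh -p PORT'
- Resuming a transfer fails → check that both ends run the same rsync version with rsync --version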
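📦 Full code + test data download
Baidu Netdisk link: https://pan.baidu.com/s/1_-hzd95Xn_CcwxYFLgiHFw?pwd=l13c
Extraction code: l13c (the code has been tested and runs as-is; saving a copy to your own Netdisk is recommended)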