
How to use rsync to streamline centralized backups
The Linux rsync utility is a powerful tool in the hands of a knowledgeable administrator. Learn how an organization might use rsync for backups with this step-by-step tutorial.
Today's decentralized computing puts data near the users who need it, but can make completing basic tasks, such as backups, challenging. Luckily, rsync can help simplify complex backup processes for administrators.
The Linux rsync utility copies and synchronizes data between file servers. It does this by comparing the files at the source with those at the destination and identifying the differences between them. The utility then transfers only the changed parts of each file to make the two copies match.
While rsync has been around for a very long time, it remains an essential sysadmin tool. To demonstrate, this article presents a specific scenario where an organization might use rsync for backups.
The theoretical organization has multiple branch offices, each with its own file server for local employees. The administrators at headquarters (HQ) want to simplify the backup process by centralizing specific data from these branch office servers to a single, powerful central file server. From there, administrators will use enterprise backup software.
This scenario illustrates the various uses and the commands necessary for synchronizing files for centralized backups using rsync. If your organization struggles to synchronize files for backups, this scenario can help guide you through the necessary steps.
Why rsync?
Several of rsync's features and capabilities make it suited to this data synchronization process, including the following:
- Easy configuration and automation with scripting.
- Speed and optimization by only synchronizing changes.
- Recursive file structure navigation to detect changed files.
- The ability to resume interrupted transfers.
- Strong cross-platform compatibility to span many use cases.
Scenario environment
The technical aspects of this scenario are as follows. Readers can adapt locations, hostnames, and IP addressing to reflect their environment.
HQ:
- Central Linux file server named file-server01 with IP address 192.168.1.10/24
- Sufficient storage capacity and performance to support the expected data.
- Rsync installed.
- Target directory for incoming data, such as /data/branches. Add subdirectories for each branch office file server.
Branch offices:
- Branch Office 2: Local Linux file server named file-server02 with IP address 192.168.2.30/24
- Branch Office 3: Local Linux file server named file-server03 with IP address 192.168.3.22/24
- Branch Office 4: Local Linux file server named file-server04 with IP address 192.168.4.27/24
- Consistent data directories on each server, such as /srv/data.
- Reliable Internet or VPN connectivity, including encryption.
Consider running the same version of the same Linux distribution on all of the file servers, and update all involved applications, especially rsync itself, to the same version. These steps are not strictly necessary, but they help avoid subtle behavioral differences between software versions.

Note that macOS includes rsync by default. Add rsync to Windows with Windows Subsystem for Linux or Cygwin.
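Preparing the storage layout on the HQ server takes only one command. The following sketch creates the target tree described above, using this scenario's paths:

```shell
# Create one subdirectory per branch office file server under /data/branches.
mkdir -p /data/branches/file-server02 \
         /data/branches/file-server03 \
         /data/branches/file-server04
```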
Configure connectivity
Configure DNS name resolution to simplify management, enabling your commands to use the server hostnames instead of IP addresses.
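If internal DNS is not an option, static entries in the HQ server's /etc/hosts file achieve the same result. Using this scenario's addresses, the entries would look like this:

```
192.168.2.30    file-server02
192.168.3.22    file-server03
192.168.4.27    file-server04
```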
Tunnel these connections using SSH for reliable encryption. Configure SSH key-based authentication for a passwordless option. Here are the general steps:
- Create a dedicated account, such as backupadmin, for the rsync process on each branch file server and the HQ file server.
- Generate keys for the backupadmin account that rsync will use on the HQ file server by using the ssh-keygen command.
- Distribute the public key to each of the branch servers using the ssh-copy-id backupadmin@file-server02 command. Modify the command for the other two file servers.
- Test the configuration to ensure SSH does not prompt the backupadmin account for a password.
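A minimal sketch of the key generation and distribution steps on the HQ server follows. The key path and comment are examples; adjust them for your environment:

```shell
# Run on the HQ server as the backupadmin account. The key path is an
# example; override KEYFILE if your environment differs.
KEYFILE="${KEYFILE:-$HOME/.ssh/id_ed25519_backup}"
mkdir -p "$(dirname "$KEYFILE")"
[ -f "$KEYFILE" ] || ssh-keygen -t ed25519 -N '' \
    -C 'backupadmin rsync key' -f "$KEYFILE"

# Distribute the public key (repeat for file-server03 and file-server04):
#   ssh-copy-id -i "$KEYFILE.pub" backupadmin@file-server02
# Confirm that no password prompt appears:
#   ssh -i "$KEYFILE" backupadmin@file-server02 true
```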

To push or to pull, that is the question
You have two strategies when designing your rsync configurations: push or pull.
- Push method. The branch office file servers initiate a connection to the HQ server and "push" the changed files to it.
- Pull method. The HQ file server initiates the connection to the branch office servers and "pulls" the changed files from them.
This scenario uses the pull method to keep the rsync configuration in the hands of the central IT staff at the HQ location. It also simplifies account management and SSH configurations.
Running rsync commands
Rsync relies on a standard Linux syntax: the command is followed by options and then the source and destination targets.
rsync [options] source destination
If you're using a remote system as the source, specify an account with the necessary privileges to access the required data. The destination is a local file path.
rsync [options] account@remotesystem:/path /local/path
To synchronize the files at /srv/data on the remote system file-server02 as backupadmin to the local /data/branches/file-server02 directory, type:
rsync [options] backupadmin@file-server02:/srv/data /data/branches/file-server02
Many options exist for modifying rsync's behavior. Here are a few common choices:
- -a: Archive mode to preserve permissions, timestamps and symlinks.
- -v: Verbose mode to improve the usefulness of log file entries.
- -z: Compresses files for more efficient transfers.
- --delete: Removes files deleted at the branches from the storage at HQ.
Use the man rsync command to open the rsync man page if you need help with the options.
Your fundamental command will look like this:
rsync -avz --delete backupadmin@file-server02:/srv/data /data/branches/file-server02
Repeat the command, modifying it appropriately for file-server03 and file-server04.
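Because the three commands differ only in the server name, a small shell loop can generate them. The following sketch prints each command for review; the DRY_RUN variable is an illustrative addition, not part of rsync:

```shell
#!/usr/bin/env bash
# Build the pull command for each branch server in this scenario.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
for server in file-server02 file-server03 file-server04; do
    cmd="rsync -avz --delete backupadmin@${server}:/srv/data /data/branches/${server}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
done
```

Set DRY_RUN=0 once the printed commands look correct.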
One option when configuring rsync is its built-in dry-run feature. By adding the -n (or --dry-run) option, you can see what rsync would transfer without moving any data, which is particularly useful for testing a new command before its first real run:
rsync -avzn --delete backupadmin@file-server02:/srv/data /data/branches/file-server02
Configure rsync logging
Rsync can write its activity to a custom log file. Add the --log-file={path} option to the standard rsync command, and make sure the target directory, such as /var/log/backups, exists first.
For example, to generate a log file for the remote file-server02 resource, use the following command:
rsync -avz --delete --log-file=/var/log/backups/file-server02.log backupadmin@file-server02:/srv/data /data/branches/file-server02
You can then view the log file entries in the /var/log/backups/file-server02.log file.
Implement rsync
You're now ready to implement your data centralization plan, with a couple of different configuration choices.
Scheduling rsync
One of the simplest choices is to embed the above rsync commands in your Linux system's /etc/crontab configuration file. Cron's schedule syntax offers fine-grained control over when the system runs rsync, and entries in this file include a user field that specifies the account each command runs as.
For example, to cause rsync to connect to the file server at Branch Office 2 nightly at 11 p.m., type:
0 23 * * * backupadmin rsync -avz --delete --log-file=/var/log/backups/file-server02.log backupadmin@file-server02:/srv/data /data/branches/file-server02

This configuration runs the command at minute 0 of hour 23, every day, every month, every day of the week. Because each run transfers only the files that changed since the previous night, this provides an efficient nightly synchronization approach.
Repeat the process for the two remaining branch office file servers. Consider setting each branch server synchronization task for a different time to avoid overwhelming the HQ file server's storage and network systems.
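Staggered /etc/crontab entries for all three branch servers might look like the following. The times are examples, and the backupadmin field reflects the system crontab format, which specifies the account each command runs as:

```
# /etc/crontab entries on the HQ server (illustrative times)
0 23 * * * backupadmin rsync -avz --delete --log-file=/var/log/backups/file-server02.log backupadmin@file-server02:/srv/data /data/branches/file-server02
0 0 * * * backupadmin rsync -avz --delete --log-file=/var/log/backups/file-server03.log backupadmin@file-server03:/srv/data /data/branches/file-server03
0 1 * * * backupadmin rsync -avz --delete --log-file=/var/log/backups/file-server04.log backupadmin@file-server04:/srv/data /data/branches/file-server04
```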
Creating a script
Instead of scheduling, you could consider a slightly more complex Bash script that offers additional flexibility, such as including multiple file copy targets.
For example, you might want your rsync job to synchronize the /srv/data contents and the log files found at /var/log. Or perhaps you want the same synchronization process to grab user home directories, too.
Scripts also enable you to comment on the various commands or provide other guidance to the rest of the IT team.
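As a sketch, such a script might look like the following. The hostnames and paths match this scenario, and the DEST_ROOT, LOG_DIR, and RSYNC variables are hypothetical overrides included to make the script easy to test:

```shell
#!/usr/bin/env bash
# Pull data from each branch server into the HQ server's target tree.
# Call pull_backups to run; schedule the script itself with cron.

pull_backups() {
    local servers="file-server02 file-server03 file-server04"
    local sources="/srv/data /var/log"          # directories to pull
    local dest_root="${DEST_ROOT:-/data/branches}"
    local log_dir="${LOG_DIR:-/var/log/backups}"
    local rsync_cmd="${RSYNC:-rsync}"           # override for testing

    mkdir -p "$log_dir"
    for server in $servers; do
        for src in $sources; do
            # One subdirectory per server, mirroring deletions (--delete)
            "$rsync_cmd" -avz --delete \
                --log-file="$log_dir/$server.log" \
                "backupadmin@$server:$src" "$dest_root/$server" ||
                echo "WARNING: rsync failed for $server:$src" >&2
        done
    done
}

# Uncomment to run directly:
# pull_backups
```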
Once you're satisfied with your script, schedule it with cron using a similar syntax to what you used above.
Once you initiate your data centralization plan, be sure to establish data retention policies to manage storage consumption and satisfy industry compliance obligations. You might also need to account for relevant data sovereignty requirements.
Damon Garn owns Cogspinner Coaction and provides freelance IT writing and editing services. He has written multiple CompTIA study guides, including the Linux+, Cloud Essentials+ and Server+ guides, and contributes extensively to Informa TechTarget, The New Stack and CompTIA Blogs.