Restic CephFS Backup (Automatic)
This guide covers the basic architecture and operation of the distributed backup system built on restic.
All the Puppet configuration is under the following hostgroup structure:
The code of the different scripts resides in the following Git repository:
These are the components of the current system and their roles:
Stateless nodes and actual workers of the system. Each of these nodes runs a restic agent that is always active and checks for new backup jobs every 5 seconds. When a job is found, the agent handles the backup by copying the files.
This daemon runs every hour, at a random minute, on every agent. It changes the status of Completed backups to Pending after 24 hours so that they are backed up again (check the Operating section). The daemon does the same for the prune mechanism, setting to Pending every job that has not been pruned in the last week.
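The hourly check described above boils down to a small state transition. The following is a minimal shell sketch of that rule, not the daemon's actual code: the function name next_status and the variables BACKUP_MAX_AGE and PRUNE_MAX_AGE are hypothetical, and it assumes that the prune check, like the backup check, only moves Completed entries back to Pending.

```shell
# Hypothetical sketch of the hourly status check; the real daemon's code and
# names are not part of this guide. Ages are expressed in hours.
BACKUP_MAX_AGE=24    # Completed backups become Pending after 24 hours
PRUNE_MAX_AGE=168    # a prune older than one week becomes Pending

# next_status <current_status> <hours_since_last_success> <max_age_hours>
# Prints the status the job (or its prune) should have after the check.
next_status() {
    if [ "$1" = "Completed" ] && [ "$2" -ge "$3" ]; then
        echo "Pending"
    else
        # Pending, Running, Failed, or a recent Completed: leave untouched
        echo "$1"
    fi
}
```

For example, next_status Completed 25 "$BACKUP_MAX_AGE" prints Pending, while a Failed or Running job is returned unchanged.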
This is where we store the backups. Each user has its own bucket, named like cephback-<user_name> (or cboxbackproj-svc_account for the projects). Every restic agent has the s3cmd utility installed and configured, so we can list the existing buckets:
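A couple of standard s3cmd subcommands cover the usual checks. The sketch below wraps them in hypothetical helper functions (list_buckets, bucket_usage); the S3CMD variable is only there so the commands can be previewed with S3CMD=echo before touching the real buckets.

```shell
S3CMD="${S3CMD:-s3cmd}"   # set S3CMD=echo for a dry run

# list_buckets: every bucket visible with the agent's s3cmd configuration
list_buckets() {
    $S3CMD ls
}

# bucket_usage <bucket>: total size of one backup bucket, e.g. cephback-rvalverd
bucket_usage() {
    $S3CMD du "s3://$1"
}
```

For example, bucket_usage cephback-rvalverd runs s3cmd du s3://cephback-rvalverd.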
CAUTION: Deleting the S3 bucket deletes all backup data and snapshot information. The backup will not fail; instead, a fresh backup will be triggered from scratch. So take care when operating on the bucket directly and, if needed, disable the related backup job first with cback backup disable <id>.
The basic configuration of the backup system is done through config files managed by Puppet via Hiera.
These config files reside in
The available configuration parameters are documented in each data hostgroup (Hiera) file.
Command Line Interface (cback)
To operate the system there is a command-line tool called cback, available on any of the backup, prune or restore agents. This tool is still in development, so always check cback --help to see the current commands.
- Check backup status
cback backup status
These are the possible backup statuses:
- Enabled: Only enabled jobs are taken into account by the backup agents.
- Pending: The job is ready to be backed up. Any available agent will pick it up as soon as it is free, unless the job is disabled.
- Running: The job is running at that moment. Check cback backup status to see which agent is handling the job.
- Failed: There was a problem with the backup. Check cback backup status user_name | job_id to see what went wrong.
- Completed: The last backup was successful. This is not a permanent state: after the default 24 hours, the status is changed to Pending.
Only jobs with status Pending and a prune status other than Running are processed by the backup agents. The command cback backup reset <id> sets the status to
- Check the status of a particular user or backup id:
cback backup status rvalverd
- List all backups
cback backup ls
- List all backups by status:
cback backup ls [failed|completed|running|pending|disabled]
- Enable / disable a backup job
cback backup enable|disable <backup_id>
NOTE: This command does not stop a running backup. If the backup is running, it will run to completion but will not be picked up for subsequent backups.
- Reset a backup job (changes the status to
cback backup reset <backup_id>
- Add a new backup job
cback backup add <user_name> <instance> <path> [--bucket-prefix=<prefix>] [--bucket-name=<name>] [--enable]
cback backup add rvalverd cvmfs /cephfs-flax/volumes/_nogroup/234234 --enable
This adds a new backup job that stores the specified path in a bucket called cephback-rvalverd. The bucket will be created automatically on the first run of the backup.
NOTE 1: By default, the bucket name is generated by concatenating the bucket prefix (by default cephback-) with <user_name>. If user_name is rvalverd, the bucket will be named cephback-rvalverd.
NOTE 2: It's possible to add more than one backup per user as long as the paths are different.
NOTE 3: If the instance does not exist, it will be created automatically. This field is only used to categorize the jobs, so it does not need to match an existing Ceph instance and is not used in the actual backup logic.
NOTE 4: All backup jobs are added as Disabled by default unless the --enable flag is set, in which case the backup is added as Enabled. The --enable flag will also set prune as
NOTE 5: If --bucket-prefix is not specified, the default (cephback-) is used. The default is configurable through Puppet.
NOTE 6: If --bucket-name is specified, its value is used as the bucket name instead of any prefix and user name combination.
NOTE 7: The S3 repository will be created automatically by the backup agent on the first run of the backup.
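The naming rules in NOTES 1, 5 and 6 can be summarized as a small function. This is a hypothetical helper that mirrors those rules for illustration; it is not part of cback, and DEFAULT_BUCKET_PREFIX stands in for the Puppet-configured default.

```shell
DEFAULT_BUCKET_PREFIX="cephback-"   # default prefix, configurable through Puppet

# bucket_name <user_name> [bucket_prefix] [bucket_name]
# Prints the bucket a new backup job would use.
bucket_name() {
    if [ -n "$3" ]; then
        # --bucket-name overrides any prefix and user name combination
        echo "$3"
    else
        # otherwise: prefix (default cephback-) followed by the user name
        echo "${2:-$DEFAULT_BUCKET_PREFIX}$1"
    fi
}
```

For example, bucket_name rvalverd prints cephback-rvalverd, matching the cback backup add example above.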
- Delete a backup job. An interactive shell is presented to delete the backup metadata and, if needed, the S3 bucket contents. Use it with care: no recovery is possible. Backups in Running status cannot be deleted.
cback backup delete <backup_id>
Restoring a backup
For now, refer to the restic documentation to recover the data.
To operate on the repository with restic:
- Source the environment configuration:
- Get the url of the backup to operate:
cback backup status <user_name | backup_id>
- Run normal restic commands:
restic -r s3:s3.cern.ch/cephback-rvalverd snapshots|restore|find ...
Run restic help for all available options.
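The steps above can be wrapped in two small helpers for a restore session. This is a sketch assuming the environment configuration from the first step has already been sourced; the function names and REPO_BASE are hypothetical, and RESTIC can be set to echo to print the commands instead of running them.

```shell
RESTIC="${RESTIC:-restic}"   # set RESTIC=echo for a dry run
REPO_BASE="s3:s3.cern.ch"    # S3 endpoint used in the examples of this guide

# list_snapshots <bucket>: show the snapshots stored in one backup bucket
list_snapshots() {
    $RESTIC -r "$REPO_BASE/$1" snapshots
}

# restore_snapshot <bucket> <snapshot_id> <target_path>
restore_snapshot() {
    $RESTIC -r "$REPO_BASE/$1" restore "$2" --target "$3"
}
```

For example, restore_snapshot cephback-rvalverd latest /tmp/restore restores the most recent snapshot into /tmp/restore (restic accepts latest as a snapshot ID).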
Scaling the System
- You can run as many processes as you want on any agent by spawning a new instance, like:
systemctl start cback-<type_of_agent>@<new_agent_id>
If we have only one agent on cephrestic-backup-01, we can do the following to have two:
[rvalverd@cephrestic-backup-01]$ systemctl start cback-backup@2
The number of agents running on each machine is not (currently) managed by Puppet, so manual changes persist. However, if an agent crashes, it will not be restarted by Puppet. This will be addressed in future versions of the system.
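Starting several instances by hand quickly becomes repetitive. A minimal sketch, assuming the cback-<type_of_agent>@<id> unit naming shown above; start_agents is a hypothetical helper, and SYSTEMCTL=echo prints the commands instead of running them.

```shell
SYSTEMCTL="${SYSTEMCTL:-systemctl}"   # set SYSTEMCTL=echo for a dry run

# start_agents <type_of_agent> <first_id> <last_id>
# Starts cback-<type_of_agent>@<id> for every id from first_id to last_id.
start_agents() {
    i="$2"
    while [ "$i" -le "$3" ]; do
        $SYSTEMCTL start "cback-$1@$i"
        i=$((i + 1))
    done
}
```

For example, start_agents backup 2 4 starts cback-backup@2, cback-backup@3 and cback-backup@4.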
You need to spawn a new machine in the required hostgroup:
- backup agent:
- prune agent:
- restore agent:
For example, to add a new backup agent
N (we assume that we have
[rvalverd@aiadm09 ~]$ eval `ai-rc "IT Ceph Storage Service"`
ai-bs --landb-mainuser ceph-admins --landb-responsible ceph-admins --nova-flavor m2.large --cc7 -g ceph/restic/agent/backup --foreman-environment qa cephrestic-backup-N.cern.ch
Add the node to the load-balanced alias:
openstack server set --property landb-alias=cephrestic-backup--load-N- cephrestic-backup-N
Once the installation and the Puppet run have finished, log in to the machine and start the daemon (this will be automated in a future version of the system):
[rvalverd@cephrestic-backup-N]$ systemctl start cback-backup@1
After that, the agent should start pulling jobs.
Using the log system
The log of any agent can be found in
You can grep for the job_id for convenience, for example:
grep 3452 /var/log/cback/cback-backup.log
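Plain grep 3452 also matches longer IDs such as 13452. A hypothetical pair of helpers using grep -w (whole-word match) avoids that; CBACK_LOG defaults to the backup agent log shown above, and follow_job_log relies on GNU grep's --line-buffered option so that matches are printed as they arrive.

```shell
CBACK_LOG="${CBACK_LOG:-/var/log/cback/cback-backup.log}"

# job_log <job_id>: every log line mentioning the job id as a whole word
job_log() {
    grep -w -- "$1" "$CBACK_LOG"
}

# follow_job_log <job_id>: keep watching the log while the job is running
follow_job_log() {
    tail -f "$CBACK_LOG" | grep --line-buffered -w -- "$1"
}
```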
Operating with the backup repository using upstream Restic
As the system uses the upstream version of restic, the backup repository can be managed directly. Restic is installed on all backup agents.
- First, you need to source the configuration:
NOTE: If that file is not available, you can export the contents of /etc/sysconfig/restic_env.
Then refer to the restic documentation for how to use the tool.
- Here is an example of how to list the available snapshots of one backup:
restic -r s3:s3.cern.ch/cephback-rvalverd snapshots
For convenience, or during long debugging sessions, you can also set the repository location as an environment variable:
This way you don't need to specify the -r flag every time.
- Another example, mounting the repository as a read-only filesystem:
restic -r s3:s3.cern.ch/cephback-rvalverd mount /mnt
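For the environment-variable approach mentioned above, restic reads the repository from RESTIC_REPOSITORY whenever -r/--repo is not given. A minimal example, using the cephback-rvalverd bucket from the examples in this section:

```shell
# restic falls back to this variable when -r/--repo is omitted
export RESTIC_REPOSITORY="s3:s3.cern.ch/cephback-rvalverd"

# Subsequent commands in the same shell no longer need -r, for example:
#   restic snapshots
#   restic mount /mnt
```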
Data backup with Restic (manual)
Restic Download / Install
Initialize Backup Repository
restic backup <my_share>
NOTE: By default, restic places its cache files in $HOME/.cache. If you want to specify another path for the cache, you can use the
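A manual session usually starts by initializing the repository once and then backing up the share. The sketch below assumes a repository location of your choice; init_repo and backup_share are hypothetical helpers, while init, backup and the global --cache-dir flag are standard restic. restic asks for the repository password interactively unless RESTIC_PASSWORD or RESTIC_PASSWORD_FILE is set.

```shell
RESTIC="${RESTIC:-restic}"   # set RESTIC=echo for a dry run

# init_repo <repository>: create the repository (only needed once)
init_repo() {
    $RESTIC -r "$1" init
}

# backup_share <repository> <my_share> [cache_dir]
# Backs up one share, optionally with a cache directory other than the default.
backup_share() {
    if [ -n "$3" ]; then
        $RESTIC -r "$1" --cache-dir "$3" backup "$2"
    else
        $RESTIC -r "$1" backup "$2"
    fi
}
```

For example, init_repo /srv/restic-repo followed by backup_share /srv/restic-repo /path/to/my_share /scratch/restic-cache; all three paths are placeholders.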
There are two options: use the restic restore command directly, or mount the backup repository and copy the files.
- List backup snapshots
- Restore the selected snapshot
restic restore <snapshot_id> --target <target_path>
NOTE: You can use restic find to look for specific files inside a snapshot.
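When only a few files are needed, restic find and restic restore --include avoid restoring a whole snapshot. A sketch with hypothetical helper names; RESTIC=echo previews the commands without running them.

```shell
RESTIC="${RESTIC:-restic}"   # set RESTIC=echo for a dry run

# find_file <repository> <pattern>: show which snapshots contain matching files
find_file() {
    $RESTIC -r "$1" find "$2"
}

# restore_path <repository> <snapshot_id> <path_in_snapshot> <target_path>
# Restores only the given path instead of the whole snapshot.
restore_path() {
    $RESTIC -r "$1" restore "$2" --target "$4" --include "$3"
}
```

Quote patterns such as '*.conf' so that the local shell does not expand them before restic sees them.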
Using the mount option
- You can browse your backup repository using FUSE:
restic mount /mnt/<my_repo>
NOTE: You can run restic snapshots to see the correlation between snapshot IDs and folders.
Delete a snapshot
- List snapshots
- Forget snapshot
restic forget <snapshot_id>
Interesting flags for
-l, --keep-last n keep the last n snapshots
-H, --keep-hourly n keep the last n hourly snapshots
-d, --keep-daily n keep the last n daily snapshots
-w, --keep-weekly n keep the last n weekly snapshots
-m, --keep-monthly n keep the last n monthly snapshots
-y, --keep-yearly n keep the last n yearly snapshots
--keep-tag taglist keep snapshots with this taglist (can be specified multiple times) (default )
- Clean the repository (this removes the data of all forgotten snapshots):
restic forget <snapshot_id> --prune
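Instead of forgetting snapshots one by one, the keep flags listed above can express a retention policy. The values below (7 daily, 4 weekly, 6 monthly) are only an example, and apply_retention is a hypothetical helper. restic forget also accepts --dry-run to show what would be removed without deleting anything.

```shell
RESTIC="${RESTIC:-restic}"   # set RESTIC=echo for a dry run

# apply_retention <repository>
# Keeps 7 daily, 4 weekly and 6 monthly snapshots (example values only),
# forgets all others and prunes the data no longer referenced.
apply_retention() {
    $RESTIC -r "$1" forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
}
```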
Check the repository for inconsistencies
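restic check verifies the repository structure; adding --read-data also downloads and verifies every data pack, which is thorough but slow and costly on a remote S3 backend. A sketch with a hypothetical check_repo helper:

```shell
RESTIC="${RESTIC:-restic}"   # set RESTIC=echo for a dry run

# check_repo <repository> [full]
# Without a second argument only the repository structure is verified;
# with "full", restic also reads and verifies all stored data.
check_repo() {
    if [ "$2" = "full" ]; then
        $RESTIC -r "$1" check --read-data
    else
        $RESTIC -r "$1" check
    fi
}
```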
Crontab job setup
mm hh dom m dow restic backup <my_share>
where the five fields are minute, hour, day of month, month and day of week.