Lab 10 - Version Control and Backups
Overview
In this lab, we’ll be talking about using version control, in particular Git, for keeping track of changes to your files and backing up your files to other machines.
We’ll be setting up Gitolite, an open-source Git server that allows you to host Git repositories on other machines,
and set access to Git repositories hosted on the Gitolite server.
A Git Primer
Version control tools are designed to keep track of changes to files, and allow you to rewind to some last saved state.
Git, as one of the most popular version control tools in use today, supports these features, but also allows for more advanced workflows such as splitting a filebase into “branches” for experimentation.
In order to complete this lab, you’ll need to know a few basic git commands and some terminology.
- A folder can be made into a git repository by running
git init
, telling git that files under the folder can start being tracked by git. When a repository is initialized, git creates a hidden .git
folder within the root folder to track the files’ histories.
- A git repository can also be cloned from online by running
git clone [repository url]
, which will create a copy of the repository
on your machine.
- However, files do not have their changes tracked by git by default. In order to tell git to track changes made to a file, run
git add [file]
.
- git groups a set of changes into a commit. Tracked files in a git repository do not have their contents saved in git until
git commit
is run. You can think of git commit
as creating a snapshot of all modifications made to all tracked files since the last commit.
- In order to back up changes in a repository to another machine, or download changes from another machine to your local machine, git allows you to set remotes. Adding a remote is done by running
git remote add [remote name] [remote url]
.
- In order to back up changes on a local repository to a remote, run
git push [remote name]
. By convention, the primary remote set for a repository is named “origin”. If you clone a repository, its origin is automatically set to the URL you cloned from.
- In order to download changes from a remote to a local repository, run
git pull [remote name]
.
Gitolite
Generating an SSH key
Gitolite identifies its users through their SSH public key. Since you’ll need to interact with Gitolite from your OCF account on tsunami, in
order to complete the lab, you’ll first have to generate an SSH keypair on tsunami.
On tsunami.ocf.berkeley.edu
, create a directory called .ssh
in your home directory if it doesn’t already exist.
Within .ssh
, run ssh-keygen
and hit enter until
it finishes generating a public-private keypair. You’ll need to copy the contents of .ssh/id_rsa.pub
later when installing Gitolite.
Installing Gitolite
Log into your DigitalOcean VM and install Gitolite using apt
. The package name in the apt system is gitolite3
.
When you are prompted for the administrator’s SSH key, paste in the contents of id_rsa.pub (not id_rsa
!) from the key you just
generated on tsunami
.
If you mess up when configuring the admin SSH key or accidentally paste in the contents of id_rsa
, you can run sudo apt purge gitolite3
to remove the package and all of its config files, and try installing it again.
After you’ve finished installing Gitolite, verify that it is running by running the command ssh gitolite3@[your VM's address] info
from
tsunami
. It should produce an output similar to:
hello admin, this is gitolite3@test running gitolite3 3.6.6-1 (Debian) on git 2.11.0
R W gitolite-admin
R W testing
Configuring Gitolite
Gitolite configuration, unusually, is done by cloning the gitolite-admin
git repository hosted on the Gitolite server, modifying it,
and sending the updated repository back by pushing.
In order to clone the configuration repository to tsunami, run git clone gitolite3@[username].decal.xcf.sh:gitolite-admin
from tsunami.
If all goes well, a folder called gitolite-admin
will appear in your current directory.
Adding a New User
As the administrator of the Gitolite server, you automatically get an account on the Gitolite server (Gitolite accounts are entirely separate
from Linux user accounts). For the purposes of this lab’s checkoff, we’ll have you add a new user to the Gitolite server, which will be done
by adding a new file in the keydir/
subdirectory that contains the SSH public key of the new Gitolite user.
Navigate to the directory keydir
under the gitolite-admin
folder. Create a new file named decal-checkoff.pub
using the editor of your choice, and paste in this SSH public key:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDY1qy6OE8M1gZWhyBsjE66ip+3mQpDWhP7LvT2uBloHqY6s/AkuB9Ao3hbLkJuKPghDII4UOH4DGzvcBypTjh5ClxHRMY3mInTJ6OSOuyF1lpw3SYkbIWlVWkyUABqUkt4pgWuiK6gz8egNtOo4XWf2254D+HGuYxHmLq+CInQcdqU9Wy8xmxMaq4RVqDnlmItHW6EZb4b09I87Z5T4mnc148d8p/cZ2oKXWC20EOyg55UVeTVTZw4ETzDXFFL1rOQPiMMjVfnXqoubsUCBjrctryQcxCLL3m3bOEHexAiRYmD5ItR4EiEBxiWaglBMu6OhcVYDHOQ2MEnGFbuhqXH test@test
After the key has been added, you’ll need to add, commit, and push the file to the Gitolite server through Git.
Assuming you are in keydir/
, run git add decal-checkoff.pub
to track the file under Git, git commit
with a message describing the changes made in the commit,
and finally git push origin
to push the changes up to the Gitolite server.
Adding a new Git repository
Now that you’ve added a new Gitolite user, you’ll need to create a new Git repository that they can access.
In Gitolite, this is done by modifying the conf/gitolite.conf
file within the gitolite-admin
repository.
Edit gitolite.conf
to create a new repository with name decal-checkoff
with RW+
access given to both admin
(you) and decal-checkoff
.
Hopefully, you can figure out what format Gitolite expects by looking at the entires for the admin
and testing
repositories.
The only gotcha is that multiple users have to separated by a space and nothing else.
Once you have done that, follow the same Git workflow as with decal-checkoff.pub
to add, commit, and push the changes to gitolite.conf
to the Gitolite server.
If you’ve done everything correctly, running ssh gitolite3@[username].decal.xcf.sh info
should display something like:
hello admin, this is gitolite3@test running gitolite3 3.6.6-1 (Debian) on git 2.11.0
R W decal-checkoff
R W gitolite-admin
R W testing
However, please make sure that you have added the decal-checkoff
user and given it permissions for the decal-checkoff
repository,
as we will be using the ssh info
command to check you off for the lab, and the command only works if the decal-checkoff
user has been
added properly.
Backups
We’ll be briefly practicing using rsync
, a powerful file transfer tool often used in backup scenarios. It is difficult to
overstate the degree to which rsync
and the rsync delta-transfer algorithm are used in the real world.
On a superficial level, rsync
works similarly to scp
- it copies files from src
to dst
, optionally over the network,
over SSH. What makes rsync
in particular so powerful is its ability to keep two directories synchronized
while transferring little beyond the absolute difference in files between the two directories. It does this by calculating
the differences between blocks in a file, and only transferring the difference (the ‘delta’). By default, it assumes that a file is unchanged if its last modified time is not newer.
For example, if one line in a 1-GB text file is changed, rsync won’t transfer the entire file again, but will send the one line that changed.
The basic syntax of rsync
is rsync [options] [[user]@host:]source [[user]@host:]dest
- you can either transfer files
in the local filesystem, between the local system and a network host, or between network hosts.
rsync
has dozens of options, but the most common are the following:
-r
--recursive
: recursively transfer all files and subdirectories in the source
-v
--verbose
: verbosely print out file names and transfer speeds
-P
--partial --progress
: keep partially transferred files if the transfer is interrupted, show progress of file transfers
-a
--archive
: combination option, implies -r
, and preservation of most file metadata properties
-u
--append
: append data to the end of files on the destination that are smaller than the corresponding
file on the source (useful for log files, dangerous if used on files that do not only grow by appending data)
-z
--compress
: compress files before transfer (Only useful for rsync
ing over the network)
Suppose you keep all your homework in a directory “school”, in your home directory:
user@machine [~/school] $ ls
cs198-8
cs61a
cs262
some_r1a
ds8
and you want to back everything to your student VM. A command like:
$ rsync -rvPaz ~/school user@user.decal.xcf.sh:school-backup/
would result in a copy of your school
directory being made inside the school-backup
directory on your student VM.
Play around with rsync
yourself. We’ve provided some test files here. (hint: use wget
).
After extracting the archive, (tar xzvf b10.tar.gz
), you’ll see two directories with over 100MB of files in them.
-
What happens if you try rsync
ing dir1
to dir2
using rsync -rac dir1 dir2
? (hint: it should finish almost instantaneously). Why might this be the case?
-
What rsync
commands would you use to keep both directories in sync, from updating the source to dest and dest to source?
-
What happens if you truncate one of the files in the dest directory (e.g. dir2
, by truncate --size=5M file2.txt
) and then try rsync
from source to dest again?
Conclusion
Now that you know how to set up a Gitolite server and add users and repositories to it, you can now use your VM (before we delete it at the end of the semester)
as a git remote to back up your programming projects, fanfic collections, etc., free from the prying eyes of our Github overlords.
Checkoff
The checkoff form is here. Please double-check that you’ve completed the Gitolite section successfully, as we will be running ssh gitolite3@[username].decal.xcf.sh info
to
check you off on the Gitolite section. If you’d like to double-check that you added the decal-checkoff user properly, you can add another new user with a different SSH key (maybe from your laptop or an OCF desktop),
and run the ssh info
command from that machine.