Lab 10 - Version Control and Backups

Overview

In this lab, we’ll be talking about using version control, in particular Git, for keeping track of changes to your files and backing up your files to other machines. We’ll be setting up Gitolite, an open-source Git server that allows you to host Git repositories on other machines, and set access to Git repositories hosted on the Gitolite server.

A Git Primer

Version control tools are designed to keep track of changes to files, and allow you to rewind to some last saved state. Git, as one of the most popular version control tools in use today, supports these features, but also allows for more advanced workflows such as splitting a filebase into “branches” for experimentation. In order to complete this lab, you’ll need to know a few basic git commands and some terminology.

Gitolite

Generating an SSH key

Gitolite identifies its users through their SSH public key. Since you’ll need to interact with Gitolite from your OCF account on tsunami, in order to complete the lab, you’ll first have to generate an SSH keypair on tsunami.

On tsunami.ocf.berkeley.edu, create a directory called .ssh in your home directory if it doesn’t already exist. Within .ssh, run ssh-keygen and hit enter until it finishes generating a public-private keypair. You’ll need to copy the contents of .ssh/id_rsa.pub later when installing Gitolite.

Installing Gitolite

Log into your DigitalOcean VM and install Gitolite using apt. The package name in the apt system is gitolite3. When you are prompted for the administrator’s SSH key, paste in the contents of id_rsa.pub (not id_rsa!) from the key you just generated on tsunami. If you mess up when configuring the admin SSH key or accidentally paste in the contents of id_rsa, you can run sudo apt purge gitolite3 to remove the package and all of its config files, and try installing it again.

After you’ve finished installing Gitolite, verify that it is running by running the command ssh gitolite3@[your VM's address] info from tsunami. It should produce an output similar to:

hello admin, this is gitolite3@test running gitolite3 3.6.6-1 (Debian) on git 2.11.0

 R W	gitolite-admin
 R W	testing

Configuring Gitolite

Gitolite configuration, unusually, is done by cloning the gitolite-admin git repository hosted on the Gitolite server, modifying it, and sending the updated repository back by pushing.

In order to clone the configuration repository to tsunami, run git clone gitolite3@[username].decal.xcf.sh:gitolite-admin from tsunami. If all goes well, a folder called gitolite-admin will appear in your current directory.

Adding a New User

As the administrator of the Gitolite server, you automatically get an account on the Gitolite server (Gitolite accounts are entirely separate from Linux user accounts). For the purposes of this lab’s checkoff, we’ll have you add a new user to the Gitolite server, which will be done by adding a new file in the keydir/ subdirectory that contains the SSH public key of the new Gitolite user.

Navigate to the directory keydir under the gitolite-admin folder. Create a new file named decal-checkoff.pub using the editor of your choice, and paste in this SSH public key:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDY1qy6OE8M1gZWhyBsjE66ip+3mQpDWhP7LvT2uBloHqY6s/AkuB9Ao3hbLkJuKPghDII4UOH4DGzvcBypTjh5ClxHRMY3mInTJ6OSOuyF1lpw3SYkbIWlVWkyUABqUkt4pgWuiK6gz8egNtOo4XWf2254D+HGuYxHmLq+CInQcdqU9Wy8xmxMaq4RVqDnlmItHW6EZb4b09I87Z5T4mnc148d8p/cZ2oKXWC20EOyg55UVeTVTZw4ETzDXFFL1rOQPiMMjVfnXqoubsUCBjrctryQcxCLL3m3bOEHexAiRYmD5ItR4EiEBxiWaglBMu6OhcVYDHOQ2MEnGFbuhqXH test@test

After the key has been added, you’ll need to add, commit, and push the file to the Gitolite server through Git. Assuming you are in keydir/, run git add decal-checkoff.pub to track the file under Git, git commit with a message describing the changes made in the commit, and finally git push origin to push the changes up to the Gitolite server.

Adding a new Git repository

Now that you’ve added a new Gitolite user, you’ll need to create a new Git repository that they can access. In Gitolite, this is done by modifying the conf/gitolite.conf file within the gitolite-admin repository.

Edit gitolite.conf to create a new repository with name decal-checkoff with RW+ access given to both admin (you) and decal-checkoff. Hopefully, you can figure out what format Gitolite expects by looking at the entires for the admin and testing repositories. The only gotcha is that multiple users have to separated by a space and nothing else.

Once you have done that, follow the same Git workflow as with decal-checkoff.pub to add, commit, and push the changes to gitolite.conf to the Gitolite server.

If you’ve done everything correctly, running ssh gitolite3@[username].decal.xcf.sh info should display something like:

hello admin, this is gitolite3@test running gitolite3 3.6.6-1 (Debian) on git 2.11.0

 R W	decal-checkoff
 R W	gitolite-admin
 R W	testing

However, please make sure that you have added the decal-checkoff user and given it permissions for the decal-checkoff repository, as we will be using the ssh info command to check you off for the lab, and the command only works if the decal-checkoff user has been added properly.

Backups

We’ll be briefly practicing using rsync, a powerful file transfer tool often used in backup scenarios. It is difficult to overstate the degree to which rsync and the rsync delta-transfer algorithm are used in the real world.

On a superficial level, rsync works similarly to scp - it copies files from src to dst, optionally over the network, over SSH. What makes rsync in particular so powerful is its ability to keep two directories synchronized while transferring little beyond the absolute difference in files between the two directories. It does this by calculating the differences between blocks in a file, and only transferring the difference (the ‘delta’). By default, it assumes that a file is unchanged if its last modified time is not newer. For example, if one line in a 1-GB text file is changed, rsync won’t transfer the entire file again, but will send the one line that changed.

The basic syntax of rsync is rsync [options] [[user]@host:]source [[user]@host:]dest - you can either transfer files in the local filesystem, between the local system and a network host, or between network hosts.

rsync has dozens of options, but the most common are the following:

Suppose you keep all your homework in a directory “school”, in your home directory:

user@machine [~/school] $ ls
cs198-8
cs61a
cs262
some_r1a
ds8

and you want to back everything to your student VM. A command like:

$ rsync -rvPaz ~/school user@user.decal.xcf.sh:school-backup/

would result in a copy of your school directory being made inside the school-backup directory on your student VM.

Play around with rsync yourself. We’ve provided some test files here. (hint: use wget). After extracting the archive, (tar xzvf b10.tar.gz), you’ll see two directories with over 100MB of files in them.

Conclusion

Now that you know how to set up a Gitolite server and add users and repositories to it, you can now use your VM (before we delete it at the end of the semester) as a git remote to back up your programming projects, fanfic collections, etc., free from the prying eyes of our Github overlords.

Checkoff

The checkoff form is here. Please double-check that you’ve completed the Gitolite section successfully, as we will be running ssh gitolite3@[username].decal.xcf.sh info to check you off on the Gitolite section. If you’d like to double-check that you added the decal-checkoff user properly, you can add another new user with a different SSH key (maybe from your laptop or an OCF desktop), and run the ssh info command from that machine.