Lab 8 - Version Control and Backups
Facilitator: Zoey Sun, Luke Peters
25 min readTable of contents
In this lab, we’ll be learning how to use Git and typical best practices for version control!
Pulling the code
If you haven’t already cloned the decal-lab git repository, run:
$ git clone https://github.com/0xcf/decal-labs.git
Go to your decal-labs
directory and cd
into the folder b9
. Run git pull
to pull the latest changes.
Part 1: Git Basics
In the b9
directory, we’ve provided a basic Python program called rand.py
to
help demonstrate some core Git concepts. rand.py
takes command line arguments
and performs a number of coin flips and outputs the result in your terminal. We
want to add a feature to rand.py
that also lets us perform dice rolls. We will
simulate the real-world development process by making a branch for our new dice
rolling feature, making a few commits to that branch, and then merging our
feature branch into the master branch. You won’t need to personally write any
Python for this lab! You will only need to copy-and-paste the code we provide,
since this lab is about learning version control and not Python.
rand.py
demo
Run python3 rand.py -h
in your terminal to see the help dialogue for the
program. At the moment, it takes one positional argument of either coin
or
dice
as well as an optional flag -i
(or --iterations
) if you want to
flip/roll more than once. The dice
argument doesn’t work as we haven’t
implemented it yet, but you can try flipping a coin:
~/decal-labs/b9$ python3 rand.py coin
1 coin flip(s) resulted in 0 Heads and 1 Tails:
T
~/decal-labs/b9$ python3 rand.py coin -i 10
10 coin flip(s) resulted in 3 Heads and 7 Tails:
T, T, H, T, H, H, T, T, T, T
~/decal-labs/b9$ python3 rand.py coin -i 999
Number of flips must be in the range [0 - 100]
Creating a new branch
Let’s start working on the dice rolling feature! When working on a project with version control, it’s best practice to keep the master branch clean (read: mostly bug-free and stable).
-
Any branches you make will be based on/descended from code in the master branch. You don’t want to write new code on top of a foundation that’s broken or buggy!
-
Some organizations may directly deploy the master branch to a production environment (live for your users and clients), so new and work-in-progress features should be developed in their respective branches and finished before merging them into master.
Let’s make a branch for our dice rolling feature:
$ git checkout -b dice
This makes a new local branch called dice
based on the branch that we are
currently on (master
) and switches you to the dice
branch. This command is
basically shorthand for:
$ git branch dice # Create new branch called 'dice'
$ git checkout dice # Switch to branch called 'dice'
You can view the branches you’ve created by typing git branch
. You should see
two branches at this point, one called master
and one called dice
. An
asterisk is placed next to the branch that you’ve currently checked out.
You can make
git branch
(and many other command-line tools) display more information by passing the verbose flag-v
, or make it display even more information by adding additionalv
characters to the flag (e.g.-vv
). For example,git branch -v
will display the commit message of the most recent commit on the branch (aka theHEAD
).git branch -vv
also displays in square brackets the remote branch that the local branch is tracking, if one exists. In this repository,master
is associated with a remote branchorigin/master
that lives on Github’s servers. This means when yougit push
orpull
to/fromorigin master
, you are syncing up your local copy of themaster
branch with the remotemaster
branch on Github’s servers. At the moment, your newly createddice
branch is local only, meaning that it is not tracked. Therefore attempting togit push
orgit pull
will not work (or Git may ask if you want to have a remote branch created).
Making commits
Now that you’ve created and swapped to the dice
branch, you can safely start
adding code without modifying the master
branch. The code for dice rolling
will be split up into three commits so that we can demonstrate how to combine
multiple commits into a single commit later. Open rand.py
in your preferred
text editor. Find the comment in the __main__
function that says COMMIT 1
and add the following code below it like so:
# COMMIT 1: Add -s flag for number of sides on a die
parser.add_argument(
"-s", "--sides",
dest="sides",
type=int,
default=6,
help="Number of sides on a die (max=20; ignored when flipping a coin)"
)
If you’re unsure about copying the code correctly, take a look at
rand_reference.py
for what the finished code should look like!
Save and exit the text editor and return to the terminal. Type git status
to
see that rand.py
has been changed but not staged. Type git add rand.py
to
stage the file. To “stage” a file means to add it to the group of files that you
want to include in your commit.
You may already know
git add .
to add every changed file in your current directory to the staging area, but in some cases you want to only add relevant files if you hopped around implementing different things. A commit should be a group of changes focused on a single small goal, sub-feature, or bug fix. In the event that you need to roll back a commit, you don’t want to cause collateral damage by undoing changes in an unrelated file that you lumped into the commit.
Now, lets commit our changes and write a concise and useful commit message. Type
git commit -m "Add -s flag for number of sides on a die"
. Organizations and
companies will have different best practices for writing commit messages, but
here are some common guidelines:
-
Keep it short. Typical commits should be one line (some exceptions). If a lot of changes have been made, you might want to make multiple commits instead.
-
Make it descriptive. If someone needs to read the commit history, they’re not going to know or remember what was changed by a commit that just says “Fixed bug” or “WIP”.
-
A lot of places like to capitalize the first letter and use the “<present tense verb> <descriptive thing>” sentence structure. For example, “Add”, “Fix”, or “Remove” some thing. You can think of your commit message as completing the sentence “This commit will…”. For example, “This commit will add -s flag for number of sides on a die”.
Next, we will follow the same procedure to make commits 2 and 3.
Find the comment in the function roll_dice
that says COMMIT 2
and add the
following code. Make sure that it’s indented properly inside the function! Then
stage your changes and make a commit with the message "Add dice rolling logic
and output dice sum and sequence"
.
# COMMIT 2: Add dice rolling logic and output dice sum and sequence
diceRecord, diceSum = [], 0
for i in range(iterations):
roll = random.randint(1, sides)
diceRecord.append(roll)
diceSum += roll
print("{} roll(s) of a {}-sided die resulted in a sum of {}:"
.format(iterations, sides, diceSum))
print(*diceRecord, sep=', ')
Find the comment in the function roll_dice
that says COMMIT 3
and add the
following code. Then stage your changes and make a commit with the message
"Restrict input range for dice iterations and sides"
.
# COMMIT 3: Restrict input range for dice iterations and sides
if iterations > MAX_ITERATIONS or iterations < 0:
print("Number of rolls must be in the range [0 - {}]"
.format(MAX_ITERATIONS))
return
if sides > MAX_SIDES or sides < 1:
print("Number of sides must be in the range [1 - {}]"
.format(MAX_SIDES))
return
When you’re done, you should now be able to roll dice:
~/decal-labs/b9$ python3 rand.py dice -i 10 -s 20
10 roll(s) of a 20-sided die resulted in a sum of 119:
15, 3, 12, 10, 16, 8, 18, 20, 13, 4
Viewing your progress
Let’s take a look at the commits you’ve made. Type git log
to see a history of
commits. Each commit has some information such as who authored the commit, a
timestamp for when the commit was created, and the commit message.
-
The first line of each commit entry has a long hexadecimal string. This is the commit hash: think of it as a unique ID that you can use to reference that specific commit.
-
Some commits have branch information in parentheses next to the commit hash, indicating that they are the most recent commit or
HEAD
of that branch. Your most recent commit should have something like(HEAD -> dice)
. The fourth commit should have(origin/master, origin/HEAD)
because we based our branch off ofmaster
and have added three new commits on top of it. Note that if someone adds new commits to the local or remotemaster
, the branch information may change or be out of date.
Type q
to exit git log
.
Besides looking at the commit history, you may want to view the actual changes
in the code. You can use git diff <old commit> <new commit>
to view the
difference between two commits. There are a few different ways you can reference
a commit. One that was mentioned before was copying a commit’s hash (note that
your commit hashes will be different from the example below):
$ git diff 3368313c0afb6e306133d604ca72b0287124e8f2 762053064506810dee895219e5b2c2747a202829
You can also copy a small chunk of the beginning of the commit hash instead of the entire hash. Because of the way hashes work, it’s very unlikely that you’ll have two commits that have the exact same starting sequence.
$ git diff 3368313 7620530
If you’re trying to diff
two commits that are pretty close together in the
log, an easier way is to reference commits by their distance from the HEAD
(most recent) commit using the format HEAD~<number>
. Since we added three
commits new commits in dice
, we can view the difference between dice
and
master
using the following command:
$ git diff HEAD~3 HEAD
Merge conflicts and rebase
Now that you’ve implemented dice rolling on on your feature branch, you’ll want
to merge your feature into the master
branch. This means that Git will take
your changes on the dice
branch and and replay them on the master
branch,
bringing master
up to date with dice
. However, things can (and often will) go wrong when master
has new
commits added to it while you’re working on dice
. Now, our commits on dice
may be based on an old version of master. Even worse, someone else may have
modified the same lines of code on master
that we changed on dice
,
resulting in Git not knowing whose lines to use. This is called a merge
conflict.
Let’s simulate a merge conflict by making a change on master
. Switch to the
master
branch using git checkout master
. Make sure that you’re on the
master
branch by checking the output of git branch
. Now that you’re on the
master
branch, rand.py
shouldn’t contain the code for the new dice feature
we added. Go to the comment that says COMMIT 2
and add the following code
below:
# COMMIT 2: Add dice rolling logic and output dice sum and sequence
diceSum = random.randint(1, iterations * 6)
print("{} roll(s) of a {}-sided die resulted in a sum of {}:"
.format(iterations, sides, diceSum))
Now stage the changes and commit with the message "dice rolling WIP"
. In this
totally realistic scenario, our imaginary maverick teammate has added some buggy
code to master
, making your life harder. Switch back to your dice
branch
with git checkout dice
and prepare to handle this merge conflict.
There are a multiple ways to handle a merge conflict, but the one we will be
showing you in this lab is using git rebase
. Our dice
branch is “based” on
the master
branch at a certain point in time, but the master
branch has
moved forward leaving dice
based on an outdated master
. Thus, we want to
“re-base” dice
on the current state of master
. While on your dice
branch,
run git rebase master
. Git will rewind the commits you’ve made on dice
, copy
any new commits made on master
, and attempt to replay your commits on top.
Sometimes rebase
will run to completion without your intervention, but if
there’s a merge conflict you will need to resolve it.
Git will give you instructions on what to do if it encounters a merge conflict
during a rebase. In this case, open rand.py
and find the conflicted zone which
should have the following format:
<<<<<<< HEAD
Lines of code from the base branch (in this case master)
=======
Lines of code from the branch you're rebasing (in this case dice)
>>>>>>> Commit message of the commit that conflicts with the base branch
To fix the conflict, simply keep the lines you want (your lines from dice
) and
delete the other lines in the conflicted zone (<<<<<<< HEAD
, =======
,
>>>>>>> dice
, and the unwanted code from master), and then save and exit the
file. Git will take what you’ve saved as the exact form of what the file will
look like at the end of the rebase, so what you’re doing is essentially fixing
the file so that the code runs properly. This means that if you have multiple
merge conflicts and you decide to mix keeping some lines from the base branch
and some from your feature branch, you need to make sure the code actually works
correctly.
Now that you’ve fixed the merge conflict, follow the rebase instructions and
stage your fixed file (git add rand.py
), then run git rebase --continue
. If
Git finds any more merge conflicts for other files, you would follow the same
procedure as above. However, we only had one conflicted file so our rebase is
finished! Run git log
to see the result of our rebase. You should now see that
your imaginary teammate’s "dice rolling WIP"
commit in your branch’s history,
with your commits on top of theirs.
Wait! We haven’t actually merged our commits into the master
branch. If
you want to do the optional step of combining your commits into a single commit
before merging, go to the next section. You can also merge now and do the
optional step later; you won’t see the combined commit on master
but you will
still learn the concepts. Otherwise, switch to the master branch using git
checkout master
and run git merge dice
to integrate your changes from dice
into master
. Now you’re done! Don’t forget to complete the quick Gradescope
check below.
Because we ran rebased our branch before merging, we didn’t get any merge conflicts when running
git merge dice
. You can simplygit merge dice
without rebasing first, but Git will prompt you to resolve the exact same merge conflicts before it allows a merge so you aren’t saving yourself any work. We decided to showgit rebase
because it’s a good habit to regularly sync your branch with its base branch. Rungit rebase
every so often if you suspect there are new changes onmaster
so that any merge conflicts are small and incremental. If you work with a large team and toil away on your feature branch for several days without rebasing, you may find that when its time to merge, your teammates have already updatedmaster
many times giving you a large backlog of merge conflicts and an even larger headache.
Gradescope:
-
On your
master
branch, typegit log
and paste a portion of your commit history showing the commit(s) you merged in from yourdice
branch plus a few commits into the past for context. If you haven’t completed the optional “Combining commits” section, 5-6 of the most recent commits should be sufficient. If you have completed the optional section and squashed your commits, 3-4 of the most recent commits should be sufficient. -
On the
master
branch, usegit diff
to show the difference between whatrand.py
looks like now versus what it looked like at the start of the lab. There are multiple ways to usegit diff
to do this, so pick any one you like. In Gradescope, paste both thegit diff
command you used and the output of the command.
(OPTIONAL) Combining commits
Git has the ability to combine multiple commits into a single commit. This
process is called squashing. You may want to do this if you have work in
progress commits that you want to combine into a single finished commit, or in
our case, keeping git log
for the master
branch more digestible.
-
Pros: If you have a feature branch with fifty commits, you prevent inundating
master
’s history with fifty different commits when merging. When you squash, you can make a multi-line commit message where the first line is a summary of the feature, and subsequent lines are bullet points containing relevant individual commit messages from the individual commits. -
Cons: You lose the granularity of the history on
master
, making it more difficult if you want to partially roll back your feature. Git also doesn’t support having multiple authors for a single commit, so if you want to credit other contributors you need to add co-author credit somewhere in the commit message of your combined commit. Aside: you can see how co-authoring works with GitHub here.
First, make sure you’re on your dice
branch using git checkout dice
. We
will be performing our squash using the command git rebase
in interactive mode
by passing the -i
flag. This may seem a bit unusual that we can run rebase
without a base branch, but rebase
can not only sync branches, but also rewrite
the history of our current branch.
The format for the command of an interactive rebase is
git rebase -i HEAD~#
where#
is the parent commit of the commit where we want to start our rebase. This will seem a bit unintuitive if you’ve followed how theHEAD~#
pointer works: to work on our last three commits (HEAD
,HEAD~1
, andHEAD~2
), we need to actually rungit rebase -i HEAD~3
. An easier way to remember this is if you want to rebase the lastN
commits, rungit rebase -i HEAD~N
. However, we may not always remember how many commits we made so its okay to give yourself a little more room/context by using a larger number.
Let’s run git rebase -i HEAD~5
. This will bring up a text editor with a file
that we can modify to change the history of commits from HEAD~4
to HEAD
.
You’ll notice that the commits are in reverse order, starting with the oldest
commit at the top and newest commit at the bottom. This is because git
will
use this text file as part of a script to replay those commits (with your
rewritten changes) in chronological order. Your file will have different commit
hashes and likely a different commit message on the first line than the example
below:
pick 7c5bd5d fixed sus
pick 1f7a1e1 dice rolling WIP
pick 26e0827 Add -s flag for number of sides on a die
pick b193d44 Add dice rolling logic and output dice sum and sequence
pick 554402e Restrict input range for dice iterations and sides
# Rebase 7dac858..554402e onto b193d44 (5 commands)
# ...
As you can see, Git includes useful instructions on what you can do with commits
in interactive mode in the commented-out section at the bottom of the file. We
can combine, delete, reword, or even reorder our commits, along with many more
options. In our case, we want to squash
our two most recent commits into the
third most recent commit in the list. To do this, replace the word pick
with
the word squash
(or just s
for short) on the last two lines. Do not
squash
the third line; that will take your three most recent commits and
meld them with the “dice rolling WIP” commit, which is a commit from master
.
Your file should look something like this:
pick 7c5bd5d fixed sus
pick 1f7a1e1 dice rolling WIP
pick 26e0827 Add -s flag for number of sides on a die
squash b193d44 Add dice rolling logic and output dice sum and sequence
squash 554402e Restrict input range for dice iterations and sides
# Rebase 7dac858..554402e onto b193d44 (5 commands)
# ...
Save and exit the file. Git will now rewrite the commit history as you specified. If you squash commits or do another action that may modify commit messages, Git will bring up another text editor for you to modify the commit message. Because you squashed, Git made a multi-line commit message by combining the three individual commit messages from before you squashed. Let’s add a useful one-line summary, “Add dice roll feature”, at the top of the file. You can also adjust the formatting to your liking; I like adding bullet points to the individual commit messages.
Add dice roll feature
# This is a combination of 3 commits.
# This is the 1st commit message:
- Add -s flag for number of sides on a die
# This is the commit message #2:
- Add dice rolling logic and output dice sum and sequence
# This is the commit message #3:
- Restrict input range for dice iterations and sides
Git and other tools use the first line of a commit message as a shortened version for certain display purposes. For example, try running
git log --oneline
. As another example, when Github displays the most recent commit message next to files and folders in the directory view, the first line of the message is shown.
Save the file and exit. The rebase is now complete! Type git log
and view your
new squashed commit. If you didn’t already merge your three individual commits
into master
, you can run git merge master
to merge in your combined commit.
commit 6b4f705921f9b06af880f97d36cb0b74e51aeb0c (HEAD -> dice)
Author: Max Vogel <max-v@berkeley.edu>
Date: Sun Apr 3 19:45:51 2022 -0700
Add dice roll feature
- Add -s flag for number of sides on a die
- Add dice rolling logic and output dice sum and sequence
- Restrict input range for dice iterations and sides
Gradescope: There’s no separate Gradescope check for this optional section, but make sure you go back and complete the Gradescope check above in the previous section if you’ve chosen to do this part first!
Part 2: Pull Requests
One of the most common ways to contribute changes to a repository on Github (and other similar Git remote hosting services) is through a pull request (often shortened to “PR”). A pull request is basically a proposal to make changes to a repository that can be reviewed by others, commented on, and edited before the changes are actually approved to be merged into the repository. For example, when working on a large project with a team, you may want to submit a pull request with your new feature so that team members can review your code for bugs or code style mistakes. For projects on public repositories that invite contributions from strangers, you can contribute a feature by forking the repository (making a copy of it that you own), implementing your feature, and then making a pull request to have your contribution merged into the official repository.
First, read up on how to create a pull request from Github’s documentation. We’ve created a dummy repository at github.com/0xcf/decal-pr-practice for you to practice making a simple PR from a fork. You will be making a fork of the repo, writing your name in the Markdown file in the repo, and then making a PR to merge your change into the original repo.
-
Read about how to fork a repo from Github’s documentation. Make a fork of our dummy repository at github.com/0xcf/decal-pr-practice.
-
Clone your forked repository and create a new branch based on
master
calledmy-name
. OpenREADME.md
and replace “Tux the Penguin” with your name. -
Stage and commit your change with the commit message
Add my name
, and thengit push
. -
Read about how to make a PR from a fork from Github’s documentation. Make a PR from your branch
my-name
on your forked repo<Your Github username>/decal-pr-practice
to the branchmaster
on the original repo0xcf/decal-pr-practice
.
You’ve now made a pull request! Github has a lot of features for pull requests, feel free to check some of them out. This lab you’re reading right now was merged through a PR, which you can view at github.com/0xcf/decal-web/pull/216.
-
The main page for the pull request has a timeline of commits and any comments from reviewers. PRs can be continually updated by making and pushing new commits to your compare branch. Github will track any new changes pushed to the compare branch and update your PR. As you can see, I didn’t follow my own advice and messed up a rebase after trying to sync my fork, leaving duplicate commits in my commit history…
-
The sidebar on the main page has several notable features. You can request other people to review your changes, assign others to work on the PR, and link an issue that your PR will resolve if it gets merged. Github repositories have an “Issues” tab where one can submit a report about a bug or request a feature (e.g. github.com/0xcf/decal-web/issues). The Github issue associated with the PR for this lab is at github.com/0xcf/decal-web/issues/184.
-
Go to the “Files Changed” tab to see a diff of the PR’s changes. Reviewers of your PR can leave inline comments on a specific line of code by hovering over the line and clicking the blue “+” button that pops up.
Gradescope: Paste a link to the pull request you made so that it can be checked off for completion.
Part 3: Questions
-
What caused the merge conflict in the Git exercises you did?
-
Why does Git require us to manually intervene in merge conflicts?
-
In our exercise of making pull requests, why did we fork the repository before making a PR?
-
What command would you use to sync a folder
~/Downloads/Linux_ISOs
to the folder/usr/local/share/Calendar
, while preserving file metadata? (hint: use rsync) -
How does rsync determine when to look for changes between files? Select from the following: (read up on how rsync works, and what makes it efficient!)
A. By calculating the checksum of each file and comparing them.
B. By comparing the entire contents of each file for any differences.
C. By seeing if the ‘last modified’ timestamp of the files are different.
D. By seeing if the ‘created’ timestamp of the files are different.
E. By seeing if the permissions of the files are different.