Lab 9 - Version Control and Backups
Git IRL
In this lab, we’ll be learning how to use Git and typical best practices for
version control!
Pulling the code
If you haven’t already cloned the decal-lab git repository, run: git clone
https://github.com/0xcf/decal-labs.git
Go to your decal-labs
directory and cd
into the folder b9
. Run git pull
to pull the latest changes.
Part 1: Git Basics
In the b9
directory, we’ve provided a basic Python program called rand.py
to
help demonstrate some core Git concepts. rand.py
takes command line arguments
and performs a number of coin flips and outputs the result in your terminal. We
want to add a feature to rand.py
that also lets us perform dice rolls. We will
simulate the real-world development process by making a branch for our new dice
rolling feature, making a few commits to that branch, and then merging our
feature branch into the master branch. You won’t need to personally write any
Python for this lab! You will only need to copy-and-paste the code we provide,
since this lab is about learning version control and not Python.
rand.py
demo
Run python3 rand.py -h
in your terminal to see the help dialogue for the
program. At the moment, it takes one positional argument of either coin
or
dice
as well as an optional flag -i
(or --iterations
) if you want to
flip/roll more than once. The dice
argument doesn’t work as we haven’t
implemented it yet, but you can try flipping a coin:
exiang@supernova:~/decal-labs/b9$ python3 rand.py coin
1 coin flip(s) resulted in 0 Heads and 1 Tails:
T
exiang@supernova:~/decal-labs/b9$ python3 rand.py coin -i 10
10 coin flip(s) resulted in 3 Heads and 7 Tails:
T, T, H, T, H, H, T, T, T, T
exiang@supernova:~/decal-labs/b9$ python3 rand.py coin -i 999
Number of flips must be in the range [0 - 100]
Creating a new branch
Let’s start working on the dice rolling feature! When working on a project with
version control, it’s best practice to keep the master branch clean (read:
mostly bug-free and stable).
-
Any branches you make will be based on/descended from code in the master
branch. You don’t want to write new code on top of a foundation that’s broken
or buggy!
-
Some organizations may directly deploy the master branch to a production
environment (live for your users and clients), so new and work-in-progress
features should be developed in their respective branches and finished before
merging them into master.
Let’s make a branch for our dice rolling feature:
This makes a new local branch called dice
based on the branch that we are
currently on (master
) and switches you to the dice
branch. This command is
basically shorthand for:
git branch dice // Create new branch called 'dice'
git checkout dice // Switch to branch called 'dice'
You can view the branches you’ve created by typing git branch
. You should see
two branches at this point, one called master
and one called dice
. An
asterisk is placed next to the branch that you’ve currently checked out.
You can make git branch
(and many other command-line tools) display more
information by passing the verbose flag -v
, or make it display even more
information by adding additional v
characters to the flag (e.g. -vv
). git
branch -v
will display the commit message of the most recent commit on the
branch (aka the HEAD
). git branch -vv
also displays in square brackets the
remote branch that the local branch is tracking, if one exists. In this
repository, master
is associated with a remote branch origin/master
that
lives on Github’s servers. This means when you git push
or pull
to/from
origin master
, you are syncing up your local copy of the master
branch
with the remote master
branch on Github’s servers. At the moment, your newly
created dice
branch is local only, meaning that attempting to git push
or
pull
will not work (or Git may ask you if you want to have a remote branch
created).
Making commits
Now that you’re on your dice
branch, you can safely start adding code without
modifying the master
branch. The code for dice rolling will be split up into
three commits so that we can demonstrate how to combine multiple commits into a
single commit later. Open rand.py
in your preferred text editor. Find the
comment in the __main__
function that says COMMIT 1
and add the following
code below it like so:
# COMMIT 1: Add -s flag for number of sides on a die
parser.add_argument(
"-s", "--sides",
dest="sides",
type=int,
default=6,
help="Number of sides on a die (max=20; ignored when flipping a coin)"
)
If you’re unsure about copying the code correctly, take a look at
rand_reference.py
for what the finished code should look like!
Save and exit the text editor and return to the terminal. Type git status
to
see that rand.py
has been changed but not staged. Type git add rand.py
to
stage the file. To “stage” a file means to add it to the group of files that you
want to include in your commit.
You may already know git add .
to add every changed file in your current
directory to the staging area, but in some cases you want to only add relevant
files if you hopped around implementing different things. A commit should be a
group of changes focused on a single small goal, sub-feature, or bug fix. In
the event that you need to roll back a commit, you don’t want to cause
collateral damage by undoing changes in an unrelated file that you lumped into
the commit.
Now, lets commit our changes and write a concise and useful commit message. Type
git commit -m "Add -s flag for number of sides on a die"
. Organizations and
companies will have different best practices for writing commit messages, but
here are some common guidelines:
-
Keep it short. Typical commits should be one line (some exceptions). If a lot
of changes have been made, you might want to make multiple commits instead.
-
Make it descriptive. If someone needs to read the commit history, they’re not
going to know or remember what was changed by a commit that just says “Fixed
bug” or “WIP”.
-
A lot of places like to capitalize the first letter and use the “<present
tense verb> <descriptive thing>” sentence structure. For example, “Add”,
“Fix”, or “Remove” some thing. You can think of your commit message as
completing the sentence “This commit will…”. For example, “This commit will
add -s flag for number of sides on a die”.
Next, we will follow the same procedure to make commits 2 and 3.
Find the comment in the function roll_dice
that says COMMIT 2
and add the
following code. Make sure that it’s indented properly inside the function! Then
stage your changes and make a commit with the message "Add dice rolling logic
and output dice sum and sequence"
.
# COMMIT 2: Add dice rolling logic and output dice sum and sequence
diceRecord, diceSum = [], 0
for i in range(iterations):
roll = random.randint(1, sides)
diceRecord.append(roll)
diceSum += roll
print("{} roll(s) of a {}-sided die resulted in a sum of {}:"
.format(iterations, sides, diceSum))
print(*diceRecord, sep=', ')
Find the comment in the function roll_dice
that says COMMIT 3
and add the
following code. Then stage your changes and make a commit with the message
"Restrict input range for dice iterations and sides"
.
# COMMIT 3: Restrict input range for dice iterations and sides
if iterations > MAX_ITERATIONS or iterations < 0:
print("Number of rolls must be in the range [0 - {}]"
.format(MAX_ITERATIONS))
return
if sides > MAX_SIDES or sides < 1:
print("Number of sides must be in the range [1 - {}]"
.format(MAX_SIDES))
return
When you’re done, you should now be able to roll dice:
exiang@supernova:~/decal-labs/b9$ python3 rand.py dice -i 10 -s 20
10 roll(s) of a 20-sided die resulted in a sum of 119:
15, 3, 12, 10, 16, 8, 18, 20, 13, 4
Viewing your progress
Let’s take a look at the commits you’ve made. Type git log
to see a history of
commits. Each commit has some information such as who authored the commit, a
timestamp for when the commit was created, and the commit message.
-
The first line of each commit entry has a long hexadecimal string. This is the
commit hash: think of it as a unique ID that you can use to reference that
specific commit.
-
Some commits have branch information in parentheses next to the commit hash,
indicating that they are the most recent commit or HEAD
of that branch. Your
most recent commit should have something like (HEAD -> dice)
. The fourth
commit should have (origin/master, origin/HEAD)
because we based our branch
off of master
and have added three new commits on top of it. Note that if
someone adds new commits to the local or remote master
, the branch
information may change or be out of date.
Type q
to exit git log
.
Besides looking at the commit history, you may want to view the actual changes
in the code. You can use git diff <old commit> <new commit>
to view the
difference between two commits. There are a few different ways you can reference
a commit. One that was mentioned before was copying a commit’s hash (note that
your commit hashes will be different from the example below):
git diff 3368313c0afb6e306133d604ca72b0287124e8f2 762053064506810dee895219e5b2c2747a202829
You can also copy a small chunk of the beginning of the commit hash instead of
the entire hash. Because of the way hashes work, it’s very unlikely that you’ll
have two commits that have the exact same starting sequence.
If you’re trying to diff
two commits that are pretty close together in the
log, an easier way is to reference commits by their distance from the HEAD
(most recent) commmit using the format HEAD~<number>
. Since we added three
commits new commits in dice
, we can view the difference between dice
and
master
using the following command:
Merge conflicts and rebase
Now that you’ve implemented dice rolling on on your feature branch, you’ll want
to merge your feature into the master
branch. This means that Git will take
your changes on the dice
branch and and replay them on the master
branch,
bringing master
up to date with dice
. However, things can go wrong when
master
has new commits added to it while you’re working on dice
. Now, our
commits on dice
may be based on an old version of master. Even worse, someone
else may have modified the same lines of code on master
that we changed on
dice
, resulting in Git not knowing whose lines to use. This is called a merge
conflict.
Let’s simulate a merge conflict by making a change on master
. Switch to the
master
branch using git checkout master
. Make sure that you’re on the
master
branch by checking the output of git branch
. Now that you’re on the
master
branch, rand.py
shouldn’t contain the code for the new dice feature
we added. Go to the comment that says COMMIT 2
and add the following code
below:
# COMMIT 2: Add dice rolling logic and output dice sum and sequence
diceSum = random.randint(1, iterations * 6)
print("{} roll(s) of a {}-sided die resulted in a sum of {}:"
.format(iterations, sides, diceSum))
Now stage the changes and commit with the message "dice rolling WIP"
. In this
totally realistic scenario, our imaginary maverick teammate has added some buggy
code to master
, making your life harder. Switch back to your dice
branch
with git checkout dice
and prepare to handle this merge conflict.
There are a multiple ways to handle a merge conflict, but the one we will be
showing you in this lab is using git rebase
. Our dice
branch is “based” on
the master
branch at a certain point in time, but the master
branch has
moved forward leaving dice
based on an outdated master
. Thus, we want to
“re-base” dice
on the current state of master
. While on your dice
branch,
run git rebase master
. Git will rewind the commits you’ve made on dice
, copy
any new commits made on master
, and attempt to replay your commits on top.
Sometimes rebase
will run to completion without your intervention, but if
there’s a merge conflict you will need to resolve it.
Git will give you instructions on what to do if it encounters a merge conflict
during a rebase. In this case, open rand.py
and find the conflicted zone which
should have the following format:
<<<<<<< HEAD
Lines of code from the base branch (in this case master)
=======
Lines of code from the branch you're rebasing (in this case dice)
>>>>>>> Commit message of the commit that conflicts with the base branch
To fix the conflict, simply keep the lines you want (your lines from dice
) and
delete the other lines in the conflicted zone (<<<<<<< HEAD
, =======
,
>>>>>>> dice
, and the unwanted code from master), and then save and exit the
file. Git will take what you’ve saved as the exact form of what the file will
look like at the end of the rebase, so what you’re doing is essentially fixing
the file so that the code runs properly. This means that if you have multiple
merge conflicts and you decide to mix keeping some lines from the base branch
and some from your feature branch, you need to make sure the code actually works
correctly.
Now that you’ve fixed the merge conflict, follow the rebase instructions and
stage your fixed file (git add rand.py
), then run git rebase --continue
. If
Git finds any more merge conflicts for other files, you would follow the same
procedure as above. However, we only had one conflicted file so our rebase is
finished! Run git log
to see the result of our rebase. You should now see that
your imaginary teammate’s "dice rolling WIP"
commit in your branch’s history,
with your commits on top of theirs.
Wait! We haven’t actually merged our commits into the master
branch. If
you want to do the optional step of combining your commits into a single commit
before merging, go to the next section. You can also merge now and do the
optional step later; you won’t see the combined commit on master
but you will
still learn the concepts. Otherwise, switch to the master branch using git
checkout master
and run git merge dice
to integrate your changes from dice
into master
. Now you’re done! Don’t forget to complete the quick Gradescope
check below.
Because we ran rebased our branch before merging, we didn’t get any merge
conflicts when running git merge dice
. You can simply git merge dice
without rebasing first, but Git will prompt you to resolve the exact same
merge conflicts before it allows a merge so you aren’t saving yourself any
work. We decided to show git rebase
because it’s a good habit to regularly
sync your branch with its base branch. Run git rebase
every so often if you
suspect there are new changes on master
so that any merge conflicts are
small and incremental. If you work with a large team and toil away on your
feature branch for several days without rebasing, you may find that when its
time to merge, your teammates have already updated master
many times giving
you a large backlog of merge conflicts and an even larger headache.
Gradescope:
-
On your master
branch, type git log
and paste a portion of your commit
history showing the commit(s) you merged in from your dice
branch plus a
few commits into the past for context. If you haven’t completed the optional
“Combining commists” section, 5-6 of the most recent commits should be
sufficient. If you have completed the optional section and squashed your
commits, 3-4 of the most recent commits should be sufficient.
-
On the master
branch, use git diff
to show the difference between what
rand.py
looks like now versus what it looked like at the start of the lab.
There are multiple ways to use git diff
to do this, so pick any one you
like. In Gradescope, paste both the git diff
command you used and the
output of the command.
(OPTIONAL) Combining commits
Git has the ability to combine multiple commits into a single commit. This
process is called squashing. You may want to do this if you have work in
progress commits that you want to combine into a single finished commit, or in
our case, keeping git log
for the master
branch more digestable.
-
Pros: If you have a feature branch with fifty commits, you prevent inundating
master
’s history with fifty different commits when merging. When you squash,
you can make a multi-line commit message where the first line is a summary of
the feature, and subsequent lines are bullet points containing relevant
individual commit messages from the individaul commits.
-
Cons: You lose the granularity of the history on master
, making it more
difficult if you want to partially roll back your feature. Git also doesn’t
support having multiple authors for a single commit, so if you want to credit
other contributors you need to add co-author credit somewhere in the commit
message of your combined commit.
First, make sure you’re on your dice
branch using git checkout dice
. We
will be performing our squash using the command git rebase
in interactive mode
by passing the -i
flag. This may seem a bit unusual that we can run rebase
without a base branch, but rebase
can not only sync branches, but also rewrite
the history of our current branch.
The format for the command of an interactive rebase is git rebase -i HEAD~#
where #
is the parent commit of the commit where we want to start our
rebase. This will seem a bit unintuitive if you’ve followed how the HEAD~#
pointer works: to work on our last three commits (HEAD
, HEAD~1
, and
HEAD~2
), we need to actually run git rebase -i HEAD~3
. An easier way to
remember this is if you want to rebase the last N
commits, run git rebase
-i HEAD~N
. However, we may not always remember how many commits we made so
its okay to give yourself a little more room/context by using a larger number.
Let’s run git rebase -i HEAD~5
. This will bring up a text editor with a file
that we can modify to change the history of commits from HEAD~4
to HEAD
.
You’ll notice that the commits are in reverse order, starting with the oldest
commit at the top and newest commit at the bottom. This is because git
will
use this text file as part of a script to replay those commits (with your
rewritten changes) in chronological order. Your file will have different commit
hashes and likely a different commit message on the first line than the example
below:
pick 9f785bc Remove trailing whitespace
pick 1f7a1e1 dice rolling WIP
pick 26e0827 Add -s flag for number of sides on a die
pick b193d44 Add dice rolling logic and output dice sum and sequence
pick 554402e Restrict input range for dice iterations and sides
# Rebase 7dac858..554402e onto b193d44 (5 commands)
#
...
As you can see, Git includes useful instructions on what you can do with commits
in interactive mode in the commented-out section at the bottom of the file. We
can combine, delete, reword, or even reorder our commits, along with many more
options. In our case, we want to squash
our two most recent commits into the
third most recent commit in the list. To do this, replace the word pick
with
the word squash
(or just s
for short) on the last two lines. Do not
squash
the third line; that will take your three most recent commmits and
meld them with the “dice rolling WIP” commit, which is a commit from master
.
Your file should look something like this:
pick 9f785bc Remove trailing whitespace
pick 1f7a1e1 dice rolling WIP
pick 26e0827 Add -s flag for number of sides on a die
squash b193d44 Add dice rolling logic and output dice sum and sequence
squash 554402e Restrict input range for dice iterations and sides
# Rebase 7dac858..554402e onto b193d44 (5 commands)
#
...
Save and exit the file. Git will now rewrite the commit history as you
specified. If you squash commits or do another action that may modify commit
messages, Git will bring up another text editor for you to modify the commit
message. Because you squashed, Git made a multi-line commit message by combining
the three individual commit messages from before you squashed. Let’s add a
useful one-line summary, “Add dice roll feature”, at the top of the file. You
can also adjust the formatting to your liking; I like adding bullet points to
the individual commit messages.
Add dice roll feature
# This is a combination of 3 commits.
# This is the 1st commit message:
- Add -s flag for number of sides on a die
# This is the commit message #2:
- Add dice rolling logic and output dice sum and sequence
# This is the commit message #3:
- Restrict input range for dice iterations and sides
Git and other tools use the first line of a commit message as a shortened
version for certain display purposes. For example, try running git log
--oneline
. As another example, when Github displays the most recent commit
message next to files and folders in the directory view, the first line of the
message is shown.
Save the file and exit. The rebase is now complete! Type git log
and view your
new squashed commit. If you didn’t already merge your three individual commits
into master
, you can run git merge master
to merge in your combined commit.
commit c0fa61ad59a76635701813472bfbc1c7f36bb857 (HEAD -> dice)
Author: Edric Xiang <email redacted>
Date: Wed Aug 12 19:00:29 2020 -0700
Add dice roll feature
- Add -s flag for number of sides on a die
- Add dice rolling logic and output dice sum and sequence
- Restrict input range for dice iterations and sides
Gradescope: There’s no separate Gradescope check for this optional section,
but make sure you go back and complete the Gradescope check above in the
previous section if you’ve chosen to do this part first!
Part 2: Pull Requests
One of the most common ways to contribute changes to a repository on Github (and
other similar Git remote hosting services) is through a pull request (often
shortened to “PR”). A pull request is basically a proposal to make changes to a
repository that can be reviewed by others, commented on, and edited before the
changes are actually approved to be merged into the repository. For example,
when working on a large project with a team, you may want to submit a pull
request with your new feature so that team members can review your code for bugs
or code style mistakes. For projects on public repostitories that invite
contributions from strangers, you can contribute a feature by forking the
repository (making a copy of it that you own), implementing your feature, and
then making a pull request to have your contribution merged into the official
repository.
First, read up on how to create a pull
request
from Github’s documentation. We’ve created a dummy repository at
https://github.com/0xcf/decal-pr-practice for you to practice making a simple PR
from a fork. You will be making a fork of the repo, writing your name in the
Markdown file in the repo, and then making a PR to merge your change into the
original repo.
-
Read about how to fork a
repo
from Github’s documentation. Make a fork of our dummy repository at
https://github.com/0xcf/decal-pr-practice.
-
Clone your forked repository and create a new branch based on master
called
my-name
. Open README.md
and replace “Tux the Penguin” with your name.
-
Stage and commit your change with the commit message Add my name
, and then
git push
.
-
Read about how to make a PR from a
fork
from Github’s documentation. Make a PR from your branch my-name
on your
forked repo <Your Github username>/decal-pr-practice
to the branch
master
on the original repo 0xcf/decal-pr-practice
.
You’ve now made a pull request! Github has a lot of features for pull requests,
feel free to check some of them out. This lab you’re reading right now was
merged through a PR, which you can view at
https://github.com/0xcf/decal-web/pull/216.
-
The main page for the pull request has a timeline of commits and any comments
from reviewers. PRs can be continually updated by making and pushing new
commits to your compare branch. Github will track any new changes pushed to
the compare branch and update your PR. As you can see, I didn’t follow my own
advice and messed up a rebase after trying to sync my fork, leaving duplicate
commits in my commit history…
-
The sidebar on the main page has several notable features. You can request
other people to review your changes, assign others to work on the PR, and link
an issue that your PR will resolve if it gets merged. Github repositories have
an “Issues” tab where one can submit a report about a bug or request a feature
(e.g. https://github.com/0xcf/decal-web/issues). The Github issue associated
with the PR for this lab is at https://github.com/0xcf/decal-web/issues/184.
-
Go to the “Files Changed” tab to see a diff of the PR’s changes. Reviewers of
your PR can leave inline comments on a specific line of code by hovering over
the line and clicking the blue “+” button that pops up.
Gradescope: Paste a link to the pull request you made so that it can be
checked off for completion.
Part 3: Questions
-
What caused the merge conflict in the Git exercises you did?
-
Why does Git require us to manually intervene in merge conflicts?
-
What command would you use to sync a folder ~/Downloads/Linux_ISOs to the
folder /usr/local/share/Calendar, while preserving file metadata? (hint: use
rsync)
-
How does rsync determine when to look for changes between files? Select from
the following:
A. By calculating the checksum of each file and comparing them.
B. By comparing the entire contents of each file for any differences.
C. By seeing if the ‘last modified’ timestamp of the files are different.
D. By seeing if the ‘created’ timestamp of the files are different.
E. By seeing if the permissions of the files are different.
-
Consider a file str1 containing the string ‘abcd’. You rsync it to another
file str2, and then replace ‘abcd’ with ‘aaaa’ on str1. If you run ‘rsync
–append str1 str2’, what will the contents of str2 be?