@import "./support/support"; @import "./color_schemes/"; @import "./modules"; @import "./custom/custom";
Link

Lab 9 - Version Control and Backups

Facilitator: Max Vogel

25 min read

Table of contents

  1. Pulling the code
  2. Part 1: Git Basics
    1. rand.py demo
    2. Creating a new branch
    3. Making commits
    4. Viewing your progress
    5. Merge conflicts and rebase
    6. (OPTIONAL) Combining commits
  3. Part 2: Pull Requests
  4. Part 3: Questions

In this lab, we’ll be learning how to use Git and typical best practices for version control!

Pulling the code

If you haven’t already cloned the decal-lab git repository, run:

$ git clone https://github.com/0xcf/decal-labs.git

Go to your decal-labs directory and cd into the folder b9. Run git pull to pull the latest changes.

Part 1: Git Basics

In the b9 directory, we’ve provided a basic Python program called rand.py to help demonstrate some core Git concepts. rand.py takes command line arguments and performs a number of coin flips and outputs the result in your terminal. We want to add a feature to rand.py that also lets us perform dice rolls. We will simulate the real-world development process by making a branch for our new dice rolling feature, making a few commits to that branch, and then merging our feature branch into the master branch. You won’t need to personally write any Python for this lab! You will only need to copy-and-paste the code we provide, since this lab is about learning version control and not Python.

rand.py demo

Run python3 rand.py -h in your terminal to see the help dialogue for the program. At the moment, it takes one positional argument of either coin or dice as well as an optional flag -i (or --iterations) if you want to flip/roll more than once. The dice argument doesn’t work as we haven’t implemented it yet, but you can try flipping a coin:

~/decal-labs/b9$ python3 rand.py coin
1 coin flip(s) resulted in 0 Heads and 1 Tails:
T
~/decal-labs/b9$ python3 rand.py coin -i 10
10 coin flip(s) resulted in 3 Heads and 7 Tails:
T, T, H, T, H, H, T, T, T, T
~/decal-labs/b9$ python3 rand.py coin -i 999
Number of flips must be in the range [0 - 100]

Creating a new branch

Let’s start working on the dice rolling feature! When working on a project with version control, it’s best practice to keep the master branch clean (read: mostly bug-free and stable).

  • Any branches you make will be based on/descended from code in the master branch. You don’t want to write new code on top of a foundation that’s broken or buggy!

  • Some organizations may directly deploy the master branch to a production environment (live for your users and clients), so new and work-in-progress features should be developed in their respective branches and finished before merging them into master.

Let’s make a branch for our dice rolling feature:

$ git checkout -b dice

This makes a new local branch called dice based on the branch that we are currently on (master) and switches you to the dice branch. This command is basically shorthand for:

$ git branch dice       # Create new branch called 'dice'
$ git checkout dice     # Switch to branch called 'dice'

You can view the branches you’ve created by typing git branch. You should see two branches at this point, one called master and one called dice. An asterisk is placed next to the branch that you’ve currently checked out.

You can make git branch (and many other command-line tools) display more information by passing the verbose flag -v, or make it display even more information by adding additional v characters to the flag (e.g. -vv). For example, git branch -v will display the commit message of the most recent commit on the branch (aka the HEAD). git branch -vv also displays in square brackets the remote branch that the local branch is tracking, if one exists. In this repository, master is associated with a remote branch origin/master that lives on Github’s servers. This means when you git push or pull to/from origin master, you are syncing up your local copy of the master branch with the remote master branch on Github’s servers. At the moment, your newly created dice branch is local only, meaning that it is not tracked. Therefore attempting to git push or git pull will not work (or Git may ask if you want to have a remote branch created).

Making commits

Now that you’ve created and swapped to the dice branch, you can safely start adding code without modifying the master branch. The code for dice rolling will be split up into three commits so that we can demonstrate how to combine multiple commits into a single commit later. Open rand.py in your preferred text editor. Find the comment in the __main__ function that says COMMIT 1 and add the following code below it like so:

    # COMMIT 1: Add -s flag for number of sides on a die
    parser.add_argument(
        "-s", "--sides",
        dest="sides",
        type=int,
        default=6,
        help="Number of sides on a die (max=20; ignored when flipping a coin)"
    )

If you’re unsure about copying the code correctly, take a look at rand_reference.py for what the finished code should look like!

Save and exit the text editor and return to the terminal. Type git status to see that rand.py has been changed but not staged. Type git add rand.py to stage the file. To “stage” a file means to add it to the group of files that you want to include in your commit.

You may already know git add . to add every changed file in your current directory to the staging area, but in some cases you want to only add relevant files if you hopped around implementing different things. A commit should be a group of changes focused on a single small goal, sub-feature, or bug fix. In the event that you need to roll back a commit, you don’t want to cause collateral damage by undoing changes in an unrelated file that you lumped into the commit.

Now, lets commit our changes and write a concise and useful commit message. Type git commit -m "Add -s flag for number of sides on a die". Organizations and companies will have different best practices for writing commit messages, but here are some common guidelines:

  • Keep it short. Typical commits should be one line (some exceptions). If a lot of changes have been made, you might want to make multiple commits instead.

  • Make it descriptive. If someone needs to read the commit history, they’re not going to know or remember what was changed by a commit that just says “Fixed bug” or “WIP”.

  • A lot of places like to capitalize the first letter and use the “<present tense verb> <descriptive thing>” sentence structure. For example, “Add”, “Fix”, or “Remove” some thing. You can think of your commit message as completing the sentence “This commit will…”. For example, “This commit will add -s flag for number of sides on a die”.

Next, we will follow the same procedure to make commits 2 and 3.

Find the comment in the function roll_dice that says COMMIT 2 and add the following code. Make sure that it’s indented properly inside the function! Then stage your changes and make a commit with the message "Add dice rolling logic and output dice sum and sequence".

    # COMMIT 2: Add dice rolling logic and output dice sum and sequence
    diceRecord, diceSum = [], 0
    for i in range(iterations):
        roll = random.randint(1, sides)
        diceRecord.append(roll)
        diceSum += roll

    print("{} roll(s) of a {}-sided die resulted in a sum of {}:"
            .format(iterations, sides, diceSum))
    print(*diceRecord, sep=', ')

Find the comment in the function roll_dice that says COMMIT 3 and add the following code. Then stage your changes and make a commit with the message "Restrict input range for dice iterations and sides".

    # COMMIT 3: Restrict input range for dice iterations and sides
    if iterations > MAX_ITERATIONS or iterations < 0:
        print("Number of rolls must be in the range [0 - {}]"
                .format(MAX_ITERATIONS))
        return

    if sides > MAX_SIDES or sides < 1:
        print("Number of sides must be in the range [1 - {}]"
                .format(MAX_SIDES))
        return

When you’re done, you should now be able to roll dice:

~/decal-labs/b9$ python3 rand.py dice -i 10 -s 20
10 roll(s) of a 20-sided die resulted in a sum of 119:
15, 3, 12, 10, 16, 8, 18, 20, 13, 4

Viewing your progress

Let’s take a look at the commits you’ve made. Type git log to see a history of commits. Each commit has some information such as who authored the commit, a timestamp for when the commit was created, and the commit message.

  • The first line of each commit entry has a long hexadecimal string. This is the commit hash: think of it as a unique ID that you can use to reference that specific commit.

  • Some commits have branch information in parentheses next to the commit hash, indicating that they are the most recent commit or HEAD of that branch. Your most recent commit should have something like (HEAD -> dice). The fourth commit should have (origin/master, origin/HEAD) because we based our branch off of master and have added three new commits on top of it. Note that if someone adds new commits to the local or remote master, the branch information may change or be out of date.

Type q to exit git log.

Besides looking at the commit history, you may want to view the actual changes in the code. You can use git diff <old commit> <new commit> to view the difference between two commits. There are a few different ways you can reference a commit. One that was mentioned before was copying a commit’s hash (note that your commit hashes will be different from the example below):

$ git diff 3368313c0afb6e306133d604ca72b0287124e8f2 762053064506810dee895219e5b2c2747a202829

You can also copy a small chunk of the beginning of the commit hash instead of the entire hash. Because of the way hashes work, it’s very unlikely that you’ll have two commits that have the exact same starting sequence.

$ git diff 3368313 7620530

If you’re trying to diff two commits that are pretty close together in the log, an easier way is to reference commits by their distance from the HEAD (most recent) commit using the format HEAD~<number>. Since we added three commits new commits in dice, we can view the difference between dice and master using the following command:

$ git diff HEAD~3 HEAD

Merge conflicts and rebase

Now that you’ve implemented dice rolling on on your feature branch, you’ll want to merge your feature into the master branch. This means that Git will take your changes on the dice branch and and replay them on the master branch, bringing master up to date with dice. However, things can (and often will) go wrong when master has new commits added to it while you’re working on dice. Now, our commits on dice may be based on an old version of master. Even worse, someone else may have modified the same lines of code on master that we changed on dice, resulting in Git not knowing whose lines to use. This is called a merge conflict.

Let’s simulate a merge conflict by making a change on master. Switch to the master branch using git checkout master. Make sure that you’re on the master branch by checking the output of git branch. Now that you’re on the master branch, rand.py shouldn’t contain the code for the new dice feature we added. Go to the comment that says COMMIT 2 and add the following code below:

    # COMMIT 2: Add dice rolling logic and output dice sum and sequence
    diceSum = random.randint(1, iterations * 6)

    print("{} roll(s) of a {}-sided die resulted in a sum of {}:"
            .format(iterations, sides, diceSum))

Now stage the changes and commit with the message "dice rolling WIP". In this totally realistic scenario, our imaginary maverick teammate has added some buggy code to master, making your life harder. Switch back to your dice branch with git checkout dice and prepare to handle this merge conflict.

There are a multiple ways to handle a merge conflict, but the one we will be showing you in this lab is using git rebase. Our dice branch is “based” on the master branch at a certain point in time, but the master branch has moved forward leaving dice based on an outdated master. Thus, we want to “re-base” dice on the current state of master. While on your dice branch, run git rebase master. Git will rewind the commits you’ve made on dice, copy any new commits made on master, and attempt to replay your commits on top. Sometimes rebase will run to completion without your intervention, but if there’s a merge conflict you will need to resolve it.

Git will give you instructions on what to do if it encounters a merge conflict during a rebase. In this case, open rand.py and find the conflicted zone which should have the following format:

<<<<<<< HEAD
Lines of code from the base branch (in this case master)
=======
Lines of code from the branch you're rebasing (in this case dice)
>>>>>>> Commit message of the commit that conflicts with the base branch

To fix the conflict, simply keep the lines you want (your lines from dice) and delete the other lines in the conflicted zone (<<<<<<< HEAD, =======, >>>>>>> dice, and the unwanted code from master), and then save and exit the file. Git will take what you’ve saved as the exact form of what the file will look like at the end of the rebase, so what you’re doing is essentially fixing the file so that the code runs properly. This means that if you have multiple merge conflicts and you decide to mix keeping some lines from the base branch and some from your feature branch, you need to make sure the code actually works correctly.

Now that you’ve fixed the merge conflict, follow the rebase instructions and stage your fixed file (git add rand.py), then run git rebase --continue. If Git finds any more merge conflicts for other files, you would follow the same procedure as above. However, we only had one conflicted file so our rebase is finished! Run git log to see the result of our rebase. You should now see that your imaginary teammate’s "dice rolling WIP" commit in your branch’s history, with your commits on top of theirs.

Wait! We haven’t actually merged our commits into the master branch. If you want to do the optional step of combining your commits into a single commit before merging, go to the next section. You can also merge now and do the optional step later; you won’t see the combined commit on master but you will still learn the concepts. Otherwise, switch to the master branch using git checkout master and run git merge dice to integrate your changes from dice into master. Now you’re done! Don’t forget to complete the quick Gradescope check below.

Because we ran rebased our branch before merging, we didn’t get any merge conflicts when running git merge dice. You can simply git merge dice without rebasing first, but Git will prompt you to resolve the exact same merge conflicts before it allows a merge so you aren’t saving yourself any work. We decided to show git rebase because it’s a good habit to regularly sync your branch with its base branch. Run git rebase every so often if you suspect there are new changes on master so that any merge conflicts are small and incremental. If you work with a large team and toil away on your feature branch for several days without rebasing, you may find that when its time to merge, your teammates have already updated master many times giving you a large backlog of merge conflicts and an even larger headache.

Gradescope:

  1. On your master branch, type git log and paste a portion of your commit history showing the commit(s) you merged in from your dice branch plus a few commits into the past for context. If you haven’t completed the optional “Combining commits” section, 5-6 of the most recent commits should be sufficient. If you have completed the optional section and squashed your commits, 3-4 of the most recent commits should be sufficient.

  2. On the master branch, use git diff to show the difference between what rand.py looks like now versus what it looked like at the start of the lab. There are multiple ways to use git diff to do this, so pick any one you like. In Gradescope, paste both the git diff command you used and the output of the command.

(OPTIONAL) Combining commits

Git has the ability to combine multiple commits into a single commit. This process is called squashing. You may want to do this if you have work in progress commits that you want to combine into a single finished commit, or in our case, keeping git log for the master branch more digestible.

  • Pros: If you have a feature branch with fifty commits, you prevent inundating master’s history with fifty different commits when merging. When you squash, you can make a multi-line commit message where the first line is a summary of the feature, and subsequent lines are bullet points containing relevant individual commit messages from the individual commits.

  • Cons: You lose the granularity of the history on master, making it more difficult if you want to partially roll back your feature. Git also doesn’t support having multiple authors for a single commit, so if you want to credit other contributors you need to add co-author credit somewhere in the commit message of your combined commit. Aside: you can see how co-authoring works with GitHub here.

First, make sure you’re on your dice branch using git checkout dice. We will be performing our squash using the command git rebase in interactive mode by passing the -i flag. This may seem a bit unusual that we can run rebase without a base branch, but rebase can not only sync branches, but also rewrite the history of our current branch.

The format for the command of an interactive rebase is git rebase -i HEAD~# where # is the parent commit of the commit where we want to start our rebase. This will seem a bit unintuitive if you’ve followed how the HEAD~# pointer works: to work on our last three commits (HEAD, HEAD~1, and HEAD~2), we need to actually run git rebase -i HEAD~3. An easier way to remember this is if you want to rebase the last N commits, run git rebase -i HEAD~N. However, we may not always remember how many commits we made so its okay to give yourself a little more room/context by using a larger number.

Let’s run git rebase -i HEAD~5. This will bring up a text editor with a file that we can modify to change the history of commits from HEAD~4 to HEAD. You’ll notice that the commits are in reverse order, starting with the oldest commit at the top and newest commit at the bottom. This is because git will use this text file as part of a script to replay those commits (with your rewritten changes) in chronological order. Your file will have different commit hashes and likely a different commit message on the first line than the example below:

pick 7c5bd5d fixed sus
pick 1f7a1e1 dice rolling WIP
pick 26e0827 Add -s flag for number of sides on a die
pick b193d44 Add dice rolling logic and output dice sum and sequence
pick 554402e Restrict input range for dice iterations and sides

# Rebase 7dac858..554402e onto b193d44 (5 commands)
# ...

As you can see, Git includes useful instructions on what you can do with commits in interactive mode in the commented-out section at the bottom of the file. We can combine, delete, reword, or even reorder our commits, along with many more options. In our case, we want to squash our two most recent commits into the third most recent commit in the list. To do this, replace the word pick with the word squash (or just s for short) on the last two lines. Do not squash the third line; that will take your three most recent commits and meld them with the “dice rolling WIP” commit, which is a commit from master. Your file should look something like this:

pick 7c5bd5d fixed sus
pick 1f7a1e1 dice rolling WIP
pick 26e0827 Add -s flag for number of sides on a die
squash b193d44 Add dice rolling logic and output dice sum and sequence
squash 554402e Restrict input range for dice iterations and sides

# Rebase 7dac858..554402e onto b193d44 (5 commands)
# ...

Save and exit the file. Git will now rewrite the commit history as you specified. If you squash commits or do another action that may modify commit messages, Git will bring up another text editor for you to modify the commit message. Because you squashed, Git made a multi-line commit message by combining the three individual commit messages from before you squashed. Let’s add a useful one-line summary, “Add dice roll feature”, at the top of the file. You can also adjust the formatting to your liking; I like adding bullet points to the individual commit messages.

Add dice roll feature

# This is a combination of 3 commits.
# This is the 1st commit message:

- Add -s flag for number of sides on a die

# This is the commit message #2:

- Add dice rolling logic and output dice sum and sequence

# This is the commit message #3:

- Restrict input range for dice iterations and sides

Git and other tools use the first line of a commit message as a shortened version for certain display purposes. For example, try running git log --oneline. As another example, when Github displays the most recent commit message next to files and folders in the directory view, the first line of the message is shown.

Save the file and exit. The rebase is now complete! Type git log and view your new squashed commit. If you didn’t already merge your three individual commits into master, you can run git merge master to merge in your combined commit.

commit 6b4f705921f9b06af880f97d36cb0b74e51aeb0c (HEAD -> dice)
Author: Max Vogel <max-v@berkeley.edu>
Date:   Sun Apr 3 19:45:51 2022 -0700

    Add dice roll feature

    - Add -s flag for number of sides on a die

    - Add dice rolling logic and output dice sum and sequence

    - Restrict input range for dice iterations and sides

Gradescope: There’s no separate Gradescope check for this optional section, but make sure you go back and complete the Gradescope check above in the previous section if you’ve chosen to do this part first!

Part 2: Pull Requests

One of the most common ways to contribute changes to a repository on Github (and other similar Git remote hosting services) is through a pull request (often shortened to “PR”). A pull request is basically a proposal to make changes to a repository that can be reviewed by others, commented on, and edited before the changes are actually approved to be merged into the repository. For example, when working on a large project with a team, you may want to submit a pull request with your new feature so that team members can review your code for bugs or code style mistakes. For projects on public repositories that invite contributions from strangers, you can contribute a feature by forking the repository (making a copy of it that you own), implementing your feature, and then making a pull request to have your contribution merged into the official repository.

First, read up on how to create a pull request from Github’s documentation. We’ve created a dummy repository at github.com/0xcf/decal-pr-practice for you to practice making a simple PR from a fork. You will be making a fork of the repo, writing your name in the Markdown file in the repo, and then making a PR to merge your change into the original repo.

  1. Read about how to fork a repo from Github’s documentation. Make a fork of our dummy repository at github.com/0xcf/decal-pr-practice.

  2. Clone your forked repository and create a new branch based on master called my-name. Open README.md and replace “Tux the Penguin” with your name.

  3. Stage and commit your change with the commit message Add my name, and then git push.

  4. Read about how to make a PR from a fork from Github’s documentation. Make a PR from your branch my-name on your forked repo <Your Github username>/decal-pr-practice to the branch master on the original repo 0xcf/decal-pr-practice.

You’ve now made a pull request! Github has a lot of features for pull requests, feel free to check some of them out. This lab you’re reading right now was merged through a PR, which you can view at github.com/0xcf/decal-web/pull/216.

  • The main page for the pull request has a timeline of commits and any comments from reviewers. PRs can be continually updated by making and pushing new commits to your compare branch. Github will track any new changes pushed to the compare branch and update your PR. As you can see, I didn’t follow my own advice and messed up a rebase after trying to sync my fork, leaving duplicate commits in my commit history…

  • The sidebar on the main page has several notable features. You can request other people to review your changes, assign others to work on the PR, and link an issue that your PR will resolve if it gets merged. Github repositories have an “Issues” tab where one can submit a report about a bug or request a feature (e.g. github.com/0xcf/decal-web/issues). The Github issue associated with the PR for this lab is at github.com/0xcf/decal-web/issues/184.

  • Go to the “Files Changed” tab to see a diff of the PR’s changes. Reviewers of your PR can leave inline comments on a specific line of code by hovering over the line and clicking the blue “+” button that pops up.

Gradescope: Paste a link to the pull request you made so that it can be checked off for completion.

Part 3: Questions

  1. What caused the merge conflict in the Git exercises you did?

  2. Why does Git require us to manually intervene in merge conflicts?

  3. What command would you use to sync a folder ~/Downloads/Linux_ISOs to the folder /usr/local/share/Calendar, while preserving file metadata? (hint: use rsync)

  4. How does rsync determine when to look for changes between files? Select from the following:

    A. By calculating the checksum of each file and comparing them.

    B. By comparing the entire contents of each file for any differences.

    C. By seeing if the ‘last modified’ timestamp of the files are different.

    D. By seeing if the ‘created’ timestamp of the files are different.

    E. By seeing if the permissions of the files are different.

  5. Consider a file str1 containing the string ‘abcd’. You rsync it to another file str2, and then replace ‘abcd’ with ‘aaaa’ on str1. If you run rsync --append str1 str2, what will the contents of str2 be?