Advanced Lab 3 - Pre-install Setup and Installation

Setup
- Acquire installation media
- Create VM
Partitioning time
Installation

PLEASE READ: Some people have been having issues with the Create Filesystems -> Wiping section of the lab, where executing the commands causes the Arch VM to reboot into the UEFI shell. To avoid encountering this problem, please upgrade packages/reboot Ubuntu on your Digital Ocean droplet if prompted to do so by the motd (Message of the Day; the text you see when you ssh into your DO droplet), but do not upgrade to Ubuntu 20.04 (do not run the command do-release-upgrade). In other words, before starting the lab, run sudo apt update, then sudo apt upgrade, and finally sudo reboot. You will lose connection when your DO droplet reboots, then ssh back in and begin the lab. If you are already in the middle of the lab, you can try to proceed through the Wiping section, but if you run into the issue you will likely need to destroy and start a fresh VM.

In this lab we will be installing Arch Linux into a virtual machine. In this lab you will need to pay close attention to detail: mistakes may not be apparent for quite a while later, and can cause huge headaches. As always, make sure to take advantage of Piazza and other resources if you need help! Also, in this lab, the ArchWiki will be an invaluable resource that we will be referencing a lot.

Throughout the lab, additional notes will be formatted like this. These notes are the only sections of the lab that you can safely skip. They will provide additional interesting information and context, so I recommend reading them anyway!

As always, you will need to submit this lab on Gradescope. Any time you see “GRADESCOPE:” in this lab, stop and make sure that you have submitted the relevant answer or output to Gradescope. In most cases, it will be hard or impossible to go back and get the answer later in the lab.

We chose to install Arch Linux in this lab for a few reasons. First, Arch Linux gives you tools to handle monotonous tasks (like genfstab, which does the dirty work of writing out /etc/fstab) while leaving the more complicated and more customizable tasks to you. Second, Arch is widely used and experience with Arch can usually be applied to other distributions as well. Third, ArchWiki. Seriously. It’s that good.

Setup

We will be setting up virtual machine inside your student VM.

Yes, this is a “nested” virtual machine.

First, connect to your student VM using the credentials that were emailed to you. If you haven’t yet, you will need to reset your password when you first log in. Please use a safe password and don’t lose it!

If you’re enrolled in this DeCal, you should have a student VM. If you’re not, we can’t guarantee that these techniques will work on other systems (for a variety of reasons). However, you can obtain a system that’s very similar to a student VM by creating a Droplet on DigitalOcean. Normally this Droplet would cost some amount of money, but you can get free DigitalOcean credits by using the GitHub Student Developer Pack. You can follow the instructions on the DigitalOcean website to create the Droplet. When selecting an image, choose Ubuntu 18.04.

Now that you are connected to your student VM, we will need to install some packages that will allow us to create our nested Arch Linux VM. Run

sudo apt install --no-install-recommends qemu-kvm qemu-utils libvirt-bin virtinst ovmf

QEMU is a system for running virtual machines, and it can use KVM, which is a kernel module that supports the virtualization under the hood. We can install this with the package qemu-kvm. libvirt (installed via libvirt-bin) is a system for managing these QEMU/KVM virtual machines. qemu-utils provides necessary tools for creating virtual drives, and is used by virt-install. virtinst will provide the script virt-install, which we will use to create the virtual machine. We add --no-install-recommends because virtinst has a recommended dependency virt-viewer, which depends on the X11 system. However, this is a lot of packages that are only useful if we have a GUI (which we don’t!) so we want to skip recommended dependencies. Finally, ovmf is an additional library that will let us use UEFI (instead of the older BIOS spec) to boot the VM.

Acquire installation media

Next, we need to get the installation media. Go to https://mirrors.ocf.berkeley.edu/archlinux/iso/latest/ to find the latest Arch release. Using the wget command within your VM, download the .iso file found within this directory as well as the .iso.sig signature file that we will use to verify the authenticity of the .iso file. For example, the current release at the time of writing is 2020.06.01, so you would run the following commands:

wget 'https://mirrors.ocf.berkeley.edu/archlinux/iso/latest/archlinux-2020.06.01-x86_64.iso'
wget 'https://mirrors.ocf.berkeley.edu/archlinux/iso/latest/archlinux-2020.06.01-x86_64.iso.sig'

Note that Arch is a “rolling release” distribution (as opposed to distros like Ubuntu that release distinct versions e.g. Ubuntu 18.04, Ubuntu 20.04), so it is updated frequently. Therefore, the links used above are likely out of date by the time you’re reading this! Make sure to find the correct files for the latest release, do not just copy and paste the commands above.

You can run ls to see the two files that were downloaded.

Notice that we are downloading this from the OCF software mirrors (mirrors.ocf.berkeley.edu). There’s nothing stopping the OCF from modifying the .iso file and inserting malicious code that will end up running on our machine. To avoid this, we need to verify that the signature we downloaded is valid, and matches the file we downloaded. For more info, see https://wiki.archlinux.org/index.php/Installation_guide#Verify_signature.

To verify the signature, run gpg --keyserver-options auto-key-retrieve --verify file.sig, replacing file.sig with the downloaded .sig file. You’ll see information about who created this signature. We want to know that this signature is from an Arch Linux developer. To be pretty confident of that, go to https://www.archlinux.org/people/developers/ and find the developer who is said to have created this signature. Under their info click on the link that says “PGP Key”. You’ll see a bunch of info about their PGP key. Check that the “Fingerprint” listed near the top of this page matches the “Primary key fingerprint” in the gpg output. If not, something is very wrong!

If the gpg commmand above to verify the signature doesn’t work, try using the OCF’s keyserver by appending this to the original command: --keyserver=hkp://pgp.ocf.berkeley.edu.

GRADESCOPE: Who is the developer that created this signature, and what is their key’s fingerprint?

Note that this test is only resilient if www.archlinux.org is not compromised. To be more confident, you might want to try to find this fingerprint somewhere else, or independently verify this key somehow. Traditionally this is done via the web of trust, but new systems such as Keybase are trying to find alternate solutions to this fundamental problem of “How do I know you are who you say you are?”

Create VM

Now, we need to actually create our VM. (Don’t run this command until you’ve read the next paragraph!) To do this, you will run

sudo virt-install --name archvm --memory 704 --cpu host --vcpus 1 --disk size=5 --network network=default --boot uefi --graphics none --cdrom archlinux-2019.09.01-x86_64.iso

This will do some initial setup, and then drop us into a virtual serial console connected to the VM. You’ll see some boot options and a timer. Press e to edit the first boot option before the timer runs out! By default, Arch will not enable the serial console we are currently using to connect to the VM, so we need to manually add it to the kernel command line. Once we’re editing this command line, hit END to jump to the end of the line and carefully add a space and then console=ttyS0. When you’re confident you’ve typed it right, hit enter to start booting the kernel.

Normally, we use “terminal emulators” (like Terminal or iTerm on Mac, or GNOME Terminal or Terminator on Linux) to access the console. Alternatively, we can use the console via terminals running via a keyboard and display. (To see this, hit ctrl-alt-F1 on an OCF lab computer. Hit ctrl-alt-F7 to get back to the GUI.) The option we are using here is accessing the console via a serial port connection, which in the modern age is pretty rare. However, we’re not using a real serial port of course—since we’re using a virtual machine, the serial port is also virtualized. libvirt provides convenient this convenient tooling to connect to this virtual serial port from the console we already have with the host machine (your student VM).

We’re communicating with the VM via this virtual serial connection. If you wanted to detatch from this, you would hit the key combo ctrl-], and reconnect by running the command sudo virsh console archvm.

In most cases if you disconnect and have to restart in the middle, you can just run sudo virsh console archvm to reconnect. If you really mess up, you can start things over by stopping the VM (sudo virsh destroy archvm), deleting the VM (sudo virsh undefine archvm --nvram --remove-all-storage), and then recreating it by starting over from the command above. If for some reason, your VM gets turned off before you are done installing Arch, it may be possible to reconnect the archiso and reboot the VM using the archiso. However, doing this is out of scope for this lab.

You should start seeing a bunch of lines that start with [ OK ], and eventually you will see a login prompt.

Since the installation media is basically just a preinstalled Linux system that makes it easy to install Linux on another system, this is the info about the booting “archiso” system. So, nothing is actually installed on the VM yet, but we are booting into a temporary Linux system that will help us install Linux permanently on the VM.

Once you see a login prompt, enter root as the username, and you should get in! You’re now at the archiso command line.

GRADESCOPE: Use the hostname command to get the current hostname. What is it? Then, use ip addr to find the current IP address (look at the inet line under ens2, it’s the stuff at the beginning of the line, not including the /24). What is your IP address? In another terminal, try to ping this IP address from your student VM, and then also from tsunami (ssh.ocf.berkeley.edu). Can either of them reach it?

We’ll discuss networking, and why you see these results, in a later lecture.

Notice we are booted using UEFI (hence -boot uefi in the virt-install line). Check out the ArchWiki (https://wiki.archlinux.org/index.php/Installation_guide#Verify_the_boot_mode) for more info. Also, notice that we are already connected to the internet! Our virtual network uses DHCP, and the archiso knows to try to autoconfigure using DHCP. We will talk more about DHCP and other networking configuration later in the course. Finally, notice that we’ve conveniently inherited the system clock, so the time inside the VM should be right. You can run date to see.

Partitioning time

The first order of business is to set up the partitions that we will be using for this installation. We are going to use a simple partition scheme:

an EFI system partition (ESP)
one encrypted partition that holds our root filesystem
a small partition for swap space

This is a very common scheme and is totally a scheme you could use if you were installing Arch on your own computer (except that you’ll hopefully have more than 5GiB to work with)!

GRADESCOPE: Give a brief and basic explanation of the purpose of having (1) the EFI system partition and (2) the swap partition. These explanations can be as short as one sentence, just try to get a general sense of why we have these partitions.

Create partitions

We will use fdisk to create our partitions. Before we start the actual partitioning process, run fdisk -l. You should see that your VM has a single storage device called /dev/sda. Linux has special device files that correspond to each “block device” (e.g. hard drives, SSDs, and partitions on the aformentioned storage devices) discovered by the operating system. These device files live in the /dev folder. Read more about the naming convention for storage devices and their partitions here.

Test your understanding: What would be the name of the device file corresponding to the fourth partition on the second discovered storage device?

Now to start the actual partitioning process, run fdisk /dev/sda.

Note that you’ll be making a bunch of changes to the partition table, but these changes are not made immediately. So, if you make a mistake, you can just start again from step 1 without damaging anything.

Create a GPT table by typing g (then hit enter).
Create your ESP by typing n. Use the default partition number and first sector (just hit enter to automatically take the default). For the last sector, type +512M to allocate 512MiB for the partition.
Type t to change the type of the new partition (partition 1) to “EFI System”. You will have to list the types and find the right ID. (Note: press q to get out of the list of filesystem types).
Create your main root partition by typing n again. Use the default partition number and first sector. To leave 512MiB at the end of the disk for our swap partition, type -512M for the last sector. (This partition should already have the correct type, “Linux filesystem”, so you won’t need to change that.)
Create your swap partition by typing n one last time, and just take all the defaults.
Type t to change the type of the swap partition (partition 3) to “Linux swap”, as in step 3.
Type p to show the current partition table with all the changes you’ve made. You should see an “EFI System” partition that is 512MiB, a “Linux filesystem” partition that is 4GiB, and a “Linux swap” partition that is 512MiB.
If everything looks right, type w to finally actually write the changes to disk.

At this point, fdisk should exit.

GRADESCOPE: To see your final results, run fdisk -l /dev/sda and paste the output into Gradescope.

At this point, you’ve created the disk partitions, but nothing is on them yet. Now we need to put the filesystems you will use on these partitions!

Create filesystems

ESP

The ESP (EFI System Partition) has to be in a FAT format. For more information about the ESP, see https://wiki.archlinux.org/index.php/EFI_system_partition.

Your ESP should be /dev/sda1 (refer back to output from fdisk -l /dev/sda). Run mkfs.fat -F32 /dev/sda1 to create the FAT32 filesystem.

These instructions are straight from https://wiki.archlinux.org/index.php/EFI_system_partition#Format_the_partition.

Root partition

Our root partition should be /dev/sda2.

We are going to encrypt the root partition. This makes things a bit more complicated, but thankfully the ArchWiki will help us out!

Encryption is basically essentially in modern devices. If you do not encrypt your drive, anyone who has access to that drive can simply take it out of your machine and read everything on it. Your account password doesn’t protect against anything. However, if you do use encryption, you will not be able to get anything off the disk at all without the encryption password.

Wiping

Before setting up the encrypted partition, we need to wipe the partition with random data. The instructions below are based on the ArchWiki instructions.

Why wipe your drive? You can tell the difference blank data on a disk and encrypted data, but not between random data and encrypted data. Filling the disk with random data before setting up the encryption protects information such as how full the partition is.

cryptsetup open --type plain -d /dev/urandom /dev/sda2 to_be_wiped
dd if=/dev/zero of=/dev/mapper/to_be_wiped status=progress bs=1M
cryptsetup close to_be_wiped

This generates a random key and pretends that our partition is encrypted with it. So, if we fill the entire encrypted drive with zeros, it will fill the real drive with fake encrypted data (indistinguishable from random data). This might take a sec to do! Don’t worry.

Setting up the encrypted device

Again, this is taken from the ArchWiki. Follow along!

First, set up the encrypted partition. For this lab, use the encryption passphrase ilovelinux.

cryptsetup -y -v luksFormat /dev/sda2

A note on passphrases: don’t use a passphrase like this! A good way to come up with passphrases is just to take 6 random words (and I mean actually random, not like xkcd.com/1210 random). One way to generate such a password is the command shuf -n 6 /usr/share/dict/words (you might have to install a package like wamerican on Ubuntu/Debian or words on Arch). We will talk more about security (including password security) later in the course.

Now that the encrypted partition has been created, let’s open it!

cryptsetup open /dev/sda2 cryptroot

GRADESCOPE: Run lsblk to see the hierarchy of these partitions. Paste this output into Gradescope!

Creating the actual filesystem

We now have a blank pseudo-partition (“block device”) /dev/mapper/cryptroot, that’s actually backed by encrypted /dev/sda2 disk partition. However, there’s no filesystem here yet. Create it by doing

mkfs.ext4 /dev/mapper/cryptroot

GRADESCOPE: What previous command we ran is similar to this command, and what’s the difference? (Bonus: why?)

Swap

Our swap partition should be /dev/sda3. The swap space isn’t really a filesystem, but we still need to set it up. Thankfully, this is pretty straightforward:

mkswap /dev/sda3

Then, enable it with

swapon /dev/sda3

If you don’t know about swap, it is basically a specially designated section of disk that the operating system can use as memory in case it runs out of actual memory. For more info, check out https://wiki.archlinux.org/index.php/Swap.

Mount the new filesystems

We’ve now created all our filesystems, but in order to access them, we need to mount them.

What does mounting mean? The man page for the mount command provides a nice description: “All files accessible in a Unix system are arranged in one big tree, the file hierarchy, rooted at /. These files can be spread out over several devices. The mount command serves to attach the filesystem found on some device to the big file tree. Conversely, the umount(8) command will detach it again.” We are currently in the “file tree” of the Arch live environment, which does not include the new ESP and root filesystems we created for our future Arch installation. Using mount, we can attach these filesystems to our live environment filesystem and access them as if they were directories at the mount point. Once the ESP and root filesystems are visible to the current filesystem, we can start installing a boot loader to the ESP filesystem and our permanent Arch OS to the root filesystem.

We’re going to use /mnt as the temporary root of the new Arch installation. Thus, we mount the root filesystem at /mnt, and the ESP at /mnt/boot (since the ESP will normally be mounted at /boot). Hopefully this will make a little more sense once we begin installing the system.

mount /dev/mapper/cryptroot /mnt
mkdir /mnt/boot
mount /dev/sda1 /mnt/boot

GRADESCOPE:

You can see every single filesystem that’s currently mounted by running the mount command. Run it, and look for the two filesystems we just mounted. Copy all the output over to Gradescope.
Another common Linux partitioning scheme is to have a separate /home partition in addition to the partitions we created above. This can be useful in case something goes catastrophically wrong with your operating system to the point where you need to reinstall it. You can wipe your root filesystem and reinstall your OS while leaving your /home partition untouched, saving your user folder(s) with personal files like documents and photos without needing to restore from a backup. Hypothetically, if we had made a separate /home partition and filesystem in the previous steps, what command(s) would we need to run to mount it? Assume the /home partition is /dev/sda3.

Installation

With our partitions all set up and ready to get going, we can start installing out system onto them!

Much of this will be straight from the ArchWiki installation guide, starting at https://wiki.archlinux.org/index.php/Installation_guide#Installation. It’s a good idea to follow along there as well to see what modifications we are making.

Mirrors

We need to tell Arch which software mirror to use when downloading packages. Go ahead and just put the OCF mirror at the top! Edit /etc/pacman.d/mirrorlist using vim/emacs/nano and put the following line at the top:

Server = https://mirrors.ocf.berkeley.edu/archlinux/$repo/os/$arch

(We’re usually lower down in the file but putting us at the top will make things a bit faster to install.)

Continue according to ArchWiki!

Follow the arch wiki exactly from https://wiki.archlinux.org/index.php/Installation_guide#Install_essential_packages down until you get to the “Initramfs” section. If you get stuck, check out the following hints:

vim is not installed in the chroot by default. You can do pacman -S vim to install it, or just use plain old vi.
Our timezone is America/Los_Angeles.
The locale you want to use is en_US.UTF-8.
You shouldn’t need to change the keyboard layout.
You can pick whatever you want for your hostname. OCF machines are all named after “disasters”, so it can be fun to come up with a creative name that fit that theme.

initramfs

The initramfs is the minimal stuff that’s loaded into RAM that helps Linux get started when you first boot. On Arch, there’s a system to generate this initramfs. You need to tell it to support disk encryption.

Based on the ArchWiki’s instructions, we need to edit /etc/mkinitcpio.conf. You should see a line like

HOOKS=(base udev autodetect ...)

We need to add the keymap and encrypt hooks, and move the keyboard hook earlier in the line. When you’re done, it should look like

HOOKS=(base udev autodetect keyboard keymap modconf block encrypt filesystems fsck)

The order of these hooks matters! Be careful.

Run mkinitcpio -p linux to generate the new initramfs based on the new config file.

Root password

Use passwd to set the root password to itoolovelinux.

As mentioned for the disk encryption passphrase, don’t use a password like this in real life! Choose a strong root password using a method like the one from earlier. User programs shouldn’t be able to compromise this password, because it would allow a single program (which should generally be untrusted) to access everything on your computer.

Boot loader

Historically, GRUB has been the only reasonable choice of boot loader. However, with the advent of UEFI, there are lots of different good options (or my personal favorite, no bootloader at all). In this lab, we are going to use systemd-boot.

We can install systemd-boot to our ESP by running

bootctl --path=/boot install

We now need to tell systemd-boot that our Arch Linux installation exists and how to start it! Create a bootloader entry by creating the file /boot/loader/entries/arch.conf with these contents:

title   Arch Linux
linux   /vmlinuz-linux
initrd  /initramfs-linux.img
options cryptdevice=UUID=290d6a44-2964-48a0-a71e-ea3df0525987:cryptroot root=/dev/mapper/cryptroot rw console=ttyS0

IMPORTANT NOTE: You will need to replace 290d6a44-2964-48a0-a71e-ea3df0525987 with the correct UUID for your encrypted /dev/sda2 partition. To find the correct UUID, run ls -l /dev/disk/by-uuid and look for the line with -> ../../sda2. This line will have the UUID you need. If you make a mistake here, you can mess things up pretty bad! TRIPLE-CHECK this file!

We could use a name like /dev/sda2 instead of UUID=290d6a44-2964-48a0-a71e-ea3df0525987 here. However, these names like /dev/sdXX are not guaranteed to always be the same. A kernel update or hardware changes can mean that /dev/sda2 is now /dev/sdb2, or vice versa. This will mean we won’t be able to boot, and in particularly bad cases, can lead to data loss. For this reason, we should use UUIDs whenever possible, because they cannot change. See https://wiki.archlinux.org/index.php/Persistent_block_device_naming for more info about this.

Also, remember how we had to add console=ttyS0 when we first booted the archiso? We can add that option here to make it permanent for the new system, so the console will always be enabled.

Type exit to exit the chroot and then type reboot to REBOOT AND FINISH THE INSTALLATION! Congrats!

If things don’t work after you reboot, don’t panic! Please make a Piazza post or ask for help on Slack, and we will try to help you recover things.

When it reboots, you will be asked for the disk encryption password, and then you should see a login prompt. Log in as root using the password you set!

GRADESCOPE: To show off your newly installed Arch system, run the following commands and paste the output into Gradescope:

hostname -f
ip addr
mount
lsblk
uname -a

When you are done, you can hit ctrl-] to detach your console from the Arch VM and get back to your normal student VM shell.