Advanced Lab 3 - Pre-install Setup and Installation
Facilitator: Laksith24 min read
- A preliminary message (read this section carefully!)
- Partitioning time
In this lab we will be installing Arch Linux into a virtual machine. In this lab you will need to pay close attention to detail: mistakes may not be apparent for quite a while later, and can cause huge headaches. As always, make sure to take advantage of #decal-general and other resources if you need help! Also, in this lab, the ArchWiki will be an invaluable resource that we will be referencing a lot.
Throughout the lab, additional notes will be formatted like this. These notes are the only sections of the lab that you can safely skip. They will provide additional interesting information and context, so I recommend reading them anyway!
As always, you will need to submit this lab on Gradescope. Any time you see “GRADESCOPE:” in this lab, stop and make sure that you have submitted the relevant answer or output to Gradescope. In most cases, it will be hard or impossible to go back and get the answer later in the lab. If you realize you forgot to submit something, do not worry! Simply explain what you did / what command you ran rather than restarting.
We chose to install Arch Linux in this lab for a few reasons. First, Arch Linux gives you tools to handle monotonous tasks (like
genfstab, which does the dirty work of writing out
/etc/fstab) while leaving the more complicated and more customizable tasks to you. Second, Arch is widely used and experience with Arch can usually be applied to other distributions as well. Third, ArchWiki. Seriously. It’s that good.
Make sure you allocate about 2 hours for this lab (don’t do it last minute!) Trying to rush the process may result in skipping over key details and could require you to restart. That being said though, it ideally shouldn’t take much longer than that. If you’ve had to restart multiple times or have otherwise tried for a long time with no success, contact staff. We don’t want you to get stuck for 6 hours etc.
We will be setting up virtual machine inside your student VM.
Yes, this is a “nested” virtual machine.
First, connect to your student VM using the credentials that were emailed to you. If you haven’t yet, you will need to reset your password when you first log in. Please use a safe password and don’t lose it!
If you’re enrolled in this DeCal, you should have a student VM. If you’re not, we can’t guarantee that these techniques will work on other systems (for a variety of reasons). However, you can obtain a system that’s very similar to a student VM by creating a Droplet on DigitalOcean. Normally this Droplet would cost some amount of money, but you can get free DigitalOcean credits by using the GitHub Student Developer Pack. You can follow the instructions on the DigitalOcean website to create the Droplet. When selecting an image, choose Ubuntu 20.04.
PLEASE READ: Before starting the lab,
sudo apt update, then
sudo apt upgrade, and finally
reboot. You will lose connection when your DO droplet reboots, then
ssh back in and begin the lab.
Now that you are connected to your student VM, we will need to install some packages that will allow us to create our nested Arch Linux VM. Run
sudo apt install --no-install-recommends qemu-kvm qemu-utils libvirt-daemon-system libvirt-clients virtinst ovmf
QEMU is a system for running virtual machines, and it can use KVM, which is a kernel module that supports the virtualization under the hood. We can install this with the package
libvirt-daemon-system and libvirt-clients) is a system for managing these QEMU/KVM virtual machines.
qemu-utilsprovides necessary tools for creating virtual drives, and is used by
virtinstwill provide the script
virt-install, which we will use to create the virtual machine. We add
virtinsthas a recommended dependency
virt-viewer, which depends on the X11 system. However, this is a lot of packages that are only useful if we have a GUI (which we don’t!) so we want to skip recommended dependencies. Finally,
ovmfis an additional library that will let us use UEFI (instead of the older BIOS spec) to boot the VM.
Next, we need to get the installation media. Go to
https://mirrors.ocf.berkeley.edu/archlinux/iso/latest/ to find the latest Arch
release. Using the
wget command within your VM, download the
.iso file found
within this directory as well as the
.iso.sig signature file that we will use
to verify the authenticity of the
.iso file. For example, the current release
at the time of writing is
2021.09.01, so you would run the following commands:
Currently the OCF mirrors are under maintenance, and will lead to really slow download speeds, so we will use another mirror.
wget 'http://ca.us.mirror.archlinux-br.org/iso/2021.09.01/archlinux-2021.09.01-x86_64.iso' wget 'http://ca.us.mirror.archlinux-br.org/iso/2021.09.01/archlinux-2021.09.01-x86_64.iso.sig'
Note that Arch is a “rolling release” distribution (as opposed to distros like Ubuntu that release distinct versions e.g. Ubuntu 18.04, Ubuntu 20.04), so it is updated frequently. Therefore, the links used above are likely out of date by the time you’re reading this! Make sure to find the correct files for the latest release, do not just copy and paste the commands above.
You can run
ls to see the two files that were downloaded.
Notice that we are downloading this from the OCF software mirrors
mirrors.ocf.berkeley.edu). There’s nothing stopping the OCF from
.iso file and inserting malicious code that will end
up running on our machine. To avoid this, we need to verify that the
signature we downloaded is valid, and matches the file we downloaded.
For more info, see
To verify the signature, run
auto-key-retrieve --verify file.sig, replacing
file.sig with the
.sig file. You’ll see information about who created this
signature. We want to know that this signature is from an Arch Linux
developer. To be pretty confident of that, go to
and find the developer who is said to have created this
signature. Under their info click on the link that says “PGP
Key”. You’ll see a bunch of info about their PGP key. Check that the
“Fingerprint” listed near the top of this page matches the “Primary
key fingerprint” in the
gpg output. If not, something is very
gpgcommmand above to verify the signature doesn’t work, try using the OCF’s keyserver by appending this to the original command:
GRADESCOPE: Who is the developer that created this signature, and what is their key’s fingerprint?
Note that this test is only resilient if www.archlinux.org is not compromised. To be more confident, you might want to try to find this fingerprint somewhere else, or independently verify this key somehow. Traditionally this is done via the web of trust, but new systems such as Keybase are trying to find alternate solutions to this fundamental problem of “How do I know you are who you say you are?”
Now, we need to actually create our VM. (Don’t run this command until you’ve read the next paragraph!) To do this, you will run
sudo virt-install --name archvm --memory 575 --cpu host --vcpus 1 --disk size=5 --network network=default --boot uefi --graphics none --cdrom archlinux-2021.09.01-x86_64.iso
This will do some initial setup, and then drop us into a virtual
serial console connected to the VM. You’ll see some boot options and a
e to edit the first boot option before the timer runs
out! By default, Arch will not enable the serial console we are
currently using to connect to the VM, so we need to manually add it to
the kernel command line. Once we’re editing this command line, hit
END to jump to the end of the line and carefully add a space and
console=ttyS0. When you’re confident you’ve typed it right, hit
enter to start booting the kernel.
Normally, we use “terminal emulators” (like Terminal or iTerm on Mac, or GNOME Terminal or Terminator on Linux) to access the console. Alternatively, we can use the console via terminals running via a keyboard and display. (To see this, hit
ctrl-alt-F1on an OCF lab computer. Hit
ctrl-alt-F7to get back to the GUI.) The option we are using here is accessing the console via a serial port connection, which in the modern age is pretty rare. However, we’re not using a real serial port of course—since we’re using a virtual machine, the serial port is also virtualized. libvirt provides convenient this convenient tooling to connect to this virtual serial port from the console we already have with the host machine (your student VM).
We’re communicating with the VM via this virtual serial connection. If
you wanted to detatch from this, you would hit the key combo
and reconnect by running the command
sudo virsh console archvm.
In most cases if you disconnect and have to restart in the middle, you can just run
sudo virsh console archvmto reconnect. If you really mess up, you can start things over by stopping the VM (
sudo virsh destroy archvm), deleting the VM (
sudo virsh undefine archvm --nvram --remove-all-storage), and then recreating it by starting over from the command above. If for some reason, your VM gets turned off before you are done installing Arch, it may be possible to reconnect the archiso and reboot the VM using the archiso. However, doing this is out of scope for this lab.
You should start seeing a bunch of lines that start with
[ OK ],
and eventually you will see a login prompt.
Since the installation media is basically just a preinstalled Linux system that makes it easy to install Linux on another system, this is the info about the booting “archiso” system. So, nothing is actually installed on the VM yet, but we are booting into a temporary Linux system that will help us install Linux permanently on the VM.
Once you see a login prompt, enter
root as the username, and you
should get in! You’re now at the archiso command line.
Debugging Note: If you are thrown out of the VM unexpectedly and
were unable to complete the install, when rebooting you may be booted
into the UEFI Shell (since Arch hasn’t actually been installed on the
disk yet). To resolve this, follow these instructions
to boot back into the installer instead, replacing the
/isos/systemresucecd-x86-2.2.0.iso path with the location of your Arch ISO.
GRADESCOPE: Use the
hostnamectl command to get the current
hostname. What is it? Then, use
ip addr to find the current IP
address (look at the
inet line under
ens2, it’s the stuff at the
beginning of the line, not including the
/24). What is your IP
address? In another terminal, try to ping this IP address from your
student VM, and then also from tsunami (ssh.ocf.berkeley.edu). Can
either of them reach it?
We’ll discuss networking, and why you see these results, in a later lecture.
Notice we are booted using UEFI (hence
-boot uefiin the virt-install line). Check out the ArchWiki (https://wiki.archlinux.org/index.php/Installation_guide#Verify_the_boot_mode) for more info. Also, notice that we are already connected to the internet! Our virtual network uses DHCP, and the archiso knows to try to autoconfigure using DHCP. We will talk more about DHCP and other networking configuration later in the course. Finally, notice that we’ve conveniently inherited the system clock, so the time inside the VM should be right. You can run
The first order of business is to set up the partitions that we will be using for this installation. We are going to use a simple partition scheme:
- an EFI system partition (ESP)
- one encrypted partition that holds our root filesystem
- a small partition for swap space
This is a very common scheme and is totally a scheme you could use if you were installing Arch on your own computer (except that you’ll hopefully have more than 5GiB to work with)!
GRADESCOPE: Give a brief and basic explanation of the purpose of having (1) the EFI system partition and (2) the swap partition. These explanations can be as short as one sentence, just try to get a general sense of why we have these partitions.
We will use
fdisk to create our
partitions. Before we start the actual partitioning process, run
fdisk -l. You
should see that your VM has a single storage device called
/dev/sda. Linux has
special device files that
correspond to each “block device” (e.g. hard drives, SSDs, and partitions on the
aformentioned storage devices) discovered by the operating system. These device
files live in the
/dev folder. Read more about the naming convention for
storage devices and their partitions
Test your understanding: What would be the name of the device file corresponding to the fourth partition on the second discovered storage device?
Now to start the actual partitioning process, run
Note that you’ll be making a bunch of changes to the partition table, but these changes are not made immediately. So, if you make a mistake, you can just start again from step 1 without damaging anything.
- Create a GPT table by typing
g(then hit enter).
- Create your ESP by typing
n. Use the default partition number and first sector (just hit enter to automatically take the default). For the last sector, type
+512Mto allocate 512MiB for the partition.
tto change the type of the new partition (partition 1) to “EFI System”. You will have to list the types and find the right ID. (Note: press
qto get out of the list of filesystem types).
- Create your main root partition by typing
nagain. Use the default partition number and first sector. To leave 512MiB at the end of the disk for our swap partition, type
-512Mfor the last sector. (This partition should already have the correct type, “Linux filesystem”, so you won’t need to change that.)
- Create your swap partition by typing
none last time, and just take all the defaults.
tto change the type of the swap partition (partition 3) to “Linux swap”, as in step 3.
pto show the current partition table with all the changes you’ve made. You should see an “EFI System” partition that is 512MiB, a “Linux filesystem” partition that is 4GiB, and a “Linux swap” partition that is 512MiB.
- If everything looks right, type
wto finally actually write the changes to disk.
At this point,
fdisk should exit.
GRADESCOPE: To see your final results, run
fdisk -l /dev/sda and
paste the output into Gradescope.
At this point, you’ve created the disk partitions, but nothing is on them yet. Now we need to put the filesystems you will use on these partitions!
The ESP (EFI System Partition) has to be in a FAT format. For more information about the ESP, see https://wiki.archlinux.org/index.php/EFI_system_partition.
Your ESP should be
/dev/sda1 (refer back to output from
fdisk -l /dev/sda). Run
mkfs.fat -F32 /dev/sda1 to create the
These instructions are straight from https://wiki.archlinux.org/index.php/EFI_system_partition#Format_the_partition.
Our root partition should be
We are going to encrypt the root partition. This makes things a bit more complicated, but thankfully the ArchWiki will help us out!
Encryption is basically essentially in modern devices. If you do not encrypt your drive, anyone who has access to that drive can simply take it out of your machine and read everything on it. Your account password doesn’t protect against anything. However, if you do use encryption, you will not be able to get anything off the disk at all without the encryption password.
A previous version of this lab mentioned that you should wipe your device before encrypting.
While this may provide some benefits on older (spinning-disk) hard drives, it will significantly reduce the lifetime of modern solid-state drives. As DigitalOcean VMs, probably your laptop, and almost all modern computers use solid state drives, this section has been removed.
If you’d like to see the older version of this lab, see here.
Again, this is taken from the ArchWiki. Follow along!
First, set up the encrypted partition. For this lab, use the
cryptsetup -y -v luksFormat /dev/sda2
A note on passphrases: don’t use a passphrase like this! A good way to come up with passphrases is just to take 6 random words (and I mean actually random, not like xkcd.com/1210 random). One way to generate such a password is the command
shuf -n 6 /usr/share/dict/words(you might have to install a package like
wamericanon Ubuntu/Debian or
wordson Arch). We will talk more about security (including password security) later in the course.
Now that the encrypted partition has been created, let’s open it!
cryptsetup open /dev/sda2 cryptroot
lsblk to see the hierarchy of these partitions.
Paste this output into Gradescope!
We now have a blank pseudo-partition (“block device”)
/dev/mapper/cryptroot, that’s actually backed by encrypted
/dev/sda2 disk partition. However, there’s no filesystem here
yet. Create it by doing
GRADESCOPE: What previous command we ran is similar to this command, and what’s the difference? (Bonus: why?)
Our swap partition should be
/dev/sda3. The swap space isn’t really
a filesystem, but we still need to set it up. Thankfully, this is
Then, enable it with
If you don’t know about swap, it is basically a specially designated section of disk that the operating system can use as memory in case it runs out of actual memory. For more info, check out https://wiki.archlinux.org/index.php/Swap.
We’ve now created all our filesystems, but in order to access them, we need to mount them.
What does mounting mean? The
manpage for the
mountcommand provides a nice description: “All files accessible in a Unix system are arranged in one big tree, the file hierarchy, rooted at /. These files can be spread out over several devices. The mount command serves to attach the filesystem found on some device to the big file tree. Conversely, the umount(8) command will detach it again.” We are currently in the “file tree” of the Arch live environment, which does not include the new ESP and root filesystems we created for our future Arch installation. Using
mount, we can attach these filesystems to our live environment filesystem and access them as if they were directories at the mount point. Once the ESP and root filesystems are visible to the current filesystem, we can start installing a boot loader to the ESP filesystem and our permanent Arch OS to the root filesystem.
We’re going to use
/mnt as the temporary root of the new Arch installation.
Thus, we mount the root filesystem at
/mnt, and the ESP at
the ESP will normally be mounted at
/boot). Hopefully this will make a little
more sense once we begin installing the system.
mount /dev/mapper/cryptroot /mnt mkdir /mnt/boot mount /dev/sda1 /mnt/boot
You can see every single filesystem that’s currently mounted by running the
mountcommand. Run it, and look for the two filesystems we just mounted. Copy all the output over to Gradescope.
Another common Linux partitioning scheme is to have a separate
/homepartition in addition to the partitions we created above. This can be useful in case something goes catastrophically wrong with your operating system to the point where you need to reinstall it. You can wipe your root filesystem and reinstall your OS while leaving your
/homepartition untouched, saving your user folder(s) with personal files like documents and photos without needing to restore from a backup. Hypothetically, if we had made a separate
/homepartition and filesystem in the previous steps, what command(s) would we need to run to mount it? Assume the
With our partitions all set up and ready to get going, we can start installing out system onto them!
Much of this will be straight from the ArchWiki installation guide, starting at https://wiki.archlinux.org/index.php/Installation_guide#Installation. It’s a good idea to follow along there as well to see what modifications we are making.
Please skip this step as the OCF mirrors are currently down and doing this would make the download speeds really slow.
We need to tell Arch which software mirror to use when downloading
packages. Go ahead and just put the OCF mirror at the top! Edit
nano and put the
following line at the top:
Server = https://mirrors.ocf.berkeley.edu/archlinux/$repo/os/$arch
(We’re usually lower down in the file but putting us at the top will make things a bit faster to install.)
Follow the arch wiki exactly from https://wiki.archlinux.org/index.php/Installation_guide#Install_essential_packages down until you get to the “Initramfs” section. If you get stuck, check out the following hints:
vimis not installed in the chroot by default. You can do
pacman -S vimto install it, or just use plain old
- Our timezone is
- The locale you want to use is
- You shouldn’t need to change the keyboard layout.
- You can pick whatever you want for your hostname. OCF machines are all named after “disasters”, so it can be fun to come up with a creative name that fit that theme.
The initramfs is the minimal stuff that’s loaded into RAM that helps Linux get started when you first boot. On Arch, there’s a system to generate this initramfs. You need to tell it to support disk encryption.
Based on the ArchWiki’s
we need to edit
/etc/mkinitcpio.conf. You should see a line like
HOOKS=(base udev autodetect ...)
We need to add the
encrypt hooks, and move the
keyboard hook earlier in the line. When you’re done, it should look
HOOKS=(base udev autodetect keyboard keymap modconf block encrypt filesystems fsck)
The order of these hooks matters! Be careful.
mkinitcpio -p linux to generate the new initramfs based on the
new config file.
passwd to set the root password to
As mentioned for the disk encryption passphrase, don’t use a password like this in real life! Choose a strong root password using a method like the one from earlier. User programs shouldn’t be able to compromise this password, because it would allow a single program (which should generally be untrusted) to access everything on your computer.
Historically, GRUB has been the only reasonable choice of boot loader. However, with the advent of UEFI, there are lots of different good options (or my personal favorite, no bootloader at all). In this lab, we are going to use systemd-boot.
We can install systemd-boot to our ESP by running
bootctl --path=/boot install
We now need to tell systemd-boot that our Arch Linux installation
exists and how to start it! Create a bootloader entry by creating the file
/boot/loader/entries/arch.conf with these contents:
title Arch Linux linux /vmlinuz-linux initrd /initramfs-linux.img options cryptdevice=UUID=290d6a44-2964-48a0-a71e-ea3df0525987:cryptroot root=/dev/mapper/cryptroot rw console=ttyS0
IMPORTANT NOTE: You will need to replace
290d6a44-2964-48a0-a71e-ea3df0525987 with the correct UUID for your
/dev/sda2 partition. To find the correct UUID, run
/dev/disk/by-uuid and look for the line with
-> ../../sda2. This
line will have the UUID you need. If you make a mistake here, you
can mess things up pretty bad! TRIPLE-CHECK this file!
We could use a name like
UUID=290d6a44-2964-48a0-a71e-ea3df0525987here. However, these names like
/dev/sdXXare not guaranteed to always be the same. A kernel update or hardware changes can mean that
/dev/sdb2, or vice versa. This will mean we won’t be able to boot, and in particularly bad cases, can lead to data loss. For this reason, we should use UUIDs whenever possible, because they cannot change. See https://wiki.archlinux.org/index.php/Persistent_block_device_naming for more info about this.
Also, remember how we had to add
console=ttyS0when we first booted the archiso? We can add that option here to make it permanent for the new system, so the console will always be enabled.
exit to exit the chroot and then type
reboot to REBOOT AND
FINISH THE INSTALLATION! Congrats!
If things don’t work after you reboot, don’t panic! Please make a Ask for help on #decal-general, and we will try to help you recover things.
When it reboots, you will be asked for the disk encryption password, and then you should see a login prompt. Log in as root using the password you set!
GRADESCOPE: To show off your newly installed Arch system, run the following commands and paste the output into Gradescope:
hostname -f ip addr mount lsblk uname -a
Debugging Note: If the
hostname command is not found, you may need to install it by running
sudo pacman -S inetutils. Doing so may also reveal that your internet connection is not yet configured. If this is the case, then you will likely need to
sudo systemctl enable systemd-resolved and/or
sudo systemctl enable systemd-networkd and/or
sudo systemctl enable dhcpcd and/or
ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf . You can see more information about this in the ArchWiki: https://wiki.archlinux.org/index.php/systemd-networkd. If these instructions do not work, you may also need to reboot into the Arch installer and use pacstrap to install dhcpcd.
When you are done, you can hit
ctrl-] to detach your console from
the Arch VM and get back to your normal student VM shell.