Advanced Lab 3 - Pre-install Setup and Installation
Facilitator: Laksith
24 min readTable of contents
A preliminary message (read this section carefully!)
In this lab we will be installing Arch Linux into a virtual machine. In this lab you will need to pay close attention to detail: mistakes may not be apparent for quite a while later, and can cause huge headaches. As always, make sure to take advantage of #decal-general and other resources if you need help! Also, in this lab, the ArchWiki will be an invaluable resource that we will be referencing a lot.
Throughout the lab, additional notes will be formatted like this. These notes are the only sections of the lab that you can safely skip. They will provide additional interesting information and context, so I recommend reading them anyway!
As always, you will need to submit this lab on Gradescope. Any time you see “GRADESCOPE:” in this lab, stop and make sure that you have submitted the relevant answer or output to Gradescope. In most cases, it will be hard or impossible to go back and get the answer later in the lab. If you realize you forgot to submit something, do not worry! Simply explain what you did / what command you ran rather than restarting.
We chose to install Arch Linux in this lab for a few reasons. First, Arch Linux gives you tools to handle monotonous tasks (like
genfstab
, which does the dirty work of writing out/etc/fstab
) while leaving the more complicated and more customizable tasks to you. Second, Arch is widely used and experience with Arch can usually be applied to other distributions as well. Third, ArchWiki. Seriously. It’s that good.
Make sure you allocate about 2 hours for this lab (don’t do it last minute!) Trying to rush the process may result in skipping over key details and could require you to restart. That being said though, it ideally shouldn’t take much longer than that. If you’ve had to restart multiple times or have otherwise tried for a long time with no success, contact staff. We don’t want you to get stuck for 6 hours etc.
Setup
We will be setting up virtual machine inside your student VM.
Yes, this is a “nested” virtual machine.
First, connect to your student VM using the credentials that were emailed to you. If you haven’t yet, you will need to reset your password when you first log in. Please use a safe password and don’t lose it!
If you’re enrolled in this DeCal, you should have a student VM. If you’re not, we can’t guarantee that these techniques will work on other systems (for a variety of reasons). However, you can obtain a system that’s very similar to a student VM by creating a Droplet on DigitalOcean. Normally this Droplet would cost some amount of money, but you can get free DigitalOcean credits by using the GitHub Student Developer Pack. You can follow the instructions on the DigitalOcean website to create the Droplet. When selecting an image, choose Ubuntu 20.04.
PLEASE READ: Before starting the lab,
run sudo apt update
, then sudo apt upgrade
, and finally sudo
reboot
. You will lose connection when your DO droplet reboots, then
ssh
back in and begin the lab.
Now that you are connected to your student VM, we will need to install some packages that will allow us to create our nested Arch Linux VM. Run
sudo apt install --no-install-recommends qemu-kvm qemu-utils libvirt-daemon-system libvirt-clients virtinst ovmf
QEMU is a system for running virtual machines, and it can use KVM, which is a kernel module that supports the virtualization under the hood. We can install this with the package
qemu-kvm
.libvirt
(installed vialibvirt-daemon-system and libvirt-clients
) is a system for managing these QEMU/KVM virtual machines.qemu-utils
provides necessary tools for creating virtual drives, and is used byvirt-install
.virtinst
will provide the scriptvirt-install
, which we will use to create the virtual machine. We add--no-install-recommends
becausevirtinst
has a recommended dependencyvirt-viewer
, which depends on the X11 system. However, this is a lot of packages that are only useful if we have a GUI (which we don’t!) so we want to skip recommended dependencies. Finally,ovmf
is an additional library that will let us use UEFI (instead of the older BIOS spec) to boot the VM.
Acquire installation media
Next, we need to get the installation media. Go to
https://mirrors.ocf.berkeley.edu/archlinux/iso/latest/ to find the latest Arch
release. Using the wget
command within your VM, download the .iso
file found
within this directory as well as the .iso.sig
signature file that we will use
to verify the authenticity of the .iso
file. For example, the current release
at the time of writing is 2021.09.01
, so you would run the following commands:
Currently the OCF mirrors are under maintenance, and will lead to really slow download speeds, so we will use another mirror.
wget 'http://ca.us.mirror.archlinux-br.org/iso/2021.09.01/archlinux-2021.09.01-x86_64.iso'
wget 'http://ca.us.mirror.archlinux-br.org/iso/2021.09.01/archlinux-2021.09.01-x86_64.iso.sig'
Note that Arch is a “rolling release” distribution (as opposed to distros like Ubuntu that release distinct versions e.g. Ubuntu 18.04, Ubuntu 20.04), so it is updated frequently. Therefore, the links used above are likely out of date by the time you’re reading this! Make sure to find the correct files for the latest release, do not just copy and paste the commands above.
You can run ls
to see the two files that were downloaded.
Notice that we are downloading this from the OCF software mirrors
(mirrors.ocf.berkeley.edu
). There’s nothing stopping the OCF from
modifying the .iso
file and inserting malicious code that will end
up running on our machine. To avoid this, we need to verify that the
signature we downloaded is valid, and matches the file we downloaded.
For more info, see
https://wiki.archlinux.org/index.php/Installation_guide#Verify_signature.
To verify the signature, run gpg --keyserver-options
auto-key-retrieve --verify file.sig
, replacing file.sig
with the
downloaded .sig
file. You’ll see information about who created this
signature. We want to know that this signature is from an Arch Linux
developer. To be pretty confident of that, go to
https://www.archlinux.org/people/developers/
and find the developer who is said to have created this
signature. Under their info click on the link that says “PGP
Key”. You’ll see a bunch of info about their PGP key. Check that the
“Fingerprint” listed near the top of this page matches the “Primary
key fingerprint” in the gpg
output. If not, something is very
wrong!
If the
gpg
commmand above to verify the signature doesn’t work, try using the OCF’s keyserver by appending this to the original command:--keyserver=hkp://pgp.ocf.berkeley.edu
.
GRADESCOPE: Who is the developer that created this signature, and what is their key’s fingerprint?
Note that this test is only resilient if www.archlinux.org is not compromised. To be more confident, you might want to try to find this fingerprint somewhere else, or independently verify this key somehow. Traditionally this is done via the web of trust, but new systems such as Keybase are trying to find alternate solutions to this fundamental problem of “How do I know you are who you say you are?”
Create VM
Now, we need to actually create our VM. (Don’t run this command until you’ve read the next paragraph!) To do this, you will run
sudo virt-install --name archvm --memory 575 --cpu host --vcpus 1 --disk size=5 --network network=default --boot uefi --graphics none --cdrom archlinux-2021.09.01-x86_64.iso
This will do some initial setup, and then drop us into a virtual
serial console connected to the VM. You’ll see some boot options and a
timer. Press e
to edit the first boot option before the timer runs
out! By default, Arch will not enable the serial console we are
currently using to connect to the VM, so we need to manually add it to
the kernel command line. Once we’re editing this command line, hit
END
to jump to the end of the line and carefully add a space and
then console=ttyS0
. When you’re confident you’ve typed it right, hit
enter to start booting the kernel.
Normally, we use “terminal emulators” (like Terminal or iTerm on Mac, or GNOME Terminal or Terminator on Linux) to access the console. Alternatively, we can use the console via terminals running via a keyboard and display. (To see this, hit
ctrl-alt-F1
on an OCF lab computer. Hitctrl-alt-F7
to get back to the GUI.) The option we are using here is accessing the console via a serial port connection, which in the modern age is pretty rare. However, we’re not using a real serial port of course—since we’re using a virtual machine, the serial port is also virtualized. libvirt provides convenient this convenient tooling to connect to this virtual serial port from the console we already have with the host machine (your student VM).
We’re communicating with the VM via this virtual serial connection. If
you wanted to detatch from this, you would hit the key combo ctrl-]
,
and reconnect by running the command sudo virsh console archvm
.
In most cases if you disconnect and have to restart in the middle, you can just run
sudo virsh console archvm
to reconnect. If you really mess up, you can start things over by stopping the VM (sudo virsh destroy archvm
), deleting the VM (sudo virsh undefine archvm --nvram --remove-all-storage
), and then recreating it by starting over from the command above. If for some reason, your VM gets turned off before you are done installing Arch, it may be possible to reconnect the archiso and reboot the VM using the archiso. However, doing this is out of scope for this lab.
You should start seeing a bunch of lines that start with [ OK ]
,
and eventually you will see a login prompt.
Since the installation media is basically just a preinstalled Linux system that makes it easy to install Linux on another system, this is the info about the booting “archiso” system. So, nothing is actually installed on the VM yet, but we are booting into a temporary Linux system that will help us install Linux permanently on the VM.
Once you see a login prompt, enter root
as the username, and you
should get in! You’re now at the archiso command line.
Debugging Note: If you are thrown out of the VM unexpectedly and
were unable to complete the install, when rebooting you may be booted
into the UEFI Shell (since Arch hasn’t actually been installed on the
disk yet). To resolve this, follow these instructions
to boot back into the installer instead, replacing the
/isos/systemresucecd-x86-2.2.0.iso
path with the location of your Arch ISO.
GRADESCOPE: Use the hostnamectl
command to get the current
hostname. What is it? Then, use ip addr
to find the current IP
address (look at the inet
line under ens2
, it’s the stuff at the
beginning of the line, not including the /24
). What is your IP
address? In another terminal, try to ping this IP address from your
student VM, and then also from tsunami (ssh.ocf.berkeley.edu). Can
either of them reach it?
We’ll discuss networking, and why you see these results, in a later lecture.
Notice we are booted using UEFI (hence
-boot uefi
in the virt-install line). Check out the ArchWiki (https://wiki.archlinux.org/index.php/Installation_guide#Verify_the_boot_mode) for more info. Also, notice that we are already connected to the internet! Our virtual network uses DHCP, and the archiso knows to try to autoconfigure using DHCP. We will talk more about DHCP and other networking configuration later in the course. Finally, notice that we’ve conveniently inherited the system clock, so the time inside the VM should be right. You can rundate
to see.
Partitioning time
The first order of business is to set up the partitions that we will be using for this installation. We are going to use a simple partition scheme:
- an EFI system partition (ESP)
- one encrypted partition that holds our root filesystem
- a small partition for swap space
This is a very common scheme and is totally a scheme you could use if you were installing Arch on your own computer (except that you’ll hopefully have more than 5GiB to work with)!
GRADESCOPE: Give a brief and basic explanation of the purpose of having (1) the EFI system partition and (2) the swap partition. These explanations can be as short as one sentence, just try to get a general sense of why we have these partitions.
Create partitions
We will use fdisk
to create our
partitions. Before we start the actual partitioning process, run fdisk -l
. You
should see that your VM has a single storage device called /dev/sda
. Linux has
special device files that
correspond to each “block device” (e.g. hard drives, SSDs, and partitions on the
aformentioned storage devices) discovered by the operating system. These device
files live in the /dev
folder. Read more about the naming convention for
storage devices and their partitions
here.
Test your understanding: What would be the name of the device file corresponding to the fourth partition on the second discovered storage device?
Now to start the actual partitioning process, run fdisk /dev/sda
.
Note that you’ll be making a bunch of changes to the partition table, but these changes are not made immediately. So, if you make a mistake, you can just start again from step 1 without damaging anything.
- Create a GPT table by typing
g
(then hit enter). - Create your ESP by typing
n
. Use the default partition number and first sector (just hit enter to automatically take the default). For the last sector, type+512M
to allocate 512MiB for the partition. - Type
t
to change the type of the new partition (partition 1) to “EFI System”. You will have to list the types and find the right ID. (Note: pressq
to get out of the list of filesystem types). - Create your main root partition by typing
n
again. Use the default partition number and first sector. To leave 512MiB at the end of the disk for our swap partition, type-512M
for the last sector. (This partition should already have the correct type, “Linux filesystem”, so you won’t need to change that.) - Create your swap partition by typing
n
one last time, and just take all the defaults. - Type
t
to change the type of the swap partition (partition 3) to “Linux swap”, as in step 3. - Type
p
to show the current partition table with all the changes you’ve made. You should see an “EFI System” partition that is 512MiB, a “Linux filesystem” partition that is 4GiB, and a “Linux swap” partition that is 512MiB. - If everything looks right, type
w
to finally actually write the changes to disk.
At this point, fdisk
should exit.
GRADESCOPE: To see your final results, run fdisk -l /dev/sda
and
paste the output into Gradescope.
At this point, you’ve created the disk partitions, but nothing is on them yet. Now we need to put the filesystems you will use on these partitions!
Create filesystems
ESP
The ESP (EFI System Partition) has to be in a FAT format. For more information about the ESP, see https://wiki.archlinux.org/index.php/EFI_system_partition.
Your ESP should be /dev/sda1
(refer back to output from
fdisk -l /dev/sda
). Run mkfs.fat -F32 /dev/sda1
to create the
FAT32 filesystem.
These instructions are straight from https://wiki.archlinux.org/index.php/EFI_system_partition#Format_the_partition.
Root partition
Our root partition should be /dev/sda2
.
We are going to encrypt the root partition. This makes things a bit more complicated, but thankfully the ArchWiki will help us out!
Encryption is basically essentially in modern devices. If you do not encrypt your drive, anyone who has access to that drive can simply take it out of your machine and read everything on it. Your account password doesn’t protect against anything. However, if you do use encryption, you will not be able to get anything off the disk at all without the encryption password.
Wiping
A previous version of this lab mentioned that you should wipe your device before encrypting.
While this may provide some benefits on older (spinning-disk) hard drives, it will significantly reduce the lifetime of modern solid-state drives. As DigitalOcean VMs, probably your laptop, and almost all modern computers use solid state drives, this section has been removed.
If you’d like to see the older version of this lab, see here.
Setting up the encrypted device
Again, this is taken from the ArchWiki. Follow along!
First, set up the encrypted partition. For this lab, use the
encryption passphrase ilovelinux
.
cryptsetup -y -v luksFormat /dev/sda2
A note on passphrases: don’t use a passphrase like this! A good way to come up with passphrases is just to take 6 random words (and I mean actually random, not like xkcd.com/1210 random). One way to generate such a password is the command
shuf -n 6 /usr/share/dict/words
(you might have to install a package likewamerican
on Ubuntu/Debian orwords
on Arch). We will talk more about security (including password security) later in the course.
Now that the encrypted partition has been created, let’s open it!
cryptsetup open /dev/sda2 cryptroot
GRADESCOPE: Run lsblk
to see the hierarchy of these partitions.
Paste this output into Gradescope!
Creating the actual filesystem
We now have a blank pseudo-partition (“block device”)
/dev/mapper/cryptroot
, that’s actually backed by encrypted
/dev/sda2
disk partition. However, there’s no filesystem here
yet. Create it by doing
mkfs.ext4 /dev/mapper/cryptroot
GRADESCOPE: What previous command we ran is similar to this command, and what’s the difference? (Bonus: why?)
Swap
Our swap partition should be /dev/sda3
. The swap space isn’t really
a filesystem, but we still need to set it up. Thankfully, this is
pretty straightforward:
mkswap /dev/sda3
Then, enable it with
swapon /dev/sda3
If you don’t know about swap, it is basically a specially designated section of disk that the operating system can use as memory in case it runs out of actual memory. For more info, check out https://wiki.archlinux.org/index.php/Swap.
Mount the new filesystems
We’ve now created all our filesystems, but in order to access them, we need to mount them.
What does mounting mean? The
man
page for themount
command provides a nice description: “All files accessible in a Unix system are arranged in one big tree, the file hierarchy, rooted at /. These files can be spread out over several devices. The mount command serves to attach the filesystem found on some device to the big file tree. Conversely, the umount(8) command will detach it again.” We are currently in the “file tree” of the Arch live environment, which does not include the new ESP and root filesystems we created for our future Arch installation. Usingmount
, we can attach these filesystems to our live environment filesystem and access them as if they were directories at the mount point. Once the ESP and root filesystems are visible to the current filesystem, we can start installing a boot loader to the ESP filesystem and our permanent Arch OS to the root filesystem.
We’re going to use /mnt
as the temporary root of the new Arch installation.
Thus, we mount the root filesystem at /mnt
, and the ESP at /mnt/boot
(since
the ESP will normally be mounted at /boot
). Hopefully this will make a little
more sense once we begin installing the system.
mount /dev/mapper/cryptroot /mnt
mkdir /mnt/boot
mount /dev/sda1 /mnt/boot
GRADESCOPE:
-
You can see every single filesystem that’s currently mounted by running the
mount
command. Run it, and look for the two filesystems we just mounted. Copy all the output over to Gradescope. -
Another common Linux partitioning scheme is to have a separate
/home
partition in addition to the partitions we created above. This can be useful in case something goes catastrophically wrong with your operating system to the point where you need to reinstall it. You can wipe your root filesystem and reinstall your OS while leaving your/home
partition untouched, saving your user folder(s) with personal files like documents and photos without needing to restore from a backup. Hypothetically, if we had made a separate/home
partition and filesystem in the previous steps, what command(s) would we need to run to mount it? Assume the/home
partition is/dev/sda3
.
Installation
With our partitions all set up and ready to get going, we can start installing out system onto them!
Much of this will be straight from the ArchWiki installation guide, starting at https://wiki.archlinux.org/index.php/Installation_guide#Installation. It’s a good idea to follow along there as well to see what modifications we are making.
Mirrors
Please skip this step as the OCF mirrors are currently down and doing this would make the download speeds really slow.
We need to tell Arch which software mirror to use when downloading
packages. Go ahead and just put the OCF mirror at the top! Edit
/etc/pacman.d/mirrorlist
using vim
/emacs
/nano
and put the
following line at the top:
Server = https://mirrors.ocf.berkeley.edu/archlinux/$repo/os/$arch
(We’re usually lower down in the file but putting us at the top will make things a bit faster to install.)
Continue according to ArchWiki!
Follow the arch wiki exactly from https://wiki.archlinux.org/index.php/Installation_guide#Install_essential_packages down until you get to the “Initramfs” section. If you get stuck, check out the following hints:
vim
is not installed in the chroot by default. You can dopacman -S vim
to install it, or just use plain oldvi
.- Our timezone is
America/Los_Angeles
. - The locale you want to use is
en_US.UTF-8
. - You shouldn’t need to change the keyboard layout.
- You can pick whatever you want for your hostname. OCF machines are all named after “disasters”, so it can be fun to come up with a creative name that fit that theme.
initramfs
The initramfs is the minimal stuff that’s loaded into RAM that helps Linux get started when you first boot. On Arch, there’s a system to generate this initramfs. You need to tell it to support disk encryption.
Based on the ArchWiki’s
instructions,
we need to edit /etc/mkinitcpio.conf
. You should see a line like
HOOKS=(base udev autodetect ...)
We need to add the keymap
and encrypt
hooks, and move the
keyboard
hook earlier in the line. When you’re done, it should look
like
HOOKS=(base udev autodetect keyboard keymap modconf block encrypt filesystems fsck)
The order of these hooks matters! Be careful.
Run mkinitcpio -p linux
to generate the new initramfs based on the
new config file.
Root password
Use passwd
to set the root password to itoolovelinux
.
As mentioned for the disk encryption passphrase, don’t use a password like this in real life! Choose a strong root password using a method like the one from earlier. User programs shouldn’t be able to compromise this password, because it would allow a single program (which should generally be untrusted) to access everything on your computer.
Boot loader
Historically, GRUB has been the only reasonable choice of boot loader. However, with the advent of UEFI, there are lots of different good options (or my personal favorite, no bootloader at all). In this lab, we are going to use systemd-boot.
We can install systemd-boot to our ESP by running
bootctl --path=/boot install
We now need to tell systemd-boot that our Arch Linux installation
exists and how to start it! Create a bootloader entry by creating the file
/boot/loader/entries/arch.conf
with these contents:
title Arch Linux
linux /vmlinuz-linux
initrd /initramfs-linux.img
options cryptdevice=UUID=290d6a44-2964-48a0-a71e-ea3df0525987:cryptroot root=/dev/mapper/cryptroot rw console=ttyS0
IMPORTANT NOTE: You will need to replace
290d6a44-2964-48a0-a71e-ea3df0525987
with the correct UUID for your
encrypted /dev/sda2
partition. To find the correct UUID, run ls -l
/dev/disk/by-uuid
and look for the line with -> ../../sda2
. This
line will have the UUID you need. If you make a mistake here, you
can mess things up pretty bad! TRIPLE-CHECK this file!
We could use a name like
/dev/sda2
instead ofUUID=290d6a44-2964-48a0-a71e-ea3df0525987
here. However, these names like/dev/sdXX
are not guaranteed to always be the same. A kernel update or hardware changes can mean that/dev/sda2
is now/dev/sdb2
, or vice versa. This will mean we won’t be able to boot, and in particularly bad cases, can lead to data loss. For this reason, we should use UUIDs whenever possible, because they cannot change. See https://wiki.archlinux.org/index.php/Persistent_block_device_naming for more info about this.
Also, remember how we had to add
console=ttyS0
when we first booted the archiso? We can add that option here to make it permanent for the new system, so the console will always be enabled.
Type exit
to exit the chroot and then type reboot
to REBOOT AND
FINISH THE INSTALLATION! Congrats!
If things don’t work after you reboot, don’t panic! Please make a Ask for help on #decal-general, and we will try to help you recover things.
When it reboots, you will be asked for the disk encryption password, and then you should see a login prompt. Log in as root using the password you set!
GRADESCOPE: To show off your newly installed Arch system, run the following commands and paste the output into Gradescope:
hostname -f
ip addr
mount
lsblk
uname -a
Debugging Note: If the hostname
command is not found, you may need to install it by running sudo pacman -S inetutils
. Doing so may also reveal that your internet connection is not yet configured. If this is the case, then you will likely need to sudo systemctl enable systemd-resolved
and/or sudo systemctl enable systemd-networkd
and/or sudo systemctl enable dhcpcd
and/or ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf
. You can see more information about this in the ArchWiki: https://wiki.archlinux.org/index.php/systemd-networkd. If these instructions do not work, you may also need to reboot into the Arch installer and use pacstrap to install dhcpcd.
When you are done, you can hit ctrl-]
to detach your console from
the Arch VM and get back to your normal student VM shell.