Advanced Lab 3 - Pre-install Setup and Installation
PLEASE READ: Some people have been having issues with the Create
Filesystems -> Wiping section of the lab, where executing the
commands causes the Arch VM to reboot into the UEFI shell. To avoid
this problem, please upgrade packages and reboot Ubuntu on
your DigitalOcean droplet if prompted to do so by the motd (Message
of the Day; the text you see when you ssh into your DO droplet),
but do not upgrade to Ubuntu 20.04 (do not run the command
do-release-upgrade). In other words, before starting the lab,
run sudo apt update, then sudo apt upgrade, and finally sudo reboot.
You will lose your connection when your DO droplet reboots; once it is
back up, ssh back in and begin the lab. If you are already in the middle of
the lab, you can try to proceed through the Wiping section, but if
you run into the issue you will likely need to destroy the VM and start
with a fresh one.
In this lab we will be installing Arch Linux into a virtual machine.
In this lab you will need to pay close attention to detail: mistakes
may not be apparent for quite a while later, and can cause huge
headaches. As always, make sure to take advantage of Piazza and other
resources if you need help! Also, in this lab, the
ArchWiki will be an invaluable resource
that we will be referencing a lot.
Throughout the lab, additional notes will be formatted like this.
These notes are the only sections of the lab that you can safely
skip. They will provide additional interesting information and
context, so I recommend reading them anyway!
As always, you will need to submit this lab on Gradescope. Any time
you see “GRADESCOPE:” in this lab, stop and make sure that you
have submitted the relevant answer or output to Gradescope. In most
cases, it will be hard or impossible to go back and get the answer
later in the lab.
We chose to install Arch Linux in this lab for a few reasons. First,
Arch Linux gives you tools to handle monotonous tasks (like
genfstab, which does the dirty work of writing out /etc/fstab)
while leaving the more complicated and more customizable tasks to
you. Second, Arch is widely used and experience with Arch can
usually be applied to other distributions as well. Third,
ArchWiki. Seriously. It’s that good.
Setup
We will be setting up a virtual machine inside your student VM.
Yes, this is a “nested” virtual machine.
First, connect to your student VM using the credentials that were
emailed to you. If you haven’t yet, you will need to reset your
password when you first log in. Please use a safe
password and don’t lose it!
If you’re enrolled in this DeCal, you should have a student VM. If
you’re not, we can’t guarantee that these techniques will work on
other systems (for a variety of reasons). However, you can obtain a
system that’s very similar to a student VM by creating a Droplet on
DigitalOcean. Normally this Droplet would cost some amount of money,
but you can get free DigitalOcean credits by using the GitHub
Student Developer Pack. You can
follow the instructions on the DigitalOcean
website
to create the Droplet. When selecting an image, choose Ubuntu 18.04.
Now that you are connected to your student VM, we will need to install
some packages that will allow us to create our nested Arch Linux VM.
Run
sudo apt install --no-install-recommends qemu-kvm qemu-utils libvirt-bin virtinst ovmf
QEMU is a system for running virtual machines, and it can use KVM,
a kernel module that supports the virtualization under the
hood. We install these with the package qemu-kvm. libvirt
(installed via libvirt-bin) is a system for managing these
QEMU/KVM virtual machines. qemu-utils provides the tools for
creating virtual drives, and is used by virt-install. virtinst
provides the script virt-install, which we will use to create
the virtual machine. We add --no-install-recommends because
virtinst has a recommended dependency, virt-viewer, which depends
on the X11 system. That pulls in a lot of packages that are only useful
if we have a GUI (which we don’t!), so we want to skip recommended
dependencies. Finally, ovmf provides the firmware that will let us use
UEFI (instead of the older BIOS spec) to boot the VM.
Next, we need to get the installation media. Go to
https://mirrors.ocf.berkeley.edu/archlinux/iso/latest/ to find the latest Arch
release. Using the wget command within your VM, download the .iso file found
within this directory as well as the .iso.sig signature file that we will use
to verify the authenticity of the .iso file. For example, the current release
at the time of writing is 2020.06.01, so you would run the following commands:
wget 'https://mirrors.ocf.berkeley.edu/archlinux/iso/latest/archlinux-2020.06.01-x86_64.iso'
wget 'https://mirrors.ocf.berkeley.edu/archlinux/iso/latest/archlinux-2020.06.01-x86_64.iso.sig'
Note that Arch is a “rolling
release” distribution (as
opposed to distros like Ubuntu that release distinct versions e.g. Ubuntu 18.04,
Ubuntu 20.04), so it is updated frequently. Therefore, the links used above are
likely out of date by the time you’re reading this! Make sure to find the
correct files for the latest release; do not just copy and paste the commands
above.
You can run ls to see the two files that were downloaded.
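Since the release date in the file names changes with every release, it can help to build both URLs from a single value. The sketch below assumes the mirror keeps the archlinux-<release>-x86_64.iso naming shown above; the function name iso_urls is made up for this example.

```shell
# Hypothetical helper: print the .iso and .iso.sig URLs for one release.
# Assumes the mirror keeps the archlinux-<release>-x86_64.iso naming scheme.
iso_urls() {
  release=$1
  base='https://mirrors.ocf.berkeley.edu/archlinux/iso/latest'
  printf '%s/archlinux-%s-x86_64.iso\n' "$base" "$release"
  printf '%s/archlinux-%s-x86_64.iso.sig\n' "$base" "$release"
}

# Replace 2020.06.01 with the release currently listed on the mirror.
iso_urls 2020.06.01
```

Once the printed URLs look right, something like iso_urls 2020.06.01 | xargs -n 1 wget would download both files.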
Notice that we are downloading this from the OCF software mirrors
(mirrors.ocf.berkeley.edu). There’s nothing stopping the OCF from
modifying the .iso file and inserting malicious code that will end
up running on our machine. To avoid this, we need to verify that the
signature we downloaded is valid, and matches the file we downloaded.
For more info, see
https://wiki.archlinux.org/index.php/Installation_guide#Verify_signature.
To verify the signature, run
gpg --keyserver-options auto-key-retrieve --verify file.sig
replacing file.sig with the downloaded .sig file. You’ll see
information about who created this
signature. We want to know that this signature is from an Arch Linux
developer. To be pretty confident of that, go to
https://www.archlinux.org/people/developers/
and find the developer who is said to have created this
signature. Under their info click on the link that says “PGP
Key”. You’ll see a bunch of info about their PGP key. Check that the
“Fingerprint” listed near the top of this page matches the “Primary
key fingerprint” in the gpg output. If not, something is very
wrong!
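Comparing two 40-character fingerprints by eye is error-prone, especially when one is printed in groups of four and the other is not. The following is a minimal sketch of a comparison that ignores spacing and letter case; the function names and the fingerprints are made up for illustration.

```shell
# Normalize a fingerprint: drop whitespace and uppercase the hex digits.
normalize_fpr() {
  printf '%s' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]'
}

# Succeeds (exit status 0) only if both fingerprints are identical after normalizing.
same_fpr() {
  [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$2")" ]
}

# Made-up fingerprints; paste the real ones from the gpg output and the website.
if same_fpr '0123 4567 89AB CDEF 0123  4567 89AB CDEF 0123 4567' \
            '0123456789abcdef0123456789abcdef01234567'; then
  echo 'fingerprints match'
else
  echo 'MISMATCH: do not trust this download'
fi
```

This only checks that two strings agree; it does not replace the gpg verification itself.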
If the gpg command above to verify the signature doesn’t work,
try using the OCF’s keyserver by appending this to the original
command: --keyserver=hkp://pgp.ocf.berkeley.edu
GRADESCOPE: Who is the developer that created this signature, and
what is their key’s fingerprint?
Note that this test is only resilient if www.archlinux.org is not
compromised. To be more confident, you might want to try to find
this fingerprint somewhere else, or independently verify this key
somehow. Traditionally this is done via the web of
trust, but new systems
such as Keybase are trying to find alternate
solutions to this fundamental problem of “How do I know you are who
you say you are?”
Create VM
Now, we need to actually create our VM. (Don’t run this command
until you’ve read the next paragraph!) To do this, you will run the
following, replacing the .iso filename with the one you downloaded:
sudo virt-install --name archvm --memory 704 --cpu host --vcpus 1 --disk size=5 --network network=default --boot uefi --graphics none --cdrom archlinux-2020.06.01-x86_64.iso
This will do some initial setup, and then drop us into a virtual
serial console connected to the VM. You’ll see some boot options and a
timer. Press e to edit the first boot option before the timer runs
out! By default, Arch will not enable the serial console we are
currently using to connect to the VM, so we need to manually add it to
the kernel command line. Once we’re editing this command line, hit
END to jump to the end of the line and carefully add a space and
then console=ttyS0. When you’re confident you’ve typed it right, hit
enter to start booting the kernel.
Normally, we use “terminal emulators” (like Terminal or iTerm on
Mac, or GNOME Terminal or Terminator on Linux) to access the
console. Alternatively, we can use the console via terminals attached
directly to a keyboard and display. (To see this, hit ctrl-alt-F1 on an
OCF lab computer. Hit ctrl-alt-F7 to get back to the GUI.) The
option we are using here is accessing the console via a serial port
connection, which in the modern age is pretty rare. Of course, we’re
not using a real serial port: since we’re using a virtual
machine, the serial port is also virtualized. libvirt provides
convenient tooling to connect to this virtual serial
port from the console we already have with the host machine (your
student VM).
We’re communicating with the VM via this virtual serial connection. If
you wanted to detach from this, you would hit the key combo ctrl-],
and reconnect by running the command sudo virsh console archvm.
In most cases if you disconnect and have to restart in the middle,
you can just run sudo virsh console archvm to reconnect. If you
really mess up, you can start things over by stopping the VM
(sudo virsh destroy archvm), deleting the VM
(sudo virsh undefine archvm --nvram --remove-all-storage), and
then recreating it by starting over from the command above. If for
some reason your VM gets turned off before you are done installing
Arch, it may be possible to reconnect the archiso and reboot the VM
using the archiso. However, doing this is out of scope for this lab.
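If you do end up starting over, the two cleanup commands can be wrapped in a small helper so they are typed only once. This is a sketch, not part of the lab: the function name reset_archvm and the DRY_RUN variable are invented here, and the VM name archvm is the one used in the virt-install command above.

```shell
# Hypothetical helper that tears down the VM so it can be recreated.
# When DRY_RUN is set, the commands are only printed, not executed.
reset_archvm() {
  run=${DRY_RUN:+echo}
  $run sudo virsh destroy archvm
  $run sudo virsh undefine archvm --nvram --remove-all-storage
}

# Preview what would be run; drop DRY_RUN=1 to actually delete the VM.
DRY_RUN=1 reset_archvm
```

Previewing first is deliberate: undefine with --remove-all-storage deletes the virtual disk, so everything installed so far is lost.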
You should start seeing a bunch of lines that start with [ OK ],
and eventually you will see a login prompt.
Since the installation media is basically just a preinstalled Linux
system that makes it easy to install Linux on another system, this
is the info about the booting “archiso” system. So, nothing is
actually installed on the VM yet, but we are booting into a
temporary Linux system that will help us install Linux permanently
on the VM.
Once you see a login prompt, enter root as the username, and you
should get in! You’re now at the archiso command line.
GRADESCOPE: Use the hostname command to get the current
hostname. What is it? Then, use ip addr to find the current IP
address (look at the inet line under ens2; it’s the part at the
beginning of the line, not including the /24). What is your IP
address? In another terminal, try to ping this IP address from your
student VM, and then also from tsunami (ssh.ocf.berkeley.edu). Can
either of them reach it?
We’ll discuss networking, and why you see these results, in a later
lecture.
Notice we are booted using UEFI (hence --boot uefi in the
virt-install line). Check out the ArchWiki
(https://wiki.archlinux.org/index.php/Installation_guide#Verify_the_boot_mode)
for more info. Also, notice that we are already connected to the
internet! Our virtual network uses DHCP, and the archiso knows to
try to autoconfigure using DHCP. We will talk more about DHCP and
other networking configuration later in the course. Finally, notice
that we’ve conveniently inherited the system clock, so the time
inside the VM should be right. You can run date to see.
Partitioning time
The first order of business is to set up the
partitions that we will be
using for this installation. We are going to use a simple partition scheme:
- an EFI system partition (ESP)
- one encrypted partition that holds our root filesystem
- a small partition for swap space
This is a very common scheme and is totally a scheme you could use if
you were installing Arch on your own computer (except that you’ll
hopefully have more than 5GiB to work with)!
GRADESCOPE: Give a brief and basic explanation of the purpose of having (1)
the EFI system partition and (2) the swap partition. These explanations can be
as short as one sentence, just try to get a general sense of why we have these
partitions.
Create partitions
We will use fdisk to create our
partitions. Before we start the actual partitioning process, run fdisk -l. You
should see that your VM has a single storage device called /dev/sda. Linux has
special device files that
correspond to each “block device” (e.g. hard drives, SSDs, and partitions on the
aforementioned storage devices) discovered by the operating system. These device
files live in the /dev folder. Read more about the naming convention for
storage devices and their partitions
here.
Test your understanding: What would be the name of the device file
corresponding to the fourth partition on the second discovered storage device?
Now to start the actual partitioning process, run fdisk /dev/sda.
Note that you’ll be making a bunch of changes to the partition table, but
these changes are not made immediately. So, if you make a mistake, you can
just start again from step 1 without damaging anything.
1. Create a GPT table by typing g (then hit enter).
2. Create your ESP by typing n. Use the default partition number and
   first sector (just hit enter to automatically take the default).
   For the last sector, type +512M to allocate 512MiB for the
   partition.
3. Type t to change the type of the new partition (partition 1) to
   “EFI System”. You will have to list the types and find the right
   ID. (Note: press q to get out of the list of partition types.)
4. Create your main root partition by typing n again. Use the
   default partition number and first sector. To leave 512MiB at the
   end of the disk for our swap partition, type -512M for the last
   sector. (This partition should already have the correct type,
   “Linux filesystem”, so you won’t need to change that.)
5. Create your swap partition by typing n one last time, and just
   take all the defaults.
6. Type t to change the type of the swap partition (partition 3) to
   “Linux swap”, as in step 3.
7. Type p to show the current partition table with all the changes
   you’ve made. You should see an “EFI System” partition that is
   512MiB, a “Linux filesystem” partition that is 4GiB, and a “Linux
   swap” partition that is 512MiB.
8. If everything looks right, type w to finally write the
   changes to disk.
At this point, fdisk should exit.
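As a sanity check on the sizes fdisk should report, the arithmetic can be worked out from the numbers used in this lab. The variable names below are made up; the sizes come from the --disk size=5 option and the +512M / -512M values entered in fdisk.

```shell
# Expected size of the root partition, in MiB, given the sizes used in this lab.
disk_mib=$((5 * 1024))   # --disk size=5 from the virt-install command (5 GiB)
esp_mib=512              # +512M for the ESP
swap_mib=512             # 512M left at the end of the disk for swap
root_mib=$((disk_mib - esp_mib - swap_mib))
echo "root partition: about ${root_mib} MiB ($((root_mib / 1024)) GiB)"
```

The real partitions come out very slightly smaller because the GPT itself and alignment take up a little space, which is why this is only an approximation of what fdisk prints.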
GRADESCOPE: To see your final results, run fdisk -l /dev/sda and
paste the output into Gradescope.
At this point, you’ve created the disk partitions, but nothing is on
them yet. Now we need to put the filesystems you will use on these
partitions!
Create filesystems
ESP
The ESP (EFI System Partition) has to be in a FAT format. For more
information about the ESP, see
https://wiki.archlinux.org/index.php/EFI_system_partition.
Your ESP should be /dev/sda1 (refer back to the output from
fdisk -l /dev/sda). Run mkfs.fat -F32 /dev/sda1 to create the
FAT32 filesystem.
These instructions are straight from
https://wiki.archlinux.org/index.php/EFI_system_partition#Format_the_partition.
Root partition
Our root partition should be /dev/sda2.
We are going to encrypt the root partition. This makes things a bit
more complicated, but thankfully the ArchWiki will help us out!
Encryption is essential on modern devices. If you do not
encrypt your drive, anyone who has access to that drive can simply
take it out of your machine and read everything on it. Your account
password doesn’t protect against anything. However, if you do use
encryption, you will not be able to get anything off the disk at all
without the encryption password.
Wiping
Before setting up the encrypted partition, we need to wipe the
partition with random data. The instructions below are based on
the ArchWiki instructions.
Why wipe your drive? You can tell the difference between blank data on a
disk and encrypted data, but not between random data and encrypted
data. Filling the disk with random data before setting up the
encryption protects information such as how full the partition is.
cryptsetup open --type plain -d /dev/urandom /dev/sda2 to_be_wiped
dd if=/dev/zero of=/dev/mapper/to_be_wiped status=progress bs=1M
cryptsetup close to_be_wiped
This generates a random key and pretends that our partition is
encrypted with it. So, if we fill the entire encrypted drive with
zeros, it will fill the real drive with fake encrypted data
(indistinguishable from random data). This might take a sec to do!
Don’t worry.
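To see why blank space stands out, compare how many different byte values show up in a block of zeros versus a block of random data. This is only an illustration that reads from /dev/zero and /dev/urandom; it does not touch /dev/sda2, and the function name distinct_bytes is made up.

```shell
# Count how many distinct byte values appear in the first 4096 bytes of a file.
distinct_bytes() {
  head -c 4096 "$1" | od -An -v -tu1 | tr -s ' ' '\n' | grep -v '^$' \
    | sort -u | wc -l | tr -d ' '
}

echo "zeros:  $(distinct_bytes /dev/zero) distinct byte value(s)"
echo "random: $(distinct_bytes /dev/urandom) distinct byte value(s)"
```

A zeroed region contains exactly one byte value, while random (or encrypted) data uses nearly all 256, so unused space on an unwiped disk is easy to pick out.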
Setting up the encrypted device
Again, this is taken from the ArchWiki. Follow
along!
First, set up the encrypted partition. For this lab, use the
encryption passphrase ilovelinux.
cryptsetup -y -v luksFormat /dev/sda2
A note on passphrases: don’t use a passphrase like this! A good way
to come up with passphrases is just to take 6 random words (and I
mean actually random, not like
xkcd.com/1210 random). One way to generate
such a password is the command shuf -n 6 /usr/share/dict/words
(you might have to install a package like wamerican on
Ubuntu/Debian or words on Arch). We will talk more about security
(including password security) later in the course.
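To get a feel for how strong such a passphrase is, its entropy can be estimated as the number of words times log2 of the word-list size, assuming each word is picked uniformly at random. The function name and the list size of 65536 below are assumptions chosen to keep the arithmetic simple.

```shell
# Approximate entropy, in bits, of a passphrase made of `words` words
# drawn uniformly at random from a list of `list_size` words.
passphrase_bits() {
  awk -v words="$1" -v list_size="$2" \
    'BEGIN { printf "%.1f\n", words * log(list_size) / log(2) }'
}

# Assumed list size of 65536 words (2^16), so 6 words give 6 * 16 = 96 bits.
passphrase_bits 6 65536
```

For a real estimate, pass the size of your own word list, for example the line count reported by wc -l on /usr/share/dict/words.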
Now that the encrypted partition has been created, let’s open it!
cryptsetup open /dev/sda2 cryptroot
GRADESCOPE: Run lsblk to see the hierarchy of these partitions.
Paste this output into Gradescope!
Creating the actual filesystem
We now have a blank pseudo-partition (“block device”),
/dev/mapper/cryptroot, that’s actually backed by the encrypted
/dev/sda2 disk partition. However, there’s no filesystem here
yet. Create it by doing
mkfs.ext4 /dev/mapper/cryptroot
GRADESCOPE: Which command that we ran earlier is similar to this
command, and what’s the difference? (Bonus: why?)
Swap
Our swap partition should be /dev/sda3. The swap space isn’t really
a filesystem, but we still need to set it up. Thankfully, this is
pretty straightforward:
mkswap /dev/sda3
Then, enable it with
swapon /dev/sda3
If you don’t know about swap, it is basically a specially designated
section of disk that the operating system can use as memory in case
it runs out of actual memory. For more info, check out
https://wiki.archlinux.org/index.php/Swap.
Mount the new filesystems
We’ve now created all our filesystems, but in order to access them, we
need to mount them.
What does mounting mean? The man page for the mount command provides a
nice description: “All files accessible in a Unix system are arranged in one
big tree, the file hierarchy, rooted at /. These files can be spread out over
several devices. The mount command serves to attach the filesystem found on
some device to the big file tree. Conversely, the umount(8) command will
detach it again.” We are currently in the “file tree” of the Arch live
environment, which does not include the new ESP and root filesystems we
created for our future Arch installation. Using mount, we can attach these
filesystems to our live environment filesystem and access them as if they were
directories at the mount point. Once the ESP and root filesystems are visible
to the current filesystem, we can start installing a boot loader to the ESP
filesystem and our permanent Arch OS to the root filesystem.
We’re going to use /mnt as the temporary root of the new Arch installation.
Thus, we mount the root filesystem at /mnt, and the ESP at /mnt/boot (since
the ESP will normally be mounted at /boot). Hopefully this will make a little
more sense once we begin installing the system.
mount /dev/mapper/cryptroot /mnt
mkdir /mnt/boot
mount /dev/sda1 /mnt/boot
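The full output of mount is long, so a filter can help you spot the two new entries. The sketch below uses a made-up function name and illustrative sample lines (the mount options shown are not taken from a real run); on the archiso you would pipe the real mount output into it instead.

```shell
# Keep only mount lines whose mount point is /mnt or /mnt/boot.
only_new_mounts() {
  grep -E ' on /mnt(/boot)? '
}

# Illustrative sample; on the archiso you would run: mount | only_new_mounts
printf '%s\n' \
  'proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)' \
  '/dev/mapper/cryptroot on /mnt type ext4 (rw,relatime)' \
  '/dev/sda1 on /mnt/boot type vfat (rw,relatime)' | only_new_mounts
```

If both lines show up with the expected devices, the filesystems are mounted where the rest of the lab expects them.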
GRADESCOPE:
- You can see every single filesystem that’s currently mounted by running the
  mount command. Run it, and look for the two filesystems we just mounted.
  Copy all the output over to Gradescope.
- Another common Linux partitioning scheme is to have a separate /home
  partition in addition to the partitions we created above. This can be useful
  in case something goes catastrophically wrong with your operating system to
  the point where you need to reinstall it. You can wipe your root filesystem
  and reinstall your OS while leaving your /home partition untouched, saving
  your user folder(s) with personal files like documents and photos without
  needing to restore from a backup. Hypothetically, if we had made a separate
  /home partition and filesystem in the previous steps, what command(s) would
  we need to run to mount it? Assume the /home partition is /dev/sda3.
Installation
With our partitions all set up and ready to go, we can start
installing our system onto them!
Much of this will be straight from the ArchWiki installation guide,
starting at
https://wiki.archlinux.org/index.php/Installation_guide#Installation.
It’s a good idea to follow along there as well to see what
modifications we are making.
Mirrors
We need to tell Arch which software mirror to use when downloading
packages. Go ahead and just put the OCF mirror at the top! Edit
/etc/pacman.d/mirrorlist using vim/emacs/nano and put the
following line at the top:
Server = https://mirrors.ocf.berkeley.edu/archlinux/$repo/os/$arch
(We’re usually lower down in the file but putting us at the top will
make things a bit faster to install.)
Continue according to ArchWiki!
Follow the ArchWiki exactly from
https://wiki.archlinux.org/index.php/Installation_guide#Install_essential_packages
down until you get to the “Initramfs” section. If you get stuck, check out the
following hints:
- vim is not installed in the chroot by default. You can run
  pacman -S vim to install it, or just use plain old vi.
- Our timezone is America/Los_Angeles.
- The locale you want to use is en_US.UTF-8.
- You shouldn’t need to change the keyboard layout.
- You can pick whatever you want for your hostname. OCF machines are
  all named after “disasters”, so it can be fun to come up with a
  creative name that fits that theme.
initramfs
The initramfs is the minimal stuff that’s loaded into RAM that helps
Linux get started when you first boot. On Arch, there’s a system to
generate this initramfs. You need to tell it to support disk
encryption.
Based on the ArchWiki’s instructions,
we need to edit /etc/mkinitcpio.conf. You should see a line like
HOOKS=(base udev autodetect ...)
We need to add the keymap and encrypt hooks, and move the
keyboard hook earlier in the line. When you’re done, it should look
like
HOOKS=(base udev autodetect keyboard keymap modconf block encrypt filesystems fsck)
The order of these hooks matters! Be careful.
Run mkinitcpio -p linux to generate the new initramfs based on the
new config file.
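Because the hook order is easy to get wrong, a quick check before rebooting can save a rebuild. The sketch below assumes the ordering implied by the example line above (keyboard, keymap and block before encrypt, and encrypt before filesystems); the function names hook_pos and hooks_ok are made up, and this is not an official mkinitcpio tool.

```shell
# Print the 1-based position of a hook in a HOOKS=(...) line, or nothing if absent.
hook_pos() {
  printf '%s\n' "$1" | tr -d '()' | sed 's/^HOOKS=//' | tr ' ' '\n' \
    | grep -n -x "$2" | head -n 1 | cut -d: -f1
}

# Succeeds only if the hooks needed for an encrypted root exist in a workable order.
hooks_ok() {
  kb=$(hook_pos "$1" keyboard); km=$(hook_pos "$1" keymap)
  blk=$(hook_pos "$1" block); enc=$(hook_pos "$1" encrypt)
  fs=$(hook_pos "$1" filesystems)
  [ -n "$kb" ] && [ -n "$km" ] && [ -n "$blk" ] && [ -n "$enc" ] && [ -n "$fs" ] &&
    [ "$kb" -lt "$enc" ] && [ "$km" -lt "$enc" ] &&
    [ "$blk" -lt "$enc" ] && [ "$enc" -lt "$fs" ]
}

line='HOOKS=(base udev autodetect keyboard keymap modconf block encrypt filesystems fsck)'
if hooks_ok "$line"; then echo 'HOOKS order looks fine'; else echo 'check HOOKS'; fi
```

Inside the chroot you could feed it the real line with hooks_ok "$(grep '^HOOKS=' /etc/mkinitcpio.conf)".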
Root password
Use passwd to set the root password to itoolovelinux.
As mentioned for the disk encryption passphrase, don’t use a
password like this in real life! Choose a strong root password using
a method like the one from earlier. User programs shouldn’t be able
to compromise this password, because it would allow a single program
(which should generally be untrusted) to access everything on your
computer.
Boot loader
Historically, GRUB has been the only reasonable choice of boot loader. However,
with the advent of UEFI, there are lots of different good options (or my
personal favorite, no bootloader at all). In this
lab, we are going to use systemd-boot.
We can install systemd-boot to our ESP by running
bootctl --path=/boot install
We now need to tell systemd-boot that our Arch Linux installation
exists and how to start it! Create a bootloader entry by creating the file
/boot/loader/entries/arch.conf with these contents:
title Arch Linux
linux /vmlinuz-linux
initrd /initramfs-linux.img
options cryptdevice=UUID=290d6a44-2964-48a0-a71e-ea3df0525987:cryptroot root=/dev/mapper/cryptroot rw console=ttyS0
IMPORTANT NOTE: You will need to replace
290d6a44-2964-48a0-a71e-ea3df0525987 with the correct UUID for your
encrypted /dev/sda2 partition. To find the correct UUID, run
ls -l /dev/disk/by-uuid and look for the line with -> ../../sda2. This
line will have the UUID you need. If you make a mistake here, you
can mess things up pretty badly! TRIPLE-CHECK this file!
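Retyping a UUID by hand is where most mistakes happen, so it can be safer to extract it and generate the options line. The sketch below assumes the usual ls -l layout in which each line ends with "UUID -> ../../device"; the function names are made up, and the sample line reuses the placeholder UUID from this lab rather than a real one.

```shell
# Read `ls -l /dev/disk/by-uuid` output on stdin and print the UUID that
# links to the given device name (for example sda2).
uuid_for() {
  awk -v target="../../$1" '$NF == target { print $(NF - 2) }'
}

# Build the options line for arch.conf from a UUID.
options_line() {
  printf 'options cryptdevice=UUID=%s:cryptroot root=/dev/mapper/cryptroot rw console=ttyS0\n' "$1"
}

# Illustrative input using the placeholder UUID from this lab.
sample='lrwxrwxrwx 1 root root 10 Jun  1 12:00 290d6a44-2964-48a0-a71e-ea3df0525987 -> ../../sda2'
options_line "$(printf '%s\n' "$sample" | uuid_for sda2)"
```

In the chroot, options_line "$(ls -l /dev/disk/by-uuid | uuid_for sda2)" would print the line for your own partition; still compare it against the ls output before saving the file.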
We could use a name like /dev/sda2 instead of
UUID=290d6a44-2964-48a0-a71e-ea3df0525987 here. However,
names like /dev/sdXX are not guaranteed to always be the same. A
kernel update or hardware change can mean that /dev/sda2 is now
/dev/sdb2, or vice versa. This will mean we won’t be able to boot,
and in particularly bad cases, can lead to data loss. For this
reason, we should use UUIDs whenever possible, because they stay the
same across reboots and hardware changes. See
https://wiki.archlinux.org/index.php/Persistent_block_device_naming
for more info about this.
Also, remember how we had to add console=ttyS0 when we first
booted the archiso? We can add that option here to make it permanent
for the new system, so the console will always be enabled.
Type exit to exit the chroot and then type reboot to REBOOT AND
FINISH THE INSTALLATION! Congrats!
If things don’t work after you reboot, don’t panic! Please make a
Piazza post or ask for help on Slack, and we will try to help you
recover things.
When it reboots, you will be asked for the disk encryption password,
and then you should see a login prompt. Log in as root using the
password you set!
GRADESCOPE: To show off your newly installed Arch system, run the
following commands and paste the output into Gradescope:
hostname -f
ip addr
mount
lsblk
uname -a
When you are done, you can hit ctrl-] to detach your console from
the Arch VM and get back to your normal student VM shell.