Lab 7 - Networked Services
Overview
Networking is key to many services because it allows processes and computers to
communicate with each other. In this lab, we’ll work with a couple different
types of services and set up a service of your own from scratch!
Make sure, as always, that you are doing all of these steps on your provided
DigitalOcean VM (available at yourusername@yourusername.decal.xcf.sh
), as we
have provided some resources for you to use for this lab that are only
accessible from your student VMs.
Which networked services are already running?
Connect to your VM using SSH, and then run sudo netstat -plunt
(or sudo
netstat -peanut
if you’d prefer) to show the services running on your VM
already. You should see something like this:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 789/sshd
tcp6 0 0 :::22 :::* LISTEN 789/sshd
udp 0 0 10.138.132.55:123 0.0.0.0:* 792/ntpd
udp 0 0 10.46.0.38:123 0.0.0.0:* 792/ntpd
udp 0 0 159.65.76.196:123 0.0.0.0:* 792/ntpd
udp 0 0 127.0.0.1:123 0.0.0.0:* 792/ntpd
udp 0 0 0.0.0.0:123 0.0.0.0:* 792/ntpd
udp6 0 0 fe80::b0a7:c1ff:fef:123 :::* 792/ntpd
udp6 0 0 fe80::38c5:f3ff:fe0:123 :::* 792/ntpd
udp6 0 0 ::1:123 :::* 792/ntpd
udp6 0 0 :::123 :::* 792/ntpd
Why are there so many services already running? We haven’t even really done
anything yet! Well, to start off with, sshd
must have been running already,
otherwise how would you have connected to the machine in the first place using
SSH? However, the other service (ntpd) is a bit more mysterious. Let’s check
it out!
$ man ntpd
DESCRIPTION
The ntpd program is an operating system daemon which sets and maintains the
system time of day in synchronism with Internet standard time servers. It is a
complete implementation of the Network Time Protocol (NTP) version 4, but also
retains compatibility with version 3, as defined by RFC-1305, and version 1 and
2, as defined by RFC-1059 and RFC-1119, respectively. ntpd does most
computations in 64-bit floating-point arithmetic and does relatively clumsy
64-bit fixed-point operations only when necessary to preserve the ultimate
precision, about 232 picoseconds. While the ultimate precision is not
achievable with ordinary workstations and networks of today, it may be required
with future gigahertz CPU clocks and gigabit LANs.
The last sentence of this description snippet above is pretty funny, because
gigahertz CPU clocks and gigabit LANs are both pretty common these days!
Anyway, as mentioned in the lecture and in the manpage description above, NTP
is used for time synchronization on a computer using network time servers, so
it would make sense for this to already be running to allow your VM to always
have the correct system time. This is especially important for VMs compared to
using unix on a physical system. VMs can often be quite far off in terms of
system time if NTP is not running (if they are suspended and then later resumed
for example, or if the host for the virtual machine is under heavy load).
Here’s a pretty awesome post with a list of falsehoods that
programmers believe about time, there’s a surprising number of them.
/etc/services
One tip that might help when trying to find what a service does is to look at
which port it is listening on. For instance, from above, ntpd
is listening on
port 123
. If you open the file /etc/services
on most unix machines, you
will get a list of protocols and the ports they typically use. Here are the
lines for the port that ntpd
is using:
ntp 123/tcp
ntp 123/udp # Network Time Protocol
This helps make it clearer that ntpd
is most likely doing something with the
Network Time Protocol, which in this case was pretty clear, however, if you
have not seen the service before then /etc/services
can be more useful. Keep
in mind that any port can be used by any service but by convention they follow
the mapping in /etc/services
. Also keep in mind that for higher numbered
ports (above 1024), that they can be used by any user if a service is not
running on the port already, so this can be a security risk if you do not
properly secure these ports. That being said, most people follow convention if
possible to make their services easier to maintain, so checking /etc/services
is a good first step if trying to figure out what a specific port/service is
for.
Questions
To submit the lab, answer the questions on Gradescope.
NFS
We have provided a NFS server for you to connect to at staff4.decal.xcf.sh
with two
different directories, one read-only and one read-write.
First, install the nfs-common
package so that you can mount directories over NFS.
Then, use the mount
command (remember to look at the man
pages or search online if you do
not recognize a command) to mount from staff4.decal.xcf.sh:/opt/lab7/public
(the remote
directory) to your local directory at /mnt/
. Once you do this,
you should see a file with a secret inside it in /mnt
. You can
tell if you are connected or not by running df
and checking if there is
something that looks like staff4.decal.xcf.sh:/opt/lab7/public
present in the list. What is
the secret in the file?
If NFS takes a excessive time to mount or you cannot
read the file because it hangs while doing so, please let us know. Try creating
a file in the read-only directory (note that you will want to try with sudo
,
otherwise you will get a permission denied error because root
owns the
directory mounted over NFS)
If you’d like to disconnect again, make sure you are not in the directory
that has the file (otherwise it is unable to disconnect because it is still
loaded and you will get an error message like umount.nfs4:
/mnt: device is busy
). Then use umount
to disconnect from
NFS. If you run df
, you should see that the entry that was present before has
now disappeared.
Now, let’s unmount the filesystem (try to figure out how) so that we can mount
a different directory, at staff4.decal.xcf.sh:/opt/lab7/private/<your username>
using mount
in a similar way to before. What do you
see in /mnt/
now?
Follow the instructions in the file given
there. note that you will have to use sudo
here too to create a new file
since the directory mounted over NFS is owned by root, not your user.
Again, if NFS takes an excessive time to mount during any of this or you cannot
read files because it hands while doing so, please let us know on Slack or by
email (or at office hours if you’d prefer). We’ve had some problems in the past
with NFS being very slow to mount/read and needing a restart.
-
Question 1a. What command did you use to mount the read-only NFS directory?
-
Question 1b. What is the secret word in the read-only (public) NFS share?
-
Question 1c. What line does the df command show at the bottom when you have the read-only NFS directory mounted?
-
Question 1d. What error is given if you try to create a file in the read-only NFS file system? (any similar error message is fine, there are a large variety of ways to create files)
-
Question 1e. What command did you use to unmount the read-only NFS directory?
-
Question 1f. What item did you add in the private NFS directory? (/opt/lab7/private/<your username>
) We can check this from the NFS host, so make sure the text file exists with your answer!
DNS
In this section we are going to be setting up our own DNS server! Remember that
DNS is the system that maps from a domain like ocf.berkeley.edu
to an IP like
169.229.226.23
(and 2607:f140:8801::1:23
for IPv6) so that computers know
how to send information over the network to servers without people having to
remember a bunch of numbers to connnect to everything. A more thorough
description of this is in Lab 6 if you’d like a refresher or want
more information.
First, install the bind9
package on your VM to set up a DNS server.
Let’s check the status of the service using systemctl
. What command can you run to do this?
In the output of the systemctl
command, you should see that the bind9
service is already running. Let’s bring it down temporarily so we can
investigate: systemctl stop bind9
The service should have a unit file at /lib/systemd/system/named.service
or
/lib/systemd/system/bind9.service
. If you print that file (with cat
or
systemctl cat bind9
), you should see something like this:
[Unit]
Description=BIND Domain Name Server
Documentation=man:named(8)
After=network.target
Wants=nss-lookup.target
Before=nss-lookup.target
[Service]
EnvironmentFile=-/etc/default/named
ExecStart=/usr/sbin/named -f $OPTIONS
ExecReload=/usr/sbin/rndc reload
ExecStop=/usr/sbin/rndc stop
[Install]
WantedBy=multi-user.target
Alias=bind9.service
This should look pretty familiar to you after the lecture on services! Don’t
worry if it doesn’t all look familiar since there are some options you haven’t
seen yet in here, but you should at least recognize some of the options used.
If you now run dig ocf.berkeley.edu @localhost
from your VM, you should see
that the command eventually times out after trying to run for about 15 seconds.
This is because it is trying to send DNS requests to your VM, but the DNS
server is not actually running yet so it doesn’t get a response. However, if
@localhost
is left off the end of the command, it succeeds. Why is this the
case? What DNS server are requests currently being sent to if @localhost
is
not specified in the command?
Try starting the DNS server using the relevant systemctl
command. If you
check the status of the bind9
service after starting it, you should see the
status has changed to say that the service is active and running.
If you now run dig ocf.berkeley.edu @localhost
from your VM, you should now
see a response containing the correct IP (169.229.226.23
)!
Make sure to add port 53 to be allowed through your firewall with ufw
(set up
in lab a4) if you would like to access your DNS server from outside your VM.
Now to the exciting part, the configuration! Edit /etc/bind/named.conf.local
with your favorite text editor. Inside this file, it should be empty apart
from a few comments at the top because you haven’t done any local configuration
yet. Add a new zone in this file for example.com
with these contents:
zone "example.com" {
type master;
file "/etc/bind/db.example.com";
};
Then, create a file /etc/bind/db.example.com
to contain the responses to give
if anyone sends requests to your DNS server for example.com
. The easiest way
to do this is generally to copy an existing config and then make changes from
there to get what you want for your config instead of having to start from
scratch.
To make this easier, we’ve provided a valid config in
decal-labs
that you can copy in place at
/etc/bind/db.example.com
. You’ll need to edit the config to include your VM’s IP address and domain name!
This config includes a
subdomain that does not usually exist, named test.example.com
. Please add few
more records of your choice. Try to add one A record, and a couple of other
types of records (CNAME, SRV, TXT, etc.). Make sure to reload the bind9
service after changing anything in /etc/bind9
, since you want the running
service to change its configuration.
If you now run the dig
commands below, you should see that your VM’s domain
name (<username>.decal.xcf.sh
) is returned for the first result, for the
second result (example.com
) your VM’s IP address should be returned, and for
test.example.com
you should see 93.184.216.34
as the result.
Make sure to run these commands from your VM, or if you want to run them from
your laptop or from an OCF computer, substitute localhost
in any commands
with your VM’s domain name (it’ll be in the format <username>.decal.xcf.sh
).
-
Question 2a: What is the systemctl command to show whether bind9 is running or not?
-
Question 2b: Why does the dig command (dig ocf.berkeley.edu) work if @localhost is not present at the end (if bind9 is not started) but times out when @localhost is added?
-
Question 2c: What DNS server are requests currently being sent to on your VM if you don’t include @localhost in dig?
-
Question 2d: What additional entries did you add to your DNS server config (the db.example.com file)?
-
Question 2e: What commands did you use to make requests to the local DNS server for your additional entries?
Load Balancing
For this section we will be using HAProxy, a
commonly-used open-source load balancer. NGINX is
actually starting to become a load balancer alongside being a web
server, which is pretty interesting, but HAProxy is still commonly used.
You can install HAProxy using sudo apt install haproxy
.
First, grab the python file for the service you will be running from the
decal-labs repo using wget
or something similar to download
it. You’ll likely also need to install tornado
using sudo apt install python3-tornado
.
When run (python3 server.py
), this script will start up 6 different HTTP
server workers listening on ports 8080 to 8085 (inclusive). Each worker returns
different content to make it clear which one your are talking to for this lab
(“Hello, I am ID 0” for instance), but in real usage they would generally all
return the same content. You would still want something to distinguish between
them (maybe a HTTP header saying which host or instance they are?), but only
for debugging purposes, not like in this lab where they have actually differing
content.
The idea behind using a load balancer is that requests will be spread out among
instances so that if a lot of requests are coming in all at once, they will not
overload any one instance. Another very useful feature is that if one of the
instances happens to crash or become unavailable for whatever reason, another
working server will be used instead. This requires some kind of health checks
to be implemented to decide whether a server is healthy or not.
Your job is to do the configuration to get it to work with the services you are given! The main
config file is at /etc/haproxy/haproxy.cfg
and you should only have to append
to the end of this file to finish this lab. One snippet is provided here for you
to add to the config already, this will give you a nice status page that you
can use to see which of the servers is up or down:
listen stats
bind 0.0.0.0:7001
mode http
stats enable
stats hide-version
stats uri /stats
Make sure to add ports 7000 and 7001 to be allowed through your firewall with
ufw
(set up in lab a4) to allow access to your load balancer server and the
stats page from other computers.
After adding this, if you restart the haproxy
service and open
http://<username>.decal.xcf.sh:7001/stats
in a web browser, you should see a
page with a table and some statistics information on HAProxy (pid, sessions,
bytes transferred, uptime, etc.).
Part 1: Configuration
Your goal is to add a backend and frontend to haproxy’s config that proxies to
all of the running workers on the ports from 8080 to 8085 and listens on port
7000 on your VM, so that if you go to http://<username>.decal.xcf.sh:7000
you
can see the responses from the workers. Try refreshing, what do you notice
happening? Do you notice a pattern? What load balancing algorithm
are you using from your observations? What config did you add to the haproxy
config file to get this to work?
Part 2: Health Checks
Now, after adding all the servers to the backend in the config, add health
checks for each of them. If you refresh the stats page, what do you notice has
changed? What color are each of the servers in your backend?
Part 3: Crashing
If you make a request to http://<username>.decal.xcf.sh:7000/crash
, it will
crash the worker that you connect to. What changes in the HAProxy stats page?
(Try refreshing a few times, the health checks can take a couple seconds to
update the status from UP -> DOWN) If you make a lot of requests to
http://<username>.decal.xcf.sh:7000
again, are all the servers present in the
IDs that are returned in your requests or not? Try crashing a particular worker
by running curl localhost:<port>/crash
, substituting the port with one of the
workers that is still up on your instance. What happens on the HAProxy stats
page? If you crash all the workers, what status code does HAProxy return to you
when you make a request to the service?
-
Question 3a: Do you notice any pattern when you refresh the page multiple times?
-
Question 3b: What load balancing algorithm are you using?
-
Question 3c: What did you add to the haproxy config? (just copy and paste the lines you added to the bottom into here)
-
Question 3d: What do you notice has changed on the stats page after adding health checks? What color are each of the servers in the backend now?
-
Question 3e: What changes in the stats page when you crash a worker? What happened to the pattern from before?
-
Question 3f: What HTTP status code (or error message) does HAProxy return if you crash all of the workers?
Make sure to add port 53 to be allowed through your firewall with ufw
(set up
in lab a4) if you would like to access your DNS server from outside your VM.
Once you have set up your DNS server, try changing your laptop’s settings to
use your VM as a DNS server and navigate to http://example.com:7000
and you
should see the load-balanced services you set up. Also try navigating to
test.example.com
. What type of error do you see? Why do you think that this
causes a error and does not display the page that http://example.com normally
shows even though example.com resolves to the IP that you used
(93.184.216.34
)?
Also note that your DNS server is set up to only accept queries, especially
recursive queries, from within Berkeley networks. If you try to use it
off-campus somewhere, you will not be able to make queries to your DNS server.
This is because open relays are a security
problem that can be abused by attackers, so we’ve restricted your DNS server to
only accept queries from specific IP ranges that are more likely to be safe.
Again, remember to submit your answers on Gradescope!