Lab 7 - Networked Services

Overview

Networking is key to many services because it allows processes and computers to communicate with each other. In this lab, we’ll work with a couple of different types of services and set up a service of your own from scratch!

Make sure, as always, that you are doing all of these steps on your provided DigitalOcean VM (available at yourusername@yourusername.decal.xcf.sh), as we have provided some resources for you to use for this lab that are only accessible from your student VMs.

Which networked services are already running?

Connect to your VM using SSH, and then run sudo netstat -plunt (or sudo netstat -peanut if you’d prefer) to show the services running on your VM already. You should see something like this:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address State  PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*       LISTEN 789/sshd
tcp6       0      0 :::22                   :::*            LISTEN 789/sshd
udp        0      0 10.138.132.55:123       0.0.0.0:*              792/ntpd
udp        0      0 10.46.0.38:123          0.0.0.0:*              792/ntpd
udp        0      0 159.65.76.196:123       0.0.0.0:*              792/ntpd
udp        0      0 127.0.0.1:123           0.0.0.0:*              792/ntpd
udp        0      0 0.0.0.0:123             0.0.0.0:*              792/ntpd
udp6       0      0 fe80::b0a7:c1ff:fef:123 :::*                   792/ntpd
udp6       0      0 fe80::38c5:f3ff:fe0:123 :::*                   792/ntpd
udp6       0      0 ::1:123                 :::*                   792/ntpd
udp6       0      0 :::123                  :::*                   792/ntpd
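
If you would rather see just the distinct protocol, port, and program combinations, output like the above can be condensed with a short script. This is a minimal sketch that assumes the column layout shown here; parse_netstat is a hypothetical helper name, and the sample text is a subset of the output above.

```python
def parse_netstat(output):
    """Return sorted (proto, port, program) tuples from `netstat -plunt` output."""
    services = set()
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a protocol name such as tcp, tcp6, udp, or udp6.
        if not fields or not fields[0].startswith(("tcp", "udp")):
            continue
        proto = fields[0]
        # The local address is the fourth column; the port follows the last colon.
        port = int(fields[3].rsplit(":", 1)[1])
        # The last column looks like "789/sshd"; keep only the program name.
        program = fields[-1].split("/", 1)[1] if "/" in fields[-1] else "?"
        services.add((proto, port, program))
    return sorted(services)


SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address State  PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*       LISTEN 789/sshd
tcp6       0      0 :::22                   :::*            LISTEN 789/sshd
udp        0      0 127.0.0.1:123           0.0.0.0:*              792/ntpd
udp6       0      0 fe80::b0a7:c1ff:fef:123 :::*                   792/ntpd
"""

for proto, port, program in parse_netstat(SAMPLE):
    print(proto, port, program)
```

Splitting on the last colon is what lets the same code handle both IPv4 addresses (0.0.0.0:22) and IPv6 addresses (:::22), which contain colons of their own.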

Why are there so many services already running? We haven’t even really done anything yet! Well, to start off with, sshd must have been running already, otherwise how would you have connected to the machine in the first place using SSH? However, the other service (ntpd) is a bit more mysterious. Let’s check it out!

$ man ntpd

DESCRIPTION

The ntpd program is an operating system daemon which sets and maintains the
system time of day in synchronism with Internet standard time servers. It is a
complete implementation of the Network Time Protocol (NTP) version 4, but also
retains compatibility with version 3, as defined by RFC-1305, and version 1 and
2, as defined by RFC-1059 and RFC-1119, respectively. ntpd does most
computations in 64-bit floating-point arithmetic and does relatively clumsy
64-bit fixed-point operations only when necessary to preserve the ultimate
precision, about 232 picoseconds. While the ultimate precision is not
achievable with ordinary workstations and networks of today, it may be required
with future gigahertz CPU clocks and gigabit LANs.

The last sentence of the description above is pretty funny, because gigahertz CPU clocks and gigabit LANs are both pretty common these days! Anyway, as mentioned in the lecture and in the manpage description above, NTP synchronizes a computer’s clock against network time servers, so it makes sense for it to already be running so that your VM always has the correct system time. This is especially important for VMs compared to running unix on a physical system: a VM’s system time can drift quite far if NTP is not running (for example, if the VM is suspended and later resumed, or if its host is under heavy load). Here’s a pretty awesome post with a list of falsehoods that programmers believe about time; there’s a surprising number of them.

/etc/services

One tip that might help when trying to find what a service does is to look at which port it is listening on. For instance, from above, ntpd is listening on port 123. If you open the file /etc/services on most unix machines, you will get a list of protocols and the ports they typically use. Here are the lines for the port that ntpd is using:

ntp             123/tcp
ntp             123/udp                         # Network Time Protocol

This helps make it clearer that ntpd is most likely doing something with the Network Time Protocol. In this case that was already pretty clear, but /etc/services is more useful when you have not seen a service before. Keep in mind that any port can be used by any service; services only follow the mapping in /etc/services by convention. Also keep in mind that higher-numbered ports (1024 and above) can be bound by any user if no service is already running on them, so this can be a security risk if you do not properly secure these ports. That being said, most people follow convention where possible to make their services easier to maintain, so checking /etc/services is a good first step when trying to figure out what a specific port/service is for.
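
The port lookup described above can also be scripted. The following is a minimal sketch, assuming the whitespace-separated "name port/protocol" layout shown in the ntp lines; parse_services is a hypothetical helper, and the sample text is just those two lines rather than the full file.

```python
def parse_services(text):
    """Map (port, protocol) -> service name from /etc/services-style text."""
    table = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        name, port_proto = fields[0], fields[1]
        port, proto = port_proto.split("/")
        table[(int(port), proto)] = name
    return table


SAMPLE = """\
ntp             123/tcp
ntp             123/udp                         # Network Time Protocol
"""

table = parse_services(SAMPLE)
print(table[(123, "udp")])
```

To use the real file, pass in open("/etc/services").read() instead of SAMPLE. Python’s standard library also offers socket.getservbyport(123, "udp"), which performs a similar lookup against the system’s services database.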

Questions

To submit the lab, answer the questions in this Google form.

NFS

We have provided an NFS server for you to connect to at staff with two different directories, one read-only and one read-write. First, install the nfs-common package so that you can mount directories over NFS. Then, use the mount command (remember to look at the man pages or search online if you do not recognize a command) to mount staff:/opt/lab7/public (the remote directory) at your local directory /opt/lab7/read-only. Once you do this, you should see a file with a secret inside it in /opt/lab7/read-only. You can tell whether you are connected by running df and checking whether something that looks like staff:/opt/lab7/public is present in the list. What is the secret in the file? If NFS takes an excessive amount of time to mount, or you cannot read the file because it hangs while doing so, please let us know. Try creating a file in the read-only directory (note that you will want to try with sudo; otherwise you will get a permission denied error because root owns the directory mounted over NFS).

If you’d like to disconnect again, make sure you are not in the directory that has the file (otherwise the unmount will fail because the directory is still in use, and you will get an error message like umount.nfs4: /opt/lab7/read-only: device is busy). Then use umount to disconnect from NFS. If you run df, you should see that the entry that was present before has now disappeared.

Next, mount the directory at staff:/opt/lab7/private/<your username> to /opt/lab7/read-write using mount in a similar way to before. What do you see in /opt/lab7/read-write now? Follow the instructions in the file given there. Note that you will have to use sudo here too to create a new file, since the directory mounted over NFS is owned by root, not your user.

Again, if NFS takes an excessive amount of time to mount during any of this, or you cannot read files because it hangs while doing so, please let us know on Piazza or by email (or at office hours if you’d prefer). We’ve had some problems in the past with NFS being very slow to mount/read and needing a restart.

DNS

In this section we are going to be setting up our own DNS server! Remember that DNS is the system that maps from a domain like ocf.berkeley.edu to an IP like 169.229.226.23 (and 2607:f140:8801::1:23 for IPv6) so that computers know how to send information over the network to servers without people having to remember a bunch of numbers to connect to everything. A more thorough description of this is in Lab 5 if you’d like a refresher or want more information.

First, install the bind9 package on your VM to set up a DNS server. By default, the service is not running yet. What is the systemctl command to show if the bind9 service is running or not?

In the output of the systemctl command, you should see that the bind9 service is not running (yet) and has a unit file at /lib/systemd/system/bind9.service. If you print that file, you should see something like this:

[Unit]
Description=BIND Domain Name Server
Documentation=man:named(8)
After=network.target
Wants=nss-lookup.target
Before=nss-lookup.target

[Service]
EnvironmentFile=/etc/default/bind9
ExecStart=/usr/sbin/named -f $OPTIONS
ExecReload=/usr/sbin/rndc reload
ExecStop=/usr/sbin/rndc stop

[Install]
WantedBy=multi-user.target

This should look pretty familiar to you after the lecture on services! Don’t worry if it doesn’t all look familiar since there are some options you haven’t seen yet in here, but you should at least recognize some of the options used.

If you now run dig ocf.berkeley.edu @localhost from your VM, you should see that the command eventually times out after trying to run for about 15 seconds. This is because it is trying to send DNS requests to your VM, but the DNS server is not actually running yet so it doesn’t get a response. However, if @localhost is left off the end of the command, it succeeds. Why is this the case? What DNS server are requests currently being sent to if @localhost is not specified in the command?
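
To make the "request with no response" behavior more concrete, here is a rough Python sketch of what a tool like dig does at the lowest level: it builds a small query packet and sends it over UDP to one specific server. The function names build_query and query, the query ID 0x1234, and the 5 second timeout are all assumptions made for illustration; this is not how dig itself is implemented, and the response is returned as raw bytes without being decoded.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a DNS query packet asking for the A record of `name`."""
    # Header: ID, flags (only "recursion desired" set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    return header + qname + struct.pack("!HH", 1, 1)


def query(name, server="127.0.0.1", port=53, timeout=5.0):
    """Send the query over UDP; return raw response bytes, or None on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, port))
            response, _ = sock.recvfrom(512)
            return response
        except OSError:
            # Covers both a timeout and an immediate "connection refused".
            return None


packet = build_query("ocf.berkeley.edu")
print(len(packet), "byte query built")
```

Calling query("ocf.berkeley.edu") with the default server of 127.0.0.1 corresponds to the @localhost case: if nothing is listening on port 53, no response comes back and the function returns None.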

Try starting the DNS server using the relevant systemctl command. If you check the status of the bind9 service after starting it, you should see the status has changed to say that the service is active and running.

If you now run dig ocf.berkeley.edu @localhost from your VM, you should see a response containing the correct IP (169.229.226.23)!

Make sure to add port 53 to be allowed through your firewall with ufw (set up in lab a4) if you would like to access your DNS server from outside your VM.

Now to the exciting part, the configuration! Edit /etc/bind/named.conf.local with your favorite text editor. This file should be empty apart from a few comments at the top, because you haven’t done any local configuration yet. Add a new zone in this file for example.com with these contents:

zone "example.com" {
  type master;
  file "/etc/bind/db.example.com";
};

Then, create a file /etc/bind/db.example.com to contain the responses to give if anyone sends requests to your DNS server for example.com. The easiest way to do this is generally to copy an existing config and then make changes from there, instead of having to start from scratch. To make this easier, we’ve provided a valid config at /opt/lab7/db.example.com that you can copy into place at /etc/bind/db.example.com. It is prefilled with your VM’s IP, and includes a subdomain that does not usually exist, named test.example.com. Please add a few more records of your choice. Try to add one A record, and a couple of other types of records (CNAME, SRV, TXT, etc.). Make sure to reload the bind9 service after changing anything in /etc/bind, since you want the running service to pick up the new configuration.

If you now run the dig commands below, the first result should return your VM’s domain name (<username>.decal.xcf.sh), the second (example.com) should return your VM’s IP address, and test.example.com should return 93.184.216.34.

What commands did you use to query for each of the records (including the ones you added)?

Make sure to run these commands from your VM, or if you want to run them from your laptop or from an OCF computer, substitute localhost in any commands with your VM’s domain name (it’ll be in the format <username>.decal.xcf.sh).

Load Balancing

For this section we will be using HAProxy, a commonly-used open-source load balancer. NGINX is actually starting to become a load balancer alongside being a web server, which is pretty interesting, but HAProxy is still commonly used.

First, grab the Python file for the service you will be running from the decal-labs repo, using wget or something similar to download it. When run (python3 server.py), this script will start up 6 different HTTP server workers listening on ports 8080 to 8085 (inclusive). Each worker returns different content to make it clear which one you are talking to for this lab (“Hello, I am ID 0” for instance), but in real usage they would generally all return the same content. You would still want something to distinguish between them (maybe an HTTP header saying which host or instance they are?), but only for debugging purposes, not like in this lab where they actually have differing content.
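
To illustrate the idea of identifying a worker through a header, here is a small standalone sketch. It is not the server.py from the decal-labs repo; make_worker and the X-Worker-ID header name are hypothetical, and the example listens on an OS-assigned port on 127.0.0.1 rather than on 8080 to 8085.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_worker(worker_id, port):
    """Create an HTTP server whose responses identify the worker (hypothetical)."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = f"Hello, I am ID {worker_id}\n".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            # Debugging aid: the header says which worker answered.
            self.send_header("X-Worker-ID", str(worker_id))
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the example quiet

    return HTTPServer(("127.0.0.1", port), Handler)


server = make_worker(0, 0)  # port 0 asks the OS for any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"
with urllib.request.urlopen(url) as resp:
    print(resp.headers["X-Worker-ID"], resp.read().decode().strip())
server.shutdown()
server.server_close()
```

In a real deployment the body would be the same for every worker and only the header would differ, so it could be inspected when debugging (with curl -i, for example) without affecting what users see.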

The idea behind using a load balancer is that requests will be spread out among instances so that if a lot of requests are coming in all at once, they will not overload any one instance. Another very useful feature is that if one of the instances happens to crash or become unavailable for whatever reason, another working server will be used instead. This requires some kind of health checks to be implemented to decide whether a server is healthy or not.
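
The two ideas above (spreading requests out, and skipping instances that fail a health check) can be sketched in a few lines of Python. This is only an illustration: Backend, health_check, and pick_backend are hypothetical names, and picking a healthy backend at random was chosen purely for simplicity. It is not meant to describe which algorithm HAProxy uses with your configuration.

```python
import random


class Backend:
    """One worker instance, identified by its port."""

    def __init__(self, port):
        self.port = port
        self.healthy = True


def health_check(backends, is_up):
    """Refresh each backend's status using the probe function `is_up(port)`."""
    for backend in backends:
        backend.healthy = is_up(backend.port)


def pick_backend(backends, rng=random):
    """Return a random healthy backend, or None if every backend is down."""
    healthy = [b for b in backends if b.healthy]
    return rng.choice(healthy) if healthy else None


backends = [Backend(port) for port in range(8080, 8086)]
crashed = {8081, 8084}  # pretend these two workers have crashed
health_check(backends, lambda port: port not in crashed)
chosen = {pick_backend(backends).port for _ in range(200)}
print(sorted(chosen))
```

Requests only ever reach the backends that passed the most recent health check. When every backend is down, pick_backend returns None, which is the point at which a real load balancer has to answer with an error of its own.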

HAProxy has already been installed on your VM, but your job is to do the configuration to get it to work with the services you are given! The main config file is at /etc/haproxy/haproxy.cfg, and you should only have to append to the end of this file to finish this lab. One snippet is already provided here for you to add to the config. It will give you a nice status page that you can use to see which of the servers are up or down:

listen stats
  bind    0.0.0.0:7001
  mode    http
  stats   enable
  stats   hide-version
  stats   uri /stats

Make sure to add ports 7000 and 7001 to be allowed through your firewall with ufw (set up in lab a4) to allow access to your load balancer server and the stats page from other computers.

After adding this, if you restart the haproxy service and open http://<username>.decal.xcf.sh:7001/stats in a web browser, you should see a page with a table and some statistics information on HAProxy (pid, sessions, bytes transferred, uptime, etc.).

Part 1: Configuration

Your goal is to add a backend and frontend to HAProxy’s config that proxies to all of the running workers on ports 8080 to 8085 and listens on port 7000 on your VM, so that if you go to http://<username>.decal.xcf.sh:7000 you can see the responses from the workers. Try refreshing. What do you notice happening? Do you notice a pattern? What load balancing algorithm are you using from your observations? What config did you add to the haproxy config file to get this to work?

Part 2: Health Checks

Now, after adding all the servers to the backend in the config, add health checks for each of them. If you refresh the stats page, what do you notice has changed? What color is each of the servers in your backend?

Part 3: Crashing

If you make a request to http://<username>.decal.xcf.sh:7000/crash, it will crash the worker that you connect to. What changes in the HAProxy stats page? (Try refreshing a few times; the health checks can take a couple of seconds to update the status from UP to DOWN.) If you make a lot of requests to http://<username>.decal.xcf.sh:7000 again, are all the servers present in the IDs that are returned in your requests or not? Try crashing a particular worker by running curl localhost:<port>/crash, substituting the port of one of the workers that is still up on your instance. What happens on the HAProxy stats page? If you crash all the workers, what status code does HAProxy return to you when you make a request to the service?
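
Making "a lot of requests" by hand gets tedious, so a short script that tallies which worker IDs respond may help. This is a sketch under a few assumptions: the response body contains text like "ID 0" as described earlier, the load balancer is reachable at http://localhost:7000 once you have configured it, and fetch_body and tally_ids are hypothetical helper names. The demonstration at the bottom uses canned responses instead of the network so that it runs anywhere.

```python
import re
import urllib.request
from collections import Counter


def fetch_body(url):
    """Return the response body as text, or "" if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.read().decode(errors="replace")
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return ""


def tally_ids(url, count=60, fetch=fetch_body):
    """Request `url` `count` times and count how often each worker ID appears."""
    counts = Counter()
    for _ in range(count):
        match = re.search(r"ID (\d+)", fetch(url))
        counts[match.group(1) if match else "no ID"] += 1
    return counts


# Demonstration with canned responses in place of real HTTP requests.
fake_bodies = iter(["Hello, I am ID 0", "Hello, I am ID 1", "", "Hello, I am ID 0"])
print(tally_ids("http://localhost:7000", count=4, fetch=lambda url: next(fake_bodies)))
```

On your VM, calling tally_ids("http://localhost:7000") before and after crashing a worker lets you compare which IDs still show up in the counts.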

Extra Fun (optional questions)

Make sure to add port 53 to be allowed through your firewall with ufw (set up in lab a4) if you would like to access your DNS server from outside your VM.

Once you have set up your DNS server, try changing your laptop’s settings to use your VM as a DNS server and navigate to http://example.com:7000. You should see the load-balanced services you set up. Also try navigating to test.example.com. What type of error do you see? Why do you think that this causes an error and does not display the page that http://example.com normally shows, even though test.example.com resolves to the same IP (93.184.216.34) that example.com normally uses?

Also note that your DNS server is set up to only accept queries, especially recursive queries, from within Berkeley networks. If you try to use it off-campus somewhere, you will not be able to make queries to your DNS server. This is because open resolvers are a security problem that can be abused by attackers, so we’ve restricted your DNS server to only accept queries from specific IP ranges that are more likely to be safe.

Again, the form for getting this lab checked off can be found here.