Lab 5 - Introduction to Networking

Overview

It is undeniable that the internet is an important system that has redefined our world. The ability to develop networks and allow devices to communicate is critical to modern day computer systems. This lab will take a look into the basics of computer networking and then examine networks through the perspective of a sysadmin.

We will be using web browsing as an analogy to understand the basics of networking. What exactly happens when I go web browsing for cat pictures?

But first let’s take a short dive into the details of networking.


MAC

Media access control (MAC) addresses are identifiers uniquely assigned to network interfaces. alt text

Since the MAC address is unique this is often referred to as the physical address. The octets are often written in hexadecimal and delimited by colons. An example MAC address is 00:14:22:01:23:45. Note that the first 3 octets refer to the Organizationally Unique Identifier (OUI) which can help identify manufacturers. Fun fact – the 00:14:22 above is an OUI for Dell.

IP

IP addresses are means of identifying devices connected to a network under Internet Protocol. There are two versions of the internet protocol, IPv4 and IPv6, that which differ on the size of their addresses. An example IPv6 address is 2001:0db8:85a3:0000:0000:8a2e:0370:7334 which is considerably longer than an IPv4 address like 127.0.0.1. For the sake of time we will only go over IPv4, but IPv6 is certainly gaining ground and worth checking out!

IPv4 addresses are 32 bits, i.e. 4 bytes, long and are delimited by a dot (.) every byte. An example IPv4 address is 127.0.0.1. Coincidentally this address is known as the loopback address which maps to the loopback interface on your own machine. This allows network applications to communicate with one another if they are running on the same machine, in this case your machine. But why 127.0.0.1 and not 127.0.0.0 or 127.0.0.2?

The answer is that 127.0.0.1 is simply convention, but technically any address in the network block 127.0.0.0/8 is a valid loopback address. But what exactly is a network block?

In IPv4 we can partition a block of addresses into a subnet. This is written in a format known as CIDR. Let’s take the subnet above as an example 127.0.0.0/8. The number that comes after the slash (/), in this case 8, is the subnet mask. This represents how many bits are in the network address, the remaining bits identify a host within the network. In this case the network address is 127.0.0.0 and the Mask is 255.0.0.0. So 127.0.0.1 would be the first host in the 127.0.0.0/8 network and so on and so forth.

This diagram provides a visual breakdown of CIDR addressing alt text

ARP

Address Resolution Protocol (ARP) is a protocol used to resolve IP addresses to MAC addresses. In order to understand ARP, we first discuss two ways to send a frame, unicast and broadcast. In the context of Layer 2, unicasting a frame means to send that frame to exactly one MAC address. On the other hand, broadcasting a frame by sending it to the broadcast address means the frame should be sent to every device on the network, effectively “flooding” the local network.

For example let’s imagine a sender, A, who has MAC 00:DE:AD:BE:EF:00, broadcasting a message that essentially asks “Who has IP address 42.42.42.42 please tell A at 00:DE:AD:BE:EF:00”.

If a machine, B, with MAC 12:34:56:78:9a:bc has the IP address 42.42.42.42 they send a unicast reply back to the sender with the info “12:34:56:78:9a:bc has 42.42.42.42”. The sender stores this information in an arp table so whenever it receives packets meant for machine B i.e. a packet with an destination IP address of 42.42.42.42 it sends the packet to MAC it received from B.

In order to route IP packets, devices have what is known as a routing table. Routing entries are stored in the routing table and they are essentially rules that tell the device how packets should be forwarded based on IP. A routing entry specifies a subnet and the interface that corresponds to that entry. The device chooses an entry with a subnet that is most specific to a given packet and forwards it out the interface on that entry.

Routing tables are usually to also have a default gateway. This serves as the default catch all for packets in the absence of a more specific matching entry.

Take this routing table for example.

default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0  proto kernel  scope link  src 10.0.2.15
10.0.2.128/25 dev eth0  proto kernel  scope link  src 10.0.2.15
192.168.162.0/24 dev eth1  proto kernel  scope link  src 192.168.162.162

A packet destined for 8.8.8.8 would be forwarded out eth0, the default gateway. A packet destined for 10.0.2.1 would be forwarded according to the second entry, out of eth0. A packet destined for 10.0.2.254 would be forwarded according to the third entry, out of eth0. A packet destined for 192.168.162.254 would be forwarded according to the fourth entry, out of eth1.

DNS

We’ve gone over IP addresses and how they are means of communicating with a host over IP, but while IP addresses are machine friendly (computers love numbers) they aren’t exactly human friendly. It’s hard enough trying to remember phone numbers, memorizing 32 bit IP addresses isn’t going to be any easier.

But it’s much easier for us to remember names like www.google.com, www.facebook.com, or coolmath-games.com. So out of this conflict the Domain Name System (DNS) was born as a compromise between machine friendly IP addresses and human friendly domain names.

DNS is a system that maps a domain name like google.com to 172.217.6.78. When you query for google.com your computer sends out a DNS query for google.com to a DNS server. Assuming things are properly configured and google.com has a valid corresponding address you will receive a response from an authoritative server that essentially says “google.com has IP address x.x.x.x”.

Now let’s flush out this black magic a bit…

DNS Records

DNS servers store data in the form of Resource Records (RR). Resource records are essentially a tuple of (name, value, type, TTL). While there are a wide variety of types of DNS Records the ones we are most concerned with are

  1. A records name = hostname value = IP address

    This record is very simply the record that has the IP address for a given hostname, essentially the information we want to end up with.

  2. NS records name = domain value = name of dns server for domain

    This record points to another dns server that can provide an authoritative answer for the domain. Think of this as redirecting you to another nameserver.

  3. CNAME records name = alias value = canonical name

    These records point to the canonical name for a given alias for example docs.google.com would be an alias which simply points to documents.google.com try www.facebook.com

  4. MX records The record used by mail service.

TCP and UDP

Now we will transition into a discussion on the protocols at the transport layer. The two most well known protocols at this layer are Transmission Control Protocol (TCP) and User Datagram Protocol (UDP).

TCP is a stateful stream oriented protocol that ensures reliable transport. Reliable transport essentially guarantees that information arrives wholly intact and in order at the destination.

TCP is a connection oriented protocol which means it must first establish a connection before sending any data. This connection exchanges information that is the mechanisms TCP uses to provide reliable transport amongst other features. A TCP connection begins with something known as the TCP handshake.

The TCP handshake consists of setting certain flags in the TCP header of packets exchanged between sender and receiver. The sender initiating a TCP connection by first sending a SYN, a packet with the SYN flag set. The server acknowledges this connection request by sending back a SYN-ACK, a packet with both the SYN and ACK flags set. The client acknowledges this by sending one final ACK back to the server, and the connection is then established.

TCP then begins transmitting data and if it successfully arrives on the other end of the connection then an ACK is issued. Therefore if data is lost, reordered, or corrupted, TCP is capable of recognizing this and sends a request for retransmission of any lost data.

TCP also has a procedure to close connections. We only consider a graceful termination here, abrupt terminations have a different procedure we will not go over. If you’re interested, CS168 has some great material here. Let’s assume machine A wants to close its connection to machine B.

A begins by sending a FIN. B must respond by sending a FIN and an ACK. If B only sends a ACK the connection persist and additional data can be sent until an FIN is sent. On the other hand B can also send just one packet with both FIN and ACK flags set, i.e. FIN+ACK if B is ready to close the connection and doesn’t need to send additional data Once A has received a FIN and an ACK it sends one last ACK to signal the connection termination.

UDP is stateless connectionless protocol. UDP focuses on sending messages in datagrams. Being connectionless UDP also doesn’t incur the overhead of the TCP handshake and termination. UDP also makes no guarantees about reliable transport so messages may be corrupted, arrive out of order, or not arrive at all. For this reason UDP is sometimes called Unreliable Datagram Protocol.

While UDP makes no guarantees about reliable transport it doesn’t suffer from the overhead of establishing and closing connections like in TCP. UDP is therefore ideal for usage cases where we just want to send packets quickly and losing a few of those isn’t disastrous.

Moreover, compared to TCP each UDP datagram sent needs to be individually received. While for TCP you pass a stream of data that is transparently split into some number of sends and the data stream is transparently reconstructed as a whole on the other end.

Ports (Optional)

Ports define a service endpoint, broadly speaking – ports mark a point of traffic ingress and egress. Whereas IP addresses connect hosts, ports connect process that run on such hosts. Only one process can be bound to a port at a time. Ports are represented by a 16 bit number meaning thus ranging from 0 to 65535. Ports from 0 to 1023 are well known ports, i.e. system ports. Using these ports usually has a stricter requirement. 1024 to 49151 are registered ports. IANA maintains the official list of well-known and registered ranges. The remaining ports from 49152 to 65535 are ephemeral ports which can be dynamically allocated for communication sessions on a per request basis.

Some port numbers for well known services are as follows:

Service Port
SSH 22
DNS 53
HTTP 80
HTTPS 443

Sysadmin Commands

As a sysadmin, trying to diagnose network issues can often be pretty challenging. Given the scale and complexity of networks, it’s tough trying to narrow down the scope of a problem to a point of failure. What follows is a list of commands/tools that can help with triaging problems. There are a lot of tools and we don’t expect you to memorize every single detail. However, it is important to know what tools exist and when to use them when problems inevitably arise. If you ever need more details the man pages for these commands are a great place to turn to for reference.

Tools also tend to overlap in functionality – for example there are multiple tools that can display interface information or test connectivity. When possible, it is a good idea to use multiple tools to cross-check one another.

Note that when it comes to real world networks there are even more factors to consider that we haven’t touched on like network security. For example, two machines can have a fully functioning connection but if one machine has been configured to drop all packets then it might seem as if they aren’t connected.

So take the output of these tools with a grain of salt, they a means of narrowing down issues. It is important not to misinterpret outputs or jump to conclusions too quickly.

  1. hostname
    A simple and straightforward command that can display information about a host, IP addresses, FQDN, and etc. Make sure to also check out host, which is a similar command that provides more detailed information by doing a lookup on a given name.

  2. ping
    Another simple command, most of the time you’ll be using ping as a first step towards testing connectivity. If ping can’t reach a host then there is likely an issue with connectivity. The ping tool does this by sending out ICMP messages to the host expecting a response. (More on the protocol here)

    Moreover ping also provides metrics for Round Trip Time (RTT) and packet loss. Round trip time is defined as the time it takes for a response to arrive after sending the ping packet. These can prove to be very useful statistics.

  3. traceroute
    Traceroute sends packets Time to Live (TTL) equal to the number of hops. Routers decrease the value of TTL for incoming packets. If a packet’s TTL = 0 then the router drops it and may send back diagnostic information to the source about the router’s identity. Otherwise the router continues forwarding the packet.

    Traceroute provides a detailed view of the routers that a packet traverses while on its way to a destination.

    If router does not respond within a timeout then traceroute prints an asterisk.

  4. arp
    Provides info on and the ability to manipulate the ARP cache of the system.

    With arp you can display the system arp table. Add, remove, or modify arp entries and much more.

  5. dig
    Utility for doing dns query and triaging DNS issues.

    Dig by default performs queries to nameservers in /etc/resolv.conf but some options allow you to: specify name server, choose query type (iterative vs recursive), and much more – making dig a very flexible DNS tool.

  6. ip
    ip is a command with many subcommands offering a lot of functionality – so much it can be overwhelming at first. You will most commonly be using ip to display/modify routing, IP addresses, or network interfaces.

    It will take time to get use to how much functionality is included in this command but for reference here is a pretty compact cheatsheet. A few common use cases include: ip addr which displays information on your IP addresses, ip route which displays information on your routing table, and ip link which displays information about your network interfaces.

  7. curl
    cURL does as its name suggests, and allows you to see the contents at certain URLs. Beyond this it’s also an extremely powerful program that lets you interact with and inspect servers over several different protocols certain protocols such as HTTP, FTP, etc …

    Be sure to check out its documentation for specific use cases.

  8. wget
    wget is quite similar to curl in the sense that they are both command line tools designed to transfer data from or to servers with certain protocols and both come with a bunch of features.

    There are differences between the commands, two notable examples being that wget is command line only meaning there no library or API. However, wget has a big advantage of being able to download recursively. You can read a bit more on the two tools here.

  9. netstat (Optional) This tool is good for printing network connections, routing tables, and probing sockets, amongst other functions.

    netstat also has functionality to probe sockets for activity and displays information such protocol (UDP/TCP)

    If you are investigating sockets ss and lsof are also options you may want to consider

  10. tcpdump (Optional) Perfect for monitoring incoming or outgoing traffic on a machine.

    tcpdump offers countless options when it comes to analyzing traffic: it can capture packets, log traffic, compute metrics, filter traffic, monitor specific interfaces, etc. As a primer you can check out these examples.

  11. nc (Optional) Netcat is a very powerful tool that can be used for just about anything involving TCP or UDP. It can open TCP connections, send UDP packets, listen on arbitrary TCP and UDP ports, do port scanning, and deal with both IPv4 and IPv6.

Exercises

Short Answer Questions

  1. Does HTTP use TCP or UDP and why? How about Discord and Skype, why?
  2. Who manufactured the NIC with mac address 52:54:00:d7:ce:cc?
  3. How many distinct hosts can 127.0.0.0/8 contain?
  4. What types of records do you get when you perform a DNS lookup of facebook.com?

Programming Exercise

  1. Write a shell script is_on.sh so that is_on.sh host shows whether host is online. If it is, show “OK”. If it’s not, show “Host is not reachable”. Don’t show anything else. Some clarifications:
    • A host is online here means the ping to the host is successful
    • Just ping the server once (we assume the internet connection is reliable and the packet will not be dropped)
    • You can use man ping to see how to make the ping only ping the server once, and what the return value of ping command means. Use if to decide what to print.
  2. Write a shell script mac.sh which processes the output of ip command and displays the MAC address of the network interface eth0 of your VM.
    • First figure out how to use ip command to get an output which contains the information we want
    • Then use head and tail command and pipes to tailor ip’s output to one line
    • Use cut command (Examples) to get the MAC address. Since we know that the MAC address has fixed length, feel free to count the indices.
    • The final shell script only has to have one line, although a answer with multiple lines are also acceptable.

Submission

Go to Gradescope for submission.