Beginner Lab 4 - Debian, packages, compiling software
13 min readTable of contents
Introduction
What is a distribution?
A Linux distribution (often abbreviated as distro) is an operating system made from a software collection that is based upon the Linux kernel and, often, a package management system. -Wikipedia
Note that Linux is not the only kernel! There are alternative kernels such as the Berkeley Software Distribution (BSD) and Solaris.
What should I use?
There are around 300 actively developed Linux distributions listed on DistroWatch, so it’s often difficult for people to pick. People often switch between distros for months (known as “distro hopping”) to find their favorite.
Some popular distributions include Debian, Ubuntu, Arch Linux, Fedora CentOS, and Red Hat Enterprise Linux, but in this course, we’ll only be using Ubuntu.
What is Ubuntu?
Ubuntu is a popular Linux distribution built around ease of use. Ubuntu is based on the Debian project and and has spawned its own family of distributions. The defining characteristic of the Debian family of distributions is the use of the DEB package format and the dpkg/apt package managers.
But what about these “package managers”?
If you come from Windows or MacOS, you might be wondering what a package manager is. On those operating systems, the most common method of installing new programs is to run an installer that unpacks and copies files into the correct location, and updates the OS registry to reflect this.
Enter the package manager, a centralized way to install, update, and remove software from your computer using verified sources called repositories. Most package managers work in the same way: the package manager gets a list of packages from the repository, and then, when asked to install a package, fetches the package from the repository, verifies that it is legit, and installs it. A package manager is like a librarian. When a patron wants to read a book, the librarian consults a catalog (which is the package database) and then fetches the book from the shelf and gives it to the patron (installing the package). Sometimes, the librarian has to update the catalog because new editions of the books were added (updating the package database). When the patron is done, the librarian knows where to return the book (cleaning up after removing a package). Contrast this to the approach on Windows, where dedicated uninstaller programs are occasionally necessary as the OS does not know where programs install their own files.
Where do the packages come from??
If you use Windows or Mac, you are probably used to the process of searching for a program on Google, going to the developer’s website, and clicking the “Download” button. However, with packages being automatically looked up and fetched by a package manager in Linux, you might be wondering where these files are being downloaded from!
All around the world, hundreds of schools, companies, and organizations donate
a bit of their server capacity to store a copy of Ubuntu’s main package repository.
These copies are called mirrors, and are synced frequently as to ensure that
their contents remain nearly identical to the original. You can view your mirror
list in the file /etc/apt/sources.list
, which is often auto-generated on install
so that it prioritizes downloading from mirrors close by.
As a fun fact, the OCF also runs a mirror of the apt repositories (along with some other open source software)! The mirrors are available at mirrors.ocf.berkeley.edu and serve several petabytes of data per year.
Ubuntu: An Example
In this class, we will be focused on using Ubuntu. As noted before, Ubuntu uses
apt
or dpkg
as its package manager. dpkg
was the original tool used for
managing packages on Debian. apt
was created to make managing packages easier,
and introduced the concept of repositories (remote package sources).
We invoke the package manager by using apt
. Before installing anything, you
shoud update your local package metadata list (name, version, etc.) to ensure the package
manager installs the latest and most up-to-date packages. To do that, run:
$ sudo apt update
To find the package to install (note this doesn’t actually install any software, so sudo is not required):
$ apt search [package|description]
To install a package, run:
$ sudo apt install package
To remove a package, run:
$ sudo apt remove package
Easy? When you want to upgrade the packages that you have installed when new versions are released, you can do so by calling:
$ sudo apt upgrade
There are also other commands, such as removing unneeded dependencies (dependencies
are other packages a package needs), sudo apt autoremove
, and purging packages
(sudo apt purge
). You are encouraged to look at the man pages regarding these commands.
Exercise 0: Install a package!
We are going to install the C compiler GCC for the next step of the lab.
Simply run:
$ sudo apt install gcc
Now check if GCC is installed by running the following to check GCC’s version:
$ gcc --version
This, and all other parts of the lab, should be done on your DigitalOcean student VM, as you cannot install packages on OCF machines.
You should be able to connect to your student VM at <OCF username>.decal.xcf.sh
by running ssh <username>@<username>.decal.xcf.sh
and the password you set when
your first logged in, or using the email you received.
But what about software that isn’t in the repositories?
Sometimes it happens that a program you want to install hasn’t been packaged for your distribution, or hasn’t been packaged at all. You have several options in this scenario to install the software you want:
A Warning
Installing software not from a repository carries the same risks as installing software from a random .exe or .msi on Windows.
Linux is popularly considered more secure than Windows or MacOS because of its use of repositories – but a malicious package installed manually can pwn your system as easily as malware on any other OSes. Use common sense! Only manually install from sources you trust.
(For users of Ubuntu: PPAs, or third-party repositories, carry the same risks of running malicious third-party code.)
If the developers provide a package compatible with your distribution (.deb for Debian-based distros), you can download that package and install it using:
$ sudo apt install ./path/to/the/package.deb
Another way is to find a generic binary package from the developer. This can come in the form of shady shell scripts, or a binary tarball that you can just extract it and run, or an appimage, which is a special type of executable that includes its own dependencies.
The last way is to compile your software from scratch. What does that mean? Open source software must have its source code publicly available somehow, (GitHub, GitLab, their website). If you fetch their source code, it won’t magically run out of the box. The source code is like the recipe, while the software itself is like food. A package is like a dish that is put into a box but we won’t covering the details of making a package yourself (there are tools that do that for you and it varies from platform to platform).
Exercise 1: Creating a Package
So how do I compile?
Compiling software on Linux can be a mixed bag. Sometimes, all the dependencies (like libraries) are installed on your computer and there is no fiddling around. Sometimes, the dependencies don’t even exist pre-compiled for your distribution so you have to compile those yourself in order to compile what’s at hand. Most of the time, these steps are simplified through the use of a Makefile, which controls the Make build system.
In most source tarballs, there is usually a Makefile that contains a set of directives to compile a project. This is because there are usually multiple files across many directories that need to be compiled together into the final executable. On top of that, there are multiple settings that control, for example, optimizations, size of the executable, static vs. dynamic linking, and whether to link against system libraries or alternatives.
Many projects that have to be compiled are usually in C, C++, or similar
lower-level languages. On most Linux distributions, there are usually three
compiler “options”. There is the GNU Toolchain which provides gcc
or the GNU
C Compiler, LLVM which provides clang
, and Intel’s proprietary toolkit,
icc
. Both gcc
and clang
are open-source and free software.
To compile software that provides a Makefile, assuming you have the dependencies, simply type:
$ make
This is will usually choose the correct compiler and compile the whole project.
Once the compilation is done, the resulting executables are usually stored in the
./bin
or ./build
directories.
GCC and clang also have compiler flags that allow certain features to be enabled. Usually the flags that actually matter are optimization flags. Depending on what you want to optimize for, either space or memory or speed, there is a flag for it.
Now, we will make a very simple application in C that prints “Hello Penguin!”
named hellopenguin
. Run:
$ nano hellopenguin.c
to create a file named hellopenguin.c
, and type in the following:
#include <stdio.h>
int main()
{
printf("Hello Penguin!");
return 0;
}
Now save and exit.
We will now compile the source file that you have just written:
$ gcc hellopenguin.c -o hellopenguin
What this does is take the source file hellopenguin.c
and compile it, writing
the executable output to a file named hellopenguin
.
How do I package stuff?
Packaging manually for Debian can be very hard and frustrating, especially for first timers. That’s why for this class, we’ll be using a really cool Ruby package called fpm which simplifies the task of packaging a lot.
Note: This method is a great way to package your own applications more quickly, but isn’t up to the standards required for publishing a package to the official repositories. We won’t be covering them here, but if you are interested in learning more you can do so here.
First, make
sure ruby
and its own package manager called gem
is installed. If they
aren’t, run sudo apt install ruby ruby-dev rubygems build-essential
. Now run the following to
install fpm
locally:
$ gem install fpm --user
Try invoking fpm
. If it doesn’t work, check your ruby version with ruby --version
then add ~/.gem/ruby/<YOUR_VERSION>/bin
(where <YOUR_VERSION>
is something like 2.7.0
) to your
PATH
(list of directories to find executables). To do that, add this to your .bashrc
, or just type the following into
Bash to temporarily add it to your PATH:
$ export PATH=~/.gem/ruby/<YOUR_VERSION>/bin:$PATH
Now we will create a very simple package using the hellopenguin
executable
that you made above. First, we will make a new folder named packpenguin
and
move into it:
$ mkdir packpenguin
$ cd packpenguin
Now we will create the folder structure of where the executable will reside.
In Ubuntu, user-level packages usually reside in the folder /usr/bin/
.
$ mkdir -p usr/bin
Now move your hellopenguin
into the packpenguin/usr/bin/
folder.
$ cd ../ # cd into the directory where the hellpenguin executable is
$ mv hellopenguin packpenguin/usr/bin/
Now we will create a package called hellopenguin
. Move into the parent
directory of the hellopenguin folder and invoke the following:
$ fpm -s dir -t deb -n hellopenguin -v 1.0~ocf1 -C packpenguin
This specifies that you want to take in a directory, using the -s
flag, and
to output a .deb package using the -t
flag. It takes in a directory called
packpenguin
, using the -C
flag, and output a .deb file named hellopenguin,
using -n
, with a version number of 1.0~ocf1
, using the -v
flag.
Now test it by invoking apt
and installing it, replacing <version+arch>
with the appropriate version and architecture that the package is built for,
which will be provided by fpm when the package is built:
$ sudo apt install ./hellopenguin_<version+arch>.deb
Now you should be able to run hellopenguin
by doing the following:
$ hellopenguin
If all of this works, you’re ready to head over to Gradescope and answer the questions there!
Exercise 2: Spelunking
Let’s shift gears a bit and take a look at a popular package to learn more about how it’s structured! For this next section, we will choose a package from the Debian repository to download and extract.
Note that this exercise is mainly for exploration and learning purposes- you wouldn’t actually install a package using this method.
Step 1: Choose a package Here’s a list of packages published on the Debian repository. If you see one that seems interesting or you’ve used before, note its name and you can use it for this exercise!
If you are unsure or would like suggestions, here are some examples: htop
, less
, git
Step 2: Download and extract the package
Once you’ve decided on a package, you can download it from the terminal using the command apt download <packagename>
. You should now have a .deb
file in your current directory.
In order to extract the .deb
, you can use the command: ar x <your .deb file>
. Now, you should have two more files, control.tar.gz
and data.tar.gz
.
You can extract these in whichever way you like. One relatively easy way to do so is via the aunpack
command, which is available if you sudo apt install atool
.
Now, you should have two folders named control
and data
.
control
contains installation scripts and general package information.data
contains the actual stuff in the package that will be installed to your filesystem. Common folders that exist withindata
areusr/bin
(binaries) andusr/share
(documentation).
Step 3: Explore!
Using commands like cd
, cat
, vim
, and less
, spend some time looking through the files to see what’s inside a typical package. While many of the scripts may be hard to understand or somewhat irrelevant, see if you can learn anything about which folders they are located in and what their general purpose is!
Now, answer the following questions on Gradescope:
- What package did you choose?
- What are the package’s dependencies? What file can you find them in?
- What’s one interesting thing you learned about this package? (Binaries you never knew existed, easter eggs in documentation, a cool pre-install script…)