Lab a1 - Shell Scripting
Facilitator: Lance Mathias
A Note on Labs
Labs are graded on completion. As long as you give your best effort and submit an answer to each question, you will receive full credit. Treat them as seeds of exploration instead of just a grade.
As this is the first lab, we have done our best to make it relatively straightforward. (If something seems overly difficult, there is probably a simpler way to do it!) Bash scripting isn’t the main goal of the DeCal, but this lab should introduce you to some fun Bash features you may not have encountered before, such as loops and shell expansions, that you’ll probably find useful in the future.
If you ever find yourself confused, stuck, and/or curious to learn more, talk to us about it! The best way to connect with us (and your peers) is through Piazza. You can also join our Discord channel or Slack channel.
Workflow
This lab can be done on your own UNIX-like machine, or you can ssh into tsunami.ocf.berkeley.edu using your OCF account to finish the lab there. As always, man and Google will be your friends.
If you’d like to test your scripts for correctness, feel free to run the provided examples or make some of your own! Since labs are graded on completion, there are no autograder tests or anything of the sort to worry about (which will also be true for the following labs). We will release sample solutions after the lab is due, but keep in mind that there are many ways to solve these problems.
Shell Scripting Review
A quick refresher on writing shell scripts. Feel free to skip anything that looks familiar.
Shebang!
Shell scripts typically begin with the shebang line: #!path/to/interpreter.
#! is a human-readable representation of a magic number 0x23 0x21 which can tell the shell to pass execution of the rest of the file to a specified interpreter. If your script is run as an executable (e.g. ./awesome_shell_script) with a shebang line, then the shell will invoke the executable (usually an interpreter) at path/to/interpreter to run your script. If your script is passed as an argument to an interpreter, e.g. bash awesome_shell_script, then the shebang has no effect and bash will handle the script’s execution.
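The two ways of running a script can be compared with a small sketch. The file name hello_script and its contents are made up for illustration, and the example assumes /bin/sh exists and that the temporary directory allows executing files.

```shell
# Create a tiny script whose first line is a shebang pointing at /bin/sh.
workdir=$(mktemp -d)
cat > "$workdir/hello_script" <<'EOF'
#!/bin/sh
echo "hello from a script"
EOF
chmod +x "$workdir/hello_script"

# Run as an executable: the shebang line decides which interpreter is used.
direct_output=$("$workdir/hello_script")

# Run by naming the interpreter: bash reads the file itself, and the
# shebang line is treated as an ordinary comment.
explicit_output=$(bash "$workdir/hello_script")

echo "$direct_output"
echo "$explicit_output"
rm -r "$workdir"
```

Both invocations print the same line here because the script only uses features that sh and bash share.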
Why is this important? The shebang line can be considered a useful piece of metadata which passes the concern of how a script is executed from the user to the program’s author. awesome_shell_script could be a bash script, a python script, a ruby script, etc. The idea is that only the script’s behavior, not its implementation details, should matter to the user who calls the script.
You may have seen some variant of #!/bin/sh. Although initially referencing the Bourne shell, on modern systems sh has come to refer to the Shell Command Language, which is a POSIX specification with many implementations. sh is usually symlinked to one of these POSIX-compliant shells which implement the Shell Command Language. On Debian, for instance, sh is symlinked to the shell dash. It is important to note that bash does not fully comply with this standard by default, although running bash as bash --posix makes it more compliant.
Piping
We can use the | character to chain multiple commands together in one line. For instance: command1 | command2 will pass the output of command1 as the input to command2. We can repeat this as many times as we need.
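As a minimal sketch of a three-stage pipeline (the sample words are made up for illustration), the following sorts a few lines and keeps only the first one:

```shell
# printf emits three lines, sort orders them alphabetically,
# and head -n 1 keeps only the first line of the sorted output.
first_word=$(printf 'cherry\napple\nbanana\n' | sort | head -n 1)
echo "$first_word"
```

Each command only sees the output of the command immediately to its left, so stages can be added or removed without touching the others.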
Looping with for
Bash can repeat an operation on multiple objects with a for loop. The syntax is as follows:
for VARIABLE in LIST; do
    CONSEQUENT-COMMANDS
done
Indentation is not required, but makes the code easier to read.
The LIST can be something like a directory with multiple files, a file with multiple lines in it, a list of files (file1 file2 file3), or even a range of numbers ({start..end}).
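A few of these LIST forms can be sketched as follows. This assumes the script is run with bash (brace ranges such as {1..5} are not available in every sh), and the file names are made up for illustration.

```shell
# Loop over an explicit list of words.
for name in file1 file2 file3; do
    echo "processing $name"
done

# Loop over a range of numbers produced by brace expansion.
total=0
for i in {1..5}; do
    total=$((total + i))
done
echo "sum of 1..5 is $total"

# Loop over the files in a directory using a wildcard.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"
file_count=0
for path in "$workdir"/*.txt; do
    echo "found $(basename "$path")"
    file_count=$((file_count + 1))
done
rm -r "$workdir"
```

Quoting "$workdir" and "$path" keeps the loop working even if a name contains spaces.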
Useful Commands
Some commands which could be useful in completing the lab. Of course, there are many approaches to the problems, and using these commands is not required.
cat
cat prints files to the standard output. Very useful for printing something to pipe into other commands!
cut
cut [options] [filename] extracts certain parts of a file (or piped-in input) based on the arguments that are used. A few which might be useful:
-d allows us to change the delimiter, i.e. the character cut looks for to divide the string into chunks. If this option is omitted, tab is used.
-f allows us to specify a number corresponding to the field to return, e.g. cut -f1 -d" " would return the first word in a sentence. The number followed by a - returns all fields after the specified field as well, so cut -f1- -d" " would return the entire string.
--complement tells cut to return everything except for the specified field.
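The options above can be tried on a sample sentence. This is only a sketch: the sentence is made up, and --complement is assumed to be available, which is the case for GNU cut but not for every cut implementation.

```shell
sentence="the quick brown fox"

# Field 1 with a space as the delimiter: the first word.
first_word=$(echo "$sentence" | cut -d" " -f1)

# Field 2 and everything after it.
from_second=$(echo "$sentence" | cut -d" " -f2-)

# Everything except field 1 (GNU cut only).
without_first=$(echo "$sentence" | cut -d" " -f1 --complement)

echo "$first_word"
echo "$from_second"
echo "$without_first"
```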
grep
grep [pattern] [filename] returns only the lines from a file (or piped-in input) that contain the specified pattern.
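A small sketch with made-up input shows the filtering. The -c option, which prints a count of matching lines instead of the lines themselves, is an addition not mentioned above.

```shell
# Keep only the lines that contain "apple".
apple_lines=$(printf 'red apple\ngreen pear\ngreen apple\n' | grep 'apple')

# Count the matching lines instead of printing them.
apple_count=$(printf 'red apple\ngreen pear\ngreen apple\n' | grep -c 'apple')

echo "$apple_lines"
echo "$apple_count"
```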
sed
sed can do many things, such as editing strings and matching regex. We can use sed to replace one pattern with another pattern as follows:
sed 's/<PATTERN-TO-REPLACE>/<NEW-PATTERN>/g' <INPUT>
sed can also take piped-in input from something else instead of an explicit input.
The g at the end tells sed to replace all occurrences of the pattern; it can be omitted if we want to replace only the first occurrence of a pattern on each line, or replaced with a number N to replace only the Nth occurrence.
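The effect of the trailing flag can be seen on a sample sentence (made up for illustration), using piped-in input:

```shell
text="one fish two fish red fish"

# No flag: only the first match on the line is replaced.
first_only=$(echo "$text" | sed 's/fish/cat/')

# g flag: every match on the line is replaced.
all_matches=$(echo "$text" | sed 's/fish/cat/g')

# Numeric flag: only the 2nd match on the line is replaced.
second_only=$(echo "$text" | sed 's/fish/cat/2')

echo "$first_only"
echo "$all_matches"
echo "$second_only"
```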
xargs
xargs lets us apply a command to an output redirected from a pipe. For instance, output | xargs command would apply command to output. Some useful options:
-n1 tells xargs to apply the command once to each item in output if output has multiple items in it (such as a list of multiple strings).
-0 tells xargs to split items in the output by the null character, which signifies the end of a string, instead of using spaces. Paired with -n1, this means xargs will apply a command to every string, instead of breaking up strings into individual words and applying the command to every word.
As always, there are many more ways to use these commands, so use Google or man pages to learn more!
Question 1
I like Lisp and Scheme, and miss car and cdr in my usual programming tasks.[1] In bash, implement car and cdr (aka head and tail) such that they operate on file paths.
e.g.
$ ./car /home/a/ab/abizer/some/path
home
$ ./cdr /home/a/ab/abizer/some/path
a/ab/abizer/some/path
You may assume that only absolute paths[2] will be given.
Hint: There’s no need to use complicated string manipulations such as regexes for this task. The easiest way to do this is with one very short command.
As an optional bonus challenge: generalize this solution to work for cadr, caddr, etc.
$ ./cadr /home/a/ab/abizer/some/path
a
$ ./cddr /home/a/ab/abizer/some/path
ab/abizer/some/path
Question 2
With the invention of the .norm file format, file extension innovation is at its peak![3] However, your computer is old and doesn’t support it, so we’ll need to convert all of the files ending in .norm into .docx files.
Using Bash functions and shell wildcard expansion, write a shell script rename.sh to batch rename file extensions in a particular directory.
Here is some more specific info about this function:
- It should take in 3 arguments: the directory, the original extension, and the new extension.
- It should print the line renaming <old file> to <new file> for each renamed file.
- It should not modify any files in the directory that do not have the specified extension.
Example:
$ ls Documents/
cats.norm data.norm dogs.norm ...
$ ./rename.sh Documents norm docx # Run your script!
renaming Documents/data.norm to Documents/data.docx
renaming Documents/cats.norm to Documents/cats.docx
renaming Documents/dogs.norm to Documents/dogs.docx
...
$ ls Documents/
cats.docx data.docx dogs.docx ...
Your script should be able to convert between any arbitrary file formats, not just .norm and .docx! For example:
$ ls
# Creates a new directory tmp and adds 26 new files a.dat, b.dat ... to z.dat into it
$ mkdir tmp && touch tmp/{a..z}.dat
$ ./rename.sh tmp dat txt
renaming tmp/a.dat to tmp/a.txt
... # 24 more lines
renaming tmp/z.dat to tmp/z.txt
$ ls -lAh tmp | grep .txt | wc -l # Gets the number of lines in ls which contain .txt
26
Hint: Need help looping through files? See this week’s participation assignment on Gradescope for more hints on for loops!
For bonus points, instead of using something like sed to effect the rename, use shell parameter expansion.
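For readers who have not used shell parameter expansion before, here is a minimal sketch of a few common forms. The path is hypothetical and unrelated to the lab files; how to apply these forms to the renaming task is left to you.

```shell
# Hypothetical path used only for illustration.
path="notes/report.final.txt"

# ${var%pattern} removes the shortest match of pattern from the end.
no_ext="${path%.*}"

# ${var##pattern} removes the longest match of pattern from the start.
ext="${path##*.}"
base="${path##*/}"

echo "$no_ext"
echo "$ext"
echo "$base"
```

These expansions are handled by the shell itself, so no external command such as sed or cut is started.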
Question 3
At some point, everyone has looked at a problem and thought to themselves: “Hey, I can do this in one line!”
Let’s find out if you can. I need to sort out some of my most listened to albums by making directories for each of them, specifically for my favorite artist Future.
I have hosted a list of my favorite albums followed by their respective artist at https://raw.githubusercontent.com/0xcf/decal-labs/master/a1/albums.txt, in a comma delimited format like Die Lit, Playboi Carti. For the GOAT artist Future, I want to create a folder for each of his albums. For example, for an entry like SUPER SLIMEY, Future I would expect a directory called SUPER SLIMEY to be created.
TLDR: You need to fetch the list from the web, filter out the albums we want, trim out the album name, and then make a directory for each one, all in one line.
$ cat albums.txt
...
Drip or Drown 2, Gunna
Playboi Carti, Playboi Carti
DS2 (Deluxe), Future <-- GOAT album detected!
Drip Harder, Lil Baby
The WIZRD, Future <-- GOAT album detected!
What a Time To Be Alive, Drake
...
# After our magic one liner...
$ ls
'DS2 (Deluxe)' 'The WIZRD' ...
# We got our new directories!
Hints:
- If it’s easier, download the above file and work on it locally so you don’t have to process output from wget.
- What common text manipulation commands can help you solve this?
- Consider the commands cat | grep ___ | cut ___ | sed ___ | xargs ___ (though you don’t have to use them…)
- As always, be aware that there isn’t one unique solution to this problem!
Submit your one line solution on Gradescope!
Question 4 (Extra)
This question is optional but it’s quite fun and you should do it if you have the time!
Using Bash functions, write a script mkrandom.sh that generates a user-specified number of files of user-specified size filled with random content.
e.g.
$ ./mkrandom.sh 10 100 # create 10 100 byte random files
$ ls -lAh
total 44K
-rw-r--r-- 1 abizer ocf 100 Sep 16 21:57 1
-rw-r--r-- 1 abizer ocf 100 Sep 16 21:57 10
-rw-r--r-- 1 abizer ocf 100 Sep 16 21:57 2
-rw-r--r-- 1 abizer ocf 100 Sep 16 21:57 3
-rw-r--r-- 1 abizer ocf 100 Sep 16 21:57 4
-rw-r--r-- 1 abizer ocf 100 Sep 16 21:57 5
-rw-r--r-- 1 abizer ocf 100 Sep 16 21:57 6
-rw-r--r-- 1 abizer ocf 100 Sep 16 21:57 7
-rw-r--r-- 1 abizer ocf 100 Sep 16 21:57 8
-rw-r--r-- 1 abizer ocf 100 Sep 16 21:57 9
-rwxr-xr-x 1 abizer ocf 147 Sep 16 21:56 mkrandom
Hint: You may want to look into dd[4] and the iflag=fullblock argument, seq, and /dev/random[5].
Submission
Submit your solutions on Gradescope! There’ll be some extra feedback questions as well that we would appreciate you filling out.
Footnotes
1. Aren’t too familiar with car and cdr? Here’s a brief article about it. If you take CS61A, you’ll see it there as well!
2. As a quick reminder, absolute paths always start from the root directory (/), whereas relative paths start from the current directory.
3. Relevant xkcd: https://xkcd.com/2116/
4. dd is a command used to copy files.[6] It’s most commonly used to clone data from one device to another, such as when you want to generate a bootable Linux USB drive.
5. A curious individual might find the device file /dev/urandom as well. What’s the difference? True randomness is a rather difficult problem for computers, as they’re expected to do the same thing given the same state, so they pull in random data from metrics like internal temperature and mouse movement. Unfortunately, such entropy may not exist in certain machines and gathering entropy may take prohibitively long. Thus, /dev/urandom, or “unlimited random”, is a useful source when such randomness is not cryptographically critical.
6. “But wait,” a nearby straw-man asks, “isn’t that what cp does?”[7]
7. They are indeed right, but dd has some useful features such as partial writing and reading that make it handy in weirder scenarios, such as devices. StackOverflow has a good explainer and the ArchWiki has some common examples.