@import "./support/support"; @import "./color_schemes/"; @import "./modules"; @import "./custom/custom";
Link

Lab a1 - Shell Scripting

Facilitator: Lance Mathias

12 min read

Table of contents

  1. A Note on Labs
    1. Workflow
    2. Shell Scripting Review
      1. Shebang!
      2. Piping
      3. Looping with for
    3. Useful Commands
      1. cat
      2. cut
      3. grep
      4. sed
      5. xargs
  2. Question 1
  3. Question 2
  4. Question 3
  5. Question 4 (Extra)
  6. Submission
  7. Footnotes

A Note on Labs

Labs are graded on completion. As long as you give your best effort and submit an answer to each question, you will receive full credit. Treat them as seeds of exploration instead of just a grade.

As this is the first lab, we have done our best to make it relatively straightforward. (If something seems overly difficult, there is probably a simpler way to do it!) Bash scripting isn’t the main goal of the DeCal but this lab should introduce you to some fun bash features you may not have encountered before, such as loops and shell expansions, that you’ll probably find useful in the future.

If you ever find yourself confused, stuck, and/or curious to learn more, talk to us about it! The best way to connect with us (and your peers) is through Piazza. You can also join our Discord channel or Slack channel.

Workflow

This lab can be done on your own UNIX-like machine, or you can ssh into tsunami.ocf.berkeley.edu using your OCF account to finish the lab there. As always, man and Google will be your friends.

If you’d like to test your scripts for correctness, feel free to run the provided examples or make some of your own! Since labs are graded on completion, there are no autograder tests or anything of the sort to worry about (which will also be true for the following labs). We will release sample solutions after the lab is due, but keep in mind that there are many ways to solve these problems.

Shell Scripting Review

A quick refresher on writing shell scripts. Feel free to skip anything that looks familiar.

Shebang!

Shell scripts typically begin with the shebang line: #!path/to/interpreter.

#! is a human-readable representation of a magic number 0x23 0x21 which can tell the shell to pass execution of the rest of the file to a specified interpreter. If your script is run as an executable (e.g. ./awesome_shell_script) with a shebang line, then the shell will invoke the executable (usually an interpreter) at path/to/interpreter to run your script. If your script is passed as an argument to an interpreter e.g. bash awesome_shell_script, then the shebang has no effect and bash will handle the script’s execution.

Why is this important? The shebang line can be considered a useful piece of metadata which passes the concern of how a script is executed from the user to the program’s author. awesome_shell_script could be a bash script, a python script, a ruby script, etc. The idea is that only the script’s behavior, not its implementation details, should matter to the user who calls the script.

You may have seen some variant of #!/bin/sh. Although initially referencing the Bourne shell, on modern systems sh has come to reference to the Shell Command Language, which is a POSIX specification with many implementations. sh is usually symlinked to one of these POSIX-compliant shells which implement the Shell Command Language. On Debian, for instance, sh is symlinked to the shell dash. It is important to note that bash does not comply with this standard, although running bash as bash --posix makes it more compliant.

Piping

We can use the | character to chain multiple commands together in one line. For instance: command1 | command2 will pass the output of command1 as the input to command2. We can repeat this as many times as we need.

Looping with for

Bash can repeat an operation on multiple objects with a for loop. The syntax is as follows:

for VARIABLE in LIST; do

    CONSEQUENT-COMMANDS

done

Indentation is not required, but makes the code easier to read. The LIST can either be something like a directory with multiple files or a file with multiple lines init, a list of files (file1 file2 file3), or even a range of numbers ({start..end}).

Useful Commands

Some commands which could be useful in completing the lab. Of course, there are many approaches to the problems, and using these commands is not required.

cat

cat prints files to the standard output. Very useful for printing something to pipe into other commands!

cut

cut [options] [filename] extracts certain parts of a file (or piped-in input) based on the arguments that are used. A few which might be useful:

-d allows us to change the delimeter, or change the character cut looks for to divide the string into chunks. If this option is omitted, tab is used.

-f allows us to specify a number corresponding to the field to return, e.g. cut -f1 -d" " would return the first word in a sentence. The number followed by a - returns all fields after the specified field as well, so cut -f1- -d" " would return the entire string.

--complement tells cut to reutrn evertything except for the specified field.

grep

grep [pattern] [filename] filters out and returns lines from a file (or piped-in input) that contain the specified pattern.

sed

sed can do many things, such as editing strings and matching regex. We can use sed to replace one pattern with another pattern as follows:

sed 's/<PATTERN-TO-REPLACE>/<NEW-PATTERN>/g <INPUT>'

sed` can also take piped-in input from something else instead of an explicit input.

The g at the end tells sed to replace all ocurrences of the pattern; it can be omitted if we want to replace only the first ocurrence of a pattern, or replaced with a number to replace only a certain number of occurrences.

xargs

xargs lets us apply a command to an output redirected from a pipe. For instance, output | xargs command would apply command to output. Some useful options:

-n1 tells xargs to apply the command to every item in output once if output has multiple items in it (such as a list of multiple strings)

-0 tells xargs to split items in the output by the null character, which signifies the end of a string, instead of using spaces. Paired with -n1, this means xargs will apply a command to every string, instead of breaking up strings into individual words and applying the command to every word.

As always, there are many more ways to use these commands, so use Google or man pages to learn more!

Question 1

I like Lisp and Scheme, and miss car and cdr in my usual programming tasks.1

In bash, implement car and cdr (aka head and tail) such that they operate on file paths.

e.g.

$ ./car /home/a/ab/abizer/some/path
home
$ ./cdr /home/a/ab/abizer/some/path
a/ab/abizer/some/path

You may assume that only absolute paths2 will be given.

Hint: There’s no need to use complicated string manipulations such as regex’s for this task. The easiest way to do this is with one very short command.


As an optional bonus challenge: generalize this solution to work for cadr, caddr, etc.

$ ./cadr /home/a/ab/abizer/some/path
a
$ ./cddr /home/a/ab/abizer/some/path
ab/abizer/some/path

Question 2

With the invention of the .norm file format, file extension innovation is at its peak!3

However, your computer is old and doesn’t support it, so we’ll need to convert all of the files ending in .norm into .docx files.

Using Bash functions and shell wildcard expansion, write a shell script rename.sh to batch rename file extensions in a particular directory.

Here is some more specific info about this function:

  • It should take in 3 arguments: the directory, the original extension, and the new extension.
  • It should print the line renaming <old file> to <new file> for each renamed file.
  • It should not modify any files in the directory that do not have the specified extension.

Example:

$ ls Documents/
cats.norm data.norm dogs.norm ...
$ ./rename Documents norm docx # Run your script!
renaming Documents/data.norm to Documents/data.docx
renaming Documents/cats.norm to Documents/cats.docx
renaming Documents/dogs.norm to Documents/dogs.docx
...
$ ls Documents/
cats.docx data.docx dogs.docx ...

Your script should be able to convert between any arbitrary file formats, not just .norm and .docx! For example:

$ ls
# Creates a new directory tmp and adds 26 new files a.dat, b.dat ... to z.dat into it
$ mkdir tmp && touch tmp/{a..z}.dat 
$ ./rename.sh tmp dat txt
renaming tmp/a.dat to tmp/a.txt
... # 24 more lines
renaming tmp/z.dat to tmp/z.txt
$ ls -lAh tmp | grep .txt | wc -l # Gets the number of lines in ls which contain .txt
26

Hint: Need help looping through files? See this week’s participation assignment on Gradescope for more hints on for loops!

for bonus points, instead of using something like sed to affect the rename, use shell parameter expansion.

Question 3

At some point, everyone has looked at a problem and thought to themselves: “Hey, I can do this in one line!”

Lets find out if you can. I need to sort out some of my most listened to albums by making directories for each of them, specifically for my favorite artist Future.

I have hosted a list of my favorite albums followed by their respective artist at https://raw.githubusercontent.com/0xcf/decal-labs/master/a1/albums.txt, in a comma delimited format like Die Lit, Playboi Carti. For the GOAT artist Future, I want to create a folder for each of his albums. For example, for an entry like SUPER SLIMEY, Future I would expect a directory called SUPER SLIMEY to be created.

TLDR: You need to fetch the list from the web, filter out the albums we want, trim out the album name, and then make a directory for each one, all in one line.

$ cat albums.txt
...
Drip or Drown 2, Gunna
Playboi Carti, Playboi Carti
DS2 (Deluxe), Future  <-- GOAT album detected!
Drip Harder, Lil Baby
The WIZRD, Future  <-- GOAT album detected!
What a Time To Be Alive, Drake
...

# After our magic one liner...

$ ls
'DS2 (Deluxe)'   'The WIZRD'   ...
# We got our new directories!

Hints:

  • If it’s easier, download the above file and work on it locally so you don’t have to process output from wget.
  • What common text manipulation commands can help you solve this?
  • Consider the commands cat | grep ___ | cut ___ | sed ___ | xargs ___ (though you don’t have to use them…)
  • As always, be aware that there isn’t one unique solution to this problem!

Submit your one line solution on Gradescope!

Question 4 (Extra)

This question is optional but it’s quite fun and you should do it if you have the time!

Using Bash functions, write a script mkrandom.sh that generates a user-specified number of files of user-specified size filled with random content.

e.g.

$ ./mkrandom.sh 10 100  # create 10 100 byte random files
$ ls -lAh
total 44K
-rw-r--r-- 1 abizer ocf  100 Sep 16 21:57 1
-rw-r--r-- 1 abizer ocf  100 Sep 16 21:57 10
-rw-r--r-- 1 abizer ocf  100 Sep 16 21:57 2
-rw-r--r-- 1 abizer ocf  100 Sep 16 21:57 3
-rw-r--r-- 1 abizer ocf  100 Sep 16 21:57 4
-rw-r--r-- 1 abizer ocf  100 Sep 16 21:57 5
-rw-r--r-- 1 abizer ocf  100 Sep 16 21:57 6
-rw-r--r-- 1 abizer ocf  100 Sep 16 21:57 7
-rw-r--r-- 1 abizer ocf  100 Sep 16 21:57 8
-rw-r--r-- 1 abizer ocf  100 Sep 16 21:57 9
-rwxr-xr-x 1 abizer ocf  147 Sep 16 21:56 mkrandom

Submission

Submit your solutions on Gradescope! There’ll be some extra feedback questions as well that we would appreciate you filling out.

Footnotes

You may want to look into dd4 and the iflag=fullblock argument, seq, and /dev/random5.

  1. Aren’t too familiar with car and cdr? Here’s a brief article about it.. If you take CS61A, you’ll see it there as well! 

  2. As a quick reminder, absolute paths always start from the root directory (/), whereas relative paths start from the current directory. 

  3. Relevant xkcd: https://xkcd.com/2116/ 

  4. dd is a command used to copy files.6 It’s most commonly used to clone data from one device to another, such as when you want to generate a bootable Linux USB drive

  5. A curious individual might find the device file /dev/urandom as well. What’s the difference? True randomness is a rather difficult problem for computers, as they’re expected to do the same thing given the same state, so they pull in random data from metrics like internal temperature and mouse movement. Unfortunately, such entropy may not exist in certain machines and gathering entropy may be prohibitively long. Thus, /dev/urandom, or “unlimited random”, is a useful source when such randomness is not cryptographically critical. 

  6. “But wait,” a nearby straw-man asks, “isn’t that what cp does?”7 

  7. They are indeed right, but dd has some useful features such as partial writing and reading that make it handy in weirder scenarios, such as devices. StackOverflow has a good explainer and the ArchWiki has some common examples