Advanced Lab 1

Lecture: Thurs 2/1
Released: Mon 2/5
Due: End of Feb

Advanced section, Lab 1

Grading note

Labs are graded on completion. Treat this lab as a starting point for exploration rather than just a grade. If you don’t pass on the first submission, you can have it checked off in person by a DeCal facilitator.

Since you already know how to use Unix tools (though you may be more familiar with some than with others), the goal of this lab is to drop you in the wilderness. You can find your way out! :D

Composability & workflows

This lab can be done on your own UNIX-like machine, or you can ssh into tsunami.ocf.berkeley.edu using your OCF account to finish the lab there. As always, man and Google will be your friends.

Shell: web fetching, parsing, and frequency analysis

As it turns out, Project Gutenberg doesn’t like curl users and will demand that you enable JavaScript. It’s probably a reasonable defense against people naively scraping the site with curl, but it’s annoying to deal with.

You can just tell the server that you’re a regular web browser instead, using curl’s -A flag to send a browser User-Agent string (this one claims to be Chromium on Ubuntu):

curl -s -A "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/51.0.2704.79 Chrome/51.0.2704.79 Safari/537.36" https://www.gutenberg.org/

Credit to gurditsbedi on Medium.
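If you fetch more than one page, a small shell function saves retyping the User-Agent string. This is a minimal sketch: the names browser_get and BROWSER_UA are made up for this example, and the -L flag (follow redirects) is an addition that is not part of the command above.

```shell
# Hypothetical helper: fetch a URL while identifying as a desktop browser.
# The User-Agent string is copied from the curl command above.
BROWSER_UA="Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/51.0.2704.79 Chrome/51.0.2704.79 Safari/537.36"

browser_get() {
  # -s hides the progress meter, -L follows redirects,
  # -A sends BROWSER_UA as the User-Agent header.
  curl -s -L -A "$BROWSER_UA" "$@"
}
```

For example, browser_get https://www.gutenberg.org/ | head shows the first lines of the front page.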

Hints:

Questions (answer on the Google form)

  1. What are the top 10 words, and what are their frequencies?
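One classic way to approach this is a tr | sort | uniq -c | sort -rn pipeline. The sketch below wraps it in a hypothetical top_words function that reads text on stdin. It treats any run of non-letters as a word boundary, so contractions such as "don’t" split into two words, and it counts everything you feed it, including the Project Gutenberg header and license text unless you strip those first.

```shell
# Hypothetical helper: print the N most frequent words on stdin (default 10),
# each line as "<count> <word>".
top_words() {
  tr -cs 'A-Za-z' '\n' |      # turn every run of non-letters into one newline
    tr 'A-Z' 'a-z' |          # lowercase so "The" and "the" count together
    grep -v '^$' |            # drop a possible leading empty line
    sort | uniq -c |          # count each distinct word
    sort -rn |                # most frequent first
    head -n "${1:-10}"        # keep the top N
}
```

Combined with the curl command above, that looks like curl -s -A "..." "$BOOK_URL" | top_words 10, where BOOK_URL is a placeholder for the plain-text URL of whichever book you are analyzing.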

Niche tools & little support

jq, jshon

These tools let you parse, restructure, and create JSON documents on the command line. Today we’re using jq, which should already be installed on tsunami.

    curl 'http://api.geonames.org/postalCodeSearchJSON?postalcode=12345&maxRows=10&username=ocf_decal' -o location.json
    jq '.postalCodes[] | select(.placeName == "Berkeley") | {"lat": .lat, "long": .lng}' location.json

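    location.json -> { "key": [ {...}, {...} ] }

    given { "key": [1, 2] }:
    jq '.key'   -> [1,2]       (the array, as a single output)
    jq '.key[]' -> 1, then 2   (each element, as a separate output)

    jq '.[] | select(.key == "value")'     (keep only elements whose .key is "value")

    jq '. | { "k1": .key1, "k2": .key2 }'  (build a new object from existing fields)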
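To try these filters without calling the API, you can feed jq a small document of the same shape. The sample below is invented: its field names mirror the GeoNames example above, but the values are not real data.

```shell
# Invented sample shaped like { key: [ {...}, {...} ] }; not real GeoNames data.
sample='{"postalCodes": [
  {"placeName": "Berkeley", "lat": 1.5, "lng": -2.5},
  {"placeName": "Elsewhere", "lat": 3, "lng": 4}
]}'

# .postalCodes -> the whole array, as one output (-c prints it compactly)
printf '%s\n' "$sample" | jq -c '.postalCodes'

# .postalCodes[] -> each element as its own output; here just the names
printf '%s\n' "$sample" | jq '.postalCodes[] | .placeName'

# select keeps matching elements; {...} builds a new object ("lng" renamed to "long")
printf '%s\n' "$sample" |
  jq -c '.postalCodes[] | select(.placeName == "Berkeley") | {"lat": .lat, "long": .lng}'
```

The last command prints a single object, {"lat":1.5,"long":-2.5}, because only one element survives the select.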

Consider using the pipemill pattern (piping a command’s output into a while read loop) with date. date can take an argument specifying which time to display, but the syntax needs some mangling; see the man page for details.
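A minimal pipemill sketch, assuming your times arrive as Unix epoch seconds, one per line (for example, numbers pulled out of a JSON response with jq). The function name epoch_to_date is hypothetical, and the date -d "@SECONDS" syntax is GNU-specific; BSD and macOS date use date -r SECONDS instead.

```shell
# Hypothetical helper: read one epoch timestamp per line and print it as a UTC date.
# Assumes GNU date (the usual date on Linux).
epoch_to_date() {
  # The "|| [ -n "$ts" ]" also handles a final line with no trailing newline.
  while read -r ts || [ -n "$ts" ]; do
    date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S UTC'
  done
}
```

For example, printf '0\n86400\n' | epoch_to_date prints 1970-01-01 00:00:00 UTC and then 1970-01-02 00:00:00 UTC. Drop the -u flag to see times in your local time zone instead.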

Questions

  1. When are the next flyovers of Berkeley? Of Berlin?

(Look up at the sky.)

Other APIs

SpaceX has a beautiful API with information about (at least) its launches.

Optional: tell us about something you found fun or interesting.

https://github.com/toddmotto/public-apis
https://github.com/jdorfman/awesome-json-datasets
https://www.data.gov/

Submission

Fill out the Google form.