HttpCommand
Ignorance more frequently begets confidence than does knowledge: it is those who know little, not those who know much, who so positively assert that this or that problem will never be solved by science. –Charles Darwin
Ah yes, the web. I’m sure you’ve heard of it. Dyalog has a nifty http client library built in, called HttpCommand
. In order to make us of it, we first need to load it up:
This loads the HttpCommand
class, calling it hc
. We could also have used the Dyalog user command ]load HttpCommand
, which loads it as HttpCommand
– but who wants to type all that? The above approach is also usable programmatically.
Kanye.rest
There is a handy web service delivering random Kanye West quotations we can put to good use, kanye.rest
, to demonstrate this. Kanye as a Service?
[rc: 0 | msg: "" | HTTP Status: 200 "OK" | ⍴Data: 25]
What did we get back? Let’s look in the headers to start with:
┌────────────┬────────────────┐
│Content-Type│application/json│
└────────────┴────────────────┘
We know JSON; good. Let’s unpack that.
We can use the handy user command ]map
to show what the inside of a namespace looks like (and yes, there is no way you’d ever discover its existence without being told):
#.[JSON object]
· ~ quote
So the payload we’re interested in is in the quote
field of the body
namespace:
If we already know we’re dealing with JSON, there is a handy shortcut, called GetJSON
:
There are people sleeping in parking lots
GetJSON
will do a couple of things for us behind the scenes. It will unpack the JSON body of the http response. If we’re POSTing to the url, it will also treat a parameter namespace as the JSON body of the request; we’ll look at that in more detail below. One last pearl of wisdom from Kanye:
I feel calm but energized
Thanks for that, Kanye. Here’s the seminal Gold Digger, feat. Jamie Foxx. If you’re easily offended by colorful language, that’s probably not a choon for you.
A more complex API
Anyway. Back to APL. Let’s look at a more complex API to examine a large data set. The Cloudant database https://skruger.cloudant.com/airaccidents
contains a large data set drawn from the FAA, listing air accident reports. Cloudant is a db as a service running the open source CouchDB database, a JSON-over-HTTP distributed document store. Let’s pull a few documents from it and see what they look like.
We’re going to hit the _all_docs
endpoint, but as this is a large database, we only want to fetch a few documents. In order to do so, we pass the parameters limit=3
and include_docs=true
on the URL.
We know it’s going to be JSON, as everything in CouchDB is JSON.
#.[JSON object]
· ~ offset rows total_rows
For the _all_docs
API endpoint, the data is returned under rows
, and if we set the include_docs
parameter, each of those entries will have a doc
field, containing the document itself. Let’s look at the first one.
#.[JSON object].[JSON object].[JSON object]
· ~ Country Latitude Location Longitude Make Model Schedule _id _rev ⍙Accident⍙32⍙Number ⍙Aircraft⍙32⍙Category ⍙Aircraft⍙32⍙Damage ⍙Airport⍙32⍙Code ⍙Airport⍙32⍙Name ⍙Air⍙32⍙Carrier ⍙Amateur⍙32⍙Built ⍙Broad⍙32⍙Phase⍙32⍙of⍙32⍙Flight ⍙Engine⍙32⍙Type ⍙Event⍙32⍙Date ⍙Event⍙32⍙Id ⍙FAR⍙32⍙Description ⍙Injury⍙32⍙Severity ⍙Investigation⍙32⍙Type ⍙Number⍙32⍙of⍙32⍙Engines ⍙Publication⍙32⍙Date ⍙Purpose⍙32⍙of⍙32⍙Flight ⍙Registration⍙32⍙Number ⍙Report⍙32⍙Status ⍙Total⍙32⍙Fatal⍙32⍙Injuries ⍙Total⍙32⍙Minor⍙32⍙Injuries ⍙Total⍙32⍙Serious⍙32⍙Injuries ⍙Total⍙32⍙Uninjured ⍙Weather⍙32⍙Condition
Yuck. What is that‽ So the JSON field names aren’t valid APL names, meaning Dyalog had to mangle them when converting to namespaces. We can read them like that if we want to, for example
but it sure hurts the eyes. A handy trick if we want to quickly peer into a nested JSON namespace thing is to… turn it back into JSON, but nicer:
{
"Country": "United States",
"Latitude": "",
"Location": "1/4NM S. OF PEO, OR",
"Longitude": "",
"Make": "BELLANCA",
"Model": "7GCBC",
"Schedule": "",
"_id": "42788bb50806d9cc7770d46953000c99",
"_rev": "1-0f7d34e40be0c3d4db805cf52c4adf42",
"Accident Number": "SEA83FYM01",
"Aircraft Category": "Airplane",
"Aircraft Damage": "Substantial",
"Airport Code": "",
"Airport Name": "",
"Air Carrier": "",
"Amateur Built": "No",
"Broad Phase of Flight": "CLIMB",
"Engine Type": "Reciprocating",
"Event Date": "10/10/1982",
"Event Id": "20020917X05139",
"FAR Description": "Part 91: General Aviation",
"Injury Severity": "Fatal(1)",
"Investigation Type": "Accident",
"Number of Engines": "1",
"Publication Date": "10/10/1983",
"Purpose of Flight": "Personal",
"Registration Number": "N57457",
"Report Status": "Probable Cause",
"Total Fatal Injuries": "1",
"Total Minor Injuries": "1",
"Total Serious Injuries": "0",
"Total Uninjured": "0",
"Weather Condition": "VMC"
}
As before, we could have used GetJSON
instead, and we can utilize the scalar extension behavior of arrays of namespaces to pick out all the embedded docs in the rows
field of the CouchDB _all_docs
response.
A quirk with GetJSON
if you’re used to, say, Python’s requests
library, is that it will encode any parameters given as JSON and pass those in the request body, which isn’t going to work against the CouchDB API, so we need to tag on the parameters on the URL ourselves first:
#.HttpCommand.[JSON object].[JSON object].[JSON object] #.HttpCommand.[JSON object].[JSON object].[JSON object] #.HttpCommand.[JSON object].[JSON object].[JSON object]
The other option is to convert the data to an array instead:
┌─┬──────────────────────┬──────────────────────────────────┬─┐
│0│ │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│total_rows │68389 │3│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│offset │0 │3│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│rows │ │2│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│2│ │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│id │42788bb50806d9cc7770d46953000c99 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│key │42788bb50806d9cc7770d46953000c99 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│value │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│rev │1-0f7d34e40be0c3d4db805cf52c4adf42│4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│doc │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│_id │42788bb50806d9cc7770d46953000c99 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│_rev │1-0f7d34e40be0c3d4db805cf52c4adf42│4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Air Carrier │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Aircraft Category │Airplane │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Injury Severity │Fatal(1) │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Event Id │20020917X05139 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Country │United States │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Airport Name │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Fatal Injuries │1 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Minor Injuries │1 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Uninjured │0 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Latitude │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Amateur Built │No │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Event Date │10/10/1982 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Report Status │Probable Cause │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Airport Code │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│FAR Description │Part 91: General Aviation │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Schedule │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Publication Date │10/10/1983 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Longitude │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Make │BELLANCA │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Model │7GCBC │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Engine Type │Reciprocating │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Serious Injuries│0 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Purpose of Flight │Personal │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Broad Phase of Flight │CLIMB │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Weather Condition │VMC │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Aircraft Damage │Substantial │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Accident Number │SEA83FYM01 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Location │1/4NM S. OF PEO, OR │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Registration Number │N57457 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Number of Engines │1 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Investigation Type │Accident │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│2│ │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│id │42788bb50806d9cc7770d469530012b9 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│key │42788bb50806d9cc7770d469530012b9 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│value │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│rev │1-4aae09af2230b2cbb6ffec5fa2767e56│4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│doc │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│_id │42788bb50806d9cc7770d469530012b9 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│_rev │1-4aae09af2230b2cbb6ffec5fa2767e56│4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Air Carrier │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Aircraft Category │Airplane │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Injury Severity │Non-Fatal │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Event Id │20020917X05078 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Country │United States │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Airport Name │CHEMUNG CO. │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Fatal Injuries │0 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Minor Injuries │0 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Uninjured │2 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Latitude │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Amateur Built │No │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Event Date │10/10/1982 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Report Status │Probable Cause │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Airport Code │ELM │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│FAR Description │Part 91: General Aviation │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Schedule │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Publication Date │10/10/1983 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Longitude │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Make │GLOBE │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Model │SWIFT GC-1B │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Engine Type │Reciprocating │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Serious Injuries│0 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Purpose of Flight │Personal │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Broad Phase of Flight │APPROACH │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Weather Condition │VMC │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Aircraft Damage │Substantial │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Accident Number │NYC83LA007 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Location │ELMIRA, NY │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Registration Number │N50BS │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Number of Engines │1 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Investigation Type │Accident │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│2│ │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│id │42788bb50806d9cc7770d46953001c8c │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│key │42788bb50806d9cc7770d46953001c8c │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│value │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│rev │1-0b3960a8c809e5b33b0cfb122c2fb7bb│4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│3│doc │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│_id │42788bb50806d9cc7770d46953001c8c │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│_rev │1-0b3960a8c809e5b33b0cfb122c2fb7bb│4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Air Carrier │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Aircraft Category │Airplane │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Injury Severity │Non-Fatal │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Event Id │20020917X05079 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Country │United States │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Airport Name │SULLIVAN CO. INT'L │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Fatal Injuries │0 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Minor Injuries │0 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Uninjured │3 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Latitude │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Amateur Built │No │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Event Date │10/10/1982 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Report Status │Probable Cause │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Airport Code │MSV │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│FAR Description │Part 91: General Aviation │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Schedule │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Publication Date │10/10/1983 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Longitude │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Make │CESSNA │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Model │172P │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Engine Type │Reciprocating │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Total Serious Injuries│0 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Purpose of Flight │Personal │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Broad Phase of Flight │MANEUVERING │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Weather Condition │VMC │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Aircraft Damage │Substantial │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Accident Number │NYC83LA008 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Location │GRAHAMSVILLE, NY │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Registration Number │N54177 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Number of Engines │1 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│4│Investigation Type │Accident │4│
└─┴──────────────────────┴──────────────────────────────────┴─┘
Which is actually quite suitable for this data – the documents are completely flat. The documents themselves are at “depth 4”, as indicated by the first column.
The database has a few handy indexes, too, which in CouchDB-speak is called views. Let’s look at a couple of those. The first view allows us to fetch documents based on the make of aircraft. Here’s the first entry in the index where the make of the plane involved was a Cessna:
┌─┬──────────┬────────────────────────────────┬─┐
│0│ │ │1│
├─┼──────────┼────────────────────────────────┼─┤
│1│total_rows│68387 │3│
├─┼──────────┼────────────────────────────────┼─┤
│1│offset │13905 │3│
├─┼──────────┼────────────────────────────────┼─┤
│1│rows │ │2│
├─┼──────────┼────────────────────────────────┼─┤
│2│ │ │1│
├─┼──────────┼────────────────────────────────┼─┤
│3│id │5b97c6d78b17b37ceff620baf9657693│4│
├─┼──────────┼────────────────────────────────┼─┤
│3│key │Cessna │4│
├─┼──────────┼────────────────────────────────┼─┤
│3│value │1 │3│
└─┴──────────┴────────────────────────────────┴─┘
This is a materialised view, keyed on make. The view iteself contains no particularly interesting information beyond the document id and the “value” 1. We can fetch this document by the id:
┌─┬──────────────────────┬──────────────────────────────────┬─┐
│0│ │ │1│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│_id │5b97c6d78b17b37ceff620baf9657693 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│_rev │1-8d2e3c5096718be0236ef51ff7e0f153│4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Air Carrier │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Aircraft Category │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Injury Severity │Non-Fatal │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Event Id │20080611X00825 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Country │United States │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Airport Name │PORTLAND-TROUTDALE │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Total Fatal Injuries │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Total Minor Injuries │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Total Uninjured │1 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Latitude │45.549444 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Amateur Built │No │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Event Date │05/24/2008 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Report Status │Probable Cause │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Airport Code │TTD │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│FAR Description │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Schedule │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Publication Date │06/30/2008 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Longitude │-122.401389 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Make │Cessna │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Model │172P │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Engine Type │Reciprocating │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Total Serious Injuries│ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Purpose of Flight │Instructional │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Broad Phase of Flight │ │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Weather Condition │VMC │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Aircraft Damage │Substantial │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Accident Number │LAX08CA156 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Location │Troutdale, OR │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Registration Number │N62348 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Number of Engines │1 │4│
├─┼──────────────────────┼──────────────────────────────────┼─┤
│1│Investigation Type │Accident │4│
└─┴──────────────────────┴──────────────────────────────────┴─┘
but perhaps more interesting is that we can do aggregations if we enable the reduce
part of the view. We can also exploit the CouchDB API a bit further by using a POST instead, noting that we again treat the body and URL parameters separately. Let’s say we want to find the accident distribution, per make, for a make subset:
As we’re now relying on GetJSON
to encode our parameter list, we no longer need the ugly double-quotes in our list of keys.
Reductions in CouchDB views are similar to reduces in APL. All we did there was a +/
over the values in the view, which as we saw earlier was a “1”, grouping by key:
[
{
"key": "Cessna",
"value": 7728
},
{
"key": "Boeing",
"value": 771
},
{
"key": "Airbus",
"value": 13
}
]
Now we’re running the risk of making this about the CouchDB API, but this is quite an interesting data set. I made a video a long time ago about it and how to process the data with map-reduce using CouchDB, and this was all inspired by a very old blog post from Cloudant founder, Mike Miller.
Other useful bits
You can pass a left argument 1 to HttpCommand
’s functions to inspect what the request would have looked like had it been issued:
GET / HTTP/1.1
Host: api.kanye.rest
Content-Type: application/json
User-Agent: Dyalog/Conga
Accept: */*
Accept-Encoding: gzip, deflate
HttpCommand will strip basic auth params passed on the URL and turn them into a header instead:
GET / HTTP/1.1
Host: example.com
User-Agent: Dyalog/Conga
Accept: */*
Accept-Encoding: gzip, deflate
Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
Web scraping
The Dyalog student competition in 2020 had a web-scraping problem set, problem 3, from Phase 2, asking us to find all URLs referencing PDF-files off the competition website, https://www.dyalog.com/student-competition.htm. There will be spoilers here, so if you want to have a go yourself, stop reading here.
Still here? The suggestion is that we process the data as XML (sigh). Let’s grab that page and see what we can find:
What we get back from ⎕XML
is a matrix with columns for depth, tag, content, attribute and type. We care only about the tag and attribute columns:
To pick out URLS, we need to look at the anchor tags:
Each such tag has a set of attribute key-value pairs. Let’s grab those:
Now we can look through the attribute values to find things that end in .pdf
:
uploads/files/student_competition/2020_problems_phase1.pdf
uploads/files/student_competition/2020_problems_phase2.pdf
uploads/files/student_competition/2019_problems_phase1.pdf
As we can see, these are all relative URLs. To convert to absolute, we need to extract the base
, and prepend that:
https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2019_problems_phase1.pdf
Putting it all together, we get something like
Here’s what we get:
https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2019_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2019_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2018_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2018_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2017_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2017_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2016_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2016_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2015_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2015_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2014_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2014_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2013_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2013_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2012_problems.pdf
https://www.dyalog.com/uploads/files/student_competition/2011_problems.pdf
https://www.dyalog.com/uploads/files/student_competition/2010_problems.pdf
https://www.dyalog.com/uploads/files/student_competition/2009_problems.pdf
For the sake of completeness, we could equally have solved this with some filthy regexing if we were so inclined:
https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2019_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2019_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2018_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2018_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2017_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2017_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2016_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2016_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2015_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2015_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2014_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2014_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2013_problems_phase1.pdf
https://www.dyalog.com/uploads/files/student_competition/2013_problems_phase2.pdf
https://www.dyalog.com/uploads/files/student_competition/2012_problems.pdf
https://www.dyalog.com/uploads/files/student_competition/2011_problems.pdf
https://www.dyalog.com/uploads/files/student_competition/2010_problems.pdf
https://www.dyalog.com/uploads/files/student_competition/2009_problems.pdf
So which one is better? I wrote a bit on this topic on my guest blog post on Dyalog’s blog. In this case, as we know that the page is valid XML, we can delegate a lot of complexity to the ⎕XML
function, such as different quotes, whitespace etc, which we’d need to be explicit about in anything regexy if we wanted it to be robust. However, regular expressions are hard to beat when looking for complex patterns in textual data. If the page had not been correct XML, it would have been a lot harder solving this problem without reaching for regular expressions.