HttpCommand#

Ignorance more frequently begets confidence than does knowledge: it is those who know little, not those who know much, who so positively assert that this or that problem will never be solved by science. –Charles Darwin

⎕IO  0
]box on
]rows on
Was ON
Was OFF

Ah yes, the web. I’m sure you’ve heard of it. Dyalog has a nifty http client library built in, called HttpCommand. In order to make us of it, we first need to load it up:

hc  ⎕SE.SALT.Load'HttpCommand'

This loads the HttpCommand class, calling it hc. We could also have used the Dyalog user command ]load HttpCommand, which loads it as HttpCommand – but who wants to type all that? The above approach is also usable programmatically.

Kanye.rest#

There is a handy web service delivering random Kanye West quotations we can put to good use, kanye.rest, to demonstrate this. Kanye as a Service?

  resp  hc.Get 'https://api.kanye.rest/'
[rc: 0 | msg: "" | HTTP Status: 200 "OK" | ⍴Data: 25]

What did we get back? Let’s look in the headers to start with:

{([;0]∊⊂'Content-Type')} resp.Headers
┌────────────┬────────────────┐ │Content-Type│application/json│ └────────────┴────────────────┘

We know JSON; good. Let’s unpack that.

body  ⎕JSON resp.Data

We can use the handy user command ]map to show what the inside of a namespace looks like (and yes, there is no way you’d ever discover its existence without being told):

]map body  ⍝ list the fields
#.[JSON object] · ~ quote

So the payload we’re interested in is in the quote field of the body namespace:

body.quote
Manga all day

If we already know we’re dealing with JSON, there is a handy shortcut, called GetJSON:

(hc.GetJSON 'GET' 'https://api.kanye.rest/').Data.quote
There are people sleeping in parking lots

GetJSON will do a couple of things for us behind the scenes. It will unpack the JSON body of the http response. If we’re POSTing to the url, it will also treat a parameter namespace as the JSON body of the request; we’ll look at that in more detail below. One last pearl of wisdom from Kanye:

(hc.GetJSON 'GET' 'https://api.kanye.rest/').Data.quote
I feel calm but energized

Thanks for that, Kanye. Here’s the seminal Gold Digger, feat. Jamie Foxx. If you’re easily offended by colorful language, that’s probably not a choon for you.

A more complex API#

Anyway. Back to APL. Let’s look at a more complex API to examine a large data set. The Cloudant database https://skruger.cloudant.com/airaccidents contains a large data set drawn from the FAA, listing air accident reports. Cloudant is a db as a service running the open source CouchDB database, a JSON-over-HTTP distributed document store. Let’s pull a few documents from it and see what they look like.

We’re going to hit the _all_docs endpoint, but as this is a large database, we only want to fetch a few documents. In order to do so, we pass the parameters limit=3 and include_docs=true on the URL.

url  'https://skruger.cloudant.com/airaccidents/_all_docs'
(params⎕NS).(include_docs limit)  'true' 3
resp  hc.Get url params

We know it’s going to be JSON, as everything in CouchDB is JSON.

body  ⎕JSON resp.Data
]map body
#.[JSON object] · ~ offset rows total_rows

For the _all_docs API endpoint, the data is returned under rows, and if we set the include_docs parameter, each of those entries will have a doc field, containing the document itself. Let’s look at the first one.

]map body.rows[0].doc
#.[JSON object].[JSON object].[JSON object] · ~ Country Latitude Location Longitude Make Model Schedule _id _rev ⍙Accident⍙32⍙Number ⍙Aircraft⍙32⍙Category ⍙Aircraft⍙32⍙Damage ⍙Airport⍙32⍙Code ⍙Airport⍙32⍙Name ⍙Air⍙32⍙Carrier ⍙Amateur⍙32⍙Built ⍙Broad⍙32⍙Phase⍙32⍙of⍙32⍙Flight ⍙Engine⍙32⍙Type ⍙Event⍙32⍙Date ⍙Event⍙32⍙Id ⍙FAR⍙32⍙Description ⍙Injury⍙32⍙Severity ⍙Investigation⍙32⍙Type ⍙Number⍙32⍙of⍙32⍙Engines ⍙Publication⍙32⍙Date ⍙Purpose⍙32⍙of⍙32⍙Flight ⍙Registration⍙32⍙Number ⍙Report⍙32⍙Status ⍙Total⍙32⍙Fatal⍙32⍙Injuries ⍙Total⍙32⍙Minor⍙32⍙Injuries ⍙Total⍙32⍙Serious⍙32⍙Injuries ⍙Total⍙32⍙Uninjured ⍙Weather⍙32⍙Condition

Yuck. What is that‽ So the JSON field names aren’t valid APL names, meaning Dyalog had to mangle them when converting to namespaces. We can read them like that if we want to, for example

body.rows[0].doc.⍙Event⍙32⍙Date
10/10/1982

but it sure hurts the eyes. A handy trick if we want to quickly peer into a nested JSON namespace thing is to… turn it back into JSON, but nicer:

1(⎕JSON'Compact' 0)body.rows[0].doc
{ "Country": "United States", "Latitude": "", "Location": "1/4NM S. OF PEO, OR", "Longitude": "", "Make": "BELLANCA", "Model": "7GCBC", "Schedule": "", "_id": "42788bb50806d9cc7770d46953000c99", "_rev": "1-0f7d34e40be0c3d4db805cf52c4adf42", "Accident Number": "SEA83FYM01", "Aircraft Category": "Airplane", "Aircraft Damage": "Substantial", "Airport Code": "", "Airport Name": "", "Air Carrier": "", "Amateur Built": "No", "Broad Phase of Flight": "CLIMB", "Engine Type": "Reciprocating", "Event Date": "10/10/1982", "Event Id": "20020917X05139", "FAR Description": "Part 91: General Aviation", "Injury Severity": "Fatal(1)", "Investigation Type": "Accident", "Number of Engines": "1", "Publication Date": "10/10/1983", "Purpose of Flight": "Personal", "Registration Number": "N57457", "Report Status": "Probable Cause", "Total Fatal Injuries": "1", "Total Minor Injuries": "1", "Total Serious Injuries": "0", "Total Uninjured": "0", "Weather Condition": "VMC" }

As before, we could have used GetJSON instead, and we can utilize the scalar extension behavior of arrays of namespaces to pick out all the embedded docs in the rows field of the CouchDB _all_docs response.

A quirk with GetJSON if you’re used to, say, Python’s requests library, is that it will encode any parameters given as JSON and pass those in the request body, which isn’t going to work against the CouchDB API, so we need to tag on the parameters on the URL ourselves first:

(hc.GetJSON 'GET' (url,'?include_docs=true&limit=3')).Data.rows.doc
#.HttpCommand.[JSON object].[JSON object].[JSON object] #.HttpCommand.[JSON object].[JSON object].[JSON object] #.HttpCommand.[JSON object].[JSON object].[JSON object]

The other option is to convert the data to an array instead:

  body  ⎕JSON'M'  resp.Data
┌─┬──────────────────────┬──────────────────────────────────┬─┐ │0│ │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│total_rows │68389 │3│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│offset │0 │3│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│rows │ │2│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │2│ │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│id │42788bb50806d9cc7770d46953000c99 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│key │42788bb50806d9cc7770d46953000c99 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│value │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│rev │1-0f7d34e40be0c3d4db805cf52c4adf42│4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│doc │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│_id │42788bb50806d9cc7770d46953000c99 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│_rev │1-0f7d34e40be0c3d4db805cf52c4adf42│4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Air Carrier │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Aircraft Category │Airplane │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Injury Severity │Fatal(1) │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Event Id │20020917X05139 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Country │United States │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Airport Name │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Fatal Injuries │1 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Minor Injuries │1 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Uninjured │0 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Latitude │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Amateur Built │No │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Event Date │10/10/1982 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Report Status │Probable Cause │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Airport Code │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│FAR Description │Part 91: General Aviation │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Schedule │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Publication Date │10/10/1983 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Longitude │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Make │BELLANCA │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Model │7GCBC │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Engine Type │Reciprocating │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Serious Injuries│0 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Purpose of Flight │Personal │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Broad Phase of Flight │CLIMB │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Weather Condition │VMC │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Aircraft Damage │Substantial │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Accident Number │SEA83FYM01 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Location │1/4NM S. OF PEO, OR │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Registration Number │N57457 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Number of Engines │1 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Investigation Type │Accident │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │2│ │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│id │42788bb50806d9cc7770d469530012b9 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│key │42788bb50806d9cc7770d469530012b9 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│value │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│rev │1-4aae09af2230b2cbb6ffec5fa2767e56│4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│doc │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│_id │42788bb50806d9cc7770d469530012b9 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│_rev │1-4aae09af2230b2cbb6ffec5fa2767e56│4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Air Carrier │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Aircraft Category │Airplane │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Injury Severity │Non-Fatal │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Event Id │20020917X05078 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Country │United States │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Airport Name │CHEMUNG CO. │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Fatal Injuries │0 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Minor Injuries │0 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Uninjured │2 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Latitude │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Amateur Built │No │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Event Date │10/10/1982 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Report Status │Probable Cause │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Airport Code │ELM │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│FAR Description │Part 91: General Aviation │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Schedule │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Publication Date │10/10/1983 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Longitude │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Make │GLOBE │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Model │SWIFT GC-1B │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Engine Type │Reciprocating │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Serious Injuries│0 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Purpose of Flight │Personal │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Broad Phase of Flight │APPROACH │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Weather Condition │VMC │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Aircraft Damage │Substantial │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Accident Number │NYC83LA007 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Location │ELMIRA, NY │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Registration Number │N50BS │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Number of Engines │1 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Investigation Type │Accident │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │2│ │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│id │42788bb50806d9cc7770d46953001c8c │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│key │42788bb50806d9cc7770d46953001c8c │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│value │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│rev │1-0b3960a8c809e5b33b0cfb122c2fb7bb│4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │3│doc │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│_id │42788bb50806d9cc7770d46953001c8c │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│_rev │1-0b3960a8c809e5b33b0cfb122c2fb7bb│4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Air Carrier │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Aircraft Category │Airplane │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Injury Severity │Non-Fatal │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Event Id │20020917X05079 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Country │United States │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Airport Name │SULLIVAN CO. INT'L │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Fatal Injuries │0 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Minor Injuries │0 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Uninjured │3 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Latitude │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Amateur Built │No │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Event Date │10/10/1982 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Report Status │Probable Cause │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Airport Code │MSV │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│FAR Description │Part 91: General Aviation │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Schedule │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Publication Date │10/10/1983 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Longitude │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Make │CESSNA │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Model │172P │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Engine Type │Reciprocating │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Total Serious Injuries│0 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Purpose of Flight │Personal │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Broad Phase of Flight │MANEUVERING │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Weather Condition │VMC │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Aircraft Damage │Substantial │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Accident Number │NYC83LA008 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Location │GRAHAMSVILLE, NY │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Registration Number │N54177 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Number of Engines │1 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │4│Investigation Type │Accident │4│ └─┴──────────────────────┴──────────────────────────────────┴─┘

Which is actually quite suitable for this data – the documents are completely flat. The documents themselves are at “depth 4”, as indicated by the first column.

The database has a few handy indexes, too, which in CouchDB-speak is called views. Let’s look at a couple of those. The first view allows us to fetch documents based on the make of aircraft. Here’s the first entry in the index where the make of the plane involved was a Cessna:

url  'https://skruger.cloudant.com/airaccidents/_design/make/_view/by-make'
(params  ⎕NS).(limit reduce key)  1 'false' '"Cessna"'
resp  hc.Get url params
  body  ⎕JSON'M'  resp.Data
┌─┬──────────┬────────────────────────────────┬─┐ │0│ │ │1│ ├─┼──────────┼────────────────────────────────┼─┤ │1│total_rows│68387 │3│ ├─┼──────────┼────────────────────────────────┼─┤ │1│offset │13905 │3│ ├─┼──────────┼────────────────────────────────┼─┤ │1│rows │ │2│ ├─┼──────────┼────────────────────────────────┼─┤ │2│ │ │1│ ├─┼──────────┼────────────────────────────────┼─┤ │3│id │5b97c6d78b17b37ceff620baf9657693│4│ ├─┼──────────┼────────────────────────────────┼─┤ │3│key │Cessna │4│ ├─┼──────────┼────────────────────────────────┼─┤ │3│value │1 │3│ └─┴──────────┴────────────────────────────────┴─┘

This is a materialised view, keyed on make. The view iteself contains no particularly interesting information beyond the document id and the “value” 1. We can fetch this document by the id:

url  'https://skruger.cloudant.com/airaccidents/5b97c6d78b17b37ceff620baf9657693'
resp  hc.Get url
  body  ⎕JSON'M'  resp.Data
┌─┬──────────────────────┬──────────────────────────────────┬─┐ │0│ │ │1│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│_id │5b97c6d78b17b37ceff620baf9657693 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│_rev │1-8d2e3c5096718be0236ef51ff7e0f153│4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Air Carrier │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Aircraft Category │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Injury Severity │Non-Fatal │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Event Id │20080611X00825 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Country │United States │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Airport Name │PORTLAND-TROUTDALE │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Total Fatal Injuries │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Total Minor Injuries │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Total Uninjured │1 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Latitude │45.549444 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Amateur Built │No │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Event Date │05/24/2008 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Report Status │Probable Cause │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Airport Code │TTD │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│FAR Description │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Schedule │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Publication Date │06/30/2008 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Longitude │-122.401389 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Make │Cessna │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Model │172P │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Engine Type │Reciprocating │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Total Serious Injuries│ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Purpose of Flight │Instructional │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Broad Phase of Flight │ │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Weather Condition │VMC │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Aircraft Damage │Substantial │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Accident Number │LAX08CA156 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Location │Troutdale, OR │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Registration Number │N62348 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Number of Engines │1 │4│ ├─┼──────────────────────┼──────────────────────────────────┼─┤ │1│Investigation Type │Accident │4│ └─┴──────────────────────┴──────────────────────────────────┴─┘

but perhaps more interesting is that we can do aggregations if we enable the reduce part of the view. We can also exploit the CouchDB API a bit further by using a POST instead, noting that we again treat the body and URL parameters separately. Let’s say we want to find the accident distribution, per make, for a make subset:

url  'https://skruger.cloudant.com/airaccidents/_design/make/_view/by-make?group=true'
(params  ⎕NS).keys  'Cessna' 'Boeing' 'Airbus' ⍝ Request body payload; will be JSON-encoded
body  (hc.GetJSON 'POST' url params).Data

As we’re now relying on GetJSON to encode our parameter list, we no longer need the ugly double-quotes in our list of keys.

Reductions in CouchDB views are similar to reduces in APL. All we did there was a +/ over the values in the view, which as we saw earlier was a “1”, grouping by key:

1(⎕JSON'Compact' 0)body.rows
[ { "key": "Cessna", "value": 7728 }, { "key": "Boeing", "value": 771 }, { "key": "Airbus", "value": 13 } ]

Now we’re running the risk of making this about the CouchDB API, but this is quite an interesting data set. I made a video a long time ago about it and how to process the data with map-reduce using CouchDB, and this was all inspired by a very old blog post from Cloudant founder, Mike Miller.

Other useful bits#

You can pass a left argument 1 to HttpCommand’s functions to inspect what the request would have looked like had it been issued:

1 hc.GetJSON 'GET' 'https://api.kanye.rest/'
GET / HTTP/1.1 Host: api.kanye.rest Content-Type: application/json User-Agent: Dyalog/Conga Accept: */* Accept-Encoding: gzip, deflate

HttpCommand will strip basic auth params passed on the URL and turn them into a header instead:

1 hc.Get 'https://username:password@example.com'
GET / HTTP/1.1 Host: example.com User-Agent: Dyalog/Conga Accept: */* Accept-Encoding: gzip, deflate Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=

Web scraping#

The Dyalog student competition in 2020 had a web-scraping problem set, problem 3, from Phase 2, asking us to find all URLs referencing PDF-files off the competition website, https://www.dyalog.com/student-competition.htm. There will be spoilers here, so if you want to have a go yourself, stop reading here.

Still here? The suggestion is that we process the data as XML (sigh). Let’s grab that page and see what we can find:

page  (_hc.Get 'https://www.dyalog.com/student-competition.htm').Data
xml  ⎕XML page

What we get back from ⎕XML is a matrix with columns for depth, tag, content, attribute and type. We care only about the tag and attribute columns:

(tags attributes)  (1 3)⌷↓⍉xml

To pick out URLS, we need to look at the anchor tags:

anchors  ((,'a')¨tags)/attributes

Each such tag has a set of attribute key-value pairs. Let’s grab those:

(names vals)  ↓⍉⊃⍪/anchors

Now we can look through the attribute values to find things that end in .pdf:

pdfs  vals/⍨{'.pdf'¯4}¨vals
3pdfs
uploads/files/student_competition/2020_problems_phase1.pdf uploads/files/student_competition/2020_problems_phase2.pdf uploads/files/student_competition/2019_problems_phase1.pdf

As we can see, these are all relative URLs. To convert to absolute, we need to extract the base, and prepend that:

  base  ⊃⌽⊃('base'¨tags)/attributes
https://www.dyalog.com/
3base,¨pdfs
https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2019_problems_phase1.pdf

Putting it all together, we get something like

]dinput
PastTasks  {
    (tags attributes)  (1 3)⌷↓⍉⎕XML(hc.Get ).Data
    (names vals)  ↓⍉⊃⍪/((,'a')¨tags)/attributes ⍝ Names and values of attributes of anchor tags
    pdfs  vals/⍨{'.pdf'¯4}¨vals
    base  ⊃⌽⊃('base'¨tags)/attributes
    base,¨pdfs
}

Here’s what we get:

PastTasks 'https://www.dyalog.com/student-competition.htm'
https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2019_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2019_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2018_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2018_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2017_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2017_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2016_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2016_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2015_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2015_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2014_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2014_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2013_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2013_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2012_problems.pdf https://www.dyalog.com/uploads/files/student_competition/2011_problems.pdf https://www.dyalog.com/uploads/files/student_competition/2010_problems.pdf https://www.dyalog.com/uploads/files/student_competition/2009_problems.pdf

For the sake of completeness, we could equally have solved this with some filthy regexing if we were so inclined:

]dinput
PastTasksRE  {
    body  (hc.Get ).Data
    pdfs  '<a href="(.+?\.pdf)"'⎕S'\1'body
    base  '<base href="(.+?)"'⎕S'\1'body
    base,¨pdfs
}
PastTasksRE 'https://www.dyalog.com/student-competition.htm'
https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2020_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2019_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2019_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2018_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2018_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2017_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2017_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2016_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2016_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2015_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2015_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2014_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2014_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2013_problems_phase1.pdf https://www.dyalog.com/uploads/files/student_competition/2013_problems_phase2.pdf https://www.dyalog.com/uploads/files/student_competition/2012_problems.pdf https://www.dyalog.com/uploads/files/student_competition/2011_problems.pdf https://www.dyalog.com/uploads/files/student_competition/2010_problems.pdf https://www.dyalog.com/uploads/files/student_competition/2009_problems.pdf

So which one is better? I wrote a bit on this topic on my guest blog post on Dyalog’s blog. In this case, as we know that the page is valid XML, we can delegate a lot of complexity to the ⎕XML function, such as different quotes, whitespace etc, which we’d need to be explicit about in anything regexy if we wanted it to be robust. However, regular expressions are hard to beat when looking for complex patterns in textual data. If the page had not been correct XML, it would have been a lot harder solving this problem without reaching for regular expressions.