It’s arrays all the way down
Contents
It’s arrays all the way down#
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. –Tony Hoare
Dyalog APL is an array language. Chances are that you have not come across this programming language paradigm before. In APL, the array is the only data structure that’s available to us. In terms of terminology, this can get a bit confusing. Both in mathematics and other programming languages there are many different words used to refer to what might ultimately end up implemented as an array in APL – vector, matrix, list, string etc. We’ll use some of these interchangeably where the problem domain makes this convenient: the APL character array 'hello world'
is most likely intended to be a string.
But first things first: we promised to be explicit about our index origin, so let’s set that to zero upfront:
⎕IO ← 0
A large part of programming in APL becomes a matter of data wrangling. Instead of looping and branching, you use simple primitives to transform your data, perhaps from a big vector into a multi-dimensional array which you then reduce back down along a different axis. As a simple example, say we have a nested vector, and we want to know the sum of all the first elements.
⎕ ← data ← (1 2 3 4) (2 5 8 6) (8 6 2 3) (8 7 6 1)
In naïve Python, we’d say something like
def sumfirst(d):
total = 0
for item in d:
total += item[0]
return total
The APL version turns the vector into a 2D array, sum-reduces (we’ll explain properly what this means later) along the leading axis and picks the first value:
⊃+⌿↑data
Wow. Ok, let’s take that apart. Starting from the right, first step is to “trade depth for rank” – turning the nested vector into a 2D array:
↑data
Now we sum each column, reducing the dimensionality by 1, turning our matrix into a vector:
+⌿↑data
Finally, we pick the first value:
⊃+⌿↑data
Let’s get in shape#
The first concept we need to understand is that data is defined in terms of its shape. The shape defines the dimensionality of our data, and we can investigate that using the APL function ⍴
– the Greek letter “rho”:
⍴5
Oh, ok – nothing? That was rather uninteresting, right? Let’s ask Dyalog to help us a bit by displaying output with borders indicating shape, type and structure, and try that again:
⍝ Some visualisation help, please.
]box on -style=max
⍴5
Well, at least that’s something to go on. That particular output is worth committing to memory. It’s the empty vector, which actually has its own symbol called Zilde, from zero-tilde, a 0
overstruck by ~
– ⍬
.
Note
Expressions that start with a close square bracket, like ]box
are called user commands. We’ll encounter a few of those throughout this book. Typically, user commands will give you a help text if you pass them a -??
. Give that a go with ]box -??
– there is a lot of useful information there.
We can glean a lot of useful information from the boxed output. Here, the ⊖
symbol at the top of the box means that the vector’s axis is of length 0. The ~
at the bottom means that the vector is non-nested, and the 0 in the middle is the empty vector’s prototype element – we can for now think of this as the vector’s type. So we have an empty numerical vector.
Running with ]box -style=max
gets very noisy after a while, so let’s dial that back a bit:
]box on -style=min
We can always get verbose boxing by using the command ]DISPLAY
before any expression.
We can check that the shape of 5 is indeed Zilde by saying:
⍬≡⍴5 ⍝ Does zilde match shape of 5?
So it would seem. What does that mean, then? Well, it means that the scalar 5
has “no shape” – it has no dimensional extent. It is a point, which makes a degree of sense.
Let’s instead consider a vector – in other words, something that has dimensional extent. Here’s a vector:
]DISPLAY 8 5 2 3 ⍝ A vector, yay
We made a vector by simply listing a bunch of scalars with spaces between them. Our verbose visualisation shows the vector as a box with an arrow on top. “Arrow on top” is a notation borrowed from mathematics to denote vectors.
So, what shape does a vector have?
]DISPLAY ⍴ 8 5 2 3 ⍝ What's the shape of my vector?
The shape is a vector with a single element, the integer 4. We suspected already that shape is a vector. A scalar’s shape is an empty vector, and a vector’s shape is a vector containing a single integer representing the number of elements.
Ok, let’s try a matrix – or a 2D array, disregarding for the moment the APL mechanics of its creation:
m ← ↑(1 2 3 4)(5 6 7 8)(9 10 11 12)
]DISPLAY m
That gives us a 3×4 matrix called m
, holding the integers 1-12. APL helps us by now showing two arrows on the surrounding box. What is its shape?
⍴m
The shape vector now has two components – as our matrix has two axes – the first axis, y, of length 3, and the second, x, of length 4. In other words, the shape of the shape tells us something about our array: its dimensionality, or its number of axes:
]DISPLAY ⍴⍴m
The shape-of-shape thing has a name: rank, although it’s usually thought of as the length of the shape vector, as it’s more convenient to have it as a scalar, rather than as a 1-element vector:
≢⍴m ⍝ Rank
Rank and reshape ⍴
#
Rank is a central concept in APL. APL’s functions are said to be rank-polymorphic, meaning that operations extend seamlessly to any-rank arguments where this is possible.
For example:
1 + 1 ⍝ Scalar + scalar
1 + 1 2 3 ⍝ Scalar + vector
1 2 3 + 9 3 2 ⍝ Vector + vector
This is an extraordinarily powerful concept, and one of the reasons why APL leads to such compact code. We’ll return to this time and again.
When used dyadically, the Shape (⍴
) function reshapes data. A typical use for this is to take data from some raw format and use ⍴
to bend it into the shape we want. Here’s an example:
]DISPLAY 2 4⍴⍳8 ⍝ Reshape vector to 2×4 matrix
A couple of things to note here. First, the iota primitive ⍳
, which is a function called Index generator. It’s what might be called range in other languages. It makes a sequence of consecutive integers, starting at the current index origin, ⎕IO
, which we set to zero in the beginning of this chapter, as you may recall.
In the example above, it creates a vector of 8 consecutive integers, starting with zero: 0 1 2 3 4 5 6 7
. We then reshape this into a 2×4 array.
Here’s another example:
]DISPLAY v ← 8 6 2 9 ⍝ Vector of length 4
]DISPLAY m ← 1 4⍴v ⍝ Reshape vector to 1×4 matrix
In case the difference isn’t obvious, let’s look at the ranks:
≢⍴v
≢⍴m
v≡m ⍝ Does v match m?
This is an issue that bites everyone at some stage of their APL journey. We can see from the rank that v
is a vector – rank 1, but m
is a 1×4 matrix – rank 2, even if they look the same. The visualisation shows this with two arrows on the box for m
, but only one arrow on the box for v
. If we switch off the verbose visualisation, it gets much trickier to spot:
v
m
v≡m ⍝ ?????
We can conclude that higher arrays is a first-order concept in APL. A 2D array in APL is not a list-of-lists, as it would be in, for example, Python.
The other thing that is worth re-highlighting is how the array axes are enumerated in APL. We saw from a previous example that the shape of a 2D array lists y before x in the shape vector:
⎕ ← m ← 2 4⍴0 1 2 3 4 5 6 7 ⍝ y=2, x=4
⍴m
This idea extends to higher dimensions, too. A 3D matrix would have its shape vector represent z, y and x axes in that order. A 57D matrix would.. you get the point.
Enclose, disclose ⊂⊃
#
Arrays in Dyalog APL are always collections of scalars, regardless of rank. However, we can create arbitrarily complex scalars by a process known as enclosing. This means putting something in a “box”. It looks like so:
⎕ ← v ← 1 2 3
⊂v ⍝ Enclose the vector 1 2 3
v≡⊂v ⍝ Does the vector v match the enclosed v? Of course not!
We can check that enclosing creates a scalar by inspecting the rank:
≢⍴v ⍝ Rank of vector should be 1
≢⍴⊂v ⍝ Rank of scalar should be 0
This actually allows us to create a Python-style “list of lists” – or in APL terms, a vector of enclosed vectors:
]DISPLAY v ← (1 2 3) (2 3 4) (2 3 4)
See the little epsilon ∊
in the lower edge of the box? It indicates that the elements of the vector are enclosed. We’ll discuss indexing in depth later on, but for now, let’s get a cell out of that vector:
v[1] ⍝ Get position 1 -- we get back an enclosed vector
We can create arbitrary levels of nesting:
(1 ('hello' 2)) (3 ('world' 4))
Tip
Non-simple, a.k.a nested, arrays can be detrimental to APL performance!
It’s best to bear this in mind from the beginning: if you can, choose multiple simple vectors, rather than a single vector of complex scalars.