# Iteration

> As long as I'm alive, APL will never be used in Munich --_Fritz Bauer_<br>
> Nor in Holland --_Edsger Dijkstra (as told by Alan Perlis)_

We started all this by claiming that there are no loops in APL. This is of course not entirely true: there are plenty of ways of achieving iteration, some of which are more efficient than others. 

In order to get the best possible performance out of APL, it's worth seeking data-parallel algorithms, typically employing Boolean masks. However, that's not always possible, and sometimes raw performance matters less than keeping the code simple; a more iterative solution can then be both clearer and fast enough.

We have at least four, maybe five different kinds of iteration mechanisms at our disposal: _Each_ (`¨`), _Reduce_ (`⌿`), _Del_ (`∇`) and _Power_ (`⍣`). The fifth is that of _scalar pervasion_, which can either be seen as a way of achieving iteration, or as a way of _avoiding_ iteration, depending on your point of view. Wait, is it six? Maybe we should count _Rank_ (`⍤`), too? _Rank_ deserves its own separate [section](./rank.ipynb)! Oh, and _Scan_ (`⍀`), don't forget _Scan_! 

Let's introduce them.

In [4]:
⎕IO ← 0
]box on
]rows on
assert ← {⍺ ← 'assertion failure' ⋄ 0∊⍵: ⍺ ⎕signal 8 ⋄ shy ← 0}

## Each (a.k.a map): `¨`

Most languages nowadays have a `map` construct. In fact, it's occasionally touted -- erroneously -- as sufficient evidence that a language is "functional" if it has a map function.

Perhaps you've seen Python's somewhat cumbersome version of map:

```python
>>> list(map(lambda x: x*x, [1, 2, 3, 4, 5, 6, 7, 8, 9])) # Square elements
[1, 4, 9, 16, 25, 36, 49, 64, 81]
```
which in Python corresponds to something like
```python
def mymap(func, iterable):
    for item in iterable:
        yield func(item)
```

Of course, seasoned Pythonistas would rightly frown at the above and instead recommend a _list comprehension_.

A map takes a function and applies it to every element in an array, creating a result array of the same length as its argument.

In APL, the glyph for map -- referred to as [_Each_](https://help.dyalog.com/latest/#Language/Primitive%20Operators/Each%20with%20Monadic%20Operand.htm) -- is two high dots: `¨`.

In [14]:
×⍨¨1+⍳9  ⍝ Square elements via each (but see below!)

That wasn't actually a great example, as this is one of those cases where a scalar function already does the job for us:

In [13]:
×⍨1+⍳9  ⍝ Square elements via scalar pervasion

Scalar pervasion means that functions in certain cases already know how to penetrate arrays.

Let's try another. Given a nested vector, what are the lengths of the elements?

In [2]:
⎕ ← V ← (1 2 3 4)(1 2)(3 4 5 6 7)(,2)(5 4 3 2 1)
≢¨V ⍝ Tally-each
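
For readers coming from Python, a rough counterpart of tally-each is a list comprehension that calls `len` on each item. This is only a sketch: the list `V` mirrors the APL vector above, and Python lists are of course not APL arrays.

```python
# The nested vector from the APL cell, written as a list of lists
V = [[1, 2, 3, 4], [1, 2], [3, 4, 5, 6, 7], [2], [5, 4, 3, 2, 1]]

# Apply len to every item, much like ≢¨V
lengths = [len(item) for item in V]
print(lengths)  # [4, 2, 5, 1, 5]
```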

If you're used to a different language, _Each_ has a seductive quality, as it maps conceptually onto constructs you already know how to use. Beware though that in APL it's often inefficient, and that there are usually better alternatives.

## Reduce (a.k.a foldr): `⌿`, `/`

Most languages today sport some variety of _fold_ or _reduce_, regardless of the level of "functional" they claim to be.

In APL, [_Reduce_](https://help.dyalog.com/latest/#Language/Primitive%20Operators/Reduce.htm) is a central feature, which somewhat unhelpfully hijacks the glyph also used by _Compress/Replicate_, `/`.  This is one, albeit not the main, reason why we prefer to use its close cousin, [_Reduce first_](https://help.dyalog.com/latest/#Language/Primitive%20Operators/Reduce%20First.htm), `⌿`, where we can. Reduction has a bit of an unfair reputation of being hard to understand. Guido reportedly hates `reduce()` in Python so much that it was demoted down to the dusty [functools](https://docs.python.org/3/library/functools.html#functools.reduce) module:

> So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly. --_Guido van Rossum_

Despite what Guido thinks, reduction is actually a pretty simple idea, and APL may even have been the first programming language [with reduce in it](https://www.jsoftware.com/papers/APL1.htm#1.8) (some Lispers disagree). Think of the operation of summing a bunch of numbers -- this is an example of a reduction.

In APL, _Reduce_ applies a function to elements in an array, producing a result which is rank-reduced by 1. In other words, reducing a vector (rank 1) produces a scalar (rank 0). In the example of `+`, summing the elements of a vector obviously produces a scalar: the total.

In [4]:
+⌿1 2 3 4 5 6 7 8 9 ⍝ sum-reduce-first integers 1-9

Simplifying a bit, we can think of _Reduce first_ as an operator that injects its left operand function in the gaps between elements of the argument:

In [29]:
1+2+3+4+5+6+7+8+9

When using _Reduce_ in APL, you need to take extra care to account for its strict right-to-left evaluation order. A reduce is also called a _fold_ in other languages (Lisp, Erlang etc), and APL's _Reduce_ is a so-called _foldr_ -- it reduces right to left, which makes sense for APL, but occasionally less sense for the programmer.

Again, it can help to write it out longhand to see what's going on:

In [23]:
-⌿1 2 3 4 5 6 7 8 9 ⍝ difference-reduction -- take care: right to left fold!

If that was the result you expected, you're well on your way to mastery. Inject the operand between items:

In [28]:
1-2-3-4-5-6-7-8-9
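
If you know Python's `functools.reduce`, note that it folds from the left, so it gives a different answer for subtraction. The sketch below contrasts the two; the right fold is built by hand by walking the reversed list, which is just one way to do it.

```python
from functools import reduce

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Left fold, which is what functools.reduce does: ((1-2)-3)-...-9
left = reduce(lambda acc, x: acc - x, xs)

# Right fold, as in APL's -⌿: 1-(2-(3-...-(8-9)))
# Walk the items from the right, keeping the accumulator as the right operand.
right = reduce(lambda acc, x: x - acc, reversed(xs))

print(left)   # -43
print(right)  # 5
```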

Reduction is especially useful when working with higher-rank arrays. _Reduce first_ is called so because it reduces along the _first_ axis. So a sum-reduce-first of a rank 2 integer array will sum its columns to produce a _vector_ (rank 1) of the columnar sums:

In [3]:
⎕ ← m ← 3 3⍴9?9
+⌿m

If we want to sum-reduce along the rows instead, we can either use `/` (which for historical reasons does just that):

In [32]:
+/m

or we can explicitly tell `⌿` to apply along a different axis, using bracket axis:

In [33]:
+⌿[1]m

For consistency, it's best to prefer operators and functions that default to applying to the leading axis where possible. The fact that APL, unlike J, has a mixture is an unhelpful side-effect of backwards compatibility.
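
To make the two axes concrete, here is a small Python sketch using a list of rows. The matrix values are made up for illustration (the APL cell uses a random deal), and plain lists stand in for a real rank-2 array.

```python
# A 3x3 matrix as a list of rows (illustrative values only)
m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Reduce along the first axis, like +⌿m: one sum per column
column_sums = [sum(col) for col in zip(*m)]

# Reduce along the last axis, like +/m: one sum per row
row_sums = [sum(row) for row in m]

print(column_sums)  # [12, 15, 18]
print(row_sums)     # [6, 15, 24]
```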

## Windowed reduction

_Reduce_ has a few more handy tricks up its sleeve. Instead of reducing the whole argument array, we can employ a sliding window. This lets us compute a set of reductions over shorter stretches of the data. The derived function returned by the reduction operators can be called dyadically, specifying as the left argument the size of the sliding window. 

For example, to calculate the sum of each element in a vector with its subsequent element, we employ a reduction with a sliding window of size 2:

In [6]:
2+⌿1 2 3 4 5 6 7 8 9
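
The sliding window can be sketched in Python as follows. The helper name `windowed_reduce` is our own invention, and it only covers vectors with a positive window size.

```python
from functools import reduce

def windowed_reduce(f, size, xs):
    """Fold f, right to left, over every run of `size` adjacent items."""
    def foldr(window):
        return reduce(lambda acc, x: f(x, acc), reversed(window))
    return [foldr(xs[i:i + size]) for i in range(len(xs) - size + 1)]

def add(a, b):
    return a + b

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(windowed_reduce(add, 2, xs))  # [3, 5, 7, 9, 11, 13, 15, 17]
print(windowed_reduce(add, 3, xs))  # [6, 9, 12, 15, 18, 21, 24]
```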

## Scan: `\, ⍀`

[_Scan/Scan first_](https://help.dyalog.com/latest/#Language/Primitive%20Operators/Scan.htm) blurs the distinction between _Each_ and _Reduce_. In the right hands, it can be a true APL super power, but beware that scans tend to be slow: most scans run to O(n²), although the interpreter can optimise some to the O(n) you perhaps expected.

_Scan_ is just like _Reduce_, but instead returns every intermediate state, not just the end state. A sum-reduce of a vector of numbers returns the sum total. A sum-scan of the same vector returns the running sums:

In [1]:
+⌿1 2 3 4 5 6 7 8 9 ⍝ Sum-reduce first
+⍀1 2 3 4 5 6 7 8 9 ⍝ Sum-scan first

One way we can think of scan is that it's the amalgamation of all possible calls to _Reduce_ with the same operand, taking in increasing lengths of the argument array. In the case above:

In [4]:
+⌿1
+⌿1 2
+⌿1 2 3
+⌿1 2 3 4
+⌿1 2 3 4 5
+⌿1 2 3 4 5 6
+⌿1 2 3 4 5 6 7 
+⌿1 2 3 4 5 6 7 8
+⌿1 2 3 4 5 6 7 8 9

As with _Reduce_, it's worth re-emphasizing that _Scan_ is still evaluated right to left, as with everything else in APL, no matter how much you'd prefer it to run left to right instead. You can, of course, roll your own [scan-left](https://aplcart.info/?q=left%20scan#) if you really need it.
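
The "reduce every prefix" view, and the difference from a left-to-right scan, can be sketched in Python. The helpers `foldr` and `scan` are our own names; `itertools.accumulate` is the standard-library left scan, shown for contrast.

```python
from functools import reduce
from itertools import accumulate
import operator

def foldr(f, xs):
    """Right fold: f(x0, f(x1, ... f(x[n-2], x[n-1])))."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs))

def scan(f, xs):
    """Reduce every non-empty prefix of xs; naive, so O(n^2) calls to f."""
    return [foldr(f, xs[:n]) for n in range(1, len(xs) + 1)]

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(scan(operator.add, xs))                       # [1, 3, 6, 10, 15, 21, 28, 36, 45]

# For a non-associative function the direction matters:
print(scan(operator.sub, [1, 2, 3, 4]))             # [1, -1, 2, -2]
print(list(accumulate([1, 2, 3, 4], operator.sub))) # [1, -1, -4, -8]
```

The naive version also shows where the quadratic cost comes from: every prefix is folded from scratch.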

## Power: `⍣`

One of my favourite glyphs! It looks like a happy starfish! The [_Power_](https://help.dyalog.com/latest/#Language/Primitive%20Operators/Power%20Operator.htm) operator is... powerful. Conceptually it should be easy to grasp, but there are some aspects that take time to understand. Formally, it's defined as a function repeatedly applied to the output of itself, until some stopping criterion is fulfilled. If you pass it an integer as its right operand, it's basically a for-loop:

```apl
f ← { ... }               ⍝ some function or other
f f f f f f f f argvector ⍝ repeatedly apply function to its own result, here 8 times
f⍣8 ⊢ argvector           ⍝ power-8
```

If you give it a function as the right operand, it can be used as a while-loop. One example is to find a function's _fixed point_:

In [7]:
2÷⍨⍣=10 ⍝ Divide by 2 until we reach a fixed point

Here the right operand function is _equals_ `=`. This says: repeatedly apply the left operand (`2÷⍨`) until two subsequent applications return the same value.

We can explicitly refer to the left and right arguments of the right operand function. The left argument, `⍺`, refers to the result of the function application to the right argument, `⍵`. 
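
Both flavours of _Power_ can be sketched in Python. The names `power` and `power_until` are our own, and the sketch uses exact floating-point equality, whereas APL's `=` is tolerant, so treat it as an approximation of the behaviour rather than a faithful model.

```python
def power(f, n):
    """Return a function that applies f n times, like f⍣n."""
    def run(x):
        for _ in range(n):
            x = f(x)
        return x
    return run

def power_until(f, stop):
    """Apply f until stop(new, old) is true, like f⍣g; return the new value."""
    def run(x):
        while True:
            new = f(x)
            if stop(new, x):   # new plays the role of ⍺, x the role of ⍵
                return new
            x = new
    return run

def halve(x):
    return x / 2

def same(new, old):
    return new == old

print(power(halve, 3)(80))           # 10.0
print(power_until(halve, same)(10))  # 0.0, the fixed point of halving
```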

Keep generating random numbers until we get a 6. Note that with `⎕IO←0`, as set at the top, `?10` picks from 0 to 9 rather than from 1 to 10:

In [2]:
{⍞ ← ?10}⍣{6=⍺} 0 ⍝ Keep generating random numbers (0-9 under ⎕IO←0) until we get a 6

The [_Quad-quote-gets_](https://help.dyalog.com/latest/#Language/System%20Functions/Character%20Input%20Output.htm) (`⍞←`) combo prints values without newlines. The final result is also returned, the expected 6.

## Del: `∇`

Dyalog has a most excellent, concise and efficient [recursion operator](https://help.dyalog.com/latest/#Language/Defined%20Functions%20and%20Operators/DynamicFunctions/Recursion.htm), `∇`. It allows you to express recursive algorithms in a natural, almost Lisp-like fashion. The interpreter has a very good [TCO](https://en.wikipedia.org/wiki/Tail_call) implementation.

Let's start with making our own version of sum-reduce, this time without actually using the _Reduce_ operator.

In [2]:
]dinput
Sum ← {
    ⍺ ← 0        ⍝ Left arg defaults to 0 if not given
    0=≢⍵: ⍺      ⍝ If right arg is empty, return left arg
    (⍺+⊃⍵)∇1↓⍵   ⍝ Add head to acc, recur over tail
}

In [3]:
⎕ ← mysum ← Sum 1 2 3 4 5 6 7 8 9
assert mysum=+/1 2 3 4 5 6 7 8 9


Yup, that seems to work; good.

The glyph [_Del_](https://help.dyalog.com/latest/#Language/Defined%20Functions%20and%20Operators/DynamicFunctions/Recursion.htm) (`∇`) is a reference to the current innermost dfn. If your dfn has a name, you can use that name in place of the glyph. In our case, the last line could equally well have been written:
```apl
(⍺+⊃⍵)Sum 1↓⍵
```
However, using the glyph has a number of advantages: it's more concise, immune to function name changes, and works equally well for anonymous dfns.

Our `Sum` dfn follows a common pattern: we accumulate something as the left argument, and decrease the right argument, either by magnitude, or as in this case, by dropping items off the front of a vector.

The recursion termination guard,
```apl
0=≢⍵: ⍺
```
simply states that, if the right argument is empty, we should return our accumulator. The recursive call itself is: 

* Add the head of the right argument to the accumulator. 
* Recur with the updated accumulator as the new left argument and with the tail as the right argument.

If the last thing the function does is a function call, and this includes _Del_, this is what's called a [_tail call_](https://help.dyalog.com/latest/#Language/Defined%20Functions%20and%20Operators/DynamicFunctions/Tail%20Calls.htm), which the Dyalog interpreter can handle without adding an extra stack frame. If you're using _Del_, strive to make all your recursive calls tail calls: avoid doing any work on the value you get back from the recurring line. For example, `1+⍺∇⍵` isn't a tail call, as the last thing that happens is the `1+...`, not the `⍺∇⍵`.
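
The distinction may be easier to see in Python. The function names below are our own. Bear in mind that CPython does not eliminate tail calls, so this only illustrates the shape of the two styles, not the stack behaviour you get in Dyalog.

```python
def sum_tail(xs, acc=0):
    """Accumulator style: the recursive call is the very last thing done."""
    if not xs:
        return acc
    return sum_tail(xs[1:], acc + xs[0])  # tail call: nothing left to do afterwards

def sum_not_tail(xs):
    """Not a tail call: the addition happens after the recursive call returns."""
    if not xs:
        return 0
    return xs[0] + sum_not_tail(xs[1:])

def sum_loop(xs):
    """What a tail-call-optimising interpreter effectively turns sum_tail into."""
    acc = 0
    for x in xs:
        acc += x
    return acc

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sum_tail(xs), sum_not_tail(xs), sum_loop(xs))  # 45 45 45
```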

For a fractionally more involved example, let's write our own _sum-scan_.

In [1]:
]dinput
Sscan ← {
    ⍺ ← ⍬              ⍝ Left arg defaults to ⍬ if not given
    0=≢⍵: ⍺            ⍝ If right arg is empty, return left arg
    (⍺,⊃⍵+⊃¯1↑⍺)∇1↓⍵   ⍝ Append the sum of the head and the last element of acc and recur on tail
}

In [2]:
⎕ ← myscan ← Sscan 1 2 3 4 5 6 7 8 9
assert myscan≡+⍀1 2 3 4 5 6 7 8 9


No discourse on recursion is complete without mentioning the [Fibonacci](https://en.wikipedia.org/wiki/Fibonacci_number) sequence. You know which one I mean -- every number is the sum of its two direct predecessors:

    0 1 1 2 3 5 8 13 21 34 ⍝ etc, something about rabbits
    
Here's one possible formulation where the right argument is the Fibonacci ordinal.

In [5]:
]dinput
Fib ← { ⍝ Tail-recursive Fibonacci.
    ⍺ ← 0 1
    ⍵=0: ⊃⍺
    (1↓⍺,+/⍺)∇⍵-1
}

In [4]:
Fib¨⍳10 ⍝ The 10 first Fibonacci numbers

The pattern is still the same: set a default for the accumulator, `⍺`. Terminate on some condition on `⍵`, returning a function of the accumulator. Modify the accumulator, shrink the right argument, and recur. The only difference is that the right argument is now a counter that we decrement, rather than a vector that we take the tail of.

The guts of the function is the last line. To the right, we decrease the right argument -- this is our loop counter if you like. To the left is our accumulator, which basically is a sliding window of size 2 over the Fib sequence. We append the sum of the two numbers, drop the first, and recur with the decremented counter.
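
The same accumulator-as-sliding-window idea might look like this in Python. This is a sketch with names of our own choosing; the tuple `window` plays the role of `⍺`.

```python
def fib(n, window=(0, 1)):
    """n-th Fibonacci number; window holds two consecutive members of the sequence."""
    if n == 0:
        return window[0]
    a, b = window
    # Slide the window one step: drop the first item, append the sum
    return fib(n - 1, (b, a + b))

print([fib(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```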

Here's a pretty neat implementation of the [Quicksort](https://en.wikipedia.org/wiki/Quicksort) algorithm:

In [6]:
]dinput
Quicksort ← {
    1≥≢⍵: ⍵
    S ← {⍺⌿⍨⍺ ⍺⍺ ⍵}
    ⍵((∇<S),=S,(∇>S))⍵⌷⍨?≢⍵
}

Here `⍵⌷⍨?≢⍵` is the pivot element, picked at random, and the `S` operator partitions its left argument array based on its left operand function and the pivot element to the right. The whole idea of quicksort is pretty clearly visible in the [tacit](tacit.ipynb) fork `(∇<S),=S,(∇>S)` -- elements _less than_ the pivot, the pivot, elements _greater than_ the pivot, recursively applied.

In [4]:
Quicksort ⎕←20?20
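
For comparison, the same three-way split can be spelled out in Python. This is a sketch of the idea rather than a translation of the dfn: the three list comprehensions stand in for the `<S`, `=S` and `>S` parts of the fork.

```python
import random

def quicksort(xs):
    """Sort a list by partitioning around a randomly chosen pivot."""
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

shuffled = random.sample(range(20), 20)  # like 20?20 with ⎕IO←0
print(quicksort(shuffled))               # the integers 0 to 19, in order
```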

Another example is binary search: locate an element in an array that is known to be sorted. Here's a function to do that:

In [7]:
]dinput
bsearch ← {⎕IO←0
    _bs_ ← {                 ⍝ Operator: ⍺,⍵ - lower,upper index. ⍺⍺ - item, ⍵⍵ - array
        ⍺>⍵: ⍬               ⍝ If lower index has moved past upper, item's not present
        mid ← ⌈0.5×⍺+⍵       ⍝ New midpoint  
        ⍺⍺=mid⊃⍵⍵: mid       ⍝ Check if item is at the new midpoint
        ⍺⍺<mid⊃⍵⍵: ⍺∇¯1+mid  ⍝ Drill into lower half
        ⍵∇⍨1+mid             ⍝ Upper half
    }
    0 (⍺ _bs_ (,⍵)) ¯1+≢,⍵
}

In [12]:
5 bsearch 0 2 3 5 8 12 75
5 bsearch 0 2 3 5 8 12
5 bsearch 5 5
5 bsearch 5
]display 1 bsearch 0 2 3 5 8 12
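
Here is a loop-based Python sketch of the same search, for readers who want to trace the index arithmetic. It rounds the midpoint up, as the dfn does, and returns `None` where the APL version returns `⍬`; that substitution is our own choice.

```python
def bsearch(item, xs):
    """Return an index of item in the sorted list xs, or None if it is absent."""
    lo, hi = 0, len(xs) - 1
    while lo <= hi:
        mid = (lo + hi + 1) // 2  # ceiling of the midpoint, like ⌈0.5×⍺+⍵
        if xs[mid] == item:
            return mid
        if item < xs[mid]:
            hi = mid - 1          # drill into the lower half
        else:
            lo = mid + 1          # drill into the upper half
    return None

print(bsearch(5, [0, 2, 3, 5, 8, 12, 75]))  # 3
print(bsearch(5, [5, 5]))                   # 1
print(bsearch(5, [5]))                      # 0
print(bsearch(1, [0, 2, 3, 5, 8, 12]))      # None
```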

### Performance considerations

The pattern we've used in some of the examples above,

```apl
⍝ some stuff
0=≢⍵:⍺
head ← 1↑⍵
tail ← 1↓⍵
(head f ⍺)∇tail
```

has a ...sting in the tail, especially if you come from a functional language where that pattern is the expectation, like [Racket](https://racket-lang.org/), [Erlang](https://www.erlang.org/) or [Clojure](https://clojure.org/). In such languages, vectors and lists are implemented as linked lists or slices, or are immutable, meaning that dropping an element from the front is an `O(1)` operation. In APL, like in Python, that's an `O(n)` operation. When combined with recursion, this can be crushing for performance. Here's an example.
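
Before the larger example, the cost of slicing can be seen in miniature in Python. The sketch below sums a list twice: once by repeatedly dropping the head, counting how many items get copied along the way, and once by walking an index. The function names are our own.

```python
def sum_by_slicing(xs):
    """Drop the head on every step; each xs[1:] copies the rest of the list."""
    total, copied = 0, 0
    while xs:
        total += xs[0]
        xs = xs[1:]        # O(n) copy of the tail
        copied += len(xs)
    return total, copied

def sum_by_index(xs):
    """Leave the list alone and move an index instead; nothing is copied."""
    total, i = 0, 0
    while i < len(xs):
        total += xs[i]
        i += 1
    return total

numbers = list(range(1, 1001))
print(sum_by_slicing(numbers))  # (500500, 499500): 1000 items, about half a million copies
print(sum_by_index(numbers))    # 500500
```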

The [Knuth-Morris-Pratt algorithm](https://brilliant.org/wiki/knuth-morris-pratt-algorithm/) is an efficient string search algorithm. In one of its forms it pre-calculates a prefix table, which is a metric of how well a string matches against shifts of itself. The brilliant.org article linked to above has this written out in Python as:

```python
def prefix(p):
    # https://brilliant.org/wiki/knuth-morris-pratt-algorithm/
    m=len(p)
    pi=[0]*m
    j=0 
    for i in range(1,m):
        while j>=0 and p[j]!=p[i]:
            if j-1>=0:
                j=pi[j-1]
            else:
                j=-1 
        j+=1
        pi[i]=j
    return pi
```

Here's an example running that:

```
Stefans-MacBook-Pro:~ stefan$ python
Python 3.8.5 (default, Sep 17 2020, 11:24:17) 
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from kmp import prefix
>>> prefix('CAGCATGGTATCACAGCAGAG')
[0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 1, 2, 1, 2, 3, 4, 5, 3, 0, 0]
>>> 
```

We can write that as an APL dfn like so:

In [1]:
]dinput
prefix1 ← {⎕IO←0
    p ← ⍵
    pi ← 0⍴⍨≢⍵
    j ← 0
    {
        0=≢⍵: pi
        i ← ⊃⍵ ⍝ head
        pi[i] ← j⊢←1+{⍵<0:⍵⋄p[⍵]=p[i]:⍵⋄0≤⍵-1:∇pi[⍵-1]⋄¯1} j ⍝ while j>=0 and p[j] != p[i]
        ∇1↓⍵ ⍝ tail
    } 1+⍳¯1+≢⍵ ⍝ for i in range(1, m)
}

and we'd hope it produces the same result:

In [2]:
prefix1 'CAGCATGGTATCACAGCAGAG'

So that's two nested "loops", both nicely tail recursive, and probably similar to how you'd construct it in Clojure or Racket. However, as the argument string grows, the performance tanks. We can illustrate this by tweaking it a tiny bit to avoid the reallocation of the argument to the recursive call in the outer loop:

In [8]:
]dinput
prefix2 ← {⎕IO←0
    p ← ⍵
    pi ← 0⍴⍨≢⍵
    j ← 0
    0 {
        ⍺=≢⍵: pi
        i ← ⍺⊃⍵ ⍝ Note: pick ⍺, not first
        pi[i] ← j⊢←1+{⍵<0:⍵ ⋄ p[⍵]=p[i]:⍵ ⋄ 0≤⍵-1:∇ pi[⍵-1] ⋄ ¯1} j
        (⍺+1)∇ ⍵ ⍝ Note: no tail!
    } 1+⍳¯1+≢⍵
}

Instead of taking the tail of `⍵`, we pass the current index as `⍺`. Let's see if that works before we proceed:

In [9]:
prefix2 'CAGCATGGTATCACAGCAGAG'

To demonstrate the difference, let's compare performance on a long string. Here's one taken from Project Rosalind:

In [12]:
data ← ⊃⊃⎕NGET'../kmp.txt'1 ⍝ From http://rosalind.info/problems/kmp/
≢data ⍝ LONG STRING!

In [2]:
'cmpx'⎕CY'dfns' ⍝ Load `cmpx` - comparative benchmarking

In [15]:
cmpx 'prefix1 data' 'prefix2 data'

Quite a staggering difference for such an innocuous change, perhaps.

The binary search implementation we concluded the previous section with already 'does the right thing' -- it doesn't cut off the data array on each iteration. So it should be fast, right? Searching for a number amongst a hundred thousand or so _must_ be faster than the APL primitive function that examines every element looking for a match. Surely...? In `Algorithms 101` they taught you that `O(log n)` always beats `O(n)`!

Except when it doesn't:

In [10]:
data ← ⍳100000 ⍝ A loooot of numbers
cmpx 'data ⍳ 17777' '17777 bsearch data' ⍝ Look for the number 17777

Ouch... a lesson for the APL neophyte here. Your intuition for what's efficient and what isn't is almost certainly wrong. But wait, we can do even better. Let's have a randomised array for APL to go at:

In [3]:
randInts ← 100000 ? 100000 
cmpx 'randInts⍳1' 'randInts⍳19326' 'randInts⍳46729'

We can make this even faster by pre-binding the array to the _Index Of_ function:

In [4]:
find←randInts∘⍳
cmpx 'find 1' 'find 19326' 'find 46729'

What's happening there is that Dyalog takes the binding to mean that as the lookup array is now fixed, it can make a hashed index behind the scenes. You can see that the times taken don't vary much with the argument.
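
As a loose analogy only (this says nothing about how Dyalog implements it internally), building a lookup table once and reusing it looks like this in Python. The names are our own, and a miss returns the length of the array, which is what _Index Of_ gives under `⎕IO←0`.

```python
import random

rand_ints = random.sample(range(100000), 100000)  # like 100000?100000 with ⎕IO←0

# Build the hash table once, keeping the first position of each value
index_of = {}
for position, value in enumerate(rand_ints):
    index_of.setdefault(value, position)

def find(value):
    """Position of value in rand_ints, or len(rand_ints) if it is not there."""
    return index_of.get(value, len(rand_ints))

print(find(rand_ints[17]))  # 17
print(find(-1))             # 100000
```

Each call to `find` is then a single dictionary lookup, regardless of where the value sits in the array.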

But there is more performance to squeeze out here. If our search array is ordered, we can use _Interval Index_, dyadic `⍸`, as a binary search (which it is!):

In [9]:
cmpx 'data⍸1' 'data⍸19326' 'data⍸46729'
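
Python's standard library has a close relative in the `bisect` module. The sketch below mimics _Interval Index_ on a sorted vector as we understand it under `⎕IO←0`: the index of the last element less than or equal to the argument, or ¯1 if there is none. The wrapper name `interval_index` is our own.

```python
from bisect import bisect_right

def interval_index(sorted_xs, x):
    """Index of the last item <= x in sorted_xs, or -1 if x is below them all."""
    return bisect_right(sorted_xs, x) - 1

data = list(range(100000))
print(interval_index(data, 19326))               # 19326
print(interval_index([0, 2, 3, 5, 8, 12], 4))    # 2
print(interval_index([0, 2, 3, 5, 8, 12], -1))   # -1
```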

_Interval Index_ is very optimised in the 18.0/1 versions of Dyalog. Sadly, this optimisation had to be dropped in 18.2.

Key point: simple APL primitives on simple arrays are almost always faster than anything you can write in APL. Only ever iterate or recurse if there are no other alternatives.

Exercise for the reader: as the data grows, sooner or later the `bsearch` function will win out. How large does the array need to be for that to happen?

## Scalar pervasion

We touched briefly on _scalar pervasion_ above, but it's an important topic, so let's dive in a little bit deeper. It's worth reading what Dyalog has to say on the topic in the [docs](http://help.dyalog.com/latest/index.htm#Language/Primitive%20Functions/Scalar%20Functions.htm).

The idea is that a certain class of functions is _pervasive_. This means that a function that operates on a scalar will operate on scalars at any level of nesting if applied to an array. Recall that the level of nesting is _not_ the same as the rank in APL. 

Consider a nested vector that contains both numbers and other vectors of numbers:

In [8]:
⎕ ← nesty ← (1 2 3 (3 4 (5 6)) 7)

As an obvious and simple example, let's say we want to negate all the numbers:

In [42]:
-nesty

Where it can be applied, scalar pervasion is an efficient operation in Dyalog APL. It works for dyads, too:

In [44]:
(1 2) 3 + 4 (5 6)
(1 1⍴5) - 1 (2 3)
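
To see what pervasion saves us from writing, here is roughly what it would take to do by hand in Python. This is a sketch under simplifying assumptions: nested lists stand in for nested vectors, higher-rank arrays such as the `1 1⍴5` matrix are not modelled, and the names `pervade` and `pervade2` are our own.

```python
import operator

def pervade(f, x):
    """Apply a one-argument function to every number at any depth."""
    if isinstance(x, list):
        return [pervade(f, item) for item in x]
    return f(x)

def pervade2(f, a, b):
    """Apply a two-argument function item by item, extending scalars as needed."""
    a_is_list, b_is_list = isinstance(a, list), isinstance(b, list)
    if a_is_list and b_is_list:
        if len(a) != len(b):
            raise ValueError("length mismatch")
        return [pervade2(f, x, y) for x, y in zip(a, b)]
    if a_is_list:
        return [pervade2(f, x, b) for x in a]
    if b_is_list:
        return [pervade2(f, a, y) for y in b]
    return f(a, b)

nesty = [1, 2, 3, [3, 4, [5, 6]], 7]
print(pervade(operator.neg, nesty))                    # [-1, -2, -3, [-3, -4, [-5, -6]], -7]
print(pervade2(operator.add, [[1, 2], 3], [4, [5, 6]]))  # [[5, 6], [8, 9]]
```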