The APL Way#

Every reader should ask himself periodically “Toward what end, toward what end?”—but do not ask it too often lest you pass up the fun of programming for the constipation of bittersweet philosophy. –Alan Perlis

Up until now, we’ve skirted around one of the main advantages of APL – array-oriented, or data-parallel programming. This feels awkward and unnatural at first, but finding data-parallel approaches to problems is a skill that makes for efficient solutions in other languages, too, not just APL, and libraries such as Python’s NumPy encourages such solutions (it was inspired by APL, by the way).

Note

In this chapter, we’ll be making some comparisons between data-parallel APL and “loop & branch” implementations in Python. We chose Python because its syntax is clean and understandable by a large proportion of programmers from other languages, too. In case it’s not immediately obvious, no effort has been made to find optimal Python solutions here; indeed, quite the opposite. View the Python examples as pseudocode illustrations of the algorithms, and yes, we’re fully aware that one can string together elegant, efficient Python solutions using iterator algebra and comprehensions.

A few pointers – Richard Park gave a series of webinars on Thinking in APL that you should check out, and Adám Brudzewski gave several interactive Cultivations dedicated to the topic, Array programming techniques, too.

⎕IO  0
]box on -s=min
]rows on
assert{'assertion failure'  0⍵:⍺ ⎕signal 8  shy0}
Was ON -style=min
Was OFF

The power of the Array#

Several factors super-charge array-oriented programming in APL:

  1. Many operations on non-nested arrays are fast; really fast, taking advantage of data parallelism in the processor (SIMD).

  2. By operating on whole arrays, we can – with a bit of practice – avoid branches, thus avoiding processor branch-prediction misses.

  3. Arrays in APL are implemented frugally when it comes to memory, especially Boolean arrays.

The helicopter pitch for the simplest case might go something like this. Here’s a naive “loop and branch” way to pick all elements matching some predicate in Python:

def select(f, array): 
    """Loop & branch"""
    result = []
    for element in array:
        if f(element):
            result.append(element)
    return result

and here’s a data-parallel version:

elems  f array ⍝ Boolean mask
elems/array     ⍝ Compress; i.e. pick elemets at 1s

Let’s say we want to find the odd numbers in a sequence:

where  2|data  20 ⍝ mod-2 for the whole array at once
where/data
1 3 5 7 9 11 13 15 17 19

In the APL version, there is no loop, and no branch. The data is simple, and most likely allocated contiguously, and the CPU will be able to process several items in parallel for every cycle without any branch-prediction misses. Of course, with only 20 elements, algorithmic performance is a largely academic concern.

We can think of many problems that can be solved using the pattern of a scalar selection function applied to an array. One example, given by Richard Park in his webinar mentioned above, is to pick out all vowels from a string:

'aeiou' {()/} 'pick out all vowels from a string'
iouaoeoai

…or for train spotters:

'aeiou' (⍤/) 'pick out all vowels from a string'
iouaoeoai

It’s the same idea as the ‘odd numbers’ above: create a Boolean mask using set membership (dyadic ) and compress.

As an aside, and for the avoidance of any doubt, the actual APL way would be to just intersect the string with the vowels:

'aeiou'  'pick out all vowels from a string'
iouaoeoai

Shorter, faster, array-ier, nicer – but not the point we wanted to make.

Luhn’s Algorithm#

For something meatier, let’s look at implementing Luhn’s algorithm, an error-detecting code used for validating credit card numbers. This was used to illustrate array-oriented concepts by Dyalog CTO, Morten Kromberg, at a presentation given at JIO. We’ll follow Morten’s clear explanation of this algorithm:

  1. Split the credit card number into a body and the last digit, a checksum

    Card number: 7 9 9 2 7 3 9 8 7 1 3
    Body:        7 9 9 2 7 3 9 8 7 1    Checksum: 3
    
  2. Multiply every other digit in the body by 2, starting from the last digit with a 2

    Body:        7  9 9 2 7 3 9  8 7 1
    Weights:     1  2 1 2 1 2 1  2 1 2
               ×______________________
    Products:    7 18 9 4 7 6 9 16 7 2
    
  3. Separate any numbers greater than 9 into their individual digits, and sum columns

    Tens:        0 1 0 0 0 0 0 1 0 0
    Units:       7 8 9 4 7 6 9 6 7 2
               +____________________
    Sum:         7 9 9 4 7 6 9 7 7 2
    
  4. Sum the sums

    Sum:         7+9+9+4+7+6+9+7+7+2 = 67
    
  5. Sum the digits of this, modulo 10:

    Result:      (6+7)%10 = 3
    

If the result equals the checksum, the card number is valid.

Here’s one way this could be implemented in Python:

def luhn(cardnum):
    checksum = 0
    parity = len(cardnum) % 2
    for index in range(len(cardnum)-1, -1, -1):
        digit = cardnum[index]
        if (index + 1) % 2 != parity:
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit

    return checksum % 10 == 0

It may or may not be clear that the Python implementation follows the algorithm description. We step through the card number backwards making it easier to ensure that we get the factors right, regardless of the number of digits.

Now let’s implement this in APL. We can follow the algorithm description very closely.

1: Split the credit card number into a body and the last digit, a checksum

card  7 9 9 2 7 3 9 8 7 1 3

  body  (count¯1+≢card)card
  check  /card
7 9 9 2 7 3 9 8 7 1
3

2: Multiply every other digit in the body by 2, starting from the last digit with a 2

  weights  count(2|count)1 2
  products  body×weights
1 2 1 2 1 2 1 2 1 2
7 18 9 4 7 6 9 16 7 2

3, 4. Separate any numbers greater than 9 into their individual digits, and sum

  digits  0 10products ⍝ We can think of `0 10⊤` as divmod
  sum  +,digits        ⍝ By raveling, we save doing sum the sum of cols
0 1 0 0 0 0 0 1 0 0 7 8 9 4 7 6 9 6 7 2
67

5. Digit sum, mod 10

10|-sum ⍝ Magic
3

If we put all that together, we arrive at Morten’s solution:

]dinput
luhn  {
    check  /
    body  (count¯1+≢)
    weights  count(2|count)1 2
    digits  0 10body×weights
    check=10|-+/,digits
}
assert luhn 7 9 9 2 7 3 9 8 7 1 3

The Luhn algorithm was obviously chosen as an ideal fit for a data-parallel implementation. A few things stand out: firstly, the APL implementation follows the algorithm definition very closely, much closer than the Python version. Granted – and this is an interesting experiment – you can take the APL approach and implement a Python version following a similar pattern, albeit without array operations. Secondly, the solution is completely free or loops and branches. As Morten also observes, over 10 digits, there is no meaningful performance implication here, but it’s not hard to imagine what would happen over a million or more digits.

Balancing the Scales#

This is Problem 8, Phase II from the 2020 Dyalog problem solving competition. You can see the problem statement here. Obviously, there are spoilers to follow – if you want to have a crack at it yourself, stop reading here.

Our task is to partition a set of numbers into two groups of equal sum if this is possible, or return if not. This is a well-known NP-hard problem, called The Partition Problem, and as such has no fast, always correct solutions for the general case. The problem statement indicates that we only need to consider a set of 20 numbers or fewer, which is a bit of a hint on what kind of solution is expected.

This problem, in common with many other NP problems, also has a plethora of interesting heuristic solutions: faster algorithms that whilst not guaranteed to always find the optimal solution will either get close, or be correct for a significant subset of the problem domain in a fraction of the time the exhaustive search would take. We’ll look at one of these approaches, too, although it probably wasn’t what Dyalog wanted in terms of the competition (I might be wrong; I didn’t take part at the time).

So given that we have a defined upper bound, and that Dyalog wants the solution to be optimal for all inputs up to this bound, and we know this problem to be NP-hard, we’re looking at an exhaustive search of some sort. We can also guess that given this is a Dyalog competition problem, whilst the algorithm might be crude, we should be able to exploit APL’s fast array processing to implement it efficiently.

The algorithm (if we can call it that) is simply to try every possible combination in which the numbers can be put into two separate piles and check if the partition sum equals half the total sum. If we have n items in our set, that corresponds to considering all numbers up to ¯1+2*n in binary (recall that * is exponentiation, not multiplication), and the two sets correspond to those matching the 0 bits and 1 bits respectively. For example, if we have the numbers 1, 1 and 2 we need to consider the following partitions (although we can skip the first and last):

(0: 0 0 0)
 1: 0 0 1
 2: 0 1 0
 3: 0 1 1
 4: 1 0 0
 5: 1 0 1
 6: 1 1 0
(7: 1 1 1)

In this case, the solutions would be either 0 0 1, or its inverse, 1 1 0, corresponding to the partitioning (1 1)(,2).

So what are we dealing with here, in terms of the search space?

¯1+2*20
1048575

Just over a million or so different partitions to check, which should be manageable. The search space doubles in size for each additional item, which is something alluded to in the old fable about the inventor of Chess:

e¯1+2*e¨21+⍳9
2097151 4194303 8388607 16777215 33554431 67108863 134217727 268435455 536870911

To generate all binary combinations for n bits, we can use the following incantation (for n=3):

2¯12*3
0 0 0 0 0 1 0 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1

Given a Boolean vector, how do we turn that into the two corresponding sets of numbers? There are several ways you can try, for example compress:

1 1 0 {(/)(/⍨~)} 1 1 2 ⍝ compress and compress ~
1 1 0 {(~v)(v/)} 1 1 2 ⍝ compress and set difference
┌───┬─┐ │1 1│2│ └───┴─┘
┌─┬───┐ │2│1 1│ └─┴───┘

but we can exploit the fact that we’re really interested in the sums of the sets. The sum of the items corresponding to the set bits is really just the bit pattern times the input set, summed:

+/1 1 0×1 1 2
2

which we by now hopefully should recognize as an inner product:

1 1 0 +.× 1 1 2
2

And of course, inner product can be applied to an array, too:

patterns  2¯12*3
patterns +.× 1 1 2
0 2 1 3 1 3 2 4

We can now draw up a plan for this.

  1. Our target partition sum is the total divided by two.

  2. Generate the bit patterns up to ≢⍵.

  3. Calculate the sums of the numbers corresponding to the set bits in each pattern

  4. Find the first point where the partition sum is equal to the target

  5. Return the corresponding partitioning

This translates readily to APL:

]dinput
Balance  {
    total  +/
    2|total:              ⍝ Sum must be divisible by 2
    psum  total÷2         ⍝ Our target partition sum
    bitp  2¯12*≢    ⍝ All possible bit patterns up to ≢⍵
    idx  ⍸<\psum=bitp+.× ⍝ First index of partition sum = target
    idx:                ⍝ If we have no 1s, there is no solution
    part  idxbitp        ⍝ Partition corresponding to solution index
    (part/)(/⍨~part)     ⍝ Compress input by solution pattern and inverse
}
Balance 1 1 2
┌─┬───┐ │2│1 1│ └─┴───┘

and on the full 20-bits:

Balance 10 81 98 27 28 5 1 46 63 99 25 39 84 87 76 85 78 64 41 93
┌────────────────────┬────────────────────────────────────┐ │99 84 87 76 85 41 93│10 81 98 27 28 5 1 46 63 25 39 78 64│ └────────────────────┴────────────────────────────────────┘

To a Python programmer, this approach must seem incredibly counter-intuitive, borderline criminally wasteful. Despite the fact that we only want the first solution (which occurs around bit pattern 1,500ish), we go through the entire search space. In a “loop and branch” language, we’d try each candidate pattern in turn, and break once we find the first solution.

But the seemingly naive APL solution is quick, even though it will process 2-3 orders of magnitude more patterns than would a scalar solution:

]runtime "Balance 10 81 98 27 28 5 1 46 63 99 25 39 84 87 76 85 78 64 41 93"
* Benchmarking "Balance 10 81 98 27 28 5 1 46 63 99 25 39 84 87 76 85 78 64 41 93" ┌──────────┬────┐ │ │(ms)│ ├──────────┼────┤ │CPU (avg):│28 │ ├──────────┼────┤ │Elapsed: │29 │ └──────────┴────┘

So what’s going on here? Well, two things. Firstly, we’re hitting the most optimal core of Dyalog APL, its handling of Boolean vectors, using only primitive functions. Secondly, APL is able to vectorise our operations, taking advantage of SIMD operations in the processor.

For completeness, let’s see how a scalar solution would perform. We can, for example, take a tail-call Scheme-like approach:

]dinput
BalanceScalar  {⎕IO0     
    total  +/
    2|total:              ⍝ Sum must be divisible by 2
    psum  total÷2         ⍝ Our target partition sum
    data  
    bitp  ↓⍉2¯12*≢   ⍝ Pre-compute the bit patterns
    {                      ⍝ Try one sum after the other, halt on first solution
        0=⍵: 
        patt  bitp
        psum=patt+.×data: (patt/data)(data/⍨~patt) ⍝ Exit on first solution found
        ¯1+
    } ¯1+≢bitp
}
BalanceScalar 1 1 2
┌───┬─┐ │1 1│2│ └───┴─┘

Ok, let’s compare:

'cmpx'⎕CY'dfns'
d10 81 98 27 28 5 1 46 63 99 25 39 84 87 76 85 78 64 41 93
cmpx 'Balance d' 'BalanceScalar d'
Balance d → 2.7E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ * BalanceScalar d → 4.0E¯2 | +46% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Ouch. That difference is pretty stark.

Dyalog can no longer vectorise, and performance slips considerably. If you take home anything from this chapter, this is the thing to remember.

However, the brute-force algorithm is of course still crude, even if APL can be quick about it at the specified scale. Add a few more elements, and the unforgiving O(2^N) complexity would soon crush us.

As I mentioned earlier, there are a number of heuristic algorithms that have a much more benign complexity, at the cost of not being able to guarantee optimal solutions in all cases. One such algorithm is called the Karmarkar-Karp, after its inventors, and has a complexity of O(N log N), but is not guaranteed to find the optimal solution, even if such a solution does exist.

In short, the KK algorithm maintains a priority queue of the remaining numbers, and picks the two largest numbers in each iteration, replacing them with their difference until a single number remains – this represents the final difference between the two partitions. We can construct the corresponding partitions via backtracking. A thorough analysis of this algorithm can be found here.

]dinput
KarmarkarKarp  {
    sort  {[;[0;]]}
    (pairs last)   {
        2>≢⍉⍵:⍺ (1 0)
        (i x)(j y)  ↓⍉[;0 1]
        (,⊂x y)sort (21),(i-j) x
    } sort (-)()
        
    last  {
        (a b)  
        0=≢⍵: 
        pair  ¯1
        apair: (a (b,1pair))¯1
        ((a,1pair) b)¯1
    } pairs
}

In our test case, the KK does return an optimal solution:

KarmarkarKarp 10 81 98 27 28 5 1 46 63 99 25 39 84 87 76 85 78 64 41 93
┌────────────────────────────┬────────────────────────────┐ │41 85 99 5 93 10 63 27 64 78│1 25 28 76 81 39 46 84 87 98│ └────────────────────────────┴────────────────────────────┘
cmpx 'Balance d' 'KarmarkarKarp d'
Balance d → 3.7E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ * KarmarkarKarp d → 1.4E¯4 | -100%

and, unsurprisingly, although completely scalar, it wins hands down. However, it’s still a heuristic:

Balance 4 5 6 7 8
KarmarkarKarp 4 5 6 7 8
┌───┬─────┐ │7 8│4 5 6│ └───┴─────┘
┌─────┬───┐ │4 5 7│6 8│ └─────┴───┘

A thing to note in the KK implementation above: it simply sorts the array at each iteration, instead of using a heap queue. As an exercise for the interested reader, try re-implementing it using a heap. However, and for similar reasons as we saw earlier, you will probably find that APL is likely faster at sorting a simple array than the complexity cost of maintaining a more nested data structure.

Merge#

Ok, we’re on a roll. Problem 6 from the same year was a spin on the old “mail merge” concept: given a template file, expand templates with values from a corresponding merge file. For example, given a template of

Hello @firstname@,
You have been selected to test drive a new @carmake@!

with a merge file of

{
    "firstname": "Bob"
    "carmake": "Aston Martin"
}

the final output should be

Hello Bob,
You have been selected to test drive a new Aston Martin!

A literal @ is represented by @@ in the template, and templates with no corresponding match in the merge file should be expanded to ???.

We’ll try two different solutions. For the first approach, it’s hard not to think of regexes here – replace “templates” matching a defined pattern with corresponding values.

Let’s look at the two data files first:

template  '/Users/stefan/template.txt'
⎕NGET template
@salutation@ @firstname@ @lastname@; Congratulations! You have won a @prize@ worth over £@value@! @firstname@, please come to our office to pick up your @prize@. Please feel free to contact us at info@@contest.com. Your email address in our domain is @firstname@@@contest.com
merge  '/Users/stefan/merge1.json'
⎕NGET merge
{ "firstname":"Drake", "lastname":"Mallard", "prize":"yoyo", "value":100 }

So our patterns are defined by the keys in the JSON, plus the @@ for literal @ and any remaining templates that have no corresponding entry in the merge file, along the lines of @[^@]+@. We’ll start with reading the JSON data into a namespace and pulling out the keys and values:

mrg⎕JSON⎕NGET merge
keysmrg.⎕NL¯2
valsmrg.(¨#.keys)
keys vals
┌─────────┬────────┬─────┬─────┐ │firstname│lastname│prize│value│ ├─────────┼────────┼─────┼─────┤ │Drake │Mallard │yoyo │100 │ └─────────┴────────┴─────┴─────┘

and our patterns and replacements then become:

(({'@',(),'@'}¨keys), ('@@'), '@[^@]+@') ((¨vals), (⊂,'@'), '???')
┌───────────┬──────────┬───────┬───────┬──┬───────┐ │@firstname@│@lastname@│@prize@│@value@│@@│@[^@]+@│ ├───────────┼──────────┼───────┼───────┼──┼───────┤ │Drake │Mallard │yoyo │100 │@ │??? │ └───────────┴──────────┴───────┴───────┴──┴───────┘

Putting it all together we get:

]dinput
Merge  {
    mrg⎕JSON⎕NGET 
    keysmrg.⎕NL¯2
    valsmrg.(¨#.keys)
    
    (({'@',(),'@'}¨keys), ('@@'), '@[^@]+@')⎕R((¨vals), (⊂,'@'), '???')⊢⊃⎕NGET 
}
merge Merge template
??? Drake Mallard; Congratulations! You have won a yoyo worth over £100! Drake, please come to our office to pick up your yoyo. Please feel free to contact us at info@contest.com. Your email address in our domain is Drake@contest.com

All well and good, but not very “array-oriented” now, is it? Let’s remedy that, and forget about cheaty regexes.

If we partition the data such that each partition begins with @ we get this:

'@' (=⊂⊢) 'aaaa @bbb@ ccc @@ @ddd@' ⍝ Partitions starting at @
┌────┬──────┬─┬──┬────┬─┐ │@bbb│@ ccc │@│@ │@ddd│@│ └────┴──────┴─┴──┴────┴─┘

in fact, let’s drop the leading @, too:

  tmpls'@'(1¨=⊂⊢)'aaaa @bbb@ ccc @@ @ddd@' 
┌───┬─────┬┬─┬───┬┐ │bbb│ ccc ││ │ddd││ └───┴─────┴┴─┴───┴┘

What we have now is a nested vector where every other element is a template.

Our replacement values will again come from a namespace

(mrg⎕NS).(bbb ccc ddd)  'bees' 'cees' 'dees'

and we’ll make a little helper function to look up value by key, with the added functionality to return @ for `` and ??? for non-present keys:

]dinput
val  {
    0=≢⍵: ,'@'
    ~()mrg.⎕NL¯2: '???'
    mrg
} 
val¨'bbb' 'ddd' 'ccc' '' 'hubba'
┌────┬────┬────┬─┬───┐ │bees│dees│cees│@│???│ └────┴────┴────┴─┴───┘

We can use Dyalog’s handy @ operator to make the replacements. Recall that the @ operator can take a right operand function which must return a Boolean vector, which in our case should select every other element, starting with the first:

val¨@{1 0 1 0 1 0}tmpls
┌────┬─────┬─┬─┬────┬┐ │bees│ ccc │@│ │dees││ └────┴─────┴─┴─┴────┴┘

Finally, we’d need to tack on anything prior to the first @ and enlist.

]dinput
Merge  {
    mrg  ⎕JSON⎕NGET 
    templ  ⎕NGET 
    first  templ'@'
    first>≢templ: templ    ⍝ No templates at all
    prefix  firsttempl
    val  {
        0=≢⍵: ,'@'
        ~()mrg.⎕NL¯2: '???'
        mrg
    }
    prefix,val¨@{1 0}'@'(1¨=⊂⊢)templ
}
merge Merge template
??? Drake Mallard; Congratulations! You have won a yoyo worth over £100! Drake, please come to our office to pick up your yoyo. Please feel free to contact us at info@contest.com. Your email address in our domain is Drake@contest.com

Right-align a block of text#

Let’s work through a more comprehensive problem. Here’s a tweaked version of one of the Phase 1 tasks from the Dyalog Problem Solving Competition, 2021:

Write a function that:

  • has a right argument T which is a character scalar, vector or a vector of character vectors

  • has a left argument W which is a positive integer specifying the width of the result

  • returns a right-aligned character array of shape ((2=≡T)/≢T),W.

  • if an element of T has length greater than W, truncate it after W characters.

The challenge here is not resorting to ‘eaching’ the rows, or employing some creative regexing. So let’s treat this as an exercise in array-oriented problem solving.

We’re given a couple of examples of how the solution should behave for arrays of different ranks:

⍝ 6 Align '⍒'
'     ⍒'

⍝ 10 Align 'Parade'
'    Parade'

⍝ 8 Align 'Longer Phrase' 'APL' 'Parade'
3 8'Longer P     APL  Parade'
Parade
Longer P APL Parade

Note

The last example above is different from the way the problem was published. For extra points, solve it as it was presented in the competition:

3 8⍴'r Phrase     APL  Parade'

A hint is given in the problem statement on where we could start: we’re told what the shape should be of the resulting array: ((2=≡T)/≢T),W. Given the result shape, we can try the following approach:

Figure out the result shape, and squeeze the data into this shape, truncating if we need to. Locate trailing spaces. Prepend a space on each line, and then replicate by the number of trailing spaces.

We’re given the shape. In order for this to work for character scalars, a character vector, or vector of character vectors, we need to operate on the ravel of the right argument when finding the shape:

6 {,(2=≡,)/≢,} '⍋'
10 {,(2=≡,)/≢,} 'Parade'
8 {,(2=≡,)/≢,} 'Longer Phrase' 'APL' 'Parade'
6
10
3 8

To fit the data into that shape, we need to Mix the ravel (↑,⍵) to increase the rank and then Take elements, rank 1 (along vectors), and then reshape. Let’s explore what this means:

data  'Parade'
sh  10 {,(2=≡,)/≢,} data ⍝ Shape
↑,data                         ⍝ Mix ravel (no change for a char vector)
101⊢↑,data                   ⍝ (over)take, rank-1
10
Parade
Parade

More interesting to consider is the nested vector case:

data  'Longer Phrase' 'APL' 'Parade'
  sh  8 {,(2=≡,)/≢,} data ⍝ Shape
↑,data                           ⍝ Mix ravel
  mat  81⊢↑,data            ⍝ (over)take, rank 1
3 8
Longer Phrase APL Parade
Longer P APL Parade

So far, so array-oriented! Now we need to find trailing spaces. The equal sign makes for an excellent search function! Where are the spaces?

' '=mat
0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1

All well and good, but we need the trailing spaces. A handy technique here is to and-scan, which is one of those APLisms that is so obvious (once someone pointed it out to you, you found it on APLCart, or you’re a genius). It keeps all 1s from the beginning, and zeros everything after the first zero:

\1 1 1 0 0 0 1 0 1 0
1 1 1 0 0 0 0 0 0 0

but that looks at leading not trailing spaces, so we need to flip, and-scan, flip back:

⌽∧\' '=mat   ⍝ We're keeping on arraying
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1

Conceptually, we’ll add extra spaces at the beginning of each line, the same number as any trailing spaces. In practice, we’ll add a single space, and then use a replication vector to clone it the relevant number of times. Recall:

  data  ' ABCDEFGHIK'         ⍝ Note: leading space
  repl  5 1 1 1 1 1 1 1 1 1 1 ⍝ Five spaces, one of everything else
repl/data
ABCDEFGHIK
5 1 1 1 1 1 1 1 1 1 1
ABCDEFGHIK
trailing  ⌽∧\' '=mat     ⍝ Find trailing spaces, as per above
  spaced  ' ',mat       ⍝ Add a space as the first char of each row
  keepers  ~trailing    ⍝ Everything not a trailing space
  padding  +/trailing   ⍝ Amount of leading padding to be inserted per row
  repl  padding,keepers ⍝ Number of replications per character
Longer P APL Parade
1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0
0 5 2
0 1 1 1 1 1 1 1 1 5 1 1 1 0 0 0 0 0 2 1 1 1 1 1 1 0 0

Now what remains is to do the replication and ensure that everything gets the shape it should have.

repl/⍤1spaced ⍝ Replicate along vectors
Longer P APL Parade

That’s it! Putting it all together, we get the following:

]dinput
Align  {
    shape  ,(2=≡,)/≢,  ⍝ Shape, as specified
    mat  1⊢↑,          ⍝ Truncate rank 1
    trailing  ⌽∧\' '=mat  ⍝ Find trailing spaces
    spaced  ' ',mat        ⍝ Add a space as the first char of each row
    keepers  ~trailing     ⍝ Everything not a trailing space
    padding  +/trailing    ⍝ Amount of leading padding to be inserted per row
    repl  padding,keepers  ⍝ Number of replications per character
    repl/⍤1spaced          ⍝ Replicate rank 1
}
6 Align '⍋'
10 Align 'Parade'
8 Align 'Longer Phrase' 'APL' 'Parade'
8 Align 'K' 'E' 'Iverson'
Parade
Longer P APL Parade
K E Iverson

Pretty cool, eh? As an exercise, now shrink that to a one-liner suitable for submitting as a competition entry.

Shall we do one more?

FizzBuzz#

Adám Brudzewski used the classic FizzBuzz problem – the darling of technical interviewers everywhere – as an illustration in one of his Cultivation lessons, Lesson 39 - Array programming techniques.

Wikipedia has the following to say about FizzBuzz:

Fizz buzz is a group word game for children to teach them about division. Players take turns to count incrementally, replacing any number divisible by three with the word “fizz”, and any number divisible by five with the word “buzz”.

but most programmers will be familiar with it as a task frequently set as a technical challenge in job interviews. In fact, we’ve seen an APL version of this already. Right at the beginning it was used as an illustration of a trad function.

Here’s a “loopy” version as an APL dfn, for context:

{(3(0=3 5|)1)/'Fizz' 'Buzz'}¨1+⍳16
┌─┬─┬────┬─┬────┬────┬─┬─┬────┬────┬──┬────┬──┬──┬────────┬──┐ │1│2│Fizz│4│Buzz│Fizz│7│8│Fizz│Buzz│11│Fizz│13│14│FizzBuzz│16│ └─┴─┴────┴─┴────┴────┴─┴─┴────┴────┴──┴────┴──┴──┴────────┴──┘

Can we do without the each, given what we’ve already learnt? Let’s ponder what an array solution might look like.

Let’s begin by creating a Boolean mask showing the positions of the numbers which are divisible by 3 (the “fizzes”) and those divisible by 5 (the “buzzes”). We can do that as an outer product:

data  1+⍳16
  fizzbuzz  0=3 5∘.|data
0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0

So the columns that have only zeros in them represent the “numbers” - the “non-fizzbuzzy” bits. Let’s add a new row to our matrix to make this clear:

  mat  (⍪⊢)fizzbuzz  ⍝ Tacit for {(⍱⌿⍵)⍪⍵} -- column-wise logical NOR as the new first row
1 1 0 1 0 0 1 1 0 0 1 0 1 1 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0

If we now multiply the first row by our input numbers, we have the numbers that aren’t “fizzbuzzy”

mat×@0data ⍝ Check that out! Modified assignment
mat
1 2 0 4 0 0 7 8 0 0 11 0 13 14 0 16 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0

Now we substitute all 1s for “fizz” in the second row, and for “buzz” in the third row:

mat[1,0 0⊢⍸1mat]'fizz'
mat[2,0 0⊢⍸2mat]'buzz'
mat
┌─┬─┬────┬─┬────┬────┬─┬─┬────┬────┬──┬────┬──┬──┬────┬──┐ │1│2│0 │4│0 │0 │7│8│0 │0 │11│0 │13│14│0 │16│ ├─┼─┼────┼─┼────┼────┼─┼─┼────┼────┼──┼────┼──┼──┼────┼──┤ │0│0│fizz│0│0 │fizz│0│0│fizz│0 │0 │fizz│0 │0 │fizz│0 │ ├─┼─┼────┼─┼────┼────┼─┼─┼────┼────┼──┼────┼──┼──┼────┼──┤ │0│0│0 │0│buzz│0 │0│0│0 │buzz│0 │0 │0 │0 │buzz│0 │ └─┴─┴────┴─┴────┴────┴─┴─┴────┴────┴──┴────┴──┴──┴────┴──┘

Merge columns, remove zeros:

0~⍨¨,mat
┌─┬─┬────┬─┬────┬────┬─┬─┬────┬────┬──┬────┬──┬──┬────────┬──┐ │1│2│fizz│4│buzz│fizz│7│8│fizz│buzz│11│fizz│13│14│fizzbuzz│16│ └─┴─┴────┴─┴────┴────┴─┴─┴────┴────┴──┴────┴──┴──┴────────┴──┘

Nice. So in this case, is the longer array solution better than the clever loopy one we showed in the beginning? Well, the intention was to demonstrate how to approach solutions in a data-parallel way, but the problem was of a magnitude that probably made this unnecessary. But practice makes perfect.