Lookup without replacement#

“Lookup without replacement” is a very old (and famous) programming problem in the APL world. You can see a thorough investigation of this in video form, too.

Consider two vectors,

L←⎕←'abacba'
R←⎕←'baabaac'

abacba

baabaac

Dyadic iota ⍳ lets us find the first index of occurrence of the elements in R in L:

L⍳R

2 1 1 2 1 1 4

However, what if we wanted the first b in R to “consume” the first b in L so that the second b in R would have to contend with the index of the second b in L? That is, we want some function which gives 2 1 3 5 6 7 4. You could call it “iota without replacement”.

Let’s begin by labeling the elements so we can see what goes where:

'a1' 'b1' 'a2' 'c1' 'b2' 'a3' ⍳ 'b1' 'a1' 'a2' 'b2' 'a3' 'a4' 'c1'

2 1 3 5 6 7 4

As we numbered the as (which otherwise all match each other) and the bs, the right pairs get matched up. If you recall the chapter about ⍋, you may also recall what ⍋⍋ does. While ⍋ gives use the indices that will sort, ⍋⍋ gives us the positions that each element will occupy in the sorted result.

↑L(L⍳L)(⍋⍋L⍳L)⊣L

a b a c b a
1 2 1 4 2 1
1 4 2 6 5 3

The first line is the data and the second is the indices of the first occurrences (in other words, all identical items will get the same index). The third line is the position that each will occupy when sorted. That means that identical elements get consecutive positions.

For example, you can see that the first b gets 4 (because there are 3 as) and the second gets 5. This almost solves the problem.

However, there are a couple of issues:

The two arrays must have the same set of elements.
The two arrays must have equally many of each unique element
The unique elements must initially occur in the same order

Why these conditions?

is because otherwise the purely numeric “labels” will match the wrong things.
is because otherwise one element’s “label” will be paired up with the label of a different value element of the other array.
is because otherwise identical “labels” numbers refer to two entirely different things, and so the matching won’t give a meaningful result.

But if these conditions are met, we get the right result:

L←'abacba'
R←'aaabcb'
(⍋⍋L⍳L)⍳⍋⍋R⍳R

1 3 6 2 4 5

The first a in R gets paired with the element in position 1 of L, and the second a in R goes with the element in position 3, and the third goes with the last element of L.

Let’s have a stab at how we can ensure that all conditions are eliminated, and then we’ll have our solution. Since we’re going to look up elements of R in L anyway, we can use indices into L (that is L⍳R) instead of the lookup of R into itself (R⍳R) This ensures that elements of R are labelled with “L’s labelling system”.

L←'abacba'
R←'bcabaa'
↑L(L⍳L)(⍋⍋L⍳L)
↑R(L⍳R)(⍋⍋L⍳R)

a b a c b a
1 2 1 4 2 1
1 4 2 6 5 3

b c a b a a
2 4 1 2 1 1
4 6 1 5 2 3

The first line (of each group) is the data, the second line is the first-positions of that data in L. The third is the progressive labeling of that. Now you can see that the first a is labeled 1 for both L and R and the first b is labeled 4 for both L and R.

(⍋⍋L⍳L)⍳(⍋⍋L⍳R)

2 4 1 5 3 6

We now have that the first b of R takes out element 2 of L, and the c takes out element 4 of L and so on. But this still requires both sides to have the same set of elements and equally many of each element. How can we ensure that there are equally many of each unique element on each side? Well, if you think about it, L,R and R,L must necessarily have the same set in equal proportions. But this also gives us way more elements than we need. We’ll take care of that later.

(⍋⍋L⍳L,R)⍳(⍋⍋L⍳R,L)

2 4 1 5 3 6 9 7 11 8 10 12

Note that this sequence begins with what we want, and now we have equal proportions, so we’ve eliminated issue 2. We just need to reshape (or take) to chop the unneeded elements:

((⍴L)⍴⍋⍋L⍳L,R)⍳((⍴R)⍴⍋⍋L⍳R,L)

2 4 1 5 3 6

Now it works even though we have a d in R which doesn’t occur in L. In accordance with the rules of ⍳, not-found elements get the index 1+the last index of the left argument. Since we chopped the left list of labels to the length of L, that’s what we get.

L←'abacba'
R←'bcdabaaaaa'
((⍴L)⍴⍋⍋L⍳L,R)⍳((⍴R)⍴⍋⍋L⍳R,L)

2 4 7 1 5 3 6 7 7 7

And so, we’ve taken care of issue 1 (different sets of elements). This algorithm can also be adapted to use with any-rank arrays by using ≢ instead of monadic ⍴ and ↑ instead of dyadic ⍴ and ⍪ instead of ,. Let’s have a look back at what we did. Consider:

↑(L R)←'abacba' 'baabaac'

abacba 
baabaac

We then labeled the elements:

↑('a1' 'b8' 'a2' 'c12' 'b9' 'a3')('b8' 'a1' 'a2' 'b9' 'a3' 'a4' 'c12')

┌──┬──┬──┬───┬──┬──┬───┐
│a1│b8│a2│c12│b9│a3│   │
├──┼──┼──┼───┼──┼──┼───┤
│b8│a1│a2│b9 │a3│a4│c12│
└──┴──┴──┴───┴──┴──┴───┘

And looked those labels up:

('a1' 'b8' 'a2' 'c12' 'b9' 'a3') ⍳ ('b8' 'a1' 'a2' 'b9' 'a3' 'a4' 'c12')

2 1 3 5 6 7 4

But actually, we don’t need the original values (the letters); the numeric labels are enough:

(1 8 2 12 9 3) ⍳ (8 1 2 9 3 4 12)

2 1 3 5 6 7 4

And how did we get those labels?

↑(L R)←'abacba' 'baabaac'
(⍴L)⍴⍋⍋L⍳L,R
(⍴R)⍴⍋⍋L⍳R,L

abacba 
baabaac

1 8 2 12 9 3

8 1 2 9 3 4 12

So now we can define our function:

pdi ← {((⍴⍺)⍴⍋⍋⍺⍳⍺,⍵)⍳(⍴⍵)⍴⍋⍋⍺⍳⍵,⍺} ⍝ Progressive Dyadic Iota

'abacba' pdi 'bcabaa'

2 4 1 5 3 6

Here’s an example. We want to fill a plane with multiple classes, using first-come, first-serve. We may want to ask: for each customer, will they fit on the plane? Say we have a plane like ‘11bbbpeepee’, where 1 is first class, b is business, p is economy plus (extra legroom at emergency exits), and e is regular economy. We now have a bunch of customers coming to buy seats: ‘1bbbpppeeeee’. That’s one 1st class customer, three business people, three want more legroom, and a load of regular people.

'11bbbpeepee' pdi '1bbbpppeeeee'

1 3 4 5 6 9 12 7 8 10 11 12

Being that the plane only has 11 seats, we can see that one plus and one economy will not fit (indicated by the 12s), but we just want a Boolean, not the actual seating. Progressive dyadic iota (or iota without replacement) asks “For each element, where would it go in the remaining elements?” Now we need to ask “For each element, does it fit in (i.e. is it in) the remaining elements?”.

“is it in” is APL’s ∊. Just note that the arguments of ∊ and ⍳ are “reversed” in that the array we look up in is on the left for ⍳ and on the right for ∊, so we just swap the parts of our function and substitute ∊ for the middle ⍳:

pde ← {((⍴⍵)⍴⍋⍋⍺⍳⍵,⍺)∊((⍴⍺)⍴⍋⍋⍺⍳⍺,⍵)} ⍝ Progressive Dyadic Epsilon

'11bbbpeepee' pde '1bbbpppeeeee'

1 1 1 1 1 1 0 1 1 1 1 0

Alternatively, we could just call the function with swapped arguments:

'1bbbpppeeeee' {((⍴⍺)⍴⍋⍋⍺⍳⍺,⍵)∊(⍴⍵)⍴⍋⍋⍺⍳⍵,⍺} '11bbbpeepee'

1 1 1 1 1 1 0 1 1 1 1 0

This function is “membership without replacement”, or “progressive dyadic epsilon”. Did you notice the pattern? We are taking two functions and modifying them in a consistent manner. This calls for an operator!

WithoutReplacement←{((⍴⍺)⍴⍋⍋⍺⍳⍺,⍵)⍺⍺(⍴⍵)⍴⍋⍋⍺⍳⍵,⍺}

↑ (p c)←'11bbbpeepee' '1bbbpppeeeee'
p ⍳WithoutReplacement c
p ∊WithoutReplacement c

11bbbpeepee 
1bbbpppeeeee

1 3 4 5 6 9 12 7 8 10 11 12

1 0 1 1 1 1 1 1 1 1 1

Notice how the APL code reads much like normal English.

APL Cultivations

Lookup without replacement

Lookup without replacement#