Introduction
k is a family of concise, fast vector-oriented languages designed by Arthur Whitney. Calling k a “family” is deliberate; there is no single definitive k, but instead a sequence of slightly incompatible versions. If you decide to stick with k, you’ll see mentions of k4, k5 etc. Exactly what those numbers mean isn’t important: they’re “generations”, rather than versions, and not every generation was ever made available to the public, or even completed. A Python 2/3 split every time, if you like.
Reputedly, Arthur always starts from scratch when making the next generation of k, happily and deliberately sacrificing backward compatibility in order to build something better and faster, cut fat or revert design decisions that didn’t pan out. The bleeding edge k is being developed right now (at the time of writing) by Arthur’s latest venture, Shakti. Shakti-k is dubbed by some as k9. If this all sounds a bit anarchic, that’s because it is. There is no expectation that code written for one generation will always work unchanged in the next. Embrace it. Evolution is healthy.
The main commercial version of k available today is q/kdb+ from kX systems, Arthur’s previous venture. It’s stable, fast, “batteries included” and really the benchmark k against which others are measured. kX’s k version is usually thought of as k4. However, kX’s main products are the language q, a k-derivative (implemented in k) that looks (a bit) more like a traditional programming language, and the freakishly fast, distributed columnar store kdb+. kX views k as “exposed infrastructure” and actively discourages its users from using it. kX provides exceptionally good documentation for q, which is always worth reading: even if q isn’t k, it’s often close enough. k9 currently has Arthur-style documentation only: the ref-card.
A commercial license for q/kdb+ is eye-wateringly expensive and likely out of reach for hobbyists. You can, however, run it for free under a non-commercial evaluation license, but note that the product is “tethered”, and so sends telemetry data back to base and cannot be used without an active internet connection.
You can try k9 under a free non-commercial evaluation license, too, but k9 is very much a moving, changing target at the time of writing.
Why k?
I’d like to avoid the advocacy piece, as there seems to be little middle ground. As you’ve landed here, you’ve clearly somehow sought out k, and you likely have an idea what it’s about. K, like its Iversonian siblings APL and J, values conciseness, or perhaps we should say terseness, of representation and speed of execution.
The same baseless accusations of “unreadable”, “write-only” and “impossible to learn” are leveled at all Iversonian languages, k included. I covered that a bit in the introduction to my book on APL, and so won’t repeat it here. The k cognoscenti, when told (again) that the language is unreadable, will simply show you the whites of their eyes, mumble “whatever”, and get on with their lives. Readability is a property of the reader, not the language.
K is a general-purpose programming language that excels as a tool for data wrangling, analytics and transformation. The analytics use case really drove both its inception and adoption in the financial industry. Compared with APL, it’s more consistent (partly a consequence of Arthur’s propensity for always starting from scratch), perhaps a smidge less mathematically pure (instead choosing to optimise for speed and pragmatism), has fewer “batteries included”, and has a vector-oriented, rather than array-oriented, model.
If you come to k knowing APL or J, the transition is pretty pain-free. If k is your first foray into vector and array languages, coming perhaps from Python or JavaScript, the learning ramp will feel steeper than you’re used to, but at least the vector model will feel more familiar than APL’s rank. In k, like in Python, depth and rank are the same thing.
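To make the Python side of that comparison concrete, here is a minimal sketch in Python (not k). The helper name `depth` is hypothetical and introduced only for illustration. It treats data as nested lists, where a "matrix" is nothing more than a list of lists and the only structural question is how deeply things are nested.

```python
def depth(x):
    """Nesting depth: 0 for an atom, 1 for a flat list, 2 for a list of lists, ..."""
    if not isinstance(x, list):
        return 0
    # An empty list still counts as one level of nesting.
    return 1 + max((depth(item) for item in x), default=0)


atom = 42
vector = [1, 2, 3]
matrix = [[1, 2, 3], [4, 5, 6]]  # a "matrix" is just a list of equal-length lists
ragged = [[1, 2, 3], [4, 5]]     # rows of unequal length are equally valid

for value in (atom, vector, matrix, ragged):
    print(depth(value), value)
# 0 42
# 1 [1, 2, 3]
# 2 [[1, 2, 3], [4, 5, 6]]
# 2 [[1, 2, 3], [4, 5]]
```

In this nested-list view there is no separate notion of a rectangular shape to keep track of, which is the sense in which depth is the only "rank" on offer.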
K is a teeny, tiny language. It has no libraries to speak of. You can pick up the basics in an afternoon. In terms of complexity, it’s about as hard as learning regular expressions. However, like with most things, it takes tons of practice to become good at it. In the hands of a master practitioner, it’s truly a sight to behold.
If you persist, you’ll find you have a computational superpower at your fingertips. You’ll learn new ways to think about data. Going back to a mainstream language will feel dull and boring.
I think of k as APL’s “more punk rock little sister”. It’s not for everyone.
Open source k
There is a small, but thriving, open source k community, and there are probably half a dozen or more open source implementations of k knocking around, of varying degrees of ambition. The k language deserves a future outside the commercial implementations marketed to the financial industry, and this book will deal solely with open source k. If you’re a hedge fund jock who sees k as a way to get ahead, you’re welcome here, too, but note that we’ll not touch on q, kdb+ or the Shakti equivalents.
This book is written as a jupyter-book, using the ngn/k kernel developed specifically for this book. All examples are thus run under ngn/k, which is a k6. You can run ngn/k directly in your browser, too, and we’ll link all examples to a live, web-based repl if you want to experiment.
Installing ngn/k requires you to get your hands dirty and build from source. At the moment, it’s buildable on Linux, FreeBSD, OpenBSD and macOS.
John Earnest’s oK is another k5/6, available online. Most examples here should run unchanged under oK, and where that’s not the case, we’ll try to point it out. oK has a well-written manual in addition to the traditional Arthur-style ref-card, and John has also written a more general intro to programming in k.
There is also kona, a k3, and ktye/k, which is the only open source k to support some of the ksql database extensions.
Nomenclature and style
K uses nomenclature introduced in the J language (another APL derivative) with which Arthur was deeply involved. You may have come across Arthur’s sketched J interpreter fragment, known as the J Incunabulum, and been either awestruck or repelled that one can do such unspeakable things in C. In J, and also in k, we borrow terms from linguistics, rather than from mathematics, to describe the building blocks of the language. J/k folk insist that this makes it easier to learn and understand, but if you’re already versed in other programming languages, this is sure to grate a bit in the beginning. What the hell do you mean “adverb”?
In k, we have nouns, verbs and adverbs, rather than data, functions and operators. We’ll stick to these conventions. K, like APL and J, also uses the words “monadic” and “dyadic” to refer to verbs or adverbs taking one or two arguments respectively. “Monadic” has nothing to do with Haskell monads, you’ll be pleased to hear.
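If the vocabulary feels abstract, a rough analogy in Python (not k) may help. This is only a sketch: the names `minus` and `over` are hypothetical and chosen for illustration, and Python functions do not capture every nuance of k's verbs and adverbs. The idea is that a noun is data, a verb is a function that can be called with one argument (monadic) or two (dyadic), and an adverb takes a verb and hands back a new, derived verb.

```python
from functools import reduce
from operator import add


def minus(x, y=None):
    """One symbol, two meanings: negate with one argument, subtract with two."""
    if y is None:
        return -x   # monadic use: a single argument
    return x - y    # dyadic use: two arguments


def over(verb):
    """An 'adverb': takes a dyadic verb and returns a derived verb that folds a list."""
    def derived(items):
        return reduce(verb, items)
    return derived


noun = [1, 2, 3, 4]       # a noun: plain data
total = over(add)(noun)   # adverb applied to a verb, then the result applied to a noun

print(minus(5))      # -5
print(minus(5, 2))   # 3
print(total)         # 10
```

The point of the analogy is only the division of labour: the same symbol can carry a monadic and a dyadic meaning, and adverbs work on verbs rather than directly on data.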
Tradition in k dictates that code should be as terse as possible. Unnecessary whitespace and names longer than a single letter are seen as signs of inexperience. However, for the purposes of this book, we’ll make no apologies for breaking such style guidelines where we think this aids clarity.
About this book
I can’t claim to be an expert on k. Like with my APL book, this is a jazzed-up version of the notes I took when learning. There is a lack of accessible introductory texts to k for experienced practitioners of other languages, and this is my contribution to help plug that gap. Moreover, k deserves to break free of its association with the financial industry, and perhaps this can help with that, too.
Ks have traditionally been ‘documented’ via a so-called ref-card. This is the briefest possible listing of built-ins, usually with zero context or explanation. This is part of the sometimes unhelpful mythology surrounding k, and – some might argue – a semi-deliberate barrier: you’re expected to get yourself to a point where you also believe that the ref-card is sufficient documentation, and so complete the circle. ngn/k, the dialect we’re chiefly concerned with here, carries on with the ref-card tradition, but its version is actually unusually comprehensive. You can view its ref card(s) with a few backslash commands in the repl, starting with a single \ for the index page:
\
\   help                 \\         exit
\a  license(AGPLv3)      \l file.k  load
\0  types                \t:n expr  time(elapsed milliseconds after n runs)
\+  verbs                \v         variables
\:  I/O verbs            \f         functions
\'  adverbs              \cd path   change directory
\`  symbols              \other     command(through /bin/sh)
\h  summary
By no means should you expect to be able to actually learn k from the ref cards, but – and I do feel slightly uncomfortable saying it – you’ll soon find them indispensable. For example, \+ lists the built-in verbs with both monadic and dyadic forms in one compact screen:
\+
Verbs:    : + - * % ! & | < > = ~ , ^ # _ $ ? @ . 0: 1:
notation: [c]har [i]nt [n]umber(int|float) [s]ymbol [a]tom [d]ict
          [f]unc(monad) [F]unc(dyad) [xyz]any
special:  var:y     set    a:1;a -> 1
          (v;..):y  unpack (b;(c;d)):(2 3;4 5);c -> 4
          :x        return {:x+1;2}[3] -> 4
          $[x;y;..] cond   $[0;`a;"\0";`b;`;`c;();`d;`e] -> `e
          o[..]     recur  {$[x<2;x;+/o'x-1 2]}9 -> 34
          [..]      progn  [0;1;2;3] -> 3

::         self      ::12 -> 12
:          right     1 :2 -> 2   "abc":'"d" -> "ddd"
+x         flip      +("ab";"cd") -> ("ac";"bd")
N+N        add       1 2+3 -> 4 5
-N         negate    - 1 2 -> -1 -2
N-N        subtract  1-2 3 -> -1 -2
*x         first     *`a`b -> `a   *(0 1;"cd") -> 0 1
N*N        multiply  1 2*3 4 -> 3 8
%N         sqrt      %25 -> 5.0   %-1 -> 0n
N%N        divide    4 3%2 -> 2 1   4 3%2.0 -> 2.0 1.5
!i         enum      !3 -> 0 1 2   !-3 -> -3 -2 -1
!I         odometer  !2 3 -> (0 0 0 1 1 1;0 1 2 0 1 2)
!d         keys      !`a`b!0 1 -> `a`b
!S         ns keys   a.b.c:1;a.b.d:2;!`a`b -> ``c`d
x!y        dict      `a`b!1 2 -> `a`b!1 2
i!I        div       -10!1234 567 -> 123 56
i!I        mod       10!1234 567 -> 4 7
&I         where     &3 -> 0 0 0   &1 0 1 4 2 -> 0 2 3 3 3 3 4 4
&x         deepwhere &(0 1 0;1 0 0;1 1 1) -> (0 1 2 2 2;1 0 0 1 2)
N&N        min/and   2&-1 3 -> -1 2   0 0 1 1&0 1 0 1 -> 0 0 0 1
|x         reverse   |"abc" -> "cba"   |12 -> 12
N|N        max/or    2|-1 3 -> 2 3   0 0 1 1|0 1 0 1 -> 0 1 1 1
<X         ascend    <"abacus" -> 0 2 1 3 5 4
>X         descend   >"abacus" -> 4 5 3 1 0 2
<s         open      fd:<`"/path/to/file.txt"
>i         close     >fd
N<N        less      0 2<1 -> 1 0
N>N        more      0 1>0 2 -> 0 0
=X         group     ="abracadabra" -> "abrcd"!(0 3 5 7 10;1 8;2 9;,4;,6)
=i         unitmat   =3 -> (1 0 0;0 1 0;0 0 1)
N=N        equal     0 1 2=0 1 3 -> 1 1 0
~x         not       ~(0 2;``a;"a \0";::;{}) -> (1 0;1 0;0 0 1;1;0)
x~y        match     2 3~2 3 -> 1   "4"~4 -> 0   0~0.0 -> 0
,x         enlist    ,0 -> ,0   ,0 1 -> ,0 1   ,`a!1 -> +(,`a)!,,1
x,y        concat    0,1 2 -> 0 1 2   "a",1 -> ("a";1)
^x         null      ^(" a";0 1 0N;``a;0.0 0n) -> (1 0;0 0 1;1 0;0 1)
a^y        fill      1^0 0N 2 3 0N -> 0 1 2 3 1   "b"^" " -> "b"
X^y        without   "abracadabra"^"bc" -> "araadara"
#x         length    #"abc" -> 3   #4 -> 1   #`a`b`c!0 1 0 -> 3
i#y        reshape   3#2 -> 2 2 2
I#y        reshape   2 3#` -> (```;```)
f#y        replicate (3>#:')#(0;2 1 3;5 4) -> (0;5 4)   {2}#"ab" -> "aabb"
x#d        take      `c`d`f#`a`b`c`d!1 2 3 4 -> `c`d`f!3 4 0N
_n         floor     _12.34 -12.34 -> 12 -13
_c         lowercase _"Ab" -> "ab"
i_Y        drop      2_"abcde" -> "cde"   `b_`a`b`c!0 1 2 -> `a`c!0 2
I_Y        cut       2 4 4_"abcde" -> ("cd";"";,"e")
f_Y        weed out  (3>#:')_(0;2 1 3;5 4) -> ,2 1 3
X_i        delete    "abcde"_2 -> "abde"
$x         string    $(12;"ab";`cd;+) -> ("12";(,"a";,"b");"cd";,"+")
i$C        pad       5$"abc" -> "abc  "   -3$"a" -> "  a"
s$y        cast      `c$97 -> "a"   `i$-1.2 -> -1   `$"a" -> `a
s$y        int       `I$"-12" -> -12
?x         uniq      ?"abacus" -> "abcus"
X?y        find      "abcde"?"bfe" -> 1 0N 4
i?x        roll      3?1000 -> 11 398 293   1?0 -> ,-8164324247243690787
i?x        deal      -3?1000 -> 11 398 293 /guaranteed distinct
@x         type      @1 -> `b   @"ab" -> `C   @() -> `A   @(@) -> `v
x@y        apply(1)  {x+1}@2 -> 3   "abc"@1 -> "b"   (`a`b!0 1)@`b -> 1
.S         get       a:1;.`a -> 1   b.c:2;.`b`c -> 2
.C         eval      ."1+2" -> 3
.d         values    .`a`b!0 1 -> 0 1
x.y        apply(n)  {x*y+1}. 2 3 -> 8   (`a`b`c;`d`e`f). 1 0 -> `d
@[x;y;f]   amend     @["ABC";1;_:] -> "AbC"   @[2 3;1;{-x}] -> 2 -3
@[x;y;F;z] amend     @["abc";1;:;"x"] -> "axc"   @[2 3;0;+;4] -> 6 3
.[x;y;f]   drill     .[("AB";"CD");1 0;_:] -> ("AB";"cD")
.[x;y;F;z] drill     .[("ab";"cd");1 0;:;"x"] -> ("ab";"xd")
.[f;y;f]   try       .[+;1 2;"E:",] -> 3   .[+;1,`2;"E:",] -> "E:typ"
?[x;y;z]   splice    ?["abcd";1 3;"xyz"] -> "axyzd"
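Each entry packs a form, a name and one or more worked examples into a single line. If you're coming from Python, one way to decode an entry is to restate it in familiar terms. The sketch below (Python, not k) does that for the list cases of two entries, `&I where` and `=X group`, using the ref card's own examples as the expected results. The function names `where` and `group` are hypothetical stand-ins, and the sketch makes no claim about how ngn/k implements these verbs.

```python
def where(counts):
    """Restate `&I where` for a list: repeat each index i counts[i] times."""
    return [i for i, n in enumerate(counts) for _ in range(n)]


def group(items):
    """Restate `=X group`: map each distinct item to the positions where it occurs."""
    positions = {}
    for i, item in enumerate(items):
        positions.setdefault(item, []).append(i)
    return positions


print(where([1, 0, 1, 4, 2]))
# [0, 2, 3, 3, 3, 3, 4, 4]
print(group("abracadabra"))
# {'a': [0, 3, 5, 7, 10], 'b': [1, 8], 'r': [2, 9], 'c': [4], 'd': [6]}
```

Both results line up with the right-hand sides of the corresponding ref-card examples, with a Python dict standing in for the k dictionary and its keys appearing in first-seen order.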
Are you ready? Let’s crack open the kool-aid.