Constants, tools and utils
In this section we’ll cover some system constants and utility functions.
Alphabetic chars ⎕A
⎕A is the uppercase English alphabet:
⎕A
ABCDEFGHIJKLMNOPQRSTUVWXYZ
There is no built-in for the lowercase alphabet, but you can get it with the case convert system function, ⎕C:
⎕C⎕A
abcdefghijklmnopqrstuvwxyz
Digits ⎕D
⎕D has the digits:
⎕D
0123456789
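These constants combine naturally with membership (∊). As a small sketch (the name alnum and the sample string are ours, purely for illustration), this marks which characters of a string are letters or digits:

```apl
      alnum ← ⎕A,(⎕C ⎕A),⎕D     ⍝ uppercase letters, lowercase letters, digits
      'R2-D2 & c3po' ∊ alnum    ⍝ 1 for each letter or digit, 0 otherwise
1 1 0 1 1 0 0 0 1 1 1 1
```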
Null item ⎕NULL
⎕NULL is a scalar null value. It isn’t used much in APL itself, but you may meet it, for example, when importing spreadsheets, where it represents empty cells. Note that it is not JSON null, which is represented as ⊂'null' to match true and false being ⊂'true' and ⊂'false'. Note also that ⎕NULL equals itself. These three (⎕A, ⎕D and ⎕NULL) are system constants; you can’t assign to them.
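Because ⎕NULL equals itself, ordinary comparison is enough to find or drop nulls. Here is a minimal sketch, where data is a made-up stand-in for a spreadsheet column with two empty cells:

```apl
      data ← 3 ⎕NULL 5 ⎕NULL    ⍝ hypothetical imported column
      data = ⎕NULL              ⍝ Boolean mask: 1 where the item is ⎕NULL
      (data ≠ ⎕NULL) / data     ⍝ keep only the non-null items
```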
Win/unix command ⎕CMD ⎕SH
⎕CMD and ⎕SH are identical, but the first feels more natural to Windows users while the second feels more natural to UNIX users. Pressing F1 on either will give you the help appropriate for that OS. They are used to call the OS command processor:
⎕SH'ls /'
Applications Library System Users Volumes bin cores dev etc home opt private sbin tmp usr var
Comma separated values ⎕CSV
⎕CSV imports and exports Comma/Character Separated Values:
⎕CSV '"abc","def",3' 'S'
┌───┬───┬─┐
│abc│def│3│
└───┴───┴─┘
It has a ton of options for almost anything you could want, including import and export directly to and from text files.
Data representation ⎕DR
⎕DR is Data Representation. Monadically, it tells you how an array is represented internally, and dyadically, it allows you to convert between data types:
⎕DR 42
83
Dyalog APL data type codes have two parts: the ones digit and the rest. The ones digit tells you which kind of data it is, and the rest tells you how many bits are used to store each element, with one exception: pointers are always 326, even on 64-bit systems. The number 42 gave us 83, where 3 means integer and 8 means 8-bit.
Dyalog APL has single-bit Boolean arrays, so they are type 11 where the rightmost 1 means Boolean, and the leftmost 1 means 1-bit.
⎕DR 1 0 1 1 1 0
11
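To see the code scheme at work, here are a few more probes. This is only a sketch; the results shown assume a Unicode edition of Dyalog with the default float representation (⎕FR←645):

```apl
      ⎕DR 300          ⍝ too big for 8 bits, so a 16-bit integer
163
      ⎕DR 100000       ⍝ too big for 16 bits, so a 32-bit integer
323
      ⎕DR 1.5          ⍝ 64-bit float
645
      ⎕DR 'a'          ⍝ 8-bit character
80
      ⎕DR 1 2 'a'      ⍝ mixed array, so pointers
326
```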
Dyadic ⎕DR lets you convert between types:
11⎕DR 42
0 0 1 0 1 0 1 0
This takes the memory which was used to represent 42 and interprets it as if it was a Boolean array.
You can also combine two steps of ⎕DR into one. With a two-element left argument, the right argument is interpreted as the type given by the first element and then converted to the type given by the second element.
Format ⎕FMT
⎕FMT is ForMaT. It is like a beefed-up version of ⍕. ⍕ retains the rank of its argument (except for numeric scalars becoming character vectors), whereas ⎕FMT always returns a matrix. Also, ⍕ treats control characters as normal characters, while ⎕FMT resolves them:
str←⎕←'abc',(⎕UCS 8),'def' ⍝ 8 is backspace
⍴⎕←⍕str ⍝ ⍕ treats backspace as any other char
⍴⎕←⎕FMT str ⍝ ⎕FMT resolves it
abdef
abdef 7
abdef 1 5
You see that the 'c' really was erased by the backspace.
Dyadic ⎕FMT gives you access to a whole new language, namely a formatting specification language. We won’t go through all the details here (see the docs!), but here’s a taste:
'I3,F5.2' ⎕FMT 2 4⍴⍳8
  1 2.00  3 4.00
  5 6.00  7 8.00
The formatting string I3,F5.2 means that each row should first have an integer which uses three characters in width, then a float which uses five characters in width and has 2 decimals. This formatting is then cycled as many times as needed to cover all the columns (here twice).
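As one more illustration of the same two phrase types, here is a sketch with different widths and precision. The data is made up; with two columns, the format string is used exactly once per row:

```apl
      'I4,F7.3' ⎕FMT 2 2⍴1 3.14159 20 2.71828   ⍝ 4-wide integer, then 7-wide float with 3 decimals
   1  3.142
  20  2.718
```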
Import/export JSON ⎕JSON
⎕JSON imports/exports JSON. It works for both arrays and objects:
⎕JSON'[[42,null],"hello"]'
┌───────────┬─────┐
│┌──┬──────┐│hello│
││42│┌────┐││     │
││  ││null│││     │
││  │└────┘││     │
│└──┴──────┘│     │
└───────────┴─────┘
⊢ns←⎕JSON'{"abc":42,"de":null,"f":"hello"}'
ns.(abc f)
#.[JSON object]
┌──┬─────┐
│42│hello│
└──┴─────┘
We can also export from APL to JSON:
⎕JSON ('abc' 1 2 3) 4 5
[["abc",1,2,3],4,5]
Just be aware that a character vector right argument is treated as JSON text to import by default, so if you want to convert an APL string to JSON, you need to use the left argument to specify whether you want import (0) or export (1).
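A short sketch of the difference the left argument makes for a plain string:

```apl
      1 ⎕JSON 'hello'        ⍝ export: the APL string becomes a JSON string
"hello"
      0 ⎕JSON '"hello"'      ⍝ import: the JSON string becomes an APL string
hello
      ⎕JSON '"hello"'        ⍝ no left argument: a character vector is imported
hello
```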
You can also tell ⎕JSON that you want your JSON fully white-spaced:
⎕JSON⍠'Compact'0⊢('abc' 1 2 3)4 5
[ [ "abc", 1, 2, 3 ], 4, 5 ]
Finally, whilst you can import any JSON object, not every APL namespace can be exported. For example, a namespace with APL functions cannot be converted to JSON. Again, ⎕JSON has some more advanced options; see the docs. ⎕JSON is fully compliant with JSON, but we do allow some leniency, which lets you create some JavaScript objects that are not valid JSON. For example:
⎕JSON 'hello' (⊂'world')
["hello",world]
We opted for a generalised system for strings without quotes, rather than special-casing null. The I-beam that preceded ⎕JSON did in fact use ⎕NULL. By using enclosed strings, we can losslessly round-trip. However, if you do want to use APL’s ⎕NULL, you can specify this using the Null variant option to ⎕JSON:
j←⎕JSON⍠'Null' ⎕NULL⊢'{"name": null}'
j.name
j.name = ⎕NULL
[Null]
1
The JSON format doesn’t support arrays of higher rank, only lists-of-lists. This means that not all APL constructs can be converted to JSON directly, for example:
⎕JSON 2 3⍴⍳6 ⍝ DOMAIN ERROR
DOMAIN ERROR: JSON export: the right argument cannot be converted
⎕JSON 2 3⍴⍳6 ⍝ DOMAIN ERROR
∧
However, when communicating with the outside world, we probably want our matrices to be converted to lists of lists. For this, we have the HighRank variant option:
⎕JSON⍠'HighRank' 'Split' ⊢ 2 3⍴⍳6
[[1,2,3],[4,5,6]]
This works universally, also recursing into namespaces:
mat←⍳2 3
cube←2 2 2⍴2
⎕JSON⍠'HighRank' 'Split'⎕NS'mat' 'cube'
{"cube":[[[2,2],[2,2]],[[2,2],[2,2]]],"mat":[[[1,1],[1,2],[1,3]],[[2,1],[2,2],[2,3]]]}
Another thing that ⎕JSON can now do is understand and create JSON5:
(ns←⎕JSON⍠'Dialect' 'JSON5'⊢'{noQuotes: [0xdecaf,0xC0FFEE] /* comment */}').noQuotes
⎕JSON⍠'Dialect' 'JSON5'⊢ns
912559 12648430
{noQuotes:[912559,12648430]}
Maybe most importantly, JSON5 allows trailing commas in lists and objects:
⎕JSON⍠'Dialect' 'JSON5'⍠'Compact'0⍳3
[ 1, 2, 3, ]
Compare with
⎕JSON⍠'Dialect' 'JSON'⍠'Compact'0⍳3
[ 1, 2, 3 ]
Map file ⎕MAP
⎕MAP is a function we’ll only mention and not demonstrate (see the docs). It basically allows you to use a file as an array instead of keeping the array in memory. Very useful.
Unicode convert ⎕UCS
This brings us to Unicode Convert, ⎕UCS, which in its monadic form converts between characters and their Unicode code points, in either direction:
⎕UCS 954 945 955 951 956 941 961 945
καλημέρα
The dyadic form takes a left argument specifying an encoding scheme and converts to and from byte values rather than code points:
'UTF-8' ⎕UCS 206 179 206 181 206 185 206 177 32 207 131 206 191 207 133
γεια σου
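Both forms work in either direction. A small sketch, using strings of our own choosing:

```apl
      ⎕UCS 'APL'            ⍝ characters to code points
65 80 76
      ⎕UCS 65 80 76         ⍝ ...and code points back to characters
APL
      'UTF-8' ⎕UCS 'γ'      ⍝ one character becomes two UTF-8 bytes
206 179
```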
Verify and fix input ⎕VFI
⎕VFI is Verify and Fix Input. It takes a string and returns two lists. It cuts the string into space-separated fields, then attempts to convert each field to a number. If it succeeds, the corresponding element of the first list is 1 (else 0) and the corresponding element of the second list is the number (else 0).
⎕VFI '123 four 42'
┌─────┬────────┐
│1 0 1│123 0 42│
└─────┴────────┘
You can also specify one or more valid field separators as left argument:
';/'⎕VFI '123 four,42 5/2/4'
┌─────┬─────┐
│0 1 1│0 2 4│
└─────┴─────┘
Here, the whole of 123 four,42 5 forms a single field, because neither space nor comma is a separator any more, and so it is not a valid number. Only 2 and 4 were valid. You can get just the valid numbers with:
//';/'⎕VFI '123 four,42 5/2/4'
┌───┐
│2 4│
└───┘
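The same idiom works with the default space separator, and a leading ⊃ discloses the result into a simple numeric vector. A sketch with a made-up input string:

```apl
      ⊃ // ⎕VFI '3.14 abc ¯7 1e3'    ⍝ 'abc' is rejected; the rest are valid APL numbers
3.14 ¯7 1000
```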
XML convert ⎕XML
⎕XML converts to and from XML, but the corresponding APL format is rather involved. We usually just use ⎕XML to verify that some XML is valid or to normalise whitespace:
⎕XML⍣2 ⊢ '<xml><document id="001">An introduction to XML</document></xml>'
<xml>
  <document id="001">An introduction to XML</document>
</xml>
Case conversion ⎕C
⎕C provides various handy case conversion operations for strings. The left argument, if given, currently has to be a single simple scalar integer, 1 or ¯1 or ¯3:
1 does upper-casing
¯1 does lower-casing
¯3 does case normalisation
For ASCII, and most European languages, there’s no difference between lowercasing and normalising case. However, some languages have multiple forms of a single letter. Normalising makes all those forms the same, so they can be compared easily. For example, Greek has two lowercase forms of Σ: σ and ς. Even Latin script (like in English and German) used to use a medial form of S: ſ. Note that it does not “de-diacriticize”: á and a are still seen as different. Nor does it do decomposition or other length-changing normalisation. The constants 2 and ¯2 and ¯4 are reserved for length-changing mapping (upper/lower) and folding (normalisation) in the future.
Here’s an example: given a character vector, uppercase the first character.
'hello, world!' → 'Hello, world!'
1⎕C@1⊢'hello, world!'
Hello, world!
Next up: a better (still not perfect) palindrome checker. Given a string without diacritics, but which may have spaces, determine if it is a palindrome. Examples:
'race car' → 1
'Σοφος' → 1
'hello' → 0
'Νιψον ανομηματα μη μοναν οψιν' → 1
((⊢≡⌽)¯3⎕C~∘' ')¨ 'race car' 'Σοφος' 'hello' 'Νιψον ανομηματα μη μοναν οψιν'
1 1 0 1
Here’s a trick too: monadic ⎕C is the same as ¯3∘⎕C.
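That makes monadic ⎕C handy for case-insensitive comparison. As a sketch, the Over operator (⍥) applies ⎕C to both arguments before matching them; the second line spells out the same thing without it:

```apl
      'Hello, World' ≡⍥⎕C 'HELLO, world'        ⍝ normalise both sides, then match
1
      (⎕C 'Hello, World') ≡ ⎕C 'HELLO, world'   ⍝ the same, written out
1
```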
Date-time conversions ⎕DT
⎕DT provides a wealth of date-time conversions. It allows you to convert any numeric representation of a date-time into any other representation. You can use it to glue together two third-party systems that otherwise can’t easily communicate.
20 ⎕DT 44053.674 ⍝ Dyalog to Unix time
1597162233
Dyalog’s basic representation of a moment is the number of days since 1899-12-31. The advantage of Dyalog’s system (which was actually the original one) is that you can then find the day of the week with 7|⌊:
7|⌊44053.674
2
0: Sunday, 1: Monday, etc.
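If you want the name rather than the number, you can index into a list of day names. A minimal sketch, assuming ⎕IO←1 as elsewhere on this page (the variable days is ours):

```apl
      days ← 'Sunday' 'Monday' 'Tuesday' 'Wednesday' 'Thursday' 'Friday' 'Saturday'
      (1+7|⌊44053.674) ⊃ days     ⍝ 1+ because indexing starts at 1 here
Tuesday
```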
Does anyone use some software that has its own date format? Answer: yes, you all do. APL does. It has the 7-element vector ⎕TS for the current Time Stamp.
¯1 ⎕DT 44053.674 ⍝ to ⎕TS
┌──────────────────────┐
│2020 8 11 16 10 33 600│
└──────────────────────┘
The left argument tells ⎕DT what you want to convert to. The numbers are largely arbitrary, but not entirely so. Positive codes indicate a scalar format (one number per date-time) and negative codes indicate a vector format (multiple numbers per date-time). Also, the number divided by 10 and floored indicates the family: the 20s are for UNIX and the 40s are for applications (Excel). The last element of ⎕TS is the milliseconds. We can get more precision in the ⎕TS-style result by using ¯2 for microseconds and ¯3 for nanoseconds:
¯2 ⎕DT 44053.674
¯3 ⎕DT 44053.674
┌─────────────────────────┐
│2020 8 11 16 10 33 600000│
└─────────────────────────┘
┌────────────────────────────┐
│2020 8 11 16 10 33 600000000│
└────────────────────────────┘
Notice also that vector formats are enclosed. This allows ⎕DT to handle arrays of dates:
¯1 ⎕DT 44053+⍳3
┌─────────────────┬─────────────────┬─────────────────┐
│2020 8 12 0 0 0 0│2020 8 13 0 0 0 0│2020 8 14 0 0 0 0│
└─────────────────┴─────────────────┴─────────────────┘
There are many of these codes; we won’t cover them all here, but they are readily available in the documentation. What you do need to know is how to convert from one of these formats. Until now, we’ve just used the Dyalog day number. That’s the default for simple scalars in the right argument. The default for enclosed vectors is the ⎕TS format (¯1). If your input is anything else, you need to give ⎕DT a two-element left argument. The first element is the input type, and the second is the output type.
For example, this converts an ISO year, week of year, day of week to ⎕TS-style:
¯11 ¯1⎕DT⊂2020 40 3
┌─────────────────┐
│2020 9 30 0 0 0 0│
└─────────────────┘
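The same two-element form lets you go from UNIX time (code 20, as used earlier) straight to a ⎕TS-style timestamp. A sketch, reusing the UNIX time we computed above:

```apl
      20 ¯1 ⎕DT 1597162233     ⍝ input is UNIX time, output is ⎕TS-style
┌────────────────────┐
│2020 8 11 16 10 33 0│
└────────────────────┘
```

The milliseconds are 0 here because the UNIX time above was a whole number of seconds.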
Another example: given two ISO-style dates (as a 2-element vector of Y M D vectors), compute the inclusive number of days between them. For example, (2020 6 25)(2020 08 10) should give 47, (2020 08 10)(2020 6 25) should also give 47, and (2020 08 10)(2020 08 10) should give 1.
diff ← {1+|-/1⎕DT⍵}
diff (2020 6 25)(2020 08 10)
diff (2020 08 10)(2020 6 25)
diff (2020 08 10)(2020 08 10)
47
47
1
Format Date-Time 1200⌶
Above we covered how to convert between different numerical date-time representations. What about converting a numeric date-time representation to text? For that we can use the Format Date-Time I-beam function, 1200⌶.
When you want to convert a numeric date-time to text, the first step is always to convert it to a Dyalog day number. After that, you can use 1200⌶ to convert that to text. It takes a left argument which is a format pattern.
'YYYY DD MM hhmm'(1200⌶)1⎕DT⊂2020 08 11 11 32
┌───────────────┐
│2020 11 08 1132│
└───────────────┘
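A common target is an ISO 8601-style timestamp. The following is only a sketch: it assumes that ss is the pattern for zero-padded seconds (check the 1200⌶ documentation), and it puts the literal T in double quotes so that it is not read as a pattern character:

```apl
      ⍝ Year-month-day, a literal "T", then 24-hour time with seconds
      'YYYY-MM-DD"T"hh:mm:ss' (1200⌶) 1 ⎕DT ⊂2020 8 11 16 10 33
```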
The system in the pattern for 1200⌶ is that numeric parts of the date are uppercase, while parts of the time are lowercase. You can use a single character for a variable-width pattern, or multiple characters for a 0-padded pattern. If instead you want space-padding, use an underscore as the first character:
'YYYY-DD-MM@hh:mm' 'YYYY-D-M@h:m' 'YYYY-_D-_M@_h:_m'(1200⌶)¨1⎕DT⊂2020 8 11 1 3
┌──────────────────┬───────────────┬──────────────────┐
│┌────────────────┐│┌─────────────┐│┌────────────────┐│
││2020-11-08@01:03│││2020-11-8@1:3│││2020-11- 8@ 1: 3││
│└────────────────┘│└─────────────┘│└────────────────┘│
└──────────────────┴───────────────┴──────────────────┘
Use t for the 12-hour clock and h for the 24-hour clock. Furthermore, the format also allows for casing and languages other than English:
'YYYY MMM D "at" h:mm'(1200⌶)1⎕DT⊂2020 8 1 4 30
'YYYY Mmm D' 'YYYY mmm D'(1200⌶)¨1⎕DT⊂2020 8 1
'__fr__YYYY Mmmm D'(1200⌶)1⎕DT⊂2020 8 1
┌──────────────────┐
│2020 AUG 1 at 4:30│
└──────────────────┘
┌────────────┬────────────┐
│┌──────────┐│┌──────────┐│
││2020 Aug 1│││2020 aug 1││
│└──────────┘│└──────────┘│
└────────────┴────────────┘
┌───────────┐
│2020 Août 1│
└───────────┘
Like ⎕DT, 1200⌶ has lots of options, including custom languages. Have a look at the documentation.