Transcript
Transcript prepared by Bob Therriault, Adám Brudzewsky, Igor Kim and Sanjay Cherian.
[ ] reference numbers refer to Show Notes
00:00:00 [Nick Psaris]
And this language just brought me to such a high level and allowed me to focus on the business problem and not worry about all the code I had to write. Like, you could code at the speed of thought. I think that's one of the phrases we were using back then.
00:00:15 [MUSIC INTRO]
00:00:29 [Conor Hoekstra]
Welcome to another episode of ArrayCast. I'm your host Conor, and today with us we have our regular panelists, who will go around and do brief introductions; then we've got a couple of announcements, and then we'll get to introducing our guest. So we'll go to Bob, then to Stephen, then to Adám, and then to Marshall.
00:00:43 [Bob Therriault]
I'm Bob Therriault and I am a J enthusiast, and I remain working on that wiki in the mines down below everything else, but it's coming along.
00:00:51 [Stephen Taylor]
I'm Stephen Taylor. I'm a q developer and an APL programmer and an enthusiast for both. Also for Bob, I do like Bob.
00:01:00 [Adám Brudzewsky]
I'm an APL jack of all trades, I guess it's called, working full time both teaching and coding APL.
00:01:09 [Marshall Lochbaum]
Named?
00:01:10 [AB]
Named Adám Brudzewsky. Sorry, I forgot that. OK, it happens sometimes. I forgot to look at my mailbox this morning, so...
00:01:20 [ML]
And I'm Marshall Lochbaum. I used to be a J programmer, then a Dyalog developer and now I am the BQN designer.
00:01:27 [CH]
And as mentioned before, I'm your host Conor: polyglot programmer and array language enthusiast. And yeah, I guess we'll hop into our two announcements. We'll go to Adám first and then to Marshall.
00:01:37 [AB]
Sure, what's my announcement?
00:01:41 [CH]
You have a podcast episode that's coming out.
00:01:44 [AB]
Oh, that's the announcement. OK, yeah. So you might have heard about the "APL Notation as a Tool of Thought" podcast. Richard Park and I have recorded another episode that's now available for you to listen to. I haven't gotten around to making it a proper podcast on the proper avenues yet, sorry Conor; I have other things going on. Also, we're considering changing its name to something that's a little less of a mouthful, or maybe taking the current title and making it into a subtitle, so we'd like everybody's opinions on that. We're considering calling it "Beautiful Squiggles" with the subtitle "APL Notation as a Tool of Thought". What do you think? [01]
00:02:24 [ML]
That's a good one.
00:02:26 [BT]
I've never heard APL referred to as a squiggle language, but OK.
00:02:31 [ML]
Well, that happens all the time.
00:02:31 [AB]
Squiggle Cast.
00:02:34 [CH]
Isn't there a language called Squiggle?
00:02:36 [ML]
There is.
00:02:38 [CH]
I think I like the fact that APL's in it.
00:02:40 [NP]
How about Glyphcast?
00:02:42 [CH]
Glyphcast, that's nicer. Glyph is nicer than squiggle, in my opinion. You've got to ask the people; you've got to have a poll. That's what you've got to do.
00:02:49 [AB]
Yeah, but we don't even have an email address they can contact us on, or a proper website with a proper URL.
00:02:54 [CH]
On Twitter, you can do polls on Twitter.
00:02:58 [AB]
OK, yeah, OK, fine.
00:03:01 [CH]
You just choose four you like, invite people to share their own suggestions, and then go from there.
00:03:07 [BT]
You know, with podcasts, it's generally not good practice to play hard to get, and I think you're playing hard to get right now.
00:03:17 [AB]
But I don't want to register a domain until I have the name, and I can't decide on the name until people can contact us at contact at whatever that domain is.
00:03:21 [AB]
So you see, it's a deadlock.
00:03:22 [CH]
I feel, too, like if you do Beautiful Squiggles, it's like, see? Even the people who use the language call them squiggles; they don't even know what they are. It's like, no. Anyways, we'll wait. We'll link to the poll in the show notes, and Adám will set it up for a couple of weeks so people can go and opine on what they think it should be called. Over to you, Marshall.
00:03:44 [ML]
Right, so my announcement is not really my announcement, but at the time of recording we are six days into Advent of Code, which is, I don't know if you'd call it a competition, but a site where programming problems are posted one a day, like an Advent calendar, for the 25 days of Christmas. Many, many people are solving it in various array languages. I've been tracking the people who publish BQN solutions, and 37 people have done at least one day in BQN and published their solution, which is really cool to me. I will probably have a list of BQN repositories out by the time this podcast is out, but there are also lists on the APL Wiki for APL (I think the page is just called Advent of Code), as well as on the k wiki for k specifically, and there are various leaderboards people can talk about. So yeah, lots of activity going on. [02]
00:04:45 [CH]
Go ahead, Stephen.
00:04:46 [ST]
I've got an announcement too. The annual Advent of Code competition brings a lot of q veterans out of their winter hidey-holes to publish solutions, and their solutions, even to quite simple problems, are sometimes quite startling and not things I would have thought of. Unfortunately, a lot of these solutions are shared on private forums, so I've made it my mission this year, as one of the qbists, the curators of good q things on the Internet, to gather them up from the Iverson College Vector Dojo, from the k4 Topicbox and from the kX community, and to write them up and study them. You can find this online at https://github.com/qbists/studyq/tree/main/aoc/2022 . I'm struggling to keep up with the torrent of solutions coming in, and I can foresee that this may still be with us when the last of my mince pies has long disappeared, but I do plan to get through all 25 problems and write up and study the various solutions. There's a lot to learn from them. [03]
00:06:02 [CH]
Yeah, maybe we should do a full Advent of Code episode. We'll throw links in the show notes to everything Marshall and Stephen mentioned, plus a couple of other repos; I have one. Adám's got his hand up.
00:06:18 [AB]
Yeah, I just wanted to mention, for those doing Advent of Code in APL, and it seems to be the case in all these languages: it's actually fairly easy to solve many of the problems, and a big part of the problem is parsing the input. So I created a video a while back about using Dyalog APL's ⎕CSV (quad CSV), which is really nice for parsing a lot of that input. We can put a link to that as well; if you're working on it, it might help you. [04]
00:06:45 [CH]
Yeah, so maybe we'll create a whole section. I'm not sure if we have sections in our show notes, but check the show notes for links to a bunch of Advent of Code resources from other people who are solving the problems in a plethora of array languages. And then maybe sometime in January, or at the end of December, we'll do an episode where we choose a couple of our favorite problems and talk about the benefits of solving them in different languages. The problem today, day six (I won't say what it was), was beautiful, and I thought I did it nicely. Then I went and looked at the solution by Jay Foad, the former CTO of Dyalog, and his was, I don't know, two or three characters shorter. That doesn't sound like much, but the solution was only about 10 characters to begin with. It used dyadic iota, based on an observation about the problem, and I stared at it for a bit thinking, how does this work? And then I was like, oh wow, that's so beautiful. Anyways, links in the description; we'll do a whole discussion about this later. And with that, we will transition to introducing our guest today, who is a panelist emeritus. He was on, I think, episodes two, three and four, or three, four and five: three episodes in a row as a recurring panelist. Nick Psaris has been programming in q since 2006, I believe, and is probably best known in the community at large for his two books, Q Tips and, I think it's called, Q for Fun...
00:08:13 [NP]
Fun Q (reacts in video on live recording).
00:08:17 [CH]
Oh, that's not going to translate too well to audio, but you can take a guess. Anyway, yes, Nick was on as a panelist for three episodes where we talked about a number of things, so we'll link those in the show notes if you want to go back and listen to those three very old episodes from right when we were starting the podcast. But today we're bringing Nick on because we just had John Earnest on, where we talked a ton about k, and we're going to have him on again in the future, I think in one of our next episodes. We thought it'd be a good time to bring on a q expert, which Nick of course is, and talk a bit more about the q language and about teaching q, which he has a lot of experience with. [05] I'm not actually sure we've ever given Nick the chance to do an in-depth history, so if you want, take us back to when you discovered computing, and then we can hop into talking about q and the advantages of q versus some of the other languages.
00:09:17 [NP]
Sure, thanks, Conor. If we want to go back to when I first got experience with computer programming, that would pretty much only be when I got out of college. But my first experience with computers was an Atari 800 [06] I had as a kid. My father would bring back the cassette recorder and play the computer program into the machine. Then we had the 5¼-inch floppies and things like that. So I had experience with computers and understood how they worked. But all my friends took computer science in high school and I never did. I was more interested in the sciences, physics more specifically, and all through college I kept studying more physics. Then I studied Chinese, and I realized I wanted to get a job and neither of those was going to help me with that, so I started studying economics. When I got my first job at Morgan Stanley, [07] they hired me to do support for, I guess, mainframe job scheduling and things like that, and on the side I had to learn Perl to automate my daily routine. That's when I found out that I actually loved computer programming. Who's to say who I'd be or where I'd be had I been a computer science major to start with, but from that point on I just started learning as much as I could, trying to catch up with what everybody else knew.
00:10:49 [NP]
I remember one time trying to understand in Java what was the difference between a hash map and
00:10:57 [CH]
A tree map?
00:10:58 [NP]
No, no, there's... what do they call it? There's one that has a lock on it and one that doesn't have a lock on it. The very first, original ones all have locks, and then they introduced...
00:11:07 [CH]
Oh yeah, yeah.
00:11:09 [NP]
Anyway, so they just gave them...
00:11:12 [CH]
ConcurrentHashMap, maybe? I can't remember.
00:11:14 [NP]
Like a hash dictionary or a hash map; they have two different ways of referring to them, and it turns out under the hood they're both hashed, except that one has a lock on every operation and the other one forces you to put the lock on yourself. It took me forever to understand that they really were both still hash maps. Nonetheless, I picked up all those details of computer programming myself. I started with Perl, then I went on to a trading desk using VBA, which killed me. Then I learned Java and built a whole bunch of GUIs using Swing, and then I moved on to C++ on the trading side, because all this time I was in finance; I was actually on an automated option market-making desk. After all the GUIs were made and built, I wanted to move myself more into the business logic of things, so I started learning C++, and the STL was, to me, beautiful. It was like so many times in my study of science. I was always asking questions, like in biology: how does this work and why does it work that way? And they kept telling me, don't worry, you'll learn that when you get to chemistry. So I learned chemistry, and I kept asking questions like how does this work, and they said, don't worry, you'll understand that when you get to physics. And I said great, so I'm going to take physics, and I had all the same questions: well, how does this work? How does this work? And then halfway through my college career, at some point they said you just have to believe. I was very, very disturbed at that point and I kind of lost my religion; I thought physics was the one true answer, and I lost it. The same thing happened to me with the STL.
I really thought it was so beautiful and so elegant, but then I found, for example, that the bit vector, vector<bool>, didn't behave like the other vectors: you couldn't get a reference to one of the bits in it because it was specialized. I thought, oh, that's not fair. You have all these operations, but in this one particular case it doesn't work. Or at least it didn't at the time; it's probably fixed since then, maybe? Conor, you know.
00:13:10 [CH]
Nope.
00:13:36 [NP]
And if you used bind2nd on a non-const function, you couldn't do it; it just wouldn't compile. At the time, in particular cases, every function you called bind2nd on had to be const, and I thought there's nothing that particularly needs to be const about binding an argument to a function. So after all that elegance, where I thought this was just really well implemented, I tried to use it everywhere, in all the algorithms, and every time I tried I'd run into another problem. So I said, yes, it's nice, but let me keep going. That was around 2006. I had finished a master's degree in computational finance and I moved off to Hong Kong, still within the same firm, and my job had to change because they didn't have automated market making in Hong Kong at the time. So I joined a team that was converting stat arb trading algorithms from Java into kDB. And I thought this was crazy. I had an interview and they asked, well, what's your favorite language to manage data with? And I said, well of course it's got to be Perl. It's got hashes and it's got lists, and you can have hashes of hashes of lists, or lists of hashes, and the depth goes on. I really thought that was truly fantastic, along with the regular expressions it had as well. And the person said, well, no, I think you'll figure it out, but kDB [08] is actually far superior when dealing with data. And I was like, yeah, well, we'll see. I didn't believe it.
00:15:04 [NP]
So I was just about to go on vacation at that time, and I printed out the abridged... So at the time on the kX website there were two documents; I think Don Orth had written one and Arthur had written the other, and they were called something like the abridged introduction to kDB and the abridged introduction to q, [09] and there was also a simple introduction to q and kDB. So there were two pairs of documents, including the abridged versions. I thought, well, why would you read the abridged one? But it turns out that, the way Arthur writes, the abridged version was just the true and utter essence of the language. In my mind it actually had more information in it than the wordier, longer version with lots of English in it. So I printed those out and went on my one- or two-week holiday, and all I had with me was these printed papers. I couldn't even test anything on a computer; I would pore over the documents trying to understand what this vector language was. By the time I got back to the computer I was raring to go, ready to test everything out, and every time I tried something that wasn't documented, like passing a different type of left-hand operand to one of the operators, it would do something completely different than I had expected. It truly blew me away how much of what I felt was genius in the language was only in the author's mind. At the time I did not know there was a legacy of APL and J in there, which obviously led to the overloads in the functions, but in my mind I was like, well, why would you think of this? How did you imagine making the function do that? It was so much like a treasure hunt, like, what else can the programming language do? And that really grabbed me.
And we were, like I said, migrating Java to kDB for all the back tests and the trading algorithms, and the speed-up we got was huge: overnight processes that took four hours would take about 10 minutes every morning. The true advance was in what you could do: being able to test new ideas and new signals and things like that was quite liberating.
00:17:25 [NP]
So I was out in Hong Kong by myself doing all this, and I didn't have anybody to talk to. I had the manual and I just played with the software, and luckily for me I had a direct line to Arthur to ask questions or suggest improvements. So from 2006 to 2012 or so it was three years of quant trading using kDB and then another three years of building an option market-making system in kDB. Those were actually two different dimensions of the language. On the one hand it was massive data sets: analyzing and grouping, or doing huge as-of joins. The other side was high-frequency trading. I mean, if you're really going to be doing high-frequency trading, of course it should be in C++, but if you don't mind being slightly slower, you can iterate very fast: when you have a trading engine, or even a simulator, that's up and running and you have a new idea, you can just inject a new definition of your function into the process and see how it behaves. Obviously you should be doing back testing along with that, but that meant using kDB, or q, not as a database but as a programming language in a very high-frequency world. I built a profiler to time all the functions, and I consistently found certain functions were just too slow for my needs, one of them being left join, or as-of join. So I would give the feedback to Arthur and he'd say, yeah, you're right, we should rewrite the whole function like this, and then he would reimplement it. There was always that back and forth, which was just truly fantastic.
00:19:09 [NP]
After that stint ended, I had all these ideas about the language, having learned it first-hand without being taught by anybody, and without really even seeing it used as a database; I had been using it as a language to build trading applications. I thought my perspective was different from anyone else's I had ever spoken with, and I had very strong opinions on syntax and on how to write it efficiently, things like that. So as I left, I wanted to write a book, for two reasons. The first one was that I had all these cool libraries that I had written at another firm, and I was going to have to rewrite them again wherever I went. I figured, why not donate these ideas to the community at large, so that if I ever leave again I can just take them, because they've already been, in some sense, open-sourced. So I spent my time rewriting a lot of the core libraries from scratch, whether it's logging, a timer library, the code profiler and things like that. I wrote all that code, and then I wrote a book to use all of it and explain how those libraries worked. But I wanted to do it in a way that showed what I thought was best-of-breed coding, and explain the things that I probably uniquely understood, because I had already spoken to Arthur and asked him to change a lot of things. So I incorporated all of that with a bent towards computational finance, because that's where my mind was. I combined those two things and created the book Q Tips, because I really wanted it to be a list of tips for people who are coming up and learning a new language. I really did believe Q Tips was a very clear and, yes, cheeky name for the book. I recall that someone said to me, because they thought that, being in Hong Kong, I didn't know what a Q-tip was:
"You know, in America we use Q-tips to clean our ears." And I said, yes, I realize that; it is a play on words, but I'm keeping the title, thank you. So that's the background. Since then, and maybe we can go into other things, I've just been using q and kDB at big banks, more on the data analytics side than actually harnessing, writing and supporting the database myself, although of course I know how that works and I help those teams. But I'm really focused on how to transform businesses with the data and the analytics rather than building on the database side.
00:22:00 [CH]
Amazing story. You went through Perl, Java, C++, VBA, and I might be missing one or two; you went through a plethora of languages and ended up in an interview because you were switching offices to a different country. They ask, what's your favorite language? You say Perl, and they say you're wrong. Well, maybe not wrong, but for data shuffling you'll learn that q and kDB are the best way. Then you go away, come back and start playing around, and you find there's a ton of stuff that's not documented and you're running into a ton of overloading, and your reaction, you say, is that your curiosity is super piqued: what's going on here? There's something genius about all these ideas. Can you speak to that? Because I'm sure there are, in general, two groups of folks: the group of people that have your reaction, and the other group that just gets confused and frustrated and throws their hands up, like, I'm expecting X and I'm getting Y, Z and, you know, A, B and C. And I'm sure the story has changed since then, roughly 15 years ago, but what is it that led you to be more curious than frustrated?
00:23:30 [NP]
I guess, well, there are a few examples. There's the operator vs [10], right, which takes a single value and turns it into a vector. If you apply that to a number, then depending on what the left-hand operand is... if you take an integer and you use 0b, a Boolean, it will give you the binary representation of that integer, and if you use 0x0 it will give you the hexadecimal representation of that number. I'd never seen anything like it: with two or three characters it let me look so deep down into the machinery of a number. I don't know any other language that would let me take an integer and just easily see its binary representation. It let me embrace the actual machine under the hood and how things are represented; it brought me closer to how things were implemented, even though the abstractions were so high. I had been learning programming, like I said, all by myself, and for a language to expose all of that to me was, I felt, truly amazing. And on the other side is the function that does exactly the opposite, sv [11], where you take a vector and return a single value. If you give it a list of symbols and call sv on it, it will concatenate them, joining them together. But if you give it a file handle followed by a list of symbols, it combines them with the path separator, the slash. And I was like, why is it doing this? Why does the way it joins these symbols depend on the actual type of the values I'm passing in? But ultimately, it's what is most useful. This language asks what is the most useful thing to do in this particular case, and when you're joining a file name with an extension, .txt, you're going to be using dots.
But when you're joining a path with another file name or a directory name, you're going to be using slashes. So for most things in the language, when you ask why is it doing that, the answer is typically: well, what's the most useful thing it should be doing? That motto has stayed with Arthur for the longest time.
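For readers without q at hand, here is a rough Python analogy of the behaviour Nick describes for vs and sv. It is a sketch under stated assumptions, not a reimplementation: the names bits, hex_bytes and join_symbols are hypothetical, the 32-bit default width is an assumption, and the real q primitives have their own rules for each argument type.

```python
def bits(n: int, width: int = 32) -> str:
    """Fixed-width two's-complement binary digits of n (analogous to q's 0b vs)."""
    return format(n & ((1 << width) - 1), f"0{width}b")


def hex_bytes(n: int, width: int = 32) -> str:
    """Hex bytes of n, most significant byte first (analogous to q's 0x0 vs)."""
    return (n & ((1 << width) - 1)).to_bytes(width // 8, "big").hex()


def join_symbols(parts: list[str]) -> str:
    """Join names the way Nick describes sv: with '/' when the first part is
    a file handle (a name starting with ':'), otherwise with '.'."""
    sep = "/" if parts[0].startswith(":") else "."
    return sep.join(parts)


print(bits(5, 8))                                 # 00000101
print(hex_bytes(255))                             # 000000ff
print(join_symbols(["trades", "txt"]))            # trades.txt
print(join_symbols([":/data", "hdb", "trades"]))  # :/data/hdb/trades
```

The type of the first argument picks the separator, which is the "what is most useful" rule in miniature.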
00:26:02 [NP]
And I know that in k6, I believe it was k6 [12], which John Earnest was talking about, a lot of the way things were going to work was decided because he was implementing it and testing it on Advent of Code, going back to that topic. Every week someone would post a solution and he'd ask, what can I do to the language itself to make that even simpler? And then he would implement a new operator that threw away half of the code, so that the solution was basically built into the language. So what is most useful in any particular case has really been driving that language, and to me, building a trading platform or a back-testing platform, the language was very, very useful. It had all the things I needed, and, like I said, when I needed something it didn't have, a day or two later it would be there. I just loved that close relationship with the machine, the language and even the author, and that's what really got me going.
00:27:02 [CH]
So, also to clarify: when you say you were going back and forth with Arthur, was he working at kX [13] at that point, and then implementing...?
00:27:07 [NP]
Yeah, kX was his company. I mean, it was his company with three other people or something, right?
00:27:13 [CH]
And so at that point, had q been created by 2006? And so he was adding things, not primitives necessarily, but extensions to the language? Because as far as I know, breaking changes haven't been made to q.
00:27:32 [NP]
Right, so here's an example. You can start a q process with -p and give it a port number, and it will open a listening server socket on that port. But I wanted to create a naming service: have the process start up on a random port and then register itself with the naming service, saying, this is my port. I didn't want accidental conflicts where everyone keeps using port 5001. I wanted to be able to start up on a random port. So I asked, if I give you an infinite value, like 0W, can you treat that as "you figure out the port and bind to it"? Because in C, if you bind to port zero, I believe the system will pick the next available port, bind to it and then let you know what it is. So in those cases it's functionality; it's not necessarily the language itself. But now, with -p, you can give it a null value or an infinite value and it will bind to the next available port. So certainly things that were useful for building a trading system. There were other cases where I would ask for faster versions of left join, like I mentioned. The very first version of left join, and this is hard to describe, worked like this: say the left-hand table has a column A with all its values filled in, and then from the right-hand table you join in another column A, but it has null values. In the original implementation of left join, if the original table had a value but the new table had a null value, it would fill in the null with the value that was already there. That made things very slow, and in fact it's not what I wanted. When you get market data coming in in real time and, let's say, the price disappears and now it's a null value, I want to know that it's null. I love nulls.
kDB makes handling nulls so efficient, because it has all these operators, and putting a null value there is actually more important to me. I need to know that the stock is not trading anymore, or that it's not quoting. Null values are more important to me than filling in whatever the previous value was.
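Nick's port request mirrors the standard sockets behaviour he mentions: binding to port 0 asks the operating system for any free ephemeral port, and the socket can then report which one it got. A minimal Python sketch of that idea follows; it is not q's implementation, listen_on_ephemeral_port is a hypothetical name, and the naming-service registration step Nick describes is left out.

```python
import socket


def listen_on_ephemeral_port(host: str = "127.0.0.1") -> tuple[socket.socket, int]:
    """Open a listening TCP socket on an OS-chosen free port and report it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))          # port 0: let the OS pick any free port
    sock.listen()
    port = sock.getsockname()[1]  # the port the OS actually assigned
    return sock, port


if __name__ == "__main__":
    server, port = listen_on_ephemeral_port()
    # This is the number a (hypothetical) naming service would be told about.
    print(f"listening on port {port}")
    server.close()
```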
00:29:56 [NP]
And so between one version and the next (it was like 3 or 3.4), he reimplemented left join to make it many times faster, but also the null values would no longer be filled in; they would just be overwritten. He agreed that was the logical thing to do, but you said there are no backward-breaking changes. Well, that broke things. So if you look in the language, you'll see there's an ljf, a ujf, an ijf and an ejf. The f version is the fill-forward version: the team had to go back and give people that other version, which is slower. I've never needed it and I don't know why anyone would want it, but they had to put it back in for anyone who wanted backwards compatibility. So yes, there are platform changes and sometimes language design changes as well.
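To make the two left-join behaviours concrete, here is a simplified Python model, with tables as dicts keyed by symbol and None standing in for a q null. It is an illustration only, assuming this toy representation: left_join and its fill flag are hypothetical names, and the real lj and its fill-forward f variant operate on keyed tables with typed nulls.

```python
from typing import Optional

Row = dict[str, Optional[float]]


def left_join(left: dict[str, Row], right: dict[str, Row], fill: bool) -> dict[str, Row]:
    """Simplified keyed left join.

    fill=False: values from the right table overwrite, nulls (None) included.
    fill=True:  a null on the right keeps the existing left value (fill forward).
    """
    result: dict[str, Row] = {}
    for key, row in left.items():
        merged = dict(row)
        for col, val in right.get(key, {}).items():
            if fill and val is None and merged.get(col) is not None:
                continue  # keep the old value instead of the null
            merged[col] = val
        result[key] = merged
    return result


quotes = {"IBM": {"price": 101.5}, "MSFT": {"price": 250.0}}
update = {"IBM": {"price": None}}  # IBM has stopped quoting

print(left_join(quotes, update, fill=False)["IBM"])  # {'price': None}
print(left_join(quotes, update, fill=True)["IBM"])   # {'price': 101.5}
```

The overwrite version reports the null Nick wants to see; the fill version hides it behind the stale price, and it also has to inspect every value, which hints at why it was slower.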
00:30:53 [CH]
Interesting. Also, I've realized I'm not actually sure we've said Arthur's full name in this episode. I'm assuming 99% of the listeners have been listening to past episodes, but just in case: Arthur refers to Arthur Whitney, one of the protégés of Ken Iverson, who created APL. Arthur created the k family of languages, and k4 went on to become q, which is what we're talking about, right? He's kind of... what would you refer to him as? He's like a mythical... well, he's a real person,
00:31:12 [AB]
The legend.
00:31:14 [CH]
Yes. The legend of Arthur Whitney, because he doesn't give interviews, and he is our number one most requested guest. He'll probably never appear here, but we've got our fingers crossed that there might be some special circumstance, maybe some event in the future. Anyway, we'll throw it to Stephen.
00:31:40 [ST]
I actually had a conversation with the CEO of Shakti on exactly this topic last year, and it went... let me see if I can remember this exactly: "Stephen, you do know that Arthur will never appear on ArrayCast?"
00:32:01 [CH]
Well, we don't know that though. We don't know that.
00:32:02 [ST]
I said, wait, wait, no, I don't know that.
00:32:08 [CH]
Exactly. We'll find the links and throw them in the show notes. There have been, I think, two different interviews with Arthur Whitney, though both were conversations that were transcribed. So he has done some form of interview, and we're holding out hope. We're holding out hope. Adám.
00:32:32 [AB]
Just to mention who Arthur Whitney is, for the listeners: there are three names you need to know. Arthur, that's Arthur Whitney, who invented, and keeps inventing, k.
Roger, that's Roger Hui, [15] who more or less invented J together with Ken. And Ken, that's Ken Iverson, who founded all of the stuff that we're doing. Get those names straight.
00:32:57 [CH]
Very, very important. All right, back to q. So I was curious, because it sounds like these are the answers to my original question: a lot of what I referred to as, quote unquote, frustration maybe wasn't even frustration half the time. It was more just surprising, but immediately you could see the beauty and the power. Like when you were talking about the binary and hexadecimal representations: as soon as you see that, you're not going to be confused; you know what that is, and it's like, wow. In most languages I'm going to have to hand-roll this myself, and among the languages that do have something built in, I know Ruby has .to_s(): if you call that on a number and pass it a base, it'll do that conversion for you. But even that requires the integer, a dot, 4 characters, parentheses, and then the base, which is actually the shortest spelling in any language that I know. I didn't know that in q you can get the base representation of an integer, but you're saying that you basically stumbled across this in q: it's a couple of characters, and you can see the power of that. And for the other things, maybe it's not as immediately obvious what the motivation behind the design choices is, but very quickly you start to realize that the other things made sense, so this probably makes sense too, and it's just going to take a little bit of playing around with it to realize that it's because of the utility of it that it's...
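Conor's Ruby example is `Integer#to_s` with a base argument (`10.to_s(2)` returns `"1010"`). The q expression Nick used isn't shown in this excerpt, so here is a Python sketch of the same idea: decomposing an integer into its digits in a given base, using the hypothetical helpers `to_digits` and `from_digits`, next to Python's built-in `format` and `int(s, base)`.

```python
def to_digits(n, base):
    """Digits of a non-negative integer n in the given base, most significant first."""
    if n < 0 or base < 2:
        raise ValueError("expects n >= 0 and base >= 2")
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(d)
        if n == 0:
            break
    return digits[::-1]


def from_digits(digits, base):
    """Inverse of to_digits: fold the digits back into an integer (Horner's rule)."""
    value = 0
    for d in digits:
        value = value * base + d
    return value


print(to_digits(10, 2))                   # [1, 0, 1, 0]
print(to_digits(255, 16))                 # [15, 15]
print(format(255, "x"), format(10, "b"))  # ff 1010
print(int("1010", 2))                     # 10
```

Returning a list of digits rather than a string keeps the result usable as a vector, which is closer in spirit to the array-language version than a formatted string.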
00:34:29 [NP]
Right, I can give you two more examples of where it just blew my mind. Take the "pipe" operator and the ampersand, right? What most people would think of as "or" and "and". Turns out that in q they are actually "max" and "min" [16], which on binary values is the same thing as "or" and "and". I don't know about APL, but in q, if you use the "or" operator on two vectors that don't have just binary values, say 1, 2, 10, 100, you actually get the "max" of the two vectors. That was an extension of what I knew from Perl and C, where "pipe pipe" was "or": if you just think a little bit beyond boolean values to integer values, the pipe, the "or", gives you the maximum. If you say "2|4", you get 4, the maximum of the two, and the ampersand gives you the minimum of the two. That was just like, wow. Why do you need separate operators for "and" and "or" and for max and min, when you can use the same ones, because they conceptually have the same meaning? Another example, which I rarely use but truly thought was brilliant, and again maybe its origins are in APL, is the "where" operator. [17] When you have a vector of booleans and you apply "where" to it, it gives you the list of indices where the values are true. Great. But it turns out, I ran "where" on a list of values that were not booleans but 1, 2 and 5, and it gave me, for the one, one zero, and then for the two, two ones, and so on. And I was like, yeah, in the limiting case, that is the "where" for a boolean, but it's so much more general than that. Coming from all the knowledge that I had learned by myself, it just blew my mind every time I saw how an operator used one way had one meaning,
but if you changed the type or the values, it stayed internally consistent and gave you something brand new. I just kept finding more and more of these things.
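A small Python sketch of the three behaviours Nick describes, with hypothetical function names: `|` acting as elementwise max, `&` as elementwise min (both agree with "or" and "and" on 0/1 values), and "where" replicating each index by its count, which on a boolean vector reduces to the indices of the true entries.

```python
def q_or(x, y):
    """Elementwise max; on 0/1 values this is exactly logical OR."""
    return [max(a, b) for a, b in zip(x, y)]


def q_and(x, y):
    """Elementwise min; on 0/1 values this is exactly logical AND."""
    return [min(a, b) for a, b in zip(x, y)]


def where(counts):
    """Repeat each index i counts[i] times; on a boolean list, the indices of the 1s."""
    return [i for i, c in enumerate(counts) for _ in range(c)]


print(q_or([1, 0, 1, 0], [1, 1, 0, 0]))         # [1, 1, 1, 0]
print(q_and([1, 2, 10, 100], [4, 3, 20, 50]))   # [1, 2, 10, 50]
print(where([1, 0, 1, 1]))                      # [0, 2, 3]
print(where([1, 2, 5]))                         # [0, 1, 1, 2, 2, 2, 2, 2]
```

The boolean cases fall out as special cases of the general definitions, which is the internal consistency Nick is pointing at.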
00:36:47 [CH]
Is that how "where" works in all the other array languages? Because I think this actually came up once, Marshall, when we were talking about one of the most amazing inverses I've discovered, which was the "where inverse", and you were like, "Yeah, I did that!" and...
00:37:08 [ML]
But that only... that makes the most sense for booleans, or for a boolean result. Well, no, that's not really true, never mind.
00:37:14 [CH]
Yeah, you could technically, on the... but anyway. So the original question is, does...
00:37:17 [ML]
No, it's important for non-booleans too. I was just... that was not right.
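The "where inverse" goes the other way: given the index list that "where" produced, recover the counts. In BQN this is, as far as I know, spelled `/⁼` (Undo applied to Indices). A Python sketch with hypothetical names:

```python
def where(counts):
    """Repeat each index i counts[i] times."""
    return [i for i, c in enumerate(counts) for _ in range(c)]


def where_inverse(indices):
    """Count how often each index 0..max(indices) occurs.
    For a sorted index list, where(where_inverse(ix)) == ix; trailing zero counts
    are not recoverable, since they contribute nothing to the index list."""
    if not indices:
        return []
    counts = [0] * (max(indices) + 1)
    for i in indices:
        counts[i] += 1
    return counts


print(where_inverse([0, 1, 1, 2, 2, 2, 2, 2]))  # [1, 2, 5]
print(where_inverse([0, 2, 3]))                 # [1, 0, 1, 1]
```

Note that Marshall's self-correction holds here too: the inverse is meaningful for arbitrary counts, not just booleans.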
00:37:22 [NP]
So yeah, to your question: why did I attach myself to the language? It was because I wanted to answer business questions. I didn't want to build plumbing, you know, get this FIX message from here to there. I wanted to answer questions, build alpha, or signals, and trade. And this language just brought me to such a high level and allowed me to focus on the business problem and not worry about all the code I had to write. I forget the quote, "you could write at the speed of thought", "you could code at the speed of thought"; I think that's one of the phrases we were using back then. And it was so true. I had an idea, and in a couple of lines of code I could answer that question, so many times. Someone next to me would say, well, I wonder what would happen, or I wonder what the value is, what is the most likely thing? Well, stop asking these questions: get the data, and in 10 minutes you'll have the answer. Once you have the data and you have q, your answers come fast and furious. So it made me more efficient, more productive, and that's how I valued my time. I then spent a year or two going back to C++, and the compiler was killing me. It led me to different places over time, but it was not a fun period of my time. I felt unproductive. And yes, I was building high-performance systems, but I was missing kDB, and luckily my career moved back towards kDB again. But it's all about me and my productivity and my time. I really enjoy it. It really makes me feel empowered.
00:39:11 [CH]
Yeah, that should be like our motto or slogan: "coding at the speed of thought". I'm going to irritate some of the listeners by bringing this up again, but Advent of Code day six this morning [18]: literally, once I understood the problem, I was like, this is three or four things that are, you know, not bread and butter, but very idiomatic if you've been doing enough array language stuff. You do the first one, a couple of characters; the second one, four characters; boom, boom, and in like 10 or 12 characters, in a couple of minutes, you've solved the problem. To try and do that in C++, I don't even have the algorithms in the standard header. I probably will by C++23, once they get all of the, quote unquote, range adaptors and stuff, but in order to do a sliding window and then, you know, a unique and all this stuff, even once I have those, spelling them requires so much more typing. I'm sure there are people, maybe not most of our listeners, but out there, who would hear this and say: is it really that much of a tax that you have to spell some namespaces, that slicing is spelled with 6 or 7 characters, and that you have to pass a lambda? And the answer is yes. It doesn't make it an unusable language, but in order to do what you just said, code at the speed of thought, it completely gets in the way. And the first time you do it, you're probably not going to do it correctly. And because you weren't... what was it? Who was it that said "coding by successive approximation"? That was from one of our past episodes. I think it was either Stevan Apter or Joel Kaplan; it might have been Stevan Apter.
And it's this idea that because you're in a REPL, and I know the four things I want to do, you do the first one and you immediately see if that's what you were expecting. Whereas in C++ you've got to write the whole thing and then, like you said, you're fighting with the compiler to figure out whether this code is actually working. In these languages you sort of figure it out along the way: "coding at the speed of thought". I really like that.
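Assuming the puzzle Conor means is Advent of Code 2022, day 6 (find the first point in a character stream where the last n characters are all different; that identification is an assumption, not stated in the episode), the windows-then-unique pipeline he describes looks like this as a Python sketch. The function name is hypothetical.

```python
def first_distinct_window(stream: str, n: int) -> int:
    """1-based position of the last character of the first length-n window
    whose characters are all distinct, or -1 if there is none."""
    windows = (stream[i:i + n] for i in range(len(stream) - n + 1))  # sliding windows
    for i, w in enumerate(windows):
        if len(set(w)) == n:  # "unique" keeps all n characters
            return i + n
    return -1


print(first_distinct_window("mjqjpqmgbljsphdztnvjfqwrcgsmlb", 4))   # 7
print(first_distinct_window("mjqjpqmgbljsphdztnvjfqwrcgsmlb", 14))  # 19
```

Each step (windows, unique, compare, first match) can be checked on its own in a REPL, which is the successive-approximation workflow Conor describes.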
00:41:13 [NP]
That ability, for me at least, was enhanced when I was at Morgan Stanley. I had written an Emacs [19] mode; I've been an Emacs user basically my whole life, and I was like, all right, here's a new language, I need a mode for it. So I had written it, but then I left Morgan Stanley and I lost it. So I had to rewrite it from scratch, and this time I wasn't going to fall into the same trap. So I wrote it at home, and then, many years later, I put it on MELPA, the Emacs package archive. And that ability to write code in buffer one and, with two keystrokes, inject it into the session in buffer two: that's the successive approximation. I'm not constantly hitting the up arrow to retype. I've got key bindings where I type a little bit of code and I evaluate it, and evaluate it, and evaluate it. For me it's really, really fast to go through testing that way. I don't know what other people do with q, whether it's Jupyter notebooks, or nano, or VS Code (they're coming out with one). But for me, that ability to code in one buffer and evaluate the code immediately gives me two things. One is that it's fast, but the other is that I don't lose my work, right? The code is in the file that I've saved. All of my history: every time I tinker around with anything, I write it in a file and I evaluate it by sending it to the q process, or even a remote q process, which is even more powerful. All my work is always in a buffer somewhere, as opposed to being on the command line and hitting up arrow. From what I understand, when Arthur writes his code, he writes these really long lines horizontally, and it's because he double-clicks and pastes, double-clicks and pastes, to test. And I can get away with not writing
really long lines because Emacs will find the one line or the function definition; it will go back to the beginning and to the end and then inject that. So that, for me, has been transformative. I couldn't have written the books without that ability.
00:43:38 [BT]
Well, one of the things that strikes me: we had an episode [20] where we tried to define our Iversonian languages, and it was quite the episode, because Marshall revealed his inner bouncer and was quite strong about moving Iversonian languages in and out of the club, as Conor described it. He moved q out of the club, or at least moved it to the back, by the door.
00:44:08 [ML]
Yeah, well, we should be clear that we're splitting things into two categories: array languages, and we all agree q is an array language; and then there's the inner group, the Iversonian languages, so...
00:44:17 [BT]
Right, we weren't taking q out of array languages, no?
00:44:19 [ML]
I'm willing to say q is definitely on the boundary of that group of languages.
00:44:26 [BT]
And this is where my comment slash question comes in. You are, I think, an expert in q. I can't think of any other way to describe you than as an expert.
00:44:36 [CH]
q God, isn't that what they call them?
00:44:39 [BT]
I would say q God. Sure, yeah, I'd go with that.
00:44:41 [CH]
Stephen has his face in his hand. Is that like... we're not supposed to promote that word? All right, cut it from the record.
00:44:48 [ST]
Let me do my little rant about that. Nick was saying earlier how a lot of the people who first started using k, the ancestral language of q, really the language in which q is implemented, had come from an APL background. [21] You could give them a quick summary of what's in q, and it's like, oh yeah, I get it, and I can see how to use it, because "I'm, you know, part of the legacy". And then a lot of other people came in and were confronted with this almost minimal documentation, and its intro-level workshops in q, which tell people how to get started and get productive quite quickly. And they're very good at that: oh, don't you worry about this, just do this, and you get busy in q. What's missing in between is a proper and formal grasp of the language, where you actually understand the syntax and how to parse it, and all the details, the kind of things you'd perhaps need to go to Q Tips to read and understand. And I think as a consequence a lot of the people coming in just looked at the code that was being written by Arthur, by Nick, by Stevan Apter, and it's just like, I have very little idea how that works, and even when I do manage to understand it, I can't imagine ever being able to write that stuff myself. I've got no access to the thought processes which generate those solutions. And I would submit, Conor, that it's the thought processes that generate those solutions that particularly fascinate you. One of the consequences was that people just said, well, these are the q gods. The q gods write stuff that is divine, and even when we can understand it, we've got no idea how they thought of it. The phrase honours the skill, but it also locates it out of reach. And I'm on a Promethean mission to break this down. I absolutely am. I'm going to do that because q, like any other language, yields to study and practice. It's not reserved for a race of semi-divine beings.
And that's why, whenever I hear the phrase "the q gods": head in the hands.
00:47:26 [CH]
All right, we won't use that phrase. It's a... what do you call it? A term of art from the past. But I definitely know that feeling, because I remember one, I can't remember the problem, but I posted it, and Marshall did something in, you know, 25% of the characters, and it made use of inverse on some primitive. Inverse exists in APL, but not as a single character, and in BQN I think it has a lot more uses. I saw it, I stared at it, and I was like, what is going on here? And then 20 minutes later, after stepping through it trying to figure it out, I was like, oh my goodness, I would never have thought of this in 100 years. I would never have come up with that kind of solution. So I've been there. Anyways, back to Bob, because Bob was in the middle of asking Nick, as a q expert, not a q God, because there's no such thing, only historically speaking. Go ahead, Bob.
00:48:19 [BT]
Well, thanks. Actually, thanks so much, Marshall and Stephen, for the clarification on that, because I think it frames my question better. Is q far enough removed from APL or k (k being Arthur Whitney's language, and q sounds like it's an outgrowth of k), far enough removed from Ken Iverson, not to be called an Iversonian language? Is it actually something different again?
00:48:48 [NP]
When we had our first conversation and you told me that arrays in APL and J, and probably, you know, BQN, were all in some sense one long vector with offsets for the subarrays, that was like, oh. I mean, I know that in Fortran or whatever, underneath the hood, that's how things are done, and that's how they're stored in many C libraries and things like that. But coming from q, I thought that the list of lists was the right way to do it in some sense, because in my mind it was a list. [22] I didn't see the benefit of saving everything in one long list. When you do memory management, if you need a new list of the same size, you don't need to go reallocate the whole vector; you can just go find a new bit of memory. And I suppose knowing the offset, exactly where to find the next element, as opposed to doing a pointer lookup, makes it a little bit faster. But I don't know if we want to say that that in itself, the migration away from one long list, could push it out of the Iversonian languages.
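To make the two storage models concrete, here is a rough Python sketch (hypothetical class `FlatArray`, not any interpreter's actual layout) of the "one long vector plus offsets" representation Nick describes for APL, J and BQN, with a method that produces the equivalent list-of-lists view he describes for q. All elements live in one list, and an index becomes an offset by arithmetic rather than by following pointers.

```python
from math import prod


class FlatArray:
    """A multidimensional array stored as one flat list plus a shape, in row-major order."""

    def __init__(self, shape, data):
        self.shape = tuple(shape)
        self.data = list(data)
        if prod(self.shape) != len(self.data):
            raise ValueError("data length must equal the product of the shape")

    def offset(self, index):
        """Map a multi-index to a position in the flat list (row-major, Horner-style)."""
        if not isinstance(index, tuple):
            index = (index,)
        if len(index) != len(self.shape):
            raise IndexError("need one index per axis")
        off = 0
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError(index)
            off = off * n + i
        return off

    def __getitem__(self, index):
        return self.data[self.offset(index)]

    def to_nested(self):
        """The list-of-lists view: one level of nesting per axis."""
        def build(axis, start):
            if axis == len(self.shape):
                return self.data[start]
            stride = prod(self.shape[axis + 1:])
            return [build(axis + 1, start + k * stride) for k in range(self.shape[axis])]
        return build(0, 0)


a = FlatArray((2, 3), range(6))
print(a[1, 2])        # 5
print(a.to_nested())  # [[0, 1, 2], [3, 4, 5]]
```

The trade-off Nick mentions shows up directly: the flat form needs one allocation and pure arithmetic to index, while the nested form lets each sublist be allocated (and replaced) independently.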
00:50:00 [ML]
Yeah, so you're ready to kick k out, I wasn't going to do that.
00:50:07 [CH]
You're going the wrong way, Nick; this isn't what we brought you on for.
00:50:11 [NP]
But listen, listen. There is so much more to this language, and I want to make sure of this: the fact that you have SQL [23] in the language teaches people more about how to code in a vector style than just having J by itself. For example, when I teach a course at Carnegie Mellon on market microstructure and algorithmic trading, the foundation of it is a bunch of kDB databases. We have a New York Stock Exchange TAQ database, a crypto database that I've set up, and an S&P E-mini futures database. And we teach the students to analyze the data that's in the database, in q. When all they're doing is writing select statements, there's no room for a loop. Here's my select statement, and I need to write a new function that analyzes the data: well, that function had better take a column as its parameter, and things like that. And so people start thinking in SQL. You start thinking about "group by"s; you're thinking of these high-level algorithms. So when quants, for example, are learning to code in q, typically it's with select statements, often not the underlying raw functions. And that approach to programming is not new to anyone who's seen a database. It's a vectorial way of doing things. And then when you tell them, oh well, not only do you have a database, but you can write your own functions in the database, they're like, oh my God, this is so powerful. And so they write a function that can be used along with their select statements, their updates, their group-bys and things like that. So I think, if I were to come up with a new paradigm for teaching q, at least to non-APLers, you start people off with the select statements to force them into the mindset of doing things in a vector approach. And then you say, ok, well, here are some functions you could write.
And by the way, everything is a column, and your "where clauses" are vectors of booleans that you're just filtering on, and things like that. It's not hard for anyone to understand when you approach it that way, which actually makes the code a lot cleaner when you read it. The other thing about the language is that yes, it's got SQL in it, its own dialect, qSQL. But in a normal database, when you write SQL, you can't run a bit of code, save it off as a temporary variable, and then apply another transformation to that. It's one massive SQL statement that gets really ugly; you can do your best to factor it, but ultimately it all gets parsed by the server all by itself. q allows you to write functions where each line has a single select statement or a single update, and you can read it and say, ok, here's the transition from line 1 to line 2, I get it, line 2 to line 3. So where people are writing these great vectorial SQL statements (and maybe there are SQL gods, you know, those people where you can't imagine how they did that), when you turn that into kDB and into q, the code, if you do it right, gets a lot cleaner, because you do the transitions one at a time: here's my select, here's my join, and so on. So I really think that it bridges the gap between a domain where people do think vectorially and a programming language which is implemented to write vector functions and code extremely fast. So that combination, yes, maybe it kicks q out of Iversonian, maybe not, but adding SQL to it is, you know, Whitney. "Whitnean languages" is what we could call the languages that include SQL.
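A rough Python sketch of the column-at-a-time style Nick describes: a table as a dict of columns (invented sample data), a where clause as one boolean vector, and the query split into named steps instead of one large statement. The helper names are hypothetical; the pipeline corresponds roughly to the qSQL query `select sum size by sym from trades where size>90`.

```python
from collections import defaultdict

# A table as a dict of equal-length columns (column-oriented storage).
trades = {
    "sym":   ["AAPL", "MSFT", "AAPL", "IBM", "MSFT"],
    "price": [190.0, 410.0, 191.0, 140.0, 412.0],
    "size":  [100, 200, 300, 50, 100],
}


def where(mask):
    """Indices of the True entries of a boolean vector."""
    return [i for i, keep in enumerate(mask) if keep]


def take(table, rows):
    """Keep the given rows of every column."""
    return {col: [vals[i] for i in rows] for col, vals in table.items()}


def sum_by(table, key, value):
    """Sum one column grouped by another (a tiny "group by")."""
    totals = defaultdict(int)
    for k, v in zip(table[key], table[value]):
        totals[k] += v
    return dict(totals)


# Step 1: the where clause is one boolean vector over the whole size column.
mask = [s > 90 for s in trades["size"]]
big = take(trades, where(mask))

# Step 2: a separate, readable transformation on the intermediate result.
volume_by_sym = sum_by(big, "sym", "size")
print(volume_by_sym)  # {'AAPL': 400, 'MSFT': 300}
```

Keeping `big` as a named intermediate is the point Nick makes about q versus a single monolithic SQL statement: each step can be inspected before the next one is applied.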
00:54:01 [ML]
Can J still be a Whitnean language? That's what we're really wondering about.
00:54:05 [NP]
That's right, yeah. J, you don't... it doesn't have select statements in it, does it?
00:54:10 [ML]
It does have Jd [24], which has that, but I don't think it's integrated as closely. I don't know quite how SQL in q works, but Jd is more like a library you can call.
00:54:22 [AB]
Same thing goes for APL with DDB, the Dyalog database.
00:54:27 [CH]
So I want to stay on whether we're kicking these languages in or out, but before we do that, I want to follow up. You said that basically SQL is such a great match to be bundled and paired with an array language because it is a style of programming that is not welcoming to loops whatsoever. Is that...? So you wouldn't go as far...
00:54:57 [NP]
Yes. It forces you.
00:54:58 [CH]
You wouldn't go as far as to say that SQL is an array language, but you would say that it's a perfect fit because you're never doing these... or maybe you would.
00:55:09 [NP]
Is it not?
00:55:11 [CH]
Well, this is my lack of knowledge about SQL, because, you know, full disclosure, I've probably only written maybe five production lines of SQL code. I've done a few LeetCode things, but...
00:55:24 [NP]
Ultimately, it boils down to a filter, right? You've got a "where" clause, and you've got a "select", you know, picking a few columns; you've got a "group by", and a few "joins", right? I mean, beyond that, they're all...
00:55:42 [CH]
I guess now that I think about it, everything is stored in a table.
00:55:46 [NP]
Yeah, although some tables are row-based; but ultimately you're operating on columns, so...
00:55:50 [CH]
Well, are we having a huge eureka moment? We've got to go check the Wikipedia page: is SQL the world's most popular array language, other than, well, Excel... cells, not... I can see Adám raising his eyebrows. But Nick's got a good point, like, it's...
00:56:07 [ML]
What I would say is that database tables are very similar to arrays, but they're different. One thing is joins: you have joins on database tables, and you have nothing like that on arrays. So that's something that SQL does and APL doesn't. And on the other hand, APL has the outer product, or in q or k you'd use "each left" and "each right" for this sort of thing. That's something that you don't really get in... well, I guess that's actually an "outer join", [25] so that's not a good example, but...
00:56:37 [CH]
I was gonna say, isn't there a join that maps to outer product? Yeah, and that's the thing, previously what...
00:56:43 [ML]
Although the thing about the outer join is that it doesn't have the same structure, so it doesn't give you a multidimensional array out. It flattens everything, so that's another difference between how the database paradigm and the array paradigm work. So I would say they're very closely related, and they can use a lot of common functionality, but in my opinion I wouldn't call SQL an array language.
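Marshall's point about structure can be seen side by side. In SQL, the operation that pairs every row of one table with every row of another is usually called a cross join (a Cartesian product); a full outer join is a different operation that matches rows on a condition. This Python sketch with hypothetical helpers shows the array outer product keeping a two-dimensional shape while the join produces the same pairings as one flat list of rows.

```python
from itertools import product
from operator import add


def outer_product(f, xs, ys):
    """Array-style outer product: a len(xs)-by-len(ys) nested result."""
    return [[f(x, y) for y in ys] for x in xs]


def cross_join(left, right):
    """SQL-style CROSS JOIN on lists of row dicts: every pairing, as one flat list of rows.
    If both sides share a column name, the right-hand value wins in this sketch."""
    return [{**lrow, **rrow} for lrow, rrow in product(left, right)]


print(outer_product(add, [1, 2], [10, 20, 30]))  # [[11, 21, 31], [12, 22, 32]]
rows = cross_join([{"a": 1}, {"a": 2}], [{"b": 10}, {"b": 20}, {"b": 30}])
print(len(rows))  # 6
```

Both produce six combinations; only the outer product remembers that they form a 2-by-3 grid.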
00:57:10 [CH]
This is so... because now what I'm thinking in my head is that SQL has a huge brand. When I think of SQL, it's, you know, capital letters for all the keywords, and you want to go to Access and do the drag-and-drop thing where it just auto-generates the statement for you. I've never really seen a SQL statement and gone, oh my goodness, that's the most beautiful line of code. It's usually a thing of necessity: you're specifically trying to filter, join, do all this stuff to get a specific set of information, and SQL is the tool that you reach for, but...
00:57:46 [NP]
So one of the reasons why I think SQL, in q at least, is useful, beyond just the fact that people can read what's actually happening (it adds a little verbal fluff to the code, but it directs you to what's actually happening), is this. When you're passing massive data sets back and forth, like in APL, when your data structure gets beyond a certain size... if it's two lists, that's easy to deal with; you can index with zero and one or something to that effect. But then you need to move to a dictionary, [26] right? You have named variables: here's the key to the dictionary and the value, and that dictionary gets passed to other functions. At some point you need to refer to things by name in your code, and if every time you get passed the data structure you have to index into it and save it into a local variable, you know, quantity, save it into qty; size, save it into size; then you have vectors on your hands and you're going to run all your vector operators on that data. Why do you need to keep pulling it out of the data structure and putting it into a local variable just to apply these functions? Why can't you use the functions in the scope of the data structure? And so what qSQL allows you to do, not only on a table but also on a dictionary, is say "select a*b from d", where d is a dictionary. So now you've moved your code into the data structure instead of pulling those vectors out of the data structure. That scoping of variables into the domain of the table or the dictionary actually simplifies the code quite a bit. Like I say, you don't have to pull the data out of the data structure, operate on it, and then put another copy of it back in.
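The scoping idea can be sketched in Python, with the caveat that this is only an analogy and not how qSQL works internally: instead of copying each entry of a dictionary into its own local variable, the expression is evaluated with the dictionary's keys bound as names. `with_scope`, `qty` and `price` are hypothetical names chosen to echo Nick's example.

```python
def with_scope(data, expr):
    """Evaluate expr with each dictionary entry bound as a named argument.
    Hypothetical helper: it illustrates the scoping idea, not qSQL."""
    return expr(**data)


d = {"qty": [100, 200, 300], "price": [10.0, 10.5, 11.0]}

# Without scoping: pull each vector out into a local, then operate on the locals.
qty = d["qty"]
price = d["price"]
notional_manual = [x * y for x, y in zip(qty, price)]

# With scoping: the expression names the dictionary's entries directly.
notional = with_scope(d, lambda qty, price, **_: [x * y for x, y in zip(qty, price)])
print(notional)  # [1000.0, 2100.0, 3300.0]
```

The `**_` parameter lets the expression ignore any other entries in the dictionary, so the same expression works on a wider record.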
00:59:41 [CH]
This is exactly the reason I really like combinators: [27] combinators do the exact same thing. A lot of the time, the only reason you need to store something in a local is because you need to use it twice, but a combinator makes that unnecessary, so you can just write a single chain of operations. Materializing things in a local just so you can do something with them is no longer necessary, which is very useful. It sounds very similar to that.
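Conor's combinator point in miniature: a Python sketch of a fork, the 3-train from APL where `(+/÷≢)` computes an average. `fork` is a hypothetical helper, not a standard-library function.

```python
from operator import sub, truediv


def fork(f, g, h):
    """An APL-style 3-train: fork(f, g, h)(x) == g(f(x), h(x)).
    The argument x is used twice without ever being stored in a named local."""
    return lambda x: g(f(x), h(x))


mean = fork(sum, truediv, len)  # like APL's (+/÷≢)
spread = fork(max, sub, min)    # largest minus smallest

print(mean([1, 2, 3, 4]))  # 2.5
print(spread([3, 9, 4]))   # 6
```

Without the combinator, each of these would need the argument bound to a name so it could be passed to two functions.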
01:00:07 [AB]
Well, I just looked it up and there's something called multidimensional arrays in SQL [28] that was added fairly recently. Now I'm curious.
01:00:16 [ML]
[several chuckles] Yeah man, now it is (an array language).
01:00:19 [AB]
Yeah, it says the newest standard, SQL:2019, adds (Part 15) multidimensional arrays: an MDarray type and operators. I don't know, sounds like... something.
01:00:30 [ML]
Yeah, but we did discuss that: it's about how it's used, not about what it supports. So, I mean, having multidimensional arrays is a lot different from putting everything in terms of multidimensional arrays.
01:00:42 [CH]
That's true. All right, I just looked at the time; we've blown... All right, I was gonna say something.
01:00:46 [AB]
No, I was just going to whack BQN for having things that are not in multidimensional arrays, that's all [chuckles].
01:00:54 [ML]
Well, I mean, yeah, when they're not arrays at all; but your collection type is a multidimensional array.
01:01:00 [AB]
Everything is an array [chuckles].
01:01:02 [ML]
Not in BQN
01:01:04 [BT]
So to bring it back to more pragmatic terms: when I talked to Nick before, one of the things that often comes up from people who are interested in array languages is, can I get a job working with array languages? In a lot of cases it's difficult, I think. In some cases you need to work with a specific company. In other cases you need to find companies that are interested in this kind of area of quantification, and I guess you can sort of carve out your area and maybe use the array languages that way. But I think q is an example of an array language where there actually are a fair number of jobs if you learn the language. Is that right, Nick?
01:01:44 [NP]
Yeah, so I would say that there is a range of types of jobs, right? The fact that kDB is, quote-unquote, "the world's fastest time series database" means that if you want a job in technology and you learn q and the way databases are stored on disk and things like that, you can get a job as a database administrator, [29] but that's a very ... [sentence left incomplete]. It's as if you were an Oracle DBA: you would have a job as a kDB database administrator. But because kDB is written in the q language, you are actually learning the language. Just because you've learned how to support an Oracle database doesn't give you a launchpad, a trampoline, into some other career. But once you've learned q to support kDB and how it works, you have the ability to transition your career, because a lot of the people who are analyzing the data are using q... the data is just the data, and they're analyzing it to do research in the language itself, and it works so well together because the storage is the same. So you can then move over to get a job on a quant desk. Even if you're very junior, if you have the mind and a mathematical background and you want to learn something about finance, you can, if you're lucky, make that transition, as long as you keep asking questions. You can transition from just the DBA role into more of a quantitative role.
01:03:23 [NP]
My goal is to promote the language as much as I can, and that's why I teach it, and even at work I train as many people as possible, because I love programming in the language and I don't want to see it go away. One of the gripes is that we can't find enough people who know the language. And you're asking: can you get a job in it? The answer is yes; there just aren't enough people. So the more people I can train, the better it is for me, because q gets more of a footprint around the industry, and people don't just throw it out the window because they can't find anybody. So I think the best way to promote it is to train as many people as possible, and not just on the database administration side, but also on the end-user side, analytics and things like that. You can download (like you've mentioned before) the home version [30]; as long as you're connected to the Internet, you can use it, you can play with it. One of the more recent things KX is doing is recognizing that Python is, you know, the standard in data science, and so: how can we get kDB to play nicely with Python? [31] There have been many attempts: there's PyQ and embedPy, and now KX has one called PyKX. They're all slightly different in their implementations, but the idea is: ok, at some point there's just too much data, and you need to treat it as a database and send the code to the data. Summarize it, distill it down to minutely buckets or something to that effect, then pull it into Python and do your data science on it. It's not going to be possible to do a lot of data science on every change in the national best bid and offer [NBBO] for every symbol. That would be very ... [sentence left incomplete]; your machine wouldn't be able to handle all that memory. So that combination: can I code data science in Python, moving some of the heavy lifting over to the q side, all in the same memory space perhaps?
Or perhaps over the network; that combination is probably a really great ... [sentence left incomplete]. The synergies there are really good, and I think we'll see a lot more of that going forward.
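The "distill it down to minutely buckets" step is a group-by on a floored timestamp. In the setup Nick describes, this would run on the q side (q's `xbar` is the usual tool for flooring values to a bucket size) and only the summary would travel to Python; the sketch below just shows the shape of the computation in Python, with invented tick data and a hypothetical `minute_bars` helper.

```python
def minute_bars(times, prices, sizes):
    """Summarize tick columns into one open/high/low/close/volume row per minute.
    Assumes times are integer seconds since midnight, sorted ascending."""
    bars = {}  # minute -> bar; dicts keep insertion (time) order
    for t, p, s in zip(times, prices, sizes):
        minute = t // 60  # floor each timestamp to the start of its 60-second bucket
        bar = bars.get(minute)
        if bar is None:
            bar = bars[minute] = {"open": p, "high": p, "low": p, "close": p, "volume": 0}
        bar["high"] = max(bar["high"], p)
        bar["low"] = min(bar["low"], p)
        bar["close"] = p
        bar["volume"] += s
    return bars


times = [34200, 34215, 34259, 34260, 34290]  # 09:30:00 onwards, in seconds
prices = [100.0, 100.5, 99.8, 100.2, 100.4]
sizes = [10, 5, 20, 7, 3]
bars = minute_bars(times, prices, sizes)
print(len(bars))  # 2: five ticks distilled into two rows
```

On real tick data the ratio is far larger, which is why doing this next to the data, and shipping only the bars, saves both memory and transfer.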
01:05:53 [BT]
So it's more like q becomes a really fast filter for large amounts of data, and then you can use Python routines to actually work on that data, in a way that you might be more familiar with.
01:06:04 [NP]
Yeah, I think that ... [sentence left incomplete]; I mean, I don't want to limit q to just that, you know, a filter. But yes, there are some analytics that are a lot faster when you can push them to the server. And even when we talked about moving data to the cloud: you pay for data egress, right? You put all your data in the cloud, and every time you run a query, you have to pay for the data that you pull out. Well, does it really make sense to pull all that data into Python and analyze it there to distill it down? Or can you send your, you know, 5-line or 10-line q function to your data set that's on the cloud, distill it down to what you need, and only pay for the results that come out, which you will then perform your machine learning on? So your transformations and feature extraction ... that probably should be done in kDB: generating your feature sets or something like that, distilling it, and then pulling it out. Yeah, there's definitely a domain there, and it's not a clear line, but there are advantages to having skills on both sides of that.
01:07:04 [ML]
I think we'd want to ask John Earnest [32] about this, because I remember he said on the APL Farm that he really doesn't like the idea of, you know, having k or q inside another language, just doing some things. He wants it to be the whole system.
01:07:26 [NP]
So think of the following: you've got Python, which is very slow with numerical calculations [here, ML agrees], so they've added Numpy, [33] right? Numpy is, you know, a vector language modeled on APL, in some sense, right? But then they didn't have databases, tables or anything like that. R had a table [the data frame], so Pandas was created. But Pandas has its warts, and even the author has, you know, a blog post about the things he wished he could do better with Pandas. But what if you had a version of Pandas that was perfect: that managed its memory very well, was fast with joins, and could also share its memory space with Numpy vectors, so you could take advantage of all the algorithms that are already there? Why wouldn't you swap out Pandas for a kDB table under the hood?
01:08:21 [ML]
I think that's about what I would ask John [both chuckle]. Because, yeah, I do agree that sounds pretty useful to me and this is something that I kind of hope to do with BQN, but haven't done any real work on. Yeah, it does seem like a good direction.
01:08:34 [NP]
And that's in some sense also the direction that Shakti is going: trying to build, you know, keep the SQL, keep the vectors, but make it as fast as possible. Because ultimately, if you drill down, the reason operations on Pandas vectors are slower than on Numpy vectors is, in some sense, because Numpy vectors don't handle nulls. So if you take the sum of a vector with nulls in Numpy, you get a null out. If you take the sum of a vector in Pandas, it will realize that some of the values are nulls and just throw them away. The same goes for average and things like that. So Pandas adds a lot of the friendliness that q also provides, right? The average of a vector in q is the average excluding the nulls. The same is true in Pandas, but Numpy is like the raw metal: you asked for the average; it happened to have nulls ... your average is null.
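The null-handling difference can be sketched in plain Python, using float NaN as a stand-in for a null (an assumption; q has its own typed nulls, such as `0n` for floats). NumPy's `np.mean` propagates NaN, while `np.nanmean` and pandas' `Series.mean()` (with its default `skipna=True`) skip missing values, much as q's `avg` does. The two hypothetical functions below mirror those two behaviors; note that the skipping version has to filter first, which is the extra work behind the friendliness.

```python
import math


def avg_propagate(xs):
    """NumPy-style mean: a single NaN anywhere makes the result NaN."""
    return sum(xs) / len(xs)


def avg_skip_nulls(xs):
    """q/pandas-style mean: NaNs are excluded before averaging."""
    present = [x for x in xs if not math.isnan(x)]
    # An all-null input has no meaningful average, so return NaN.
    return sum(present) / len(present) if present else math.nan


if __name__ == "__main__":
    data = [1.0, math.nan, 3.0]
    print(avg_propagate(data))   # nan
    print(avg_skip_nulls(data))  # 2.0
```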
01:09:30 [NP]
So it depends on, you know, how fast you need to go versus how much ease of use you want. And Shakti will be the ultimate in performance, giving up a little bit of the ease of use for average and things like that. So I think that's where Shakti is going to come in: much more memory, much faster, taking Numpy and expanding it to the ultimate version of Pandas that you can imagine.
01:09:59 [CH]
Alright, well, we blew by the hour mark, as I mentioned before, like 10 minutes ago, but before we wrap up: you talked a little bit about "Q Tips" and what went into that book. But you have another book, "Fun Q", and you also mentioned the CMU course, which I had no idea you taught, and which I looked up: market microstructure and algorithmic trading. [34] Can you tell us a little bit about what's in "Fun Q" and how it's different from "Q Tips"? And my guess is that it's not possible for people to just go and take your CMU course as if it were a Coursera course, but maybe you want to pitch that a little bit too, for students who are in high school and potentially going on to university; maybe they'll go to CMU just so they can take that course.
01:10:50 [NP]
With respect to the course, it's a graduate course. So if you're going back to graduate school and you want to get a degree in computational finance, it is probably ... [sentence left incomplete]. I don't know if there's any other course at other universities that teaches kDB. Yeah, it's trying ... [sentence left incomplete]. I know where these students are going to get jobs: they're often on trading desks; they're at banks; or they're at hedge funds. And even at hedge funds these days, kDB has a big presence, so my goal is to give people the skills so that when they leave the course and get on the trading desk or wherever it is, people are like: wow, where did you learn that? I want them to be up and running, efficient and effective as soon as they get there, and not have to be taught. Of course there's some domain knowledge they'll need to learn, but my goal is that I want to be proud of the people who graduate, so that, you know, it gives me a good name, but it also allows for, like I said before, a farm of people who know q and kDB, so that when trading heads are deciding "what should we invest in?", it's not like: "uh, why should we use q? I just can't find anybody". So yeah, that's Carnegie Mellon; that's the course. It's the last course they take before they graduate. It's once a year; I teach October, November and December. I just taught my last class last week, and, you know, I'm free. I'm open to build up the material again for next year over the summer. KX is nice enough to give us an educational license every year, so that's really great. The hard part on my side is loading the data into the database, because, in a silly sense, I thought: ok, let me go find February 2020, the spike of the COVID era, right when the markets collapsed. I said: ok, I'm going to load that data into a one-month database. And the machine I happened to have didn't have enough disk space; didn't have enough RAM.
Like, I was fighting with it, so I had to pull out all my skills. I was trying to fit, you know, a square peg into a round hole. I had to trim the sides of the data set. I had to do it one security at a time, but I finally got it up and running. Once you get the data stored in kDB, the memory of the machine doesn't matter. The amount of disk space is ok; it's all compressed and things like that. And querying it is fine; there's no limitation there. The problem is: how do you get the data from the massive zipped CSV file into the kDB format? Because it's vectorized, people don't actually load all the columns all at once, or all the rows all at once. They're always asking for a subset of the data, so once you get the data there, it's all good. And so that's the course.
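The point that queries only ever touch a subset of the data follows from a column-oriented layout: a splayed kdb+ table is stored as a directory with one file per column. The sketch below imitates that idea using Python's standard `array` module; the binary format is not kdb+'s, and the names `write_columns` and `read_columns` are hypothetical.

```python
import array
import os
import tempfile


def write_columns(directory, table):
    """Store each column of a dict-of-lists table in its own float64 file."""
    for name, values in table.items():
        with open(os.path.join(directory, name), "wb") as f:
            array.array("d", values).tofile(f)


def read_columns(directory, names):
    """Load only the requested columns; other column files are never opened."""
    result = {}
    for name in names:
        column = array.array("d")
        with open(os.path.join(directory, name), "rb") as f:
            column.frombytes(f.read())
        result[name] = column.tolist()
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_columns(d, {"price": [100.0, 101.5], "size": [10.0, 20.0]})
        print(read_columns(d, ["price"]))  # only the price file is read
```

A query that needs one column out of hundreds reads one file, which is why the machine's memory matters much less once the data has been loaded into this shape.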
01:13:48 [NP]
Yeah, I take a whirlwind tour through "Q Tips", basically. I run through a lot of the material that's in there, and then of course a lot of financial knowledge as well: a lot of theory and practical knowledge that I've learned over the years. So that's the program.
01:14:05 [NP]
So the book "Fun Q" is is very different than "Q Tips" and is very different than the class. I'd spent about five years implementing different machine learning algorithms in kDB. Well, because the first course I took was the Coursera course in MATLAB, and all those were implemented in MATLAB. So machine learning was implemented in MATLAB and I said: well, what's so special about MATLAB [35] that we couldn't do this in q? And so I took it as a self-challenge to prove that it could be done and I learned a few things along the way and what was lacking in q that MATLAB had, and you know, things like that. So after I had implemented all the algorithms for the course, I went on and looked for more machine learning algorithms and I implemented those and I just kept going and everything was just so elegant, like 3 or 4 lines of code and all the functions I had written to implement the algorithm A turned out were reusable for algorithm B and C and D. The amount of code reuse was so, I think, fantastic and beautiful that I just really wanted to share it.
01:15:20 [NP]
So the book itself is about 10 chapters. Each chapter relates to a different machine learning algorithm, and they're even ordered so that k-nearest neighbors, [36] I believe, is first, and then k-means. So it starts with the simple ones and gets more and more complex. And towards the end, those same algorithms show up again when we do collaborative filtering. So like when, you know, on Amazon, people who bought bunny slippers also bought panda bear sweaters, something to that effect, right? That algorithm can be implemented, in some sense, as a k-nearest neighbors algorithm, as long as you handle null values. So I had to go back to the k-nearest neighbors function, add just a couple of extra characters to handle null values, and then, out of the box, the code could work for collaborative filtering. That's just one version of collaborative filtering; there are other ones that use, like, alternating least squares and things like that, which, again, is just linear regression, which comes out of the box with kDB as well. So all these things, it turned out, were quite elegantly implemented in q, and I wanted to share that beauty with people. I wanted the people, as I mentioned before, who happened to be database administrators, who perhaps had a math degree, to kind of move along their career path from just knowing q to learning and understanding at the crux: what are these machine learning algorithms? Are they so fancy and complex? No, it turns out they're not. They're a couple of lines of q code, and maybe when they're written in C and C++ you can get some extra performance out of them, but ultimately a lot of these things are ... [sentence left incomplete]. You can move it off to the GPU, but if we could get q and kDB operators to offload to the GPU, we might actually get a lot of performance out of the existing implementation that I have. So that's kind of where "Fun Q" came from and why I shared it.
I was just hoping that we could just demonstrate you can ... [sentence left incomplete]. I really like to show how q can be used beyond just storing things as a database; how you can use the language and build really interesting products out of it.
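The collaborative-filtering idea described above (k-nearest neighbors plus null handling) can be sketched in Python. This is not the "Fun Q" code; it is an illustration under stated assumptions: missing ratings are `None`, distance is the root-mean-square difference over items both users rated, a prediction is the plain average of the k nearest raters' scores, and `distance` and `predict` are hypothetical names.

```python
import math


def distance(u, v):
    """RMS rating difference over items both users rated (None = not rated)."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        return math.inf  # no items in common: treat as infinitely far apart
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


def predict(ratings, user, item, k=2):
    """Estimate ratings[user][item] from the k nearest users who rated item."""
    neighbours = sorted(
        (distance(ratings[user], row), row[item])
        for other, row in enumerate(ratings)
        if other != user and row[item] is not None
    )
    nearest = [score for _, score in neighbours[:k]]
    return sum(nearest) / len(nearest) if nearest else None


if __name__ == "__main__":
    ratings = [
        [5, 4, None],  # user 0 has not rated item 2
        [5, 4, 5],
        [4, 4, 4],
        [1, 1, 1],
    ]
    print(predict(ratings, user=0, item=2, k=2))  # averages users 1 and 2
```

The only change from plain k-nearest neighbors is that the distance skips missing entries, which matches the "couple of extra characters to handle null values" described above.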
01:17:33 [CH]
Yeah, that sounds awesome. I mean, it sounds like if you were early in your education and you're at college or university right now, it's not a 100% strategy, but it's close to a 100% strategy: go through your degree and get your CS credentials, and then (unless you happen to be at CMU, you probably can't get access to that course) go and read a couple of these books: "Q for Mortals", [37] "Q Tips", "Fun Q", and basically become an expert in this language. Odds are you can line up a pretty good job on Wall Street or at some finance firm with these skills, because it definitely sounds like there's demand for it, and currently there aren't enough q developers to satisfy the demand at these firms. And yeah, very early in my college education, I wanted to be a quant. I ended up being an actuary, which is like a long-lost sibling: less exciting, less sexy. But for a long time I read every single, you know, Michael Lewis book, and, you know, "The Quants" by Scott Patterson. I probably read like 40 books on Wall Street, and I was just obsessed with this idea of making money for the sake of making money (this, you know, past version of myself). But back then, if someone had told me: just go learn this language, and you can basically (not guarantee yourself a job, but) really set yourself up well to land a well-paying job at these firms that have been around for decades, I probably would have taken that to heart. And so if there's ... [sentence left incomplete].
01:19:02 [NP]
I try to teach, I try to emphasize to the students: ok, yes, this looks like a bizarre language, and you're going to be twisting your mind around quite a bit to get through it, but I promise you that the lessons you learn on how to write vectorially will stand you in good stead even if you put q behind you and go back to Pandas. Because again, if you look at a web page on, you know, how to make your Pandas code faster, [38] the first tip is, like, don't use iterrows, right? [others chuckle]. There are a couple more, like: pass your whole data into the function and don't iterate over it. There are so many of these concepts that are really important, and when you're forced to do them in q, you don't even have the option. Technically I guess you do, but it really teaches you, you know, the mechanical sympathy part: how to be sympathetic with the machine under the hood. And those lessons should serve you well when you move back into Pandas. You look at code and you're like: no, that's not the right way to do it. Pandas does allow you to write quite efficient code with Numpy and things like that. You just have to have the right mentality.
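To illustrate the "write vectorially" advice, here is a small NumPy sketch; the notional-value calculation and the function names are illustrative, not from the episode. In pandas the same contrast is looping with `DataFrame.iterrows()` versus writing `df["price"] * df["size"]` on whole columns.

```python
import numpy as np


def notional_loop(prices, sizes):
    """Row-at-a-time version: the pattern the speed-up tips warn against."""
    out = np.empty(len(prices))
    for i in range(len(prices)):
        out[i] = prices[i] * sizes[i]
    return out


def notional_vectorized(prices, sizes):
    """Whole-column version: one array expression; the loop runs inside NumPy."""
    return prices * sizes


if __name__ == "__main__":
    prices = np.array([100.0, 101.5, 99.0])
    sizes = np.array([10.0, 20.0, 5.0])
    print(notional_vectorized(prices, sizes))
```

Both functions compute the same result; the difference is that the vectorized form makes one call over the whole column instead of one Python-level step per row, which is the habit q enforces.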
01:20:19 [BT]
And we've tied a lot of it to finance, because that's an industry that right now uses these tools and has, you know, a strong bond with them; the context works well for the tools. But when you think about it, it's much wider than that, because there are so many other areas: information, resources, trying to balance things across different resources. All these different choices involve distilling a lot of information down to a point where you can make a decision. That's essentially what you're doing, and these skills are super important as we move ahead into a world where we're getting more constrained. We have to look for these niches where we can be more productive. These languages fit really well into taking that big information and coming down to a smarter decision, and specifically, as you're saying with q, the ability to pivot and change it quickly. You might not be as fast to actually put an algorithm into process (you know, C++ may be quicker), but you can be much quicker in changing it if it's not working, and I think that ability (that flexibility and that agility) is really important.
01:21:31 [CH]
"Coding at the speed of thought"; it's my highlight of the podcast for sure. That's, I think, something that's been in the back of my head, but have never have never heard it articulated like that.
01:21:41 [CH]
But yeah, we've gone way over, but thank you for spending all this time with us again, Nick, as a guest this time and not as a panelist. This was super awesome, and I definitely know there are going to be some listeners who hear this and go and pick up q because of it, because, yeah, it's a language that is ... [sentence left incomplete]. There's demand out there for people who know the language, so, as always, we will put links in the show notes to both of Nick's books, and we'll throw in a link to the course that he teaches as well, although it is reserved for grad students at Carnegie Mellon. But, you know, we might have a couple of listeners who are grad students at Carnegie Mellon. Maybe they've even already taken the course. But yeah, thanks so much for coming on. I guess, Bob, we should plug contact@arraycast.com. [39] Did I get that right this time?
01:22:25 [BT]
You nailed it: contact@arraycast.com if you would like to get in touch with us or leave a comment. We really appreciate that. Nick, do you want to plug anything that's coming up, something you might be doing that you want to put pressure on yourself about?
01:22:39 [NP]
There are two things that I want to look at. There's this concept of, you know ... [sentence left incomplete] (again, I'm always thinking about new ways to write books; new topics to write books on). One of them is about PyKX. I don't know if there's enough material to fill up a whole book on that, but I've taken it as my challenge to talk about: why would putting a q table under Python be so fantastic, what functionality would it provide, and in what cases would it be faster? One example I have is: when you use the rolling function in Python [Pandas], compared to, like, a moving average function in kDB, it is horrendously slow, and it's just a matter of how it's implemented. Rolling [in Pandas] is quite generic, and moving sum (or moving average or moving deviation) [in q] is actually quite specific. It's a memory versus performance tradeoff. So there are examples where you do things in Pandas and you're like: oh my God, could it possibly be any slower? And then when you do it in q (because the primitives were implemented with performance in mind), it's actually quite fast. So that comparison: you could do it this way, or you could do it this way if you had PyKX. That's one thing in my mind that I'm starting to look at.
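A sketch of the generic-versus-specific contrast, in plain Python with hypothetical function names; it is not a claim about how pandas or q implement these internally. The generic version hands every window slice to an arbitrary function, so a moving average costs O(n·w); the specific version keeps a running sum and does constant work per item. For the first w-1 items it averages over whatever is available, as q's `mavg` does (pandas' `rolling(...).mean()` returns NaN there by default, because `min_periods` defaults to the window size).

```python
def rolling(values, window, fn):
    """Generic: apply any fn to each trailing window slice (O(n * window))."""
    return [fn(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def mavg(window, values):
    """Specific: moving average via a running sum, O(1) work per item.

    Leading items are averaged over the shorter prefix available so far.
    """
    out, running = [], 0.0
    for i, x in enumerate(values):
        running += x
        if i >= window:
            running -= values[i - window]  # drop the item leaving the window
        out.append(running / min(i + 1, window))
    return out


if __name__ == "__main__":
    data = [1, 2, 3, 5, 7, 10]
    print(mavg(3, data))
    print(rolling(data, 3, lambda w: sum(w) / len(w)))
```

The generic function can compute anything over a window, which is its appeal; the specialized one can only average, but that restriction is what lets it avoid re-summing every window.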
01:24:00 [NP]
And the other one is: if you had "a q god in your back pocket", that type of thing. Now, I don't want to use the word "Q god". In China and in Hong Kong there are these cartoons where there's a bald guy whose name is "Old Master Q" [40], and there's a series, a long series, of cartoons, and they're really, really funny. And so I was thinking: you know, if you had "Old Master Q" in your back pocket, one of the things that you would need him to tell you ... [sentence left incomplete]. And so I would like to work with someone else, and we would co-author a book together. For example, say you had interview questions: you go to a kDB or q interview and they ask you these horrendous questions. What if you had "Old Master Q" in your back pocket? How would he help you out? So, tricks of the trade, the dark corners of q, that type of thing. And it wouldn't be the first thing you'd pick up, you know, because in some sense you'd be learning those corners that you really shouldn't have to know about. But if you're building a trading system or a platform, maybe they're important, because they're the gotchas. Like, you know, the question: in q, what does date 0 stand for? It turns out [at] the change of the millennium, when kDB was at 3.0, 3.6 came out [and] they recentered dates to 0. So the date 2000.01.01, if you cast it to an integer, gives you the number 0. It's not centered on the [Unix] epoch. So anyway, questions like that: do you really know q down to the nitty-gritty detail? So I thought there might be a book in there. Nothing's been written yet; a lot of thought has been put into it, and we'll see if I have the time to get through it all.
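A sketch of the date gotcha: q stores a date as a count of days from 2000.01.01 (in q, casting 2000.01.01 to an int gives 0), not from the Unix epoch of 1970-01-01. The helper below, a hypothetical name using only Python's standard `datetime` module, reproduces that integer for any calendar date.

```python
from datetime import date

Q_DATE_EPOCH = date(2000, 1, 1)  # the date q represents as day 0


def q_date_int(d):
    """Day count q would store for date d (negative before 2000-01-01)."""
    return (d - Q_DATE_EPOCH).days


if __name__ == "__main__":
    print(q_date_int(date(2000, 1, 1)))  # 0
    print(q_date_int(date(2001, 1, 1)))  # 366, since 2000 is a leap year
    print(q_date_int(date(1970, 1, 1)))  # -10957, the Unix epoch
```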
01:26:01 [CH]
Oh, there we go. Third book in the making, and project. We'll have you back on when the book's done. [everyone chuckles] I guess we'll finish once again by just saying: thank you for coming on, and we will wish everyone Happy Array Programming!