Transcript for August 21, 2021: Attila Vrabecz
Thanks to Rodrigo Girão Serrão for producing this transcript.
00:00:00 [Attila Vrabecz]
I think I know enough C++, or maybe I know enough to stay away, I'm not sure.
00:00:05 [Conor Hoekstra]
I mean, I don't know enough. You know, C++ is the language where you perennially feel like you don't know what you're doing and you're just fighting with the compiler.
00:00:11 [Music theme]
00:00:26 [CH]
Welcome to another episode of ArrayCast.
I'm your host Conor, and today with us we have two panelists, Bob and Adám, and another guest.
We'll do short introductions, we'll go with Bob first and then Adám, and then we'll hop straight into our interview with our guest today.
00:00:42 [Bob Therriault]
My name is Bob Therriault and I am a J enthusiast and I am not a professional programmer, as I've said many times I'm just coming up on 20 years of doing J, which has been a lot of fun and I really love doing array programming languages and I really love learning about some of the other array programming languages and that's why I'm looking forward to today's episode.
00:01:02 [Adám Brudzewsky]
I'm Adám Brudzewsky. I've been doing APL for a long time. I've been doing so professionally for 7-8 years, also very interested in other array languages and what they can learn and borrow from each other. So, looking forward to today's episode, hearing about things and corners I haven't heard about.
00:01:23 [CH]
And as mentioned, I'm your host, Conor Hoekstra. As you probably know, I am not an array programmer, I am just an array enthusiast. I develop in C++, but I'm a huge APL fan and J fan, and more recently a BQN fan. I always want to say bacon, but that's not how you pronounce it, after Marshall's episode. And a little fun note: APL and J both have the quadratic scan. This is jumping into the details, but it's got me excited, and BQN doesn't. So I was on the APL Orchard the other day trying to assess whether BQN is now my favourite language, because it doesn't have that little tidbit, and maybe we can get some feedback from the k side on which way they fall. Do they have the quadratic scan? I think I know the answer, but we'll get Attila to respond there. So we'll hop straight into the interview, and later on we can circle back if Adám or Bob have thoughts on the scan thing.
00:02:23 [CH]
But today our guest, whose last name I am hopefully going to pronounce correctly, but if not, he will immediately correct me, is Attila Vrabecz. Attila has been working with the array languages dating all the way back to k3, I believe, and more recently with k4 and q. He founded Quantum KDB, which was later on purchased by First Derivatives, the same company that purchased Kx, and he has been working at Marshall Wace for the past six-plus years. He is considered by a lot of folks an authority on the k/q languages. So we will start off with the same question that we typically start off with for our guests: if you want, Attila, feel free to walk us through your path to the array languages, going as far back as you'd like, and tell us how you got to the array languages and where you are now.
00:03:17 [AV]
Thank you very much for the introduction. I actually did chemistry at university, physical chemistry, so chemistry was one of my passions and computing was the other, and I was doing computational chemistry, or computational physics. While I was doing my PhD I felt that I had to broaden my horizons. I was doing it in C, and I was looking at all kinds of other technologies and languages out there, and the weirder the better, I guess. So I did a bit of Lisp, I did a bit of OCaml, and then, I don't remember how, to be honest, I came across APL, and that was like: all right, this is really, absolutely different. I played with it a little bit, but it wasn't so easy to get a good-quality APL environment, so I moved on to J, which solved that problem, and I tried to learn the language and found it quite interesting. But then somehow I came across the Wikipedia page for the array languages, which also mentioned k, and I thought, oh, I missed this one. I read the manual and I couldn't understand a word of it. I didn't know what they were talking about, and it was also one of those...
I think the thing which captured my attention is having a language where you don't have a separate primitive for not-equal; you just make it out of the other two. I was just like: that's interesting. And it kept bugging me, so I kept coming back, trying to understand what was going on. This was back in the early 2000s, so you could get an evaluation version of k3, which I believe was limited to 100 megabytes or something. That was enough to play with, and I did Project Euler and things like that to learn the language. It was different from anything I had ever seen before, and somehow it felt like much more fun than Python or C++ or anything like that. Then I heard about k4; I think it was just coming out, or it was on the horizon, and I contacted Kx, who graciously granted me a license because I was in academia, and I started playing with it. When I was just about finishing my PhD, I realized that there are not that many research positions in Hungary for physical chemistry to begin with, and I had to see what I could do. So I was looking at jobs and thinking about my career, and I happened to bring this topic up with Stevan Apter, who I was talking with because of k and things like that, and he said: oh, I think you know enough k that you should be able to get a job with k. It's not something I had even considered; I was really just doing this on the side. Then basically Arthur introduced me, and they said, all right, let's have a phone interview, and then I was on a plane in a week's time to join First Derivatives, and the next week I was literally sitting in an investment bank, because kdb was really booming at the time and everybody was trying to hire. There was a big push on: kdb+ came out, the 32-bit limit got removed, and so on. So I basically got thrown into the deep water and I learned a lot on the job. This was in 2006, and I had the luck to work with quite a few good colleagues, which was very helpful. Maybe what was very different at the time is that the mailing list was extremely active, and I feel like I learned the most from there: people posting questions, me trying to answer them, others improving what I had, and then we tried to understand why that was better, right? Ever since then I've been working in the finance sector in London, working with kdb, building analytics applications, databases, and whatever else you do with kdb. That's quite long-winded, but hopefully it gives you a good overview of where I'm coming from.
00:09:54 [CH]
So it sounds like you went straight from finishing your dissertation and your PhD work into the finance/k world at First Derivatives, and then at some point you stepped away from First Derivatives. Did you end up back at First Derivatives when they purchased your company?
00:10:12 [AV]
No, I left reasonably early on; I think I left after a year and a half. But yeah, by the time my company got purchased, I wasn't a shareholder in that company anymore, because I had gone off to work on a different project, so I was not involved with the negotiation or the sale or anything. I was just a co-founder. It was my co-founder who, after the initial half a year or so, kept going with the company, and I went to work on a small project on the side, which didn't turn out to be the best idea of my life, but what can you do.
00:11:07 [CH]
OK, so I'm not sure if you're the first guest we've had with practical experience of all three of APL, J and k, having written all three to a certain extent. So how long did you play around with APL and J before you stumbled onto the Wikipedia page and started to play with k?
00:11:30 [AV]
I mean, I wouldn't say that I know APL or J to any reasonable extent. I think I know enough that I could tell you why I prefer k, but I wouldn't be able to write APL or J fluently, that's for sure, or even read them. I think it was only a couple of weeks or something.
00:11:58 [CH]
Yeah, you've predicted the question I was going to ask next, which is: what is it that spoke to you? Why did k speak to you over APL and J? What were the differences that led you to really start exploring that one, with the Euler problems and so on?
00:12:13 [AV]
I mean, with APL, maybe I didn't come across the right web pages or whatever, but APL at the time really felt like, how to say this, it was kind of not developing, it was kind of stuck. There were really not that many vendors, and for the vendors who were available there was no good free documentation, or maybe I didn't look hard enough. And I did like the symbols, but the hassle of trying to type the symbols and all that stuff put me off a little bit. But I think the biggest difference for me between k and APL and J is that APL and J felt more focused on math, whereas k felt more focused on computing. k felt like it was taking shortcuts, making decisions which are maybe less pretty in a mathematical sense but more pragmatic in terms of how computers work. Multidimensional arrays are really nice, but it turns out you can get away without them to some extent by just emulating them. The minimalism of k somehow made more sense to me than having a lot more stuff built in.
00:14:12 [CH]
This ties in perfectly, then, with the quadratic versus linear scan. Are you able to speak to which camp k falls into? Do they have the quadratic scan?
00:14:29 [AV]
00:14:30 [CH]
Yeah, so there we go. Both k and BQN are in the, in my opinion, correct camp, although I'll let Adám and Bob weigh in on what they think is correct, and I assume, Attila, you're of the same thinking, from a computing standpoint. Making what could be a linear algorithm quadratic, just so that you get... I think the reason is that you want alternating sums and continued fractions, which is a cute mathematical trick, but I have no idea what that's useful for, and I definitely don't want, one, my scan to be quadratic in time complexity; two, there are certain algorithms, specifically Kadane's algorithm, which I tweeted about, where you have to put an asterisk next to the APL solution, because it doesn't really work that way: you need an implementation with a linear scan. So in that regard I put k ahead of APL and J, although I don't know much k. Adám and Bob, do you want to weigh in? Also, I've been asking all the questions, so I should let you two ask any if you have them.
00:15:34 [AB]
Well, we can definitely speak about the scan, and yes, I think you're right that it comes from some mathematical properties originally. Ken Iverson was maybe a pioneer in computer science, but he was, I think, primarily a mathematician, and it doesn't seem to me, from what I've read and heard from him, that he was very concerned about what the runtime cost of various algorithms would be. What he was concerned about was the design of the language; the main point was that this notation would be a tool of thought. This is how you think about it: with the convention that functions have long right scope, that they're right-associative, which is true for k and for BQN as well, if you want to apply reductions over the prefixes of a list, things just come out like that. If you take the prefixes and stick the function in between the elements of each prefix, then each one of them needs to be evaluated from the rear towards the front, and, other than in very special cases where you can take some computational shortcuts, there is no way to use the previous result to compute the next result, because the next computation starts off with a different initial element that you didn't have available in the previous computation. Every one of them is different, and so you have to do first zero computations, then one, then two, then three, and so on; it comes out to O(n²). Unless you can take shortcuts: for a running sum you can do that, for a running difference, if you're really clever, you can do that, and certain other things. So it's not just "progress to the next element and then apply the function one more time to the previous result". It's not the same thing. That said, I don't think I've ever used an alternating sum, really, other than for, as you say, cute mathematical demonstrations, or alternating products. And if I understood it right from when we had Henry Rich here, the new Fold primitives in J actually do allow you to choose which direction you want, so you do have everything available. After that episode I played around with them a bit; the syntax is, shall we say, cumbersome, in my opinion, but it seems they can do what you want them to do. That of course increases the vocabulary, and k is looking for minimalism.
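[To make the two semantics concrete, here is a minimal sketch in q; the k-family scan is a left fold, while the comments describe the APL prefix-reduction behaviour:]

    q)(-\)1 2 3 4    / k-style scan: 1, 1-2, (1-2)-3, ((1-2)-3)-4; each step reuses the last result
    1 -1 -4 -8
    / In APL, -\1 2 3 4 instead reduces each prefix right to left:
    / 1, 1-2, 1-(2-3), 1-(2-(3-4)), i.e. the alternating sums 1 ¯1 2 ¯2,
    / which is O(n^2) work unless the interpreter can find a shortcut.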
00:18:31 [BT]
And with J, a lot of times when you're learning, and I'm not sure whether this applies to scan as well, for reduction it's mentioned over and over again: don't assume the order that you're going to be doing things in. Now, I'm not in the position of figuring out the algorithms that actually make the primitives work, but it may be that there are opportunities in the future. Taking advantage of them will break some people who've made the assumption, but we're being told over and over again not to make that assumption, so that might be a direction J ends up taking. Henry is definitely the one to talk to about that.
00:19:12 [AB]
I noticed that on the ADSP podcast that Conor's running, they spoke about two different types of reduce, or you could possibly call one reduce and one fold: one where you are allowed to make assumptions about the order in which things are done, and one where you're not. And it makes a huge difference, of course, in the computation that can be done, in the optimization and parallelization: if everything is sequential, then of course you cannot parallelize, but sequential folds have some nice properties. It sounded to me like Bryce and Conor thought that both would be nice to have.
00:19:50 [CH]
Yeah, I think it's episode 25, we call it "The Lost Reduction", where we have two different ones, accumulate and reduce. Accumulate places the fewest constraints on the operation and therefore can't be parallelized; reduce places the most, so it can be parallelized, but it requires both associativity and commutativity. And I believe, if I recall correctly, that we wanted one that was non-associative but still commutative, or vice versa; there's a missing algorithm in between that could still be parallelized but would allow a new category of algorithms that require left-to-right processing to work. Currently the workaround people use is a parallel scan, which has that property, and then they just take the last element, but that's inefficient, because scans generate more output than you need at the end of the day.
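[A small q sketch of that workaround, using only the built-in sum and sums keywords; the scan produces a whole vector just to get at its last element:]

    q)x:1 2 3 4 5
    q)sum x          / reduction: implementations are free to reorder and vectorize
    15
    q)last sums x    / scan-then-take-last: same answer, but n intermediate results are materialized
    15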
00:20:52 [AB]
Well, I just wanted to mention an interesting thing in Dyalog's implementation. In APL, of course, we have this problem as well, and as has been mentioned before on this podcast, APL and APL-like languages are often used in the financial sector, and some of those people get rather upset if, from one day to the next, the amounts people have in their bank accounts change because we changed the algorithm. Now, the merit of storing monetary amounts as floats could be questioned, but the fact of the matter is that many amounts are stored as floats, and that means imprecision, and that means the order in which you sum makes a difference. So Dyalog makes a guarantee that we don't change the exact result of a plus reduction, and that means we can't parallelize, we can't speed things up, because we need to preserve the floating-point errors that have always been there, so that the amounts in people's bank accounts don't jump around. But there's a trick around it, especially when doing tacit programming, and you see this a lot in J: summation is the same thing as evaluation in base one. You might want to take a couple of seconds to let that sink in. Think about place values: normally we look at a number in base ten and say these are the ones, these are the tens, these are the hundreds; but in unary there are only ones, so every position is the ones. Normally we do not allow in a number any digit that is equal to or higher than the base: in a number like 23, both the two and the three are less than ten, and you cannot have a digit that's ten or above; we don't even have that kind of digit, although in hexadecimal, conventionally, you use letters for that. But if you do allow it, you could have more than nine ones and more than nine tens, and it all just works out; you can still add it all up. So, since every place value in unary is the ones, evaluating any list of digits in base one is the same thing as summing them. Say you have 2 3 in base one: the three is the least significant digit, those are the ones, and the two is the second least significant, but those are also the ones. So it's three times one plus two times one times one, and so on; effectively all the times-ones fall away and it just becomes 3 + 2, which is 5. So evaluation in base one is the same thing as summation.
00:23:47 [BT]
You're counting sticks.
00:23:48 [AB]
You're counting sticks, and this means that wherever you have summation, you can substitute evaluation in base one. It might look ridiculous to the reader, but we do not make any guarantees as to the order and precision of evaluation in base one, which means we can implement all manner of optimizations in that algorithm that cannot be implemented in summation, where we need stability in the results. So if you want absolute speed in summation, and you don't care so much about the precision or the stability of the result, then you can use base-one evaluation. In that one case we do allow you to choose: do you want to make the assumptions or not? So there are tricks like that that can be done. I just thought it was an interesting thing to note; you see this in code here and there.
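[A sketch of the trick. In Dyalog APL the spelling is 1⊥x versus +/x; the analogous q expression below uses the base-evaluation primitive sv, assuming it accepts base 1, where Horner's rule degenerates to a plain sum:]

    q)x:3 1 4 1 5
    q)sum x     / ordinary summation
    14
    q)1 sv x    / base-1 evaluation by Horner's rule; all the times-one factors fall away
    14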
00:24:45 [AV]
k has the same thing: if you're running a version which uses SSE, AVX, whatever special instructions, and it parallelizes the code at the instruction level, then you will get different results than with a build which doesn't have those enabled. There are no guarantees in kdb; it goes for absolute speed.
00:25:16 [AB]
OK, so what do the economists say about that? That means if you run the exact same computation, the same code, on a different computer, you can get a different result.
00:25:24 [AV]
Not even a different computer: on the same computer you can have a build which uses these instructions and another one which doesn't. This has come up because, for a while, the 32-bit version was free, and the 32-bit version didn't have all these vector instructions and such. So if you ran something with that, you might get different results than with the production 64-bit version, and they just said: that's it, 32-bit is not supported.
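[A toy q illustration of that order dependence, assuming a plain sequential left-to-right sum, which is exactly what a vectorized build may not do:]

    q)y:2 xexp 53           / 2^53: beyond this a 64-bit float cannot represent n+1 exactly
    q)sum 1f,y,neg y        / the 1 is absorbed when added to 2^53 first
    0f
    q)sum (neg y),y,1f      / the big terms cancel first, so the 1 survives
    1f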
00:26:02 [BT]
It sort of shows you the kind of constraints that you're playing with when you start designing languages. You get to the point where you can do it one way and get accuracy, or, I guess it's almost like quantum mechanics: you can measure your velocity or you can measure your position, but not both at the same time.
00:26:22 [AV]
Yeah you can't have both.
00:26:25 [CH]
Can you talk a bit more, jumping back to where we left off on the k story? k focuses more on computing, and, I'm not sure if this is definitely true, but from what I've heard, k is lightning fast, and probably you'd say that k is faster than J and APL. Can you speak a bit more about that? I assume it's not just the scan; what is it about k that makes it more friendly for computing versus for mathematics?
00:27:00 [AV]
I mean, I definitely found that having fewer primitives is more appealing to me, because I have to learn less. But then of course there are lots of overloads, so it's a bit of a false economy. Another thing which made sense to me is moving away from having functions with only up to two arguments. I can see the appeal of it: people have two eyes, two hands and all that, and two-argument functions are easy to understand, but cases still come up all the time where you would have three or four or five arguments, and if you have to put that together from functions which only take two, it gets quite tedious. Then I also quite like the implicit arguments for lambdas. That seems like a very elegant solution to something for which you would go tacit in J. I mean, I tried to get my head around all the tacit stuff, and I kind of see the appeal, but it was beyond me. At my level it felt like it's not worth it, the readability suffers so much; I would rather just be explicit at that point.
00:28:45 [AB]
Can you expand on that a bit? Tell us about, what is this, implicit arguments you said, was that what you called it?
00:28:52 [AV]
Yeah, basically, if you have a lambda, then there are up to three arguments, x, y and z, and if you mention them, then your function takes that many arguments, without declaring: oh, this is a function which takes a, b and c. If you just use, let's say, y, then it will assume that you take x and y; if you mention z, then it takes x, y and z. I found that quite neat: short and painless. And this is actually maybe an interesting question, but the fact that you can declare different data types in k can be an advantage, especially when you work with anyone else. If it's all your own system, sure, but if you have to interoperate with any other system, then it's nice to be able to say: I'm expecting a 32-bit integer, and if it isn't a 32-bit integer, I want to deal with it in a different way. Interestingly, it seems like Arthur in Shakti is moving back towards the APL model a little bit, where you don't even have a differentiation between Booleans and integers; the only distinction it makes is between integers and floats. I'm not sure; I can see that being good for applications, but I'm not sure it makes sense for databases.
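[A sketch of the rule being described: a q/k lambda's arity is inferred from which of the implicit names x, y and z it mentions:]

    q){x*x} 5           / mentions only x: a unary function
    25
    q){x+y}[3;4]        / mentions y: binary
    7
    q){z*x+y}[2;3;4]    / mentions z: ternary; right to left, 4*(2+3)
    20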
00:30:46 [CH]
So have you followed the evolution past k4, to, I believe, k5, k6, k7, they skipped k8, and then k9, which is now rebranded as Shakti? Have you followed that evolution while working at your various places since starting with k3?
00:31:15 [AV]
Well, I tried to be involved, to see what Arthur is thinking, and I've played with these different versions a little bit.
00:31:21 [CH]
And have you noticed, because you mentioned Shakti, and yet I believe you're still working with k4 slash q: do you think the future is going to be k4 and q, that they will still be the dominant k dialect? Or, I know Arthur doesn't like them being called dialects; he considers them all different languages, sort of like Lisp variants, because they're written from scratch. Or is Shakti the future? Or is it too early to tell?
00:32:01 [AV]
I think it's very early to tell, obviously. It's even more complicated than the Python 2 versus 3 story. We had the same problem, or customers had the same problem, between versions three and four: there were big investment banks working on migrating their k3 codebase to k4, and they collapsed during the financial crisis before they could. So as for this question: on one side there is KX, which is very mature and in use in a lot of different places, but Shakti has Arthur, so I don't know which horse to bet on. I would be hoping that any innovation on either side helps everyone else, helps progress the community, so we have maybe slightly different solutions applying to different domains with different trade-offs, and all in all it raises our profile altogether, hopefully.
00:33:38 [BT]
Attila, you started out with k4 and sort of fell into that arena, and before you knew it you were working in the language. If somebody was starting today, what would you say they should start off with? Would you say go for Shakti because it's the new thing?
Or would you say there's another version of k that you think is actually more stable, easier to learn, that feels like a language that would be of more interest to somebody starting out?
00:34:11 [AV]
It depends on what the goal is, I guess. If they want to have a job, that's quite different from somebody doing it just for the sake of curiosity on a one-month time span or something. If you are concerned about finding a job, then I think it's definitely still q and kdb at the moment. And if you are curious, I think it would probably still be k4, in the sense that there's a lot more material. The problem with Shakti is that the documentation is still literally an 85-by-25-character screen, and then good luck figuring it out if you don't already know enough. I think it would be almost impossible for anyone to come in fresh and try to learn that. Of course, the somewhat unfortunate thing is that with kdb there's a lot of focus on q; most of the material is about q, not k. It is what it is. I would personally prefer k, but that's not the consensus; everybody goes with q.
00:35:43 [AB]
But the k that goes together with kdb+, is it even officially supported? Does it have documentation?
00:35:55 [AV]
I mean, q is literally just a slightly different syntax for the same k, so to the extent that q is documented, k is documented. Plus is the same thing in both.
00:36:14 [CH]
It sounds like you could very easily write a q-to-k4, I don't want to say transpiler, but translator, that replaces all of the keywords in q with their equivalents.
00:36:24 [AV]
Yeah, yeah, that's trivial.
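[A sketch of how thin the layer is. Inside a q session a line prefixed with k) is read as k, and each q keyword has a k spelling:]

    q)til 5             / q keyword
    0 1 2 3 4
    q)k)!5              / the same primitive written in k
    0 1 2 3 4
    q)flip (1 2;3 4)    / q's transpose...
    1 3
    2 4
    q)k)+(1 2;3 4)      / ...is k's monadic +
    1 3
    2 4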
00:36:30 [CH]
Interesting. So does that mean there are certain banks or companies that share your style of preferring k4 to q?
00:36:42 [AV]
I don't... I don't think I've seen them.
00:36:48 [CH]
Well, that prefer that.
00:36:49 [AV]
I don't think that exists?
00:36:51 [CH]
That prefer to write in k. So all clients write in q then, basically?
00:36:55 [AV]
Yeah, yeah, you have the choice between k and q, but everyone comes to write q, because they have a goal.
00:37:04 [AB]
But do they even know that k exists? It's barely mentioned.
00:37:10 [AV]
They usually just joke about it; it's an urban legend, or whatever. Nobody actually cares; they just talk about it. They still find q alien if they're coming from Python or SQL, but not as alien as k.
00:37:35 [CH]
Interesting. So, well, I guess you can make the argument that q is more expressive because it uses readable keywords, but a lot of array programmers talk about the expressivity of J and APL, and I think that would extend to k. So the clients are using the "wordified" k, a.k.a. q, but they don't actually care much about the k symbols versus the q words, which means they're really just using it for the performance and the database, is that it?
00:38:13 [AV]
I think there's a bit of that, but also, while you do lose some of the terseness, compared to a lot of other stuff out there, conceptually, or at the token level, you are still very terse. Even in q some words are longer, but I think it still enables you to get, I don't know, maybe 70% or 80% of what k might give you; that's just a made-up number, but I think it takes you quite a long way toward the array languages, so I wouldn't discount q on that.
00:39:01 [CH]
Right, I think most of the keywords in q are three or four letters. I believe some of the algorithms that have classic names, like transpose, Arthur renamed to a four-letter word; transpose in q, if I'm not mistaken, is flip.
00:39:20 [AV]
And then there's reciprocal. Nobody uses reciprocal.
00:39:26 [CH]
Yeah, I was just about to ask: what's the longest keyword in q? Is it reciprocal?
00:39:30 [AB]
Why would you not just write "1 divided by"? Then it's 2 characters.
00:39:34 [AV]
I think that's what people do.
00:39:37 [AB]
But I think you even have things like "sum" where you could just write plus slash. Why would you write "sum" if you can write plus slash?
00:39:45 [AV]
I think the argument goes that if you have very casual users, they can read "sum", or someone who's used to Excel can read "sum"; they wouldn't know what "plus over" is. And I guess the other argument in k versus q is that if you want to use "plus over" in q, you have to put parentheses or brackets somewhere: you would either put "plus over" in parentheses by itself, or you would put the argument in brackets. At that point "plus over" plus two parentheses or brackets is four tokens, and sum is actually just three letters.
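[The token counting, sketched in q; the derived function "plus over" needs parentheses or bracket application, while sum is a single word:]

    q)sum 1 2 3 4     / the q keyword: one token
    10
    q)(+/)1 2 3 4     / "plus over" parenthesized by itself
    10
    q)+/[1 2 3 4]     / or with the argument in brackets
    10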
00:40:39 [AB]
So you can't just use k syntax inside q code like that.
00:40:42 [AV]
You can, but you kind of have to escape back to k. It usually involves some extra parentheses, and that makes it not as nice as just vanilla k.
00:40:56 [AB]
So does that also mean you can't use just a dash, a minus, to negate values without adding parentheses?
00:41:03 [AV]
No, you can; I was talking more about adverbs and things like that. With negate you can: neg and then no parentheses, that's fine.
00:41:17 [BT]
So it sounds to me, going back to the original question, like the language that makes the most sense if you're just interested is q: start with q, because you're probably going to get a lot of the concepts.
00:41:28 [AV]
Yeah, I think so. And most of the database stuff we can view as some type of q; that's not technically true, but kind of true.
00:41:45 [BT]
Are there any ways that q would lead you astray?
00:41:48 [AV]
Oh, I'm sure there are plenty of quirks, things you wouldn't expect. Too much overloading is always a problem. And right-to-left evaluation, I guess most of us wouldn't complain about that, but it seems to catch a lot of people out. Missing semicolons. Stuff like that. It does feel a bit like JavaScript: it's a bit too easy to make something too dynamic, especially when you are building a system where one process can send something to another very easily, and then you end up with problems more at the architectural level.
00:42:45 [BT]
And that's kind of the curse of having a pragmatic language: you can do things, and as a result, you can do things.
00:42:54 [AB]
But I don't think there is any way to get around that. Any sufficiently powerful programming language will allow you to write crazy code. I can't see how one could possibly say that q allows you to write nasty code in a way that k doesn't.
00:43:13 [AV]
No, no, that's definitely true. You were thinking about q versus k; I didn't think of it in that context, I was just thinking of q versus everyone else. There are really only a couple of things you cannot do in q that you could do in k slightly shorter. So it's a very pragmatic trade-off, which I think actually made a huge impact. If you look at the biggest difference between kdb and kdb+, the system built on k3 and the system built on k4, it is the addition of q, pushing the two languages together. Before that you had a database language as a separate thing, and it was very confusing to people: if you had to provide a stored procedure, you had to jump back to k, and the SQL-like language was quite restricted, and people couldn't really see or understand the difference. So I think that's a big differentiation.
00:44:36 [CH]
So I actually didn't realize that kdb and kdb+ were two different databases. It sounds like kdb shipped with k3 and kdb+ shipped with k4 and q. Is that correct?
00:44:49 [AV]
Yeah. I think originally they were trying to sell k3 as a platform: a programming language with all the bells and whistles, including the GUI, and that wasn't nearly as successful. Then I think Arthur basically implemented kdb in k itself, mostly k3, and they sold that as a separate product, and that's where the finance people really started to pay attention, because they had this problem of large volumes of market data which they couldn't fully analyze with the other technologies of the time. And that's probably also why Arthur went back and focused much more on databases with k4: because you can sell databases; you cannot sell programming languages anymore.
00:45:54 [BT]
But then, once you've sold a company on a database, they don't want to change it. That sort of locks you in at that point. Is that where k4 is right now? It's a blessing and a curse: it's pragmatic and it's being used in industry, and industry doesn't want to see very many changes. Unless something were significantly quicker, I guess; they might make adjustments in that case. Do you see anything like that coming up?
00:46:25 [AV]
I think Shakti is promising something like that, but I'm not sure if it's there yet. I think that's the idea, and otherwise it would be a hard sell: we have an existing system which is working and doing what the business wants, and replacing it just for a marginal improvement is usually not easy.
00:46:51 [CH]
Yeah, hopefully we'll be able to get someone from the Shakti team, or Arthur himself, to come on at some point and talk to us about Shakti and get more details on that.
00:47:02 [AB]
I found an interesting read from Marshall Lochbaum, who was here with us: an article on his BQN site where he examines the claims about performance in k. Obviously we'll link to that from the show notes, but
00:47:20 [AV]
Oh yeah,
00:47:22 [AB]
It sounds like you know about this page. Maybe you have something you want to say about that?
00:47:31 [AV]
No, I mean, I saw it and I read it, and I'm not an implementer of k4 itself or anything like that, but I think Marshall was focusing very much on the computation part, and I think where kdb is really winning customers is on data, on databases, and that part is a lot harder to measure. You cannot just take Python and say, oh, it can do everything kdb can, because it can't; and even where it can, lots of the packages are a lot more clunky. At least, that's how I understand it when people talk about speed: they are also talking, maybe indirectly, about the speed of development, not just the speed of the product, and I think you have to take that whole context into account. Yeah, you can always write some C or C++ or Rust or whatever to beat kdb quite easily for something specific, but it's the whole package: how fast it is to develop with, with how little effort, and how well the primitives have been chosen. You can do something better for something specific, but it will be a lot more effort to code.
00:49:03 [AB]
I think that comes back to something we spoke about very early on in this podcast: as you said, you can always write some C code or something that can run fast for any particular part of an array language. Of course, given that many array languages are implemented in C, for every array language program there's an equivalent C program that can run at least as fast, if nothing else the same thing bundled with the entire interpreter. But it's not feasible. I certainly can't do that, for one because I don't know C, and secondly because, as you say, it's not reasonable. You might be able to do it for one little piece, but there are decades of array language implementers working on fine-tuning algorithms that you then have available to you at the press of a couple of buttons; that is very hard to replicate by writing your own code, and you would have to reimplement it every time you needed it. Take something as simple as sorting: there are so many different sorting algorithms, and for any particular type of data, any size of data, you might want a different algorithm. It's unreasonable to ask everybody to rewrite that from C up every time, but if you have it built in, the language itself selects the probably most reasonable way to sort your data based on some preliminary analysis of it. You don't even have to worry about it; you're getting maybe not optimal performance, but pretty good performance. If I understand right, that's what you're saying: it's the whole package, and you can get high speed overall by leveraging all the work that has gone into the interpreter.
00:51:06 [AV]
Absolutely, and I think that's why all the free open-source offshoots of k are not even close to performing like k4 does: because it took Arthur decades to get there, and most of us are not Arthur.
00:51:35 [AB]
I don't think anybody can argue with that. But that's interesting, then. As far as I can tell, and I haven't done it myself, it's not really that much work to implement an array language; what is a lot of work is to implement an array language and make it fast. And that might tie into this interesting phenomenon where the world generally runs on free open-source programming languages, and it's unusual to have a language where you pay to use it. It's only when there are these proprietary algorithms that have been fine-tuned, or a large data set built into the language, that paying even comes into the picture. And then, yes, people can make a clone that is functionally the same but doesn't have the performance. That's how we can have commercial implementations of APL and of k, and free open-source implementations that do pretty much the same, maybe even more, but much slower. And even J, even though it's free open source, the J database system is not free, right?
00:53:00 [BT]
Yes, I was going to say that Jd, which is what Eric developed, is a database system, an adjunct to J. You can use it without a commercial license, but if you were using it commercially you would need to purchase it, and then you get support and a lot of other things. The database is the more commercial end of things. The big advantage, and I'm thinking this is similar to k, is that within that database you're using the J language, just as in the case of kdb you're using k, or in kdb+ you're using q, as the language you're controlling your database with. I hope I'm right about that.
00:53:43 [AB]
I don't think this is exclusive to the APL family. Even for something like MATLAB, there is the very much equivalent Octave, where much of the code will run identically on the two, but MATLAB is much faster than Octave, and it is a paid-for product. And then, of course, there's the Wolfram Language, Mathematica; there it's maybe not the speed you're paying for, but the massive organized data that's there. So maybe there still is room in today's world for paid-for programming languages, for things that are actually mission-critical when it comes to performance or availability of data.
00:54:26 [CH]
All right, we're just coming to the hour mark, but there's one last question that's been in the back of my head that I've wanted to ask a couple of times. You said that when you were starting to explore programming languages while doing your PhD, the weirder the better, and you ended up at APL and then ultimately at k. So what was it about the, I guess, more esoteric languages, although, depending on how many Lispers are listening to this, they may take offence at calling Lisp esoteric, because there are quite a few Lispers out in the world, what was it about the weirder languages that attracted you to them? I guess we've talked about k versus APL and J, but not as much about the array languages in general.
00:55:17 [AV]
Definitely interactivity: it feels like you're playing. You start with something simple, you tweak it, and you build your solution iteratively, at least I do. In compiled languages especially, it felt like you were writing and writing and writing, and I know that the compiler helps you in some sense, but it wasn't helping you to explore, which I think has been addressed quite a bit in this context. I kind of like Python, but it still felt too big; it kept growing and growing and growing, and it wasn't as neat or as minimal. I felt like I was looking for something terse and mathematical, without doing mathematical applications myself; day to day, probably the most complicated thing we do is summing up numbers. It's really not complicated in the end, but somehow the pieces are all there to manipulate the data in a very easy way, or at least much more easily than anything else; I'm not sure why. I do remember, in some lab session, I think in Pascal, and this was before I knew APL or anything, I had a solution and then I spent another few hours trying to make that solution as short as possible. So I think I was predisposed: I had found some fascination with that, and then I jumped into APL, and it was like, oh well, they took this so much further than I ever could, and I really enjoyed that.
00:57:42 [CH]
Yeah, that's awesome. The part where you said you interactively build up an expression resonates. Literally just one or two days ago I was teaching someone who has zero computing experience a little bit of APL, and it was, I don't know, five or ten minutes in that I realized I hadn't taught them that in the RIDE editor for APL you can press Ctrl+Shift+Backspace to get the previous expression back. They were typing the expression out each time, building it up and then adding something, and I was like, "oh my goodness", the most important shortcut is getting the previous expression, because that's the whole workflow: you build it up one step at a time, and if you have to type it out each time, that takes away like 95% of the fun of building up an expression. It was such an important thing, and I had completely forgotten about it. And I guess in k and J it's the up arrow, or whatever the shortcut is, to get the previous line. That's something I miss very much in languages that don't have a REPL, because it's such a nice way to solve a problem: to build it up slowly, versus trying to write out the whole thing at once, compiling it, and seeing if it works. You are a lot more certain at the end of building up one of those expressions in a REPL that it works, because you saw it work while you were building it. You weren't just writing semicolons and parentheses, compiling, waiting, and running a test to see if it worked. So I think with that, we've got one end-of-episode announcement from Adám.
00:59:23 [AB]
Thank you, Conor. Yes, on the 29th of August I'm running an APL Campfire event, which this time is going to feature Ray Polivka, who has been doing APL, I think, since the very beginning of APL. It's amazing that he's still with us, and he has been author or co-author of some of the most popular and successful APL teaching books over the years. Maybe they're not so relevant anymore, but I'm sure he's going to have a lot of interesting stories to tell, so we'll put a link to that in the show notes.
01:00:05 [CH]
Awesome, and we'll also leave links, as we always do, to a ton of k and q resources. It definitely sounds like, from talking with Attila, that q is the place to start if you want to start exploring k4 and q, and I know they actually have some pretty decent documentation across a couple of different pages, like the built-in functions with little examples that you can copy and paste out of the box. And q is downloadable; there is a free version, up to a certain number of cores, correct, that you can download locally?
01:00:25 [AV]
Definitely yeah
01:00:35 [CH]
OK, yeah, so we'll definitely leave links to the free versions, so that those who want to check it out can go and download them.
01:00:47 [BT]
I was just going to mention that that's a great cue for me to talk about the show notes, and the fact that on the arraycast.com website there are show notes with each episode, as well as transcripts. Within the transcripts I've also put links that go back to the show notes, so if you're just reading a transcript, you can click on things and get that information that way too, which a lot of people seem to like. And if you want to get in touch with us, it's contact@arraycast.com, and that's your feedback to us: what you've heard, what you would like to hear, all those kinds of things. We're all ears for that kind of information.
01:01:25 [AB]
Everything you send to us does get read.
01:01:31 [CH]
All right, awesome. Once again, thank you so much to Attila for coming on. It was awesome to hear a bit more about k and q, and I learned a ton. The problem with these podcasts is that every single time we have a guest on, I'm like: now I've got to go learn another array language on top of all the other languages I want to learn. So it was awesome having you on. Thanks for coming on.
01:01:51 [AV]
Thank you very much for inviting me.
01:01:53 [BT]
Happy array programming.
01:01:54 [AB]
Happy array programming.
01:01:58 [AB]
We were waiting for you, Conor. You're supposed to say that.
01:02:03 [CH]
Oh, yeah, Happy array programming. Whoops
01:02:05 [Music Theme]