Transcript
Transcript prepared by Bob Therriault, Adám Brudzewsky, Sanjay Cherian and Igor Kim.
[ ] reference numbers refer to Show Notes
00:00:01 [Conor Hoekstra]
Well, I think that was fantastic and I'm glad I asked it. I hope most listeners stayed tuned. I mean the ones that gave up at some point are not listening anymore, but.
00:00:10 [Bob Therriault]
Probably somewhere around the music analogies I guess.
00:00:13 [CH]
I mean, maybe we should make the cold open, "Now I'm gonna say my inflammatory thing," so that people will hang on until they know it's coming. Yeah, yeah, they'll know it's coming. Or maybe that's what we should do: whenever we're not sure people will listen to the whole thing, just put a fake cold open.
00:00:30 [Marshall Lochbaum]
Isn't this what we do already?
00:00:31 [CH]
No, just put a fake one there so that, like, oh, when's it coming?
00:00:34 [ML]
And then we guarantee they listen the entire time waiting for it.
00:00:37 [Music Theme]
00:00:49 [CH]
Welcome to another episode of ArrayCast. I'm your host Conor and today with me I've got 3 panelists. We're gonna go around and do brief introductions and have a couple of announcements and then get to our topic. Today, we'll start with Bob then go to Marshall, then go to Rich.
00:01:00 [BT]
I'm Bob Therriault, and I am a J enthusiast, and I'm very enthusiastic about J.
00:01:05 [ML]
I'm Marshall Lochbaum. I started programming in J and I worked for Dyalog for a while and now I make BQN.
00:01:12 [Richard Park]
I'm Rich Park. I've never really programmed in J, but I'm an APL programmer and teacher and evangelist working for Dyalog limited.
00:01:20 [CH]
And as mentioned before, my name is Conor. I am a polyglot programmer and a huge array enthusiast and I have programmed a tiny bit doing toy things in J so three out of four is not bad. I think we'll start with Rich because Rich has I think 6 announcements. So we'll try and burn through all of those and then we'll go to Bob for a final announcement.
00:01:43 [RP]
I mean, in my defence, I can do plus reduction in J. I can do that. OK, yeah, I've got a load of announcements, [01] APL related. So firstly, the APL Problem Solving Competition 2023 is now live. You can go to contest.dyalog.com or dyalogaplcompetition.com. We'll only put one of those in the show notes to make it less confusing. If you are a full-time student, you have the chance to win a cash prize. I think the large one is $2,500, plus an invitation to the Dyalog '23 user meeting to come present your work. If you're not a student, you should still compete, because you could win a free ticket to the same user meeting, and it's always awesome to meet the competition winners. You've got, like, half a year; the deadline is on the website, so I won't say exactly what it is because I don't have it on the top of my mind, but I think it's sometime in July. Solve problems in APL and potentially win money, so that's cool. Also, contest and programming language related: there's a sort of website service called Kattis. They do some, like, verification of programmers for various companies, but they also have an online contest platform called Open Kattis; I think that's open.kattis.com. APL is now one of the programming languages you can use — Dyalog APL — to solve the Kattis problems, so they have instructions on how you handle the input and how you're supposed to return the results. But yeah, if you want to go try some automatically verified contest programming problems, Open Kattis now has APL support, which is cool. On very short notice: when this comes out, it'll be the Wednesday after, I believe, that APL Seeds '23 happens. So just a few days left to register if you haven't already.
A few talks aimed mainly at people who have either just started learning APL, or maybe they don't know APL yet but they're interested, and then afterwards we'll have an informal meetup over Zoom where you can come hang out and chat about APL and array language stuff. Links in the show notes for where to find information and register for that. We're halfway there. OK? Linux Format — well, on the front cover I think it says it's, like, the premier or #1 open source software magazine in the UK. I remember it was one of the ones I grabbed from the newsstand when you go on those long car journeys and have to stop at a service station. Linux Format issue 300, which came out this month, has a, like, four-page, fairly extensive article on APL and arrays, which is really cool to see, with some quotes from Morten Kromberg and Michal Wallace as well, who was a previous guest on here, talking a little bit about what he does and some other things.
00:04:41 [BT]
AKA tangentstorm.
00:04:42 [RP]
That's right. There's also a new APL Show episode — the podcast of me and Adám talking about APL and sort of notation as a tool of thought. It's a reaction video; we go out on a limb, take a bit of a risk, and do that, but it's about a really nice presentation someone did about notation and thought at the 2022 Software Engineers Conference in Australia. And lastly, there is a new one-off episode of APL Campfire, which is the sort of history-focused APL show hosted by Adám. I should have looked this up before to see who it was with and about, but if you go to APL dot wiki...
00:05:20 [BT]
It's Norman Thompson.
00:05:22 [RP]
Norman Thompson — if you go to apl.wiki/campfire.
00:05:28 [BT]
And Norman Thompson was one of the guys I came across when I was doing my Master's degree, looking at education for APL. Norman Thompson has done a whole ton of stuff in terms of education and working with groups and how to educate many people. Really interesting guy, and I missed it live, but I will pick it up on the bounce.
00:05:47 [RP]
I also missed it live, if you can't tell, and I'll see that this week. Right. Yeah, sorry for the long one.
00:05:57 [CH]
Round of applause, round of applause. We got through.
00:06:00 [BT]
We made it through.
00:06:02 [CH]
All right, Bob, over to you for your one announcement?
00:06:05 [BT]
Well, my one announcement is the J Wiki continues to be emergent. It's slowly coming around, but this last week Ed Gottsman made a really neat presentation [02] about information on a wiki and how to present it. He's developed kind of a GUI for what he calls mapping, and it's a really interesting concept. Basically, you keep the entire map of the GUI up all the time, and then as you click on different areas, your information comes up on the other side of the screen. That sounds really simple, but he's done it in a way that's really interesting. And because he presented at the meeting — because we do video the meetings — I made a video of it and put it up on my YouTube site, so there will be a link there as well if you're into that sort of stuff. It's not so much to do with the array languages, although if you're interested in how these wikis work and maybe some of the tools that could be put towards them — my jaw was on the floor as I was watching it. Yeah, it was good.
00:07:03 [CH]
Yeah, I saw it as well. That's pretty neat. And speaking of videos, you just reminded me that I actually also made a YouTube video — which should have been top of mind — called, what was it called? "Why I Love BQN So Much," which I'm sure makes Marshall happy. [03]
00:07:18 [ML]
Yeah, I saw that was there. I have yet to watch it because it's just from today?
00:07:23 [CH]
Yeah, I made it as sort of a late decision at like 7:00 PM, because I had looked at some video... anyways: is it the video that my YouTube subscribers want? No. Is it the video my YouTube subscribers get? Yes. There are certain videos I make that get, like, 100,000-plus views — I know the formula to get that — and then, yeah, those ones just are... they're interesting, but not as interesting as the BQN ones and APL ones. Anyways, link in the show notes. I'm pretty sure the overlap of people listening to this and watching those videos is quite high, but if you happen to not know about that channel... And also, I guess it's worth mentioning, 'cause the Dyalog APL contest got mentioned: if you want little helper videos on how to solve those problems, Adám, who's not here today, has a series — APL Quest, I think it's called — which is basically going through the backlog of problems, going back to whenever the contest started. [04] So he has, you know, I'm not sure if it's hundreds, but definitely tens of videos solving previous contests' sort of phase one problems. So if you're not really sure how to get started, watching a few of those on that playlist would probably be super helpful. With all of that being said, today's topic — which, I mean, the listener always knows if they look at the title of the episode — is going to be on performance, and I think also measuring performance, specifically in array languages. So I'm not sure where we want to start. I mean, I thought...
00:08:52 [ML]
I know.
00:08:55 [BT]
Actually, I think — and we brought this up in the pre-show — we should explain that Stephen is not here. All right. Yeah, because he's hanging out with Arthur. Honestly, he ditched us for Arthur, and that's all that needs to be said. OK.
00:09:14 [ML]
And therefore we have no q or KDB representative.
00:09:20 [RP]
Oh, you're not saying that gives us carte blanche to just crap all over q and KDB personally?
00:09:25 [ML]
It does.
00:09:28 [BT]
Well, we can do it. We'd just feel bad because Stephen's not here if we do it, so we have to be civil.
00:09:34 [CH]
Yeah, I mean, I'm not even sure if we can, which sort of brings us to what we were discussing before. We've brought this up on past episodes: there on and off has been a free downloadable q and kdb+ executable that you can get from kx.com, [05] but that is a free trial license that comes with certain restrictions, one of which is basically that if you are using it, you are not allowed to make or state any claims about the performance that you measure with that executable. Which puts us in somewhat of a precarious position if we are talking about performance, in that... I think maybe we'll have a part 2 and part 3 going forward on this kind of topic. So in the future, if either via the KX company or First Derivatives, or, you know, if Stephen is able to say certain things when he's on in the future, we might be able to talk about it then. But I think for the purposes of today's episode, we're probably gonna focus more on APL, J, and BQN, because we can say whatever we want about those languages without worrying about... I don't even know what would happen legally. Would ArrayCast get sued? I don't know. Would we get taken off the air?
00:10:45 [ML]
They would start with a cease and desist. I don't know where it goes after that, if they decide it's worth their time.
00:10:50 [CH]
Yeah, if you're wondering where Stephen is, he had prior plans today, and also we're probably not going to mention much about q. You know, feel free to write your local KX representative if you'd like a change in that policy. But with that out of the way...
00:11:06 [BT]
Well, actually, just talking about policy: I think in some ways that policy is reasonable, and we'll probably get into it during this episode. If you allow people to publish that kind of information, they can control the situation that they're running it under, and they might come up with results that are entirely valid for what they're doing but not fully report how they're getting those results. So I can understand why a company might do that, because they want to control what they present, and hopefully it is valid. I mean, it closes the door on being able to be open about it, but it opens the door in terms of what they present being valid for what they're testing.
00:11:47 [ML]
But it is worth noting that neither KX nor Shakti, as far as I can tell, present any benchmarks of their programming languages as array languages, so they don't test against APL or J, from what I've seen. They publish benchmarks about its use as a time series database, and that's all.
00:12:07 [BT]
And their language is tuned to the database. Yeah, like, does the language really exist aside from the database? I mean, I don't know. It's a language, but you need the database to really run it, don't you?
00:12:20 [ML]
I mean, no, you can just like run code in q.
00:12:24 [CH]
Yeah, of all the q I wrote — which admittedly is probably less than, like, half a thousand lines of code, which still is substantial considering you can do quite a bit with q — I don't think I've maybe written more than, like...
00:12:33 [ML]
It's q, yeah.
00:12:37 [CH]
Two or three kdb+-style queries, where you're using, like, the where keyword and stuff like that. Most of the stuff that I do is leaning on the, you know, reduction and scan primitives and stuff like that. So it's very possible to use q as sort of more of an APL- or J- or BQN-like language versus a database querying application.
00:12:57 [BT]
And I guess I'm thinking, with Shakti, Arthur's providing the databases that he's running it against — whether it's the taxi database for New York or those kinds of things. Those are available.
00:13:09 [ML]
Yeah, Shakti [06] has shown some benchmarks on open databases. I mean, they haven't shown source code or anything of, like, what they're running, but I think if you were a client, you would be able to reproduce what they do. The rest of us just have to kind of take their word for it.
00:13:28 [CH]
Yeah, I was gonna say, like, on one hand I agree, Bob. But, like, I really don't. From a programming language enthusiast point of view, it's kind of the equivalent of a language like Rust or Zig coming out and trying to compete with low-level, you know, performance heavy-hitting languages like C++ and C, and then saying that they want to protect the reputation of the language because people are going to go and run sort of faulty benchmarks. Which, admittedly, there's tons of people out there who do — you know, how specific are they about their setup? And there's even a bunch of talks. There's one by Emery Berger, [07] who talks about how, when you're profiling stuff on a computer, so much stuff impacts it. One of the points in his talk is he's working with a student and can't figure out why something is running so much slower on the student's computer versus Emery Berger the professor's, and they realize that it's because of the length of his username, which is in, like, the present working directory. So it makes the length of, like, you know, PWD that much longer, and somehow that leads to, like, a 10x.
00:14:34 [ML]
Well, it changes the alignment of the interpreter, or whatever binary they're running — where its addresses are located in the memory space. And I still don't understand why this is, but with the way CPUs are designed, they'll run your code at different speeds if, like, the loops are aligned at different positions.
00:14:57 [RP]
First, the location of the program on the disk itself.
00:15:01 [ML]
So the alignment of the addresses — like, if an address is an exact multiple of 16 or something versus one off — that changes performance. I don't really understand why; I mean, if you're running a loop like that, all the code should be in the micro-op cache. I don't know why that happens, but I've also seen, you know, up to close to a two-times factor for a small hot loop just based on — you know, you'll change some code that's in some other place in the interpreter, and you'll run the exact same code. It's the exact same machine code; you can look at it and check. And it will run at a different speed based on how it's aligned. So yeah, that's very annoying as an implementer.
00:15:46 [CH]
Yeah. So Emery Berger, who gave that talk, is a prof at — I want to say UMass, but I could have that wrong. He basically shows all these, like, footguns and things you totally wouldn't expect that are affecting perf and profiling, and then introduces some framework called Mesh whose goal is to, like, basically randomize a ton of system environment variables and all these different things that you wouldn't expect. And so you're kind of doing this random sampling of stuff that supposedly is supposed to make, you know, the profiling more accurate. Anyway, this is all to say that I agree profiling is a tricky thing. That being said, especially when you're releasing a language, I would prefer to be able to, you know, compare it versus other things, right? Like, if Rust came out and said, sorry, no, you cannot — we will come after you if you try and compare us against C++ — that would kind of be a nonstarter.
00:16:40 [ML]
And, by the way, here are our benchmarks against C++. You don't have that check — and, to be clear, KDB has not done this against APL, but they've done this against other time series databases — you don't have that check where the other database vendor is able to say, well, you did this wrong and that wrong, by reproducing the benchmarks and going through them. So, you know, if a K person was to run a benchmark and they saw K is out, you know, way ahead, they probably wouldn't question it and would say, all right, well, this is a good benchmark, let's go. Whereas when the other database maker sees this, they'll say, well, that's wrong; my code shouldn't be that slow, it should be able to do this thing fast — and they'd be able to check.
00:17:27 [CH]
And a great example of this is when I was doing some benchmarking stuff. I had sent it around sort of internally, and then Marshall pointed out that... I was basically profiling the three consecutive odds problem, [08] if you've listened to my other podcast, ADSP, and it basically involves a sort, a, you know, adjacent comparison — which you can spell different ways in each of the languages — and then a reduction at the end of the day. And obviously, you know, anything involving a sort is going to, you know, obscure stuff. But Marshall pointed out that the way I spelled it in APL, using sort of a point-free form, was not optimized at all, and he said: you should switch that to a dfn; this is the actual sort idiom that is performance optimized. I had no idea. Sure enough, I went and ran it, and it was way faster. And that's the kind of stuff Marshall's talking about. I can go and mess up as someone who's trying to profile stuff and then put it out on the Internet, and then people will be like, well, actually, this thing is optimized — if you go and do this, you'll actually get what is possible with APL. And there's no way to do that kind of stuff if you've got some restrictive, you know, clause in your license.
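For anyone who hasn't heard the other podcast: the three consecutive odds problem asks whether a list contains three adjacent elements that are all odd. A minimal plain-Python sketch of the adjacent-comparison-plus-reduction shape being described (the function name is mine, and this is just the idea, not any of the actual benchmarked spellings):

```python
def has_three_consecutive_odds(nums):
    """Return True if some three adjacent elements are all odd.

    zip over the list and its two shifted views yields each length-3
    window; any(...) is the final reduction over those comparisons.
    """
    return any(
        a % 2 == 1 and b % 2 == 1 and c % 2 == 1
        for a, b, c in zip(nums, nums[1:], nums[2:])
    )

print(has_three_consecutive_odds([2, 6, 4, 1]))                    # False
print(has_three_consecutive_odds([1, 2, 34, 3, 4, 5, 7, 23, 12]))  # True
```

The array-language versions express the same thing as a windowed (n-wise) reduction over an oddness mask.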
00:18:36 [ML]
Yeah, I mean — but that's a big problem with APL too, isn't it, that the obvious thing is not the fast thing?
00:18:44 [RP]
No, you'd like it to be. Yeah, I think that's true.
00:18:47 [ML]
And so, I mean, there's a question of what your benchmark measures. Do you want to measure the best performance you can get? Or do you want to measure what a typical programmer would see? Or do you want to measure what a new programmer would see? Or, I mean, they're...
00:19:02 [RP]
Well, I mean, if I had to pick typical sounds reasonable, doesn't it?
00:19:08 [ML]
All right. Yeah. So just find some typical programmers and have them all write code.
00:19:13 [RP]
Well, I think part of.
00:19:14 [ML]
Make sure they're all exactly as typical as each other.
00:19:17 [RP]
So I'm not really, like, a super low-level programmer or a performance buff particularly, but I did one video some time ago about reasoning about computational complexity in array languages, [09] and that's one thing that you do have: the ability, through the way the primitives are constructed and put together, to think about how these short expressions should perform in, I guess, the worst cases — talking about computational complexity. But then in real applications, what happens is you find, ohh, this thing that lots of people are doing — the way they write it in APL, the way you have to write it, is nice to write in an array language, but executing each of the primitives one by one isn't quite as fast as if we did the same thing overall, you know, with some clever algorithms that we can code in the interpreter. So that's kind of part of what you're talking about with these hidden tricks that you might not know about. That also makes it difficult to know what's the best thing to do. But on the other hand, you have this simple set of rules for most of the language, for the typical cases, which I think is really beneficial.
00:20:40 [CH]
I think a part of it is also, like, PGO — Profile-Guided Optimization — as you start to learn a language more and more. At first it may not be the typical thing to do, but as you become an expert in that language, you know the things like Marshall pointed out. And, like, one of the other things I quickly discovered is that there's no equivalent of an n-wise reduction in BQN, and up until doing this profiling I had always done the, like, moral equivalent, or spiritual equivalent at least, of what I thought it was, which is — I believe it's called Windows [10] — the dyadic form of Range, whose monadic form is the equivalent of iota in APL. So basically you turn a rank-1 array into a matrix and specify how many elements you want on each row. And once you have that matrix, you can then perform a rank-1 operation on each row and get the equivalent of a sort of sliding reduction, or an n-wise reduction. But in profiling that, I realized: holy smokes, the creation of that matrix is brutal. And really what you want to do is, like, a take and drop — a drop of 1 and a drop of ¯1 — and then you've got two different arrays that you can just do a binary operation on, and that is the performance equivalent of doing the n-wise reduction. But that's the kind of stuff — like, now, I know I'm probably still going to do the Windows reduction in YouTube videos, because it's easier to spell and I think closer to what you can do in other languages like J and APL, but I know that if I actually want the performance equivalent of what I'm doing, I need to spell it this way. Would it be ideal if BQN had some version of that n-wise reduction, so I could save myself three or four characters and still get the same perf? Sure. But, like, learning any language, you just need to start to pick up those things over time, to know: is performance important here?
If so, use this method; if not... The exact same thing happened in the video I made yesterday. To sort a list of numbers in reverse in Python, I did sorted, paren, the list of numbers, and then bracket, colon, colon, -1, close bracket, which is, like, a shorthand for reversing any list. And I got, like, four different comments on YouTube and one on Twitter that were like: there's a parameter to the sorted function that is just reverse equals True, which is much more idiomatic and much more performant. But I even said in the video, like, this is probably not the best way to do this, but it's terse and it's what I want for the purposes of this video. Anyway, so it exists in every language, including the most popular ones in the world, like Python. Ramble over.
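The two spellings being contrasted can be sketched in plain Python (function names are mine; the actual BQN versions use the Windows and Drop primitives). Here a pairwise difference is computed first by materializing explicit length-2 windows, then by operating on two shifted views of the same list without ever building the window matrix:

```python
def pairwise_diff_windows(xs):
    # Build explicit length-2 windows (analogous to BQN's 2-Windows),
    # then reduce each window. The intermediate list of windows is the
    # cost being complained about above.
    windows = [xs[i:i + 2] for i in range(len(xs) - 1)]
    return [b - a for a, b in windows]

def pairwise_diff_shifted(xs):
    # The shifted-arrays trick: "drop the last" and "drop the first"
    # give two views, and one binary operation zipped across them
    # produces the same result with no window matrix.
    return [b - a for a, b in zip(xs[:-1], xs[1:])]

xs = [3, 1, 4, 1, 5, 9]
print(pairwise_diff_windows(xs))  # [-2, 3, -3, 4, 4]
print(pairwise_diff_shifted(xs))  # [-2, 3, -3, 4, 4]
```

Both compute the same n-wise (n=2) reduction; the second avoids materializing the intermediate rank-2 structure.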
00:23:12 [ML]
Yeah, I mean, so there's a big question of not only what's the fastest the language can achieve, but does the language guide you to better performance — which you're not going to see on a benchmark, because the benchmark usually tries to show how fast the good way to do this performs, not, you know, how many tempting bad ways to do this there are. And Windows in BQN is definitely a problem like that. There are designs for Windows that would perform quickly, but what I was thinking when I designed Windows was trying to make something, you know, that's more approachable for someone coming to BQN. And I decided what I wanted was to have, you know, the rows be windows, so you can easily see your left argument gives the length of the windows. So if I do 5 Windows, I get all the slices of length 5. But, at least if you have a short window, the more CPU-friendly way to do this would be not to arrange it like that, but instead have a list of two lists, where the first list is all the first elements of every slice and the second list is all the second elements of every slice. And that way, if you do a subtract-reduce on that, you can just subtract those two lists and it'll work out quickly.
00:24:31 [CH]
And from a compiler point of view too, where I guess it's not. It's interpreted, but like you could.
00:24:36 [ML]
No — well, it's compiled some and then interpreted some, like pretty much every language, so...
00:24:42 [CH]
Hypothetically, though, you could put in some kind of optimization where you recognize a sequence of primitives — Windows followed by a rank-1 reduction on rows — and, you know, encode that so that you don't have to materialize that rank-2 matrix.
00:24:58 [ML]
We already made some improvements to Windows that were based on what you were measuring. I've noticed another benchmark with Windows too. I didn't really think people would be benchmarking with it so much, but apparently people really want to do, like, n-wise sums and stuff when they're benchmarking.
00:25:16 [CH]
I mean, I have, like, in my top-ten GitHub repo, where I compare different problems, only one where APL is the winner. Of the eight problems, I think BQN probably wins five times, Haskell wins twice, and APL once. And the only one where APL wins...
00:25:32 [ML]
And this is in terms of how nice the code is you're saying.
00:25:34 [CH]
This is completely subjective; it has nothing to do with performance. It's, like, how much I like the spelling of the solution. That's it. Perf doesn't matter — it's elegance, and, you know, beauty is in the eye of the beholder. And the only one that APL wins is the three consecutive odds, specifically because it has an n-wise reduction, where you're just passing a binary operation to a higher-order function, essentially, whereas in every other language you have to do something a little bit more verbose, or something not as elegant. In general, I think BQN wins because it has a lot more things, like under and inverse and more combinators and the sort primitives — and I could go on and on. At some point I'll make a talk with, like, a Venn diagram of the features that all array languages have, and then what BQN has that the others don't, what APL has that the others don't, and what J has that the others don't. And I think — I mean, I actually don't want to make that video; I want someone else to make it, but no one else is going to make that video.
00:26:38 [BT]
And as you say, it's subjective, and we'll never know for J, because our digraphs are just inherently ugly to you, aren't they?
00:26:46 [CH]
Every once in a while I do solve a problem in J, and then I'm like, this is close, but there's just always, like, one or two or three extra characters. I think the closest — J might win if there was something involving primes, because as you get to the more esoteric problems that are, like, specifically built on some math function, I discover, oh, J actually has that math function. But yeah, there's hope for J still, uh...
00:27:10 [BT]
As long as the big issue is whether we have dots or colons following our primitives, I don't think there is hope for J. But that's OK. We don't mind.
00:27:21 [CH]
You know what? We haven't talked to Henry yet, too — we've got the upcoming full episode. And from my understanding of Folds... [11]
00:27:29 [ML]
But those are dots and colons.
00:27:33 [CH]
But if you can spell, like, the double-fold pattern that I'm trying to think of — which is actually poorly named, 'cause there's a single fold; it's a one-pass thing. But what should we call it? Because originally I was calling it a two-operation fold, a 2-op fold.
00:27:45 [ML]
Do you mean like a fold over 2 arrays together?
00:27:48 [CH]
So it's like a scan followed by a reduction. On my other podcast, our guest Zach called it a "scanduction," but it's the idea that you don't really need to do a scan followed by a reduction — a sophisticated enough compiler or interpreter would see the juxtaposition of those two things and go: really, we just need a single reduction that's carrying two pieces of state, throwing one away at the end of the day, and just returning one.
00:28:13 [ML]
Ohh yeah.
00:28:14 [CH]
I'm not sure if J's Fold can do that — I know it can do things like that — but anyways, that's another episode; we're talking about performance right now. There's hope for J.
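The scan-then-reduce fusion described a moment ago can be sketched in Python. Here the example problem (the maximum of the running sums; problem choice and names are mine) is computed both ways: a scan that materializes every prefix sum followed by a reduction, versus one fused pass that carries two pieces of state and discards one at the end:

```python
from itertools import accumulate

def max_prefix_sum_scan_then_reduce(xs):
    # Two passes: the scan (accumulate) materializes all running sums,
    # then the reduction (max) collapses them.
    return max(accumulate(xs))

def max_prefix_sum_fused(xs):
    # One pass carrying two pieces of state: the running sum and the
    # best value seen so far; only the latter is returned.
    running = 0
    best = float("-inf")
    for x in xs:
        running += x
        best = max(best, running)
    return best

xs = [2, -1, 3, -4, 1]
print(max_prefix_sum_scan_then_reduce(xs))  # 4
print(max_prefix_sum_fused(xs))             # 4
```

The fused version never allocates the intermediate scan result, which is exactly the transformation a sufficiently clever interpreter could apply to the scan-plus-reduction spelling.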
00:28:24 [ML]
But now is the time to say that performance really doesn't matter.
00:28:28 [CH]
Alright, alright, go ahead Marshall. Tell us how you feel.
00:28:31 [ML]
I feel like performance is something that's very prominent, that people use to, you know, investigate which language they would like to learn, that they use to justify their choices of language. But really, performance is not that much of a concern in, you know, whether a programming language works for something. And so you see, in practical cases: APL has a few efforts that are solely aiming for performance — like the APEX parallel compiler is an early one, and Co-dfns is one now — and what you see is people only use these because they have less features.
00:29:15 [CH]
Sorry, they don't use them or they only use them.
00:29:18 [ML]
They don't, I mean.
00:29:20 [RP]
Yeah, they don't use them. Like, I have no evidence that... I mean, Dyalog APL is used by all sorts of companies and stuff.
00:29:27 [ML]
As far as I can tell, APEX and Co-dfns have never been used for a major project. [12]
00:29:32 [CH]
And this... you're saying this is because they have a subset of the full functionality provided by Dyalog APL, which is where most APL people are coming from?
00:29:38 [ML]
I mean, so that clearly establishes that, you know, the number one thing is to have a comfortable programming environment. And this is in the domain where APL is dominant. There are other domains, but in those, people are typically writing, you know, direct low-level code — writing GPU shaders and stuff like that to really get full performance out of whatever hardware they have available. But in the domain where APL is — which is, I have a problem and I want to solve it — generally the CPU can solve your problem in a very short amount of time. And so, I mean, yeah, if APL was a million times slower, that would be a problem. But, you know, getting another factor of 10 out of APL is not actually that much of a benefit for what the users see.
00:30:27 [RP]
But I guess a lot of the optimizations of the array primitives — like, where it matters, doing, you know, SIMD-style processing — yeah, that stuff kind of already existed anyway. Those types of performance optimizations happened in bits and pieces over time, in the cases where people might have relied on that. And now, having, say, the environment is far more important than, say, 10%. I don't know. We need more Co-dfns benchmarks.
00:31:00 [ML]
Yeah, well, it's not a matter of: oh, if you just had enough of a performance increase, it would matter. I mean, for most tasks... there are some things that you can just pour more and more processing power into. Finance is a good example of this, because ideally you would have enough computing power to figure out exactly what everyone else is gonna do, so you need more computing power than the rest of the entire market. But for most things, you have a task to do: you want to process some image, apply some operations to it, and once you finish it, it's done. And so as the speed increases, you go from "this is just impossible, no one would ever wait for this", to "you need huge amounts of processing power", to "you can finish this on your laptop, but it's inconvenient", to "this finishes as fast as you can even ask for it". And at that point no amount of performance increase is going to do anything for you for this particular task. Now, people do a lot of things, and most people have something that they need to do that needs to be really fast. And for those there are lots of tools and tricks that you can reach for. But for other things, it's like: why are you chasing the fastest programming language when there are tons of programming languages that are fast enough?
00:32:27 [CH]
Bob, you're going to say something.
00:32:27 [BT]
So, to take it back a step, I guess: in terms of performance, is it possible to evaluate languages based on how they can improve their performance? In other words, there are some languages that focus very strongly on performance, and I do think k, probably Shakti, is one of those. Arthur [13] is so tied into performance that he's just going after that: fast, fast, fast, fast. So, you know, it's a language that is going to be at least focused on performance. There's a lot of other things about it, but one thing that you know is he will be making it as fast as it can be. Other languages aren't like that. Is that a valid way to evaluate a language: whether one of its focuses is to be performant or not?
00:33:17 [RP]
If you're doing stuff in the domain where it claims to be performant, I guess.
00:33:21 [ML]
If you are in that rare case where you need the top performance, then you don't want to pick something where there's no performance concern at all. Although really, there are not that many mainstream languages that aren't optimized in some way; it's just a matter of what things are good for. I mean, some language designs are inherently held back by various concerns, like, you know, making features possible for the programmer.
00:33:53 [BT]
I guess what I'm saying, I guess I'm sort of dancing around the topic, though, is: taking it out of the array languages, for something like Python, I don't think very many people claim Python is inherently performant. It's not really fast. What it is good at is providing lots of glue.
00:34:09 [ML]
Although over the past few years it has been getting a lot faster, so it's becoming a worse and worse example.
00:34:15 [BT]
OK, fair enough, but the thing is: if, say, 10 years ago you started with Python, you weren't choosing it for speed?
00:34:22 [ML]
Hopefully not.
00:34:23 [BT]
Well, I guess that's the question. Can you evaluate a language based on its potential to improve?
00:34:27 [RP]
But you weren't using it 10 years ago because you were hoping that in 10 years it was going to be a bit faster, either. You're using it because you can just look up some docs for a package with an API, plug your data in, and get the answer in, like, half an hour.
00:34:43 [BT]
And that's your performance, right. It's not the speed on the machine, it's the speed...
00:34:47 [RP]
Programmers are expensive. Computers are relatively cheap.
00:34:53 [CH]
I guess my question is, and maybe you can qualify your statement, Marshall, if I've misunderstood what you're saying: what I heard was that, in the domain in which APL and array languages operate, for most cases performance isn't really the key thing people are reaching for or using those languages for. And my question is, like...
00:35:23 [ML]
It may be the thing they're asking for, but in that case, they're often misinformed about what they really need.
00:35:29 [CH]
So I guess my follow-up question to that is: is that just a function of what APL and the array languages evolved to be? Because one of my evolving theories is that, from what I can tell reading Iverson's papers and his work, he did not care about performance or, like, the implementation.
00:35:53 [ML]
And I'd agree with that.
00:35:54 [CH]
He cared about notation as a tool of thought, and the functionality of what you could do with the primitives, and the choice of primitives. And because of that, my kind of theory is that these languages evolved in a way such that performance wasn't what the language implementers were concerned with; it wasn't their primary concern. Therefore you end up with languages that aren't extremely performant. And then, when something like machine learning comes along, which is just a bunch of matrix multiplication, the language that is chosen for it is not these array languages like APL and J. To me that seems like the missed opportunity of a century: we had these languages whose primitives were arbitrarily ranked arrays, but they weren't chosen, because they didn't have performance as, like, one of their top three things. Therefore they weren't a good fit, because when you're doing, choose your AI application, you know, self-driving cars or optical character recognition, there's a bajillion of them, you're not reaching for APL. And there have been papers; I know there's one by Bob Bernecky and friends on convolutional neural networks in APL. [14] But the hard truth is that these languages are not used for these performance-critical applications that involve neural nets and AI techniques. And I think if history had evolved differently, there would have been an array language, maybe not APL or J, that was... And the thing is, technically, I mean, we're not talking about q, but you could argue that k and q were the performance-sensitive ones. As we mentioned, though, they were focusing on Wall Street and time-series databases, not on matrix multiplication and the stuff that's necessary. Anyways, I'll stop talking; I'm interested to get your thoughts on that sort of angle. Like, is it a self-fulfilling prophecy, or is it something different?
00:37:50 [ML]
Yeah, well, I have a few things to say about that. The first is that early APL implementers were definitely concerned about performance. APL was pretty competitive, definitely with other interpreted languages, in the early days. Even APL 360 already had packed bit Booleans, which is a pretty fancy optimization. And the APL implementers did one of the first just-in-time compilers, which is APL 3000. So there was a whole lot of work on performance in APL in the early days. The other thing to note is that although Iverson did not care about performance, he appears to have been very lucky in stumbling on a paradigm that is inherently high performance. So much so that some of the vector instructions in CPUs now, in AVX-512, the compress and expand instructions, are just lifted directly from APL: they took APL functions and decided this was the best way to make CPUs faster. So I don't think the problem with APL getting picked up was anything to do with performance, because it definitely has those capabilities. And sure, there was no APL that did general APL stuff and was running on the GPU at the time that people were picking up machine learning. But what went against APL was custom frameworks. I think it was really TensorFlow in the early days that was the... [15]
00:39:33 [CH]
And then PyTorch later on.
00:39:34 [ML]
Was the one, and then later stuff like PyTorch: stuff that was purpose-designed for machine learning. They said: all right, we have these machine learning problems, deep learning is getting to be a big field, and we will design these frameworks especially to work with this and to run quickly on GPUs, because this is our requirement. I don't think there really would have been any problem doing this with something APL-like. What I've seen, and I don't know a ton about machine learning, but what I've seen from machine-learning-in-APL papers is that the solutions they have spend quite a lot of time manually writing out what you do for backpropagation and stuff like that. And that's TensorFlow's big advantage: doing that for you automatically. So I would really say the thing that put TensorFlow ahead was actually having a better feature set for machine learning, and secondarily the performance, because you do need to run on GPUs if you're going to learn anything fast.
00:40:40 [CH]
I mean, most of PyTorch and TensorFlow and all these frameworks have multiple back ends. But I know specifically that one of PyTorch's back ends is cuDNN, which is a library that NVIDIA makes. So there's a high-level part and then the back-end part, and the back-end part typically is not... I mean, maybe some of the back ends are written by folks at Facebook or wherever; I know it was a Facebook project at one point. But it's also hooking into tuned GPU code that was written by GPU experts at NVIDIA, right?
00:41:12 [ML]
And I mean, it's also worth saying that what these frameworks are actually running now is low-level intermediate representations that are designed specifically for machine learning. So if you want to build any language on top of those representations, you can do it and compile down to, MLIR is one, and get good machine learning performance. So yeah, the performance of the language is not actually the challenge here; it's getting a language that people want to use for their machine learning.
00:41:44 [BT]
And that might be purely familiarity.
00:41:46 [ML]
Oh yeah, definitely. I mean, I'm not saying that PyTorch is inherently the fundamentally best way to represent these machine learning operations. It's definitely got some stuff that's chosen to work with machine learning that's pretty nice. But a lot of it is just what looks like the sorts of mathematics that people are familiar with, or what's easiest to explain based on people's past experience, and that sort of stuff. But yeah, I think it is primarily a usability problem that these new machine learning frameworks solve. I mean, it's a problem of having the usability in a way that the hardware can optimize at all. So if you write nested loops in C, you could probably write machine learning that way. I also don't think that would be as ergonomic. But also, it's pretty tough to get that to run on the GPU, right?
00:42:42 [BT]
I guess, you know, for J, we're waiting for number theory to take off, because then we're gold.
00:42:51 [ML]
Ooh, quantum computing has got some number theory going on there.
00:42:56 [BT]
I mean, the question is, as you say: say you took something really wild like quantum computing, wherever that goes...
00:42:56 [ML]
A little bit.
00:43:05 [BT]
It's the cold fusion of our age right now. But if you had libraries that made it easier, your language would probably take off, if it became more popular and became something that could be done.
00:43:17 [ML]
Yeah, but in a specialized domain. So I actually do think it's really great that APL has remained a general-purpose programming language, because now anybody who wants to learn an APL, or one of the languages in that family, can apply it to problems they're familiar with. Although some areas are easier than others; all the web stuff is pretty tough, because there what you want is a bunch of interfaces to connect whatever the server gives you to your piece of code, or something like that.
00:43:54 [BT]
It seems to me some of the challenge is the critical mass of people who actually know the languages, who would produce libraries and be interested in those developing areas, and whether they can compete against a much larger group of people who are working on bigger projects and have funding and a number of other things behind them to swoop in. I'm trying to think of Y Combinator's Paul Graham talking about how his magic wand was Lisp: he could program things in Lisp, and that gave him an advantage over other people. [16] I don't know whether, if you don't have a critical mass, that's true of the array languages anymore. I think some of these other languages have got so big with their libraries. You might stumble on it. I mean, there's always theory, and there's always things like number theory where, if you get the right person thinking about a problem, you can solve it very quickly. I think Bob Bernecky had a paper about Big-O versus algorithms, which is really good, because if you get the right algorithm, you get the right thinking process about it. I guess that comes back to what I think is the goal of these languages: I think they help you think about problems, and I think that's their magic.
00:45:11 [ML]
But it's no substitute for having somebody else think about the problems for you.
00:45:17 [BT]
That's true, yeah. Or have a whole group of people think about the problem.
00:45:23 [ML]
So, I mean, yeah, APL is a great paradigm and it scales well. But what also scales well is just having one person solve the problem and then everybody else use their solution, and then the programming language doesn't matter so much.
00:45:36 [CH]
Well, I've just been, what's the word, mulling over what you said. I mean, I work at NVIDIA, so I know that there is a certain category of companies and individuals where performance is... the scale of what they're running is immense, and perf is non-negotiable. And those are the kinds of people that, even if they have to go and program in a less ergonomic language than Python, such as CUDA C++, will go and suffer through it. I won't say suffer, but it's definitely not Python, and it requires a higher level of expertise; but they're willing to do that to get maximal perf, even if it's less ergonomic. And part of my job at NVIDIA is trying to create a better story for that: ideally we could have a more ergonomic language and make it easier for individuals and companies to conveniently and expressively solve their problems and get maximum perf at the same time. So that category of person exists. But when you mentioned it, I thought: I don't think it's really perf that drives that. Like, there's no reason that APL or J or some array language couldn't have been the lingua franca for machine learning and deep learning. It was more that the ecosystem didn't exist, and the popularity might not have existed. So the lingua franca ended up being Python, which was better suited for, or not even better suited: it was just a more popular language at the time, and certain people started creating libraries, and then people started speeding up the back ends too.
00:47:15 [ML]
Well, I mean, you can't say in any sense that the high-performance code is written in Python. It's written in a Python library.
00:47:23 [CH]
I'll find a meme. It shows a Corvette, and it says Python on top of it, and then the next photo below it is the Corvette on top of a truck that is carrying a bunch of cars, and that truck is C++. That's the joke: any performant library is really, on the back end, written in C or C++, and if it's actually performant it will be a multi-threaded or GPU-accelerated thing. But I guess your sort of argument is that you need to put your language in a place where it has enough of a following and enough of an ecosystem so that, if it comes to a point where performance is actually something you need for your application, you now have that ecosystem that says: hey, we have this library, let's go create a new back end and bindings, and we can go write the back end in whatever performant language you need and accelerate that. Which then kind of makes me think...
00:48:12 [RP]
Yeah, you've just added the big caveat of a back end and, you know, a pluggable API. That type of abstraction is actually something the array languages are kind of the antithesis to, in some ways, right? In theory, you should have: OK, we've got a primitive called reverse (⌽), and for whatever data you've got, this is going to be the fastest way of turning the elements in the array the other way around. And it's only when you get to this whole string of things, where you've written it that way because it's a really nice way to express it with these primitives, with the syntax, that you find: oh, but actually, for the thing you're trying to do, there's this other way to do it that's way faster. Whereas in Python, you know, you don't care, because you've got this name over it that's abstracting away the details of how you're actually doing it. But that also means you can just go ahead and swap out the back end with, you know, this thing written in Rust or whatever.
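As a rough illustration of that point (a NumPy sketch, not from the episode, and only an analogy for an array-language primitive): a single primitive name can hide a heavily optimized implementation. NumPy's slice reversal, for instance, returns a constant-time view rather than copying any elements.

```python
import numpy as np

xs = np.arange(1_000_000)

# Reversal as a primitive: NumPy returns a constant-time strided view,
# so no elements are copied, no matter how large the array is.
rev = xs[::-1]

# The view shares storage with the original array.
print(rev[0], rev[-1])
```

The user just writes "reverse"; whether it is a copy, a view, or a SIMD loop is the implementation's business, which is exactly the abstraction being discussed.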
00:49:10 [CH]
The follow-up to my sort of ramble is that there are languages, you know, funding: when we had Troels on, we got the whole story of how grants were given to the University of Copenhagen, which led to a couple of different projects, but the one that still exists today is Futhark. [17] And on top of that, I believe his name is Bodo Scholz, there is the Single Assignment C language, which is actually closer to Python, as you just mentioned, and it does have multiple back ends: I think it has a CUDA back end and a couple of other accelerated back ends. But I don't think anyone would disagree with the statement that those languages have not succeeded at a large scale. So when you say, Marshall, that there's no reason that such a language couldn't exist and couldn't be successful, my mind goes to Futhark and Single Assignment C, which haven't really been successful. And my question is: was it just the ecosystem and popularity of Python, and history would have had to evolve differently for a language like SaC or Futhark to be the go-to? Or, you know, why is it that people are coding in Python and not in some accelerated array language, I guess, is my thought.
00:50:25 [ML]
Well, I mean, there's different things. I think for Python the answer is pretty easy: people like programming in Python, and that's all there is to it, because performance is not a consideration there.
00:50:37 [CH]
I love programming in BQN. I just made a video about it on Monday.
00:50:42 [ML]
But I mean, there is this other segment: the people that want the most performance possible. And I think Futhark fits pretty well in that segment, and it might see some uptake. Dex is another one, produced by Google, and I think that may be seeing some use.
00:51:00 [CH]
We should have someone from Dex on.
00:51:01 [ML]
I mean, these have fundamentally different goals than APL does. They start by saying: you need to get the full performance out of your GPU and be able to use all the capabilities that it has, and then they build a language around that. And that involves a completely different execution model from APL, and it involves cutting you off from using, like, first-class functions in certain places, when their particular methods for taking things to the GPU don't support what you're doing. Whereas APL is designed from completely the other direction: what's a good language for working with these arrays and stuff? What's amazing is that this gets you very good performance by interpreted-language standards, and even pretty good by compiled-language standards. But it does not get you good performance by these restricted compiled-language standards, where you're giving up some programming capabilities in exchange for performance capabilities. So for that part of the market you would have to do something to APL, and that's kind of what APEX was trying to do. Maybe you can argue that APEX was too early for it.
00:52:26 [RP]
But this is, I mean, this also: you said that people don't use Co-dfns, and that is true. But the usage model, I think, for some people is supposed to be a bit like this: you've got some part of your application that you're writing in the language where you appreciate all the features of being able to do whatever you want, in the way that you know. But then you might have some part of it that is performance-critical, and then you can just encapsulate that in a namespace, send it to Co-dfns, and it turns it into compiled code, which gives you back that same API that you wrote, as a function, and you just send your data to it. That way you only have to do that in the part of your code where it matters.
00:53:09 [ML]
Yeah. Well, and I think Co-dfns also has the problem that, for the segment that Conor's talking about... I mean, for the people who are currently using APL, it's pretty good, because if you can write in this subset of APL that's supported, which is a reasonably large subset, then you're good and you can speed stuff up. But it is not as fast as something like Futhark, where you can specify the types of everything and have a closer correspondence with what the GPU is going to do.
00:53:46 [RP]
And at a certain point, you're then reaching for, what, a library written in Futhark, compiled to binary, and then just hooking into that with whatever foreign function interface you've got, which is back to the black box of the different back-end solution.
00:54:01 [ML]
The reason for this is that Co-dfns is still trying to support, you know, normal APL. APEX did some stuff; I don't know if there was enough. I mean, I haven't used APEX, and I've only tested out Co-dfns once or twice. But from what I understand, to get things to run on the GPU, you have to know exactly what the types are and how they're laid out. You can get a compiler to do it from APL code, but it's not going to do it very well, and so you're going to get performance that's still a long way off from, definitely, somebody who's writing the C++ shader-type code for the GPU, and I think also from something that's written in Futhark, where the language is designed around being used on the GPU. So I think the stuff that's designed for GPUs can still pick up a lot of useful things from APL, but I don't know if the way to go is necessarily taking APL and trying to run it on the GPU, because then you have a lot of difficulty in having the programmer supply enough information to tell you how it's supposed to be compiled for the GPU.
00:55:16 [CH]
That's a big problem coming from a dynamically typed language. For anyone that's programmed in Futhark, something you'll come across very quickly is that you have to type things: if you want to do a maximum reduction on a list, not for all the primitives or functions that are in the prelude, but you'll have to call i32.maximum. So you have to specify the type of the reduction that you're doing, which is clearly anathema to any of the Iversonian languages, where that stuff is implicit.
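To make that contrast concrete (a Python/NumPy sketch, not from the episode; the "Futhark-style" label is an analogy, not actual Futhark syntax), the difference is whether the programmer names the element type up front or leaves the representation to the implementation:

```python
import numpy as np

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Futhark-style, explicit: the programmer commits to an element type
# up front, and the reduction operates on exactly that type.
xs32 = np.array(data, dtype=np.int32)
m_explicit = int(xs32.max())

# Iversonian-style, implicit: you just ask for the maximum of "numbers";
# the implementation is free to pick any internal representation.
m_implicit = max(data)

print(m_explicit, m_implicit)
```

Both produce the same answer; the typed version gives the compiler a fixed machine layout to target, while the implicit version keeps the source free of type annotations.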
00:55:50 [ML]
Well, I mean, I can jerk us around a little. We've been talking about the GPU, and I don't think APL is that strong on the GPU, as I've said, but it's really great on the CPU. And actually, I think that not having these types specified is a huge strength on the GPU... on the CPU.
00:56:10 [CH]
I was like: you don't think it's a good fit, but it's a huge strength? That was like, Marshall, choose a lane! Choose a lane!
00:56:19 [RP]
Right, unspecified types good for CPU.
00:56:23 [ML]
Yeah. So I mentioned that even back in APL 360, they were doing packed bit Booleans. If your array consists of zeros and ones and the language figures this out (say it's the result of a comparison; then it's always going to be zeros and ones), then it stores it packed. So each byte stores 8 zeros and ones, and this is eight times more compressed than storing one value in a byte, which is actually typically what a C programmer would do. You get this enormous advantage of having the data smaller, and this also means that faster algorithms can be used. So for example, if you're summing an array of bits (there are actually too many methods to even go over), there are a whole bunch of ways to sum a bit array that are faster than expanding to bytes and summing those, or something like that. If you want to sort an array of bits (that's not terribly common), you just count the number of ones in it, get the number of zeros by subtracting, and then write out that many zeros in order, and that many ones. And that's much faster than it would be sorting with a comparison function or something like that. So being able to compress these types, even when the user doesn't ask for them, gives you a pretty big performance boost. And what that means also is that in cases where the user wants the safety of a large type ... [sentence left incomplete]. So say you're doing some sort of financial software, like a financial database (you don't expect any of the users to have over a billion dollars, but maybe someday it will happen). What you do [is] you don't have to specify a type; you just say: well, each user's amount of money is a number. And then, I mean, this is not actually good with finance, because you're supposed to use decimal floats, but you can do that in Dyalog [chuckles]. You just know that the amount of money is a number. For most things, a double precision float is good.
Finance is like the one thing where it's not (so I messed up on the example [everyone chuckles]). But you just say "it's a number", and then, if everybody doesn't have a billion dollars [and] they have an even number of dollars ... [sentence left incomplete]. I need a different example. But it's like the number of cats they own; that's never going to be higher than 128, and it's always a whole number. So that's good. Then your database can optimize that and say: "well, I'm going to store that in a one-byte integer", and do all sorts of fast stuff with it. But you have the safety: if anybody ever does acquire 128 cats, you're still going to get the right answer, just a little slower. So I mean, that, and stuff like text processing, where all your indices typically fit in a two-byte integer (because, you know, the text file that you're processing is probably just not that big): that's a big speed advantage, and you get it with the full safety of having a large numeric type backing you up.
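A sketch of the bit-array tricks Marshall describes, using NumPy (illustrative only; real array-language interpreters do this transparently, on packed machine words, without the user asking):

```python
import numpy as np

# A Boolean array, e.g. the result of a comparison: all zeros and ones.
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0] * 1000, dtype=np.uint8)

# Packed storage: 8 Booleans per byte, 8x smaller than one per byte.
packed = np.packbits(bits)

# Summing a bit array: just count the ones.
ones = int(bits.sum())

# Sorting a bit array needs no comparisons at all: count the ones, get
# the zeros by subtraction, and write the result out directly.
zeros = len(bits) - ones
sorted_bits = np.concatenate([np.zeros(zeros, np.uint8),
                              np.ones(ones, np.uint8)])

print(len(packed), ones, zeros)
```

The count-and-write sort runs in a single pass over the data, whereas a comparison sort would be O(n log n), which is the kind of type-driven shortcut a packed representation makes available.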
00:59:33 [BT]
Aside from counting cats and high finance, I'm going to toss out an analogy that's going through my head right now, that kind of builds on what Marshall said earlier: that performance maybe isn't the most important thing in some of these languages, or maybe any of the languages. If you use the analogy of music: I don't think there's anybody who would dispute that somebody, say Eddie Van Halen in a rock band ... [sentence left incomplete]. Van Halen was a virtuoso guitarist, many people have said. Somebody like Prince: another incredible guitarist. But you wouldn't say that they are necessarily a better musician than Yo-Yo Ma, who might be part of a string quartet, or plays the cello. They're two very different things. And so in essence, I guess what I'm saying is, in some ways rock bands are kind of like Python: they've got a big audience. You're unlikely to have a huge audience watch a string quartet; even if you could, in a huge arena, you're probably unlikely to want to watch that sort of thing that way. So you're kind of choosing what you want to watch. The type of music dictates the size of the audience to some extent. And I think some of the array languages, if they were to make that crossover, have to make the case that (like a crossover between genres of music) you have to be so popular that they'll bring you across to a different genre, because you're that good at what you do. But if you don't make that jump, you'll be known in another area, and it may never, ever get that big. I don't know. Does that analogy make any sense at all?
01:01:27 [RP]
The end sounded a bit like regular expressions [chuckles].
01:01:32 [ML]
So the idea is like, how you could bring APL to a bigger audience?
01:01:37 [BT]
Well, the idea is that if APL is to reach like what we started talking about, with some of the other languages that have taken off with machine learning ... [sentence left incomplete]
01:01:45 [ML]
So like, Van Halen is the Python or whatever, right?
01:01:49 [RP]
I mean Yo-Yo Ma is pretty popular, so I don't know.
01:01:52 [BT]
No, that's a case where I ... [sentence left incomplete]
01:01:53 [ML]
You need a worse cellist!
01:01:55 [RP]
Yeah, you need a more obscure cellist [chuckles]
01:01:57 [BT]
[chuckles] Yeah, but he's one that's made that crossover. So, he has been so good and so popular, but I don't think Yo-Yo Ma plays arenas. I know he plays big concert halls, but I don't think he plays arenas.
01:02:13 [ML]
Yeah, which are really not even the same order of magnitude, are they?
01:02:16 [RP]
True. Some people hear about APL, or languages where the basic data type is arrays and certain things are optimized for arrays, and I guess, like Conor was expressing earlier, you get this impression that: "OK, these types of problems, they should be rocketing through with performance; this should be the best thing". And there's quite a steep learning curve as well, right? So there's a fairly significant amount of effort you have to put in to gain proficiency in these, versus certain other, arguably more accessible, things. And maybe there's that feeling of: "well, OK, if I knew that after all that effort I would get the most performant thing, then I'd be willing to put in that effort". And we're kind of saying here that maybe that's just not really the case.
01:03:09 [ML]
But I mean, then there's that "most". I mean, with BQN, today, I can run 100 times faster than anybody 20 years ago could accomplish with, well, maybe a supercomputer [chuckles]. And then go back 30 or 40 years, and even the supercomputers aren't going to do it.
01:03:29 [RP]
Yeah
01:03:31 [ML]
So why are you saying that my goal is to get the most performant language instead of just, you know, outstanding performance? I don't want to say "best", but I mean there are some things in terms of expressivity that APL can do that (or that [the] APL family can do) that you just don't see outside of there.
01:03:51 [RP]
I mean, that's a much harder headline sell [chuckles] than "fastest column-store time-series database".
01:04:01 [BT]
And in some areas like financials, the sell isn't to a CEO that: "you can program this computer to be the fastest thing ever". It's that we can put people in your company who can do that for you. Nobody's expecting the people who are making decisions to be able to do that, but they're saying: "this is a tool that you have people that could use, that would make you that much more powerful". And so, I think the learning curve is an important part of it. And then, similar to my music analogy, I think there's a little bit more to music appreciation for the general public if you're into classical music, than there might be for rock'n'roll. I'm not saying there's not a huge level of appreciation within rock'n'roll. I'm saying if you take a typical (again, normalizing things!) classical music person, they're probably going to be aware of more things about the music than a person who wants to watch a rock band is, just because of the immense power that's coming off the rock band. That's the visceral thing that [latter] person will pick up first. After that, they might get into the musicianship. I think classical music people are going to appreciate musicianship first and so that's the learning curve to get into classical music. And maybe opera is a crossover of that because it has a visceral push and more people are becoming interested in opera or music theater because it has that push of emotion.
01:05:31 [ML]
You know, I think one of the really big strengths of APL actually is that you don't have to learn a lot of specifics in order to get good performance. So you need to be able to solve your problem with APL (or with array programming), which is a big learning [step]. You know, you can't write APL like Python and get good performance, but what APL does for you, by using pretty normal strategies of just writing things with arrays (things that actually improve your ability to solve problems quickly, and write less buggy code and so on), is let you get good performance without knowing a lot about the machine. The array language implementers are the ones who have to learn all this fancy stuff about branchless programming and SIMD and caches and all that. And then, if they've done a good job, you as the array programmer can get world class performance. And I mean, actually faster than C at times, on the CPU. I've mentioned this on the podcast: we did our bootstrapping compiler with BQN. [18] And we wrote the C program in the way you would write a fast C program. I don't know how to speed it up. And it's slower than the BQN program. (Not on the GPU, but on the CPU, where you're running completely general code) you can actually get even faster than compiled languages. With BQN, in this case. But all you have to know is what operations are going to be good. So things like plus scans are good; reductions are good; and the basic primitives working on lists [are good]. And you need to know how to break your problem down in terms of those. But if you can get those down, you can write all sorts of text processing tasks and have really high performance without knowing much about the machine. Or, I mean, really anything.
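[Editor's note: a tiny sketch of the style described here, in BQN — a boolean mask, a shift, and a reduction in place of a character-by-character loop. The string is an arbitrary example; only standard BQN primitives are used.]

```
# Count words in a string using flat array primitives only (no explicit loop).
s ← "writing fast array code"
m ← s ≠ ' '    # boolean mask: 1 on non-space characters
+´ »⊸< m       # 1 at each 0→1 transition (a word start), then sum → 4
```

The same mask-shift-reduce vocabulary extends to splitting, run-length encoding, and much of the text processing mentioned above.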
01:07:42 [BT]
And essentially I think that's what Arthur has done with Shakti and the different versions of K. He's tuned the machines to do a specific task very, very quickly [Marshall agrees]. And it's a task that people require to be done and they don't need to know that they are standing on his shoulders to be able to do what he can do, at the lower level.
01:08:04 [ML]
Well, and if times change, you know, different factors in the machine become important (and I guess we'll start to see, you know, some GPU integration in array languages as implementers start to get a handle on how to use GPUs). As times change, you write the same array code and it is updated with the hardware so that same code runs faster.
01:08:32 [BT]
And that's where the advantage of being able to formulate your ideas in an array language becomes important, because at that stage, your advantage might be between the ears, not the machine. The fact that you can think about a problem in a number of different ways, come up with a number of different solutions, bounce it off the machine, then say "well, that's the quickest one". You might be able to find that quicker than a person who's been used to programming in a non-array language.
01:09:00 [ML]
Yeah, and one of the really big advantages of what you can do [with] array programming is that when you write that high performance code, it's not that far off from just straightforward normal APL. Maybe once or twice you'll use a workaround for a certain primitive that would get the job done better but just isn't fast enough. But I mean, basically you're writing APL, and you can rearrange and reorganize that as APL and you can read it and get a much better understanding. So if you try to simplify your code and improve your ways of doing things or add features or adapt it to stuff, you have a good programming experience with that as opposed to if you're writing this high performance C [chuckles] with all these nested loops and fancy blocking things and branchless algorithms and all that stuff. Then you'd be wading through that, because your details of how you perform well on the machine are mixed with the details of just what you're trying to do. [With] APL, you're able to mostly just write what you want to do in this particular array programming style and then it's fast and that's all.
01:10:22 [BT]
And I guess I'll ask a question to Conor: in the case of C++, wouldn't the typical C++ programmer be in the same situation? They're not trying to write that ultra ... [sentence left incomplete]. Or they're not trying to create those standard libraries and things. They're trying to use them and just put the code together.
01:10:45 [CH]
I mean it's a mix. I think most C++ developers aren't doing things with custom allocators, and they're just using what the standard library provides you. Or even if it's not the standard library, you know Google has abseil, Facebook has folly, Bloomberg has the Bloomberg Standard Library. There's all these massive companies, with performance (I wouldn't say critical, but sensitive) applications. And yeah, most folks aren't writing those libraries. They're consuming them, so I think you're correct in stating that, but that doesn't mean that there aren't the library developers, whether that's the standard library or the folks working on abseil at Google, folly at Facebook, et cetera. So I think yeah, in most cases people are using very performant things, I think similar to languages like Rust as well. You know, Rust has a huge crates ecosystem and they're making use of the iterator trait or the itertools extension to the iterator trait, and they're just consuming that stuff. They're not actually implementing it themselves, but there's definitely a mix. I don't know what the percentage is. It's probably 90/10 or something like that. Something close to the Pareto principle where most developers are consuming the libraries and there's only a small handful that are actually implementing them. But yeah, it's definitely not comparable to the array languages where you're always just consuming. You're never writing a custom allocator in APL and then modifying the way that your reduction works on some kind of container. You're definitely not doing that.
01:12:22 [CH]
We're 15 minutes past the hour as per usual. I have stayed true to my pre-show agreement of not saying anything inflammatory, at least I think so. But now for the listener that has stuck around for the full 75 minutes, I shall say the most inflammatory thing I can say without getting in trouble: which is that based on my profiling of the languages we're allowed to talk about (aka APL, BQN and J), I assert that BQN is the fastest. Marshall, we're already over time. You have two minutes to explain why: go! [laughs] Or deny the claim if you wish to.
01:13:04 [RP]
Just link to the page on your BQN website (link in the show notes) so they [can] read all about performance.
01:13:10 [ML]
Well, I mean, yeah, there's a BQN page about performance. [19] I think Rich is right, that I should refer to that for anybody who wants real information. And for anyone who really wants real information, you should just go to bencharray and see the graphs of how fast each primitive is that tries to explain what performance you'll see.
01:13:28 [CH]
Bencharray? Wait a second. What is bench...? I know we haven't brought this up, benchmarksgame.com or something [it is actually https://benchmarksgame-team.pages.debian.net/benchmarksgame/], which is the website that compares all the different languages based on a few programs. This is clearly not that. Now what is bencharray? Is that a BQN thing?
01:13:43 [ML]
Yeah, pretty much. So bencharray is my effort to benchmark stuff in BQN, mostly so that we have guidance when we're developing it. But I've also tried to present ... [sentence left incomplete]. There's a page that it links to where I try to present these benchmarks along with some information about what you can expect from the primitives. So it shows, you know, if you're sorting a 2 byte array, here's the number of nanoseconds per element it takes on my particular CPU, the one I've benchmarked on. And it explains, you know, here's what we do. Here's why you see this little bump here or whatever. So that's if you really want to get into BQN performance seriously. I mean, of course you should always write your own benchmarks, time your own program. It's hard to say how much, but a huge amount of the time, you will find that the thing that you thought was slow and try to optimize (if you didn't benchmark in advance), it's not actually the bottleneck and it doesn't matter at all. And there's some other part of your program that you need to be focusing on. So I mean, the number one rule of performance is benchmark everything. Measure as much as you can. But if you want to learn general rules about BQN performance, you can look at this and see how fast the primitives perform and then have an idea of which primitives you want to use more and which you don't want to use as much.
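[Editor's note: as a concrete instance of "time your own program", a minimal sketch in CBQN. `•_timed` returns seconds per call; the data-generation spelling `•rand.Range` is an assumption from memory and may differ by version.]

```
# Hypothetical micro-benchmark sketch in BQN.
data ← 1e6 •rand.Range 100    # a million random integers below 100 (assumed spelling)
+´•_timed data                # seconds for one sum reduction over data
1000 +´•_timed data           # average seconds over 1000 runs
```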
01:15:17 [CH]
Unfortunately, your 2 minutes is up and we didn't answer the question at all! [everyone laughs] I'm just kidding. Go ahead.
01:15:25 [ML]
So that's if you want to get away from the drama and actually figure out how to make your program fast. But for the drama, yes, I do think BQN has some advantages, even in the language design itself, over APL and J. There are also some design decisions in APL and J that I think are holding them back in performance somewhat. One in APL, that's kind of interesting, is actually that all the bits in a packed bit array (regardless of the system you're on), they're stored big endian, [20] which is the opposite of the way that Intel and most other processors (x86 and ARM) these days do it. At the time it was kind of a mix of both and they chose to go with big endian. But this means that some things have to be flipped around and it slows them down. One of the big things I see with J that's really hurting performance is that it doesn't optimize as much in terms of the types it uses to represent things. So I said a big advantage of APL is that you can store things in bits or one byte integers or two byte integers. J pretty much just stores its integers as 64 bit integers, which is very large. And of course the floats are 64 bit floats. You can't really do much about that because you can't lose precision. So there's no smaller format that works. But with the integers, that's a significant problem for the kind of code that I work with, because, you know, I'd be seeing one and two byte integers a lot in stuff like compiling BQN (a lot of programs that I write are BQN related). And so yeah, for the problems I see, BQN's going to get that eight times speedup from storing it in one byte or four times from storing it in two bytes. That's not necessarily always the case, because other people work with floats a lot, and then for them, J is great because they get double precision at the speed that the CPU can do it. And I think this must be a lot of J's user base because I know they're very concerned with optimizing their floating point performance. 
Like their matrix multiplies are better than any other array language I believe. They've done a lot of work on that; good stuff. And things like that. So actually for some scientific applications, J is probably really good. So it depends on what you measure a bit. But also I think BQN has some good stuff that it does and we have worked a lot on our algorithms of course, so I was not terribly surprised to see your measurement that BQN is faster than APL and J here, because I've also used bencharray. There's a mode you can use that I don't publish because I don't want to keep track of what changes are in APL and J right now. But I can compare against J and APL timings and for the stuff that we measure on bencharray, BQN is quite often a lot faster. Twice as fast as Dyalog and a few times faster than J (it depends on the type a lot).
01:18:38 [BT]
One of the key things I hear you talking about is the fact that you have bencharray and you have a way of breaking down ... [sentence left incomplete]. For somebody who is really interested in being performant, you're basically giving them the keys to say: "this is how it works; this is why it's fast; this is where its hitches are".
01:18:55 [ML]
Hopefully. I mean, one of the things is that there's always so much more detail that's affecting certain edge cases or things like that. So it's really hard to know how exactly the primitive performs.
01:19:08 [BT]
But in the case of J, I think what Henry has to resort to (I mean he can give some pretty good, really good explanations about how it's working and everything) ... but there are times where he just says: "here's the source code; take a look at it because you're going to know as much as I might about certain areas of this".
01:19:26 [ML]
And there are absolutely things where I can write them, but I can't say how that always performs. I don't know if there are bad cases, and this is with more complicated primitives like the interval index, or bins (as BQN calls it). All right, so we do know how addition performs. We do know those things, but [for] more complicated primitives, there are a whole lot of algorithms that you can apply. Some are better in certain cases, which I don't think is important, but I don't know if people really rely on those cases. So what I try to do is lean to stuff that has predictable performance. One of the big factors is it's not going to depend a lot on CPU branch prediction, so it works as quickly on random data that the CPU can't predict as on regular data that it can. And maybe your data is regular and a branchy algorithm would be better, but it gets hard to say. I mean, I try to lean to stuff that's more predictable, where I can tell a story that is understandable about what the performance is.
01:20:41 [CH]
Well, I think that was fantastic and I'm glad I asked it [others chuckle]. I hope most listeners stayed tuned. I mean the ones that gave up at some point are not listening anymore.
01:20:53 [BT]
Probably somewhere around the music analogies, I'm guessing [everyone laughs].
01:20:56 [CH]
I mean, maybe we should make the cold open: "alright, now I'm going to say my inflammatory thing so that people will hang on until ..." [sentence left incomplete because everyone laughs]
01:21:04 [ML]
And then they know it's coming, yeah!
01:21:05 [CH]
Yeah, they'll know that ... Maybe that's what we should do. Whenever we're not sure people will listen [to] the whole thing, just put a fake cold open.
01:21:12 [ML]
Isn't that what we do?
01:21:16 [CH]
No, just put a fake one there. So they'll [be like]: "oh, when's it coming?"
01:21:18 [ML]
Then we guarantee they listen the entire time waiting for it!
01:21:20 [CH]
Yeah, they're like: "why, I was waiting the whole time!"
01:21:23 [BT]
When I'm editing, sometimes I wonder: [what] if I put something in that nobody's ever going to hear? What would they do? [everyone laughs]
01:21:31 [RP]
We got this far [but] we didn't actually talk about how you actually run your expression timings in any of these languages!
01:21:40 [CH]
Oh yeah! I was going to rattle those off super quickly because it's super easy. We'll link something in the show notes, [21] but for BQN it's using a system function called underscore timed (•_timed). For J it's using a foreign conjunction (I think it's called), which is 6 exclamation mark colon 2 (6!:2), which in most examples people just assign to a function named time. In APL it's bracket runtime (]runtime), for which you can specify a parameter, hyphen repeat equals the number of times, and there are also other ones in APL, like cmpx.
01:22:09 [RP]
That runtime is a wrapper over CMPX from dfns.
01:22:13 [CH]
OK. And note that you have to load dfns (which is: right parenthesis, load, space, dfns) in order to get access to that. And then in q it's slash ts (\ts), and then if you want to add a colon plus a number (\ts:n), that will repeat that number of times. And depending on which language you're in, be careful because they multiply your ... [sentence left incomplete]. They don't do the division by the number of times you ran. So if you do something, [you might wonder]: "how is this 100 times slower?" You might actually have to do that division manually. What are you gonna say, Marshall?
01:22:43 [ML]
Yeah, q is the tricky one because it is total time in milliseconds. So yeah, watch out for that.
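[Editor's note: gathering the timing spellings mentioned above into one place. These are illustrative forms, and each system's documentation should be checked for exact syntax; the expressions being timed are arbitrary examples.]

```
]runtime expr -repeat=1000    ⍝ APL (Dyalog) user command; cmpx from dfns is similar
time =: 6!:2                  NB. J: foreign 6!:2 takes a string, returns seconds
time '+/ i. 1e6'
+´•_timed ↕1e6                # BQN: •_timed returns seconds per call
\ts:1000 sum til 1000000      / q: TOTAL milliseconds (and bytes); divide by 1000 yourself
```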
01:22:50 [BT]
And if you do get in q, you can measure it but you can't tell anybody [everyone laughs].
01:22:55 [ML]
Oh but you can measure ngn/k. [22]
01:23:00 [RP]
And John Earnest's OK. Or is he more... I guess he was not performance concerned.
01:23:05 [ML]
No, ngn/k has decent performance.
01:23:08 [RP]
Yeah, ngn is more decent performance. OK is not performant, right?
01:23:17 [ML]
Yeah, because it's written in JavaScript so.
01:23:19 [CH]
Well, I guess maybe the last thing we'll say is: if you have feedback on this episode, because I don't know how we would break it up. Like the first, maybe 75% was a bit more philosophical. There were a couple of technical bits here and there, but really not until the end, where we [were] talking more about implementation details. We have no idea what the listener enjoys or wants to hear more of. We talked about maybe doing a Part 2 or Part 3. If you are interested in hearing extended versions of certain parts of this podcast, please reach out to ... I guess we can throw it to Bob. You can reach out to us at ...
01:23:52 [BT]
contact@arraycast.com. [23] That's where you send it. And I'll just also get a shoutout in while I can for Sanjay and Igor, who provide the transcripts for these episodes, which are often the way people consume them.
01:24:05 [CH]
Yeah, and, uh, at least I personally am interested in the feedback people have listening to this stuff. Because I'm sure we have a certain category of listeners that couldn't care less about performance. They want to hear more about notation-as-a-tool-of-thought topics.
01:24:19 [ML]
But they're the wise ones [everyone laughs]
01:24:22 [CH]
But there might be another category of folks that are really interested. I think it would be great to have another one of these conversations, whether it's more technical in nature or not. When Stephen's back or even not Stephen, but a representative from KX that is able to say up to a limit of what they're able to say. I'd be interested in that conversation. Whether the listeners would be or not, we will only find out if you let us know.
01:24:48 [RP]
Release the K tapes [everyone laughs]
01:24:52 [CH]
Yeah, it's like: we want the Snyder cut! All right. I guess with that, we'll say thanks for listening and happy array programming.
01:24:54 [others say in unison]
Happy array programming.
[music]