Transcript

Transcript prepared by

Adám Brudzewsky, Bob Therriault, and Sanjay Cherian

Show Notes

00:00:00 [Phineas Porter]

Hadley Wickham had this paper on how to manipulate data structures. And so I spent all my time designing all these pipelines. And I went down to the floor where the quants were trading U.S. equities, and I asked them if they wanted to use this shiny new cluster. And they said, no, thank you. We just load it on this one machine here and then we use q to do everything we need to do. And I said, what's q?

00:00:23 [Music]

00:00:34 [Conor Hoekstra]

Welcome to episode 98 of ArrayCast. I'm your host, Conor. And today with us, we have a special guest who I will get to introducing in a couple minutes. But first, we're going to go around and do brief introductions. We'll start with Bob, then go to Stephen, then to Adám, and then finish with Marshall.

00:00:48 [Bob Therriault]

I'm Bob Therriault. I am a J enthusiast.

00:00:52 [Stephen Taylor]

I'm Stephen Taylor, APL and q.

00:00:55 [Adám Brudzewsky]

I'm Adám Brudzewsky. I'm excited about APL.

00:00:58 [Marshall Lochbaum]

I'm Marshall Lochbaum. I'm the creator of BQN and Singeli. Been doing some work on k lately.

00:01:03 [CH]

And as mentioned before, my name is Conor, host of the show, array language enthusiast at large. And with that, we are going to do a few announcements before we introduce our guest. So I think we're going to go to Adám first, who's got three, and then we'll skip over to Bob, who's got one. And then I will finish with a couple short follow ups.

00:01:22 [AB]

Okay, so by the time this episode comes out, [01] we'll have finished a round of the APL challenge. And very soon, depending on when you're hearing this, the next round is beginning. So the APL challenge is this very low level APL competition, where you have some quite simple problems and everything is self-contained. So even if you have just been listening and you've been thinking about whether you should get started with something, this competition is completely self-contained. You don't need to go and study something to participate. You just get started; it will tell you everything you need to know in simple language and you're off to a flying start. So that's just beginning now. Then there is a meetup happening in Madrid, in Spain, on the 18th of February. It's run by Functional Programming Madrid. It has an entry by somebody from Dyalog on APL materials, and that's in English, if I understand right. Then it has a Spanish entry as well, about formal verification, which I think is not going to be in APL, but it's about functional programming. And then it's very early still, but the date for the spring meeting of APL Germany has been set, and also the location: it's going to be the 27th and 28th of March, a Thursday and Friday, and it's happening in Berlin-Adlershof. The program schedule for that will be announced shortly.

00:03:04 [CH]

Over to Bob.

00:03:04 [BT]

And I've talked in the past about J Viewer, which was something that Ed Gottsman put together. And it involved a download of a big SQL file, I suppose, a couple of gigabytes of information about J. And once you did that, you could actually have rapid access to a lot of J information, and in fact, across the forums and a lot of other things. Well, Ed has done something else. He actually showed it at a recent British APL meeting. And it's called ArrayPortal.com. It's a web page that you can go to. And when you go to that, you have access to all these things with J. And it's amazing. It's got wikis. It's got J solutions on Rosetta Code. It's got GitHub: if you've got a GitHub file that involves J, it's going to show up on there. And it's got access to all that. You can go through and do searches on J primitives, code. You can do word searches. You can do phrases. It really is quite amazing. And it just exists on a web page. So you go on to ArrayPortal.com and you start looking around and it's got all this information about J. And there has been some interest, as Adám raises his hand in the corner from what I can see. I'll pass it over to him. Adám, tell me about APL and the ArrayPortal.

00:04:34 [AB]

Well, two things I'd like to mention in this context. One is a plug for the BAA Vector online meetups. They repeat every other week. A handful of array programmers come together. Any array programming language goes. And if you have something you want to show off, it doesn't have to be anything formal. Technically speaking, things are recorded, but practically speaking, those recordings are never really published. So you don't have to be embarrassed. You can come and show off your thing and just have a little talk with like-minded people. And yes, this was shown off there. And Ed then had a part two, and the reason for it being called the ArrayPortal was: why don't we widen the scope a bit to other array languages? And I'm really excited about this. This is something I've wanted for APL for a long time. So as the first step, I'm going to meet with him this week, before this episode is going live, actually, and discuss what we can do. Maybe it will be an actual array portal for everything array programming one day.

00:05:40 [CH]

Amazing. Links in the show notes. And actually, Bob, I did have a comment. I was just waiting until my announcements to make the comment on Adám's announcements, which is that when Adám said low level, I wanted to clarify the ambiguity, because a couple of our listeners might be thinking low level, APL, what are we talking about here? And I think by low level, you meant very beginner level, not low level in terms of Zig, C++, Rust, Nim, V, all the systems languages. Which one?

00:06:11 [ML]

Singeli

00:06:12 [CH]

Singeli. Yes. None of those. This is definitely geared at beginners in general. I've done all of the APL challenges, and I think the first three to four problems are almost like explanations of a single primitive or two. And it is more about familiarizing yourself with the glyphs. And if you already know that glyph, it's going to be like a ten second problem of reading it and then parsing it. Towards the end, though, they can definitely get a bit trickier. But anyway, it's for everybody, not necessarily for systems level programming. Over to my two announcements. They are follow ups to Tacitalk episode 14, which was an episode with a real live person, because half of that content is AI generated. We love it, folks. We love it. We're dropping one tomorrow, which is going to be in the past when this comes out, which is going to be on Array Theory and Nial. It's just the robots talking, folks, but I enjoy it. Anyways, the follow up is about the Omnibar. I didn't even mention the real live person; I just said it was with a real live person. It was with Madeline Vergani, creator of TinyAPL and the Omnibar. I was remarking that we discovered some functionality where you can search by primitive or name of primitive. And then, if you type in rank, it gives you all the glyphs across all the APLs and what they use for the rank symbol, which is something I always wanted. And I didn't even know it existed. Problem was it included all the APLs, but not BQN, not Uiua, and also ironically, not TinyAPL. You got to fix that, Madeline. Since then, since the recording, BQN has been added, folks. It's fantastic. And Uiua is in the works. I have it on good authority. It's just that it's a stack language, so we need a different way to spell the meaning. And that leads to my second follow up, because I also made the remark that TinyAPL has a rank primitive. And I was just informed by our lovely panelists that BQN also has a rank primitive. And you go in and...

00:08:01 [ML]

Rank function.

00:08:02 [AB]

Hold on, hold on, wait, I'll stop you, Conor. Rank function, exactly. Let's cut out the ambiguity here.

00:08:05 [ML]

Tells you the rank of the argument.

00:08:07 [CH]

Rank function? What did I say?

00:08:08 [AB]

You said rank primitive, but there's a rank operator as well, a modifier.

00:08:12 [CH]

Oh, right, right. I keep saying primitive. Too much ambiguity in these announcements, folks. There obviously is a rank modifier. We all, well, obviously, we all know about that, hopefully, if you've been programming a little bit in BQN. But there's also a rank function. It is the monadic definition of equal, folks. And we were talking about it. We're not going to get into it. Maybe we can, but we got to introduce our guest. But apparently, Ken Iverson also overloaded it, because I was asking, does any APL overload the monadic definition of equals? And apparently, Ken Iverson liked to use what's called nub in. There's an APL wiki page link in the show notes. It's fantastic, folks. It's a pattern that you probably have all typed at one point, if you need some kind of frequency values. Anyways, we might circle back to this if it comes up. But we got to get to introducing our guest, who is Phineas Porter, guest of ADSP episode 176. I looked it up. It was back in 2024. Obviously, the most important thing about Phineas is that he has been a guest on one of my other podcasts. But you probably know him also because he was one of, well, we don't like to use the words K-God or Q-God here on this podcast. However, Nick Psaris does like to use that terminology, and he listed his top 10 Q-Gods when he was at KXCon 2023, and Phineas was on that list. If you listen to ADSP, you probably know that he is currently a software developer at Jump Trading, but has worked at other firms as well. I think it's UBS and Citigroup, if I recall correctly. And so he has been using q on and off throughout the years. And we're here today to talk about q, to talk about his path to array languages. And in general, I think when we got back, or I got back, from KXCon 2023, we said we were going to bring on a bunch of folks that had attended it. And then like half a year passed and we didn't bring on a single person. And I think now, Phineas, I'm not actually going to remember off the top of my head who the first individual was. I know Johnny Press went; he was definitely one of the first, but I think we had someone else as well. We'll throw it over to you, Phineas, if you want to introduce yourself and maybe talk about your path to q and array languages in general. And then we will go from there.

00:10:20 [PP]

Yeah, before I get too far: none of the views I'm expressing in this podcast are my employer Jump's; these are all my own. My path is: I started at Citibank working on big data. So basically building things with Hadoop and Spark when that was the cool thing to do; before AI was the cool thing, big data was the cool thing. And I was building a cluster for Citi so we could have our own big data cluster. And people had said, we can use pandas, but there's a better version of pandas called Spark, which also has these data frames. And I read the Split-Apply-Combine paper from Hadley Wickham, this paper on how to manipulate data structures. And so I spent all my time designing all these pipelines. And I went down to the floor where the quants sit, who were trading US equities. And I asked them if they wanted to use this shiny new cluster. And they said, no, thank you. We just load it on this one machine here, and then we use q to do everything we need to do. And I said, what's q? And so that was my first encounter with the language, and then I kind of got interested and started to get into it. And one nice thing about a bank is that you have unlimited licenses inside the bank, so it's fairly straightforward to get started. So that's how I started to learn how to use q.

00:11:49 [CH]

Had you heard about array languages at any point, like J or APL, on your journey, or was q the first time you had been exposed to any of these?

00:11:57 [PP]

q was the first time I was exposed to any of this, but then I went down a rabbit hole. I probably read everything that I could find on the J wiki, [02] all the essays, including what is still one of my favorites, "What is an array?" by Roger Hui, which goes and tries to explain that actually dictionaries are also arrays, and so are sparse matrices and a bunch of other things, and everything is really an array. Anything that is effectively an array is just a function that maps a domain to a range. And that was kind of the first thing. And then I realized later that actually I had been using NumPy for a while, and NumPy was also essentially a dialect of APL.
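
[Editor's note: a minimal q sketch of the "an array is a function from a domain to a range" idea described here. The names l and d are made up for illustration.]

    / a list maps an implicit domain (the indices 0 1 2) to a range of values
    l: 10 20 30
    l 1                 / 20
    / a dictionary makes the domain explicit, but indexing looks exactly the same
    d: `a`b`c ! 10 20 30
    d `b                / 20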

00:12:32 [CH]

So there's the article that gets quoted like every six episodes, which is "NumPy, the Ghost of Iverson" or something like that, which shows how NumPy is essentially ... [sentence left incomplete]

00:12:43 [ML]

It's another Iverson ghost.

00:12:45 [CH]

Yeah, exactly. OK, so you realized you had been using NumPy, which is actually, you know, spiritually a descendant of this. And at this time, are you just doing this on the side? Like, you had learned about q and now you're doing all self-learning, or at what point? Because I know personally that you've used q quite a bit in your career. What was the gap between having discovered that this thing existed and getting to do it professionally? And where did your self-teaching and learning happen: did it all happen simultaneously, or what was the path there?

00:13:16 [PP]

I was basically doing development, building out these systems. And I realized that at the bank, the best place to be is closer to where people actually trade, what's called the front office. And that's where the front office quants lived. So I figured if I learned q, I'd probably be able to get a job at the front office of the bank, down where the traders are, since they don't use any of the big data stuff that I was building. So I basically got good enough to write some basic data transformations and interoperate with Python. This was early days; q had very poor interoperation with Python, but it had some limited amount. And I basically came down and said, hey, I know q, which was somewhat of a lie, because I probably thought I knew more than I knew. This is sort of like classic Dunning-Kruger. So I came in and was like, oh, I know how to use the language, and surely it's not any different from any other programming language. But at this point, I had already picked up quite a few languages between Java, Scala, Python. I didn't realize these were all the same language from the perspective of an array programmer. All those languages are really just one style; it just matters where the curly braces basically go. But you're not really learning anything new. Maybe a little bit with Scala, you learn some functional stuff, but it's still nothing compared to what you get when you start moving deeper down the functional rabbit hole. Like, I hadn't covered Haskell at that point. I had picked it up a few times, but never in any serious way. And so I basically gave myself the project of rewriting their portfolio optimizer in q, because I had heard their portfolio optimization was very slow. And I said, I could do this. So I just went and translated their Python code line by line. And every time I saw a loop, I removed it. That was already the right intuition. It turned out later that you had to invert the problem a little bit to actually do this correctly, so there were a bunch of refactorings that went through it. In the first instance, I just was like, oh, I can go one column at a time. And then I realized, oh, you're basically just trying to take this giant matrix and find a smaller one inside of it, and then squish that together. So I realized all you do is just find all these indices and then do indexing to create the smaller matrix that we actually care about. There were a lot of little tricks like that. And basically that took the portfolio optimization from taking roughly 10 minutes to running in under 30 seconds, which basically meant the difference between a human running the optimization whenever they felt like it, you know, when they came back from their lunch break and then again in an hour, versus it running essentially continuously and then effectively just giving them a new screen where they could just click: yes, send these orders to the street. So that was my first success. And then from there, a lot of other projects started falling my way as well.
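
[Editor's note: a hedged sketch, in q, of the "find the indices, then index out the smaller matrix" refactoring described above; the data and names are made up and are not from the actual optimizer.]

    / a made-up 4x4 matrix
    m: 4 4 # til 16
    / suppose these are the rows and columns we actually care about
    rows: 0 2
    cols: 1 3
    / one indexing operation replaces the column-at-a-time loop
    m[rows; cols]       / (1 3; 9 11)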

00:16:35 [CH]

I was going to say so that I imagine the traders were over the moon about this. Did they care slash know that this was due to the q language or were they just like this guy is a wizard? We need you to do the same thing you did for this to all these other processes that take minutes or hours.

00:16:53 [PP]

They knew that there was q there. Funny enough, they actually knew enough q to be able to query it, and they were very happy. q is one of the few languages that has a human readable query language on top of it. Traders would never feel comfortable querying in Python (sometimes they do, but they don't feel comfortable), but "select ... from ..." people can get their minds around. So they used to query the table to find out what positions they had, if they wanted to answer a question that wasn't available directly in the dashboard. But from then on, basically every analyst that we onboarded (Citi has a nice summer analyst program where essentially college students come in for two months to work) would get assigned to me, informally in many ways, many times. And they would say, this person knows how to build dashboards with q; take them on, they're in charge of this little project. And it didn't stick with everyone. But some people did appreciate going from the basics and saying, oh, I get it. This is just a list. Everything's just a list. Okay. And they would go from there. So a big insight was realizing this doesn't have to be as hard as people make it out to be. And if you haven't been doing this for a long time, it's much easier to get the lessons across.
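
[Editor's note: a small illustration of the "human readable query language" point, using a made-up positions table; this is ordinary q-SQL, not code from the desk described.]

    / a made-up positions table
    t: ([] sym: `AAPL`MSFT`AAPL; qty: 100 -50 25)
    / the kind of query a trader can read at a glance
    select sum qty by sym from t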

00:18:19 [CH]

Yeah, well, I mean, what were the biggest differences between the folks it stuck with and the ones it didn't? Was it just experience level, that the more junior they are and the fewer languages they have under their belt, the easier it is to show them that this is an amazing tool, and that just because it doesn't look like the language du jour doesn't mean that's a bad thing? Or was it something else?

00:18:43 [PP]

I think so. There's a couple of things. One thing that made it easier is people who were willing to accept that things are going to look kind of weird for a while. Some people were really, really thrown off. They'd been taught: we have to have very long variable names, we have to have everything; basically, long vertical code is the way that code should be written. And if you showed them anything that was kind of one horizontal line, they would get immediately freaked out. And then they wouldn't make much progress; they would try to figure out a way to avoid doing anything in q. And I always tried to give them a way out. So they could just write some Python, write it to a CSV, and then have just the last layer display whatever was in the CSV. And that I basically just gave to them so they could get something over the line, because it's always nice to have a finished product. But some people got real fancy. They would see something that I had done as a one-off, basically, and this is very duct-tapey, but it was kind of cool: you could generate JavaScript with q, and then you could ship that onto the browser, and then you could dynamically do different things on the browser with JavaScript. So I had done that as a one-off to create a little spinny widget for while the portfolio optimizer waits for 30 seconds. And then one of the analysts just came in and was like, oh, why don't we do this arbitrarily, and we could do anything we want. So they would create all sorts of other animations, or loading screens to transition from one dashboard to another. So they basically built, I guess what would be called at the time, or maybe it's still called now, a single page app, but done by generating JavaScript from q.

00:20:33 [CH]

Not one of the applications that I thought we were going to hear today from a q developer, generating JavaScript on the fly. Wow.

00:20:42 [PP]

It's a multi-purpose tool. What was also interesting was you could see on the desk people who had tried to use best practices using q, who had copied their style from C, or I should say, C++. And you'd get these files, and the file would call a function, and the function would call another function. And it'd be like, this is the file reader API, and it would call something, and underneath it would be something like an IO API, and you'd just get six layers deep. And then finally, there'd be one or two characters of the code that does the actual work, like read. And I was very confused at the time about why you would do something like this, when you had gone through all these efforts to make everything terse so you can kind of just see it. But it turned out the framework that I was looking at was actually a pretty good idea in terms of what it was: it was basically an early microservices framework. And so by creating all this abstraction, it actually helped them generalize across a lot of use cases. I mean, we had this idea of event sources that can generate new events. And in the beginning, the events were all market data events, like new trades, new things like that. And then later, we added timer events, and we added essentially a file event, which is just a file that emits its data, you know, line by line or whatever. And so this was actually a pretty cool layer, an interesting marriage of q with sort of this object oriented style. And for that project, it actually worked pretty well. In my mind, I still haven't seen good marriages across array and other paradigms. I guess the stack one is probably the one that's currently the most interesting.

00:22:49 [CH]

Well, I was just gonna say, is this a pattern that you've seen since building that kind of little microservices framework? Or is this a one off thing?

00:22:55 [PP]

I've seen it only in regard to this kind of microservices framework; I haven't seen other q code that I've really liked that was using this style. It worked really well for this use case, and I don't know that I've seen it replicated well in other places. And in fact, they had other libraries that were designed the same way, for example, for saving all of your data into what q calls HDBs, or historical databases. And they had a lot of the same type of indirection. And that one really got in the way. Because if you want to do anything that the library maintainers hadn't thought of, like you want to change the compression of a particular column or something else, you had to dig through all these layers of indirection to figure out what was actually going to happen to the file and how it's going to be saved. And it was often the case that you couldn't pass those options all the way through those layers of functions, because people hadn't thought about what you might want to pass down. And if they didn't think of it, then people would use globals. And then you basically just have spaghetti, because you have to set up a bunch of globals, then call the function, and there's nothing really tying the two together. So it's pretty difficult to design APIs, I guess is what I'm saying.

00:24:19 [CH]

So this is all at your time at Citibank, correct?

00:24:22 [PP]

That was all at Citibank.

00:24:23 [CH]

And then will you take us through the next couple chapters, because I know you took a stop at UBS before your current position. Were you still doing q there? And yeah, what was the work you were doing there?

00:24:34 [PP]

Yeah, so at Citi, I was working on the central risk book. This was after the Volcker rule made it impossible for banks to take risk: all the risk had to be managed centrally into one book, where it was essentially just liquidated. But when I got to UBS, they were doing retail market making. If you've heard of Robinhood, on the other side of that trade is some wholesaler who buys that flow. Most of the time that's Citadel and Virtu, but UBS still has a small desk that maybe interacts with 5% of that flow. And so I was trying to help them figure out a way to make money on the retail business. And that was also a very interesting project. First of all, there are a lot of constraints when you're dealing with retail flow, for good reasons. Regulators want to make sure that retail customers have a good experience, and you have to always prove that you're giving them the best execution. And a lot of best execution has to do with understanding what the state of the market was at the time the order arrived. So that requires a lot of as-of joins, effectively, and as-of joins on the state of the book. So not just, what is the best bid and best offer. The best bid is the highest price someone's willing to buy at, and the best offer is the lowest price someone's willing to sell at. And you not only want to know that, you also want to know things like, what are the other things that are going on in the book? And by book, I mean, what are the other orders that are in that instrument? So one thing that array programs are not necessarily naturally good for is building a book. Building a book is very much a sequential problem. If an order comes in, you want to kind of simulate this through time: an order comes in, and then you want to remove some order from the book. So that basically looks like a priority queue, and priority queues are something that, I don't know, I haven't seen. I don't know if anyone here knows of good array implementations of priority queues.

00:27:01 [CH]

I mean, Uiua has the path function, which obviously has a priority queue somewhere.

00:27:06 [ML]

Just hide it all away from the programmer.

00:27:09 [CH]

So I'm not sure if you can use the path modifier function in Uiua if you want to do some upside down queuing. But yeah, that's the closest thing that I know of to some kind of built-in. I'm not sure. Marshall or ... [sentence left incomplete]

00:27:23 [ML]

No, I've actually considered it, you know, like BQN has this hash map object. Maybe we should also have a built-in priority queue. I guess not really a built-in; it's more like a class where you give it the data and it'll generate a priority queue for you that you can then modify. But, I mean, doing it just with arrays: I guess you can put the heap into a list which you then modify. But that's not really array programming.
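
[Editor's note: a hedged sketch of the array-backed binary heap alluded to here and in the next answer: the tree lives purely in index arithmetic over a flat list, so pushes only need a sift-up. The names are made up and this is a minimal min-heap, not the implementation being described.]

    / parent/child positions are just index arithmetic on a flat list
    parent: {(x-1) div 2}
    children: {1 2 + 2*x}
    / restore the min-heap property after appending at position i
    siftup: {[h;i] while[(i>0) and h[i] < h p: parent i; h[i,p]: h[p,i]; i: p]; h}
    push: {[h;v] siftup[h,v; count h]}
    push[push[push[(); 5]; 2]; 7]    / 2 5 7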

00:27:49 [PP]

Yeah, that's what I did. I wrote a little thing; heaps are often implemented as backed by arrays, so I just implemented that using indices. It's fast enough, but it's not particularly elegant, and it wasn't something I was particularly proud of. But what I was pretty proud of later was: we had to do FIFO analysis, which is first in, first out analysis. You're trying to figure out your P&L and do attribution between buy orders and sell orders, and you want to match buys and sells. And Jeff Borror, in his book Q for Mortals, has a one-liner that does FIFO allocation. And the way it does that is it takes the cumulative sum of both your buys and your sells. So you have the cumulative sum along the buys and the sells. Then it takes a min outer product of the two, and then takes deltas along the columns and along the rows. And what that gives you is a sparse matrix that you can basically think of as how many shares got allocated between which row and which column. So if the buys are the columns and the sells are the rows, what you basically have is how many shares were allocated between each pair. For those paying attention, that's an N squared algorithm. So although it's quite nice in that it's vectorized, it actually uses a lot more space than you want: if you have millions of buys and millions of sells, that's not going to work really well. So what I ended up doing is I implemented a version that was sort of the dumb, one potato, two potato approach, which goes one by one without doing any allocation, by just using indices and pre-allocating everything. And then I benchmarked that against basically taking batches of data from both piles, keeping track of where I left off in both piles as I went, and running the N squared algorithm for these batches. And that ended up being a lot faster. And of course, choosing the batch size is sort of computer by computer. But for what we needed, I was able to do a few million allocations a second, which was good enough that you could run all sorts of analytics; you didn't have to pre-run them. You could do what-if analysis, which was really cool. So you could say, if we had stopped interacting with this client, what would be our FIFO allocation to all these other orders? Or if we interacted only with these three clients, what was our FIFO allocation? So it allowed really quick iterative analysis. And I think I even have a blog post where I summarize how I did this on IABDB.

00:30:41 [CH]

So I was going to say, is there some resource? Because I definitely got lost. Two quick questions, not that it will super clarify. So you said it was a min outer product [03] with deltas on both the rows and columns. So are you performing deltas row-wise and then deltas column-wise? Because depending on the order, you're not necessarily going to end up with the same result, correct? Are you creating two different ... [sentence left incomplete]

00:31:11 [PP]

No, no. You end up with the same result; doesn't matter which direction you apply first.

00:31:16 [CH]

Really?

00:31:27 [PP]

Because the way to think about this is you're doing [a] cumulative-sum of the buys and cumulative-sum of the sells. So you have two monotonically increasing sequences and then what you're doing is you're taking the outer-min product.

00:31:36 [CH]

Oh yeah, yeah.

00:31:39 [PP]

Maybe it's best to do an example. Say you have 20 shares and 30 shares and 40 shares, so that'll become 20, 50, 90. Let's say those are your buys. And then on the other side, you can do 10 and 80, which will become 10, 90; those are the sells. So if you take the outer min of 10 and 20, that'll just give you 10 shares allocated to the first pair, between the first buy and the first sell. And then all the rest of that first buy has to get allocated to the second sell. You'll basically get this L shape: you have 10 shares allocated in the first cell, and then in the second cell of the first column, you'll get another 10 shares allocated. And then the rest will just get allocated against the 80 shares that are in the second row. So you'll have essentially zero in the first row after that. And so that's what it's doing. It's telling you how many shares are allocated between each pair of buys and sells. But it's definitely best shown graphically.
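
[Editor's note: a minimal q sketch of the FIFO allocation just described, reconstructed from this explanation rather than quoted from Q for Mortals, with sells as rows and buys as columns.]

    buys: 20 30 40            / running total: 20 50 90
    sells: 10 80              / running total: 10 90
    / outer min of the two running totals, then deltas down the columns and along each row
    alloc: deltas each deltas (sums sells) &\: sums buys
    alloc                     / (10 0 0; 10 30 40)
    / row sums recover the sells (10 80), column sums recover the buys (20 30 40)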

00:32:59 [ML]

So I guess you might say, in order to get 50 shares allocated, you have to reach the horizontal point where you have 50 sells and also the vertical point where you have 50 buys. For the value to be at least 50, you've got to be in some quadrant. So there's a particular point and you have to be in the lower right of that.

00:33:30 [PP]

Yeah

00:33:31 [ML]

Yeah, the L shape is the stuff that's not in the quadrant. Everything in that is going to be less than 50. And I think everything in the lower right corner has to be 50 or more.

00:33:43 [PP]

So you're taking a cumulative-min across the row and then you take deltas. The outer min is what gives you the fact that it gets capped by the row and the column; that's the max of that quadrant. That's right, that's the way to think about it. Every quadrant essentially has a max capacity of shares that it could allocate.

00:34:19 [CH]

Well, we will definitely link to the blog post. But you're going to say ...

00:34:22 [PP]

In some ways it's very similar to this tree-sum algorithm that I think you mentioned on a different podcast.

00:34:31 [CH]

OK. We can link that as well. That was the first clarification: that it is deltas and it is order independent; you end up at the same spot. The second clarification was: it wasn't clear to me which one was faster. Was it the batched N^2 or was it the indices one that you came up with? I guess you came up with both of them (the batching of the N^2) but which one ended up being more performant?

00:34:54 [ML]

Well, it's a hybrid, right?

00:34:56 [PP]

Yeah, the hybrid is faster; the batched one is faster. That vectorized path on the batch is very fast as long as you don't end up touching too much memory. And there's probably reasons to think that the amount of memory you touch has to matter. It's probably dependent on like cache sizes, which I know very little about. But Marshall probably knows what's going on there.

00:35:21 [ML]

Well, no, I think it's simpler. It's just that your handling of a batch is big O(N^2) in the size of that batch. So this is actually a really common pattern that I don't know if we've brought up on the ArrayCast. You have a scalar algorithm that gets good asymptotics, in this case, linear time. But it gets that by a lot of data dependent testing and choosing. In this case, you get up to a certain point in the buys and a certain point in the sells. Then, you know, you've ruled out any earlier sells or earlier buys. It's taking a particular path through the data that you couldn't really do in an array style. So you have enough data that testing pays off. At the small scale, running tests on a CPU, and especially these sort of sequential tests, is really bad for performance. It's better to do an array algorithm. It might be doing all sorts of work that you would think of as wasted, but it does it really fast [chuckles] so it's able to get the parts that you want out quicker than the sequential algorithm at the small scale. What you do to combine these is you need to understand both algorithms and then you write an algorithm that runs arrays in blocks or sections. But then it still uses the tests from the asymptotically good scalar algorithm to keep the cost low as the size increases. That comes up a lot. I mean, it's a hard way to write an algorithm because you have to write a good scalar method and you have to write a good array method and you have to have your head wrapped around both of them well enough that you understand how they fit together. But when that works, it's often the fastest way that you could possibly imagine a computer solving the problem.

00:37:23 [CH]

Yeah, that's very cool. And we will also link to ... was it Q for Mortals or Q for Gods? They're both his book, correct? Which was the one that you referenced? We'll make sure we link to that.

00:37:36 [ST]

It was Q for Mortals.

00:37:37 [CH]

It's Q for Mortals. Or did he author both of them or am I making up the Q for Gods?

00:37:43 [ML]

Well, simply consider: are you a mortal or are you a god? Pick the appropriate book!

00:37:47 [ST]

Was that not the line from Ghostbusters? [Conor laughs]

00:37:52 [ML]

Well, no, you don't want to apply that that line. If somebody asks you whether you're a god, you should answer honestly. Ghostbusters says you answer "yes", but here you need to be honest.

00:38:01 [CH]

I'm so out of my depth. I watched Ghostbusters as a young child and had nightmares for like a couple of years. I have never seen a Ghostbusters since. I know it's a comedy, but as a small child, you see those green things flying around and it's absolutely terrifying. Absolutely terrifying. I don't understand this joke. Maybe we'll find a link to some YouTube video that's a scene from some Ghostbusters movie.

00:38:24 [ML]

I think the gist of it is that some very powerful ghost entity asks one of the characters whether he's a god and he answers no and that was a bad answer [chuckles].

00:38:37 [CH]

[Laughs] All right. Yes, Phineas has just linked us, so we will link you, the listener, to this blog post. OK, so where does that take us in the journey? And I've also got a great ... [sentence left incomplete]. I usually leave the titling of the podcast up to Bob, but I went through a phase where I read like every single finance Wall Street book I could get my hands on, which included, of course, "My Life as a Quant" by Emanuel Derman, who is the D from BDT. Is it [an] option pricing formula? Is it an alternative to Black-Scholes? I don't know: my finance days are behind me. But anyways, he wrote this book, "My Life as a Quant". And so now I feel like this is the audio version of a "Life as a Quant (using Q)" with Phineas Porter. And I've got a bunch of follow up questions but I feel like I'm hogging. We'll let you finish your career arc and then I'll open it up. Then if we have time at the end, we'll get back to my random questions. But back to you, Phineas.

00:39:35 [PP]

Yes. So I was going to say: one more thing that was really interesting is that [at] the end of that algorithm, you end up with these allocations, which are essentially indices. And q has this really nice feature called "link columns" (I don't know if any other APL has this notion). Essentially, in C, they'd just be pointers. They basically allow you to do instant joins. What you can do is someone could pass you this table, decorated [with] a lot of information about the buys and sells that they want to allocate. There's all this other extraneous, interesting metadata in the columns, but you don't actually have to touch any of that. All you have to do is return to them a three column table. The first column is a pointer to their original buy table. The second is a pointer to their original sell table. And the third is the allocation that happened, which is the quantity that was actually allocated, that you found by using the algorithm. Then any analysis they want to perform can be performed by just indirectly going through these link columns. So if there was a timestamp associated with every trade, you can do "buy.time", whatever, and it'll show up directly there, like an instant join, which I thought was really cool. You touch as little of the data as possible and you return this maximally useful object where all of the information that they gave to you is still preserved. I don't know if anyone has seen something like that before, but I thought that was really cool.
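
[Editor's note: a hedged sketch of this idea using q foreign keys, which give the same "instant join" dot-notation behaviour for in-memory tables as the link columns being described; the table and column names here are made up.]

    / made-up keyed buy and sell tables, with whatever metadata the caller attached
    buy:  ([id: 0 1 2] time: 09:30 09:31 09:32; qty: 20 30 40)
    sell: ([id: 0 1] time: 09:35 09:40; qty: 10 80)
    / the result: two enumerated "pointer" columns plus the allocated quantity
    alloc: ([] buyid: `buy$0 0 1 2; sellid: `sell$0 1 1 1; qty: 10 10 30 40)
    / dot notation follows the links, so the caller's own columns join instantly
    select btime: buyid.time, stime: sellid.time, qty from alloc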

00:41:10 [CH]

I have definitely not. Well, you said this was a feature that, as far as you know, is only in q, not other array languages, right?

00:41:19 [PP]

Yeah, I don't know. And then from UBS: the company I'm currently working for, Jump, is working on essentially building out a retail trading business as well. So that's why I'm there. They actually didn't use any q (when I joined, at least, they didn't use any q). Now they use some q. They don't love that, but it's actually made my job a lot easier [Conor laughs] so I've been able to do things that they've never been able to do. That was pretty cool. I was able to do a lot of real time markouts for their trading business, because q really excels at that kind of streaming. One of the nice things is just having the IPC built in (that's the inter-process communication). You can essentially stream data from your orders or whatever, and then send it to another service that goes off and runs markouts and then joins it back. And again, I make use of this idea of link columns. The markout data is just the markouts plus this link back to the original order. You can do all sorts of pivots. So yeah, that kind of rounds out where I am today.
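
[Editor's note: a bare-bones sketch of the built-in IPC being referred to, with a hypothetical port number, table, and handler; a real setup would look more like a tickerplant, and these names are made up.]

    / process A: listen on a port, hold a table, and define a trivial upd handler
    \p 5000
    fills: ([] px: `float$(); qty: `long$())
    upd: {[t; x] t insert x}
    / process B: connect, query synchronously, publish asynchronously
    h: hopen `::5000
    h "count fills"                                        / synchronous: 0
    neg[h] (`upd; `fills; ([] px: 10.1 10.2; qty: 100 200))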

00:42:38 [CH]

I mean, like I said, I've got a bunch [of] follow up questions to what you just said there, but I'm going to pause and open it up to the panel. We'll go to Bob and then other questions as well so that I'm [laughs] not the only person that is in control of the narrative here. Over to you, Bob.

00:42:53 [BT]

I'm seizing the narrative. Actually, it's a follow up to something you just mentioned, Phineas. You were saying that sometimes q can be a bit of a difficult sell to people you're working with there. How do you sell it? I mean, the obvious one is I can do something you can't do otherwise. But are there other techniques you use to say: "you know, this is something that would actually do really well for you."

00:43:14 [PP]

So what actually ended up working for us was: I tried not to use q for about a year, but I built them some pretty cool dashboards, actually, using ClickHouse. [04] ClickHouse, if you look at it and squint, is sort of like another version of an array programming language. It's a SQL database, but it's all based on these column vectors, and so you can do a lot of the same things, and a lot of the same tricks work, except it's not as powerful because a lot of the functions are not arbitrary. So they have array functions that can work on arrays, but they can only do things like array sum and they can't take arbitrary lambda expressions, because they didn't want the SQL queries to blow up (depending on how much time it takes). And so I built a lot of stuff on top of ClickHouse and then people were complaining that it's still very slow and that things are pretty janky. So I said: "well, why don't we try using the q technologies?" I got a trial license and then, over the course of essentially a weekend and then maybe a little bit into the following week (maybe a couple of days, like four days total), I built the exact thing that I already had that took me a year to build [chuckles]. Pop, pop, pop, pop! Half of that battle is learning the landscape. Once you know it, it's much easier to rebuild, so it's not really an apples to apples comparison. But still, I think people were pretty impressed that everything that they had before and more was up and running the next week. And then they were like: "OK, yeah, let's switch."

00:44:47 [BT]

That's a pretty tricky way to do it, because basically what you're doing is you're saying: "I'll do it your way" [chuckles]. And you raise the pain point to a point where you say: "well, maybe try my way". And then it's so much faster, they go: "oh, yeah, you got it". That's very subversive.

00:45:02 [ML]

Well, was it faster? We need to know.

00:45:03 [PP]

It was a lot faster. And it used a lot less memory. There were a lot of things where, by virtue of it being a programming language, you can actually define what you want to search over. An interesting thing that happens in ClickHouse is if you say: "I want to join this table to this other table", even if you know that you shouldn't look at 90 percent of the table, there's no way for the programmer to hint to the database: "hey, just ignore those other tables, those other partitions, that historical data; it's just totally irrelevant." When I'm doing markouts for today, you only need to look at today's data. There's no really good way aside from just generating SQL queries that specifically pull that little flag out. So it's kind of painful.

00:45:49 [CH]

I know, Stephen, you have a question, but I quickly want to ask: I remember doing like a vector DB kind of dive. Is ClickHouse the one that has a bunch of the familiar reduction and scan built-ins that work on columns? I'm looking at the docs right now, and I recall looking at the docs of several of these, and this doesn't seem like that, but ClickHouse does in my memory seem to be that. Do you know the answer to that question?

00:46:20 [PP]

Are you saying like a vector DB or the AI style vector DBs?

00:46:24 [CH]

No, no. There were a couple, like DolphinDB [and] ClickHouse, and one of them I looked at and thought: "oh, this is actually very array inspired". But from looking just at the docs for ClickHouse, it doesn't seem like it, but maybe it's just because the examples they're showing are just some basic SQL queries.

00:46:42 [PP]

Yeah, there's a whole list called "array functions" and they're all the functions you can do with arrays. They're effectively the standard things you can do to an array: you can map over an array, so you can apply any scalar function. You can apply arbitrary lambdas in the map as long as they always give you the same type out. And then it can do a bunch of counts and filters and things like that. So yeah, the array functions in ClickHouse are actually pretty APL inspired, in the ghost of Iverson sort of way. It probably wouldn't make an APLer happy.

00:47:34 [CH]

All right, over to you, Stephen.

00:47:35 [ST]

One of the best things about NumPy, after the Iverson ghosts, is, of course, pip, the Python package manager, which gives you access to 20 years or more of accumulated Python expertise from the community of Python users. q doesn't have a package manager. So when you came to rewrite your system in q, I suppose you must have been writing everything from scratch out of the knowledge which is in your head. So whereas the Python community has got a central store of knowledge and expertise and software in the packages, you come to do your work in q really with what you've got in your head [and] not much else. And in that you're joining a long established tradition amongst APLers [who] are doing everything from scratch. It's probably got a lot to do with the absence of anything like a package store or repository. This has been on my mind because I've recently been documenting an APL package manager and thinking [laughs] about the implications for the future of the language. If you cast your mind back to how you rebuilt your system, are there parts where you imagine: I needn't have written this myself from scratch; what I did in four days, maybe I could have done in two if I'd had access to a package repository?

00:49:13 [PP]

There's definitely some truth to this. I think the fact that you don't have a package manager makes you leave out a lot of bells and whistles that you otherwise would accrue over time. What ends up happening is, if you want to have something that reads data from an HDB, there's very little ceremony in doing this: you probably just put something that says "load HDB", and that's your entire library. It's like one line that says "load HDB" and it knows maybe about one or two directories where your data is stored. Same thing with writing. I think everyone in the KDB community by this point has written their own logger, with the basic primitives of: let's have some levels; we'll have some integer enums, and then we'll go and check which enum we're on and then log to standard out. And at Citi, we actually had a pretty fancy one that was like a binary logger, so not only would it log to standard out, but it could also write this data to a centralized database. I think there's even a library that does this in Python; I think it's called schema logging or template logging or whatever it is, or like object logging. But the idea is that you can do interesting debugging if everyone logs to the same database: then you can go back and see (especially in the microservice world) who caused the outage, which is probably not the service that was being queried; it's probably some other upstream service that basically created a file source. So anyway, all the tools that you end up building are very minimal and they just do the thing that you needed to do at the time when you wrote them. Then later, maybe you'll come back and add stuff. There's always a little file called something like "miscellaneous", which is just random utilities that you find yourself writing over and over and over, so you just put them into a file so you don't have to write them again. But I think that's kind of what ends up happening. And I don't know if a package manager would be good or bad in that world. I know that at Citi, it was great that they even had a package manager of sorts. But it was basically just: everything that was ever written that's supposed to be centralized was owned by one team, and you would just download that entire repo. And that was sort of package management done for you centrally. But it wasn't like anyone could just create something and register it and you could just use their stuff. There's sort of a curation problem as well with package management. You have to solve both in order for package management to be valuable. Otherwise, you just end up importing a lot of cruft into your own code base.
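
[Editor's note: a hedged sketch of the sort of minimal logger being described (integer levels, a threshold check, write to standard out); the names and format are made up for illustration, not any library that actually ships.]

    / levels as a list of symbols; the threshold is just an index into that list
    .log.levels: `DEBUG`INFO`WARN`ERROR
    .log.threshold: 1                         / INFO and above
    / -1 writes to standard out; only log if the level clears the threshold
    .log.write: {[lvl; msg] if[lvl >= .log.threshold; -1 (" " sv (string .z.p; string .log.levels lvl; msg))]; }
    .log.write[2; "disk usage looks high"]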

00:52:12 [ST]

Logging is such a great example and, as you're saying, in a bigger shop like Citi, the knowledge accumulates. So you get a logger with more bells and whistles on it. There's clearly some advantage to be had from accumulating and, as you say, curating. I was particularly thinking about this because I got an email from Kx [05] this morning saying that there will be something about a free commercial license. This is a conversation which I think I first came across on the ArrayCast when we had Ashok on here the other week, and we talked to him about a similar license at Dyalog. I don't have any details about what Kx is actually going to do about this, but if it's anything like what we're talking about, the idea would be that, as with Dyalog APL, you can use the language to do things that are commercial, and you don't actually have to start paying for it until you're making some money with it. It occurred to me that in a scenario like that, where you could walk into a new shop and a new role and casually suggest using q, much more cheaply than you can now, the payoffs for q developers change. At the moment, q developers carry what knowledge there is about the language around in their heads, which makes devs relatively expensive. That's quite a nice thing for the pay package. But if the threshold for using q were much lower, I could imagine that there starts to be some community investment. People would invest their time into writing packages as they do in other places. Any thoughts on that?

00:54:14 [PP]

Yeah, I mean, I think to the extent that people can centralize and agree on certain things, it would be very useful. There was even an open source effort: FINOS was a group of banks, which I was a part of, which tried to do something like this. One of the problems is every bank had already written their own logger or their own X, like an email sender, right? Everyone has their own KDB version that invokes the system command to send mail to other people, so you can send nice, beautiful reports, and every bank had done it slightly differently. No bank wanted to standardize on someone else's version. There's always this problem of: once you've branched, how do you get people to all go back to some unified trunk version? I just remember that was sort of fraught. It wasn't combative; it was just not moving anywhere. It was very hard to get consensus. This is one of those things where you just kind of hope that some basics are actually given with the language. Arthur already decided, at the time, how you do inter-process communication and gave that away; it would have been nice if later Kx had gone and said: "this is how you do logging and everyone has to agree", or, this is how you do a microservice infrastructure. Like the way that Java has Spring Boot: not that everyone loves Spring, but there's a certain beauty in everyone [using] Spring, so when you go from one place to another, the libraries are somewhat compatible with each other. And with q, the surface area was a little bit more minimal. Everyone built their own little microservice framework. I think if things were more open, if there was more adoption, there's always more motivation for people to write that kind of code. But there are also probably just places where people can be more productive, in new fields where q isn't so heavily used yet. I don't know; there are applications which are not quite finance related, you can imagine, where it's not used so heavily. It's more greenfield and you can actually set up a bunch of standards for the basics that people can all operate over.

00:56:38 [CH]

I mean, I still feel like there's massive value in that kind of ecosystem, because I totally appreciate the issue of: there was no logger up front [and] now every bank has their own, and now they can't agree. But that is kind of predicated on just focusing on your existing clients. Obviously, q has a big hold on the finance industry, especially in certain domains. But if q as an ecosystem or language is trying to go to these new places, or even schools and stuff, is the story that everybody that uses q already wrote their logger, so now you have to write a logger too? Or should it be: "no, we are going to try and at least get some standard or package manager that people can all upload to". I was trying to figure out; I swear I heard on a podcast that there was a newer language, or maybe not a language. I thought maybe it was Deno, which is like the new JavaScript runtime that's competing with NPM. But I thought there was one that was really trying to prioritize this problem of what you mentioned. That there are people that say: "oh, the smaller, the better", and that actually the lack of a package manager is an asset, or it's a...

00:57:52 [PP]

That's Odin.

00:57:56 [CH]

... it's a good thing. Or is it Odin? Yeah, and it's that you don't have this proliferation of cruft, and so then you're writing good quality code. But I've voiced on this podcast before that I think code that I don't have to write, that someone else has written for me, is amazing. So the question is just how do you set that up so that you get good quality code. And I know that there are a couple of languages trying to have these; I know NPM comes with security audits and stuff, but they're trying to go past just the number of downloads and the number of stars, towards: "is there a way that we can objectively and mechanistically evaluate the quality of this library", whether it's from a "leave reviews and five star ratings" or whatever (similar to a podcast over time). You're going to have "a thousand reviews with a 4.9, whatever". Versus you go to cargo.io or crates.io or whatever it is, and then you search for the library you want and there are 16 of them and you say: "wow, this one's the most downloaded and I guess it's the best". I think there is some happy place that we're still trying to find, because I don't want to have to write the world. I'm trying to come up with the best example. But I don't want to have to roll my own norm CDF function. And even then, if I'm going to roll my own, I need the error function, and it's like: "you're telling me I have to program my own error function, if my language doesn't provide it somehow, or some math library doesn't?" We can't write the whole world. We need to depend on certain things, right? I don't know. Maybe that's not the APL way: "you've got to learn how to write the error function if you want to be a Ken Iverson level APL programmer". I don't know. I'll leave it open to thoughts from folks.

00:59:38 [ML]

It's not going to have good precision if you write it yourself [chuckles].

00:59:41 [CH]

Well, I mean, someone wrote it at some point. So, you know, who's the person that's able to write it with good [precision]?

00:59:47 [ML]

It takes a pretty large amount of compute time to come up with a good implementation of the error function. So not only do you have to write your own error function code, but you have to write your own polynomial search thing to find yourself the right interpolating polynomial that gets you close enough to the error function.
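[Since hand-rolled error functions came up: here is a rough version in q using the classic Abramowitz and Stegun 7.1.26 polynomial approximation (absolute error around 1.5e-7), exactly the kind of "imprecise version" being joked about. This sketch is ours, not anyone's production code, and the coefficients are the standard published ones:]

erf:{s:signum x; a:abs x; t:1%1+0.3275911*a; p:t*0.254829592+t*(t*1.421413741+t*(t*1.061405429)-1.453152027)-0.284496736; s*1-p*exp neg a*a}
cdf:{0.5*1+erf x%sqrt 2}   / normal CDF expressed in terms of erf
cdf 0 1.96                 / roughly 0.5 0.975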

01:00:08 [CH]

So what would Aaron Hsu say? [06] Because he's Mr. Anti-library. What would he say here?

01:00:13 [ML]

Oh, he would probably just do an imprecise version [laughs].

01:00:16 [PP]

Yeah. Arthur has a stats.k file from a long, long time ago. And that's what I reach for whenever I need the norm or the CDF functions. I just find that on GitHub and it's still archived somewhere and you just grab it. It has a bunch of stats functions in there also mixed in with some combinatorial functions, also mixed in with some other things that he was working on at the time. It's just like there and you just grab what you need and then you move on [chuckles].

01:00:45 [CH]

I was about to say, that is clearly like an argument for package management. But then I realized actually it's not. You're arguing [laughs] that there should just be a file living somewhere that you can copy and paste from every once in a while if you need something that's a little bit ... [sentence left incomplete].

01:00:59 [ML]

Well, I mean, everybody says package manager, but what you really want are the packages, right? The only reason we have to manage them is because they keep turning out to be bad and people have to publish new versions.

01:01:09 [CH]

Yeah. I mean, yeah, I guess.

01:01:11 [ML]

I think if we just had good packages.

01:01:12 [CH]

Yeah. Is it even packages? Technically [in] Roc, and I think Go to some extent (I'm not sure how similar their functionality is), you just include the git URL to some raw file and then it just includes that. I know Roc does some hashing so that it can't be changed under the hood. I don't exactly know what Go does, but technically there's no package manager in that system, right? You're just literally making a network-dependent call to some git URL. And even that would be a pretty good starting point. I mean, it definitely does not solve code quality and the proliferation of cruft. But it does solve having to depend on stats.k, or copying and pasting from stats.k [laughs] from some old GitHub repo or wherever it lives. I'm not sure.
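[For the curious, a minimal sketch in q of the "raw URL plus pinned hash" idea described above, assuming curl is available on the host; the URL and the expected hash below are made up purely for illustration:]

url:"https://example.com/stats.k"       / hypothetical location of the dependency
system "curl -s -o dep.k ",url          / fetch the raw file
if[not 0x0123456789abcdef0123456789abcdef ~ md5 "c"$read1 `:dep.k; '"hash mismatch"]   / pin the content
system "l dep.k"                        / load it only if the hash matched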

01:02:05 [PP]

There actually has been an attempt at one point. I think I even used it; it was called QPM, a package manager written by this guy, Compsit, I think. He also wrote the Sublime editor plugin, which I still use. I started using it when I was at Citi and never migrated off, even though everyone's moved on to VS Code now. I still happily use my Sublime editor, because habits. But yeah, he wrote a package manager, and it managed the few packages that worked according to his convention. Anyone was free to adopt his convention, but I don't think many packages did. I think if you pointed it at a Git repo, it would try to see if that package followed the conventions and had the appropriate requirement files, and it would go off and try to install them. So it worked. It worked very well for installing his stuff. And then that's kind of where it ended.

01:03:01 [BT]

Yeah, I think that's kind of the challenge, isn't it? Because on one hand, you want somebody who's going to program really well, at the level of performance that you want, but at the same time, they have to leave it open-ended so that it's useful for people using other systems and other situations. So you've got a programmer who can write really good programs that perform at a very high level, and those programs are also expandable to other contexts. And I think that's a real challenge. I'm wondering whether that is actually something that just evolves over time, because you can't always know all the contexts that something's going to be playing in.

01:03:36 [ML]

Well, but this is a good quality filter, right? So you have a package manager that's, I don't know if this applies to the one you're talking about, but say it's completely undocumented. It's pretty hard to set up a package that's going to work with the package manager. Then if you use this package manager and you download somebody's package, and it actually runs, they must be a good programmer, because they figured out how to make a package.

01:03:57 [CH]

Oh, boy. No holes in that theory right there. Absolutely none. Absolutely.

01:04:04 [BT]

Spoken like a true bouncer.

01:04:07 [PP]

It's a great filter.

01:04:09 [CH]

All right. Well, I've got, I mean, two slightly less array questions. But they are definitely questions I think you could answer. Well, I'll pause: if there are more array-esque questions for Phineas, we should ask those first. I will withhold my very interesting, in my opinion, question; outer product will probably not be in the answer. I could be wrong about that. Only time will tell. We have no takers. All right. I mean, actually, we'll start with the more fun one and the least array one. There was a book published in 2024 that I just finished reading in January. I finished reading it in less than 24 hours. That's how good it was, folks. Bloomberg called it the best Wall Street book of 2024, something like that. It's called The Trading Game by Gary Stevenson. And it was written by a guy that worked at Citibank. The fact that I saw you reach for your keyboard means that you're Googling it and you have not read it. That just means we're going to bring you back on, for episode 198 or sometime sooner than that, because I was curious to know if this book had made its way through Citibank employees slash Citibank alumni, and whether there have been thoughts. I cannot recommend this book highly enough to everybody, folks. I mean, if you are mathematically or array inclined or finance inclined, you'll love it even more. But I think it's just a generally great book in terms of: this guy worked at a bank and he does a tell-all of the crazy stories and stuff that really happens. So, to be answered in a future episode. The second question, which is a little bit more closely related to something we were talking about earlier: you mentioned, when you were at UBS I think, the retail flow that comes from companies like Robinhood, which is more well known than Virtu and Citadel. Virtu, famously, rebranded six different times from Knight Capital, which had the 800 million dollar loss at one point, which was called, I don't know, one of 10 flash crashes at this point. That's actually a company I used to want to work for, KCG, which is what they were rebranded to. But anyways, like I said, finance is in the past. Do you have an overarching view of the players on Wall Street, in terms of, like you said, oh, these two companies are primarily responsible for the other side of that flow? So from your, quote unquote, life as a quant using q, when you think about all the big players, do you have a little mini summary of: people think about Goldman this way, people think about JP Morgan this way, people think about UBS, Citibank this way? And also, obviously, this is going to be public, so maybe this is a question better asked at the next KX Con than recorded. But I'd be very curious to get your overarching thoughts, for what you're willing to share on this kind of thing, because, yeah, I'm completely removed from the finance industry. And when you said that little tidbit, I was like, oh, that's interesting. I didn't know that.

01:07:19 [BT]

And before you answer Phineas, as your legal representative here, I think it's important to note that your views are your own and they do not reflect any group that you may be part of or have been part of in the past. Now proceed.

01:07:33 [PP]

Sure. So, yeah, again, I would say this is just a sampling. And there are probably people I would recommend you listen to if you're interested in that kind of thing. There's a guy on Twitter called Gappy, who just joined Bally as their head of risk. He's been on a few podcasts, and he talks about the different cultures at different hedge funds. He's been at Millennium, he was at Citadel, he was at a few other places, and he's at Bally now, I think. So he's done a tour, and he has some opinions about what things are like where. I can only tell you about the places I've been. The one thing I would say is Citi is a very, very large organization. When I was employed there, there were about 400,000 people employed by Citi. So it operated like a small country in terms of the bureaucracy and things like that; things went slowly, kind of by construction. But they had everything. That's why they had their own kdb+ [07] infrastructure, and I don't think I've ever had better kdb+ infrastructure than at Citi, because they were supporting such a large operation that they could afford to invest in it, and they had just incredible, incredible things. Then I went to UBS, which was 10 times smaller, about 40,000 employees, and it's run much more in a Swiss way. There's still bureaucracy, but the bureaucracy is more efficient. But because of that, it's also leaner. They weren't able to afford as much infrastructure for the kdb+ side, or in general; you were supposed to do more of your own work to get things over the line, and fewer things were centralized. And that meant, for example, sometimes: "we'll give you the raw data, but we're not going to let you operate on it directly, because then you can mess things up for others, and that would be too much of a headache for us to manage". So everyone ended up having copies. The efficiency of the centralized unit, ironically, ended up creating this less efficient situation where everyone else in the firm would make their own little thing on the outside, because the center didn't want to be in charge of it, or didn't have the mandate to be in charge of it. And then I know a little bit from people who are friends at Goldman. Goldman is much more of a risk-taking bank, much more of a trading bank, versus Citi or UBS. UBS is much more of a wealth management bank; their main business is essentially managing wealthy people's money. And Citi has a lot of different businesses. Actually, there's a whole story of Citi that's pretty interesting in itself, which is that it's essentially a bunch of different things that were aggregated together by, I forgot the guy who was from Salomon Brothers, Sammy something, I forget. But he put all of this together; Travelers Group was part of it. There's a whole bunch of different businesses that were all united together. So Citi is, sort of by design, this big, big conglomerate.
And then Goldman has been a much more risk-taking bank, and historically even had that famous trade in 2008 that was profiting off of the crash.

01:10:58 [CH]

Oh yeah. I mean, The Big Short talks about that, right? Where Michael Burry's trying to get quotes on his positions. And it's in the actual movie; even though it seems like a very mundane kind of thing, they managed to include it. Who was the director? David something, David Fincher maybe? That could be totally wrong. Anyways, The Big Short, a great, great movie as well. But yeah, he's like, oh, I see what you're doing. You're trying to make your own trades before you mark my position so that you're not losing money on this. Which, yeah, not entirely.

01:11:32 [PP]

Yes. With Citadel, there's also a good distinction to make. Citadel has two different types of operations: there's the hedge fund side and then the securities side. The securities side is the one that actually trades in the market; they're the ones who do the wholesaling, who buy retail customer orders. And Virtu, again, is a very interesting situation, because they're a public company, one of the few market makers that's actually a public company. And so they're essentially run by lawyers, so it's a very different company as well. You could learn more about Virtu than you can about pretty much any other company, because all their statements are public.

01:12:16 [CH]

Well, I will be sure to look up this individual who lives on Twitter and then reverse engineer all the podcasts he was on. I will go and do that, and then I will pass those links along to Bob, who will put them in the show notes for those that are interested in this world. I know that when we went to KX Con, I talked to a bunch of the q folks. And also, I did go and look: the order of appearance was, we actually did a review of KX Con immediately with Nick Psaris, which I always forget because Nick had been on already a bunch of times. So he wasn't someone that we met and then brought on from KX Con, but he still counts as someone who was on the podcast to talk about KX Con. Then we had Conor McCarthy as the second guest. Then we had Jonny Press, and now you are, technically, the fourth. So when I said second or third, you're actually the fourth. But I remember talking to folks at KX Con, and a lot of them, the ones that had heard of ArrayCast, said that they had checked it out, but there wasn't a ton of q content. And so we have been trying to make an effort. Admittedly, though, for a lot of folks in the finance industry, which is why I'm very excited that you put this individual Gappy on my map, there's not a strong interest in going out and talking. Famously, Jim Simons of Rentec, one of the most successful hedge funds of all time, didn't ever give interviews, because the whole philosophy is: if you find a hidden cave of gold, are you going to go tell people about it? Or are you just going to slowly take trips back and forth to the cave of gold and collect all the gold without telling anyone? There's the anecdote that really good financial advisors just advise themselves, because why would you want to go tell people how to make free money, right? So sometimes it's hard to get folks to come on and talk. But yeah, we're trying to increase the q content. Still, Michael Higginson was one of my favorite q conversations; he was the winner of some contest, correct? Was it the APL contest?

01:14:17 [BT]

I think it was the APL, wasn't it?

01:14:27 [CH]

Yeah. Even though his background had been in q. I want to throw it back to you though, Phineas. We talked on ADSP episode 176 about some of this, because I guess we didn't mention it here, but for those that haven't listened to that episode: Phineas and I, and I'm not sure if it was because of ArrayCast, potentially, got in contact and have been messaging ever since. Part of our friendship and getting to know each other is that we drove, because I didn't realize that there was a train you could take to Montauk, because apparently my Googling abilities aren't that great. Anyways, I ended up renting a car and we drove down, and we ended up having conversations, which we don't need to repeat here because we already discussed them on episode 176, about the design of prior, and deltas and differ being specializations of it. But I wonder, are there follow-up thoughts on q, whether it's about some of the built-in primitive functions or the state of q today? What's your take? You've got the platform now, potentially some q listeners, array language folks listening; the floor is yours to say what you would like to say about q or some aspect of q. It's kind of open-ended, but we'll see where you take it.

01:16:14 [PP]

So I have a few things I can go down. One is: if anyone has interesting ways of doing dynamic programming [08] in q, that's been a recent obsession of mine. I did a dynamic image compression thing where you compress images, and that was one of the talks I gave. And then I gave another talk on dynamic time warping, which is another dynamic programming problem. So basically problems where you have to essentially memoize things and then solve the problem bottom up or top down. I've been informed that these are essentially all basically shortest path problems on a graph, and you just have to figure out what the graph is. And if there's some generic way to write this in q or in an array style, I'm curious. So this is sort of an open call, if anyone wants to reach out with that.
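[For reference, a minimal sketch in q of the top-down memoization idea mentioned above; the global dictionary named cache is just an illustrative helper, not anything from the episode:]

cache:()!()                / empty dictionary of solved subproblems
fib:{$[x<2; x; x in key cache; cache x; [cache[x]:fib[x-1]+fib[x-2]; cache x]]}
fib 40                     / repeated subproblems are looked up, not recomputed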

01:17:06 [CH]

Does Uiua have the memo modifier or am I misremembering?

01:17:12 [ML]

J does. I don't know about Uiua.

01:17:14 [CH]

Or J does.

01:17:15 [PP]

J definitely does. I wrote my own, obviously, in q. But it's better just to do it bottom up if you can figure out the right ... [sentence left incomplete]

01:17:23 [ML]

Yeah, bottom up is the more right way because you have like one layer and then you build the next layer and so on.
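[A small q illustration of the bottom-up, layer-by-layer idea: counting monotone lattice paths, where each new DP row is just the running sum of the row below it. This example is ours, not from the episode:]

3 {sums x}/ 1 1 1 1        / 1 4 10 20: three layers built on the base row, one vector operation per layer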

01:17:30 [PP]

So that's one. And then the other thing that I think is very underutilized today in q, and that I think other languages should borrow, is that q already has this notion of attributes. I don't know if this is available in the other APLs, but basically arrays can have metadata about them. And the metadata can be things like: the data is sorted, so when you do a find on the object, you should use binary search, or otherwise take advantage of the fact that it's sorted. I think it's baked into the implementation that it uses binary search, because that's how you can code up step functions using a sorted dictionary: you can put values at the different breakpoints, and that allows you to use the dictionary as essentially a function over real values. There's sorted, there's unique, where the array has just unique elements, and grouped, where the array elements are actually hashed underneath the hood, so when you do lookups you can use the hash table. And then there's parted, which is sort of a blend between the hashed and the unique ones: the elements aren't all unique, but equal elements are consecutive. They don't have to be sorted, but the distinct values are not scattered throughout the list; they form contiguous blocks. So one point is that this general idea of having metadata associated with your array is, I think, very interesting. One interesting thing about it is that you can sometimes derive it directly: if you take maxs, you always get a sorted array as the output, so that's something you can just do for free for the user. And the other point is that there are other things I've thought of that could be attributes, and anyone should feel free to borrow and implement them. One is a sparse array. You can imagine arrays that are not actually backed by a real array; it's just a few sparse indices. So you could have this idea that it behaves like an array to every array programmer, but obviously the memory underneath is not really there. And then we don't have reverse sorted. If something is sorted and you want to take max and min, that's obviously just first and last. But if you have it reverse sorted, there's no attribute, so you're kind of stuck doing a linear scan of the data.
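[A few illustrative lines of q showing the attributes described above, including the sorted-dictionary step function; the output comments reflect our understanding and are worth verifying in your own q session:]

`s#1 2 4 8                  / sorted: lookups can use binary search
`u#`a`b`c                   / unique: backed by a hash of the distinct items
`g#`a`b`a`c                 / grouped: hash from each item to its positions
`p#1 1 2 2 3                / parted: equal items sit in contiguous blocks
attr asc 3 1 2              / `s: sorting sets the attribute for you
step:`s#0 10 20!100 200 300 / a step function over the reals via a sorted dictionary
step 15                     / 200: the value at the largest breakpoint not exceeding 15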

01:20:10 [ML]

Well, I can stop you and say BQN does, and I did this initially at Dyalog: BQN does have sortedness flags. We have both sorted up and sorted down. And actually that's pretty nice, because if you have both of them, that means the array is constant: every value is the same. I have a page, which is not all implemented, on how you can detect sortedness and what you can use it for. As far as stuff I've come across, sortedness is way more valuable than the other properties. Not that other things aren't worth implementing, but I think sortedness, in most array languages, is a pretty good idea to consider.

01:20:55 [CH]

So is this -- like, this is incredible. I can't believe this just randomly came up from an open-ended question, because I literally have a slide in a deck, which is called, well, it's inaccurately called Algorithm Advisor, but I literally have sort, max scan, min scan, or scan, and scan, plus scan on positive or negative numbers, like if you only have one of them. And then a set of operations where you can do this kind of thing; I literally have the min-max, first-last example. But I think you mentioned too that you can switch algorithm implementations, like, my favorite, because it is ... [sentence left incomplete]

01:21:36 [ML]

Yeah, well, and sometimes it's an asymptotic improvement. I'm trying to think of an example. Well, unique is a good example. If you only care about the unique values, you can skip over values now. If you don't know the array is sorted, you've got a five over here and a five over there, and in between them could be anything. If you know it's sorted, the only thing in between them can be five, so there are no new unique values there. And so you can do a sort of galloping binary search thing and jump ahead.
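[As a small q illustration of the same idea in its simplest form: on a vector that is known to be sorted, distinct reduces to an adjacent comparison, with no hashing at all. This sketch is ours, not from the episode:]

dsorted:{x where differ x}          / keep each item that differs from its predecessor
dsorted asc 3 1 4 1 5 9 2 6         / 1 2 3 4 5 6 9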

01:22:07 [CH]

It's such a nice -- like, I mean, that's the way the poorly named C++ std::unique works. It's basically an adjacent remove-if, when equal. But if you don't know that you have a sorted sequence, that is not the equivalent of a deduplicate; it's the C++ definition of unique. So, it's crazy: is it three different people now, and potentially more, in this group of seven? Or is it a group of six? I don't know who the seventh is, some ghost. The ghost of Iverson.

01:22:43 [ML]

Bob, slunking around his room.

01:22:45 [CH]

Yeah. Does this have a name? Is this -- I thought that this was like a thing. So you're saying it's attributes in q, but like there's a limited set of what's implemented.

01:22:55 [ML]

I call it a flag, I guess. Of course, flag only really makes sense if it's a Boolean property. So like with the hash table, that's quite a bit more data than a flag. So attribute makes more sense there.

01:23:09 [CH]

So Dyalog APL does this to a certain extent. At least you said definitely for -- Just sortedness. BQN does it. q does it.

01:23:15 [ML]

Well, and you can consider the array's element type: obviously that also informs the format, but the element type is pretty useful for a lot of things. For example, if you do a cumulative sum of a Boolean array, we know that no value is negative; the only possible values are zero and one. So the result is sorted. So the type sort of functions as an attribute, too.
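[A one-line q illustration of that point, ours rather than the speaker's: the element type alone guarantees that a running sum of booleans never decreases:]

sums 10110b        / 1 1 2 3 3: nondecreasing by construction, so effectively sorted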

01:23:40 [CH]

Yeah, definitely a whole other angle is if you track the type. I guess that actually is a way you know you have a Boolean array. But there are some operations that aren't necessarily a predicate; you're not mapping or eaching a predicate over a matrix or an array, but you do end up there inevitably. Like, if you know you have positive integers and you do some max-one operation, then over the whole bounds of the integer type you're not guaranteed to have a Boolean array. But if you're already tracking, okay, I only have positive values, and then I do some max-one operation, now I'm in the Boolean space. Depending on the operation that you're calling, you can do some alternate implementation. If 50% of us have already thought about this, and these languages implement it, this has to have a name then. I was thinking that this is a thing that would be great, especially if you do it behind the scenes. But also if you had some language that said: oh, you could actually use this instead of this, as a teaching mechanism. We've got to go to Deep Seek. [09] We've got to go to Deep Seek R1 and ask: what is this called?

01:24:55 [ML]

I think it has many names is the answer. So not only does it have one name.

01:25:00 [CH]

That's a problem. That's a problem.

01:25:01 [AB]

Implementations will take advantage of these flags and actually read them when you ask about something, even though some of these things are not directly queryable using the primitives. So certain patterns could potentially be recognized, like checking if something is sorted: you have to kind of compare it to sorting it, or grade it and see if the result is just the indices in order.

01:25:26 [CH]

Well, I imagine you never want to actually -- well, unless you have some quadratic or worse algorithm, you'd never really want to -- I guess it's actually linear time. So potentially. I was just going to say, you don't want to actually check if it's sorted thinking that's an n log n operation, but no: sorting is n log n, checking if something's sorted is linear.
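[In q, that linear check is a one-liner; this is just an illustrative definition, not something from the episode:]

issorted:{all (1_x)>=(-1)_x}     / each item at least its predecessor
issorted 1 2 2 5                 / 1b
issorted 3 1 2                   / 0b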

01:25:47 [ML]

Oh, yeah. There are plenty of places where it makes sense. For example, if the user sorts an array, the first thing you should do on that array is check whether it's already sorted, which you can actually combine with a range check, so you get a lot of work done. But yes, for many algorithms it is actually worthwhile. And also, I implemented a while ago BQN's indices-inverse, which we've talked about very positively on this podcast. That counts how many times each index appears in the argument array. If it's sorted, you can do this with a binary search, pretty similar to unique, except that you care about the index where each value appears. And actually, I've got a pretty complicated method that does all sorts of things here. But one of the things it does is split the argument into blocks and check, for each block: first, what's the range of this block? Because on a small range, it's actually faster to do a bunch of comparisons and add them up. And second, is it sorted, so that it can use the sorted method. So yeah, even if you don't have the flag, it's often worthwhile to check some property of the array.
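[For readers who want the unsorted, general version of that "count how many times each index appears" operation, it is a one-line amend in q; the sorted, binary-search variant described above is an optimization on top of this. The example values are ours:]

@[5#0; 0 0 1 3 3 3; +; 1]        / 2 1 0 3 0: occurrence counts for indices 0 through 4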

01:27:01 [CH]

I wonder if this is an array-specific thing, because, except for the things like -- you know, BQN and other languages, and I guess q, where you have the dictionary as well -- in a language that has a plethora of data structures, it doesn't make sense to have metadata specifically just for this one data structure. Maybe it does. But it seems like a more obvious idea in a language where everything kind of relies on this one thing, the array. And it's like, well, if we just store a few Boolean pieces of information, we could potentially ... [sentence left incomplete]

01:27:33 [AB]

I've been thinking for years that if you have a list of lists, then keeping track of whether or not all the lists have the same -- all the element lists have the same length might be a valuable thing.

01:27:46 [ML]

Yeah. Or even just storing the -- whatever you want to call it -- the joined lists along with some sort of partition indicator.

01:27:54 [AB]

Exactly. Those kinds of things. If they're homogeneous, you can store them in a more optimized way. Of course, that might mean that in a moment you need to break it all up again, because of the next transformation.

01:28:05 [ML]

But, I mean, I would say that the big place where this shows up is actually databases. And those have exactly what Adám's talking about. I know for Jd, I implemented a thing where, if you want to store a bunch of strings in the database, it doesn't try to store them individually; it hardly even makes sense to have a file for each, or something. Instead it stores a list of all the characters, and then the index and length of each string in that, I think. That's a little redundant compared to just storing them immediately, but it allows you to add and delete strings more easily. There's a lot an array implementer can learn by looking at what databases do, because in a lot of areas they have explored further than we have.
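[A tiny q sketch of that flat layout, ours rather than the Jd implementation being described: one character vector plus the offsets needed to cut it back into strings:]

strs:("apple";"pear";"fig")
chars:raze strs                      / "applepearfig"
offs:0,(-1)_sums count each strs     / 0 5 9
strs~offs _ chars                    / 1b: cutting at the offsets rebuilds the original list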

01:28:55 [AB]

You could potentially save something by having an offset in the header for the array in memory. Something I end up doing very often, and always feel a little bit bad about, is dropping leading elements, because I know that means everything gets rewritten in memory. Whereas if I drop trailing elements, it's basically a free operation, at least if the ref count is one. So sometimes, if it's really performance sensitive, I'll do all kinds of contortions in my code to make sure that I'm trimming from the rear instead of from the front.

01:29:32 [ML]

I think this is pretty Dyalog specific at this point. Because I know J and BQN and I'm pretty sure ngn/k as well. All of these store the header and the data separately.

01:29:42 [PP]

Yeah, but drop still -- q at least, drop creates a copy.

01:29:48 [ML]

Okay. So q doesn't do that; KX's stuff doesn't. But I would say the right way to do it is just to keep the header and the data separate, and many implementations now work that way.

01:30:05 [PP]

Does that mean the header has a pointer to where the array actually starts?

01:30:09 [AB]

Yeah. And then you just update the pointer instead. But it's effectively the same if you had, well, not a flag, but an offset, saying: the data is supposed to begin over here, but actually start reading it over there. Then you could have a free chop from the front. At the cost of some memory being lost, of course, but that's cheap these days.

01:30:30 [ML]

Yeah. Well, and there are also various tricks you can have to allow the data to immediately follow the header and have some sort of code where you indicate that that happens. And then you can also have some other header point into the same data, but not be connected. So there are various arrangements.

01:30:49 [CH]

I mean, I have a burning question. But it's going to start a new topic. And I swear, the last time I looked at the time, it was 1.11. Or an hour and 11 minutes into this recording. And now I'm looking at it, it's an hour and 35 minutes. And I did not recall 25 minutes passing since the last time I looked. So probably it's a good time. Because I also know we've got folks with engagements and appointments.

01:31:15 [ML]

I think our seventh podcast member was really just talking a lot and took up all that time.

01:31:22 [CH]

That darn seventh panelist. Or I guess sixth panelist, seeing as we got one guest. Or fifth, because technically I'm a host. All right. Not important. If you've got thoughts, comments, questions, and you want to reach out to us, either for us or for Phineas and we can pass it along, you can reach us at?

01:31:39 [BT]

Contact@arraycast.com. I finally got it right after how many episodes of giving that information. So it's contact@arraycast.com. If you've got comments or suggestions or questions, we do our best to answer all of them. We certainly read all of them. And also the usual big shout out to Igor and Sanjay, our two transcribers, as well as to Adám, who provides us with the raw material so that we can provide transcripts that people can read if they don't want to listen to us. They still get the information. Or if they're unable to listen to us, they still get the information. So that's something that we do and we feel good about.

01:32:17 [CH]

And thank you, Phineas, for coming on. I know, well, I was going to say I feel, but I not only do I feel, but I know we could probably sit here for the next two, three hours and continue to chat and time would continue to fly by. But we unfortunately do have to land this plane at some point. But hopefully in the future, you'll be willing to come back and to chat again about Arrays with us and about q and any other future topics. So yeah, thanks for taking the time. This has been a blast. And with that, we will say Happy Array programming.

01:32:47 [All]

Happy Array Programming.

01:32:47 [Music]