Transcript
Many thanks to Rodrigo Girão Serrão for producing this transcription
[ ] reference numbers refer to Show Notes
00:00:00 [Jeremy Howard]
I mean, this is a reason, another reason I like doing things like APL study groups, 'cause it's a way of, like, self-selecting that small group of humanity who's actually interested in trying new things, despite the fact that they're grown-ups, and then trying to surround myself with those people in my life.
00:00:25 [Conor Hoekstra]
Welcome to another episode of Array Cast.
00:00:28 [CH]
I'm your host Conor, and today we have a very exciting guest, whom we will introduce in a second. But before we do that, we'll do brief introductions and then one announcement. So first we'll go to Bob, and then we'll go to Adám, who has the one announcement, and then we will introduce our guest.
00:00:41 [Bob Therriault]
I'm Bob Therriault.
00:00:42 [BT]
I'm a J enthusiast and I do some work with the J wiki. We're underway, trying to get it all set up for the fall.
00:00:49 [Adám Brudzewsky]
I'm Adám Brudzewsky, full-time APL programmer at Dyalog Ltd.
00:00:53 [AB]
Besides actually programming in APL, I also take care of all kinds of social things, including the APL Wiki. And then for my announcement: part of what we do at Dyalog is arrange a yearly user meeting, a type of conference. And at that user meeting there is also a presentation by the winner of the APL Problem Solving Competition. That competition closes at the end of the month, so hurry up if you want to participate; it's not too late even to get started at this point. And also at the end of the month is the end of the early bird discount for the user meeting itself.
00:01:34 [CH]
Awesome. And just a note about that contest, I think, and Adám can correct me if I'm wrong: there's two phases, and in the first phase it's just 10 short problems. A lot of them are just one-liners, and even if you only solve one of the ten, I think you can win a small cash prize just from answering one. Is that correct?
00:01:53 [AB]
I'm not even sure; you might need to solve them all, but they're really easy.
00:02:00 [CH]
OK, so the point being though is that you don't need to complete the whole contest in order to be eligible to win prizes.
00:02:05 [AB]
No, not for sure.
00:02:05 [CH]
There's a certain amount that, if you get to that point, you hit a certain threshold and you can be eligible to win some free money, which is always awesome. And yeah, just briefly, as I introduce myself in every other episode: I'm your host, Conor, C++ professional developer, not an array language developer in my day to day, but a huge array language and combinator enthusiast at large. Which brings us to introducing our guest, who is Jeremy Howard, who has a very, very, very long career, and you probably have heard him on other podcasts or in other talks. I'll read the first paragraph of his three-paragraph bio, 'cause I don't want to embarrass him too much, but he has a very accomplished career. So: Jeremy Howard is a data scientist, researcher, developer, educator and entrepreneur. He is the founding researcher at fast.ai, a research institute dedicated to making deep learning more accessible, and is an honorary professor at the University of Queensland; that's in Australia, I believe. Previously Jeremy was a distinguished research scientist at the University of San Francisco, where he was the founding chair of the Wicklow Artificial Intelligence in Medical Research initiative. He's also been the CEO of Enlitic and was the president and chief scientist of Kaggle, which is basically the data science version of LeetCode, which many software developers are familiar with. He was the CEO of two successful Australian startups, Fastmail and Optimal Decisions Group, and before that, in between doing a bunch of other things, he worked in management consulting at McKinsey, which is an incredibly interesting start to the career that he has had now, because, for those of you that don't know, McKinsey is one of the three biggest management consulting firms alongside, I think, Bain & Co and BCG. So I'm super interested to hear how he started in management consulting and ended up, you know, being the author of one of the most popular AI libraries in Python, and also the course that's attached to it, which I think is, if not, you know, the most popular, a very, very popular course that students all around the world are taking. So I will stop there, throw it over to Jeremy, and he can fill in all the gaps that he wants. Jump back to however far you want to tell us, you know, how you got to where you are now. And I think the one thing I forgot to mention, too, is that he recently tweeted on July 1st, and we're recording this on July 4th, and the tweet reads, quote, "Next week I'm starting a daily study group on my most loved programming language: APL", and so obviously I'm interested to hear more about that tweet and what's going to be happening with that study group. So over to you, Jeremy.
00:04:36 [Jeremy Howard]
Well, the study group
00:04:37 [JH]
is starting today, as we record this, so depending on how long it takes to get this out, it'll have just started, and so there's definitely time for people to join in. I'm sure we'll include a link to that in the show notes. Yeah, I definitely feel kind of like I'm your least qualified array programming person ever interviewed on this show. I love APL and J, but I've done very, very little with them, particularly APL. I've done a little bit with J, mucking around, but, like, yeah, I find a couple of weeks here and there every few years, and I have for a couple of decades. Having said that, I am a huge enthusiast of array programming as it is used, you know, in a loopless style in other languages, and nowadays in Python. Yeah, maybe I'll come back to that, 'cause I guess you wanted to get a sense of my background. So I actually started at McKinsey. I grew up in Melbourne, Australia, and I didn't know what I wanted to do when I grew up, at the point that you're meant to know, when you choose a university major. So I picked philosophy on the basis that it was, like, you know, the best way of punting down the road what you might do, 'cause with philosophy you can't do anything. And honestly, that kind of worked out, in that I needed money to get through university, so I got, like, a one-day-a-week kind of IT support job at McKinsey, the McKinsey Melbourne office, during university, from first year I think. But it turned out that, yeah, I was very curious, and so I was very curious about management consulting. So every time consultants would come down and ask me to, like, you know, clean out the sticky Coke they'd spilled in their keyboard or whatever, I would always ask them what they were working on and ask them to show me. And I'd been really interested in, like, doing analytics-y kind of things for a few years at that point; during high school, basically every holidays, I'd kind of worked on stuff with spreadsheets or Microsoft Access or whatever. So it turned out I knew more about stuff like Microsoft Excel than they did, so within about two months of me starting this one-day-a-week job I was working 90-hour weeks, basically doing analytical work for their consultants. And that actually worked out really well, because I kind of did a deal with them where they gave me a full-time office and they would pay me $50 an hour for whatever time I needed, and so suddenly I was actually making a lot of money, you know, working 90 hours a week. And yeah, that was great, because then I would come up with these solutions to things they were doing in their projects, and I'd have to present them to the client. So next thing I knew I was basically on the client side all the time. So I ended up actually not going to any lectures at university, and I somehow kind of managed this thing where I would take two weeks off before each exam, go and talk to all my lecturers and say, hey, I was meant to be in your university course, I know you didn't see me, but I was kind of busy. Can you tell me what I was meant to have done? And then I would do it, and so I kind of scraped by with a BA in philosophy.
But I don't, yeah, you know, I don't really have much of an academic background, but that did give me a great background in applying stuff like, you know, linear regression and logistic regression and linear programming, and, you know, the basic analytical tools of the day, generally through VBA scripts in Excel, or, you know, Access. You know, the kind of stuff that a consultant could check out, you know, onto their laptop at the client site. Anyway, I always felt guilty about doing that, 'cause it just seemed like this ridiculously nerdy thing to be doing when I was surrounded by all these very important, you know, consultant types who seemed to be doing much more impressive strategy work. So I tried to get away from that as quickly as I could, because I didn't want to be the nerd in the company, and yeah, so I ended up spending the next 10 years basically doing strategy consulting. But throughout that time, 'cause I didn't have the same background that they did, the expertise they did, the MBA they did, I had to solve things using data and analytically intensive approaches. So although in theory I was a strategy management consultant, and I was working on problems like, you know, how do we fix the rice industry in Australia, or how do we deal with this new competitor coming into this industry, whatever it was, I always did it by analyzing data. Which actually turned out to be a good niche, you know, 'cause I was the one McKinsey consultant in Australia who did things that way, and so I was successful. I ended up moving to A.T. Kearney, which is the other of the two original management consulting firms, and I think I became, like, the youngest manager in the world, and, you know, through the work I was doing there I learned about the insurance industry and discovered, like, the whole insurance industry was basically pricing things in a really dumb way. I developed this approach based on optimized pricing and launched a company with my university friend who had a PhD in operations research. And yeah, so we built this new approach to pricing insurance, which was kind of fun. I mean, it went well, you know, commercially; I spent about 10 years doing that, and at the same time running an email company called Fastmail, which also went well. Yeah, we started out basically using C++, and I would say that was kind of the start of my array programming journey, in that in those days, this is like 1999, the very first expression-templates-based approaches to C++ numeric programming were appearing. And so I, you know, was talking to the people working on those libraries, particularly the people doing the big kind of high-energy physics experiments that were going on in Europe. It was ultimately pretty annoying to work with, though; like, the amount of time it took to compile those things, it would take hours, and it was quirky as all hell. You know, it's still pretty quirky doing metaprogramming in C++, but in those days it was just a nightmare; every compiler was different. So I ended up switching to C# shortly after that came out, and, you know, in a way it was disappointing, because it was much less expressive as a kind of array programming paradigm. And so instead I ended up basically grabbing Intel's MKL library, which is basically BLAS on steroids.
If you like, and writing my own C# wrapper to give me, you know, kind of array-programming-ish capabilities, but not with any of the features one would come to expect from a real array programming language around kind of dealing with rank sensibly, and, you know, not much in the way of broadcasting. Which reminds me, we should come back to talking about BLAS at some stage, 'cause a lot of the reason that most languages are so disappointing at array programming is because of our reliance on BLAS, you know, as an industry. Fastmail, on the other hand, was being written in Perl, which I really enjoyed as a programming language and still do; I love Perl a lot. But the scientific programming in Perl I didn't love at all, and so at the time Perl 6, you know, was just starting, the idea of it was being developed, so I ended up running the Perl 6 working group to add scientific programming capabilities, or, as I described them at the time, APL-inspired programming capabilities, to Perl. And so I did an RFC around what we ended up calling hyper operators, which is basically the idea that any operator can operate on arrays and can broadcast over any axes that are mismatched or whatever. And those RFCs all ended up getting accepted, and Damian Conway and Larry Wall kind of expanded them a little bit. Perl 6 never exactly happened; it ended up becoming a language called Raku. And, you know, the kind of performance ideas I really worked hard on never really happened either, so that was all a bit of a failure, but it was fun and it was interesting. You know, what I saw after running these companies for 10 years is that one of the big problems with running a company is that you're surrounded by people who you hired, and they, you know, have to make you like them if they want to get promoted and not get fired, and so you can never trust anything anybody says. So I, you know, had very low expectations about my capabilities analytically. I had, like, you know, basically been running companies for 10 years; I did a lot of coding and stuff, but it was in our own little world. And so, after I sold those companies, yeah, one of the things I decided to do was to try actually to become more competent. You know, to some extent I had lost my feeling that I should hide my nerdiness, you know, and try to act like a real business person, and I thought, no, I should actually see if I'm actually any good at this stuff. So I tried entering a machine learning competition at a new company that had just been launched called Kaggle, with this goal of, like, not coming last. So basically the way these things work is you have to make predictions on a data set, and at the end of the competition, whoever's predictions are the most accurate wins the prize, and so my goal was, yeah, try not to come last. Which I wasn't convinced I'd be able to achieve because, as I say, I'd never had any technical training, you know, and everybody else in these competitions had PhDs and were professors or whatever else, so it felt like a high bar. Anyway, I ended up winning it. And that changed my life, right?
Because, yeah, it's like, oh, OK, I am, you know, empirically good at this thing. And people at my local user groups, our user groups are quite big as well, well, you know, I told them, I'm going to try entering this competition, anyone want to create a team with me? I want to, like, learn to use R properly. And I kind of went back to the next user group meeting and people were like, I thought you were just learning this thing, how did you win? It's like, I don't know, I just used common sense. Uh, yeah. So I ended up becoming the chief scientist and president of Kaggle, and Kaggle, as anybody in the data science world knows, has kind of grown into this huge, huge thing; it ended up selling to Google. So I ended up being an equal partner in the company; I was the first investor in it, and that was great. We moved to San Francisco for 10 years, you know, surrounded by all these people who were just sort of role models and idols, and partly getting to meet all these people in San Francisco was this experience of realizing all these people were actually totally normal, you know; they weren't, like, some super genius level, they're just normal people, and, yeah, as I got to know them, it gave me, I guess, a lot more confidence in myself as well.
00:18:07 [AB]
Maybe they were just normal relative to you.
00:18:10 [JH]
I think in Australia we all feel a bit, you know, intimidated by the rest of the world. In some ways we're a long way away, you know? Well, our only neighbors really are New Zealand. It's very easy to feel, I don't know, like, yeah, we're not very confident of our capabilities over here, other than in sport, perhaps. So one of the things that happened while I was at Kaggle was, well, I had played around with neural networks a good bit, you know, like 20 years earlier, and I always felt like neural networks were one day going to be the thing. It's like, you know, they are, at a theoretical level, infinitely capable, you know, but they never quite did it for me. But then in 2012, suddenly neural networks started achieving superhuman performance for the first time on really challenging problems, like recognizing traffic signs, you know, like recognizing pictures. And I'd always said to myself I was going to watch for this moment, and when it happened I wanted to, like, jump on it. So as soon as I saw that, I tried to jump on it. So I started a new company after a year of research into, like, you know, what are neural networks going to do? I decided medicine is going to be huge. I knew nothing about medicine, and yeah, I started a medicine company to see what we could do with deep learning in medicine. So that was Enlitic. Yeah, that ended up going pretty well. And yeah, eventually I kind of got, like, a bit frustrated with that, though, 'cause it felt like deep learning can do so many things and I'm only doing such a small part of those things. So deep learning is, like, neural networks with multiple layers. I thought the only way to actually help people really make the most of this incredibly valuable technology is to teach other people how to do it and to help other people to do it. So my wife and I ended up starting a new kind of research lab, fast.ai, to help do that, basically. Initially focused on education, and then increasingly focused on research and software development to basically make it easier for folks to use deep learning. And that's where I am now. And everything in deep learning is all Python, and in Python we're very lucky to have, you know, excellent libraries that behave pretty consistently with each other, basically based around this NumPy library, which treats arrays very, very similarly to how J does. Except rather than leading axis, it's trailing axis. But basically you get, you know, loop-free code, you get broadcasting; you don't get things like, uh, the rank conjunction, but there's very easy ways to permute axes, so you can do basically the same thing. Things like Einstein notation, you know, they're built into the libraries, and then, you know, it's trivially easy to have them run on GPUs or TPUs or whatever. So for the last few years of my life, nearly all the code I write is array programming code, even though I'm not using a purely array language.
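A small NumPy sketch of what is being described here (loop-free broadcasting over trailing axes, permuting axes where J would use the rank conjunction, and Einstein notation), assuming only a standard NumPy install:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)      # shape (2, 3)
b = np.array([10, 20, 30])          # shape (3,)

print(a + b)                        # broadcasting pairs b with the trailing axis of a
print((a.T + np.array([1, 2])).T)   # permute axes to act along the leading axis instead
print(np.einsum("ij,j->i", a, b))   # Einstein notation: matrix-vector product
```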
00:21:54 [CH]
Alright, so where do we start now with the questions? I'll let Bob and Adám go first if they want, and if they don't have... uh, OK, Bob, you go ahead.
00:22:05 [BT]
I've got a quick question about neural networks and stuff, because when I was going to university all those years ago, people were talking about neural networks, and then they just sort of dropped off the face of the earth. And as you said, around 2010 suddenly they resurfaced again. What do you think was the cause of that resurfacing? Was it hardware? Was it that somebody discovered a new method, or what?
00:22:25 [JH]
Yeah, mainly hardware. So what happened was people figured out how to do GPGPU, general-purpose GPU computing. Before that, I tried a few times to use GPUs with neural nets; I felt like that would be the thing, but GPUs were all about, like, creating shaders and whatever, and the whole jargon thing, I didn't even understand what was going on. So the key thing was NVIDIA coming up with this CUDA approach, which, it's still loops, right? But it's much easier than the old way. Like, it's kind of loops, at least: you basically say to CUDA, this is my kernel, which is the piece of code I want to basically run on each symmetric multiprocessing unit, and then you basically say, launch a bunch of threads, and it's going to call your kernel, you know, basically incrementing the x and y coordinates and passing them to your kernel, or making them available to your kernel. So it's not exactly a loop; it's more like a map, I guess. And so when CUDA appeared, yeah, very quickly neural network libraries appeared that took advantage of it, and then suddenly, you know, you get orders of magnitude more performance, and it's cheaper, and you get to buy an NVIDIA graphics card with a free copy of Batman, you know, on the excuse that actually this is all for work. So it was mainly that. There's also, just, like, at the same time, the thing I'd been doing for 25 years suddenly got a name: data science. You know, there was this very small industry of people applying data-driven approaches to solving business problems, and we were always looking for a name. Not many people know this, but back in the very early days there was an attempt at calling it industrial mathematics. Sometimes people would, like, shoehorn it into operations research or management science, but that was almost exclusively optimization people, and specifically people focused more on linear programming approaches. So yeah, once data science appeared, and also, like, you know, basically every company had finally built their data warehouse and the data was there; so yeah, it's, like, more awareness of using data to solve business problems, and for the first time, availability of the hardware that we actually needed. And as I say, in 2012 it just reached the point, like, it's been growing since the first neural network was built in, what, 1957, I guess, at this kind of gradual rate, but once it passed human performance on some tasks, it just kept going, and so now, in the last couple of months, you know, it's now, like, getting decent marks on MIT math tests and stuff. It's on an amazing trajectory.
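A rough, CPU-only Python sketch of the kernel-plus-launch model described above; the launch and add_kernel names here are made up for illustration, and a real CUDA launch runs the kernel across many GPU threads in parallel rather than looping.

```python
def launch(kernel, grid_shape, *args):
    """Call `kernel` once per (x, y) point in the grid, like a 2-D map."""
    for x in range(grid_shape[0]):
        for y in range(grid_shape[1]):
            kernel(x, y, *args)

def add_kernel(x, y, a, b, out):
    # Each "thread" computes a single output element from its coordinates.
    out[x][y] = a[x][y] + b[x][y]

a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
out = [[0, 0], [0, 0]]
launch(add_kernel, (2, 2), a, b, out)
print(out)  # [[11, 22], [33, 44]]
```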
00:25:33 [BT]
Yeah, it's kind of a critical mass kind of thing, where you get a certain amount of information and the ability to process that information, and, I guess, as you do with your hand, it's an exponential curve. Yeah, and humans and exponential curves, I think we're finding over and over again, we're not really great at understanding an exponential curve.
00:25:54 [JH]
No, we're not. And that's, like, why I promised myself that as soon as I saw neural nets starting to look like they were doing interesting things, I would drop everything and jump on it, because I wanted to jump on that curve as early as possible. And we're now in this situation where people are just making huge amounts of money with neural nets, which they then reinvest back into making the neural nets better, and so we are also seeing this kind of bifurcation of capabilities, where there's a small number of organisations who are extremely good at this stuff and invested in it, and a lot of organisations, you know, really struggling to figure it out.
00:26:34 [BT]
And because of the exponential nature, when it happens, it happens very quickly. It feels like you didn't see it coming, and suddenly it's there, and then it's past you. And I think we're all experiencing that now.
00:26:44 [JH]
Yeah, and it's happened in so many industries, you know. Back in my medical startup, you know, we were interviewing folks around medicine, and we interviewed a guy finishing his PhD in histopathology, and I remember he came in to do an interview with us, and he basically gave us a presentation about his thesis on kind of graph-cut segmentation approaches for pathology slides. And at the end he's like, anyway, that was my PhD, and then yesterday, because I was coming to see you guys and I heard you like neural nets, I just thought I'd check out neural nets, and about four hours later I'd trained a neural net to do the same thing I did for my PhD, and it way outperformed the PhD thesis I had spent the last five years on. And so that's where I'm at, you know. And we hear this a lot.
00:27:38 [BT]
Existential crisis in the middle of an interview.
00:27:43 [CH]
So I kind of have, I don't know, this is like a 1A, B, and C, and I'm not sure if I should ask them all at once. But so, you said sort of at the tail end of the '90s is when your array language journey started, but it seems from the way you explained it that you had already, at some point along the way, heard about the array languages APL and J, and have sort of alluded to, you know, picking up some knowledge about the paradigm and the languages. So the first part of my question is sort of, you know, at what point were you exposed to the paradigm and these languages? The second part is, what's causing you in 2022 to, you know, really dive into it? 'Cause you said you feel like maybe a bit of an impostor, or the least qualified guest, which probably is you just being very modest; I'm sure you still know quite a bit. And then the third part is, do you have thoughts about, and I've always sort of wondered, how the array language paradigm sort of missed out, and Python ended up being the main data science language? Like, there's an article that's floating around online called "NumPy: The Ghost of Iverson", and you can see, in the names and the design of the library, that there is a core of APL, and even the documentation acknowledges that it took inspiration greatly from J and APL. But the array languages clearly missed what was a golden opportunity for their paradigm, and we ended up with libraries in other languages. So I just asked three questions at once, but yeah, feel free to tackle them in any order.
00:29:20 [JH]
I have a pretty bad memory, so I think I've forgotten the second one already, so you can feel free to come back to any or all of them. So, my journey, which is what you started with: um, I always felt like we should do more stuff without using code, or at least, like, kind of traditional, what I guess we'd nowadays call imperative code. There were a couple of tools in my early days which I got huge amounts of leverage from, 'cause nobody else, at least in the consulting firms, or generally in our clients, knew about them: that was SQL and pivot tables. And so pivot tables, if you haven't come across them, were basically one of the earliest approaches to OLAP, you know, slicing and dicing; there was actually something slightly earlier called Lotus Improv, but that was actually a separate product. Excel was basically the first one to put OLAP in the spreadsheet. So, no loops: you just drag and drop the things you want to group by, and you right-click to choose how to summarize. And same with SQL, you know. You declaratively say what you want to do; you don't have to loop through things. SAS actually had something similar, you know; with SAS you could basically declare a PROC that would run on your data. So yeah, I kind of felt like this was the way I would rather do stuff if I could, and I think that's what led me, when we started doing the C++ implementation of the insurance pricing stuff, to being much more drawn to these metaprogramming approaches. I just didn't want to be writing loops in loops and dealing with all that stuff. I'm too lazy, you know, to do that. I think I'm very driven by laziness, which, as Larry Wall said, is one of the three virtues of a great programmer. Then, yeah, I think as soon as I saw NumPy had reached a level of, you know, reasonable maturity in Python, I was very drawn to that, 'cause that was what I had been looking for, and I think maybe that actually is going to bring us to answering the question of, like, what happened for array languages. Python has a lot of problems, but at its heart it's a very well-designed language. It has a very small, flexible core. Personally, I don't like the way most people write it, but it is so flexible that I've been able to create almost my own version of Python, which is very functionally oriented. I basically have stolen the type dispatch ideas from Julia and created an implementation of that in Python; you know, my Python code doesn't look like most Python code, but I can use all the stuff that's in Python. So there's this very nicely designed core of a language, which I then have almost this DSL on top of; you know, NumPy is able to create this kind of DSL, again, because it's working on such a flexible core. I mean, well, OK, so Python also has another DSL built into it, which is math, you know: I can use the operators plus, times, minus, and that's convenient, and in every array library in Python, NumPy, PyTorch, TensorFlow, those operators work over arrays and do broadcasting over axes and so forth, and, you know, accelerate on an accelerator like a GPU or TPU. That's all great.
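A small sketch of the loop-free "slice and dice" idea from earlier in this answer, using pandas as one common Python route to pivot-table-style OLAP; the data here is invented for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "revenue": [100, 150, 80, 120],
})

# Roughly what dragging "region" onto rows, "product" onto columns,
# and summarizing "revenue" by sum does in an Excel pivot table: no loops.
summary = sales.pivot_table(index="region", columns="product",
                            values="revenue", aggfunc="sum")
print(summary)
```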
You know, my ideal world would be that I wouldn't just get to use plus, times, minus, but I'd get to use all the APL symbols; you know, that would be amazing. But given a choice between a really beautiful language, you know, like Python, on which I can then add a slightly cobbled-together DSL like NumPy, I would much prefer that over a really beautiful notation, like APL, but without the fantastic language underneath. You know, there's nothing about APL or J or K as, like, a programming language that attracts me. You know what I mean? I feel like in terms of what I could do around, whether it be type dispatch, or how I/O is designed, or, you know, how I package modules, or almost anything else, I would prefer the Python way. So I feel like that's basically what we've ended up with: you kind of either compromise between, you know, a good language with slightly substandard notation, or amazingly great notation with a substandard, not just language, but ecosystem; you know, Python has an amazing ecosystem. I hope one day we'll get the best of both, right? Like, OK, here's my controversial take, and it may just represent my lack of knowledge. What I like about APL is its notation. I think it's a beautiful notation; I don't think it's a beautiful programming language. I think some things, possibly everything, you know, but some things, work very well as a notation. But to raise something to the point that it is a notation takes some years of study and development and often some genius, you know, like the genius of Feynman diagrams or the genius of juggling notation. Like, there are people who find a way to turn a field into a notation, and suddenly they blow that field apart and make it better for everybody. For me, like, I don't want to think too hard all the time. Every time I come across something that really hasn't been turned into a notation yet, you know, sometimes I just want to get it done, and so I would rather only use notation when I'm in these fields where either somebody else has figured out how to make that a notation, or I feel like it's really worth me investing to figure that out. And the other thing I'd say is we already have notations for things that aren't APL that actually work really well, like regular expressions, for example. That's a fantastic notation, and I don't want to replace that with APL glyphs; I just want to use regular expressions. So yeah, my ideal world would be one where I can write PyTorch code, but maybe instead of, like, Einstein operations, Einstein notation, I could use APL notation. That's where I would love to get to one day, and I would love that to totally transparently run on a GPU or TPU.
00:37:02 [CH]
Wouldn't we all? Wouldn't we all?
00:37:04 [JH]
Well, that would be that would be my happy place.
00:37:07 [CH]
It has nothing to do with the fact that I work at NVIDIA that I would love that. But interesting, I've never heard that before: the difference between basically appreciating, or being in love with, the notation, but not the language itself.
00:37:25 [JH]
And you know, it started out as a notation, right? Like, Iverson, you know, it was a notation they used for representing state machines and whatever on early IBM hardware, and when he did his Turing Award lecture, he chose to talk about his notation. And, you know, you see with people like Aaron with his Co-dfns and stuff that if you take a very smart person and give them a few years, they can use that notation to solve incredibly challenging problems, like build a compiler, and do it better than you can without that notation. So I'm not saying, like, yeah, APL can't be used for almost anything you want to use it for, but a lot of the time we don't have five years to study something very closely. We just want to, you know, we've got to get something done by tomorrow.
00:38:23 [AB]
Interesting... We still didn't get an answer to part two. When did you first meet APL, or how did you even find APL?
00:38:30 [JH]
I first found J, I think, which obviously led me to APL, and I don't quite remember where I saw it. And actually, when I got to San Francisco, so that would be, I'm trying to remember, 2010 or something, I'm not sure, I actually reached out to Eric Iverson, and I said, like, oh, you know, we're running this machine learning company called Kaggle, and I kind of feel like, you know, everybody does stuff in Python, and it's kind of, in a lot of ways, really disappointing. I wish we were doing stuff in J, you know, but we really need everything to be running on the GPU, or at least everything to be automatically using SIMD and multiprocessing everywhere. He was kind enough to actually jump on a Skype call with me; not just jump on a Skype call, 'cause, like, he asked "how do you want to chat" and I said "what about Skype?" and he created a Skype account to chat. Oh yeah, we chatted for quite a while, and we talked about, you know, these kinds of hopes, and yeah, but I just, you know, never really, because neither J nor APL is in that space yet, there was just never a reason for me to do anything other than, like, I kind of felt like each time I'd have a bit of a break for a couple of months, I'd always spend a couple of weeks fiddling around with J just for fun, but that's as far as I got, really.
00:40:12 [BT]
Yeah, I think the first time I'd heard of you was in an interview that Leo Laporte did with you on Triangulation, and you were talking about Kaggle; that was the specific thing. But I think I was riding my bike along some logging road or something, and suddenly you said, oh yeah, but a lot of people use, like, J. It was the first time I'd ever heard anybody on a podcast say anything about J, and it was just like, wow, that's amazing. And the whole interview about Kaggle: there was so much of it about the importance of data processing, not just having a lot of data, but knowing how to filter it down, not over-filtering, all those tricks. I'm thinking, wow, these guys are really doing some deep stuff, and this guy is using J. I was actually very surprised at that point, I guess not that somebody who was working so much with data would know about J, but just that it would, I guess, suddenly pop into my headset, and I'm just, wow, that's so neat.
00:41:14 [JH]
Yeah, and I will say, like, in the array programming community, I find there seems to be a common misconception that, like, the reason people aren't using array programming languages is because they don't know about them or don't understand them, you know. There's a kernel of truth to that, but the truth is, like, nowadays there's huge, massively funded research labs at places like Google Brain, and, you know, Facebook AI Research and OpenAI and so forth, where large teams of people are literally writing new programming languages, because they've tried everything else and what's out there is not sufficient. You know, I find, in the array programming world, there's often a huge kind of underappreciation of what Python can do nowadays. For example, as recently as last week, I heard it described in a chatroom as, like, people obviously don't care about performance because they're using Python. And it's like, well, you know, a large amount of the world's highest-performance computing now is done with Python. It's not because Python is fast; it's because, like, if you want to use RAPIDS, for example, which literally holds records for the highest-performance recommendation systems and tabular analysis, you write it in Python, you know. And so this idea of having a fast kernel that's not written in the language, and then something else talking to it in a very flexible way, I think is great. And as I say, at the moment we are very hamstrung, in a lot of ways, in that, at least until recently, we very heavily relied on BLAS, which is totally the wrong thing for that kind of flexible, high-performance computing, 'cause it's this, you know, somewhat arbitrary selection of linear algebra algorithms, and, you know, things like the C# work I did, they were just wrappers on top of BLAS. And what we really want is a way to write really expressive kernels that can do anything over any axes. So then there are other, newer approaches, like Julia, for example, which, yeah, it's kind of, like, got some Lisp-y elements to it, and this type dispatch system, but because, you know, in the end it's on top of LLVM, what you write in Julia does end up getting optimized very well, and you can write pretty much arbitrary kernels in Julia and often get best-in-class performance. And then there's other approaches like JAX, and JAX sits on top of something totally different: it sits on top of XLA. And XLA is a compiler which is mainly designed to compile things to run fast on Google TPUs, but it also does an OK job of compiling things to run on GPUs. And then, really excitingly, I think, you know, for me, is the MLIR project, and particularly the affine dialect, that was created by my friend Chris Lattner, who you probably know from creating Clang and LLVM and Swift. So he joined Google for a couple of years, and we worked really closely together on trying to, like, think about the vision of really powerful programming on accelerators that's really developer-friendly. Unfortunately it didn't work out; Google was a bit too tied to TensorFlow, but one of the big ideas that did come out of that was MLIR, and that's still going strong. And I do think, you know, if something like APL could target MLIR and then become a DSL inside Python, it may yet win, you know.
00:45:29 [CH]
I've heard you in the past say, on different podcasts and talks, even in light of, you know, just saying people don't realize how much you can get done with Python, that you don't think that the future of data science and AI and neural networks and that type of computation is going to live in the Python ecosystem. And I've heard on some podcasts you've said that, you know, Swift has a shot, based on sort of the way that they've designed that language, and you just mentioned, you know, a plethora of different sort of, I wouldn't say initiatives, but, you know, JAX, XLA, Julia, etc. Do you have, like, a sense of where you think the future of, not necessarily sort of array language computation, but this kind of computation, is going, with all the different avenues?
00:46:11 [JH]
Yeah, I do, you know. I think we're certainly seeing the limitations of Python and the limitations of the PyTorch, you know, lazy evaluation model, which is the way most things are done in Python at the moment for kind of array programming: you have an expression which is, you know, working on arrays, possibly of different ranks, with implicit looping, and, you know, that's one line of Python code, and generally, on your computer, that'll get turned into a request to run some particular optimized, pre-written operation on the GPU or TPU, and then it gets sent off to the GPU or TPU, where your data has already been moved. It runs, and then it tells the CPU when it's finished, and there's a lot of latency in this, right? So if you want to create your own kernel, like, your own way of doing, you know, your own operation, effectively, good luck with that; that's not going to happen in Python. And I hate this. I hate it as a teacher because, you know, I can't show my students what's going on, right? It kind of goes off into, you know, kind of CUDA land and then comes back later. I hate it as a hacker 'cause I can't go in and hack at that; I can't trace it, I can't debug it, I can't easily profile it. I hate it as a researcher because very often I'm like, I know we need to change this thing in this way, but I'm damned if I'm going to go and write my own CUDA code, let alone deploy it. So JAX is, I think, a path to this. It's where you say, OK, let's not target pre-written CUDA things; let's instead target a compiler. And, you know, working with Chris Lattner, I'd say he didn't have too many nice things to say about XLA; as a compiler it was not written by compiler writers, it was written by machine learning people, really, but it does the job, you know, and it's certainly better than having no compiler. And so JAX is something which, instead of turning our line of Python code into a call to some pre-written operation, is turning it into something that's going to be read by a compiler, and so the compiler can then, you know, optimize that, as compilers do. So yeah, I would guess that JAX probably has a part to play here, particularly because you get to benefit from the whole Python ecosystem: package management, libraries, you know, visualization tools, etc. But, you know, longer term, it's a mess. It's a mess using a language like Python, which wasn't designed for this; it wasn't really even designed as something that you can chuck different compilers onto, so people put in horrible hacks. So, for example, PyTorch: they have something called TorchScript, which is a bit similar, you know, it takes Python and kind of compiles it, but they literally wrote their own parser using a bunch of regular expressions, and it's, you know, not very good at what it does; it even misreads comments and stuff. So I do think there's definitely room for a language, of which Julia would certainly be the leading contender at the moment, to come in and do it properly. And Julia, you know, is written on a Scheme basis, so there's this little Scheme kernel that does the parsing and whatnot, and then pretty much everything else after that is written in Julia, and then it leverages LLVM very heavily. But I think that's what we want, right? Which is, I guess, also what I didn't love about Swift: when the team at Google wanted to add differentiation support into Swift, they wrote it in C++.
And I was just like, that's not a good sign, you know. Like, apart from anything else, what you end up with is a group of developers who are in theory Swift experts, but they actually write everything in C++, and so they actually don't have much feel for what it's like to write stuff in Swift; they're writing stuff for Swift. In Julia, pretty much everybody who's writing stuff for Julia is writing stuff in Julia, and I think that's something you guys have talked about around APL and J as well: the idea of writing J things in J and APL things in APL is a very powerful idea.
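A minimal sketch of the "target a compiler instead of pre-written kernels" approach described above, assuming a standard JAX install: jax.jit traces the Python expression and hands it to XLA to compile into one fused kernel for CPU, GPU, or TPU.

```python
import jax
import jax.numpy as jnp

@jax.jit  # compile the whole expression with XLA
def normalize(x):
    # One "array programming" line: implicit looping over the trailing axis.
    return (x - x.mean(axis=-1, keepdims=True)) / x.std(axis=-1, keepdims=True)

x = jnp.arange(12.0).reshape(3, 4)
print(normalize(x))
```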
00:51:13 [CH]
Yeah, I always wonder about...
00:51:14 [JH]
Oh! Yeah, sorry, go on. I just remembered your third question, I'll come back to it.
00:51:18 [CH]
No, no, you go ahead.
00:51:19 [JH]
Oh, you asked me why now am I coming back to APL and J, which is dumb, totally orthogonal to everything else we've talked about, which is: I had a daughter. She got old enough to actually start learning math, so she's 6. Oh my God, there's so many great educational apps nowadays; there's one called DragonBox Algebra. It's so much fun, DragonBox Algebra 5+, and it's like, 5-plus algebra, like, what the hell? So when she was, I think, actually still four, I let her play with DragonBox Algebra 5+, and she learned algebra, you know, by helping dragon eggs hatch. And she liked it so much that I tried doing DragonBox Algebra 12+, and she loved that as well and finished it, and so suddenly I had a five-year-old kid that liked algebra, much to my surprise. Kids really can surprise you. And so, yeah, she struggled with a lot of the math that they were meant to be doing at primary school, like division and multiplication, but she liked algebra, and we ended up homeschooling her, and one of her best friends is also homeschooled. So this year I decided I'd try tutoring them in math together. So my daughter's name is Claire, and her friend is Gabe. Her friend Gabe discovered on his Mac the world of alternative keyboards, so he would start typing in the chat in, you know, Greek characters or Russian characters. And one day I was like, OK, check this out, and I, like, slapped in some APL characters, and they were just like, wow, what's that? We need that. So initially we installed Dyalog APL so that they could type APL characters in the chat, and I explained to them that this is actually, like, super fancy math that you're typing in, and they really wanted to try it. And that was at the time I was trying to teach them sequences and series, and they were not getting it at all; it was my first failure as a math tutor with them. You know, they'd been zipping along: fractions, you know, greatest common denominator, factor trees, OK, everything is fine, it makes sense, and then we hit sequences and series and, like, they had no idea what I was talking about. So we put that aside. Then we spent like three one-hour lessons doing the basics of APL, you know, the basic operations and doing stuff with lists, and dyadic versus monadic, but still just primary school level math, and we also did the same thing in NumPy using Jupyter, and they really enjoyed all that; like, they were more engaged than in our normal lessons. And so then we came back to, like, you know, sigma, i equals one to five, of pi squared, whatever, and it's like, OK, that means this, you know, in APL, and this in NumPy, and they're like, "oh, is that all? Fine." You know, so that was the problem: this idea of, like, T(n) equals T(n minus 1) plus blah. But when it's, like, stuff where you're actually indexing real things and can print out the intermediate values and all that, and you've got a list or a range, they're just like, ah, OK. And, you know, I don't know why I explained it this dumb way before. And I will say, given a choice between doing something on a whiteboard, or doing something in NumPy, or doing something in APL, now they will always pick APL, because the APL version is just so much easier, you know; there's less to type, less to think about, there's less boilerplate.
And so it's only been a few weeks, but, like, yesterday we did the power operator, you know, and so we literally started doing the foundations of mathematics, metamathematics. So it's like, OK, let's create a function called capital S: capital S, arrow, you know, plus jot one (S ← +∘1), right? So for those Python people listening, jot, if you give it an array or a scalar, is the same as partial in Python, or bind in C++. So, OK, we've now got something that adds one to things. OK, I said, this is called the successor function, and so I said to them, OK, what would happen if we go S S S 0? And they're like, oh, that would be 3. And so I said, OK, well, what's addition? And then one of them is like, oh, it's repeated S. I'm like, yeah, it's repeated S, so how do we say "repeated"? So, in APL we say repeated by using this star diaeresis; it's called power. OK, so now we've done that: what is multiplication? And then one of them goes, after a while, oh, it's repeated addition. So we define addition, and then we define multiplication, and then I'm like, OK, well, what about, you know, exponentiation? Oh, that's... now, this one they've heard a thousand times; they both immediately were like, oh, that's repeated multiplication. So, like, OK, we've nailed down and defined that, and then, OK, well, subtraction, that's a bit tricky. Well, it turns out that subtraction is just, you know, the opposite of something, it's the opposite of... and they're both like, "oh, it's the opposite of addition." OK, well, "opposite of", which in math they call inverse, is just a negative power, so now we define subtraction. So how would you define division? "Oh, OK." How would you define roots? "Oh, OK." So we're kind of, like, you know, designing the foundations of mathematics here in APL, you know, with a six-year-old and an eight-year-old. And during this whole thing, at one point we're like, OK, well, now I can't remember why, but we're like, OK, now we've got to do one divided by a half, and they're both like, we don't know how to do that. So, you know, with APL, this stuff that's considered, like, college-level math suddenly becomes easy, and, you know, at the same time stuff that's still primary school level math, like one divided by a half, is still considered hard. So it definitely made me rethink, you know, what is easy and what is hard, and how to teach this math stuff. And so I've been doing a lot of teaching of math with APL, and the kids are loving it, and I'm loving it. And that's actually why I started this study group, which will be on today, as we record this, a few days after I put it out there. As I kind of started saying on Twitter to people, like, oh, it's really been fun teaching my kid and her friend math using APL, a lot of adults were like, can we learn math using APL? So that's what we're going to do.
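A rough Python translation of the successor-and-power exercise described above; power here is a made-up helper standing in for APL's ⍣, and jot (∘) is played by functools.partial, as noted in the conversation.

```python
from functools import partial

S = partial(lambda a, b: a + b, 1)   # successor: like APL's +∘1 (jot ≈ partial)

def power(f, n):
    """Return a function that applies f repeatedly, n times (like APL's ⍣ n)."""
    def repeated(x):
        for _ in range(n):
            x = f(x)
        return x
    return repeated

print(power(S, 3)(0))                # S S S 0  ->  3

def add(a, b):                       # addition = repeated successor
    return power(S, b)(a)

def mul(a, b):                       # multiplication = repeated addition
    return power(partial(add, a), b)(0)

print(add(2, 3), mul(2, 3))          # 5 6
```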
00:58:39 [BT]
Well, and that's the whole notation thing, isn't it? It's the notation: you get away from the sigmas and the pis and all that, you know, subscripts...
00:58:45 [JH]
I know, right?
00:58:47 [AB]
This is exactly what Iverson wanted in fact.
00:58:48 [JH]
Exactly. I mean, who wants this, you know; why should capital pi be product and capital sigma be sum? It's like, you know, we did plus slash, and then it's like, OK, how do we do product? And they're like, oh, so it's times slash. Then I show them backslash, and, like, how do we do a cumulative product? And so it's obviously times backslash. Yeah, this stuff. But, you know, a large group of adults can't handle this, because I put stuff on Twitter, I'll be like, here's a cool thing in APL, and, like, half the replies will be, well, that's line noise, that's not intuitive, how do you type that? It's this classic thing that Iverson always said, the difference between: is it that you don't understand it, or is it hard? And, you know, kids don't; for kids, everything is new, so when they see something they've never seen before, they're just like, "teach me that or else." Adults, or at least a good chunk of adults, are just, like, "I don't immediately understand that, therefore it's too hard for me, therefore I gotta belittle the very idea of the thing."
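For the Python-inclined, the same reductions and scans can be written with NumPy's ufunc reduce and accumulate (assuming a standard NumPy install); the APL spellings in the comments follow what is described above.

```python
import numpy as np

v = np.array([1, 2, 3, 4])

print(np.add.reduce(v))            # sum,                APL: +/v  -> 10
print(np.multiply.reduce(v))       # product,            APL: ×/v  -> 24
print(np.add.accumulate(v))        # cumulative sum,     APL: +\v  -> [ 1  3  6 10]
print(np.multiply.accumulate(v))   # cumulative product, APL: ×\v  -> [ 1  2  6 24]
```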
00:59:52 [BT]
I did a tacit program, a one-liner, on the APL Farm the other day, and somebody said, that looks like Greek to me. I said, well, Greek looks like Greek to me, 'cause I don't know Greek. Sure, if you don't know it, yeah, absolutely it looks silly, but if you know it, then it's not that hard.
01:00:08 [JH]
Yeah, I will say, like, you know, a lot of people have put a lot of hard work into resources for APL and J teaching, but I think there's still a long way to go, and one of the challenges is, just like when I was learning Chinese: I liked the idea of learning new Chinese words by looking them up in the Chinese dictionary, but of course I didn't know what the characters in the dictionary meant, so I couldn't look them up. So when I learned Chinese, I really spent the first 18 months just focused on learning characters. I got through 6,000 characters in 18 months of very hard work, and then I could start looking things up in the dictionary. My hope is to do a similar thing for APL. Like, for these study groups, I want to try to find a way to introduce every glyph in an order that never refers to glyphs you haven't learned yet, and that's something I don't feel like we really have, so that then you can look up stuff in the Dyalog documentation. 'Cause right now I still don't know that many glyphs, so, like, most of that documentation I don't understand, because it explains glyphs using glyphs I don't yet know, and then I look those up, and those are used to explain things with glyphs I don't yet know. So, you know, step one for me is, I think, we're just going to go through and try to learn what every glyph is, and then I feel like we should be able to study this better together, because then we could actually read the documentation, you know.
01:01:40 [AB]
Are you going to publish these sessions online?
01:01:43 [JH]
So the study group will be recorded as videos, but I also then want to actually create, you know, written materials using Jupyter, which I will then publish; that's my goal.
01:01:57 [AB]
What you said very much resonates with me: I often find myself, when teaching people this, in this bind that to explain anything I need to already have everything explained. And especially, it comes down to this: in order to explain what many of these glyphs are doing, I need some fancy arrays; if I restrict myself to simple vectors and scalars, then I can't really show their power. And I cannot create these higher-rank arrays without already using those glyphs, right? And so, hopefully, there is this long-running project, since, like, 2015 I think it is, to add a literal array notation to APL.
01:02:38 [JH]
Right.
01:02:39 [AB]
And then there is a way in. Then you can start by looking at what an array is, and then you can start manipulating it and see the effects of the glyphs, and intuit from there what they do.
01:02:51 [JH]
Yeah, no, I think that'll be very, very helpful. And in the meantime, you know, my approach with the kids has just been to teach rho quite early on; so rho is the equivalent of reshape in Python, in most Python libraries, and, yeah, once you know how to reshape, you can start with a vector and shape it to anything you like, and, you know, it's not a difficult concept to understand. So I think, yeah, basically the trick at the moment is just to say, OK, in our learning of the dictionary of APL, one of the first things we will learn is rho. And then it was really fun with the kids doing monadic rho, you know, to be like, OK, well, what's rho of this? What's rho of that? And OK, what's rho of rho of this? And then what's rho of rho of rho? Which then led me to the Stallman poem about that, "rho, rho, rho of X always equals 1", et cetera, et cetera, which they loved as well.
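The same reshape-and-shape round trip in NumPy terms (dyadic rho is roughly reshape, monadic rho is roughly shape), assuming a standard NumPy install:

```python
import numpy as np

v = np.arange(12)                   # a vector 0..11
m = v.reshape(3, 4)                 # dyadic rho:  3 4 ⍴ v

print(m.shape)                      # monadic rho: (3, 4)
print(np.shape(m.shape))            # rho of rho:  (2,), the rank as a one-element shape
print(np.shape(np.shape(m.shape)))  # rho of rho of rho: (1,)
```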
01:03:53 [CH]
Yeah, we'll link that in the show notes too. While you were saying all that, it really resonated with me, because when I first started learning APL, one of the first things that happened was: oh, OK, you can fold, you can map, so how do you filter? You know, the classic three functional things. And the thing about APL and array languages is they don't have an equivalent filter that takes a predicate function; they have a filter called compress that takes a mask and drops anything that corresponds to a 0. It wasn't until a few months later that I ended up discovering that. But for both APL and the newer APL, BQN, there are these two sites; Adám was the one that wrote the APL one, aplcart.info, and there's bqncrate, I think. You can basically semantically search for what you're trying to do, and it'll give you small expressions that do it. So if you type in the word "filter", which is what you would call it coming from a functional language, or even I think Python calls it filter, you can get a list of small expressions. Really often, though, you need to know the exact name of the thing: one time I was searching for all the combinations or permutations, and really what I was looking for was the powerset. Until you have the word "powerset", well, it's a fuzzy search, right? But it's still a very, very useful tool when, like you said, you're trying to learn something like Chinese and it's like, well, where do I even start? I don't know the language to know the words to search for. But yeah, I agree that there's a lot of room for improvement in how to onboard people without them immediately going, like you said, this looks like hieroglyphics, which I think Iverson considered a compliment; there's some anecdote I've heard where someone said this is hieroglyphics, and he said "Yes, exactly!" and then they both just...
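A small Python sketch of the distinction being described here: a predicate-style filter versus an APL-style compress driven by a boolean mask (the APL expression in the comment is approximate).

```python
import numpy as np

xs = [3, -1, 4, -1, 5, -9, 2]

# Predicate-style filter, as in most functional languages (and Python's built-in filter):
kept = [x for x in xs if x > 0]          # [3, 4, 5, 2]

# APL-style "compress": build a boolean mask, then use it to select.
# In APL this is roughly (xs>0)/xs ; with NumPy the mask indexes the array.
a = np.array(xs)
mask = a > 0                             # [ True False  True False  True False  True]
kept_compress = a[mask]                  # array([3, 4, 5, 2])
```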
01:05:47 [JH]
I think the other thing I want to do is help, in particular, Python programmers, and maybe also do something for JavaScript programmers, since those are the two most popular languages at the moment. A lot of the tutorials for stuff like J, like J for C Programmers, you know, great book, but most people aren't C programmers. And also, a lot of the time it would be so much easier if somebody had just said to me early on: oh, you know, jot's just the same as partial in Python. Or, you know, putting things in a box, what the hell is a box? Somebody basically said, oh, it's basically the same as a reference, and it's like, oh, OK. I think in one of your podcasts somebody said it's like a void star. So yeah, there's kind of a lack of just saying: this is actually the same thing as blah in Python and JavaScript. So I do want to do some kind of mapping like that, particularly for NumPy programmers and such, 'cause a lot of it's so extremely similar. It would be nice to be able to say, OK, well, J maps things over leading axes, which is exactly the same as NumPy, except it doesn't open trailing axes. So if you know the NumPy rules, you basically know the J rules.
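As a loose illustration of the "jot is like partial" analogy (the correspondence is only approximate, since Dyalog's ∘ has several uses), here is a Python sketch:

```python
from functools import partial

# functools.partial fixes an argument of a function, giving back a new function.
add = lambda x, y: x + y
add10 = partial(add, 10)     # loosely like APL's 10∘+ : bind 10 as the left argument
print(add10(5))              # 15

# Applied across a list, the bound function behaves like 10∘+ applied to a vector:
print(list(map(add10, [1, 2, 3])))   # [11, 12, 13]
```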
01:07:09 [BT]
Yeah, I think at the basic level you're absolutely right, and that would certainly be really useful. We've talked this over before; some of the challenges are in the flavors and the details. If you send somebody down the wrong road with a metaphor that only almost works in some of these areas, it can really be challenging for them, because they see it through the lens of their experience, and in that area it works differently than they expect. So there is a challenge in that, and we find it even between APL, BQN, and J. I'm trying to think of what we were talking about recently. It was transpose...
01:07:50 [AB]
Dyadic transpose, yeah!
01:07:50 [BT]
The languages handle dyadic transpose differently. Functionally you can do the same things, but you have to be aware that they are going to do it differently depending on the language.
01:08:01 [JH]
Absolutely, but that's not a reason to throw out the analogy, right? Like, I think everybody agrees that it's easier for an APL programmer to learn J than for a...
01:08:03 [BT]
No no.
01:08:12 [JH]
JavaScript programmer to learn J, you know, 'cause there are some ideas you already understand, and you can actually say to people: OK, well, this is the rank conjunction in J, and you may recognize this as being like the rank operator in APL. So if we can do something like that and say, oh, OK, this would do the same thing as, you know, dot-permute-dot-blah in PyTorch, it's like, OK.
01:08:39 [AB]
Well, as the maintainer of APLcart, I'd like to throw in a little call to the listeners, like what Conor mentioned. I do fairly often get people saying, well, I couldn't find this and that, and I ask them, what did you search for? So do let me know, contact me by whatever means, if you couldn't find something, either because it's altogether missing and I might be able to add it, or tell me what you searched for and couldn't find, or maybe you found it later by searching for something else, and I'll add those keywords for future users. I have put in a lot of function names from other programming languages so that you can search for those and find the APL equivalent.
01:09:18 [JH]
I will say, I feel like either I'm not smart enough to use aplcart.info or I haven't got the right tutorial yet. I've been there a few times, and there's this whole lot of impressive-looking stuff and I just don't know what to do with it. And then I sometimes click things and it sends me over to this TIO.run, and it tells me, like, real time 0.02 seconds, code... I find it, you know... I don't yet know how to use it. So, I guess, hearing you guys say this is a really useful tool that a lot of people put a lot of time into, I should obviously invest time in learning how to use it, and maybe after doing that I should explain to people how to use it.
01:10:07 [AB]
I do have a video on it and there's also a little question mark icon one can click on and get to the instructions.
01:10:13 [JH]
I have tried the question mark icon. As I say, it might just be, you know... I think this often happens with APL stuff: I often hit things and I feel like maybe I'm not smart enough to understand this.
01:10:29 [CH]
We clearly don't think that's it, if you're...
01:10:31 [JH]
Yeah, I think.
01:10:32 [BT]
Well, I don't think that's the problem.
01:10:34 [AB]
We humbly disagree, but.
01:10:37 [CH]
I do recall you saying a few minutes ago that you managed to teach your, you know, four-year-old daughter, like, grade-12 or age-12 algebra.
01:10:45 [JH]
I know, but I didn't; I just gave her the app, right? I've heard of other parents giving it to their kids, and they all seem to handle it. It's just this fun game where you hatch dragon eggs by dragging things around on the iPad screen, and it just so happens that the things you're doing with dragon eggs are the rules of algebra. After a while it starts to switch out some of the monsters with symbols like x and y, you know, and it does it gradually, gradually, and at the end it's like, oh, now you're doing algebra. So I can't take any credit for that; that's some very, very clever people doing very cool things.
01:11:22 [BT]
It really is an amazing program. I homeschooled my son as well and we used that for algebra.
01:11:27 [JH]
Great yeah.
01:11:28 [BT]
He was a bit more age-appropriate, but I've looked at that and said that really is well put together; it's an amazing program, yeah?
01:11:39 [JH]
Maybe there'll be a DragonBox APL one day.
01:11:42 [CH]
Hey, it's not a bad idea, yeah.
01:11:45 [BT]
Not a bad idea at all. I was going to say, when you're teaching somebody, one of the big challenges when you're trying to get a language across to a general audience is: who is the audience? Because, as you say, if you're dealing with kids or people who haven't been exposed to programming before, that's a very different audience than somebody who has been exposed to some other type of programming. Functional programming is a bit closer, but if you're a procedural or imperative programmer, it's going to be a stretch to try and bend your mind in the different ways that APL or J or BQN expect you to think about things.
01:12:21 [JH]
Yeah, I think the huge rise of functional programming is very helpful for coming to array programming, both in JavaScript and in Python. I think most people, particularly in the machine learning and deep learning world, are doing a lot of functional stuff; often that's the only way you can do things, particularly in deep learning. So I think that does help a lot. Like Conor said, you've probably come across map and reduce and filter, and certainly in Python you'll have done list comprehensions and dictionary comprehensions. And a lot of people have done SQL. So yeah, I think a lot of people come into it with some relevant analogies, if we can help connect it for them...
01:13:12 [CH]
Yeah, one of the things that, you know, this is really reinforcing is an idea that, well, it's not my idea, I think it's an idea that multiple people have had, but the tool doesn't exist just yet. We'll link to some documentation that I use frequently when I'm going between APL and J: on the BQN website, they have a BQN-to-Dyalog-APL dictionary and a BQN-to-J dictionary. Sometimes, if I'm trying to convert between the two, the BQN docs are so good I'll just use BQN as like an IR to go back and forth. But I've mentioned on previous podcasts that what would really be amazing, and it would only work to a certain extent, is something like a multidirectional array language transpiler, and adding NumPy to that list would probably be, I don't know what the word for it is, but it would be beneficial for the array community. If you can type in some NumPy expression... like I said, it's only going to work to an extent, but for simple rank-one vectors or arrays that you're just reversing and summing, doing simple reduction and scan operations, you could translate that pretty easily into APL, J, and BQN. I think that would make it so much easier for people to understand, aka, the hieroglyphics or the Greek or the Chinese or whatever metaphor you want to use, because, yeah, it is definitely challenging at times to get to a certain point where you have enough info to keep the snowball rolling, if you will, and it's very easy to hit a wall early on.
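As a toy illustration of what even a very limited translator could look like, here is a hypothetical Python phrasebook; it is not an existing tool, covers only the simplest cases, and a real translator would need an actual parser.

```python
# Toy, hypothetical mapping from a few NumPy idioms to array-language spellings.
# Only meant to illustrate the idea of a multi-directional phrasebook.
PHRASEBOOK = {
    "v[::-1]":      {"APL": "⌽v",   "J": "|.v",   "BQN": "⌽v"},   # reverse a vector
    "v.sum()":      {"APL": "+/v",  "J": "+/v",   "BQN": "+´v"},  # sum reduction
    "np.cumsum(v)": {"APL": "+\\v", "J": "+/\\v", "BQN": "+`v"},  # running sum (scan)
}

def translate(numpy_expr: str, target: str) -> str:
    """Look up the spelling of a known NumPy idiom in the target language."""
    return PHRASEBOOK[numpy_expr][target]

print(translate("v.sum()", "BQN"))   # +´v
```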
01:14:43 [AB]
So that's a project I've been thinking about: basically rewrite NumPy in APL. It doesn't seem like a whole lot of work. Just take all those names that are available in NumPy and define them as APL functions, and people can explore that by opening them up and seeing how they're defined.
01:15:06 [CH]
Oh, so not actually... you're saying it wouldn't be a new thing. You're just saying rename the symbols to what they're known as in NumPy, so that you'd still be in, like, an APL.
01:15:17 [AB]
Yeah, I mean, you could use it as a library, but I was thinking of it more as an interactive, exploring type of thing, where you open up this library, write the name of some NumPy functionality, open it up in the editor, and see: well, how is this defined in APL? And then you could use it, obviously, since it's defined. You could use these library functions, and then as you get better at APL, you can start actually writing out the raw APL instead of using these covers for it.
01:15:54 [CH]
Jeremy, that's interesting. Do you think that, 'cause you've talked about the notation versus the programming language: in your dream scenario, are you actually coding in sort of an Iversonian notation, or at the end of the day does it still look like NumPy, but all of the expressivity and power that you have in a language like APL is brought to it and combined with what NumPy currently looks like?
01:16:27 [JH]
I mean, well, it would be a bit of a combination, Conor, in that, like, you know, my classes and my type dispatch and my packaging and my function definitions, whatever, that's Python. But everywhere I can use plus and times and divide and whatever, I could also use any APL glyph, so it would basically be an embedded DSL for high-dimensional notation. It would work automatically on NumPy arrays and TensorFlow tensors and PyTorch tensors. I mean, one thing that's interesting is that, to a large degree, APL and PyTorch and friends have actually arrived at a similar place with the same, you know, grandparents, which is: Iverson actually said his inspiration for some of the APL ideas was tensor analysis, and, as you can gather from the fact that in PyTorch we don't call them arrays, we call them tensors, a lot of the folks working on deep learning, their inspiration was also from tensor analysis. So it comes from physics, right? And I would say a lot more folks who have worked on PyTorch are familiar with tensor analysis in physics than are familiar with APL. And then, of course, there have been other notations; explicitly based on Einstein notation, there's a thing called einops, which takes a very interesting kind of approach of taking Einstein notation much further. And Einstein notation, if you think about it, is kind of the loop-free programming of math, right? The equivalent of loops in math is indices, and Einstein notation does away with indices, and that's why stuff like einops is incredibly powerful: you can write an expression in einops with no indices and no loops, and it's all implicit reductions and implicit loops, I guess. My ideal thing would be that we wouldn't have to use einops, we could use APL, you know, and it wouldn't be embedded in a string; they would actually be operators. That's what it is: there would be operators in the language, so Python operators would not just be plus, times, minus, slash; all the APL glyphs would be Python operators, and they would work on all Python data types, including all the different tensor and array data types.
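To make the "implicit loops and reductions" point concrete, here is a small NumPy sketch using np.einsum; einops itself offers a richer rearrange/reduce vocabulary that is not shown here.

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

# Matrix multiply written in Einstein-style notation: the repeated index j is
# summed over implicitly, with no explicit loops and no explicit reduction call.
C = np.einsum('ij,jk->ik', A, B)
assert np.allclose(C, A @ B)

# A trace (sum over the diagonal) works the same way: the repeated index i is reduced away.
t = np.einsum('ii->', np.eye(4))   # 4.0
```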
01:19:21 [CH]
Interesting, yeah, so it sounds like you're describing a kind of hybrid language, then, yeah.
01:19:25 [JH]
Yeah, JavaScript too. I would love the whole DSL to be in JavaScript as well. You know, that would be great, and I feel like I saw that somewhere. I feel like I saw somebody actually do an ECMAScript, uh, you know, RFC, with an implementation of APL.
01:19:41 [AB]
It was an April Fools joke.
01:19:41 [JH]
Yeah, but it actually worked. It's not like... there's actually an implementation.
01:19:48 [AB]
I don't think they had an implementation. It was just very, very well specced; it could actually work, kind of thing, no?
01:19:55 [JH]
I definitely... I mean, I read the code. I don't know how complete it was, but there was definitely some code there. I can't find it again, so let me know if you know where it is.
01:20:03 [AB]
There's some JavaScript implementation of APL by Nick Nikolov, but my problem with it is that it's not tightly enough connected with the underlying JavaScript. It just allows you to...
01:20:16 [JH]
And it shouldn't be an April Fools joke, should it?
01:20:19 [AB]
No it shouldn't.
01:20:20 [JH]
It's like Gmail was an April Fools trick, right? Gmail came out on April the 1st and totally destroyed my plans for Fastmail, because it was an April Fools joke that was real. And Flask, you know, the Flask library, I think was originally an April Fools joke, basically saying we shouldn't be using frameworks, 'cause look, I've created a framework that's so stupidly small it shouldn't even be a framework. And now that's the most popular web framework in Python. So yeah, maybe this should be an April Fools joke that becomes real.
01:20:49 [CH]
This is maybe an odd question, but from what I know about Julia, you can define your own Unicode operators, and I did try at one point to create a small composition of two different symbols, you know, square root and reverse or something, and it ended up not working and asking me for parentheses. But do you think Julia could evolve to be that kind of hybrid language that...
01:21:17 [JH]
Yeah, maybe. You know, I'm actually doing a keynote at JuliaCon in a couple of weeks, so maybe I should raise that.
01:21:27 [CH]
Just at the Q&A section say "Any questions? But first I've got one for the community at large. Here's what I'd like and...".
01:21:34 [JH]
I think my talk, you know, is going to be about kind of what Julia needs to move to the next level. Actually, I'm not sure I can demand that a complete APL implementation is that thing, but I could certainly put it out there as something to consider.
01:21:47 [AB]
Well, it always bothers me, though, that if you try to extend those languages like this, or do some kind of precompiler for it, then their order of execution ends up messing up the APL. I think APL very much depends on having a strict one-directional order of function application; otherwise it's hopeless to keep track of the precedences.
01:22:12 [JH]
Yeah, but that is a big challenge currently. The DSL inside Python, which is the basic mathematical operations, does have the BODMAS or PEMDAS order of operations, so there would need to be some way... In Python that wouldn't be too hard, actually, because in Python you can opt into different kinds of parsing by adding a "from __future__ import" of something. You could have a "from __future__ import APL precedence", and then from then on everything in your file is going to use right-to-left precedence.
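For context, Python's real per-module opt-in mechanism is the __future__ import; an APL-precedence flag like the one imagined here does not exist today, so the sketch below only shows the existing mechanism and what right-to-left evaluation would change.

```python
# __future__ imports are Python's real mechanism for opting a single file into
# new compile-time behaviour (this one, from PEP 563, postpones evaluation of
# type annotations):
from __future__ import annotations

# A precedence flag like the one described above is purely hypothetical:
# from __future__ import apl_precedence   # hypothetical, does not exist

# Under Python's usual PEMDAS-style rules:
print(3 * 2 + 1)       # 7 -- multiplication binds tighter than addition

# Under APL's strict right-to-left rule, 3×2+1 evaluates the rightmost
# function first, which in Python terms is:
print(3 * (2 + 1))     # 9
```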
01:22:52 [AB]
That's really interesting and cool. I didn't know that.
01:22:56 [CH]
I've been spending a lot of time thinking about function precedence and the differences between languages, and I'm not sure if any other languages have this, but something I find very curious about BQN and APL is that they basically have functions with higher precedence than other functions. Operators in APL, and conjunctions and adverbs in J, have higher precedence than your regular functions that apply to arrays; I'm simplifying a tiny bit. In Haskell, by contrast, function application always has the highest precedence: you can never get anything that binds tighter than that. Having stumbled into the array world now, it seems like a very powerful thing that these combinator-like functions don't just get the default precedence, because if you have a fold or a scan or a map, you're always combining it with some kind of binary or unary operation to create another function that you're then going to eventually apply to something and...
01:23:58 [JH]
But the basic right-to-left part, putting aside the higher-order functions, or operators as they're known in APL, the basic right-to-left order, I mean, again, for teaching and for my own brain... gosh, that's so much nicer than, like, in C++. Oh my God, the operator precedence there: there's no way I can ever remember that, and there's a good chance when I'm reading somebody else's code, where they haven't used parentheses 'cause they didn't really need them, that I have no idea where they would have to go, and then I have to go and look it up. It's another of these things with the kids: I'm like, OK, you remember that stuff we spent ages on, about how first you do exponents and then you do times? It's like, OK, you don't have to do any of that in APL, you just go right to left, and they're just like, oh, that's so much better.
01:24:51 [CH]
What is your... this literally came up at work, like a month ago, where I was giving this mini APL thing; we had 10 minutes at the end of a meeting, and I made this offhand remark that, of course, the evaluation order in APL is a much simpler model than what we learned in school. And there were, I don't know, 20 people in the meeting, and it was the most controversial thing I had said, and I think I almost had an out-of-body experience, because I thought I was saying something that was objectively just truth.
01:25:20 [JH]
Yeah, well you were.
01:25:21 [CH]
And then I was like, wait a second, what am I clearly missing, like, is there...
01:25:25 [JH]
No, I mean...
01:25:25 [CH]
Am I doing something wrong? Like how do you communicate?
01:25:28 [JH]
And most adults are incapable of like new ideas. It's just.
01:25:34 [CH]
That's what I should have said in the meeting.
01:25:37 [JH]
I mean, this is another reason I like doing things like APL study groups, 'cause it's a way of self-selecting that small group of humanity who's actually interested in trying new things, despite the fact that they're grown-ups, and then trying to surround myself with those people in my life.
01:25:52 [AB]
But isn't it sad, then? I mean, what has happened to those grown-ups? When you mentioned teaching these people and trying to map their existing knowledge onto APL things, what it means to box and so on: I find that for children and non-programmers, explaining the array model and how the functions are applied and so on is almost trivial, it meets no resistance at all. And it's all those adults that have learned their PEMDAS or BODMAS or whatever the rules are, and all the computer science people that know their precedence tables and their lists of lists and so on, those are the ones that are really, really struggling. It's not just that they're resisting; they're clearly struggling, they're really trying, and it's a lot of effort.
01:26:35 [JH]
Well, there is actually... I mean, that is a known thing in educational research. I spent months earlier this year and late last year reading every paper I could about education, because I thought if I'm going to be homeschooling, then I should try to know what I'm doing. And yeah, what you're describing is absolutely a thing: the research shows that when you've got an existing idea, which is an incorrect understanding of something, and you're trying to replace it with a correct understanding, that is much harder than learning the correct version directly. Which is obviously a challenge when you think about analogies: an analogy has to be good enough to lead directly to the correct version. But I think the important thing is to find the people who have the curiosity and tenacity to be prepared to go over that hurdle even though it's difficult, you know, because that's just how human brains are. So be it.
01:27:46 [BT]
You know, yeah, unlearning is really hard work actually. And if you think about it, it probably should be because you spend a lot of time and energy to put some kind of a pattern into your brain. You don't want to have that evaporate very quickly.
01:27:59 [JH]
You're right, and, you know, myelination occurs around, what, like ages 8 to 12 or something. So our brains are literally trying to stop us from having to learn new things, because our brains think they've got stuff sorted out at that point, and so they should focus on keeping long-term memories around. So yeah, it does become harder, but, you know, it's still totally doable.
01:28:24 [AB]
The solution is obvious: teach APL in primary school.
01:28:27 [JH]
That's what I'm doing. Kind of.
01:28:30 [CH]
What was the word you mentioned? A "mile-mulation"?
01:28:32 [JH]
Myelination.
01:28:39 [CH]
Interesting, I had not heard that word before.
01:28:40 [JH]
It's a physical coating that, I can't remember, goes on the dendrites?
01:28:45 [BT]
I think it's on the axons, isn't it?
01:28:47 [JH]
Axons, that sounds right.
01:28:47 [BT]
It's, yeah, the transmission wire, yeah? Yeah, they get...
01:28:51 [AB]
These fat layers, or cholesterol layers?
01:28:55 [CH]
I never took any biology courses in my education, so clearly I've missed out on that aspect.
01:29:00 [BT]
You myelinated anyway.
01:29:06 [AB]
Isn't that an APL function?
01:29:10 [BT]
You also mentioned the word tenacity, Jeremy, and I was watching an interview you did with Sanyam Bhutani, and you were talking about how you spotted at an early point in his working with Kaggle that he was probably something different, and you said it was the tenacity to keep working at something. I think that's a really important part about educating people, that they shouldn't necessarily expect
01:29:38 [JH]
(Hell yeah.)
01:29:40 [BT]
learning something new to be easy, yeah, but you can do it.
01:29:44 [JH]
Oh yeah, I mean, I really noticed that when I started learning Chinese. I went to, you know, just some local class in Melbourne, and everybody was very, very enthusiastic, and everybody was going to learn Chinese, and we all talked about the things we were going to do. And each week there'd be fewer and fewer people there, and, you know, I kind of tried to keep in touch with them, but after a year every single other person had given up and I was the only one still doing it. So then after a couple of years, people would be like, wow, you're so smart, you learned Chinese. And it's like, no, man, during those first few weeks I was pretty sure I was learning more slowly than the other students, but everybody else stopped doing it, so of course they didn't learn Chinese. And I don't know what the trick is, because it's the same thing with, you know, the fast AI courses: they're really designed to keep people interested and get people doing fun stuff from day one. And still, I'd say most people drop out, and of the ones that don't, I would say most end up becoming actual world-class practitioners, and they build new products and startups and whatever else. And people will be like, oh, I wish I knew neural nets and deep learning; it's like, OK, here's the course, just do it and don't give up. But yeah, I don't know; tenacity is not a very common virtue, I think, for some reason.
01:31:25 [BT]
It's something I've heard, I think it's Jo Boaler at Stanford, talk about: the growth mindset. And I think that is something that, for whatever reason, some people tend to have, and maybe it's myelination at those ages: you start to get that mindset where you're not so concerned about having something come easily, but just the fact that if you keep working at it, you will get it. And not everybody, I guess, is put in the situations where they get that feedback that tells you, if I keep trying this, I'll get it. If it's not easy, they stop.
01:31:59 [JH]
Yeah, I mean, that area of growth mindset is a very controversial idea in education, specifically the question of: can you modify it? And I think it's pretty well established at this point that the kind of stuff that schools have tended to do, just putting posters up around the place saying, you know, make things a learning opportunity, or don't give up, does nothing at all. You know, with my daughter we do all kinds of stuff around this, so we've actually invented a whole family of clams. And as you can imagine, clams don't have a growth mindset; they tend to sit on the bottom of the ocean, not moving. So the family of clams that we invented, that we live with, at every point where we're going to have to learn something new or try something, they always start screaming and don't want to have anything to do with it. And so we actually have Claire telling the clams how it's going to be OK, and, you know, it's actually a good thing to learn new things. So we're trying stuff like that, to have imaginary creatures that don't have a growth mindset and for her to realize how silly that is, which is fun.
01:33:20 [BT]
But even in the things that you were talking about in terms of the metamathematics, you didn't say, oh, the successor, this is what plus is; you said...
01:33:29 [JH]
Yeah for sure.
01:33:30 [BT]
How would you use this? How would you... They start to have to put it together themselves, which, to me, that's the growth mindset, that if you're creating now...
01:33:39 [JH]
But then, like, Ada... gosh, you're getting to all the most controversial things in education here, Bob, 'cause that's the other big one: discovery learning. This idea of having kids explore and find things is also controversial, because it turns out that actually the best way to have people understand something is to give them a good explanation. So it is important that you combine the two: OK, how would you do this, and then, OK, let me just tell you what this is and why. It's easier with homeschooling two kids, because I can make sure their exploration is short and correct. You know, if you spend a whole class, 50 minutes, doing largely the wrong thing, then you end up with these really incorrect understandings, which you then have to kind of deprogram. So yeah, education is hard, you know, and I think a lot of people look for these simple shortcuts and they don't really exist. You actually have to have good explanations and good problem-solving methods and, yeah, all this stuff. That's a really interesting area, though, and the tools...
01:35:02 [BT]
And the tools and the notations become part of that.
01:35:05 [JH]
Yeah, and, you know, notation... I mean, so I do a live-coding, videoed thing every day with a bunch of folks, and in the most recent one we started talking about APL, why we're going to be doing APL this week instead, and somebody actually said, "Oh my God, it's going to be like regexes." And I kind of said, OK, so regexes are a notation for doing stuff, and we spent an hour solving a problem with regexes and, oh my God, it's such a powerful tool for this problem. And by the end of it, they're all like, OK, we want to deeply study regexes. Obviously that's a much less flexible and powerful notation than APL, but we kind of talked about how once you start understanding these notations, you can build things on top of them, and then you create these abstractions. And notation is how, you know, deep human thought kind of progresses, right, in a lot of ways. So, you know, I actually spoke to a math professor friend a couple of months ago about my renewed interest in APL, and I kind of sent him something, I can't remember what it was, maybe doing the golden ratio or some little snippet, and he was just like, yeah, so that looks like Greek to me, I don't understand that. And it's like, dude, you're a math professor, you know; if I showed somebody who isn't in math a page of your research, what are they going to say? And it was interesting, I said, there are ideas in here. Iverson brackets, for example: have you ever heard of Iverson brackets? He's like, well, of course I've heard of them, it's a fundamental tool in math. I was like, well, you know, that's one thing that you guys have stolen from APL. That's a powerful thing, right? He's like, fantastic, I'd never want to do without Iverson brackets. So I kind of tried to say, OK, well, imagine that every other glyph you don't understand has some rich thing like Iverson brackets behind it that you could now learn about. OK, maybe I should give it a go. I'm not sure he has, but I think that's a good example for mathematicians: here's one thing, at least, that found its way from APL, and that maybe gives a mathematician a sense that there might be something in here.
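For readers who haven't met them, the Iverson bracket [P] is simply 1 when the proposition P is true and 0 otherwise; Python's booleans (and APL's comparison results) behave the same way under arithmetic, so a rough sketch is:

```python
# Iverson bracket: [P] is 1 if P holds, else 0. Booleans in Python (and in APL,
# where comparisons literally return 0 or 1) can be summed directly.

# "How many of 1..100 are divisible by 3 or 5?" = sum over k of [3 divides k or 5 divides k]
count = sum((k % 3 == 0) or (k % 5 == 0) for k in range(1, 101))
print(count)    # 47
```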
01:37:48 [CH]
On that note, 'cause I know we've gone way over, but this has been awesome: a question that might be a good one to end on is, do you have any advice for folks that want to learn something, whether it's Chinese or an array language, or to get through your fast AI course? Because, like you said, you like to self-select for folks that are the curious types and that want to learn new things and new ways to solve things, but is there any advice other than just "be tenacious"? Are there tips for approaching something from a particular angle? I think a lot of the folks listening to this maybe don't have that issue, but I definitely know a ton of people that are the other kind of folks: they'll join a study group, but then three weeks in they kind of lose interest, or they decide it's too much work or too difficult. As an educator, and it seems like you operate in this space, do you have advice for those folks?
01:38:59 [JH]
I mean, so much, Conor; I actually kind of embed this in my courses a lot. I can give you some quick summaries, but what I will say is my friend Radek Osmulski, who's been taking my courses for like four years, has taken everything I've said, and his experience of those things, and turned it into a book. Radek Osmulski's book is called Meta Learning: powerful mental models for deep learning; that's learning, as in learning deeply. So yeah, check out his book to get the full answer. I mean, there's just, gosh, a lot of things you can do to make learning easier, and a key thing I do in my courses is I always teach top down. So often people, let's take deep learning and neural networks, they'll be like, OK, well, first I'm going to have to learn linear algebra and calculus and blah blah blah, and, you know, four or five years later they still haven't actually trained a neural network. Our approach in our courses: in Lesson 1, the very first thing you do, in the first 15 minutes, is train a neural network. It's just more like how we learn baseball, or how we learn music. You say, OK, let's play baseball: come on, you stand there, you stand there, I throw this to you, you're going to hit it, you're going to run. You don't start by learning the parabolic trajectory of a ball or the history of the game or whatever, you just start playing. So you want to be playing, and if you're doing stuff from the start that's fun and interesting and useful... Top down doesn't mean it's shallow; you can then work from there to understand what each line of code is doing, and then how it's doing it, and then why it's doing it, and then what happens if we do it a different way, until eventually, with our fast AI program, you actually end up rewriting your own neural network library from scratch, which means you have to very deeply understand every single part of it, and then we start reading research papers and learning how to implement those research papers in the library we just wrote. So yeah, I'd say go top down, make it fun, make it applied. For things like APL or Chinese, where there's just stuff you have to remember, use Anki, use repetitive spaced learning; that's been around... Ebbinghaus came up with that, I don't know, what, 200, 250 years ago. It works, you know. Everybody, if you tell them something, will forget it in a week's time, everybody, so you shouldn't expect to read something and remember it, because you're human and humans don't do that. Repetitive spaced learning will quiz you on that thing tomorrow, and then in four days' time, and then in 14 days' time, and then in three weeks' time, and if you ever forget it, it will reset that schedule, and it'll make sure it's impossible to forget it, you know? It's depressing to study things that then disappear, and so it's important to recognize that unless you use Anki or SuperMemo or something like that, unless you use it every day, it will disappear; but if you do use repetitive spaced learning, it's guaranteed not to. And I told this to my daughter a couple of years ago. I said, you know, what if I told you there was a way you can guarantee to never, ever forget something you want to know? She was just like, that's impossible, this is like some kind of magic.
And it's like, no, it's not magic, and I sat down and I drew out the Ebbinghaus forgetting curves and explained how it works, and I explained how, if you get quizzed on it on these schedules, the curve flattens out, and she was just like... so, what do you think? "I want to use that." So she's been using Anki ever since. So maybe those are just two; let's just start with those two: go top down, and use Anki. I think that could make your learning process much more fulfilling, because you'll be doing stuff with what you're learning and you'll be remembering it.
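The scheduling idea described here, review tomorrow, then after a few days, then a couple of weeks, resetting whenever you forget, can be sketched in a few lines of Python. This is a simplified illustration, not Anki's actual algorithm, and the intervals are just the ones mentioned in passing.

```python
from datetime import date, timedelta

# Review gaps in days: tomorrow, then 4 days, 14 days, 3 weeks, roughly as described.
INTERVALS = [1, 4, 14, 21]

class Card:
    def __init__(self, prompt):
        self.prompt = prompt
        self.step = 0                      # index into INTERVALS
        self.due = date.today() + timedelta(days=INTERVALS[0])

    def review(self, remembered: bool):
        if remembered:
            # Move to the next, longer interval (capping at the last one).
            self.step = min(self.step + 1, len(INTERVALS) - 1)
        else:
            # Forgetting resets the schedule back to the shortest interval.
            self.step = 0
        self.due = date.today() + timedelta(days=INTERVALS[self.step])

card = Card("What does monadic rho return?")
card.review(remembered=True)      # next review in 4 days
card.review(remembered=False)     # forgot: schedule resets to tomorrow
```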
01:43:33 [CH]
Well, that is awesome. And yeah, definitely we'll leave links to not just Anki and the book Meta Learning, but everything that we've discussed throughout this conversation, because I think there's a ton of really, really awesome advice, and obviously to your fast AI course and the library. And we'll also link to, I know you've been on, like we mentioned before, a ton of other podcasts and talks, so if you'd like to hear more from Jeremy, there's a ton of resources online. It sounds like you're going to be building some learning materials over the next however many months or years, so in the future, if you'd like to come back and update us on your journey with the array languages, that would be super fun for us, 'cause I've thoroughly enjoyed this conversation. And thank you so much for waking up early, on the other side of the world from us, in Australia.
01:44:15 [JH]
Thanks Conor. Thanks for having me.
01:44:23 [CH]
Yeah I guess with that we'll say happy array programming.
01:44:27 [All]
Happy array programming.