Episode 86 Transcript — The Array Cast

Transcript

Transcript prepared by

Sanjay Cherian, Igor Kim, and Bob Therriault

00:00:00 [Paul Teetor]

To me, what comes to mind when you say something like that is, I had one client that was another money center bank, and when I arrived, they had a code base of 50,000 lines of R, which if you know the language, that's a lot. And they had never learned Python. So they had written all this stuff, like report generators and web servers and stuff that really should be done in Python. And somebody, I don't know, in the group said, "Let's try using Python." And within months, a lot of that R code just evaporated and got put in Python, where it should have been in the first place. So yeah, I mean, you can do a lot of these things with it, but the right tool for the right job is the whole story of this podcast, right? Is trying to find, we got this job, what's the right tool for doing that?

00:00:38 [MUSIC]

00:00:56 [Conor Hoekstra]

Welcome to episode 86 of Arraycast. I'm your host, Conor. And today with us, we have our four panelists and a special guest who we will get to introducing in a couple minutes. But first, we're going to go around and do brief introductions. We'll start with Bob, go to Stephen, then to Adám, and finish with Marshall.

00:01:11 [Bob Therriault]

I'm Bob Therriault, and I am a J enthusiast, and this will be a very interesting episode.

00:01:16 [Stephen Taylor]

I'm Stephen Taylor. I do APL and q and occasionally get enthusiastic about them.

00:01:22 [Adám Brudzewsky]

I'm Adám Brudzewsky, and I mostly do APL, but I can get enthusiastic about other programming languages, too. Array languages.

00:01:30 [Marshall Lochbaum]

I'm Marshall Lochbaum. I've been through J and worked at Dyalog in the past. Now I work on BQN and Singeli.

00:01:37 [CH]

And as mentioned before, my name is Conor, host of Arraycast, and perennially excited, enthusiastic about all of the array languages. And with that, I think we have two short announcements. We'll throw it over to Stephen, and then I'll finish with the last one.

00:01:49 [ST]

As everybody knows, functional programming has its roots deeply in APL, and the Lambda World Conference is reflecting that and exposing those roots this year with several array language speakers. [01] I'm one of them. You can come out and hang with us in Cadiz, which the website charmingly describes as the little silver cup, the mermaid of the ocean, and one of the most beautiful cities in Spain. That's 2 to 4 October this year. Cadiz, Spain, Lambda World.

00:02:28 [CH]

And I should tack on to that that also I am giving a workshop there that's, I don't think I've announced on this podcast yet. It's entitled Tacit Programming. I got to remember the title of the talk. I think it's a workshop. It's Tacit Programming in BQN, Uiua, Kap, and APL. So I believe it's free if you're registered to go to the conference. I think it might also be free if you're not going to the conference. I have to work that out with the organizer. Anyways, link will be in the description. And my announcement on the topic of Tacit is that episode four of Tacit Talk is out. It was recorded or live streamed on Friday a couple days ago with Adám. It went 30 minutes over the 90-minute target mark, so it's two hours of Tacit Programming. We dive into many technical minutia. I learned a ton, especially, I also learned, I wrote today my first BQN, what do you call it? FGA fork using the constant modifier, which I did not know that was a pattern. If you don't understand what I just said, I don't know, maybe we'll talk about it in Tacit number seven. But the constant modifier make.

00:03:33 [ML]

It still counts as FGH though. It's just that H is a-

00:03:36 [CH]

Oh yes, technically. Well, I'm going to call it the FGA because it is-

00:03:39 [ML]

Well, you could say it's an implementation of an FGA fork. I don't know that BQN really even uses those letters, so I don't know if there's-

00:03:47 [CH]

That's true.

00:03:48 [ML]

An official terminology on that.

00:03:49 [CH]

We're borrowing the terminology from Dyalog APL and applying it haphazardly to BQN. Anyways, you can do it, figure it out, go to your local online editor. With that all out of the way, today I'm very excited to introduce our guest, Paul Teetor, who is the co-author, I believe, of "The R Cookbook," which is a book published by O'Reilly, one of the premier publishers, along with a couple of others. And I think you've also published, it was 25 R recipes or something like that. I can't remember if I saw a different publication, but Paul has been working in, I believe, the finance industry for, I think, decades now and has held multiple different positions. I think currently he is a senior quantitative analyst, but has worked as a quantitative developer, so gonna be super interested to hear about his work experience and his path towards the R language, maybe how they compare to array languages, if he has any experience with the array languages. And we've had several different folks from the finance industry before talking about different languages and how they're used there, so super excited to dive in, 'cause we have not had anyone in the 85 episodes so far that is an R expert, so interested. We have had the Julia folks on, but now we're branching out into the R world as well. So I'll throw it over to you, Paul. Maybe you can fill out your biography, take us back to however far back you wanna go, whether it's when you discovered computers, when you got into programming, and we'll go from there.

00:05:21 [Paul Teetor]

Okay, well, thank you. I appreciate the introduction. I'll tell you a little bit about myself, but actually I was thinking I would start by talking about the genesis of R and where I kinda slipped in after it was trying to take off. My background was that I came out of college all raring to go as a systems programmer, because I thought those were the demigods that built computers and built operating systems and compilers, and everyone else was just a muggle compared to those guys. And after a few years of building those things, I realized the people to whom we sold the software were making a lot more money than we were. So I got interested in application programming. And then I realized the people who were using my application programs were making a lot more money than I was. So I got interested in trading. Yeah, so I now, and then after I got interested in trading, I realized with my quant style of thinking, the only thing that would work for me is statistics. So I got a degree in statistics. So now I work at the intersection of software engineering, my original training, and finance, my next career, and statistics, my most recent training. But sort of more interesting is the history of R, because we kinda grew up together. I'm just younger, that's all. No, no, wait, wait, wait. It's younger, that's right. That's right, it's younger. So have you heard of the language S?

00:06:52 [CH]

I actually have. Everyone nodded their heads, so everyone's heard of the language S.

00:06:56 [ML]

We've heard of it because it's more related to APL.

00:06:59 [ML]

Yes.

00:07:01 [ML]

It's not because it's known in the programming world.

00:07:02 [PT]

Right, right. Yeah, if you look at the Wikipedia page, they say it was influenced by C, APL, PPL, and Fortran. I never even heard of PPL. Anybody know what that is?

00:07:13 [CH]

I have not heard of it, no.

00:07:15 [PT]

No. Yeah, okay, some mystery thing. Anyway, it was invented in 1976 by some very high-powered statisticians, actually. They were all working at Bell Labs, AT&T Bell Labs, which is an institution for which I had tremendous respect. These guys were like John Chambers, Trevor Hastie, William Cleveland. These are all high-powered guys. And there was this ethos of Unix and C at the time, 1976, very exciting, that stuff was coming out. And so they said, "The tools we have stink. Let's build our own tool." So they gave it a C-like syntax. The problem they were facing was that they had these very powerful libraries like LAPACK or BLAS, and they had Fortran. So to do even a basic analysis and a plot, the first thing they had to do was write a Fortran program that would read the data, call the shared library, save the data, or write the plot file. And this was pretty tedious. They figured out we could do a scripting language, kind of like the shell, like the Unix shell, that would cover a lot of this stuff. And I think actually one of the quirks of R is kind of humorously explained by the fact that the interpreter they wrote was so simple, it did not actually have variables. It just had files. So if you wrote x gets y plus one, it would look for a file called y, read the contents, add one to it, and save it in another file called x, which explains why variable names in R can have a dot right in the middle. So you could have like x.1, because that's a perfectly valid Unix file name. And if you come from another programming language, you're like, "Wait, wait, wait. x.y, that should be a structure whose member is y?" And you're like, "No, no, it's just a variable name." And it takes a little getting used to when you're new to the language. Anyway, word got out about this thing. Of course, they published papers. They were kind of academic types, and people were very interested, but AT&T had a miserable track record of monetizing their developments. I mean, horrible. 
So they licensed it, thank God, to a company called Tibco, which rebranded it as S+ and put a big fat licensing fee on it. I remember working at hedge funds back then where it was really a status thing, like we licensed S+. And you think, "Oh, these guys have some money." And I was at a small hedge fund. And one day I said, "This job, this particular task would be so much better if we had S+." And they said, "Oh yeah, we've got it." I was like, "You can afford S+?" And they said, "No, no, no. The last consultant in here left a pirated copy behind." So that was typically the way we got access to this kind of tool. Well, at the same time, computing power for statistical work was growing tremendously. If you know things like Monte Carlo and Bayesian inference [02] and the bootstrap, they require a lot of computing power and people were tapping into it and they needed tools to do that. And academics, for instance, didn't want to lay out a big licensing fee. Around 1993, as S was ascending, but so was all this computing power and the whole ethos of free software, open source software, a couple of guys at the University of Auckland, shout out to New Zealand, a couple of guys, Ross Ihaka and Robert Gentleman. I will note that their names both begin with R. They decided they were going to come up with a better S and it was going to be free and open source. And so they started to do this, started to write this and they advertised it. And the immediate feedback they got was, "It's not compatible with S." Oh, but this is better S. People are like, "So what? We got all this software written in S." And so they had to backpedal a little bit and make it S compatible. But one of the brilliant things they did was, under the hood, it's actually got the semantics of Scheme, which is a dialect of Lisp. And that was brilliant. You'll see, if we talk about the history of R, you'll see that's why, a big reason why it's survived to this day. 
So they released it and that's where I step in, because having decided I needed to get a degree in statistics, I went back to grad school and the people at grad school told me, "Oh, okay. We are going to teach you statistics and you have to use SAS, S-A-S, SAS." And I'm like, "This is the world's worst programming language." I'm not kidding. I hope none of you have ever had to use it. It's horrible. And I said, "Can I use R instead?" And they said, "No." And I said, "Well, wait a minute. If I can do the same computation and get the same results, what do you care what tool I use?" They said, "Because if you have trouble with R, we can't help you." And I said, "No problem. I will teach myself R. I'll solve all the problems. I will never bother you." And all my time there, I never did. But unfortunately, that dug me into a hole because it turns out the documentation for R was terrible, horrible. I spent hours digging through, trying to figure out how to solve my grad school assignments. So what happened was, school was going so quickly, I would figure out something one week, and by the next week, I would forget what I learned. So we had a little house wiki, and I would just write down in the wiki what I learned that week, like, "Here's how you do regression," and then next week, "Here's how you do multiple variable regression," next week, "Here's how you do PCA," et cetera. And I slowly built up 100 recipes on our home wiki, which is not the kind of recipe you usually find on a home wiki. And then, unfortunately, the great financial crisis hit. I got laid off, and I suddenly had a lot of time and 100 recipes. So I approached O'Reilly and said, "Can we turn this into the R Cookbook?" And they said, "Yes." And that was the next nine months of my life, was writing that. I'm proud to say the latest translation is into Russian. It's been translated to Korean, Chinese, Japanese, Polish, and now Russian. I have mixed feelings about the Russian translation. 
I just hope they're using it for good and not for evil.
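
The remark earlier in this answer about dots in variable names can be checked in a few lines of R. This is only a small sketch: the names `x.1`, `x.y`, and `pt` are made up for illustration, and it shows the syntax, not the history behind it.

```r
# A dot is an ordinary character in an R variable name.
x.1 <- 5
y <- 10
x.y <- x.1 + y      # x.y is a single variable, not member y of x

# Member access on a list uses $ instead of a dot.
pt <- list(x = 1, y = 2)
pt$y                # 2
```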

00:13:20 [ML]

They forgot to translate the letter R, didn't they? I mean, it's a Latin R, yeah.

00:13:27 [PT]

I don't know. Is it valid? I don't know. But that's what they did. It turned out my timing was really good because with this explosion of computing power, the interest in computationally intensive statistics, the thirst for free software, and now this vacuum of documentation, the book kind of took off, and I no longer needed a resume. I would just walk into an interview and say, "By the way, the publisher wants you to have a copy of my book." And I would put it on the desk and they'd say, "When can you start?" So my consulting career took off, did really well. I became one of the co-organizers of the local R users group, ended up going to conferences and doing stuff like that. And all of this because I really wanted to do trading. But here I am, having to kind of teach people how to do programming. So that's the genesis of all that. Yeah. So I jotted down what I thought were some of the key elements of R, what I thought made it great. And the big thing is that it really is focused on statistics and graphics. If you really want to do those two things, that's the language to use. I don't care what you say. It gets the job done. The second thing is it has this huge package ecosystem, over 10,000 packages available. Do anything, whatever you want. There's some package that will do it. Maybe not the way you want to, but it's open source, so tweak it and on your way. A huge factor was a company called RStudio released a product called RStudio just about the same year I published my book. And that was the first time it had a real IDE. The people there had a real vision of what data analysis could be with a tool like that. And it's excellent. If you've used VS Code and you've used RStudio, which I have, I say, yeah, RStudio is better. The problem is it just handles one language. So they're expanding it to handle multiple languages like Python at the moment. And of course, the fact it's free was huge. The academic community suddenly could get the tool that it wanted. 
The downside is the thing has this really annoying learning curve. It's just this amalgam of stuff since 1976. Because like you've talked on the podcast about languages that have a central core structure and have a consistent structure and there's somebody who's guiding it, like Iverson is guiding APL and Guido is guiding Python. Nobody was guiding R. It was just this free-for-all. And so, you'll find multiple styles, multiple packages do the same thing, a mishmash of semantics, weird stuff you can't really explain to a beginner. You have to just say to them, you just have to accept this is the way it is. Yeah, that's its biggest downfall. My favorite example is, yes, you can do object-oriented programming in R. In fact, there's six different ways to do it. Take your pick. And that, you know, that's ridiculous. Yeah. No language should be like that. Nonetheless, it has survived, as I mentioned, partly because under the hood, it's got this Lisp semantics. So, people keep finding out ways to change the semantics of the language and adapt to more and more things as we go along. It's really like a chameleon. A really ugly chameleon, but a chameleon.

00:16:44 [CH]

So, I'm very curious because I, what year was it? 2010? When I was in my undergrad, I was, I actually studied actuarial science, which R was massive in the actuarial mathematics world. Specifically, I remember in several of my statistics and actuarial courses, there was, as you mentioned, a package specifically for the actuarial extended world of probability distribution functions. And so, they had every single PDF and CDF you could possibly want, like things you'd never heard of, like Weibull, et cetera. Everyone's heard of normal, et cetera. And so, anyways, there was a lot of R programming, but that was really the last time I touched R. And so, I'm not familiar, like, you know, similar to other languages, like, you know, C++ is, you know, on C++ 23 and they're working on C++ 26 and Java, I think the latest is Java 22. But back when I used it, it was Java 8. Are you able to give us like a, I imagine you're familiar as, you know, you're doing training in R. I think I saw that, you know, at one point, R added the pipeline operator. So, I know that they've been doing like language developments in R and R, probably the R that I know from 2010, 2011 is not the R that exists today. And I think also too, I've had people on Twitter say that if we ever do talk to someone from the R world, we should ask them about like the, I think it was the purrr functional programming library, P-U, yeah, triple R. And that like, there's a kind of ecosystem of like functional programming inside of R. Anyway, so like, what's the, if you're able to tell us like the state of R, you know, is it R, you know, 10, R4? Like, I haven't, you know, stayed up to date with R. And so, I imagine that it's much different and it's evolved over the last, you know, decade and a half since I've touched it.

00:18:33 [PT]

I'll start with a blanket answer to your question, which is that because it's evolved through so many different hands, it supports almost every style of programming. Yeah, and you're right about the pipe operator. That's actually kind of a relatively recent addition and thank goodness for it. It just streamlines so much. I mean, talk about tacit programming. [03] That's, to me, that's the best example in R of doing it. We're up to R release 4.4, I believe. The major releases are kind of slow. They do a lot of the dot releases, do a new dot release every six months. And you're right about those packages and the distributions. It just triggered in me a memory. One of my biggest clients was a banking center here in Chicago that was trying to use SAS and they needed this one distribution for their risk analysis. And they found out that R had this distribution in it. So they wrote this SAS dropout program that would invoke the R interpreter, do the calculation and return the number back to SAS so that SAS could continue. And that's all they did with it. And then somebody started to notice, you know, we could actually do a lot with R, more than just calculate a number. And they hired me to translate all of their SAS code into R, which made their accounting department happy as they dropped the $50,000 a year licensing fee. But yeah, that's real typical. Is there some esoteric thing in R that you look at and say, I've got to have that. And that draws people in. As for programming styles, as I said, it seems to support everyone. I jotted them down as I was thinking about being on the show. Obviously, there's the imperative style it got from C and Fortran. There's an object-oriented style it got because, you know, C++ was growing up at the same time. And as I mentioned, there's six ways to do it. I use a style called functional programming style in that, which is a little more rare in R, but it's the style I use. It's got plenty of vector mathematics, matrix mathematics. 
It has the pipeline, so you can be fairly terse. It also has reactive programming, if you know what reactive programming is. So what I do is actually a combination of object-oriented, reactive, and functional all together in my programs. And they complement each other very well. I was inspired by Scala. If you've ever used Scala, you know it actually melds these multiple methodologies into one language. And the way they did that was great. So I just try to emulate that in my own work. Did I answer your question?
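
As a rough illustration of the pipe mentioned in this answer, the sketch below assumes the operator meant is base R's native `|>` (available from R 4.1) and not magrittr's `%>%`. The variable names are made up for the example.

```r
# Nested calls read inside-out ...
nested <- sum(sqrt(c(1, 4, 9)))

# ... while the pipe reads left to right: take the vector,
# take square roots, then sum them.
total <- c(1, 4, 9) |> sqrt() |> sum()

total               # 6
```

Each `|>` passes the value on its left as the first argument of the call on its right, so the two forms compute the same thing.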

00:21:08 [CH]

I think so, yeah. And I'm not sure, are you familiar with the purrr library? Like, I don't know much about it. I think the extent that I've seen is like there's a little cheat sheet kind of thing that people will tweet and be like, because I think folks on Twitter know that I'm a big fan of outside of array languages, just like a declarative sort of functional style of programming. And apparently there is a big community within R. Like, I think there's even in other languages, like I saw recently, there's a functional Swift conference. So like Swift, the language out of Apple for iOS programming, is not necessarily a quote-unquote functional language, but there is a big community within the Swift community that focuses on programming functionally with Swift, like not like functional programming style. Not, I guess you hope everything is functional when you're programming. And it's the same thing with Rust. I think one time I made a YouTube video of, you know, top N functional programming languages, and I included Rust. And then like half the comments were like, "Rust is not a functional language." And it's like, well, it's not, but also you can do a ton of stuff if you stay in the iterators or the iter tools land. It can feel very functional. And so, yeah, like, do you know what, so it sounds like you can do any paradigm. It's a multi-paradigm language. Is there a primary, you know, group of folks that, you know, make up the majority or is it kind of just everyone's got their different style? Like what is the idiomatic or is there no idiomatic? It's just get your, you know, results and numbers crunched. The way you do it is up to you.

00:22:40 [PT]

Yeah. Great question. So to go back and answer, yes, purrr is the package of choice for functional programming. That's another great example of the evolution of the language. There's something called core R, which is developed by this committee and maintained. They do a fabulous job. And then there's the tens of thousands of packages around it. So the core actually has some basic functional programming stuff in it, like a reduce operator, a map operator, that kind of thing. But they're not as programmer-friendly as you would hope. They were kind of glued on. So somebody named Hadley Wickham, also from New Zealand, has led a charge to try to standardize the way things are done in R. He started out by standardizing the data. He called it the tidy data format. Data frames are a huge thing in R. That was a really big addition back in 1976, because that's what statisticians think about is tables of data like that. And he nominally invented this thing called tidy data. You would just call it a normal form if you're into SQL. And he's built a series of packages around that. There's one called R, excuse me, called purrr, which does functional programming tools for you. I use it in almost all my programs. It's got your great, you know, the map and the reduce and the accumulate and all the operators you need to access data structures. It's great. And anyway, the universe that's grown up around that concept is called the tidyverse. And yeah, there's a lot of packages that work that way and interacting with new developments like the pipe operator. As for style, no. Everybody has their own style of writing R. If you're a fan of Hadley, because he's done all this great work, you say, "Well, I'm going to be just like Hadley and write that way." Yeah. But I'll tell you, when we start a project in R, the first thing is we try to set up code standards. And the next thing that happens is everyone ignores them. It's because it's such a mishmash. 
You know, like, "Well, but I like doing it this way." You know, "Well, you know, I like doing it that way." And then eventually you have to review somebody else's code and you start to realize, "Oh, we really need to like make our code readable to each other." And slowly, you'll coalesce on a style. But in fact, there's a five-page Google style guide for R. Like I said, the first thing you do is everybody says, "Yeah, we want to use Google." And then they start taking out sections of it and say, "Yeah, but we won't do that."

00:25:03 [CH]

Interesting. Yeah. And I did pull up the Purrr.pdf cheatsheet that's on the website. And I do recall now; I don't remember the individual, because this was a number of years ago. I have this GitHub repository where I compare array languages and R is one of the ones I look at. And I think I was looking for four higher order functions: "reduce", what most languages call "scan" (but some call "accumulate", including R, or at least purrr), and then I think it's outer product and one other one. When I first looked into R, I was looking in the core world. And I think the way that I ended up finding both the reduce and the scan was that they were both called reduce [Paul laughs]. And whether it scanned or not, or kept the intermediate values was a boolean flag that you passed to the parameter list of reduce, which I kind of died a little bit inside. Because to me, those are two completely separate operations. And then I think I posted that on Twitter and then someone said: "oh, yes, that is true. But hey, go and look at this because this is a much nicer way, especially if you're coming from the array languages, to think about this stuff."
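
The boolean flag described here can be seen in base R's `Reduce`, whose `accumulate` argument switches between a fold and a scan. A minimal sketch using only base R follows; purrr exposes the two as separate functions, `reduce()` and `accumulate()`, which are not called here so that the example runs without extra packages.

```r
xs <- 1:5

# Plain reduce: fold the vector down to one value.
total <- Reduce(`+`, xs)                        # 15

# Same function, but accumulate = TRUE keeps every intermediate
# result, which is what array languages call a scan.
running <- Reduce(`+`, xs, accumulate = TRUE)   # 1 3 6 10 15

# For addition, the scan matches base R's dedicated cumsum().
all(running == cumsum(xs))                      # TRUE
```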

00:26:20 [PT]

Right. You might find some other inconsistencies in the function calls like that. Like you said, people who did the core said: "oh, we need some functional programming tools. Let's add this one." "Oh, okay; let's do that."

00:26:32 [CH]

I think, Bob, you were going to ask a question.

00:26:33 [BT]

Well and you led me right into it because you were looking at array languages and whether R is an array language. And of course, that's a question that's come up. In fact, we've been trying to get somebody to talk about R for a while. And thank you so much, Paul, for coming on because you're obviously a great choice. But I did talk to John Chambers and John Chambers told me: "yeah, I'm not going to come on the podcast". R is not an array language in his way of looking at it. And of course, as you say, there's all different windows of looking at it. But do you think most people would look at R as an array language or a language that works with arrays? What do you think?

00:27:09 [PT]

No way. In fact, when you first approached me, I was like: "you got the wrong guy; this is not an array language; you need to go over to that field over there." And honestly, your invitation to be on the show did make me stop and think and realize, actually, it is kind of an array language [chuckles]. It's just, it started out that way, but it's just grown into so much more. When you talked about the Matlab [04] episode on the show, it talked about: it just started out as some way to do cool array arithmetic. And then it slowly evolved into this full scripting language, which is, to me, ridiculous. It's like when people complain: "well, Excel is so hard to use". I'm like: "well, probably you're using it for the wrong thing; like you're trying to write a web server in Excel or something like that, you know?" Get it straight. This may seem like a diversion, but I think it's kind of interesting.

00:28:03 [BT]

We live on diversions. [everyone laughs]

00:28:05 [PT]

Okay [laughs]. If you're beginning with R, the first thing you learn is that everything is a vector. So, I can legitimately say, yeah, it's a vector language because there's no difference between a scalar and a vector and it does element by element operations very well. And I use that all the time. In fact, it's a key element of the language, especially if you're doing data frames. Then after a while, you start to realize: "no, wait, everything's actually a matrix because the dimensions are flexible, so, I can actually make everything be a matrix". And as you keep going, you start to think: "well, so data frames and matrices are the same thing, right?" And then you realize, "no, no, no, no, they're different". So, then you start to realize everything is either a data frame or a matrix. Then your next level of understanding is: "no, no, wait, wait, wait. Data frames are actually just lists of vectors. Therefore, everything's really either a list or a vector or a list of lists." And so, the next level, like I said, as you understand, everything's a vector or a list. Then you realize, "No, everything uses lazy evaluation," which is one of the semantics they picked up from Scheme. And you start to think, "No, no, everything's just an expression waiting to be evaluated". It's a promise, as they call it; a promise to be evaluated. So, then you start to realize: "okay, R is just Lisp for vectors with a friendlier syntax, helpful data structures, lots of C, C++, and Fortran under the hood, and everything's a vector or a list". [chuckles] And if you get to that level of understanding, you really understand the language and how to use it. I can't emphasize enough the importance of data frames. Back when they were invented, that was really a big thing because that, like I said, is how statisticians think about their work.
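
A few of the levels of understanding described above can be checked directly in base R. The sketch below is illustrative only; the data frame contents and the helper `first_arg` are made up for the example.

```r
# A "scalar" is just a vector of length 1.
x <- 5
length(x)                       # 1

# Arithmetic is element by element.
v <- c(1, 2, 3)
doubled <- v * 2                # 2 4 6

# A data frame is a list whose elements are equal-length column vectors.
df <- data.frame(id = 1:3, score = c(9.5, 7.0, 8.25))
is.list(df)                     # TRUE
lens <- sapply(df, length)      # 3 for each column

# Arguments are lazy: b is a promise that is never forced,
# so the stop() call inside it never runs.
first_arg <- function(a, b) a
result <- first_arg(1, stop("never evaluated"))   # 1
```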

00:29:45 [BT]

And you were saying that data frames are not the same as matrices. What's the difference between the two?

00:29:50 [PT]

That's correct. So, a matrix has got to be homogeneous. It can be all integers, all reals, all strings, all functions, all lists, anything, but it has to be homogeneous. A data frame is just like an SQL table. Each column has a single type, and every value in that column must have that type. But you can have any type column you want. In fact, you can have a data frame of data frames if you want, but the columns have to be consistent.
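
The difference can be seen by giving a matrix and a data frame the same mixed columns. This is a small base R sketch with made-up values; `stringsAsFactors = FALSE` is spelled out so the result is the same on older R versions.

```r
# A matrix has one element type, so mixing integers and strings
# coerces everything to character.
m <- cbind(1:2, c("a", "b"))
typeof(m)                       # "character"

# A data frame keeps a separate type for each column.
df <- data.frame(n = 1:2, s = c("a", "b"), stringsAsFactors = FALSE)
col_types <- sapply(df, class)  # n: "integer", s: "character"
```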

00:30:16 [BT]

So, the data frame's more like a structure, and it happens to be a matrix structure.

00:30:20 [PT]

It has to be rectangular. That's enforced by the language.

00:30:23 [BT]

Okay. And when you were saying the list of vectors, to me (and I'm not a k programmer), but that sounds to me like k or q, where you're talking about vectors. And you can do things because you're building a matrix, but you're building as a list of vectors. And I'll let Stephen, Adám, or Marshall correct me on my misunderstandings [chuckles].

00:30:44 [ML]

Yeah, it sounds very similar. How I would describe a k or q table is that it is implemented as a list of vectors in this terminology, but it presents itself to the user as the transpose of that. So, as more of a two-dimensional thing where the main axis is the vector length, which is really the inner axis, and the secondary axis is those columns, the list axis.

00:31:16 [BT]

So, you get the advantage of the column tables, and it's quicker that way, I think, isn't it?

00:31:21 [ML]

Some things are not. Like, if you want to pull out one row, you have to visit each column and pull out the elements. But the sorts of things that you want to do in k are much faster.

00:31:28 [CH]

And if I'm not mistaken, you can end up with a table in k or q where the list of vectors [are such that] each vector technically can have a different length. Is that correct?

00:31:43 [ML]

I don't know if it lets you do that with a table. It probably depends on the k implementation.

00:31:47 [CH]

Stephen was shaking his head. No, so.

00:31:49 [ST]

Not if the vectors you speak of correspond to the columns of a table. They must all be the same length.

00:31:58 [CH]

Okay. Then it basically is kind of morally the same thing as a data frame. Because I was going to say as a data frame, you always have the same length columns. And if you've got some missing data, you might have some nulls. I'm not sure how you represent that in R. But something like that whereas in k I was thinking was different. But it sounds like, yeah, morally they're the same thing.

00:32:17 [ML]

So I'm wondering when you were talking about matrices. But then you said everything's a vector or a list. So I'm wondering what the relationship is between a matrix and a vector. Like, is a vector necessarily one dimensional or is a matrix a vector of vectors or what's going on?

00:32:32 [PT]

That is a really great question. Really, because I actually have a question for the rest of you all. Because this is something I've never seen before. So what I said was technically correct: when you have a homogenous data structure like a vector, everything is a vector. And the way they make an array out of that or a matrix, they attach an attribute to the vector saying: "here are the strides; here are the dimensions of this thing".

00:33:00 [ML]

Ah, yeah.

00:33:01 [PT]

Internally it just divides up that vector. But the weird thing is the user can say: "I think I'll change the dimensions to be this instead". And with a single attribute change, suddenly reorganize the vector into a different size matrix. Now the underlying logic, like the matrix multiplication, will recognize there might be some problem. It'll get it right. But I've never seen anything like that in a programming language where the programmer can actually control the attributes of the data structure and change the semantics. Is this unique or have you all seen this in other places?
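
[Transcript note: a minimal Python sketch of "a vector plus a dimensions attribute". The class `DimVector` is hypothetical. It assumes column-major order, which is how R lays out matrices, and it chooses to reject a shape whose product does not match the data length; that check is a design choice of the sketch.]

```python
class DimVector:
    """Hypothetical model: flat data plus a user-settable 'dims' attribute."""

    def __init__(self, data, dims):
        self.data = list(data)
        self.dims = dims          # goes through the setter below

    @property
    def dims(self):
        return self._dims

    @dims.setter
    def dims(self, new_dims):
        rows, cols = new_dims
        if rows * cols != len(self.data):
            raise ValueError("dims do not match the length of the data")
        self._dims = (rows, cols)

    def at(self, i, j):
        """Zero-based lookup, column-major: one column is stored after another."""
        rows, _ = self._dims
        return self.data[i + j * rows]

    def rows(self):
        """Materialise the current view as a list of rows."""
        r, c = self._dims
        return [[self.at(i, j) for j in range(c)] for i in range(r)]


m = DimVector(range(1, 7), (2, 3))
before = m.rows()    # [[1, 3, 5], [2, 4, 6]]
m.dims = (3, 2)      # only the attribute changes; the data is untouched
after = m.rows()     # [[1, 4], [2, 5], [3, 6]]
```

[The same six values read as a 2-by-3 or a 3-by-2 matrix depending only on the attribute.]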

00:33:34 [ML]

So NumPy [05] also has a strided representation and it has this as_strided() thing that lets you mess with the strides. So that might work similarly. They tell you not to use that [chuckles].

00:33:46 [PT]

Good advice [laughs].

00:33:47 [ML]

So how it works in APL is that we have the same representation. We don't use strides usually, but we have the different axis lengths. So internally it's stored as this shape plus all the data all in one dimension in memory. But if you wanted to reshape that, you would call a primitive ("reshape") that's not defined in relation to how it's represented internally. Well, it takes your data and changes it to a different shape. Or you can transpose and move the axes around. But I mean, these are defined abstractly and then it's the implementation's job to translate them to what it does on the shape and the data.

00:34:26 [PT]

Thank goodness.

00:34:28 [AB]

But I think you get a bit of the feeling of that for those APLs that have what we call modified assignment. Something that you have in many languages is you can write variable plus equals some value and it will in-place add to that variable. And so too, APLs tend to have the ability to write a function on the left of an assignment arrow, which is just like R. And we also have a higher order function that can modify all the functions that take two arguments that are infix. So we take one argument on each side and for the reshape function, it takes a shape on the left and the values to be reshaped on the right. Now, if we put all that together, then we can construct a new reshape function that is just like the old reshape, but with swapped arguments. And now if we write "variable, reshape, swap arguments, assign, and then a new shape", then at least conceptually, I'm in-place reshaping to a new shape. And the interpreter can be optimized so that it actually does the reshape in-place or not. But from the user's perspective, it certainly looks like I'm just controlling the shape information and leaving all the data as it was before.

00:35:52 [PT]

Yeah, I like that because at least the interpreter core is in charge of the data structure, unlike this attribute thing where the user controls the attributes. To me, it's insane. I don't know why they did it that way.

00:36:02 [AB]

So what happens if you try to give it a shape where the product of the shape doesn't multiply (it doesn't end up being the number of elements)?

00:36:12 [PT]

That is so scary that I've never tried it.

00:36:15 [AB]

[laughs] You've got homework!

00:36:17 [PT]

Yeah, I don't know, the machine would explode or something. I don't want to even go near that.

00:36:23 [BT]

If there's functional programming, this is dysfunctional programming [laughs].

00:36:27 [AB]

Well, in APL, it's well-defined because the reshape function ... [sentence left incomplete]

00:36:32 [ML]

I would not describe it as well-behaved, actually [chuckles].

00:36:35 [AB]

I stopped myself! I said, well-defined. I didn't say well-behaved [others laugh].

00:36:39 [ML]

I just wanted to be explicit about that.

00:36:41 [AB]

Yeah, yeah, yeah. Well, it's well-defined because reshape is defined to either stop short if it doesn't need any more data or recycle it from the beginning if it needs more data. And then there are some variations on it. Which, by the way, J is getting an extension to that.
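
[Transcript note: a minimal Python sketch of the reshape rule just described: stop short when there is too much data, recycle from the beginning when there is too little. The function name `reshape` and the nested-list result are choices of this sketch; it assumes a non-empty shape of positive lengths and non-empty data, and ignores the variations mentioned.]

```python
from itertools import cycle, islice
from math import prod


def reshape(shape, data):
    """Arrange `data` into nested lists of the given shape, row by row.

    Takes exactly prod(shape) items: extra items are dropped, and the data
    is reused from the start if there are too few.
    """
    flat = list(islice(cycle(data), prod(shape)))

    def build(dims, items):
        if len(dims) == 1:
            return items
        step = len(items) // dims[0]   # number of items in each sub-array
        return [build(dims[1:], items[k * step:(k + 1) * step])
                for k in range(dims[0])]

    return build(list(shape), flat)


recycled = reshape((2, 3), [1, 2, 3, 4])   # [[1, 2, 3], [4, 1, 2]]
truncated = reshape((2, 2), range(1, 10))  # [[1, 2], [3, 4]]
```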

00:36:59 [BT]

It is. Thank you, Adám.

00:37:06 [AB]

You're welcome.

00:37:03 [BT]

Adám suggested that to Henry and it is in the newest version of J.

00:37:07 [AB]

I'm collaborating with the competition.

00:37:10 [BT]

Yes, you are [chuckles]. And there's things you can do with infinity in the shape, which are really interesting as well.

00:37:15 [ML]

It doesn't actually make an array with shape infinity, it should be pointed out.

00:37:19 [BT]

No, it doesn't. No, no, no. It's using infinity to create a behavior.

00:37:24 [AB]

Yeah, it's just a special value that happens to look like a blank.

00:37:28 [PT]

So, rewinding like 10 subjects [Bob laughs], when I reacted to Bob saying, I don't really think this is an array language, I had to stop and think. Like the week before, I was doing some principal component analysis in matrix work and looking at the results and graphing the results. And then this week, I've been looking at portfolio optimization using, again, matrix arithmetic and lots of all my matrix operators in front of me. The more I got to thinking about, like, well, actually, [chuckles] the array language core is still alive and well in R. It's just like so many things, 1% of your program touches that core and the other 99% is doing things like getting the data and organizing the data and showing the data to the user and asking the user what to do with the data and saving the data. And then finally: "oh, okay, here's this array expression that actually computes what the guy wants here; here's the result". So, [the] more I thought about it, yeah, okay, I'll go with the array language on this one. And we'll call it Array Language Plus. How about that?

00:38:36 [CH]

It's interesting because we've had (I don't know how many times) a conversation on this podcast. We've dedicated definitely at least a whole podcast to it and it's come up multiple times before ... of what makes an array language an Iversonian array language, which is sort of to say, inspired by APL. And we definitely consider J, APL, BQN all Iversonian languages. q and k, I think we do but we've had discussions about it. But then I have this sort of diagram that shows the inner circle of Iversonian array languages and then an outer circle of array languages. And we've never actually discussed what makes a language an array language, I don't think. And, oh, yeah, here we have Adám sharing the graphic. We'll also put it in the show notes for folks. But I've put R, NumPy (obviously NumPy is just a library, but you can consider it language-esque) Julia and Matlab. And I guess I've always just thought that if you have a data type that is kind of an arbitrarily ranked array and you can do element-wise operations and potentially element plus vector operations or vector-vector operations, then that's what qualifies you as an array language. Because if you think of most programming languages, they don't have this data type as a built-in data type. You either need to load some kind of library that does operator overloading. Anyway, so as we're having this discussion of: do the folks in the community consider it an array language and do we consider it an array language? I wonder if it's worth asking is it anything more than that sort of core data type that qualifies you to be a quote-unquote array language? Not Iversonian, but just an array language, period.

00:40:32 [ML]

Well, I will say, if there's the array language idea that everything is an array, in R, I guess you've got something similar, but you've also got this breakdown that you mentioned that every array also has these attributes, so in some way, it's an object. I mean, I'm not saying it prevents you from effectively being able to do array programming, but it sort of undermines that idea that this is an array language when you say: "well, we have arrays, but they're actually built from some more fundamental components".

00:41:10 [AB]

But I disagree, Marshall, because who is to say that they're built from that? We have something at Dyalog internally called John's Big Toe, and it's John Daintree specifically, and he has this big theory of everything. And the thing is that over the years, people have wanted all kinds of meta information about all kinds of things. And so he got this idea that we could have something like the dot that people generally have in objects to get properties of an object, and it would allow us to get some properties of some name or symbol or something. The idea is you should be able to put this dot next to anything, hence the theory of everything. And therefore, it's not following normal APL syntax at all. And it would then allow exposure, possibly even modification, of various properties of all kinds of things. But if you were to do this (in fact, I've seen a prototype of this running) then surely we wouldn't say that everything was constructed on top of such objects, because it was just something that was patched on top of a language that was all arrays before that.

00:42:26 [ML]

I'm not sure, because the thing is it can expose facts about the implementation that otherwise wouldn't have been there, really. It's one thing to have a feature that reaches into the implementation that you can use if you really need it. But if you're doing a lot of programming in terms of this, then your program becomes not a program that works in terms of arrays, but that works in terms of properties of arrays. So I guess it depends on what the use of it is exactly. But I don't agree that adding a feature can never change the fundamental basis of the language. I think if you have a layer under ... [sentence left incomplete]

00:43:13 [AB]

You're saying that the usage of the language has an effect on what kind of language it is?

00:43:18 [ML]

Yeah, well, I mean, it depends on ... [sentence left incomplete]

00:43:21 [AB]

That means two different people can use the same programming language; two different places in the world with no connection between each other. And for one person, the language is this type of language, and for the other person, it is another type of language, but it's the exact same language.

00:43:35 [ML]

Yeah, yeah, absolutely. I mean, you could have a language that the definition of the language is that if the file starts with a declaration that the file is in APL, then it's evaluated in APL. But if it starts with a declaration that the file is in C, then it evaluates in C. I mean, so yeah, then is it an array language or not? Well, it depends on whether you use the APL half or the C half.

00:44:00 [AB]

Okay. I mean, I don't know if I would say it's the same language. That would make almost all the world's programming languages the same programming language because in Bash, you just start with a hashbang that tells what the rest of the file is about [chuckles]. So you just made them all into a single programming language?

00:44:18 [ML]

Well, I mean, yeah, yeah. So you could say that Bash is actually ... [sentence left incomplete]. I mean, it depends on what programs you have installed, of course. But yeah, so Bash is a language that can be used in very different ways. Maybe in many cases, Bash programming actually looks like "sed" programming, which is a totally different paradigm.

00:44:34 [PT]

Can I propose a hierarchy here? Talking about languages, that's a really big subject. When you talk about is this the same as that, what comes to mind for me is Turing completeness. I look at the list of languages on there, and I suspect all of these are Turing complete. [06] They could all calculate something that the other language could compute. The question is, first, do they have the facilities that lubricate that process and simplify it? And the second thing is, do they have the expressiveness that makes you want to use them because it expresses itself the way you think about it? I think you talked about this in a recent episode where the expressiveness of the language has to match your conceptual model. Otherwise, you can't put down on paper what you're thinking. So if I think about it that way, like the Turing completeness, yeah, they are. Facilities: do they have it? Can it do a matrix arithmetic even basically without writing a for loop? And then expressiveness: can I write in this language to do that? I think, especially thinking about R, if there's a core of the language that has those features, then suddenly I'm willing to say, sure, it's kind of an array language. And by the way, you get all this other stuff for free. Adám, you had an observation?

00:45:57 [AB]

Yeah, well, I like that. And I think that Conor should add another circle (or ellipse or whatever it will be) on this diagram. And something I wanted to ask about for R (because I've never tried even using R) and that is in APL and J and BQN, I can have a multidimensional array, say a matrix, and I can apply a function to each element directly. So we have something we call "each" or some equivalent of that. And no matter how many dimensions the array has, we directly with a single level of "each" application, access all those elements there. So it doesn't matter if it's a vector, one dimensional, or a matrix, two dimensional. This is not so in k and q and in languages I would call vector languages. It's not so in programming languages that have what they call arrays that are just lists. And you can have lists of lists. Yes, you can map over them. But if you want to access the leaf elements of a list of lists, you have to map twice. And I would like to know whether R allows that when you have a matrix or a list of list table; whatever the various structures are. And I would also like Conor to add the circle.
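
[Transcript note: a minimal Python sketch of the distinction being drawn. With a list of lists, reaching the leaf elements takes one map per level of nesting; with an array type that knows its own shape, a single "each" reaches every element. The `Matrix` class and its `each` method are hypothetical and stand in for no particular language's primitive.]

```python
class Matrix:
    """Hypothetical rank-2 array: flat storage plus a shape."""

    def __init__(self, rows):
        self.shape = (len(rows), len(rows[0]))
        self.flat = [x for r in rows for x in r]

    def each(self, fn):
        """Apply fn to every element in one step, keeping the shape."""
        out = Matrix.__new__(Matrix)   # skip __init__; fields are set directly
        out.shape = self.shape
        out.flat = [fn(x) for x in self.flat]
        return out

    def to_rows(self):
        """Convert back to a list of rows for display or comparison."""
        r, c = self.shape
        return [self.flat[i * c:(i + 1) * c] for i in range(r)]


def double(x):
    return x * 2


nested = [[1, 2], [3, 4]]

# List of lists: the caller has to map twice, once per level of nesting.
doubled_nested = [list(map(double, inner)) for inner in nested]

# Array with a shape: the caller applies the function once.
doubled_matrix = Matrix(nested).each(double).to_rows()
```

[Both give the same values; the difference is whether the caller has to spell out the nesting.]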

00:47:27 [PT]

So if I understand your question, you're asking, can I slice and dice a matrix or an array and just operate on sub-slices of that and transform sub-slices?

00:47:39 [ML]

No, no, he's asking the opposite. Can you just ignore any sort of slicing and act on all the elements?

00:47:46 [PT]

Oh, well, yeah, you can do that too. Yeah, you can just ignore the structure if you want to, and just treat it like it's just a big vector of numbers.

00:47:54 [AB]

If I have a matrix of numbers and I have a function that takes a number as argument and prints it, and I do some kind of (I don't know how you write it in R) but mapping, "each-ing", whatever you do on that matrix with that function, then it will print all the numbers, one number on each line separately from the whole matrix?

00:48:15 [PT]

If that's what you want, yeah, that's almost trivial, but it is trivial.

00:48:20 [AB]

It's decidedly non-trivial in, say, k or q, even though they're marked as basically core here in the diagram and Iversonian languages.

00:48:30 [ML]

Somewhat non-trivial, it's not that hard.

00:48:33 [AB]

Is it not?

00:48:34 [ML]

Well, you could flatten the array or something like that.

00:48:36 [AB]

Yeah, but then you didn't actually do the job anymore. You flattened it first manually. That's the whole point. Then any language can do this by flattening it first.

00:48:44 [ML]

Well, I didn't say it was ... [sentence left incomplete], it's not trivial, but it ... [sentence left incomplete]

00:48:48 [AB]

The question is, do you have to manipulate ... [sentence left incomplete]

00:48:49 [ML]

You have to implement it, it's not built in.

00:48:52 [AB]

Yeah, do you have to manipulate the function or the array in some special way for this to happen or not?

00:49:00 [ML]

I'm thinking in R, it's probably just map, right?

00:49:03 [PT]

It's pretty darn close. The indexing scheme lets you bypass the dimensionality scheme and you can just say: "I just want a single element at a time". I mean, really, the language is a mess. You can do all sorts of stuff. And so if somebody says: "well, can you do this?" My answer is almost always going to be: "yeah, do you really want to do that? No"

00:49:28 [ML]

Yeah, well I like the idea that paradigms apply more to your mental model of the language than the actual language itself, like the definition of the language. I mean, so I would probably feel comfortable saying, if you have a mental model that's array programming and you use R in this way, well then, yeah, it's an array language, but it's also perfectly possible to program it in a completely other style that has nothing to do with array programming.

00:49:56 [PT]

To me, what comes to mind when you say something like that is I had one client that was another money center bank, and when I arrived, they had a code base of 50,000 lines of R, which if you know the language, that's a lot of intellectual property; that's really dense. And they had never learned Python. [laughs] So they had written all this stuff like report generators and web services and stuff that really should be done in Python. And somebody, I don't know, in the group said, "Let's try using Python." And within months, a lot of that R code just evaporated and got put in Python where it should have been in the first place. So yeah, I mean, you can do a lot of these things with it, but the right tool for the right job is the whole story of this podcast, right? We got this job; what's the right tool for doing that?

00:50:47 [ST]

Something you were saying a few minutes ago has been niggling at me in that regard. You spoke about considerable quantities of R code, but it was all setting up data, importing it, getting user requirements and so forth. And then you said, "And there'd be this little bit, which actually does the stuff that you want, but it's just this little bit of code." But when you thought about that, you thought, "Oh yeah, that's the array program."

00:51:15 [PT]

Yes.

00:51:17 [ST]

And it suggests to me that the criterion for this question, which we're always circling around, what is [and is] not an array language, is not whether it will do arrays. The question, the criterion may perhaps be what you don't have to say.

00:51:31 [PT]

Yeah, that's good.

00:51:32 [ST]

So when you set, when you assign a variable, when you create an array, do you have to say, "It's an array"? I can set up an array in Fortran, sure, but this is how you do it. In APL, if I assign a variable, it's an array. If I want to add two arrays in Fortran, you could do that. Here's the loop. But in APL, you don't have to say anything about it. The primitives handle it. And what looked in your example, like, "Well, I guess there's an array bit in this software, but it is not very much," is actually the giveaway. Because you don't have to say very much, it's a tiny bit of code. And it's code which you can be very confident about, because you didn't have to write a lot of loops and do a lot of ceremony.

00:52:24 [PT]

Right. Something I've heard you all touch on, but the software engineer in me feels like you don't talk about it enough, is readability. Which is, there's expressiveness, which is, "Can I express what I want to express in this language?" And I touched on that earlier. But then the person who's reading it, will they understand what I was trying to express? And I think your point about what the language just can do without saying much, tells you, "Oh, yeah, I can express this kind of array work here, and everyone will know exactly what it means," as opposed to having to figure out, "Well, see, this is a loop inside a loop inside a loop. What the heck is it actually doing?" I think that's key. A term I like to use is software programming in the small versus programming in the large. Programming in the small is when you're just writing something for your own benefit. You understand the idiom, you know what you're doing, you don't care what anybody else thinks about it. But programming in the large is when somebody else is going to have to look at this or maintain it or think about it, even after you're gone. You're a team. And if you don't have shared idioms that everybody can understand, then you're in trouble because the software becomes unmaintainable. And your point about, "Is it expressed in the language implicitly and understood? Are the semantics well-defined in the language?" That means that, yes, you can do software programming in the large and people will know exactly what you're trying to do. Because it's defined in the language, not made up by us.

00:53:50 [CH]

Do you find that that is, across the R community, something that comes naturally? That idioms and, you know, because you've said a couple times that the style, everyone's got their own. For certain aspects of the language, it's been building on top. It started with the core, but then there's a bunch of different paradigms. Do you consider it a readable language compared to languages or sibling languages like Python or Julia? [07]

00:54:20 [PT]

I'm sorry, do I consider it a what kind of language?

00:54:23 [CH]

A readable language. How does the readability, because that's what we were just talking about, in that you need shared idioms, otherwise you're going to have a problem if you're programming in the large.

00:54:31 [PT]

Yeah, that's a great question. I'm trying to think of a programming language I've learned where you could not write a mess. And I think every programming language, I can mess up almost anything and make it unreadable. R had some quirks, has some quirks, because the statisticians using it had some very specific needs. A simple example is in Python, if you use a negative index, that means, okay, I want to index from the end of the vector. And programmers are like, oh, that is so convenient, I can get at the right element I want. To a statistician, a negative index means remove that element. Kill that, because that's what statisticians are worried about, is trying to pick out the data that's important. So they wanted a mechanism for throwing it out. And I've heard so many complaints from Python programmers: "That is the stupidest semantics. Negative should mean the end of the vector." And I'm like, dude, you don't understand what people are trying to do with the language. But the result of quirks like that is, yeah, you can write code that is just mind-blowing. Like, let's see, it adds these elements, it includes these elements, and it removes those elements, and then it does the transpose, and then it does the thing with this, and then it slices out that part of the array, and pretty soon you have no idea what this thing is doing, especially because the language is untyped. You get some function that takes some arbitrary data structure, you don't know whether it's a matrix or a data frame or an integer or whatever, and starts slicing it up. And frankly, lots of times I would be teaching a class and somebody would ask me, what happens if you do this? And I say, I have no idea. We'll have to run the code and find out. Because you can't read it and deduce what the result is. I'm a huge fan of statically typed languages, and unfortunately I'm stuck in an untyped world.
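
[Transcript note: a minimal Python sketch contrasting the two meanings of a negative index described above. The helper `r_style_drop` is a hypothetical name that mimics the "remove that element" reading with a one-based position, as in R; the sample values are made up.]

```python
def r_style_drop(xs, position):
    """Return a copy of xs without the element at the given 1-based position."""
    return [x for k, x in enumerate(xs, start=1) if k != position]


values = [10, 20, 30, 40]

from_the_end = values[-1]                 # Python reading: last element
without_first = r_style_drop(values, 1)   # statistician's reading: drop element 1
```

[The same "minus one" selects a single element under one reading and everything except one element under the other.]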

00:56:24 [CH]

I always say the same thing too, is that I prefer statically typed languages. However, I think almost all of my favorite languages are dynamically typed, so I'm not sure what that says about, you can just be so productive with dynamic typing.

00:56:37 [PT]

Yeah, you can dig your hole really, really quickly in an untyped language.

00:56:42 [CH]

Yeah. I heard someone once say that personally they like using C++, but they'd never use it at a company, because they were a software manager, and so on their team they use Rust. Because as an individual, you can do and go as fast as you want, and not make mistakes, etc. But then when you're working with people, C++ does not hold your hand, whereas Rust, it comes with a lot more guarantees. And I feel like that's kind of the same way. If I'm working on a team of people, I want the statically typed checks up front, you want CI/CD doing as much as possible for you, code formatting so you're not arguing with your coworkers, you want all that nice stuff. But then when I'm by myself, I know exactly what I'm doing, I don't need to worry about other people reading my code, I just want to move fast. And that's why Python, Python's not my favorite language in the world, but I can go so quickly with, import a couple things, glue it together, you're done. And I'm just trying to do something, I don't actually care about the quality of the code or the readability. I like your programming in the small versus large analogy, that there are different types of programming, to the extent that a lot of the times you're in a notebook, you're just writing it once, you're throwing it away. It's a write once, read never piece of code. You don't have to worry at all about what you're doing. If it gets the job done, and even if it takes a couple minutes to run, it's going to run once. So you don't need to micro-optimize to make sure it goes super quickly.

00:58:08 [AB]

Yeah, that is so true. Yeah.

00:58:08 [PT]

Actually, I really like that part about R, is I can script stuff up very, very quickly. And then I have a process for migrating it into a production-ready form. I mentioned the RStudio IDE, that was really a huge breakthrough because they let you support all those different forms of programming. As I just alluded, this week I was working on that portfolio optimizer, and I just started writing lines of code, as if I was working with a REPL. Okay, I want to do this, then I want to do that. Then refactor it into functions. Okay, then refactor that into reactive programming. Okay, put that into a Shiny script. Okay, if you know what Shiny is, it's the interactive user interface for R. Okay, great. And I can evolve the whole thing that way, or I could just evolve it into a notebook, which is like a Python Jupyter notebook. And I really appreciate that about the language, and it's the untypedness that lets me do that. I will mention that whenever I refactor code out of that environment into one of my packages, the first thing, the first requirement is all parameters must be type checked. And it's a dynamic language, so ironically, you're dynamically checking the type. But this is an ironclad requirement when I'm working on a team. You cannot have a function that just, you know, I don't know, takes anything and sort of does something with it. It has to say this takes a data frame, and the data frame has these columns, or this takes a vector. And because the language is so badly typed, it's not uncommon to have like three or four type checks followed by one line of code. And that's your function, you know. And initial programmers, especially right out of school, are like, "This is ridiculously inefficient." And then you say, "Okay, well, you can debug your code without type checking, and then let me know how that goes for you." Because 90% of my errors are found by these type checks.
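
[Transcript note: a minimal Python sketch of the "several type checks followed by one line of code" pattern described above. The function `portfolio_value`, the column names and the sample numbers are all made up; the table is modelled as a dict of column lists purely for illustration.]

```python
def portfolio_value(frame):
    """Total value of the positions in a table with 'price' and 'shares' columns."""
    # Check 1: the argument is a table at all.
    if not isinstance(frame, dict):
        raise TypeError("frame must be a dict of columns")
    # Check 2: the required columns are present.
    for name in ("price", "shares"):
        if name not in frame:
            raise ValueError(f"missing column {name!r}")
    # Check 3: the columns line up.
    if len(frame["price"]) != len(frame["shares"]):
        raise ValueError("columns 'price' and 'shares' differ in length")
    # Check 4: the columns are numeric.
    for name in ("price", "shares"):
        if not all(isinstance(v, (int, float)) for v in frame[name]):
            raise TypeError(f"column {name!r} must be numeric")
    # The one line that does the work.
    return sum(p * s for p, s in zip(frame["price"], frame["shares"]))


total = portfolio_value({"price": [10.0, 20.0], "shares": [3, 2]})
```

[Every check runs at call time, so a wrong argument fails at the function boundary instead of somewhere inside the arithmetic.]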

01:00:04 [BT]

You can drive a mountain road without guardrails or with guardrails. It's your choice.

01:00:10 [PT]

[Laughter] Right. I think Lisp has the same approach, you know. [08] Like, you sort of declare the type, but it's really checked at runtime. And, you know, that's good enough. It works just fine.

01:00:21 [BT]

How terse are, you mentioned 50,000 lines. That's a huge program, and you said it was big. But is R towards the terse end of programming, or is it fairly, a little bit more ceremony than that?

01:00:36 [PT]

That's a very good question. Remember I talked about right tool for the job, and how people who are cursing at Excel are probably using it for the wrong thing? Yeah, they were using it for the wrong thing. If you're in the wheelhouse of R, things go pretty smoothly. Now, I wouldn't call it terse. It's not like, you know, you guys love these tacit languages. It's definitely not a tacit language. But if you're hitting its hot spots, it just goes like that. Especially transformations on data frames, it's amazing what you can do in just a few lines of code. And it's still pretty readable, too, which is great. Again, data frames are right in the wheelhouse of R. I'm trying to think if, I can't think quickly of something that wouldn't fit that data frame well, or that paradigm well. But yeah. So, the answer to your question is it's somewhere in the middle.

01:01:35 [BT]

When you were mentioning it's really good for graphics as well, and expressing statistical things with graphics.

01:01:40 [PT]

Oh yeah.

01:01:41 [BT]

I would assume that a data frame can be translated to graphics. You think about it, your two dimensions or how many dimensions you want. That's a natural, isn't it?

01:01:50 [PT]

That's really important. The premier package for graphing is called ggplot. A lot of people have borrowed the concepts of it because ggplot itself is built on a concept called the grammar of graphics. Some guy I don't know invented this grammar of graphics and then Hadley Wickham translated that into actual code. And it actually produces a coherent framework in which you can understand the graphics you're trying to build in a very modular, powerful way. I believe Python has a kind of a clone copy of this, but I don't remember the name of it. Yeah, it's very good. And speaking of array languages, one of the central concepts of doing something like that, plotting that kind of data, is the idea of taking a data frame and you can make it long or you can make it wide. It's just a different way of normalizing the data. And R gives you the tools to do that. They're all just hypercubes. But the question is, how are they keyed and what is the normal form? And once you get the right keying and the right normal form, whatever you have to do just happens. It either becomes just vector arithmetic or just a plotting job or a list transformation or something like that. But that's where you spend your time is trying to get the right piece of data in the right place at the right time.
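
[Transcript note: a minimal Python sketch of the long and wide forms mentioned above, using lists of records. The functions `to_long` and `to_wide`, the column names and the sample figures are made up for illustration and do not correspond to any R function.]

```python
def to_long(rows, key, name_col="series", value_col="value"):
    """One output record per (key, column) pair of the wide table."""
    return [
        {key: r[key], name_col: col, value_col: val}
        for r in rows
        for col, val in r.items()
        if col != key
    ]


def to_wide(rows, key, name_col="series", value_col="value"):
    """Gather records that share a key back into one record per key."""
    out = {}
    for r in rows:
        out.setdefault(r[key], {key: r[key]})[r[name_col]] = r[value_col]
    return list(out.values())


# Made-up sample: one row per date, one column per series.
wide = [
    {"date": "2024-01", "AAA": 10.0, "BBB": 20.0},
    {"date": "2024-02", "AAA": 11.0, "BBB": 19.5},
]

long_form = to_long(wide, "date")       # four records: one per date and series
round_trip = to_wide(long_form, "date") # back to one record per date
```

[The two forms hold the same values; only the keying differs, which is the "normal form" question raised above.]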

01:03:06 [BT]

And then, bam, do the transformation and you're done. And that's something I've noticed with the array languages too. I wouldn't say it's half of it, but a big part of it is setting up your data structure so you can work with it. And if you set up, as you say, your data frame set up the right way, everything becomes easy. And if you set it up the wrong way, you're swimming upstream with every operation you do. You can make it hard on yourself.

01:03:27 [PT]

So if I can also jump back 20 subjects. The Venn diagram was really interesting that you put up. I hope people look at the show notes and see how you've categorized these various languages into different categories. And by the way, you're probably making yourself very popular with Kamala Harris by using a Venn diagram. So good political move there. But the thing I noticed the most, when talking about whether a programming language is functional, is that we want to think, especially because of our mathematical proclivities, like functional programming languages. Like the way I write a program or the way I explain it to someone is: this program is just a series of transformations. It took this data and it transformed it this way, this way, this way. There are no loops. And people who read my programs are always like, "Where are the loops?" I said, "I don't use loops. I use transformations." And that's the heart of the functional programming world. However, we live in a stateful world. Your favorite example is a database. A database is just full of stateful information that changes every day and has to be perpetuated to the next day. You can't lose it. You want to retain it. Right. So to me, there's this dividing line between the functional world and the stateful world. And the question is, how do you cross between those two worlds and do the languages let you do it in kind of a natural way? So when you ask about like style and what can you do with R, I'm like, "Well, I'm really trying hard to write functional programs." And then every once in a while, I have to say, "Okay, we've reached a stable state in the transformation. This has to be saved somewhere for future use, like maybe in five minutes or maybe in five years." But that transition between stateful thinking and functional thinking is just so important to the programming paradigms I use certainly every day.
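
As a rough illustration of "a series of transformations" followed by one explicit crossing into the stateful world, here is a small sketch in Python rather than R. The function names, the sample readings, and the dictionary standing in for a database are all assumptions made for the example.

```python
import json
from functools import reduce

# Pure transformation steps: each takes data in and returns new data.
def drop_missing(readings):
    return [r for r in readings if r is not None]

def to_celsius(readings):
    return [round((f - 32) * 5 / 9, 1) for f in readings]

def summarize(readings):
    return {"n": len(readings), "mean": round(sum(readings) / len(readings), 1)}

def pipeline(data, steps):
    """Apply each step to the result of the previous one."""
    return reduce(lambda acc, step: step(acc), steps, data)

def save(state, store):
    """The one stateful step: persist the result (a dict stands in for a database)."""
    store["latest"] = json.dumps(state)

raw = [68.0, None, 50.0, 86.0]  # hypothetical Fahrenheit readings
summary = pipeline(raw, [drop_missing, to_celsius, summarize])

store = {}
save(summary, store)
```

Everything before `save` can be rerun freely because no step modifies its input. Only the final call changes anything that outlives the computation.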

01:05:17 [ML]

Well, and I have a very related thing that I wanted to bring up about the Iversonian versus non-Iversonian array languages, which is that I wanted to ask whether arrays and data frames in R are mutable. So like if you wanted to, could you write a function that takes an array and doesn't return anything, but reverses the elements in the array that just swaps every element with the one on the other side of it?

01:05:44 [PT]

So the question is, are those key data structures, those rectangular data structures, mutable? The answer is yes and no. Do you have any other questions? No. Because R uses pass-by-value semantics. So if I pass in an array and you work with the array, as long as you don't change it, you're just using what I gave you, much as if it had been passed by reference. But as soon as you change it, it makes its own local copy of the array and changes that instead, so that the caller's array is not changed.
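
The behavior described here, sharing the caller's data until the first write and then switching to a private copy, can be modeled with a toy wrapper. This is a sketch in Python under stated assumptions, not how R implements it: the class `CopyOnModify` and the function `reverse_elements` are invented for illustration, and the function is the array-reversing one from the question above.

```python
class CopyOnModify:
    """Toy model: share the caller's list until the first write, then use a private copy."""

    def __init__(self, shared):
        self._data = shared   # no copy yet; reads go straight to the caller's list
        self._owned = False

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        if not self._owned:
            self._data = list(self._data)  # first modification triggers the copy
            self._owned = True
        self._data[i] = value

    def to_list(self):
        return list(self._data)


def reverse_elements(arr):
    """Swap each element with its mirror; the caller's list is never touched."""
    view = CopyOnModify(arr)
    n = len(view)
    for i in range(n // 2):
        view[i], view[n - 1 - i] = view[n - 1 - i], view[i]
    return view.to_list()


original = [1, 2, 3, 4, 5]
result = reverse_elements(original)
```

After the call, `result` holds the reversed values and `original` is unchanged, so a function that returns nothing could not have the effect asked about in this model.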

01:06:22 [ML]

So it does have the ability to mutate arrays, but it has a mechanism that constrains the scope of what that affects.

01:06:32 [PT]

That's exactly right. Yeah. And when I first learned the language, I thought I knew everything about programming semantics. Oh, there's call by reference, there's call by value. And I was blown away by this. It took me a few days of experiments to figure out, wait a minute, it's making its own copy of the data. And fortunately, it's very clever about the way it does it. So it might sound like this ridiculous amount of over copying, but the people who implemented it did some very clever things using, as I always say, lists and vectors to make sure that there's a minimal amount of copying. It's remarkably clever and efficient that way. But yes, it's mutable. Many times I wish it was not, but yes, it is mutable.

01:07:15 [ML]

Yeah. Well, and I can imagine you'd have some pretty tricky problems with that. Like, if you don't pass the array in directly as an argument to a function, but you pass it as a field of an object, or even if the function calls another function and this function sets the array to be a field of an object, like tracking, you know, where those arrays are going and when, so you know whether to copy them sounds pretty tough.

01:07:38 [PT]

It's really hairy. I'm glad I didn't have to implement that.

01:07:44 [ML]

But so yeah, my point was that all the Iversonian languages have just purely immutable arrays. Like they, most of them have some feature where you can take an array and change an element, but it gives you a new array. So there are, there's a syntax that looks like you're mutating arrays, but it really doesn't. It creates a copy.
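
The "change an element, get a new array" idea can be sketched with Python tuples standing in for immutable arrays. The helper name `amend` is an assumption for this example and is not the syntax of any particular array language.

```python
def amend(array, index, value):
    """Return a new tuple equal to `array` except at `index`; the input is left untouched."""
    return array[:index] + (value,) + array[index + 1:]

a = (10, 20, 30, 40)
b = amend(a, 2, 99)

# Direct element assignment is rejected, because tuples are immutable.
try:
    a[2] = 99
    assignment_allowed = True
except TypeError:
    assignment_allowed = False
```

Here `b` is a separate value and `a` still holds its original elements, which is the guarantee described above.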

01:08:09 [PT]

Right.

01:08:09 [ML]

And I was thinking most likely all the non-Iversonian array languages, I mean, I know NumPy, Julia, and MATLAB all just have mutable arrays. And so it looks like R is actually, I'd probably place that more on the mutable side, but it's kind of hard to say. So R is somewhere, it's not entirely on either side.

01:08:31 [PT]

Right. Well, remember I talked about thinking about a program as a series of transformations. I mean, the transformer does take some data structure. It does mutate it, but it's a whole new data structure. And that's what gets passed on to the next phase in the pipeline, in the transformation.

01:08:47 [ML]

Well, and that's a really important part of what makes array programming, array programming, because when you can't just take your array and set one individual element, you have to think, well, in order to be efficient, at least, what, how can I manipulate the entire array to get the new array that I want without just working one value at a time?

01:09:06 [PT]

Yeah. Yeah. If you can mutate arrays, then you're going to, at some point, you're going to write a program that looks functional, but is not. It's not functional programming style. You've just kidded yourself because some jerk over in the room next to you decided to fiddle with your array. And now the whole thing is not working.

01:09:22 [CH]

Well, we've, as always, blown by the hour mark. But I think one final question, I guess maybe if other folks have an additional final question, I guess, therefore not making this the final question, we can have that as well. One thing you mentioned at a certain point is, you know, picking the right tool for the job. And that in certain cases, you know, you mentioned the one specifically where from SAS, they were making a call out to this one thing that R had and then jumping back into SAS. And then slowly people realized, you know, we could probably do more stuff in R. In your opinion, what are the best use cases for R? Because, you know, like there's a couple of languages, as mentioned before, Python, Julia, MATLAB, SAS, if you want, there's a few other statistical programs. You know, if folks are out there, they're listening to this and they're saying, oh, I didn't realize R had so many different paradigms, so many different styles, you know, ecosystems within it. What would you recommend is like, you know, these are the top one or top three. If you're looking to do something like this, consider R because it is such a good tool for solving that set of problems.

01:10:21 [PT]

Wow. Well, I got to right away say, I will have trouble answering the question because to me, the answer is really simple. It's statistics and graphics. But what I mean by statistics and graphics is like huge compared to what most people think statistics and graphics is, you know, like you want to compute the average of these numbers? Okay, you should use R. No, no, no, that's, anybody can do that. I'm talking about where you have some serious analysis. It could be numerical data. It could be categorical data. It could be strings, any kind of data, but you really need to tear it apart and understand it. That, to me, is statistics. Yeah. And then the other thing is graphics. You might think, you mean like you want to draw a little line graph? No, I mean, you need to churn out some complicated graph like the New York Times. The New York Times uses R for those complicated graphics they do, which look beautiful, but there are not many tools that can do that kind of thing. I mean, they use things other than R, but a lot of their graphs, the really beautiful numerical ones, were done right in R. Really nice. So, unfortunately, I can't give you like this little closed box answer to your question. I wish I could. I would not do really heavy text programming with it. I would not write an editor in R. When I do database work, I use SQL. You know, I write SQL queries and then R calls that query. [09] It has a great user interface, as I mentioned, the Shiny package. So, when I want to do an interactive package, I use R because it's got those facilities. I think that's probably the best answer I can give you in the remaining time.

01:12:04 [CH]

Awesome. And I think I'll pause in case for my last little bit, if there's any, I see no hands going up. Oh, no, there is a hand. All right, Adám, over to you.

01:12:12 [AB]

Well, it's not a question, but just a shameless, not even self-promotion that, you know, if you want the full Iversonian experience and still have access to R, you can always use Dyalog APL with the RSConnect library that allows you to call R from Dyalog and pass the data around, of course.

01:12:36 [CH]

Awesome. I didn't even know that existed. And I mean, on the topic of, you know, using tools, I don't think we've actually plugged the, or it was mentioned that R is open source, but I believe we'll link it in the show notes. I don't actually know what, I know CRAN is like one of the top sites for at least the packages, but you can go and download it, and I think you can use R on every single operating system.

01:13:01 [PT]

Yeah, the pre-compiled binaries, as they put it, it's all there. And yes, CRAN, the Comprehensive R Archive Network, has everything in it. It's not at CRAN on the internet, it's called r-project.org. And from there, there's a link into CRAN where you can download everything. One of the cool things about CRAN is that it's fully distributed and people have put it in the cloud too. So, it's very, very fast. You can download even huge packages very quickly and install them. Like if you've used Docker and you're amazed at how quickly an image can be downloaded, it's that kind of experience. It's like, I need this package, wham, oh, okay, I got it. You know, that's one of the beauties. It's not just the number of packages, but the accessibility to the packages is excellent.

01:13:49 [CH]

Yeah, no, and it's totally awesome that whenever we have someone on to talk about a technology and that technology is completely open source and completely free. I mean, that is, there's no bigger motivator, or I should say, it's a lack of a barrier to entry, right? If someone has to go spend money to try something out, some folks are not going to do that.

01:14:03 [PT]

Yes.

01:14:06 [CH]

But if it's completely free and you can go download it, and it doesn't matter what OS you're on, there is no reason not to go try it out. Especially in the day of, you know, ChatGPT, you probably don't even need to go write your R code. You can just say, hey, I've got to crunch this, you know, statistical calculation, and I've heard that you've got some beautiful graphs, make it happen, poof, and then it probably will work.

01:14:27 [PT]

Okay, remember I said it's not a very regular language? Okay, so with ChatGPT, you might want to read the code that it generates and just, you know, be more comfortable with it before you unleash it on your boss, for instance.

01:14:36 [CH]

Yes, maybe use it at your own risk. If you're hooking it up to an LLM, don't necessarily ship it into production.

01:14:42 [PT]

And as for range of machines, here in this office, I run it on my Raspberry Pi, and I run it on my LLM server also. So yeah, covers a pretty good range. Run it on my Windows machine, my Mac machine, my Linux machine, yeah, everywhere. It's awesome.

01:14:59 [CH]

All right. Well, if you have questions, I mean, for us or for Paul as well, I'm sure we will link Paul's contact info if he wants. But if he doesn't, you can send it to us, and then we will relay it to him, and you can reach us at?

01:15:11 [BT]

Contact@arraycast.com. It's my plug. And thank you to our transcribers, and thank you to Paul, because I've been working for a while to get R onto this podcast, just to get the perspectives, and I think he did a great job of explaining them to us and giving us insights. It's been fantastic.

01:15:29 [PT]

Thank you. Thank you. And you're welcome. Pleasure to be here. And if you want people who won't stop talking about R, I know a bunch of them.

01:15:38 [CH]

Yeah, maybe we'll link, you mentioned that you were involved in the user group, so maybe we'll link that as well if folks want to meet people in person. I know we don't do it often these days, but every once in a while, it's nice. But yeah, once again, thank you for coming on, Paul. It's awesome to hear about another, you know, "Array Language" in quotes. You know, it depends on who you ask, it sounds like. But yeah, it's always great to learn more about languages that aren't in our small set that we typically focus on. But yeah, thank you. And with that, we will say, happy array programming.

01:16:07 [ALL]

Happy array programming.
