Ilya (00:01.445)
All right, so today I'd like to welcome Dr. Kevin S. Van Horn on this podcast. Kevin has been in this industry for a little while. His doctoral thesis was on how learning is really a big optimization problem. And since then, he's worked in many companies across different industries, including search and online advertising, and has done a lot of work, kind of more on the consulting side, at
places like Savvy Sherpa. And also Kevin has had an opportunity to teach computer science at several points in his career. Kevin was a colleague of mine at Adobe. I learned a ton from him. I think he is one of the greatest data scientists that I've seen, whatever that means nowadays. And today Kevin has his own company where he helps businesses
kind of navigate through, how would you say this? The uncertainties of ML in production. And specifically he's, yeah, go ahead.
Kevin S Van Horn (01:10.273)
And.
Yeah, Bayesian Statistical Modeling.
Ilya (01:16.035)
Yeah, I'm going to cut that out and I'm going to pretend like I knew to begin with: specifically, Bayesian statistical modeling. And he does a lot of work with marketing mix modeling and lots of stuff for marketing. Well, I'm not sure that was any better. I'm going to have to cut a lot of this, but anyway, welcome Kevin. And I would love to ask you how you see your career.
Primarily from the perspective of what are the main things that you've learned? What have the more pivotal points in your career been? And how you ended up where you are now?
Kevin S Van Horn (01:57.422)
Okay, well, I will try to mostly focus on stuff that is relevant to ML and data science. So I started off getting degrees in physics and computer science. I always say that physics was my first love. But I wanted to build things too, so I focused on computer science. But part of all of that, with both physics and CS, is that I always wanted to understand things at a fundamental level.
There's a certain personality type they say is the, what do they call it, a systematizer, right? The one who wants to see the grand scheme, who wants to see how it all fits together, what are the basic principles and so forth. That's me 100%. And so that has always been my approach to pretty much anything I've gotten into. I want to make sure that I understand what's really going on in some sense.
During my graduate work at BYU, I got interested in machine learning, got interested in neural networks at that time. And so of course, the big question to me was, you know, everybody was talking about generalization at that time, neural networks can generalize. What in the world does generalize mean, right? I wanted to make sure I understood that. And so I began to look into some of the literature
on machine learning at that time. This is early days, we're talking about '91. Pretty early days for some of that stuff. And got into something called computational learning theory, which actually mixes together statistics and computational complexity. And found myself having to learn a lot of statistics and teaching myself that. And gradually over time, beginning to get a feel for, oh, okay, this is...
In a real sense, machine learning and all that stuff, it's in a sense all statistical problems, right? Not the traditional statistical problems, which were always narrowly limited to specific models with a finite number of parameters and so forth, but attempts to do things with much more complicated models. But there are certainly some very, very strong common
Kevin S Van Horn (04:23.978)
ideas in there. So as you mentioned, my dissertation was initially on machine learning as optimization, and in the spirit of computational learning theory I tried to prove positive results and couldn't. It's really hard to come up with any sort of positive results in that field. It's a lot easier to come up with negative results, which I then turned my attention to, and I proved some rather discouraging negative results, okay?
Ilya (04:29.638)
You're so much stupid!
Kevin S Van Horn (04:52.248)
So let me tell you the two really discouraging negative results that I got. What's like the simplest kind of machine learning or inference problem you could think of? Well, how about a simple linear discriminant function? You know, just a linear threshold function, right? That should be easy, right? Well, okay. Also known as a one-level neural network, or a single-unit, single-level neural network. Well, it turns out that
If you want to minimize the number of inputs that you pay attention to... there's all kinds of, you could have a large set of possible predictors you could look at. You could look at interactions between them. You could look at various functions and so forth. So in this whole business of, you know, the optimization side of it, you need to reduce the number of inputs to reduce the difference between your empirical error and the actual error.
That problem is not just NP-hard. I assume your audience knows what NP-hard means, right? As far as we know, there's very good circumstantial evidence that you're never going to find a polynomial-time algorithm. They're all going to be exponential blowups in the amount of time. So it's not just NP-hard to do the full optimization. Oh, no, no, no, nothing so nice as that. It's NP-hard to even come up with any constant-factor approximation.
Ilya (05:57.165)
Yeah.
Kevin S Van Horn (06:21.23)
Say I'm okay with coming within 50%.
Ilya (06:23.205)
I love it when stuff starts with, it's not just NP hard.
Kevin S Van Horn (06:29.817)
So it's like, if you tell me that you're okay with coming within double the minimum number of inputs... sorry, this is to get it perfect, by the way, to get zero error. And we get into something similar if you're just trying to minimize the error. Okay, so if instead you're saying, okay, I'm not trying to minimize the number of inputs we're using, but I'm trying to minimize the empirical error, right? The misclassification error. Well, again,
it's NP-hard even trying to come within any constant factor of the minimum. So let's say I'm okay with double the minimum empirical error. That's NP-hard. Ten times the minimum? That's NP-hard. So that was pretty damn discouraging at that point. So at that point it's like, okay, I'm not gonna find any. I mean, the very simplest things you could come up with like that are already NP-hard. It's like, okay, trying to come up with
with algorithms that are provably correct and provably run in polynomial time and everything, it ain't gonna happen. And so the rest of my dissertation, I did some work on learning classification rules, but that was all very empirical. It was all much more in the sense of trying out various.
Kevin S Van Horn (07:59.234)
you know, various more intuitive ideas, nothing guaranteed to work, right, but that works often enough. And by the way, before I go on, I'll probably get back to this later on, but that's an interesting thing we run into is that we keep on running into situations where if you want guarantees, you forget it. You can't get guarantees. But often we find that we can come up with solutions that work very often.
Ilya (08:27.951)
Yeah, I was actually going to say that probably prepared you for ML as it is in real life better than anything. Yeah, I think that's an interesting topic to explore. But yeah, so after you graduated, I mean, it's not like ML was everywhere at the time, or was it?
Kevin S Van Horn (08:32.492)
Yeah.
Kevin S Van Horn (08:42.456)
So.
Kevin S Van Horn (08:49.164)
No, no, no. There was a lot of talk about data mining and so forth, but it took a while before that really took off. And so I really didn't get a chance to work on what I had done my dissertation topic on, those kinds of problems, for quite a while.
But one of the things that came out of it was that, right towards the end of my dissertation, I learned about the Bayesian approach, Bayesian probability, Bayesian statistics. And it was kind of like, this is what I should have been using all along. Because it solved so many problems that I had with the more traditional, classical frequentist approach. It was far more general. It could answer a far more general set of questions than I could with the other tools.
Ilya (09:38.349)
Interestingly enough, I had the same experience years later when you taught me about the Bayesian approach and I was like, yeah, why weren't we doing this? So yeah, you passed it on.
Kevin S Van Horn (09:38.606)
But
Kevin S Van Horn (09:51.118)
Yeah, so that kind of... I remained interested in that for quite a while. From about 1997 to 2000, I worked at a speech recognition company. It was called Fonix, it was a startup. And I applied some of the Bayesian stuff there.
One of the interesting things that I found there was, and this was my first introduction to the kind of bitter pill for those of us who are systematizers and who like all the theory to be nice, that sometimes just throwing lots and lots of computation and data at the problem will solve it. And they were seeing this with hidden Markov models applied to speech recognition at the time. Hidden Markov models for speech recognition had been around forever.
But they were starting to get some real results because they just had tons of training data to use with them. Going on further from that, I got into some of Edwin Jaynes's work on Bayesian probability theory, specifically his book, which came out in 2003. But I had read the precursors to it, the unpublished version before it was published, and a number of his papers, and learned about the principle of maximum entropy.
Learned about Cox's theorem, the reasons why probability theory is the right way to reason about degrees of credibility or uncertainty, all that stuff. And I created a page of my own, which is still up there. It's pretty old now. It's really old-school HTML. But it's my unofficial errata and commentary on Jaynes's book, which was published after he died. His former student sort of cleaned up a few things and published it.
Well, that turned out to be my entree into getting into some more interesting stuff because in 2009...
Ilya (11:53.381)
Sorry, I'm gonna press a little bit more on that. What on earth possessed you to write it?
Kevin S Van Horn (11:58.328)
to write it because I was very...
Ilya (11:59.459)
Yeah, because all of us have ideas, right? Like when I read a book, I'm like, yeah, they're like, I have a bunch of ideas. I do not go and publish them on the internet.
Kevin S Van Horn (12:09.878)
What possessed me to do that? What possessed me to do that was that nobody else was going to do it. The author was dead, right? And there were errata, and I thought that his book was a very, very important book, and somebody needed to do it. And then, of course, the commentary was, well, as long as I'm doing this, you know, I do have some commentary that might provide some additional explanation. I might as well put that in there, too.
Anyway, so that was not intended to be a career boost or anything. It was just sort of a passion project for me. And it turned out to get me my entree into marketing research. So several years later, when I was looking for work again, and during most of the time I had not actually been able to get work doing pattern recognition, machine learning, data mining, data science kind of things.
But one of the like 100 different places that I applied to was a marketing research firm, and the VP of software development internet-stalked me. When he found my web page, he basically said, this is the guy. This is who we're looking for.
I did not intend that to be a resume, but it was. And so the Modellers... they later got consumed by a huge company called Omnicom, if you can believe it. The name sounds ominous, you know. And it really was; that company really was like this soup of half-digested companies that they had bought up. They were not fully integrated yet.
But anyway, what the Modellers did... In marketing research, there's a lot of math, actual real math and real science and real statistics involved in modern marketing research, in analyzing the data. And one of the kinds of things that marketing researchers are interested in is they want to know your values: what you as a consumer, when you want to buy a certain class of product, how much you value different
Kevin S Van Horn (14:33.006)
possible features, and how you do the trade-offs between them, right? How they trade off, in particular what they call willingness to pay. Which you can think of as, if you're familiar with microeconomic theory and the idea of utility: decision makers have this utility function and they maximize expected utility. What they're trying to do is infer the consumer's utility function. And they want to do it on an individual level.
But part of the problem is that they can't ask enough questions to really pin it down for any individual. So anyway, let me back up a little bit. People can't tell you what their utility function is. Okay, they just can't. They can't give you a number. They can't really tell you. But what they can do is they can make decisions. That they know how to do. So what a lot of this marketing research stuff does, at least in the area of what's called discrete choice surveys,
discrete choice modeling, is they will create a bunch of hypothetical purchasing scenarios for the survey taker. And these hypothetical purchasing scenarios, they'll have three to five hypothetical products described by some set of attributes. Things like price; things like, if it's a computer, how much memory it has or the screen size; or if
it's a car, maybe its gas mileage, or what its internal volume is, or things like that. And so that's what they use to describe these various hypothetical products. And then they put these before the survey taker and say, well, which of these would you buy, if any? And then from their answers, assuming that they are acting as reasonably rational beings with a little bit of noise thrown in there,
they then infer that utility function for them. And so then you can do all kinds of fun things once you've got that, right? Once you've got that, you can now predict, or at least try to predict, what their choices would be in other scenarios. So you can say, OK, well, I know the competition has this product and this product. These products are already out there. What if I bring in this other product here? How will it fare against the ones that are already out there now?
Kevin S Van Horn (16:59.306)
And so you can predict what their choices will be. You can also segment the market based not on demographic attributes, but on what you really want, which is their desires: what they're willing to pay for, what they're looking for in a product. So there are all kinds of fun things you can do with that. And some of the Bayesian methods help you combine population-wide information with
individual-level information. That's what they call a hierarchical model, where you kind of use the assumption that there are some similarities across the population as a whole. So you can sort of use what you know about other people to fill in the gaps a little bit. And this is how you can get away with asking fewer questions than you would otherwise need to pin down individuals. So anyway, when I went to work for them,
I was originally doing some other stuff, but about a month in, they decided they were going to start building their own software. There was a company out there called Sawtooth Software that made software to do this, but they wanted to have their own software. They also saw themselves as working on the really hard problems and working with leaders on the academic side of marketing research, people like Greg Allenby.
They wanted to implement other models that are out there in the literature that nobody else was offering. And so they could say that they were really top of the line and had the most recent stuff. So that's sort of where I came in: they took some code that was originally academic code and were trying to turn it into production code. Now, I don't know if you've ever done this, but turning academic code into production code is kind of a nightmare, because
when you're writing code to write a paper, it only has to work for the data set in the paper. It doesn't have to work for everything, right? It doesn't have to take care of all the messiness. Maybe it has to take care of whatever messiness is in that particular data set, but it doesn't have to take care of the full range of things that can happen. And also, academics don't tend to be all that great when it comes to code quality, organizing their code and testing their code and all that stuff.
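To make the partial-pooling idea Kevin described a couple of paragraphs up a bit more concrete, here is a minimal sketch. It is not the hierarchical discrete-choice model the Modellers actually used; the toy normal-normal model and all of the numbers are assumed purely to illustrate how population-level information fills in for respondents who answered only a few questions.

```python
# Partial pooling in a toy normal-normal hierarchy (illustrative only):
# each respondent's part-worth beta_i is drawn from a population distribution,
# so sparse individual data gets pulled toward the population mean.
import numpy as np

rng = np.random.default_rng(1)
mu_pop, tau = 2.0, 1.0                 # population mean and spread (assumed known here)
sigma = 2.0                            # noise in each survey answer
n_people, n_questions = 100, 4         # deliberately few questions per person

beta = rng.normal(mu_pop, tau, n_people)                          # true individual values
answers = rng.normal(beta[:, None], sigma, (n_people, n_questions))

ybar = answers.mean(axis=1)                                       # per-person raw estimate
# Posterior mean of beta_i under the normal-normal model; in a real hierarchical
# model mu_pop and tau would be inferred from everyone's data at the same time.
w = (n_questions / sigma**2) / (n_questions / sigma**2 + 1.0 / tau**2)
beta_pooled = w * ybar + (1.0 - w) * mu_pop

print("raw    RMSE:", np.sqrt(np.mean((ybar - beta) ** 2)))
print("pooled RMSE:", np.sqrt(np.mean((beta_pooled - beta) ** 2)))
# The pooled estimates are typically closer to the truth, which is how you get
# away with asking fewer questions per individual.
```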
Kevin S Van Horn (19:24.462)
So anyway, the guy who was working on it was having some difficulty with it. I started taking a look at it. And here's where things got interesting. So this sort of gets into the career path. Just because I was interested in the subject, I had read a lot about Bayesian computation. It wasn't required for my work or anything like that. I just wanted to know. I was just fascinated by the topic. Because there are some really difficult computational problems there with Bayesian computation. Because they involve these huge integrals.
Essentially integrating over thousands of variables. How in the world do you do that, in any reasonable amount of time? So there's this whole area of Markov chain Monte Carlo as a method of doing that. So I had learned quite a bit about that. I had never actually implemented any of it. Now, I didn't tell them that side of it, that I had never actually implemented any of it. I knew the theory really well, but I had never actually implemented any of it. So we get into this.
And I'm looking at it, and yeah, so you know, I did the usual kind of things that I'm used to doing, just in terms of test-driven development and making sure things were tested, to try to figure out what was going wrong. But I actually found a really subtle bug in the implementation, where you have to understand the math to understand why it's a bug. Okay, it turns out that there was a likelihood they were computing using Monte Carlo.
So we have Markov chain Monte Carlo, and within the iterations of the Markov chain Monte Carlo, they were doing some straight, plain old Monte Carlo to evaluate a particular likelihood. Well, that's an approximation. It's not quite the right thing, right? And that bit that you're off by kind of skews the results by some amount. And so,
because I knew the theory well enough, I was able to figure out how to actually fix that. Turns out there was a very simple fix for it that will in fact give you the...
Kevin S Van Horn (21:22.094)
It will get you the theoretically right behavior. In terms of a Markov chain, it'll have the right stationary distribution that you wanted it to have and everything. And so that sort of got me off on a good foot there at the Modellers, doing that. But yeah, so I ended up doing a lot of other kinds of stuff like that, implementing various other custom models that were in the literature, doing
experimental design, which is actually something you don't see a lot of in machine learning, because machine learning usually just takes the data you have, right? You don't have any control over that. But when you're doing a survey, you get to control what sets of questions you ask and to whom you ask them. And so it's important to be able to get the most information per question. And that's what experimental design is all about.
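Kevin doesn't spell out the exact fix here, but one standard way to get the exact stationary distribution when a likelihood can only be estimated by Monte Carlo is the pseudo-marginal Metropolis-Hastings idea: use an unbiased estimate of the likelihood, re-estimate it only at the proposed point, and keep the current point's estimate fixed. The sketch below uses an assumed toy latent-variable model and made-up numbers purely to illustrate that idea; it is not a reconstruction of the code he actually fixed.

```python
# Pseudo-marginal Metropolis-Hastings sketch (toy model, illustrative only).
# Assumed model: theta ~ N(0, 10^2), z_i ~ N(theta, 1), y_i ~ N(z_i, 1),
# so the exact marginal likelihood is y_i ~ N(theta, 2), but we pretend we can
# only estimate it by plain Monte Carlo over the latent z.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.5, np.sqrt(2.0), size=50)          # synthetic data

def log_prior(theta):
    return -0.5 * (theta / 10.0) ** 2

def log_lik_hat(theta, n_mc=200):
    # Unbiased Monte Carlo estimate of the likelihood (independent draws per
    # observation), returned on the log scale.
    z = rng.normal(theta, 1.0, size=(n_mc, y.size))
    log_p = -0.5 * (y - z) ** 2 - 0.5 * np.log(2.0 * np.pi)
    return float(np.sum(np.log(np.mean(np.exp(log_p), axis=0))))

def pseudo_marginal_mh(n_iter=5000, step=0.3):
    theta = 0.0
    stored = log_prior(theta) + log_lik_hat(theta)       # estimate carried with the state
    samples = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        prop_val = log_prior(prop) + log_lik_hat(prop)   # fresh estimate at the proposal only
        if np.log(rng.uniform()) < prop_val - stored:
            theta, stored = prop, prop_val               # accept: keep the new estimate
        # On rejection we do NOT re-estimate the current point; reusing the stored
        # estimate is exactly what keeps the chain's stationary distribution right.
        samples.append(theta)
    return np.array(samples)

draws = pseudo_marginal_mh()
print(draws[1000:].mean())   # should land near the sample mean of y
```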
Anyway, so I worked for them for several years. I'm not going to go into all the details of what I did for them, but that was a period of time when we didn't have all the tools we have now. We didn't have things like Stan to generate code for Bayesian models. So I was spending a lot of time doing the math myself for the Metropolis updates.
Right? Lots of linear algebra and everything. So my typical work during that time was: spend days and days doing tons of math, and then, you know, about a fifth as much time actually writing the code, after I'd worked out all the math and made sure I hadn't made a mistake anyplace. And then Stan came along, and of course I was really excited about that,
and things like Stan and some of the other tools that are out there now. Let's see, what else do I want to say about that? Also, after the Modellers... the Modellers eventually, like I said, got consumed by Omnicom, and their focus began to be more on quantity over quality.
Ilya (23:29.465)
Say it with a voice.
Kevin S Van Horn (23:31.82)
With a voice. That's right. And we had what I call the purge of the PhDs. One by one, the PhDs were all let go. I was let go; a few months later, my boss was let go. And a year or so later, the founder was let go.
Ilya (23:34.167)
Omnicom.
Kevin S Van Horn (24:00.29)
He had sold out to Omnicom. And so, you know, that was just the mindset of the company that bought them out. So yeah, all the PhDs, all that idea of being the people who solve the hard problems went out the window. It was, no, we want to be a project mill. We just want to get as many projects pushed out the door as quickly as possible. So anyway, where I went after that is I worked for Symantec for a little while, did some revenue forecasting.
One of the things that I'd learned while I was at the Modellers is I'd learned about state space models. And of course those have applications to forecasting. So with the revenue forecasting, we were trying to model the whole revenue process, including leads and everything. But I was only there for about seven months and then got an opportunity to take a more rewarding job, which was Adobe.
I kind of jumped ship at that point and went to Adobe, which is where we met. And at Adobe, my focus was mostly on various forms of forecasting. Some of these.
Ilya (25:12.451)
Yeah. And Kevin was a senior staff engineer at Adobe. So not just some, you know, know-nothing that hosts the podcast, but a senior staff engineer.
Kevin S Van Horn (25:24.622)
Thanks. But yeah, so a lot of time series forecasting, forecasting load, server loads. There were a lot of sort of one-off little projects for the director, Jan. In some cases, we had to apply Bayesian methods to get some sort of answer with very little data. Jan was actually pretty excited about that, because he says,
usually when I ask these data science people, they say, sorry, can't do anything about that. It's like, well, OK, we can do a little bit. We can come up with a reasonable prior based on what we know about the situation to sort of limit things a bit. So yeah. Another one of those reasons why I like the Bayesian methods.
Ilya (26:12.909)
I remember, I remember he once told me that, like, we don't want to have you in the on-call rotation, because if you don't get a good night's sleep, then that ruins a lot of really important work. So that's why I got to be on the on-call rotation. But no, yeah. So tell me about what you're doing now and why.
Kevin S Van Horn (26:29.57)
Oops.
Kevin S Van Horn (26:33.736)
Anyway
Kevin S Van Horn (26:39.534)
Yeah, so what ended up happening after that is... well, two things went on. I have always wanted to get back into doing some academic research. And I have been very much interested in... one of the reasons I was interested in Bayesian probability theory was that it's basically epistemology, right? It tells you how you should evaluate evidence and how strongly you should believe
various hypotheses. Well, potentially it tells you that. There are some fine details there that we have to work out. But I got interested in some of those foundational questions. In 2003, I had written a survey paper on Cox's theorem, which is a paper that says: starting from certain assumptions, we can show that something equivalent to probability theory is the only viable
logic of plausible reasoning, of reasoning about degrees of credibility. But there are some limitations to Cox's theorem, in that it assumes from the start that plausibilities or credibilities, whatever you want to call them, are measured as real numbers, single real numbers. So the Dempster-Shafer belief function people would come back and argue against that. They might argue for at least a two-dimensional or
higher-dimensional kind of thing. And it does make some assumptions about the functional forms of how things decompose. And in 2017, which is right after I joined Adobe, I published a paper where I showed how to basically get that result with much weaker assumptions. So the idea is this.
Start with propositional logic, okay? Propositional logic will sometimes tell you, given these premises, I can say yes, A is true. Sometimes I can say for sure, no, A is false. And then there are all these other cases where I can't say either way, right? And we'd like to fill in that gap. We'd like to be able to assign some level of credibility, some level of plausibility, between certainly true and certainly false. And what are the options?
Kevin S Van Horn (29:07.63)
And so all the assumptions that I use in this theorem are basically assumptions of the form: this property, which propositional logic already has and which it would be really weird not to have, we're gonna keep. And it turns out that is enough to show that, again, probability theory, or something that is isomorphic to it, is your only option. But it actually takes you a step further than that.
It tells you how to take a set of propositions, your state of information... There's this notion of state of information that Jaynes talks about a lot: that Bayesian probabilities are not features of the real world. They're features of what you know. They're features of your state of information. And they shouldn't be purely subjective either. They shouldn't just be your opinion or your feeling or something. They should be based in something. They should be based in what information you have.
So two people with the same information should have the same probabilities. And so in this case, the information is actually a finite set of propositions. It's propositional data. And we can actually say exactly what your probabilities should be. Not just that you're going to use the laws of probability, but we can actually give you a number to calculate. It says, this is what the number should be. And that was
pretty exciting to me, but it was restricted to finite cases. So effectively, because it's propositional, right? Propositional logic doesn't deal well with infinities. And what I have done since then is I've worked out some stuff about how to extend it to the infinite case. And again, a lot of what I'm doing here is I'm trying to fill in, trying to flesh out, ideas that Jaynes had but never worked out, because he was a physicist, he wasn't a mathematician.
Right? And so he talked about what he referred to as the finite sets policy. When you want to deal with an infinite set, an infinite number of possibilities, or infinite gradations of distinctions, you always start with the finite case. And then you take the limit as the number of states or number of gradations or so forth gets larger and larger,
Ilya (31:32.485)
Mm-hmm.
Kevin S Van Horn (31:32.684)
so that your infinity is always derived from well-defined limits. And so I actually, it's been a few years now, but I actually did work out a way of defining a metric space of premises, defining what it means for them to be close to each other, similar to each other, and to really rigorously define what this whole notion is. And to show that...
Backing up a bit: the usual approach to probability theory is measure theory, which most of your audience has probably heard of and tried very hard not to study. Because it gets into transfinite induction and all kinds of awful stuff like that, if you're reading through the proofs and stuff. Most practitioners don't really get into the fine details of measure theory.
But measure theory is really cool in the sense that it unifies discrete distributions and continuous distributions, or mixed continuous and discrete. It's a very nice, theoretically clean framework. But it starts at infinity, right? It doesn't start from the finite and work up to it. And anyway, what I showed was that every computable measure,
Ilya (32:49.049)
Yeah.
Kevin S Van Horn (32:59.906)
which matters because, from an epistemic standpoint, if we're thinking of probabilities as logic, things should be computable. Anything that you're ever going to actually work with has got to be computable, or you can't do anything with it. So any computable measure can actually be obtained in this way, as a limit of a sequence of increasingly refined, finite bodies of propositional information.
And so that's sort of the research direction I'm pushing in. The idea is that if we understand the fundamentals better, really understand the ground-level stuff, it will let us do a better job of tackling some of the hard problems in probability theory. So just to give you an idea...
Ilya (33:48.895)
So let's talk a little bit about the fundamentals, then. What is fundamental? Especially to the concept of learning, right? You're specifically focused on probability theory, but more broadly, what do you see as the fundamentals of learning?
Kevin S Van Horn (33:59.608)
So.
Kevin S Van Horn (34:11.182)
Let's look at the very, very, very simplest problem you can have out there, which is simple induction. In fact, so simple that we just have a sequence of binary variables, right? And all we want to do is predict the next one, and we have no reason to assume much more than that there is some
probability... essentially, we assume they're each Bernoulli draws, right? That there's some probability between 0 and 1 that each will be true or false, and that it's independent for each one. How do we infer that probability? We don't even have a full answer to that simple question. So part of the problem is that, you know,
the usual way we do it is we come up with a beta distribution on that Bernoulli draw probability, right? OK, where does that prior distribution come from, that beta prior? Should it even be a beta prior, for that matter, right? The beta prior is computationally the easiest one to work with. But should it even be a beta prior?
But what should it be? And people just sort of go intuitively and have various intuitive arguments. And in a lot of circumstances, we can sort of get away with just kind of fudging it a little bit and saying, well, this looks reasonable.
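For the record, here is the concrete version of the beta-Bernoulli setup Kevin is describing, as a minimal sketch. The data are made up, and the choice a = b = 1 is exactly the kind of intuitively "reasonable" prior whose justification is the open question he is pointing at.

```python
# Beta-Bernoulli conjugate updating: with a Beta(a, b) prior on the unknown
# probability p and k successes in n draws, the posterior is Beta(a + k, b + n - k).
import numpy as np

a, b = 1.0, 1.0                              # assumed prior (uniform); the hard question is why
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])    # made-up binary observations
k, n = int(data.sum()), data.size

post_a, post_b = a + k, b + (n - k)
post_mean = post_a / (post_a + post_b)       # posterior mean of p
print(post_a, post_b, post_mean)             # the posterior mean is also the predictive
                                             # probability that the next draw is a 1
```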
Ilya (35:49.551)
Yeah, I'm so sorry to do this to you, but, bitter pill, right? I understand why it would be really cool to understand that, right? But do we need to, if throwing more data at it will probably just work? Yeah, it doesn't always work. It fails in ways we can't predict. And then we're like, okay, this doesn't work every time. But...
Kevin S Van Horn (35:53.548)
Hahaha!
Kevin S Van Horn (36:04.706)
doesn't always work because here's the...
Kevin S Van Horn (36:13.87)
Here's the problem, what if we have predictors or what if we have a large number of categories? So it's not just binary, it's categorical. So think about doing trigrams or quadrograms or pentagrams or so forth. You quickly get to the point where you have to have more structure there. You'll never get enough data.
Anyway, the point I was trying to make is, one of the things we don't understand... to give you an idea of why it can be a real problem, it's this: we don't even know that you can do the induction. If you really don't know anything about these values, right, if you're really truly in a situation where we're dealing with...
Kevin S Van Horn (37:07.97)
Let's see what I'm looking for. Not default.
Ilya (37:12.035)
No priors, right?
Kevin S Van Horn (37:13.742)
Well, I mean, there's always a prior, but basically we want ignorance, true ignorance, right? Well, with true ignorance, we don't even know these values are related in any way at all, right? The first one could be whether or not the number of people in the world is even. The second one could be the 27th bit of pi. The third one could be... they could all be things that just have no connection whatsoever to each other, right? And in that case, no, it doesn't matter how many of them you've seen.
You can't learn anything. That's actually provable. If you have no prior information whatsoever, you can't do induction. You're just out of luck. Those values are just all completely independent of each other, and even after you've seen 10,000 of them, you can't use the fraction of them that have been true so far to draw any sort of conclusion about what the next one's gonna be.
Anyway, that's just a simple example of a really basic thing that we don't understand. You know, in general, we don't really understand very well how to create priors. And...
Ilya (38:24.409)
And one of the things I learned from you is every time you think you don't have a prior, you do. It's just implied. And so you can't troubleshoot it. Yeah, that's fascinating.
Kevin S Van Horn (38:30.818)
Yes.
Kevin S Van Horn (38:35.662)
Anyway, so actually I have been sort of semi-retired, in that, you know, I have done some consulting work, but lately I am putting a lot of effort into that research area, and I'm starting to write up results. And, you know, I have a bunch of results that I just sort of wrote up enough so that I was satisfied that I knew the answer, but didn't write up into a paper, right?
It was not in publishable form, and so I'm doing a fair amount of work trying to get some of that cleaned up and put out there and made more accessible right now.
Ilya (39:14.501)
So how do you create new knowledge? Because, you know, I've been an engineer my entire career, and I don't have a PhD. I don't have a fancy degree; I have a master's. But the concept of creating new knowledge, right? Like, you're a systematizer. This is terrific. So you can give us a system for creating new knowledge. If I were in my last year of a PhD today, or my first year of a PhD today, and I was thinking about ML as a career,
Kevin S Van Horn (39:34.894)
Ha
Ilya (39:44.143)
How do I do it? How do I advance the state of the art?
Kevin S Van Horn (39:46.126)
Okay, so are you interested in advice for someone who's just about to start grad school, too? Because I do have a few things to say about that. Let me tell you the big mistake I made, okay, when I went into grad school. I did not understand...
Ilya (39:55.597)
Sure, yeah. Hey, we'll take it.
Ilya (40:02.733)
You didn't talk to Bezos enough. Kevin was a year before Bezos in his undergraduate degree.
Kevin S Van Horn (40:11.15)
Yeah, I almost went to Princeton, which is where Bezos went, but ended up not going there. So the most important thing is your choice of advisor. That's even more important than the school you go to: finding the right advisor. Because you're going to be working very closely with your advisor, and you need to realize that this person is your advisor and is a resource that you need to be using. Your next most important choice is a dissertation topic,
because that needs to be a problem that is challenging enough to be worth publishing and worth getting a PhD for, but not so difficult that you can't solve it within a few years. And this is where a good advisor comes in, because a good advisor will typically have far more ideas than they have time to pursue themselves. And so they've got plenty of interesting ideas that they can toss at graduate students and have them run off and try to solve.
That was the biggest mistake I made in grad school: I did not understand that. And I did not do a good job of making use of my advisor, or, in one sense, of choosing an advisor. I should have done things very differently, but I didn't.
Ilya (41:33.807)
Well, I think all of us have that about our schooling, you know. And it's really interesting to me, because the advice that you give there is really pertinent throughout a career, too, right? Like, your manager, your technical lead has way more impact on what you're doing than the company. People go to companies because of the company name, and my favorite is, such and such a company does not have very good work-life balance. And I'm like,
I've seen managers there whose teams have really good work-life balance, and I've seen managers whose teams don't, you know? And also the choice of project, too, right? I feel like as MLEs we're very prone to picking projects that you cannot complete in a reasonable timeframe, because it would be really cool if we could do this. But also, and I see less of this, but I see some failures on the side of, this was too trivial. This doesn't actually move the bottom line.
Kevin S Van Horn (42:17.6)
Yeah.
Kevin S Van Horn (42:29.891)
Yeah.
Ilya (42:30.253)
Okay, you've done it. Cool. I don't care. And so yeah, that's, I think that generalizes after graduate school as well. I think that stays with you forever.
Kevin S Van Horn (42:40.622)
I think there's one more piece of advice I'd also give to someone starting grad school, which is... and I'm gonna guess that your audience here are probably pretty bright people who probably sailed through school,
Ilya (42:56.943)
Hey, if you subscribe to this podcast, you definitely are one of the best and brightest.
Kevin S Van Horn (43:00.622)
who are probably not used to needing help in school, right? They're probably used to just going through, you know, and just figuring things out, working things out. And when you get into grad school, it's going to be a different environment, because it's not just doing the homework. It's not just doing tests. A very important part, like I said, is finding a good project to work on. And you need to...
You need to learn how to ask for help. If you have never, ever, ever had to ask for help before in your academic career, once you get into grad school, you need to accept that there may come a time when you're gonna need to do that. And that needs to be in your mind as a possibility. It's okay to go to your advisor and say, I'm stuck. I don't know where to go. Can you help me out here?
Ilya (43:59.109)
Oh man, this once again generalizes to professional life. Because, you know, working at Meta specifically, right? Like in Meta ads, people come in there, and the bar is pretty high. And so by the time you come in there, you're not competing with whoever you were competing with in the little startups that you were in or whatever. And so I helped a guy get into Meta last year, and coached him kind of through it after that. And he came back to me a
couple of months in, and he was like, I don't know how to get an edge on these people, because everybody already is very smart. Everybody already is working, like, nights and weekends. So my usual methods of, I'm just going to outwork somebody, or I'm just going to be smarter than somebody, they no longer work. And yeah, I mean, this is when you come to people and you ask for help, right? This is where you need to understand how to do that. So yeah, I think that's all applicable way beyond
Kevin S Van Horn (44:38.44)
Hahaha!
Ilya (44:53.231)
graduate school. I think that's partly why graduate school is, I wouldn't say a prerequisite, right? Like, I don't want to gatekeep necessarily, but you do see a lot of MLEs coming from some sort of a graduate school background, be it a master's degree or a PhD.
Kevin S Van Horn (45:10.83)
Well, I think part of it also is that one of the most important things that grad school taught me was just how to teach myself. Learning how to go out and just dig up things in the literature. Because I always found that most of the stuff that I needed to know, nobody was teaching any of the classes that were available in the department.
Ilya (45:32.549)
Well, that's terrific. So I have a question for you on that one, then. How do you learn new things, right? Especially since, you know, you learned Bayesian stuff from scratch. You learned marketing things from scratch because you were interested in them. The reason I think you're a perfect person to ask this is, years ago, you told me, listen, if you know nothing else, go find a recent graduate student thesis in the area
Kevin S Van Horn (46:00.685)
You
Ilya (46:00.919)
and just read through the introduction section. Like, that hack will get you really far. But beyond that, is there anything... like, how do you read a paper, right? Because I know lots of people who open up a paper and then close it, and they're like, I didn't get anything. But how do you get stuff out of a paper? How do you get stuff out of a thesis? What resources do you even look at before you even get there? If you can speak to that.
Kevin S Van Horn (46:26.188)
Yeah, I can't really say that I have a really, you know, full-fledged system. I think the main thing that I've always had going for me... I know, I know. I think the main thing I always had going for me was just plain stubbornness. Like, I would start trying to figure something out and I'd find a few papers, and I'd get a little bit, you know, I would get a little bit,
Ilya (46:36.973)
And you call yourself a systematizer.
Kevin S Van Horn (46:53.56)
but wouldn't really get as far as I wanted. I didn't fully understand what I wanted to. And then I'd go off and do something else for a while. I was just too stubborn to give up, right? And I would always come back to it. And it's kind of like, just bit by bit, chipping off bit by bit, eventually things start to make sense. But yeah, I mean it.
Ilya (47:16.889)
You don't have a system for reading the papers?
Kevin S Van Horn (47:20.174)
You know, actually not. I mean, there is a little bit of scanning over it to begin with, right? A little bit of, of course, the usual business of going down and reading the conclusion, reading the introduction, and all that kind of stuff. There is the business of using a paper that maybe turns out not to have the information you want, or that maybe you didn't understand that well, but it's got some great references to other papers. Like a lot of times, right, they will reference...
They'll reference the seminal papers in the field that introduced the ideas in the first place. So they'll be talking about something that involves some concept you know nothing about, but they assume you do. But yeah, you follow the references, and that gets you back to the paper that tells you about the precursor concepts you were supposed to know before you even tried to read that paper.
Ilya (48:12.825)
Yeah, yeah, to me, if it's a new enough field, one of the things I would do is take a look at a couple of papers that I can find really quickly and look at the reference section and see if they're all referencing the same paper. And then I'm going to go read that one first. But these days, of course, you can find... There's an echo there. Sorry, my audio just changed. I ran out of juice on my headphones, one second.
Kevin S Van Horn (48:25.39)
These days, of course, if you can find, there's an echo there.
Ilya (48:45.349)
I've created some of the most complex systems in existence and I cannot use a microphone.
Kevin S Van Horn (48:46.488)
By the way, if you.
Kevin S Van Horn (48:57.96)
If you want to...
Ilya (48:58.149)
There's gotta be hundreds of people as smart as us, right Kevin?
Kevin S Van Horn (49:04.64)
You're even worse than my boss was. He would only say, what did he say? There must be thousands. Which was already, you know.
Ilya (49:09.477)
thousands?
Ilya (49:14.553)
Yeah, anyway. And give me one second.
Kevin S Van Horn (49:16.939)
Are we back?
Ilya (49:30.661)
Alright, well that will need to be cut.
Ilya (49:45.773)
Okay, can you still hear me?
Okay. All right. Sorry. Where were we? Understanding papers and just understanding new areas of technology.
Kevin S Van Horn (50:00.652)
Yeah, nowadays there's actually a new strategy, which is, you know, I go to ChatGPT and describe what I'm looking for, what I want to understand. And not because I want its explanation, although that may help a little bit, but basically I'm trying to get it to help me find the right things to read and find the right terminology.
So for example, when I was first learning Bayesian statistics, an obvious thing that came to mind was, gee, if you want to be really general and not tie yourself down to a particular parameterized form, maybe we should have the idea of a prior over functions. And then I thought, well, surely someone's already thought of that idea. But I couldn't find anything on it. And it was because I didn't know what they were called. They're not called priors over functions.
They're called stochastic processes or stochastic fields. And it was years before I found the right terminology. Several years before I found that, and suddenly, yeah, suddenly I was able to find plenty of information on the topic. But without knowing the right... the magic words, the right words to look for, it was very difficult
to find anything on the topic. And I found that ChatGPT is really good at that. You say, you know, I don't know the word for this, and then you describe what it is. It can generally do a pretty good job of telling you what the right term is that you should be looking for.
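As a small illustration of the "prior over functions" idea once you know the magic words: a Gaussian process is the standard example of such a stochastic process, and sampling from it shows what a random function drawn from the prior looks like. The squared-exponential kernel and its parameters here are assumed purely for illustration.

```python
# Drawing random functions from a Gaussian process prior (illustrative sketch).
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 200)

def sq_exp_kernel(x1, x2, length_scale=1.0, variance=1.0):
    # covariance between f(x1) and f(x2) under the GP prior
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

K = sq_exp_kernel(x, x) + 1e-8 * np.eye(x.size)   # small jitter for numerical stability
samples = rng.multivariate_normal(np.zeros(x.size), K, size=3)
# Each row of `samples` is one function drawn from the prior, evaluated on the
# grid x, with no parametric form ever written down.
print(samples.shape)   # (3, 200)
```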
Ilya (51:44.569)
Yeah, no, yeah, just a word of warning: the terminology it's great at, but do not take its explanations at face value, especially in the newer fields, right, where it doesn't have a ton of training data. But yeah, I think... who was it?
Kevin S Van Horn (51:55.628)
Yeah, yeah.
Kevin S Van Horn (52:01.614)
That's why I stick to getting good references and getting terminology: stuff that you can check.
Ilya (52:07.277)
Yeah. Yeah. Somebody in my career told me that the number one skill is knowing what to Google. If you know what to Google, you've got it all. But yeah. So this is like how to learn, but there's a meta-problem in machine learning of what to learn, right? Because there's a real difference between things that come out all the time, and you sometimes actually do need some of them in your job,
Kevin S Van Horn (52:13.635)
Yeah.
Ilya (52:36.453)
Versus, this is gonna be a seminal paper, I gotta understand this. And I'll be honest with you, I'm really bad at this. When the 'Attention Is All You Need' paper came out, partly because it was, you know, such a clickbaity title, I was like, there's no way, this is not serious. This will go away. This is a fad, right? And it took me about six months after it came out to come around to it not being a fad, when every researcher at Adobe was like, we're doing this with attention. I'm like, okay,
Now I'm listening. But how do you know what to study in order to stay relevant in the industry? for MLEs, like our stuff moves.
Kevin S Van Horn (53:15.459)
That.
Yeah. I mean, so that is a difficult problem. And this is sort of something where you get into the personality types. Because, like I said, I'm a systematizer, and my tendency is, I want to understand the fundamental issues. But here's sort of that bitter pill: the advances in deep learning have had nothing to do with understanding deep issues, right? They've all been successful hacks, but they've all been hacks.
Ilya (53:49.071)
Wait, stochastic gradient descent is a hack?
Kevin S Van Horn (53:49.592)
and
Well, kind of, kind of.
in a certain sense. And there are a lot of things like that, that just... you know, people tried them out, and you couldn't necessarily have predicted in advance what was going to work or what was not going to. And so there's a bunch of things that people know work, but I really do think that we don't really have a deep understanding of why they work, right? And people have been surprised at
how well the large language models have worked. The researchers themselves did not expect them to work that well. So there are a lot of things that, if you really look at them, are really kind of hacks. Well, there's some intuition behind them, but there's not a lot of really solid theory behind them. And that's just the reality that we
have right now. And I would say one of the biggest mistakes I've made in my career was being very reluctant to look at things like that, right?
Ilya (55:04.869)
So knowing what you know now, how would you have... what would you have even thought of, right? You can't chase everything. If you chase every technology that comes up day to day, you're gonna burn out and, you know, not be an ML engineer. But at the same time, you can't miss a couple of essential things. So this is a very hard part for me about learning ML engineering, and...
you know, people are like, you've been in the area for 14 years. I'm like, so what, right? I've been irrelevant four times during this time, when the paradigm changed under me. And, you know, the new grad knows just as much as I do in the new paradigm. And I still have to commit to learning and do this well. Go ahead.
Kevin S Van Horn (55:50.84)
So.
So there are some things that, I think as I said, will always serve you well, right? And so, you know, learning more math is almost always not going to hurt you, right? As long as you spend enough time on the other things you also need to be doing. But it's like, you know, physicists are sort of the gods of science, because they can go and enter any field they like and contribute to it, because they know more math than anybody else.
Right? Oh, that's right. You came from physics.
Ilya (56:23.087)
Thank you, thank you.
You
Kevin S Van Horn (56:31.278)
I had forgotten that. Sorry, I didn't mean to compliment you.
Ilya (56:37.803)
Ouch.
Kevin S Van Horn (56:42.198)
Anyway. But no, so it's the same thing in computer science, right? There are some fundamentals that will always serve you well, no matter where things go. And so I think it's always good to make sure you have those down. So yeah, make sure you understand your linear algebra very well. Make sure you understand your statistics and probability theory very well. But the other stuff... I'm...
Ilya (57:08.633)
Bayesian or frequentist?
Kevin S Van Horn (57:11.212)
Well, you can't avoid frequentist stuff. Obviously, I think that the Bayesian approach is much better founded, and I think it has much better possibilities. But the reality is you're going to have to know some frequentist stuff, too. You can't avoid that. But I was going to...
Ilya (57:29.465)
What about, you mentioned computer science, what about kind of the fundamentals there that you see as fundamentals?
Kevin S Van Horn (57:37.57)
Fundamentals would be things like understanding... I grew up in the age, right, when we had to create our own data structures from scratch. We didn't have all these nice libraries and so forth. So I'm used to understanding the nuts and bolts, you know, pointers, linked lists and all that stuff, what's going on in there. Understanding how it is that you're able to make a randomly accessible vector that you can
append values onto in O of one amortized time. You know, so understanding your data structures, understanding your basic algorithms, understanding, you know, O of n, O of log n, understanding that time complexity stuff, big O notation. One thing that shockingly few people understand, I find, shockingly few computer science graduates understand, is just how to do parsing,
Ilya (58:25.241)
the big O notation.
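As a side note, here is a rough sketch of the growable-vector point Kevin just made: doubling the capacity whenever the buffer fills is what makes append O(1) amortized, since a resize that copies n elements happens only after roughly n cheap appends. The class is illustrative only; Python's built-in list already does this internally.

```python
# Minimal dynamic array with doubling growth (illustrative sketch).
class DynamicArray:
    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._buf = [None] * self._capacity

    def append(self, value):
        if self._size == self._capacity:
            # the expensive step: double the capacity and copy, but it is rare
            self._capacity *= 2
            new_buf = [None] * self._capacity
            new_buf[: self._size] = self._buf[: self._size]
            self._buf = new_buf
        self._buf[self._size] = value
        self._size += 1

    def __getitem__(self, i):
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._buf[i]          # random access stays O(1)

arr = DynamicArray()
for i in range(10):
    arr.append(i * i)
print(arr[9])   # 81
```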
Kevin S Van Horn (58:37.198)
how to write just basic compiler stuff. I'm not talking about really sophisticated stuff, just basic stuff. And I've found that even just basic parsing shows up all the time. It's very useful in building little mini tools and little mini languages for yourself. And it just kind of surprises me, the number of people who don't understand how to do that. And none of that stuff is really sexy. It's all old technology.
There haven't been very many new ideas in parsing for years and years and years. But it's all basic stuff that's pretty useful to understand. Even if you're not an expert in complexity theory, at least understand the idea that, yeah, there are problems that are NP-hard, and do not try to solve them exactly.
Understand when you need to throw in the towel and just go with, you know, a heuristic.
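And for the parsing point, here is a tiny recursive-descent parser of the kind Kevin is talking about, enough to build a little calculator mini-language. The grammar is assumed purely for illustration: expr -> term (('+'|'-') term)*, term -> factor (('*'|'/') factor)*, factor -> NUMBER | '(' expr ')'.

```python
# Recursive-descent parser/evaluator for simple arithmetic (illustrative sketch).
import re

def tokenize(text):
    return re.findall(r"\d+\.?\d*|[()+\-*/]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.eat()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.eat()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        return float(self.eat())

print(Parser(tokenize("2 + 3 * (4 - 1)")).expr())   # 11.0
```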
Ilya (59:36.037)
Yeah. Yeah, no, for sure. And in terms of things changing, right? You've been through more of these hype cycles and AI winters, or whatever you want to call them, than I have. But there are definitely times when it's better to be an MLE than others. Like, the late nineties were probably a bad time to be an MLE, whereas, you know, 2025 seems like a terrific time to be an MLE.
For people who don't want to change fields every couple of years and are really happy being an MLE, what do you feel are some strategies that work both in the lean times and the good times?
Kevin S Van Horn (01:00:21.26)
Well, are we talking about on a personal level or research level or?
Ilya (01:00:27.011)
Yeah, yeah, like in terms of thinking about it, right? Like in terms of, what do you need to be concerned with? Are they all that different? Do you just keep tracking along, or?
Kevin S Van Horn (01:00:38.924)
Well, I mean, so some of these winters you talk about... I think we touched on this before, but it's kind of surprising how spring came about. Spring came about just because of more data and more computing power, which is not really the answer that anybody was looking for. At least it's not the intellectually satisfying answer. And maybe it's not the best answer, right? Maybe we'll eventually find a better answer and find that we could make do with a lot less data,
Ilya (01:00:59.642)
Hmm.
Kevin S Van Horn (01:01:08.76)
which does seem actually probable considering that human beings aren't trained on nearly as much data as these big LLMs are. But, you know, if you've got the data, use it.
Ilya (01:01:22.553)
I mean, no human being has the range of knowledge that an LLM does, right? There are different things that we're good at. And in terms of regurgitating facts, which are mostly facts, mostly usually facts, an LLM is better than any human. Yeah, truthiness. I was at NeurIPS and I was talking to a group of people, and I said something like, what LLMs are really good at
Kevin S Van Horn (01:01:39.47)
They're very good at truthiness. LLMs are great at truthiness.
Ilya (01:01:52.511)
is confidently BSing. And so the first job that I think should worry about, you know, being automated is CEO, and the guy standing next to me is like, I'm a CEO, and I'm like, sorry. But any place where confident BSing works, like, you know, you just sound like you're telling the truth and you don't care that you're not, I think that can already be replaced. Any place where you actually have to be right, that's a little bit harder.
Kevin S Van Horn (01:02:17.622)
Yeah.
Kevin S Van Horn (01:02:22.414)
Yeah, yeah. So what would I say about that?
I think part of it is, even when things are in tougher times, I think it's important to keep on learning. Keep on educating yourself. Keep on deepening your understanding.
You know, a lot of times... sort of think about what startups do, right? They start off with this grand idea that they've got some great solution, and they end up pivoting to something that's actually typically far more specific and narrow than what they started off with, which helps them a lot. And I feel like in ML, you know, if things have hit a wall,
focusing hard on the specific application area at hand, the specifics of it, right? And doing well on that, and understanding your application area,
is actually a pretty good strategy a lot of times. Yeah, because at...
Ilya (01:03:36.869)
Spend all your time understanding it so when more data comes up, whatever you understood was not relevant, but more data is.
Kevin S Van Horn (01:03:43.64)
Ha
Kevin S Van Horn (01:03:47.104)
Yeah, it's just, so that's like how to get through winter. It's kind of, how do you get through summer?
Ilya (01:03:53.059)
How do you get through summer? Because summer is also hard, right? People don't understand this, but when you go to social gatherings and everybody is more of an ML expert than you are, and you're like, guys, please listen to me, I do this thing for a living. But you know, people have strong opinions, and bosses have strong opinions about things that should be possible, because everything is moving exponentially, so why aren't we there yet? Right?

And so I think summer has some hard parts about it, too.
Kevin S Van Horn (01:04:26.646)
Yeah, yeah, so you know, like when you're faced with a big hype machine, right? I think the real skill is to start asking lots of questions and start programming. Because if there's real meat there, there's something real there, as you drill in and try to get specific answers to various questions, those answers will exist, right? The more you drill in, the more things will make sense. On the other hand,
if you're dealing with vaporware or BS, the more you drill in, the more frustrated you'll get and the less things will make sense.
Ilya (01:05:05.893)
That's a really good one. Yeah, that's definitely a very good one. See, I knew you were a systematizer. That one is a great system. Yeah, so we're getting close to time, so just a couple more quick questions. I'm gonna cut that out because anytime you say that, people drop out. So I'm not gonna say that. Yeah, no, I think that was a terrific system for how to deal with the summer as well.

Having worked on so many ML projects, how do you assure success? Because I think every single person who's ever worked in ML knows how to assure failure. But not very many people have shipped successful ML projects and done it multiple times, right? Because I know lots of people who can demo things very well, but
It does go back to that, like, you know, taking research into production. The researcher just has to demo that it worked once. You have built your career on being the guy who makes it work every time. How do you do that?
Kevin S Van Horn (01:06:25.976)
Well, so the first part of it is not specific to ML. It's just making sure you've got your software

basics down, right? So you've got your unit tests all over the place, you're doing your test-driven development, so you know that you have solid code. Before anything else, make sure that every piece of code is doing what it's supposed to do. And one of the things that I found especially important there, and one of the things that's especially difficult

about machine learning and statistical sorts of problems, is that with most other sorts of software problems, you know the answers and it's easy to come up with test cases. Whereas with statistical software and machine learning software, erroneous output isn't always obviously erroneous. And so that's always where I have put the most effort in testing. Like if it's something
Ilya (01:07:29.957)
So how do you test that?
Kevin S Van Horn (01:07:32.622)
Boy, in a case like that, often what that involves is some sort of numeric computation, and I've had to do a bunch of things to make it more efficient, or because of the way the data was organized or something. So one thing that I will often do is write another piece of code, perhaps in another language, that is horribly, horribly inefficient, but it's really simple. It

is as close to the raw equations as I can make it, right? And it's easy to check. And I will get an answer that way first and compare. That's one technique that I've used.
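A minimal sketch of that cross-check, assuming Python with NumPy (the Gaussian log-likelihood and the function names are illustrative, not from any specific project mentioned here): a fast, vectorized implementation stands in for the "real" code, and a deliberately slow version written straight from the density formula provides the answer to compare against.

```python
import math
import numpy as np

def gaussian_loglik_fast(x, mu, sigma):
    """Vectorized log-likelihood of i.i.d. Gaussian data (stands in for the optimized code)."""
    n = x.size
    return (-0.5 * n * math.log(2 * math.pi)
            - n * math.log(sigma)
            - np.sum((x - mu) ** 2) / (2 * sigma ** 2))

def gaussian_loglik_slow(x, mu, sigma):
    """Deliberately naive version, written term by term from the raw density formula."""
    total = 0.0
    for xi in x:
        density = (1.0 / (sigma * math.sqrt(2 * math.pi))) * \
                  math.exp(-((xi - mu) ** 2) / (2 * sigma ** 2))
        total += math.log(density)
    return total

# Compare the two on many random inputs; any algebra or indexing mistake shows up here.
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.normal(size=50)
    mu, sigma = rng.normal(), rng.uniform(0.5, 2.0)
    assert math.isclose(gaussian_loglik_fast(x, mu, sigma),
                        gaussian_loglik_slow(x, mu, sigma), rel_tol=1e-9)
```

The slow version earns its keep by being too simple to get wrong, so disagreement points the finger at the optimized code.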
Kevin S Van Horn (01:08:16.426)
Another technique, because a lot of people don't worry about this, but you can run into numerical instability. There are numerical analysis issues to worry about. There are a lot of formulas you can use that will blow up in certain regions, because you're essentially dividing by the difference between two numbers that are very close to each other, right? That kind of thing.

And so what I have done sometimes in those cases is go to a piece of software that has infinite precision, or arbitrarily large precision computing, right? Where instead of the 16 or 17 digits of a double, I can have a hundred or 500 or a thousand or whatever, and compute some answers to just ridiculously high numbers of digits. So I know that even if I have done something wrong,

even if there are numerical instability issues there, I still have the right answer to compare against. And then I make sure that my code handles it: think about ways that instability can happen, think about regions where things can go wrong and how to deal with that. And so my code will often have if-then-elses to handle different regions of the input space, because it has to use a different formula in different regions. You sort of have this piecewise thing where it's like,

this formula works well in this region, that formula works well in that region, and they overlap somewhat, right? But there are points where either one of them will just fall down really badly. There's also just finding a few really simple cases where I know what the answer's supposed to be.
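A sketch of both ideas, assuming Python with the mpmath library available for the arbitrary-precision reference; the softplus function below is an illustrative example, not one Kevin cited. log(1 + exp(x)) needs different formulas in different regions, because the direct formula overflows for large x, and the high-precision version serves as ground truth.

```python
import math
from mpmath import mp, mpf, exp as mp_exp, log as mp_log   # arbitrary-precision arithmetic

def softplus(x):
    """Compute log(1 + exp(x)), switching formulas by region of the input.

    For large x, math.exp(x) overflows, but log(1 + exp(x)) = x + log(1 + exp(-x)).
    For modest or negative x, log1p(exp(x)) is accurate as written.
    """
    if x > 30.0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def softplus_reference(x, digits=100):
    """The same quantity computed with 100-digit arithmetic, as ground truth."""
    mp.dps = digits
    return float(mp_log(1 + mp_exp(mpf(x))))

# Check the double-precision version against the high-precision reference across regions.
for x in [-50.0, -2.0, 0.0, 3.5, 25.0, 35.0, 800.0]:
    got, ref = softplus(x), softplus_reference(x)
    assert math.isclose(got, ref, rel_tol=1e-12), (x, got, ref)
print("softplus agrees with the 100-digit reference on all test points")
```

The test points deliberately straddle the boundary between the two formulas, which is exactly where a piecewise implementation tends to fall down.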
Kevin S Van Horn (01:10:06.646)
With machine learning, there's the obvious thing of starting with the answer and generating the observations, and seeing if you can get the answer back, if you can get back the parameters that you used to generate the synthetic data from. Things like that.
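A minimal sketch of that parameter-recovery check, assuming Python with NumPy (the linear model, coefficients, and sample sizes are invented for illustration): pick the true parameters first, simulate data from them, fit, and insist the estimates come back close to where they started.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1. Pick the "answer" first: true parameters we hope to recover.
true_coefs = np.array([2.0, -1.5, 0.5])
true_noise_sd = 0.3

# 2. Generate synthetic observations from those parameters.
n = 5000
X = rng.normal(size=(n, 3))
y = X @ true_coefs + rng.normal(scale=true_noise_sd, size=n)

# 3. Fit with the code under test (here, ordinary least squares).
est_coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

# 4. Check we got the answer back, within generous sampling error.
assert np.allclose(est_coefs, true_coefs, atol=0.05), (est_coefs, true_coefs)
print("recovered:", est_coefs)
```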
Ilya (01:10:15.482)
Mm-hmm.
Ilya (01:10:29.913)
Yeah. The thing you taught me about synthetic data is that synthetic data is only as good as the failure modes you can think of, but it's still worth doing, right? Because if you can already think of a way it's going to fail, you should check it against that.
Kevin S Van Horn (01:10:36.012)
Yeah.
Kevin S Van Horn (01:10:44.866)
Yeah, so there's all that. The other thing is never trust your inputs. Again, that's a general software engineering precept, but it's especially important in machine learning. Dirty data is just ubiquitous. Don't assume the inputs make sense. Check them. Make sure that they do make sense. Make sure that any assumptions you're making about that data actually hold for the data.
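A sketch of what those checks might look like, assuming pandas and made-up column names like "date" and "spend"; the real checks depend entirely on what the downstream model assumes about its data.

```python
import pandas as pd

def validate_inputs(df: pd.DataFrame) -> pd.DataFrame:
    """Fail loudly if the data violates assumptions the downstream model relies on."""
    assert not df.empty, "received an empty dataset"
    assert df["spend"].notna().all(), "missing values in spend"
    assert (df["spend"] >= 0).all(), "negative spend values"
    assert df["date"].is_monotonic_increasing, "dates are not sorted"
    assert not df["date"].duplicated().any(), "duplicate dates"
    return df

# Hypothetical usage: validate before the data ever reaches training code.
clean = validate_inputs(pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "spend": [100.0, 250.0, 80.0],
}))
```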
And of course, logging, right? This is all just standard software engineering stuff. If you're building a server or anything like that, you're going to want to have sufficient logging there to let you track down what may have happened. Beyond that, and I actually have to admit that I have not always been very good at this, I do have certain perfectionist tendencies that get in my way sometimes. And really what you want to do
is get something that works, even if it doesn't work very well, something that works as a baseline, and then iterate on it, so that you always have

something available.
And that is probably the most important, more ML-specific piece of advice.
Kevin S Van Horn (01:12:09.07)
Because there's no end to the amount of tweaking you could do,

no end to the amount of little improvements you could do. So get a baseline there. If you're satisfied with it, you did too much. It should hurt.
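One way to read "get a baseline there" in code, assuming a simple classification setting (the simulated labels below are just to have numbers): the dumbest model you can write down gives every later model a number it has to beat.

```python
import numpy as np

def baseline_accuracy(y_train, y_test):
    """Dumbest possible classifier: always predict the most common training label."""
    values, counts = np.unique(y_train, return_counts=True)
    majority = values[np.argmax(counts)]
    return np.mean(y_test == majority)

rng = np.random.default_rng(0)
y_train = rng.choice([0, 1], size=1000, p=[0.7, 0.3])
y_test = rng.choice([0, 1], size=200, p=[0.7, 0.3])

print(f"majority-class baseline accuracy: {baseline_accuracy(y_train, y_test):.2f}")
# Any real model now has a number to beat; ship the baseline first, then iterate.
```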
Ilya (01:12:32.825)
I had one time in my, no, two times in my career, so I don't do test-driven development as much, I often test after the fact. And I've had a couple of times in my career where all the tests passed the first time and it just worked. And I gotta tell you, I was never as scared as those two times, because I'm used to it failing, right?

And so when it's like, that worked? What did I do wrong, you know?
Kevin S Van Horn (01:13:08.622)
Did you ever have, because I have had the experience where everything passed, and when I dug into it, it was because my test was wrong.
Ilya (01:13:16.229)
Well, yeah, that's the first thing you check. You're like, did I put assert True everywhere? What happened? Yeah, no, for sure. And it is definitely an interesting experience. And I think this is a thing: it's not that we can guarantee that everything works. It's that I'm telling you, what I really did is I just took a little bit of data and put it in a bucket over here.
Kevin S Van Horn (01:13:17.39)
My tests were flawed.
Kevin S Van Horn (01:13:23.49)
Yeah.
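That failure mode, a test that cannot fail, is easy to stumble into. A contrived sketch (both functions here are made up purely for illustration) of a test that passes no matter how broken the code is:

```python
def predict_probability(x):
    return -0.2   # obviously broken: a probability can never be negative

def test_predict_probability():
    p = predict_probability(0.5)
    assert p == p   # vacuous assertion: true for every value except NaN

test_predict_probability()
print("all tests passed")   # ...and yet the code under test is wrong
```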
Ilya (01:13:45.689)
I didn't do anything with it during the training. And then let's see if it even performs well. Okay, it did, on like 90% of it. So that's pretty good, right? It's wrong on 10% of it, but it's right on the 90%. And I feel like that's a pretty unique thing in our field. There are very few other engineering disciplines where it's like, this bridge falls down one time out of a hundred, that's okay, right?

And for reasons that I sometimes don't even understand, yeah, it's a thing that we deal with uniquely.
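A minimal sketch of that held-out-bucket check, assuming Python with NumPy (the toy data and the crude least-squares classifier are invented for illustration, so the exact accuracy will vary): the held-out slice never touches training, and the final number is whatever it turns out to be.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data: a noisy linear signal thresholded into a binary label.
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = (X @ true_w + 0.5 * rng.normal(size=1000) > 0).astype(int)

# Put a slice of the data "in a bucket over here" and never train on it.
holdout = rng.random(1000) < 0.2
X_train, y_train = X[~holdout], y[~holdout]
X_test, y_test = X[holdout], y[holdout]

# Fit something simple on the training split only: least squares on signed labels.
w, *_ = np.linalg.lstsq(X_train, 2 * y_train - 1, rcond=None)
pred = (X_test @ w > 0).astype(int)

print(f"held-out accuracy: {np.mean(pred == y_test):.0%}")  # lands somewhere around 90%
```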
Ilya (01:14:28.375)
Anyway, I do want to be respectful of your time, but before we go, this has been a terrific conversation. I always love having conversations with you. I hope there are a few folks who want to learn more about what you're doing and follow you. So what's the best way to keep up with you, Kevin?
Kevin S Van Horn (01:14:46.05)
Well, these days I am doing most of my writing on my Substack blog, and that is epistemicprobability.substack.com. So epistemic, yeah, epistemic, E-P-I-S-T-E-M-I-C, probability, all one word, .substack.com.
Ilya (01:14:59.129)
You wish you had a shorter name, don't you?
Ilya (01:15:06.519)
Yeah, there will be a link in the description too, but yeah, for the masochists in the audience, feel free to type it in.
Kevin S Van Horn (01:15:13.586)
And there's a lot of math in there. It's not your typical Substack blog. There's quite a bit of math in there. I am trying to make it as accessible as I can. And if you do go there and visit, please don't hesitate to ask questions, to put questions in the comments if there's anything that doesn't make sense.
Ilya (01:15:36.517)
Terrific. Thank you, Kevin.
Kevin S Van Horn (01:15:40.632)
Pleasure.