Ilya (00:01)
All right, well, I'd love to welcome Dr. Rebecca Bilbro today. Rebecca has kind of been the antithesis to people who think you can either think with the right side of your brain or the left side of your brain. She double majored in English and math, a pairing that you don't see very often. But after that, she saw the light and went and did her PhD in actual math. So we're glad to have her.
So Rebecca, like me, graduated into just about the worst job market that our generation has seen. Despite that, her skills were able to carry her to a job with the Department of Labor, where she pioneered a lot of data science techniques that were just coming out in the early 2010s. And then she went on to work with the Department of Commerce,
where she refined her craft. She then worked for several startups in a row. And as if that wasn't enough, she also taught at Georgetown for about nine years. Teaching is something that Rebecca is really passionate about. She has done it as a teacher at Georgetown. She's also been...
going to a lot of conferences and giving a lot of talks about what she's found out. She's helped a lot of people early in their career. And she's even written two Safari books. One wasn't enough. The books are on very different subjects. One is about NLP, which, fun fact: one time that I talked to Rebecca, in 2016, I think she was saying, you know, I think all of this stuff about...
image processing is really cool, but language is next. And 2017 is when the Attention Is All You Need paper comes out. So she's proven herself to be quite a prophet of things that are yet to come. And she's also written a very popular library, Yellowbrick, for data science. It currently has 4,300 stars, which puts it squarely in the top 0.5% of repositories on GitHub.
She then leveraged it into a company where she's been for the last four years now. Yeah, Rotational Labs is the name of it. One of their, or I guess the only, the main product is Ensign. And being from Utah, I have to say "Ensign." I'm sorry. This is kind of a thing that you have to do here.
Rebecca (02:35)
That's fine.
Ilya (02:49)
which aims to help people use streaming data as well as, if not better than, we used to use data at rest. And so, welcome. Thank you for agreeing to be the first on the podcast. And Rebecca, we just talked about a whole bunch of your many accomplishments, but what I would love to know is: how does it feel from the inside? What have you learned over your career? And kind of...
If you can walk me through the things about this journey that somebody just reading your resume can't tell, but you definitely can.
Rebecca (03:27)
Sure. Thank you, Ilya, for the introduction. That was really nice. Well researched.
Ilya (03:36)
Thanks. I know you.
I don't have to research.
Rebecca (03:42)
Fair.
I think that there's maybe... I think one of the themes, probably, in my answers to a lot of these kinds of questions (like, one of the common questions that you get is, how do you keep up with the field?) is to kind of try to focus on patterns instead of details,
to the extent that that's possible. And so I'm going to try to do that to sort of answer your question.
So I would say, like, some of the patterns in my career progression, and sort of the decisions that I've made that took me, you know, into machine learning engineering, and then maybe around machine learning engineering now, whatever it is I do now... I think one of the key patterns is the idea of unusual combinations.
So you mentioned my majors as an undergrad. I studied math and English, which is a statistically rare combination to double major in, because there's not a lot of overlap in requirements. So it's not particularly efficient. But it was enormously beneficial to me. I think that if you want to describe novelty or identify novelty, looking at the intersection between two things that feel sort of disparate
can be helpful just as a thought experiment, just generally as a strategy. So I would say, and actually to slightly modify what you said: when I actually went to grad school and I got my PhD, it's not in math, it's actually in technical communication. The way I think of it is, my PhD was in domain-specific languages. So learning how to start to describe different engineering disciplines in terms of
like a domain, in the sense that the way we communicate in chemical engineering is different from how we communicate in materials science, and how that manifests in, like, different types of charts and visualizations and schema diagrams that help us do the work that kind of,
you know, allows things to progress, and allows science to progress, and society. But I think combinations are really interesting. And I think that that's a pattern. Like, Yellowbrick is another example of combinations, right? Taking, like, you know, the best of both worlds of Matplotlib and scikit-learn, to sort of say, how can we write out some common functionality that unifies these two APIs that everybody loves already?
So that it's easier to do, basically, like, hypothesis testing around machine learning. Like, is this the right model for the data? Question mark. And then you can use a visualization to say, no, that is not the correct model for this data, because you can see there's heteroskedasticity in the chart, or something like that, where, you know, there's too many outliers, or where the line of best fit is way off from what it should be.
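To make that concrete, here's a minimal sketch of the kind of Yellowbrick workflow Rebecca is describing: a scikit-learn estimator wrapped in a visualizer that shares the familiar fit/score API, so you can eyeball the residuals for exactly the heteroskedasticity she mentions. The synthetic dataset and the choice of ResidualsPlot are illustrative, not from the conversation.

```python
# A minimal sketch of the Yellowbrick pattern described above (illustrative data).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from yellowbrick.regressor import ResidualsPlot

# Synthetic regression data, just for demonstration
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

viz = ResidualsPlot(LinearRegression())  # Matplotlib + scikit-learn behind one API
viz.fit(X_train, y_train)                # fits the model, records training residuals
viz.score(X_test, y_test)                # scores on held-out data, adds test residuals
viz.show()                               # renders the diagnostic plot
```

If the residuals fan out or drift instead of scattering evenly around zero, that's the visual "no, this is not the right model" she's talking about.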
You know, I think that those combinations are a common theme. I think maybe, like, two other ones that I would throw out there... collaboration is really important to me. Basically, that was one of the things that I...
I was in grad school for a very long time, probably longer than I would recommend to other people. Even if you decide to go to grad school, don't do it for, like, eight years. I did it for eight years because it wasn't subsidized; I was subsidizing my own education. But it's, again, not very efficient.
But I think what I realized towards the end of my... because I was sort of planning to go into academia. That was my plan. Both my parents are academics. My brother also went into a PhD right out of undergrad. So I kind of just figured that's what I was doing too. But I think towards the end, what I realized is what I liked the most about
academia was collaboration, and not publication. That was important. And collaboration continues to be really important. And I think a third pattern that's really important to me is the idea of control.
And that is ultimately what drove me to kind of sacrifice some of what I could do when I was purely a machine learning engineer and somebody else was in charge of getting contracts and, you know, mentoring people and making, you know, the gears turn correctly, when I could just focus in on, you know, just the MLE stuff. I felt kind of a constant irritation with how things were being done, or which things were being prioritized, and basically who I was allowed to collaborate with and who I was not allowed to collaborate with. So ultimately, I was willing to make the sacrifice in order to have more control, but it's a lot more responsibility. And I'm sure we'll talk about entrepreneurship a little bit more, but I'm happy to sort of sing the praises but also be sage and cautious about how to proceed in a way that is, you know, maybe a little bit more efficient or has a higher probability of success than just kind of random search.
Ilya (09:17)
For sure. What drove you to start Rotational? I mean, obviously the mission is something that you've been passionate about. I know the people that you started with are people that you've worked with before. What would you say is driving you to it? Like, what do you feel like you're doing? Because you could go out into the market today and find a better job, right? Or find a great job, rather. I don't know, "better" is subjective. But yeah, why do you do it today?
Rebecca (09:45)
Yeah, I mean,
to be honest with you, I don't think I agree with that. Like, there is no other team like Rotational's team.
I've worked in a lot of teams and this is the only one that works like this.
You know, it's extraordinarily high trust. People have a lot of responsibility and there's a lot of respect. People encourage each other. People are excited to do code reviews and excited to do pair programming. They do it without being told to. And they're all excited to keep doing stuff. You know, they are all excited to work together.
And I think that that's really special. And that was the reason and is the reason.
Ilya (10:43)
Yeah, no, and that's terrific. And it does look like you guys are putting out a lot of great stuff. And I really appreciate that. At least the things that I hear about it, right? In my interactions, it doesn't seem like you're after the hype. It seems like you're after building the actual technology, which sometimes in this industry, people go out after the trendy thing. And it just so happens that your thing is also trendy, but that's...
an outcome, not necessarily a goal. So let's maybe talk about entrepreneurship a little bit since you brought it up. So you've transitioned into it after being an individual contributor for a while. And I know that, you know, all of us probably have things that we would have done a little bit more efficiently or differently afterwards. And since I really want this podcast to focus kind of on the main principles and some MLEs are going to want to go into entrepreneurship,
How do you see that transition? How would you... if your brother was going that route, or your kid was going that route, and you had to tell them not to step on the same rakes you did, but maybe step on slightly different rakes... and, you know, also maybe some good things that come of it. You've mentioned control. You've mentioned being able to build a team that you love working with. You know, there's...
There have been many times in my career working in really big companies where it's like, these are people on your team and you're like, they don't really all fit here. Like I would rearrange this a little bit, but you can't, you know? So if you could talk a little bit about that, I think that would be great.
Rebecca (12:22)
Yeah, I mean, I think that it is, like, the team. The team is really, I think, the core of the answer here to the entrepreneurship question. Whether your strength is in business or your strength is in tech, you can't do it alone. And I don't know what to tell you. I mean, I'm sure that you could go out there and buy a book that says how to do it all alone, but you can't.
It's too hard. There's too much stuff to do.
Ilya (12:55)
So how did you build such a great team? Like, how do you find people?
Rebecca (13:00)
They find us. I'm serious. That's what happens. People come to us. And they come to us, and we meet them at a conference, or we meet them at a hackathon, or we meet them because they email us, and they say, this is kind of neat, or I want to ask you a question about this. And we stay in touch. And then two years later, three years later, six months later, something.
Ilya (13:01)
Hahaha
Rebecca (13:30)
we need somebody, and it turns out, I guess, we realized that they were the one that we needed all along. I mean, I realize that sounds a little woo-woo, but that's happened multiple times already. You know, people find us and they reach out and they sign up.
Ilya (13:55)
That's great. That's great. How many people do you have now?
Rebecca (13:59)
I think it is somewhere between 10 and 12 full-time people. It's actually starting to get, I know that sounds really small to everybody else, but to me it sounds like the biggest number because I was around when it was zero. So we literally this week, Monday, had a new machine learning engineer start.
Ilya (14:04)
Okay.
No, it doesn't.
Yeah.
Rebecca (14:27)
So we are growing, but we are trying to grow slowly because, you know, really our number one operating expense is salaries, which feels right to me. This is hard. People should get paid for it, you know? So our main expense is salaries. So we are very cautious and strategic about growing quickly, but we are growing.
Ilya (14:52)
Excellent. Yeah, no, and I think it's really smart. I think you see a lot of people who get ahead of what they can manage. And I've tried this several times now, where I've grown a team and it feels like, get more people, right? But then you bring them in, you have to onboard them, and then you run out of tasks to delegate, because there are some things that are way easier to delegate than others. And I don't know, I have...
I have control issues myself sometimes. And it's like, I'm not gonna give you this, because I know I can knock this out of the park in a few minutes, and to onboard somebody on this would take a long time. So, kind of along those lines, right? That you wanna get the best people. You've been an MLE for a long time. I'm gonna still keep calling you an MLE, sorry. But what makes a great MLE? What's the delta between somebody who's, like...
or even you, right? Like, I'm sure at some points in your career, you've been better than others. What are those things that take you from one of those levels to the other? I hate to use the term 10x engineer, but, you know, toward that.
Rebecca (16:02)
Honestly, I think it's, like, one thing. Like, it really... like, you sent these questions to me in advance. I don't know if everybody in the audience knows that, but I got this.
Ilya (16:12)
No, I'm gonna bleep that out. These are all new. You're answering
off the cuff.
Rebecca (16:16)
This is extemporaneous.
But you sent these questions in advance. And I thought about this question the most of all of the questions: what is that sort of thing that makes them great? And I really think it's just one thing. Although I'm going to say a second thing after it, so I don't sound like I'm close-minded. But it really is a willingness to figure things out
in the middle of them being on fire... or not on fire, but just, like, a terrible wreck, a disaster. And the reason that that's the differentiator is that that is what it seems like nobody else is willing to do. Like, nobody else on the team, when it comes to machine learning projects. Of course, there are many engineers who are used to, you know, scrubbing in and doing miserable work. But machine learning
engineers are the ones who are willing to scrub in when the model training pipeline or the prediction pipeline is not working properly. Because I think it's just about a willingness to jump into a burning building, and not everybody wants to do that. In fact, the majority of people do not want to do that.
Ilya (17:34)
Yeah. And in my experience, at least, if they do do that, they do it to check off, you know, like, okay, production is no longer on fire, rather than going down to, like, the five whys or whatever, to get down to the very bottom of it and be like, actually, you know, I could fix the top-line problem, but there are still going to be issues over and over again if we don't fix this really core thing.
And to me at least... you know, I come from a different world than you. I come from a lot of... well, actually, we come from the same world, but we've diverged since. But in big companies, there is a tendency to say, like, the thing is no longer on fire, it's not my problem. And being able to get a little bit deeper than that and say, but why was it? It's not normal for it to be on fire in the first place. So yeah, I think that's a very important thing. What is the second thing?
You said two.
Rebecca (18:31)
So I just want to acknowledge that
I really like what you added on to what I said, which is, like, kind of being willing to solve tomorrow's problem today. I think that's also, you know, just such a useful skill to have, to cultivate in yourself. And it's hard when everything feels like it's on fire, but to the extent that you can cultivate that in yourself as a machine learning engineer, and think a little bit further ahead and say: how can I solve this forever, not just for today? How do I make sure that this doesn't happen again? What does that look like?
That's really useful. But what I was going to say is collaboration. And I know that sounds kind of like also a little bit woo-woo. But what I mean is like, if you are not doing code reviews, if you are not doing pair programming, if you are not participating in sprints, like, are you really collaborating? Or are you just rewriting what other people have written?
Ilya (19:35)
Yeah.
Is
there a difference between you and the generative AI at that point, right? But yeah, no, for sure. And you mentioned code reviews. I'm gonna pick on that for a little bit. So, like, a lot of code reviews that I've seen in my life are "LGTM," right? And I work with senior engineers, always, to be like, please don't do that. Like, if you have...
It is very rare that a junior is going to commit code and you're going to not have anything to add to that, okay? Like, if you're reviewing a principal engineer's code, maybe, but usually there's something in there. Do you have kind of a best practice for this, or maybe things you teach your team? To be like, please don't just look at it superficially and be like, yep, at the very top level, doesn't break anything, I'm going to commit it.
Rebecca (20:27)
You know, I have sort of, like, an uncon... maybe unconventional answer to this. Part one: I think the onus is on the PR opener. Like, ask questions. What do you want them to focus on? What are you worried about? Like, you know, where are you worried there might be a leak, a memory leak? Like...
Where did something funny happen? Where you were like, man, I banged my head against the wall for an hour before I realized that this should be a semicolon. You know, like put those notes in. Like make it fun to read. Like that's on you.
So I guess that's a slightly unconventional answer. But part two of the unconventional answer is: I think it's OK to LGTM a junior's PR sometimes, right? Because strategically doing that is saying, you need to write code that could get merged directly into
main or develop, however you do it. Like, we're not messing around here. And also, it's a sign of trust. Like, you trust them, you believe in them, you know? And so sometimes I think a strategic LGTM on something is exactly the right thing to do. But, I mean, people want to feel like you care about what they're doing. And so you have to look at it in order to understand. So I guess that's my answer.
Ilya (22:00)
Yeah,
no, and I think that's a terrific answer. Speaking of code reviews, you've done quite a few of them through the open source project. And open source is kind of an interesting thing. I was reading Felix Hill's last blog post that made the rounds recently. He talks about... so he passed away late last year and had some struggles with the emotional toll of being in this industry. And one of the things he talks about is that, being a researcher in a big company, a lot of people are starting to think, like, should we even publish stuff? Like, is this our strategic advantage? And I think our industry grew up on open source. I think it wouldn't be an exaggeration to say that the reason that you and I both were able to make it into this industry is because everything was open, and it was like, if you had the curiosity, you could start in physics, in my case, you know, and in communication in your case, and be like, no, no, no, but I really want to learn this. And it does sadden me that there is a little bit of a closedness now that didn't use to be there. But I still think open source is extremely important. Lots of folks are looking to it to say, like, I am good enough to do that job, I can come contribute to your project. Do you have any best practices for somebody who's trying to contribute to this? And also, I kind of want to get your take on: is open source all that important today? With these huge models, I can give you all the code for them and you're still not going to be able to do anything with it unless you have, you know, five million dollars in GPUs and tons of data. Is that still something that's important in our industry?
Rebecca (23:50)
It's a good question. I think that there are a lot of reasons that people love open source, or have been committed to it in the past. But I think one of the things that I like the most about it is the exercise in transparency. And I think maybe because of that,
Open source still feels very relevant to me.
Ilya (24:24)
Yeah, yeah. So how would you get started on it? Say I've never made a contribution. I come to Yellowbrick and I'm like, I wanna help. I wanna get a job with Rebecca. I wanna show her that I got this.
Rebecca (24:29)
so, okay.
So I think, maybe to kind of piggyback on the idea of transparency... I think that...
I gave a talk at PyData London last year called Mistakes Were Made, which was basically me talking about a bunch of mistakes I've made over my career, data science logical errors or engineering errors, and talking about what I learned from those mistakes. I think, to extend that, that really is most of the value of a senior technical person. It's somebody who has made a lot of mistakes and understands why they were mistakes and can talk about that,
you know, has learned to make better mistakes, or new mistakes, right? And so I think that if you are looking to get into open source, a very good way to do it is to inspect your own code, your own workflow with whatever tool it is. Or, let's say you have a set of tools that you use: inspect that workflow, identify the sort of clunkiest parts in your code,
and think about them a little bit more and think about why they are clunky. Are they clunky because the tools that you're using don't have a function that does a thing and so you have to write this hacky function that helps you jump from step B to step C? So think about those clunky bits in your code. And...
Then you have to sort of ask the question: is this clunky because I wrote it badly, or because I don't really understand the tools I'm using? Or maybe is there some functionality that is missing from the tool? And even if it is a very small piece of functionality, like very, very small,
like a small string manipulation thing, or a small compression thing, or a small filtering thing. Just think about all of the little things you have to do to get data to do what you want it to do. All of those little tiny steps. Like, you should sort of get curious about whether some of those steps
don't need to be that hard. And, like, what does that even mean? What would it mean to make this easier? Could I take these two tools that work together and figure out some way to make them interface better? Like, what is missing there? So, kind of thinking about: what is the shim?
Like what's the shim that makes everything work slightly better or slightly faster or slightly easier to put together? Slightly easier to run experiments, slightly easier to test, you know, is this working in production or whatever? And open a PR for just that specific teeny tiny little thing and everybody else probably will be like, that is clearly a gap. And even if your PR sucks, if you're right in identifying a gap, they will be motivated to
help you edit the PR to fill the gap. Because once they know that there's a gap, then they want to fill it, and they'll help you, probably. I mean, your mileage may vary. But yeah, I think that that's one.
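To make the "shim" idea concrete, here's a hypothetical example of the kind of tiny, self-contained helper Rebecca is describing. The scenario and the function name are invented for illustration (they don't come from Yellowbrick or any other real library): a small bridge that flattens nested JSON-style records into the flat columns a tabular tool expects.

```python
# A hypothetical shim: flatten one nested dict into dotted keys so a
# tabular tool can consume it, e.g. {"a": {"b": 1}} -> {"a.b": 1}.
def flatten_record(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, accumulating dotted key paths
            items.update(flatten_record(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

print(flatten_record({"user": {"id": 7, "geo": {"lat": 40.7, "lon": -111.9}}}))
# {'user.id': 7, 'user.geo.lat': 40.7, 'user.geo.lon': -111.9}
```

A PR this small is easy for a maintainer to review quickly, which is part of why the tiny-gap strategy works.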
Another strategy, if you are at a big organization and what you really want is to contribute to open source: the next time there's a big meeting in your big organization and they're like, we want to do good for the public, be like, why don't we... there's a project coming up, here's a piece of it that we could open source.
And we could kind of separate it from the rest of the stuff that needs to be internal and proprietary. And that'll be good for image, and marketing will love it. And probably they'll say yes, and then you get to be a contributor and they're paying you to do it. So that works too, probably. I mean, you might know about that one better than me.
Ilya (28:22)
Yeah, be a little bit careful with that strategy. Yeah, I had
a major project at Twitter that we were going to open source, and it was on track, and we got a couple of companies that work with us... well, I don't know that I should mention which ones they were... to use it internally as well, because a company like Twitter is not going to launch something that only worked once. We needed to work it in a little bit. And that was enough time for...
changes in management that changed the direction of the company. I mean, it's not even called the same thing anymore. So forget about open sourcing model evaluation frameworks; that got axed pretty quickly.
Rebecca (29:04)
But yeah, little,
like, look for little tiny things. Like little shims, I think, you know, are surprisingly effective.
Ilya (29:15)
And what I really like about what you said is: start with your own problem. Because lots of people come and they're like, I'm going to look at, you know, the concerns that are raised on the repository, and I'm going to go through those, and I'm going to be a hero and fix them. And I'm like, well, there's a reason why some of them are not fixed, and you probably don't know it. And so I wonder if you can take us backstage, to what the other side of open source looks like, and how
awesome it is and how everybody is always nice to you and just come in and do LGTM and everything works swimmingly.
Rebecca (29:54)
I am a little reluctant to be critical because I do feel like I owe open source, you know, a lot.
Ilya (30:10)
Yeah, but I'm not necessarily asking you to say, like, bad things about people who are contributing... but I am. I think it helps a lot, in trying to get your stuff accepted, to know that the person on the other side probably only has, like, 30 seconds to look at it at the top line, right? And very quickly has to judge what this is. So maybe you want to put a little bit more of a commit message than, like, "fixed an issue"
or something like this. I think knowing it from your side that way is really useful.
Rebecca (30:36)
Yeah.
Yeah.
I don't know exactly how to describe people who exhibit this behavior, but there are people, like contributors, who come with their own sail, like their own energy, their own wind. And for some reason, even if they are
you know, new to Python or even if they are new to machine learning or new to contributing, maybe even new to using version control. Because honestly, a lot of data scientists do not consider version control as sort of a primary core skill. And I'm not sure what machine learning engineers think now, but I feel like that is a core skill in order to function.
But even if that's not something that you know, if you are motivated to try and figure it out, and figure out how to do, like, GPG commit signing so that we're merging into the repo in a way that's safe, so that people can feel that sense of security about the software supply chain, because that's important too, right? I mean... people come with their own energy. That goes a long way. Because, like you said, it
can be a little bit of a downer. I think that open source was a gift to me. It helped me learn so much. But being a maintainer, sometimes you feel like you have an albatross.
And so people who come in with a little bit of oomph, you know, just make it easier to go a little bit longer. And that oomph could come from contributors, but it can also come from all kinds of places. You know, like, I remember early on, when Yellowbrick started to catch on, one of the very most exciting things for me and Benjamin Bengfort, who is my co-author and co-maintainer of the project from the very beginning... one of the most exciting things was seeing other people write blog posts
that use Yellowbrick, like people we didn't know, people we had never met before. And that gave us a big oomph. I can tell you that made it really easy to spend nights and weekends working on Yellowbrick for years and years and years, just kind of powered by like, wow, that's so neat. People we've never met before are using this tool. And it seems like it's helping. So I don't know. There's like...
Ilya (33:07)
Mm-hmm.
No.
Rebecca (33:31)
There's good, good and bad. But it's been a big part of my career.
Ilya (33:38)
Would you do it again if I dropped you back there and said, this is all you're going to have to deal with. This is all that's going to come of it, right? Like, would you do it again?
Rebecca (33:46)
I mean, I think that the answer must be yes. Because, I mean, many of the people who are part of Rotational now, like Benjamin Bengfort and Edwin Schmierer, my two co-founders, as well as Prema Roman, our senior-most engineer, and Patrick Deziel, who's a contributor to Yellowbrick... all of those people
were involved in Yellowbrick and working on Ensign, which, as one of the products you mentioned at the beginning, was an open source project that we all built together. So that's open source. And actually, it's not even our only product. Our new one that we're working on is another open source project that's kind of like an AI studio kind of thing. But...
I think the answer must be yes, based on if we're kind of using that as the data. I think we are still doing this. For us, this works as a way to...
develop in a way that makes us feel creative, like we're getting credit for our creativity, because that's one of the other great things about open source. Like, you feel like you're being acknowledged, because of commit signing and, you know, git histories and git blame and stuff like that, but also the transparency part.
Right? Like feeling like I don't have to claim on LinkedIn that I'm doing things the right way or that we're doing things the real way. You can just look at the code and you can decide for yourself.
Ilya (35:19)
No, for sure. And I love that attitude that I think you've always had, that, like, I'm going to go do stuff and that's going to speak for itself. Rather than, you know, lots of people who are good at selling themselves, but then you're like, you can't explain convolutional neural nets? You've been in image processing for a long time, you should be able to know this. So yeah.
Yeah, no, I really appreciate that. And I'm sorry, like, I'm noticing how much I botched the introduction of you. You guys have got multiple products. I did not know that. Yeah. So in all of this, you do a lot. You happen to be carrying quite a bit. And so tell me, like, what optimizations to your workflow have you been able to find over the years that have made you
Rebecca (35:49)
That's okay. That's okay.
Ilya (36:08)
faster, better, maybe fewer mistakes, whatever, right? Like whatever you think is more efficient, what have you found that really works well?
Rebecca (36:19)
I think this is a great question. I honestly think that this is maybe, you know, when I think about it, this is maybe another thing that sort of differentiates machine learning engineers from maybe people who are...
only occasionally doing machine learning engineering type things... the people who are hobbyists, versus people who are doing this all the time: there's a drive towards seeking out optimizations. Because once you do something enough, or it's painful enough, you're like, I'm never doing it that sucky way again. There's got to be a better way. And I think that that idea of "there's got to be a better way" is really important.
Ilya (36:52)
Mm-hmm.
Rebecca (37:06)
I think that it's really interesting that you asked the question about...
like tooling and optimizations, because I'm actually right in the very middle of a sort of evaluation. And it's not a kind of scientific evaluation, it's kind of a qualitative evaluation of different generative AI tools. And I'm doing it collaboratively with one of my colleagues, Ryan, who is really... he's maybe one of the biggest users of AI at Rotational.
and he does a lot of biz dev type stuff, but he's exceptionally good at identifying use cases for tools and coming up with these kinds of optimizations. And I do think that part of the trick is delegation, which is something that came up earlier, and it was delegation with respect to people, but part of optimization is a willingness.
to cede control and delegate, whether that's to, like, a hotkey or a shortcut on your keyboard, or, you know, using GitHub Copilot or something like that. I will admit, I have access... I officially have access to GitHub Copilot through Rotational. We have... like, everybody has a subscription. I have not really made a huge amount of use of that one yet, but it's not because I...
You know, it's just... you know, I have a long to-do list. Like you said, I've got a lot of things to do. It's on my list. But I have been exploring kind of using ChatGPT and Gemini and Claude, kind of doing a little bit of a bake-off with Ryan, thinking about the extent to which they're useful, and what some of the sort of evaluative criteria are in our kind of weird, ad hoc, qualitative evaluation of these tools.
Ilya (38:43)
then.
Mm-hmm.
Rebecca (39:10)
It's interesting, because the evaluations that you're used to seeing when you're a machine learning engineer are like, which one can do arithmetic? Which one can count the Rs in "strawberry"? So the evaluative measures are...
Ilya (39:20)
Right.
Rebecca (39:24)
good... like, they're heuristic in the sense that there's a yes and a no, so they're good for measuring things. But do we care what they're measuring? Because really what you want to know is, does this actually optimize my workflow, right? And so we were kind of making this list of things. And it seems like one of the main things is tools that remember stuff about you, so you don't have to keep reintroducing yourself and your context and doing that over and over. That's really
helpful. And even when you can do that at the granularity of a project, that's very helpful. To be able to say... and whether that's, like, a gen AI tool, or a tool like Shortcut, which is what we use for project management, or Slack, or things like that: when you give people the option to group things,
like kind of put things into logical categories, that's very powerful. And I do think that... like, I noticed that a lot about tooling. So in the tools that we use, which, you know, as I said, I think those are some of the main ones. One of the new ones that I have started using, which is not an AI tool, is a little sketch tool called Excalidraw, which I like using for making little quick whiteboard drawings. Because I honestly think that the one thing that I really miss about being
Ilya (40:37)
Mm-hmm.
Rebecca (40:45)
in person for work is whiteboarding. Because that was critical, and still is critical, to my thought process. Like, I am very visual. I like sketching out pipelines and talking them through and thinking about how things might fall apart and stuff like that. And whiteboarding is critical to that. So I like using Excalidraw for that. So maybe I'll call out that one particularly, because I'm experimenting with that a lot right now. But a few days ago, I wrote a...
Ilya (40:48)
Yeah.
Yeah, I use Excalidraw all the time. I think
it's a terrific tool and it's open enough, right, to where like you can just share it with somebody. They don't have to have anything installed. And yeah, I think it's really great for collaboration.
Rebecca (41:14)
Interesting.
Interesting. Okay, well, that's fascinating. It's another data point of somebody else who's found value in that. I will say there's another tool that I haven't used yet, but it's another one on my list, and it's actually pretty high on my list right now. There are a couple of tools that are coming out that are designed to make it easier to generate UIs using generative AI. Basically, they map out... it's not just the design, it's also mapping out the interactions. Like, okay, you know, if you create
Ilya (41:29)
Yeah.
Rebecca (41:55)
this button, then it's got to have this breakout page, and they're connected to each other. So kind of cooking some of that into the templating, so that you don't have to do all of that work. I think that's very interesting, because on the one hand, I've historically been maybe not the biggest fan of low-code tooling, but I also recognize that for data science in particular, the reason that, you know,
the division between machine learning engineering and data science... or one of the reasons why it's so rough... is that data scientists operate outside of the application stack. They're working in PowerPoints, they're working in Jupyter notebooks, they're working in that kind of environment. Whereas machine learning engineers are operating in the application stack. We're working in Kubernetes clusters, we're working in Docker containers, we're working in Python files. There are READMEs, right?
It's a different... there's CI/CD, right? It's interacting with a production database. So I think that being able to appreciate that is really key. So for machine learning engineers... you sort of started this by saying, like, there are so many things that feel obvious
that when other people kind of say something, you realize that this is something that's not obvious to them. But that thing of working outside of the application stack, and that being a risk to data scientists for getting their stuff into production... I'm wondering if tools like, you know... I think the one that I heard about was called Wizard, or it's like UI Wizard, but I think that other tools like Figma... you know, there are a suite of new tools coming out that are supposed to make UI creation easier using generative AI, whether you're typing a description of what you want and it's using a language model then to trigger the templating, or if you draw an Excalidraw sketch and you give them the sketch and they convert that into some kind of interaction framework or wireframe or something. I wonder if that has the potential to help bridge that gap a little bit, even if the UIs that they make are kind of disposable.
Ilya (44:20)
Mm-hmm.
Rebecca (44:21)
even if it just sort of helps you think a little bit about like what happens after training.
Ilya (44:26)
Mm-hmm. Yeah, no, I think you're right on there. As I think about the biggest problems of my career, they're all things that people didn't think were that useful until I could put a prototype together. And guess what? I don't do JavaScript. I cobble stuff together, and it's really bad. And if I could outsource that part of it... I can write an API with Python. But on the other side of it,
I'm really bad. So it's really good to see what Rebecca is and is not willing to outsource to tools. This kind of brings me to a question that I think you're a terrific person to answer, from what I've seen of you. Oftentimes we end up collaborating with people who are not ML people. Oftentimes our leaders are not ML people. And so how...
how do you manage... and again, you come from communication, you come from a perfect background for this. This is one thing that I struggle with. How do you manage articulating the value, but at the same time the risk, of machine learning projects? I had bosses who were like, okay, so this has to be done by the 13th. And I'm like, I don't know, this is an iterative process. Something will be done by the 13th, but I don't know that I can hit 96% accuracy by the 13th, or whatever.
How do you have those conversations from the very inception of the project? And convincing people that... I guess nowadays nobody needs to be convinced that ML is needed. Now I spend most of my time convincing people where ML is not needed. But how do you have those conversations?
Rebecca (46:13)
I have a couple of thoughts there. I've been sort of thinking about the idea of design docs, which I think is kind of connected to what you're saying. It's like those pieces of rhetoric that exist between you, the technical person, and whoever is greenlighting the project, who is usually less technical than you.
And maybe... I'm not sure if it's true where you work, but design docs are kind of one of those things. Like, how do you write design docs that allow you to get your projects greenlit and move things forward? And I think that...
Like, the main thing is that they have to be confrontational, which surprises me a little bit to say, because I think that my brand as a person is a lot about empathy and, you know, collaboration. But I think design docs need to be confrontational. And I think they need to say: right now, the way that we do this, it costs this much, it takes this long. We've been doing it this way.
You know, it's gone wrong six times in the last three years. It happened on this date, on this date, on this date, every time. The reason that it went wrong is due to this problem. You need to be confrontational, because the most important thing is not, like, being able to convince them that you will definitely be able to train a model by the end of this. The most important thing to say is that we all agree that there's a problem.
And maybe machine learning is the answer, but first let's agree that there's a problem and it's serious enough for us to put attention on. for sure we're gonna try to use machine learning, because that's one of the tools in the toolkit, but first let's agree that there's a problem.
Ilya (48:00)
Yeah.
Yeah, no, I think that's a really good approach. And, like, Meta, where I used to work, right, utilizes design docs quite extensively. But okay, so you bought me there. I agree there. But then the project gets underway, and inevitably there are things that, like...
I didn't... well, there's the no free lunch theorem, right? Like, ahead of time, I cannot tell you which technique, which ML technique, would even work here. Or even if I can narrow it down to a space of techniques, there's a hyperparameter that I'm not going to know until I get into the model. And then I'm like, that's going to take me, you know, forever to tune, or maybe I can't even tune it. At one of the places I was working, we were working on color detection, and color detection seems like
such an easy thing. But we were doing color detection outside, and depending on the position of the sun and the cloudiness and whatever, you can't even tell a beige from a gray, and you're expecting that your model is going to do that. And some of those things, you know, obviously a good MLE would point out from the beginning, but some of those things I don't know until I run into them. So how do you communicate those difficulties?
Rebecca (49:34)
Yeah, I think that maybe I'll come back to the idea of transparency here, because the goal is not to deliver this one project and this is the end of everything, right? Like, the goal is to build a pipeline so that we can...
do good, we can keep doing better and keep learning and keep introducing new products and new features. So the most important thing is to create kind of a healthy pipeline of experimentation. And to do that, you have to have trust. They have to trust you, and you have to trust them. And the only way to do that is to be honest. And so when you get to those stumbling blocks, you have to say, in that weekly meeting, you know,
we encountered a blocker, this is what the blocker is, here are three options that we're considering. We'll provide an update next week after we've run a couple of experiments. But being able to be transparent and also kind of propose solutions so that you kind of build that trust that you are not going to panic and abandon the project at the first kind of hitch.
Um, like, it's hitches all the way down, baby, you know?
Ilya (50:59)
No,
I totally agree with this. I will never forget the first time that I worked with a true team of MLEs, you know, that was really good. And we came to the daily standup or whatever, and somebody was like, yeah, this whole thing we're working on doesn't work. And this one guy stands up and he's like, well, we've got these three solutions. This one has these pros and these cons, and this other one has these pros and these cons. And I'm like,
it struck me how not stuck you are at any given point. Like, yeah, the solution that you eventually propose is never going to be perfect, but at least you have ways forward, right? In my experience, the thing that I try to get juniors out of very early on is being stuck: like, the first way didn't work, I don't know what to do. I'm like, okay, think, you know. And really, to me...
As a leader, honestly, I love it when people, when MLEs, come to me and say, that didn't work, but I have three solutions. And I'm like, great, we can start a conversation here again. So yeah, I definitely appreciate that. We are at the half hour. I don't know if you have another five minutes. If not, we can... okay. Okay. So yeah, let me just ask you one more question and then maybe give you a final word. But...
Rebecca (52:16)
I can keep going.
Ilya (52:27)
ML is an industry with a lot of hype, and it's an industry with a lot of technologies that come and go. And I think a trap that a lot of people fall into is that they try to learn everything, or they give up, right? Like, they either say, this is way too much for me, or they try to learn everything. And so this is kind of a conglomeration of: how do you navigate hype, how do you determine what's important, and how do you stay up to date, kind of all at the same time? How does Rebecca figure out what's an interesting technology for you to learn, and how do you go about it?
Rebecca (53:11)
It's good to go to conferences. So talk to other people.
Ilya (53:18)
Whoa, you know most people that listen to this are incredibly introverted.
Rebecca (53:23)
Yeah, but if we all go to conferences together, then we get
it, right? We're all there being introverts together. But yeah, go to conferences and listen to what other people are doing. And like I said, focus on patterns rather than details. Gosh, it seems like everybody is trying to solve a problem around observability.
you know, with their pipelines. Because that was, like... for me, that was the theme maybe a couple of years ago. It seemed like everybody, every talk, was about Prometheus and Grafana. Prometheus and Grafana, right? Like, how do we know when things go wrong, and how can I tell that things are going right? And where did that throughput spike come from? You know, that's why the model's broken. I think that going to conferences and looking for patterns across talks or across booths or sessions or whatever is really valuable.
And I guess like the flip side is if you are finding yourself miserable trying to keep up, that might be a clue that you don't really like this and that's okay. That's okay.
Ilya (54:34)
Yeah, one of the things I said on social media recently is, like, I've been irrelevant like four times in this career, you know? Like, honestly, my master's thesis was on object detection, and I did not use a CNN, because they weren't a thing back then. And so, a year later, everybody was using CNNs, and I was like, okay, go learn that. And somebody commented underneath it and said, wow, you're really comfortable with being irrelevant. And I'm like, you kind of have to be.
Rebecca (55:03)
Yeah,
I mean, maybe said in a slightly more optimistic way: if you can cultivate an appetite for being a novice... like, if it feels fun. Like, I'm going to go out and try a new restaurant. That's something so fun that you're willing to pay for it, right?
Not get paid for it; you go and pay to eat at a new restaurant. You're paying for new experiences, right? So if you have that kind of attitude of, like, you know, I'm kind of curious about this, I wish I understood this a little bit more... which is, to be honest with you, sort of what took me down my path, right? I got curious about domain-specific languages, which took me to natural language processing, you know. And thinking about how to do
hypothesis testing better with machine learning models is what took me to Yellowbrick. Trying to understand more about why models sort of behaved strangely in production is what took me down the path of understanding more about platform engineering and distributed systems... taking a graduate class in cloud engineering and distributed systems at the University of Maryland, long after I had finished my PhD, and for no credit, by the way. It was just...
really just trying to understand: what is consensus, and why is that important? Do I care about that as a data scientist? Am I even a data scientist anymore? So being willing to change how you think about yourself is helpful too. And staying curious is nice. I think that helps. I want to speak a little bit to the hype part, because I think that what I would say is, it doesn't hurt to assume good intent.
So if somebody is spouting hype, you can decide to think of it as spouting hype. But you could also choose to think of it as, this person is curious, and I am curious about this too, and maybe we can engage along those lines, if we assume curiosity and good intent on the part of other people. And the other thing is,
If you feel like there's too much hype out there, it's on you to produce thought leadership that is not hype. So put better content out there and tell us, like, what we're... That's how to do it.
Ilya (57:34)
Yeah. No, I
think that's terrific. Definitely a lot more empathetic than the way that I usually think about it. But one thing that I heard... gosh, it was, like, somebody really high up in IBM, during NeurIPS again this year. He was saying, like, okay, I get hype, and, you know, you probably don't want to bet everything on one technology, but also,
Hype is how people know about what we do, right? And people knowing about what we do makes us produce things faster and makes us fail faster. And we figure out where the hype was not necessarily useful and where maybe it was. And so to him, it's very much a catalyst for just getting more reps in, which is what our industry really does well, is like, if you let us get a lot of reps in, we'll figure out what to do right.
And so, yeah, there's definitely an exploration there of its own.
Rebecca (58:33)
Interesting. Like
hype as a hypothesis test. Are people interested in this? Let's put some ideas out there and see if people react. So I like that. Hype as a hypothesis test is sort of an interesting argument.
Ilya (58:37)
Yeah.
Yeah, in my consulting in the last couple of years, when I come to companies, they all want a chatbot. And by the time I leave, they've got, like, a new inventory prediction system or something, but not a chatbot, right? But...
Rebecca (59:01)
Right, or just like a way to
search through your documents.
Ilya (59:04)
Right. But the thing that got me through the door was the chatbot, because everybody thinks they want that. And that's great, right? Like, then we can have a conversation, and for people who really still need it, let's make a chatbot. But any final thoughts? Where can people find more about you, about Rotational, and how to engage?
Rebecca (59:16)
Yeah.
Yeah, so I think LinkedIn is a pretty good place to find me these days. If you are curious about some of the things I'm thinking about, like if you want to read about the idea of using disposable user interfaces as a medium for data scientists to try to deliver in production a little bit more easily, I post on LinkedIn.
You know, I'm trying to do a New Year's resolution: I'm going to be very, very routine in my posting, and I've posted a couple of times so far this year. And so that's a good way to kind of hear more about what I am thinking about. And if you're curious about Rotational, if you're curious about what we do, we are basically an AI studio, right? So, you come to us with kind of a problem, and we'll,
like sort of what I've been describing, sort of talk to you, try to understand a little bit better and try to figure out a good way to approach that and the extent to which it would make sense to use machine learning and maybe some other auxiliary things like...
databases or retrieval systems or things like that. And try to help solve the whole problem, rather than just solving the convenient problem; training a model is the convenient one. Getting data, labeling data, engineering that data, getting it into a system, building kind of the pipelines around it: it's the stuff that people are less excited to do, but that we're happy to deliver, because really, in my experience, that's what it
takes to actually be successful with these kinds of data products, you have to consider the whole solution. So if that's something that you want to think about more or talk about more, please reach out. We're at rotational.io.
Ilya (1:01:19)
Thank you. Thank you very much. I will stop.