Giant Robots Smashing Into Other Giant Robots

483: Honeycomb.io with Charity Majors

July 13th, 2023

Charity Majors is Co-Founder and CTO of Honeycomb, which provides full-stack observability that enables engineers to deeply understand and debug production software together.

Victoria and Will talk to Charity about observability, her technical background and decision to start Honeycomb.io, her thoughts on the ops/SRE profession as a whole, and things that surprised her along her journey of building a company around observability as a concept.


Transcript:

VICTORIA: This is the Giant Robots Smashing Into Other Giant Robots Podcast, where we explore the design, development, and business of great products. I'm your host, Victoria Guido.

WILL: And I'm your other host, Will Larry. And with us today is Charity Majors, Co-Founder and CTO of Honeycomb, which provides full-stack observability that enables engineers to deeply understand and debug production software together. Charity, thank you for joining us. How are you doing?

CHARITY: Thanks for having me. I'm a little bit crunchy from a [laughs] long flight this morning. But I'm very happy to be home in San Francisco and happy to be talking to you.

VICTORIA: Wonderful. And, Charity, I looked at your profile and noticed that you're a fan of whiskey. And I thought I might ask you just to get us started here, like, what's your favorite brand?

CHARITY: Oh, goodness, that's like asking me to choose my favorite child if I had children. [laughter] You know, I used to really be into the peaty scotches, the Islays in particular. But lately, I've been on more of a bourbon kick. Of course, everybody loves Pappy Van Winkle and George T. Stagg; they're impossible to find now, but they're so, so good. You know, if it's high-proof and single barrel, I will probably drink it.

VICTORIA: That sounds great. Yeah, I tend to have the same approach. And, like, people ask me if I like it, and I like all of them. [laughter] I don't [inaudible 01:21] that I didn't like. [laughs]

CHARITY: [inaudible 01:23] tongue sting? Then I'm in. [laughs]

VICTORIA: Yeah, [inaudible 01:26].

WILL: See, I'm the opposite. I want something smooth. I'm a fruity drink type of guy, I'm just going to be honest.

CHARITY: There's no shame in that.

WILL: No shame here. [laughs] Give me a margarita, and you have a happy Will for life. [laughs]

VICTORIA: We'll have to get you to come out and visit San Diego for some margaritas, Will. That's --

CHARITY: Oh yeah.

VICTORIA: Yeah, it's the place to be. Yeah, we do more of a bourbon drink in our house, like a bourbon soda. That's usually what we make: I make my own custom simple syrup and mix it with a little bourbon and soda water. And that's what we have to cool down at the end of the day sometimes, yeah. Well, awesome.

Let's see. So, Charity, why don't you just tell me a little more about Honeycomb? What is it?

CHARITY: Well, it's a startup that hasn't failed yet, so... [laughs] to my own shock. [laughs] We're still around seven and a half years in. And I say that only half joking. Like, you're not really supposed to say this as a founder, but I 100% thought we were going to fail from the beginning. But we haven't yet, and we just raised more money. So we'll be around for a while.

We kind of pioneered the whole concept of observability, which now doesn't really mean anything at all. Everybody and their mother is like, well, I do observability, too. But back when we started talking about it, it was kind of a little bit revolutionary, I guess in that, you know, we started talking about how important it is to have high cardinality data in your systems. You really can't debug without it.

And the fact that our systems are getting just astronomically more complex, and yet we're still trying to debug them with tools based on, you know, the metric data type [laughs] defined back in the '70s, when space was incredibly rare and expensive. Now space is incredibly cheap, so we should be willing to be wasteful with it so we can understand our incredibly complex systems. So that's us.

We really try to empower software engineers to own their own code in production. For a long time, all of the tools for understanding your software were really written for low-level ops people, because they speak the language of, like, RAM, and disks, and CPU. You shouldn't have to understand all that just to answer: I just deployed something, what went wrong?

WILL: I love the honesty because there are so many founders that I'll talk to, and I'm like, okay, you're very successful. But did you really expect this to be what it is today? Did you really expect to survive? Because, like, just some of their ideas, I'm like, it's brilliant, but if I was with you back in the day, I'd be like, it ain't going to work. It's not going to work. [laughs]

CHARITY: Yeah. And I feel like VC culture really encourages delusion, just, like, self-delusion, this delusional thinking. You're supposed to broadcast rock-solid confidence in yourself and your ideas at all times. And I think that only sociopaths do that. [laughs] I don't want to work for anyone who's that confident in themselves or their idea.

Maybe I'm showing my own stripes here, but you know, I'm a reliability engineer. I wake up in the morning, and I'm like, what's wrong today? That's just how my brain works. And I would rather work with people who are constantly scanning the horizon and asking, okay, what's likely to kill us today? Instead of people who are just like, I am right. [laughs] You know?

VICTORIA: Yeah. And I can relate that back to observability by thinking how, you know, you can have an idea about how your system is supposed to work, and then there's the way that it actually works. [laughs]

CHARITY: Oh my God.

VICTORIA: Right?

CHARITY: Yes. It's so much that.

VICTORIA: Maybe you can tell us just a little bit more about, like, what is observability? Or how would you explain that to someone who isn't necessarily in it every day?

CHARITY: It depends on who your audience is, of course. But I would explain it like this: engineers spend all day in their IDEs, and they come to believe that that's what software is. But software is not lines of code. Software is those lines of code running in production with real users using them. That's when software becomes real.

And for too long, we've treated that as, like, an entirely different...well, it's written. [laughs] You know, for a long time, it was like, well, it's ops' problem, as the meme says. But we haven't really gotten to a point yet where...I feel like when you're developing with observability, you should be instrumenting your code as you go with an eye towards your future self. How am I going to know if this is working or not? How am I going to know if this breaks?

And when you deploy it, you should then go and look at your code in production and look at it through the lens of the telemetry that you just wrote and ask yourself, is it doing what I expected it to? Does anything else look weird? Because the cost of finding and fixing bugs goes up exponentially from the moment that you write them. It's like you type a bug; you backspace. Cool, good for you. That's the fastest you can fix it. The next fastest is if you find it when you're running tests. But tests are only ever going to find the things you could predict were going to fail or that have already failed.

The first real opportunity that you have to see if your code really works or not is right after you've deployed it, but only if you've given yourself the telemetry to do so. Like, the idea of just merging your code and walking out the door, and waiting to get paged or to get [laughs] escalated to, is madness. This is such an artifact of the bad old days when dev writes it and ops runs it. That doesn't work, right?

Like, in the beginning, we had software engineers who wrote code and ran that code in production, and that's how things should be. You should be writing code and running code in production. And the reason I think we're starting to see that reality emerge again is because our systems have gotten so complicated. We kind of can't not, because you can't really run your code as a black box anymore. You can't ignore what's on the inside. You have to be able to look at the code in order to run it effectively.

And conversely, I don't think you can develop good code unless you're constantly exposing yourself to the consequences of that code. It lets you know when it breaks; that's the whole feedback loop that was completely severed when we had dev versus ops. And we're slowly knitting it back together again. But that's what's at the heart of that incredibly powerful feedback loop. It's the heart of all software engineering: instrumenting your code, looking at it, and asking yourself, is it doing what I expected it to do?
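A minimal sketch of what "instrumenting your code as you go" can look like, assuming one structured event per unit of work, printed as a JSON line. The `handle_checkout` function, the `emit` helper, and every field name are hypothetical illustrations, not Honeycomb's API; a real service would hand these events to a telemetry SDK or exporter instead of printing them.

```python
import json
import time
import uuid


def emit(event: dict) -> str:
    """Serialize one event as a JSON line (hypothetical stand-in for a real exporter)."""
    line = json.dumps(event, sort_keys=True)
    print(line)
    return line


def handle_checkout(user_id: str, cart_items: int, build_id: str) -> dict:
    """Hypothetical request handler that records one wide event describing its own work."""
    event = {
        "name": "checkout",
        "request_id": str(uuid.uuid4()),
        "user_id": user_id,        # high-cardinality field, kept on purpose
        "cart_items": cart_items,
        "build_id": build_id,      # lets you compare behavior before and after a deploy
    }
    start = time.perf_counter()
    try:
        if cart_items <= 0:
            raise ValueError("empty cart")
        event["outcome"] = "ok"
    except ValueError as exc:
        event["outcome"] = "error"
        event["error"] = str(exc)
    finally:
        # Always record duration and emit, whether the work succeeded or failed.
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        emit(event)
    return event


handle_checkout("user-42", 3, "build-123")
handle_checkout("user-7", 0, "build-123")
```

The point of the design is that the author, who knows the intent at the moment of writing, decides which fields will answer "is it doing what I expected?" after the deploy.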

WILL: That's really neat. You said you're a reliability engineer. What's your background? Tell me more about it because you're the CTO of Honeycomb. So you have some technical background. What does that look like?

CHARITY: Yeah, well, I was a music major and then a serial dropout. I've never graduated from anything, ever. And then, I worked at startups in Silicon Valley. Nothing you'd ever...well, I worked at Linden Lab for a few years and some other places.

But honestly, the reason I started Honeycomb was because...so I worked at Parse. I was the infrastructure lead at Parse; rest in peace. It got acquired by Facebook. And when I was leaving Facebook, it was the only time in my life that I'd ever had a pedigree. Well, I've actually been an ops engineer my entire career. When I was leaving Facebook, I had VCs going, "Would you like some money to do something? Because you're coming from Facebook, so you must be smart."

On the one hand, that was kind of offensive. And on the other hand, I kind of felt the obligation to just take the money and run, on behalf of dropouts, women, and queers everywhere. Just, you know, how often...am I ever going to get this chance again? No, I'm not. So, good.

VICTORIA: Yes, I will accept your money. [laughs]

CHARITY: Yeah, right?

VICTORIA: I will take it. And I'm not surprised that you were a music major. I've met many, I would say, people who are active in social media about DevOps, and then it turns out they were a theater major, [laughs] or music, or something different. And they kind of naturally found their way.

CHARITY: The whole ops/SRE profession has historically been a real magnet for weirdos, people with weird pasts, people who took very non-traditional paths. It's always been about tinkering, about understanding systems. And there hasn't been this high bar of formal, you know, knowledge that you need just to get your first job. I feel like this is all changing. And I understand why it's changing, but it also makes me kind of sad.

VICTORIA: So I think you have a quote about, you know, working on infrastructure teams that everything comes back to databases.

CHARITY: [laughs]

VICTORIA: I wonder if you could expand on that.

CHARITY: I've been an accidental DBA my entire career. I just always seemed to be the one left holding the bag. [laughs] We were playing musical chairs. I just feel like, you know, as you're moving up the stack, you can get more and more reckless. As you move down the stack, the closer you get to, like, bits on disk, the more conservative you have to be, the more blast radius your mistakes could have.

Like, shit changes all the time in JavaScript land. In database land, we're still doing CRUD operations, like, since Stonebraker did it in the '70s. We're still doing very fundamental stuff. I love it, though, because, I don't know, it's such a microcosm of computing at large, which is just that people have no idea how much shit breaks. [laughs] Stuff breaks all the time. And the beauty of it is that we keep going. It's not that things don't break. You have no idea how much stuff is broken in your stack right now. But we find ways to resolve it after the fact.

I just think that data is so fascinating because it has so much gravity. I don't know, I could keep going, but I feel like you get the point. I just think it's really fun. I think danger is fun, I think. It might not surprise you to learn that I, too, was diagnosed with ADHD in the past couple of years. I feel like this is another strand that most DevOps, SRE types have in common, which is just [laughs] highly motivated in a good way by panic. [laughs]

WILL: I love that you said you love danger, because I feel like that is right in your wheelhouse. Like, you have to love danger to be in that field because it's unpredictable. You're the one that's coming in and putting out the fires while everyone else is sometimes running for the window. Like you said, you got caught holding the bag. So that's really neat.

This is a big question for me, especially as an engineer, a dev: do you find that product and design teams understand and see the value in SRE?

CHARITY: Oooh. These types of cultural questions are always so difficult for me to gauge whether or not my sample is representative of the larger population. Because, in my experience, you know, ops teams typically rule the roost, like, they get final say over everything. But I know that that's not typically true. Like, throughout the industry, like, ops teams kind of have a history of being kind of kicked around.

I think that they do see the value because everybody can see when it breaks. But I think that they mostly see the value when it breaks. I think it takes a rare, farsighted product team to consent to making investments all along in the kinds of improvements that will pay off later on, instead of just pouring all of the resources into quick fixes and features, and feature, feature, feature.

And then, of course, you know, you slowly grind to a halt as a team because you're just amassing surface area. You're not paying down your tech debt. And I think it's not always clear to product and design leaders how to make those investments in a way that actually benefits them instead of it just being a cost center. You know, it's just something that's always a brake on them instead of actually enabling them to move faster.

WILL: Yeah, yeah. And I can definitely see that as an engineer, a dev. I'm going to change it up a little bit and ask Victoria: since you're the managing director of that team, how do you feel about that question? Do you feel it's the same, or what's your observation?

VICTORIA: I think Charity is, like, spot on because it does depend on the type of organization that you're working in, the hierarchy, and who gets priority over budget and things like that. And so the interesting thought for me coming from federal IT organizations into more commercial and startup organizations is that there is a little bit of a disconnect. And we started to ask our designers and developers like, "Well, have you thought much about, like, what happens when this fails?" [laughs] And especially --

CHARITY: Great question.

VICTORIA: Yeah, like, when you're dealing with healthcare startups or banking startups and really thinking through all the ways it could go wrong. Is it a new pathway? Which I think is exciting for a lot of people. And I'm curious, too, Charity: with Honeycomb, were there things that surprised you in your journey of discovery about building a company around observability and what people wanted out of this space?

CHARITY: Oh my goodness. [laughs] Was anything not a surprise? I mean, [laughs] yeah, absolutely. You're a director of what team?

VICTORIA: I'm a managing director of our Mission Control team.

CHARITY: Oooh.

VICTORIA: Which is our platform engineering, and DevOps, and SRE team.

CHARITY: Now, does your platform engineering team have product managers?

VICTORIA: I think it might be me. [laughs]

CHARITY: Aha.

VICTORIA: It might be me. And we have a team lead, and our CTO is actually our acting development director. So he's really leading the development of that platform project.

CHARITY: When I was in New York the last couple of days, I just gave a talk at KubeCon about the Perils, Pitfalls, and Pratfalls of Platform Engineering, just talking about all of the ways that platform teams accidentally steer themselves into the ditch.

One of the biggest mistakes that people make in that situation is not running the platform team like a product team, you know, having a sort of, like, if we build it, they will come sort of a mentality towards the platform that they're building internally for their engineers, and not doing the things like, you know, discovery or finding out like, am I really building, you know, the most important thing, you know, that people need right now?

And it's like, I didn't learn those skills as an engineer. In infrastructure land, we didn't learn how to work with product people. We didn't learn how to work with designers. And I feel like the biggest piece of career advice that I give people like me now is: learn how to work with product, and learn how to work like a product org.

I'm curious, like, what you're observing in your realm when it comes to this stuff. Like, how much like a product org do you work?

VICTORIA: Oh, I agree 100%. So I've actually been interested in applying our platform project to the thoughtbot Incubator Program. [laughs]

CHARITY: Mmm.

VICTORIA: So they have this method for doing market strategy, and user interviews, and all of that...exactly what you're saying, like, run it like a product. So I want them to help me with it. [laughs]

CHARITY: Nice.

VICTORIA: Yes, because I am also a managing director, and so we're managing a team and building a business. And we also have this product, or really this open-source project. We don't necessarily want to be prescriptive, as thoughtbot, about how people build their platforms. So with every client, we do a deep dive to see how their dev team is actually working. What are the pain points? What are the things we can do, based on this collection of tools and knowledge that we have on what's worked for past clients, that make the most sense for them?

So, in that way, I think it is very customer-focused [laughs], right? And that's the motto we want to stick with. And I have been on other project teams where we just tried to reproduce what worked for one client and make that a product. And it doesn't always work [laughs] because of what you're saying. Like, you have to really...and especially, I think that just the diversity of the systems that we're building, and that have been built, is kind of, like, breathtaking [laughs], you know.

CHARITY: Yeah. [chuckles]

VICTORIA: I'm sure you have some familiarity with that.

CHARITY: [laughs]

VICTORIA: But what did you find in the market that worked for you right away? Like, what was the problem that you were able to solve and start building your business around?

CHARITY: We did everything all wrong. So I had had this experience at Facebook, which, you know, at Parse, you know, we had all these reliability issues because of the architecture. What we were building was just fundamentally...as soon as any customer got big, like, they would take up all the resources in this shared, you know, tenancy thing, and the whole platform would go down. And it was so frustrating. And we were working on a rewrite and everything. Like, it was professionally humiliating for me as a reliability engineer to have a platform this bad at reliability.

And part of the issue was that, you know, we had a million mobile apps, and it was a different app every time, some app that was, like, top five in the iTunes Store or something. And so the previous generation of tools and strategies, like building dashboards and doing retros and saying, well, I'll make a dashboard so that I can find this problem immediately next time, just went out the window. None of them would work because they were always about fighting the last battle. And it was always something new.

And at one point, we started getting some datasets into this tool at Facebook called Scuba. It was butt ugly. Like, it was aggressively hostile to users. But it let us do one thing really well, which was slice and dice high cardinality dimensions in near real-time. And having the ability to do that to, like, break down by user ID, which is not possible with, you know, I don't know how familiar --

I'll briefly describe high cardinality. So imagine you have a collection of 100 million users. The highest possible cardinality would be a unique ID, like a social security number: very high cardinality. Something much lower cardinality would be, like, height in inches. And metrics and dashboards are all oriented around low-cardinality dimensions. If you have more than a couple hundred hosts, you can no longer tag your metrics with a host ID. It just falls apart. So being able to break down by, like, you know, one of a million app IDs...
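To make the cardinality distinction concrete, here is a small sketch that counts the distinct values of each field in a hypothetical sample of events. The field names and the sample data are invented for illustration. In many metrics systems, each distinct tag value becomes its own time series, which is why a field like `user_id` is impractical as a metric tag even though it costs almost nothing to store on an event.

```python
from collections import defaultdict


def cardinality(events: list[dict]) -> dict[str, int]:
    """Return the number of distinct values seen for each field across all events."""
    seen = defaultdict(set)
    for event in events:
        for field, value in event.items():
            seen[field].add(value)
    return {field: len(values) for field, values in seen.items()}


# Hypothetical sample: 10,000 users, each with a unique ID and one of 20 heights in inches.
events = [
    {"user_id": f"user-{i}", "height_in": 58 + (i % 20)}
    for i in range(10_000)
]

print(cardinality(events))  # {'user_id': 10000, 'height_in': 20}
```

Tagging a metric with `height_in` would add 20 series; tagging it with `user_id` would add one series per user, which is where the metrics approach falls apart.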

It took...the amount of time it took for us to identify and find these brand-new problems dropped like a rock: from hours, or open-ended, because we never even solved a lot of the problems that we saw. We just recovered and moved on [laughs] with our day. It dropped from that to, like, seconds or minutes. It wasn't even an engineering problem anymore. It was like a support problem, you know, you just go click, click, click, click, click, oh, there it is. Just follow the trail of breadcrumbs.

That made such an impression on me. And when I was leaving, I was just like, I can't go back to not having something like this. I was so much less powerful as an engineer without it. It's just, like, it's unthinkable. So when we started Honeycomb, we just went heads down and started building. We didn't want to write a database. We had to write a database because there was nothing out there that could do this.

And we spent the first year or two not even really talking to customers. When we did talk to customers, I would tell our engineers to ignore their feedback [laughs] because they were all telling us they wanted better metrics. And we're like, no, we're not doing metrics.

The first thing that we found we could kind of connect to real problems that people were looking for was that it was high cardinality. There were a few, not many; there were a few engineers out there Googling for high cardinality metrics. And those engineers found us and became our earliest customers because we were able to do breathtaking...from their perspective, they were like, we've been told this is impossible. We've been told that this can't be done.

Intercom, for example, was able to start tagging all their requests with, like, app ID and customer ID, and immediately started noticing things like, oh, this database that we were just about to spend six months sharding and extending, it turns out 80% of the queries in flight to this database are all coming from one customer who is paying us $200 a month. So maybe we shouldn't [laughs] do that engineering labor. Maybe we should just, you know, throttle this guy who is only paying us 200 bucks a month. There are all these things you can't actually see until you can use this very, very special tool. And then once you can see that...
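The Intercom example amounts to a breakdown: group request events by a high-cardinality field such as a customer ID and rank the groups by their share of traffic. The sketch below assumes events are plain dictionaries; the `breakdown` helper and the query log are hypothetical, and the 80% figure is constructed to mirror the anecdote rather than taken from real data.

```python
from collections import Counter


def breakdown(events: list[dict], field: str) -> list[tuple]:
    """Group events by `field`; return (value, count, share) rows, largest first."""
    counts = Counter(event[field] for event in events)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


# Hypothetical query log: one noisy customer plus 200 quiet ones.
queries = [{"customer_id": "cust-noisy", "table": "messages"} for _ in range(800)]
queries += [{"customer_id": f"cust-{i}", "table": "messages"} for i in range(200)]

top_value, top_count, top_share = breakdown(queries, "customer_id")[0]
print(top_value, top_count, f"{top_share:.0%}")  # cust-noisy 800 80%
```

Breaking down by `table` instead would return a single row covering every query, the kind of low-cardinality view that hides the noisy customer entirely.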

So, like, our first customers became rabid fans and vouched for us to investors, and this still blows people's minds to this day. It's an incredibly difficult thing to explain and describe to people, but once they see it on their own data, it clicks because everybody's run into this problem before, and it's really frustrating.

VICTORIA: Yeah, that's super interesting and a great example to illustrate that point of just, like, not really knowing what's going on in your system. And, you know, you mentioned just, like, certainly at scale, that's when you really, really need to have [laughs] data and insight into your systems.

CHARITY: Yeah.

VICTORIA: But one question I get a lot is, like, at what scale do you actually need to start worrying about SRE? [laughs] Which --

CHARITY: SRE?

VICTORIA: Yeah, I'll let you answer that. Yeah, site reliability or even things like...like, everything under that umbrella like observability, like, you know, putting in monitoring and tracing and all this stuff. Sometimes people are just like, well, when do I actually have to care? [laughs]

CHARITY: I recognize this is, you know, coming from somebody who does this for a living, so, like, people can write it off all they want. But, like, the idea of developing without observability is just sad to me, like, from day one. This is not a tax. It's not something that slows you down or makes your lives worse. It's something that makes your lives better from day one. It helps you move more quickly, with more confidence. It helps you not make as many mistakes. It helps you...

Like, most people are used to interacting with their systems, which are just like flaming hairballs under their bed. Nobody has ever understood these systems. They certainly don't understand them. And every day, they ship more code that they don't understand, create systems that they've never understood. And then an alarm goes off, and everybody just, like, braces for impact because they don't understand them.

This is not the inevitable end state of computing. It doesn't have to be like this. You can have systems that are well-understood, that are tractable, that you could...it's just...it's so sad to me that people are like, oh God, when do I have to add telemetry? And I'm just like, how do you write software without telemetry? How do you have any confidence that the work you're doing is what you thought you were doing? You know, I just...

And, of course, if you're waiting to tack it on later, of course, it's not going to be as useful because you're trying to add telemetry for stuff you were writing weeks, or months, or years ago. The time to add it is while you're writing it. No one is ever going to understand your software as well as you do the moment that you're writing it. That's when you know your original intent. You know what you're trying to do. You know why you're trying to do it. You know what you tried that didn't work. You know, ultimately, what the most valuable pieces of data are. Why wouldn't you leave little breadcrumbs for yourself so that future you can find them?

You know, it's like...I just feel like, with this mental shift, it can become just as much of a habit as commenting your code or adding comments to your pull request. It becomes second nature, and reaching for it becomes second nature. You should have in your body a feeling of, I'm not done until I've looked at my telemetry in production. That's the first moment that you can tell yourself, ah, yes, it probably does what I think it does, right?

So, like, this question makes me sad. It gets me a little worked up because it's such a symptom. Based on that question, I know what those people's jobs are like, and they're not as good as they could be. Their jobs are much sadder and more confusing than they could be if they had a slightly different approach to telemetry. That's the observability bit.

But about SRE: very few ops engineers start companies, it seems. When I did, you know, I was one of three founding members. And the first thing I did was, of course, spin up infrastructure and set up CI/CD and all this stuff. And I was, like, feeling less useful than the others but, you know, doing my part. But with that stuff that I spun up, we didn't have to hire an SRE for years, and when we did, it was pretty optional. And this is a system, you know; things trickle down, right?

Doing things right from the beginning and having them be clear and well-understood, and efficient, we were able to do so much with so few people. You know, we were landing, you know, hundreds of thousand-dollar deals with people who thought we had hundreds of employees. We had 12 engineers for the first almost five years, just 12 engineers. But, like, almost all of the energy that they put into the world went into moving the business forward, not fighting with the system, or thrashing the system, or trying to figure out bugs, or trying to track down things that were just, like, impossible to figure out. We waste so much time as engineers by trying to add this stuff in later.

So the actual answer to your question, if you aren't lucky enough to have an ops co-founder, is as soon as you have real users. You know, I've made a career out of basically being the first infrastructure engineer to join when the software engineers are starting to have real customers. Like, at Parse, they brought me in when they were about to do their alpha release. And they were like, whoa, okay, I guess we'd better have someone who knows how to run things.

And I came in, and I spent the next, you know, year or so just cleaning up shit that they had done, which wasn't terrible. But, you know, they just didn't really know what they were doing. So I kind of had to undo everything, redo it. And just the earlier, the better, right? It will pay off.

Now, that said, there is a real risk of over-engineering early. Companies don't fail because they innovated too quickly; let's put it that way. They fail because they couldn't focus. They couldn't connect with their customers. They couldn't do all these things. And so you really do want to do just enough to get you to the next place, so that you can put most of your effort into making product for your customers.

But yeah, it's so much easier to set yourself up with auto-deployment from the start, so that every CI/CD run automatically deploys your code to production, and just maintain that as you grow. That is so easy compared to trying to take, you know, a long, slow, leaky deploy process and turn it into one that can auto-deploy safely after every commit. So yeah, do it early and then maintain it. That's the easiest way in the world to do this stuff.

Mid-Roll Ad:

As life moves online, bricks-and-mortar businesses are having to adapt to survive. With over 18 years of experience building reliable web products and services, thoughtbot is the technology partner you can trust.

We provide the technical expertise to enable your business to adapt and thrive in a changing environment. We start by understanding what’s important to your customers to help you transition to intuitive digital services your customers will trust.

We take the time to understand what makes your business great and work fast yet thoroughly to build, test, and validate ideas, helping you discover new customers.

Take your business online with design‑driven digital acceleration. Find out more at tbot.io/acceleration or click the link in the show notes for this episode.

WILL: Correct me if I'm wrong, but I think you said Facebook and mobile. Do you have...not experience with mobile, but does Honeycomb do anything in the mobile space? Because I feel like that portion is probably the most complicated, like, dealing with iOS and Android and everything that they're asking for. So...

CHARITY: We don't have mobile stuff at Honeycomb. Parse was a mobile Backend as a Service. So I went straight from doing all mobile all the time to doing no mobile at all. I also went from doing databases all the time to doing, you know...it's good career advice typically to find a niche and then stay in it, and I have not followed that advice. [laughter] I've just jumped from...as soon as I'm good at something, I start doing something else.

WILL: Let me ask you this, how come you don't see more mobile SRE or help in that area?

CHARITY: I think that you see lots of SREs for mobile apps, but they're on the back-end side. They're on the server side. So it's just not as visible. But even if you've got, like, a stack that's entirely serverless, you still need SRE.

But I think that the model is really shifting. You know, it used to be you hired an SRE team or an ops team to carry the pager for you and to take the alerts and to, like, buffer everything, and nowadays, that's not the expectation. That's not what good companies do. You know, they set up systems for their software engineers to own their code in production. But they need help because they're not experts in this, and that's where SRE types come in. Is that your experience?

WILL: Yeah, for the most part. Yeah, that is.

CHARITY: Yeah, I think that's very healthy.

VICTORIA: And I agree with that as well. And I'm going to take that clip of your reaction to that question about when you should start doing [laughter] observability and just play for everybody whenever someone asks [laughs] me that. I'm like, here's the answer. That's great.

CHARITY: I think a good metaphor for that is like, if you're buying a house and taking out a loan, the more of a down payment you can put down upfront, the lower that your monthly payments are going to be for the rest of your...you amortize that out over the next 20-30 years. The more you can do that, the better your life is going to be because interest rates are a bitch.

VICTORIA: It makes sense. And yeah, like, to your point earlier about when people actually do start to care about it is usually after something has broken in a traumatic way that can be really bad for your clients and, like, your legal [laughter] stance --

CHARITY: That's true.

VICTORIA: As a company.

CHARITY: Facing stuff, yeah, is where people usually start to think about it. But, like, the less visible part, and I think almost the more important part is what it does to your velocity and your ability to execute internally. When you have a good, clean system that is well-tended that, you know, where the amount of time between when you're writing the code and when the code is in production, and you're looking at...when that is short and tight, like, no more than a couple of hours, like, it's a different job than if it takes you, like, days or weeks to deploy. Your changes get bashed up with other people's.

And, you know, like, you enter, like, the software development death spiral where, you know, it takes a while. So your diffs get even bigger, so code review takes even longer, so it takes even longer. And then your changes are all getting bashed up. And, you know, now you need a team to run deploys and releases. And now you need an SRE team to do the firefighting.

And, like, your systems are...the bigger it gets, the more complicated it gets, the more you're spending time just waiting on each other or switching contexts. You ever, like, see an app and been like, oh, that's a cool app? I wonder...they have 800 engineers at that company. And you're just like, what the hell are they all doing? Like, seriously, how does it take that many engineers to build this admittedly nice little product?

I guarantee you it's because their internal hygiene is just terrible. It takes them too long to deploy things. They've forgotten what they've written by the time it's out, so nobody ever goes and looks at it. So it's just like, it's becoming a hairball under your bed. Nobody's looking at it. It's becoming more and more mysterious to you.

Like, I have a rule of thumb that has no mathematical science behind it, just experience. The rule of thumb says that if it takes you, you know, on the order of, say, a couple of hours tops to deploy your software, and it takes you a certain number of engineers to build and own that product, then if your deploys take on the order of days instead of hours, it will take you twice as many people [chuckles] to build and support that product.

And if it takes you weeks to deploy that product, it will take you twice as many again; if anything, that is an underestimate because it actually goes up exponentially, not linearly. But, like, we are so wasteful when it comes to people's time. It is so much easier for managers to go, uh, we're overloaded. Let's hire more people. For some reason, you can always get headcount when you can't actually get the discipline to say no to things or the people to work on internal tools to, like, shrink that gap between when you've written it and when it's live.

And just the waste, it just spirals out of control, man, and it's not good. And, you know, it should be such a fun, creative, fulfilling job where you spend your day solving puzzles for money and moving the business materially forward every day. And instead, how much of our time do we just sit here, like, twiddling our thumbs and waiting for the build to finish or waiting for code review [laughs] to get turned around? Or, you know, swapping projects and, like, trying to page all that context in your brain? Like, it's absurd, and this is not that hard of a problem to fix.

VICTORIA: Engineering should be fun, and it should be dangerous. That's what [laughs] I'm getting out of this --

CHARITY: It should be fun, and it should be dangerous. I love that.

VICTORIA: Fun and dangerous. I like it. [laughs] And speaking of danger, I mean, maybe it's not dangerous, but what does success really look like for you at Honeycomb in the next six months or even in the next five years?

CHARITY: I find it much easier to answer what failure would look like.

VICTORIA: You can answer that too if you like. [laughs]

CHARITY: [laughs] What would success look like? I mean, obviously, I have no desire to ever go through another acquisition, and I don't want to go out of business. So it'd be nice not to do either of those things, which means since we've taken VC money, IPO would be nice eventually.

But, like, ultimately, like, what motivates Christine and me and our entire company really is just, you know, we're engineers. We've felt this pain. We have seen that the world can be better. [laughs] We really just want to help, you know, move engineering into the current decade.

I feel like there are so many teams out there who hear me talk about this stuff. And they listen wistfully, and they're like, yeah, and they roll their eyes. They're like, yeah, you work in Silicon Valley, or yeah, but you work at a startup, or yeah...they have all these reasons why they don't get nice things.

We're just not good enough engineers is the one that breaks my heart the most because it's not true. Like, it has nothing to do...it has almost nothing to do with how good of an engineer you are. You have to be so much better of an engineer to deal with a giant hairball than with software that gets deployed, you know, within the hour that you can just go look at and see if it's working or not.

I want this to go mainstream. I want people...I want engineers to just have a better time at work. And I want people to succeed at what they're doing. And just...the more we can bring that kind of change to more and more people, the more successful I will feel.

VICTORIA: I really like that. And I think it's great. And it also makes me think I find that people who work in the DevOps space have a certain type of mentality sometimes, [laughs] like, it's about the greater community and, like, just making being at work better. And I also think it maybe makes you more willing to admit your failures [laughs] like you were earlier, right?

CHARITY: Probably.

VICTORIA: That's part of the culture. It's like, well, we messed up. [laughs] We broke stuff, and we're going to learn from it.

CHARITY: It's healthy. I'm trying to institute a rule for our all-hands, where different organizations give an update every two weeks, that we talk two-thirds about our successes and things that worked great and one-third about things that just didn't work. Like, I think we could all stand to talk about our failures a lot more.

VICTORIA: Yeah, makes it a lot less scary, I think [laughs], right?

CHARITY: Yeah, yeah. It democratizes the feeling, and it genuinely...it makes me happy. It's like, that didn't work, great. Now we know not to do it. Of the infinite number of things that we could try, now we know something for real. I think it's exciting. And, I don't know, I think it's funny when things fail. And I think that if we can just laugh about it together...

You know, in every engineering org that I've ever worked at, out of all the teams, the ops-type teams have always been the ones that are the most tightly bonded. They have this real, like, Band of Brothers type of sentiment. And I think it's because, you know, we've historically endured most of the pain. [laughs] But, like, that sense of, like, it's us against the system, that there is hilarity in failure. And, at the end of the day, we're all just monkeys, like, poking at electrical sockets is, I think...I think it's healthy. [laughs]

WILL: That's really neat. I love it. This is one of my favorite questions. What advice would you give yourself if you could go back in time?

CHARITY: I don't know. I think I'd just give myself a thumbs up and go; it's going to be all right. I don't know; I wouldn't... I don't think that I would try to alter the time continuum [laughs] in any way. But I had a lot of anxiety when I was younger about going to hell and all this stuff. And so I think...but anything I said to my younger self, I wouldn't have believed anyway. So yeah, I respectfully decline the offer.

VICTORIA: That's fair. I mean, I think about that a lot too actually, like, I sometimes think like, well, if I could go back to myself a year ago and just --

CHARITY: Yeah. I would look at me like I was stupid.

[laughter]

VICTORIA: That makes sense. It reminds me a little bit of what you said, though, like, doing SRE and everything upfront or the observability pieces and building it correctly in a way you can deploy quickly is like a gift to your future self. [laughs]

CHARITY: Yes, it is, with a bow. Yes, exactly.

VICTORIA: There you go. Well, all right. I think we are about ready to wrap up. Is there anything you would like to promote specifically?

CHARITY: We just launched this really cool little thing at Honeycomb. And you won't often hear me say the words cool and AI in proximity to each other, but we just launched this really dope little thing. It's a tool for using natural language to ask questions of your telemetry. So, if you just deployed something and you want to know, like, what's slow or did anything change, you can just ask it using English, and it does a ChatGPT thing and generates the right graphs for you. It's pretty sweet.

VICTORIA: That's really cool. So, if you have Honeycomb set up and working in your system, then you can just ask the little chatbot, "Hey, what's going on here?"

CHARITY: Yeah. What's the slowest endpoint? And it'll just tell you, which is great because I feel like I do not think graphically at all. My brain just really doesn't. So I have never been the person who's, like, creating dashboards or graphs. My friend Ben Hartshorne works with me, and he'll make the dashboards. And then I get up in the morning, and I bookmark them. And so we're sort of symbiotic.

But everyone can tweak a query, right? Once you have something that you know is, like, within spitting distance of the data you want, anyone can tweak it, but composing is really hard. So I feel like this really helps you get over that initial hurdle of, like, er, what do I break down by? What do I group by? What are the field names? You just ask it the question, and then you've got to click, click, click, and, like, get exactly what you want out of it. I think it's, like, a game changer.

VICTORIA: That sounds extremely cool. And we will certainly link to it in our show notes today. Thank you so much for being with us and spending the time, Charity.

CHARITY: Yeah, this was really fun.

VICTORIA: You can subscribe to the show and find notes along with a complete transcript for this episode at giantrobots.fm. If you have questions or comments, email us at hosts@giantrobots.fm. And you can find me on Twitter @victori_ousg.

WILL: And you can find me on Twitter @will23larry.

This podcast is brought to you by thoughtbot and produced and edited by Mandy Moore.

Thanks for listening. See you next time.

ANNOUNCER: This podcast is brought to you by thoughtbot, your expert strategy, design, development, and product management partner. We bring digital products from idea to success and teach you how because we care. Learn more at thoughtbot.com.
