Giant Robots Smashing Into Other Giant Robots
00:00:00
/
00:47:24

405: RackN Digital Rebar with Rob Hirschfeld

December 23rd, 2021

Chad talks to Rob Hirschfeld, the Founder and CEO of RackN, which develops software to help automate data centers, which they call Digital Rebar. RackN is focused on helping customers automate infrastructure. They focus on customer autonomy and self-management, and that's why they're a software company, not a services or as-a-service platform company.

Digital Rebar is a platform that helps connect all of the different pieces and tools that people use to manage infrastructure into infrastructure pipelines through the seamless multi-component automation across all of the different pieces and parts that have to be run to bring up infrastructure.

Become a Sponsor of Giant Robots!

Transcript:

CHAD: This is the Giant Robots Smashing Into Other Giant Robots Podcast where we explore the design, development, and business of great products. I'm your host, Chad Pytel. And with me today is Rob Hirschfeld, Founder, and CEO of RackN, which develops software to help automate data centers, which they call Digital Rebar. Rob, welcome to the show

ROB: Chad, it is a pleasure to be here. Looking forward to the conversation.

CHAD: Why don't we start with a little bit more information about what RackN and the Digital Rebar platform actually is.

ROB: I would be happy to. RackN is focused on helping customers automate infrastructure. And for us, it's really important that the customers are doing the automation. We're very focused on customer autonomy and self-management. It's why we're a software company, not a services or as a service platform company.

But fundamentally, what Digital Rebar does is it is the platform that helps connect all of the different pieces and tools that people use to manage infrastructure into infrastructure pipelines through the seamless multi-component automation across all of the different pieces and parts that have to be run to bring up infrastructure. And we were talking data centers do a lot of on-premises all the way from the bare metal up. But multi-cloud, you name it, we're doing infrastructure at that level.

CHAD: So, how agnostic to the actual bare metal are you?

ROB: We're very agnostic to the bare metal. The way we look at it is data centers are heterogeneous, diverse places. And that the thing that sometimes blocks companies from being innovative is when they decide, oh, we're going to use this one vendor for this one platform. And that keeps them actually from moving forward. So when we look at data centers, the heterogeneity and sometimes the complexity of that environment is a feature. It's not a bug from that perspective.

And so it's always been important to us to be multi-vendor, to do things in a vendor-neutral way to accommodate the quirks and the differences between...and it's not just vendors; it's actually user choice. A lot of companies have a multi-vendor problem (I'm air quoting) that is actually a multi-team problem where teams have chosen to make different choices.

TerraForm has no conformance standard built into it. [laughs] And so you might have everybody in your company using TerraForm and Ansible happily but all differently. And that's the problem that we walk into when we walk into a data center challenge. And you can't sweep that under the rug. So we embraced it.

CHAD: What kind of companies are your primary customers?

ROB: We're very wide-ranging, from the top banks use us and deploy us, telcos, service providers, very large scale service providers use us under the covers, media companies. It really runs the gamut because it's fundamentally for us just about infrastructure. And our largest customers are racing to be the first to deploy. And it's multi-site, but 20,000 machines that they're managing under our Digital Rebar management system.

CHAD: It's easy, I think, depending on where you sit and your experiences. The cloud providers today can overshadow the idea that there are even people who still have their own data centers or rent a portion of a data center. In today's ecosystem, what are some of the factors that cause someone to do that who isn't an infrastructure provider themselves?

ROB: You know the funny thing about these cloud stories (And we're talking just the day after Amazon had a day-long outage.) is that even the cloud providers don't have you give up operation. You're still responsible for the ops. And for our customers, it's not like they can all just use Lambdas and API gateways. At the end of the day, they're actually doing multi-site distributed operations. And they have these estates that are actually it's more about how do I control distributed infrastructure as much as it is about repatriating.

Now, we do a lot to help people repatriate. And they do that because they want more control. Cost savings is a significant component with this. You get into the 1000s of machines, and that's a big deal. Even at hundreds of machines, you can save a lot of money compared to what you get in cloud.

And I think people get confused with it being an or choice. It really is an and choice. Our best customers are incredibly savvy cloud users. They want that dynamic, resilient very API-driven environment. And they're looking to bring that throughout the organization. And so those are the ones that get excited when they see what we've done because we spend a lot of time doing infrastructure as code and API-driven infrastructure. That's really what they want.

CHAD: Cool. So, how long have you been working on RackN? When did you found it?

ROB: [laughs] Oh my goodness. So RackN is seven years old. Digital Rebar, we consider it to be at its fourth generation, but those numbers actually count back before that. They go back to 2009. The founding team was actually at Dell together in the OpenStack heyday and even before the OpenStack heyday. And we were trying to ship clouds from the Dell Factory.

And what we found was that every customer had this bespoke data center we've already talked about. And we couldn't write automation that would work customer to customer to customer. And it was driving us nuts. We're a software team, not a hardware team inside of Dell. And the idea that if I fixed something in the delivery or in their data center, and couldn't go back to their data center because it was different than what the next customer needed and the next customer needed, we knew that we would never have a community. It's very much about this community and reuse pattern.

There's an interesting story that I picked up from SREcon actually where they were talking about the early days of boilers. This is going back a few centuries ago. But when they first started putting boilers into homes and buildings, there was no pattern, there was no standard. And everybody would basically hire a plumber or a heating architect. Heating architect was a thing.

But you'd build a boiler and every one was custom, and every one was different. And no surprise, they blew up a lot, and they caused fires. And buildings were incredibly unsafe because they were working on high-pressure systems with no pattern. And it took regulation and laws and standards. And now nobody even thinks about it. You just take standard parts, and you connect them together in standard ways. And that creates actually a much more innovative system. You wouldn't want every house to be wired uniquely either.

And so when we look at the state of automation today, we see it as this pre-industrial pre-standardization process and that companies are actually harmed and harming themselves because they don't have standards, and patterns, and practices that they can just roll and know they work.

And so that philosophy started way back in 2009 with the first generation which was called Crowbar. Some of your audience might even remember this from the OpenStack days. It was the first OpenStack installer built around Chef. And it had all sorts of challenges in it, but it showed us the way. And then we iterated up to where Digital Rebar is today. Really fully infrastructure as code, building infrastructure pipelines, and a lot of philosophical pieces we've learned along the way.

CHAD: So you were at Dell working on this thing. How did you decide to leave Dell and start something new?

ROB: Dell helped me with that decision. [laughs] So the challenge of being a software person inside of Dell especially at the time, Crowbar was open-source which did make it easier for us to say, "Hey, we want to part ways but keep the IP." And the funny thing is there's not a scrap of Crowbar in Digital Rebar except one or two naming conventions that we pulled forward and the nod of the name, that Rebar is a nod to Crowbar.

But what happened was Dell when it went private, really did actually double down on the hardware and the more enterprise packaged things. They didn't want to invest in DevOps and that conversation that you need to have with your customers on how they operate, the infrastructure you sold them. And that made Dell not a very good place for me and the team. And so we left Dell, looked at the opportunity to take what we'd been building with Crowbar and then make it into a product. That's been a long journey.

CHAD: Now, did you bootstrap, or did you take investment?

ROB: We took [laughs] a little bit of investment. We raised some seed funding. Certainly not what was in hindsight was going to be sufficient for us to do it. But we thought at the time that we had something that was much more product-ready for customers than it was.

CHAD: And what was the challenge that you found? What was the surprise there that it wasn't as ready as you thought?

ROB: So what we've learned in our space specifically...and there are some things that I think apply to everybody, and there are some things that you might be able to throw on the floor and ignore. I was a big fan of Minimum Viable Product. And it turned out that the MVP strategy was not at all workable with customers in data centers.

Our product is for people running production data centers. And nobody's going to put in software to run a data center that is MVP. It has to be resilient. It has to be robust. It has to be simple enough that they can understand it and solve some core problems, which is still sort of an MVP idea. But it can't be oops. [laughs] You can't have a lot of oops moments when you're dealing with enterprise infrastructure automation software. It has to work.

And importantly, and as a design note, this has been a lesson for us. If it does break, it has to break in very transparent, obvious ways. And I can't emphasize that enough. There's so much that when we build it, we come back and like, was it obvious that it broke? Is it obvious that it broke in a way that you can fix?

CHAD: And it's part of the culture too to do detailed post mortems with explanations and be as transparent as possible or at least find the root cause so that you can address it. That's part of the culture of the space too, right?

ROB: You'd like to hope so. [laughs]

CHAD: Okay. [laughs] In my experience, that's the culture of the space.

ROB: You're looking more at a developer experience. But even with a developer, you've got to be in a post mortem or something. And it's like everybody's pointing to the person to the left and the right sort of by human nature. You don't walk into that room assuming that it was your fault, and you should, but that's not how it usually is approached.

And what we find in the ops space, and I would tell people to work around this pattern if they can, is that if you're the thing doing the automation, you're always the first cause of the problem. So we run into situations where we're doing a configuration, and we find a vendor bug or a glitch or there's something, and we found it. It's our problem whether we were the cause or not. And that's super hard.

I think that people on every side of any type of issue need to look through and figure out what the...the blameless post mortem is a really important piece in all this. At the end of the day, it's always a human system that made a mistake or has something like that. But it's not as simple as the thing that told you the bad news that the messenger was at fault.

And there's a system design element to that. That's what we're talking about here is that when you're exposing errors or when something's not behaving the way you expect, our philosophy is to stop. And we've had some very contentious arguments with customers who were like, "Just retry until it fixes itself," or vendors who were like, "Yeah, if you retry that thing three times, [laughs] then it'll magically go away." And we're like, that's not good behavior. Fix the problem. It actually took us years to put a retry element into the tasks so that you can say, yeah, this does need three retries. Just go do it. We've resisted for a long time for this reason.

CHAD: So you head out into the market. And did you get initial customers, or was there so much resistance to the product that you had that you struggled to get even first customers?

ROB: We had first customers. We had a nice body of code. The problem is actually pretty well understood even by our customers. And so it wasn't hard for them to get a trial going. So we actually had a very profitable customer doing...it was in object storage, public object storage space. And they were installing us. They wanted to move us into all their data centers.

But for it to work, we ended up having an engineer who basically did consulting and worked with them for the better part of six months and built a whole bunch of stuff, got it working. They could plug in servers, and everything would set itself up. And they could hit a button and reset all the servers, and they would talk to the switches. It was an amazing amount of automation.

But, and this happens a lot, the person we'd been working with was an SRE. And when they went to turn it over to the admins in the ops team, they said, [laughs] "We can't operate. There's too much going on, too complex." And we'd actually recognized...and this is a really serious challenge. It's a challenge now that we're almost five years into the generation that came after that experience.

And we recognized there was a problem. And that this wasn't going to create that repeatable experience that we were looking for if it took that much. At the same time, we had been building what is now Digital Rebar in this generation that was a single Golang binary. All the services were bundled into the system. So it listened on different ports but provide all the services, very easy to install, really, really simple. We literally stripped everything back to the basics and restarted.

And we had this experience where we sat down with a customer who had...I'm going to take a second and tell the story because this is such a compelling story from a product experience. So we took our first product. We were in a bake-off with another bare metal focus provisioning at the time. And they were in a lab, and they set our stuff up. And they turned it on, and they provisioned. And they set up the competitor, and they turned it on and provisioned. And both products worked.

Our product took 20 minutes to go through the cycle and the competitor took 3. And the customer came back and said, "I can't use this. I like your product better. It has more controls with all this stuff." But it took 20 minutes instead of 3. We actually logged into the system, looked at it and we were like, "Well, that's because it recognized that your BIOS was out of date, patched your BIOS, updated the system, checked that it was right, and then rebooted the systems and then continued on its way because it recognized your systems were outdated automatically.

And he said, "I didn't want it to do that. I needed it to boot as quickly as possible." And literally, [laughs] we were in the middle of a team retreat. So it's like, the CTO is literally excusing himself on the table to talk to the guy to make this stuff, try and make it right. And he's like, "Well, we've got this new thing. Why don't you install this, what's now Digital Rebar, on the system and repeat the experiment?" And he did and Digital Rebar was even faster than the competitor. And it did exactly just install, booted, and was done.

And he came back to the table, and it took 15 minutes to have the whole conversation to make everything work. It was that much of a simpler design. And he sat down and told the story. And I was in the middle of it. I'm just like, "We're going to have to pivot and put everything into the new version," which is what we did. And we just ripped out the complexity. And then over the last couple of years now, we've built the complexity back into the system to do all those additional but much more customer-driven from that perspective.

CHAD: How did you make sure that as you were changing your focus, putting all of your energy into the new version that you [laughs] didn't introduce too much risk in that process or didn't take too long?

ROB: [laughs] We did take too long and introduced too much risk, and we did run out of money. [laughs] All those things happened. This was a very difficult decision. We thought we could get it done much faster. The challenge of the simpler product was that it was too simple to be enough in customers’ data centers. And so yeah, we almost went out of business in the middle of all this cycle. We had a time where RackN went back down to just the two founders.

And at this point, we'd gotten far enough with the product that we knew it was the right thing. And we'd also embedded a degree...with the way we do the UX, we have this split. The UX runs on a hosted system. It doesn't have to but by default, it does. And then we have the back end. So we were very careful about how we collected metrics because you really need to know who's downloading and using your products.

And we had enough data from that to realize that we had some very committed early users and early customers, just huge brand names that were playing around. So we knew that we'd gotten this mix right, that we were solving a problem in a unique way. But it was going to take time because big companies don't make decisions quickly. We have a joke. We call it the reorg half-life problem. So the half-life of a reorg in any of our customers is about nine months. And either you're successful inside of that reorg half-life, or you have to be resilient across this reorg half.

And so initially, it was taking more than nine months. We had to be able to get the product in play. And once we did, we had some customers who came in with very big checks and let us come back and basically build back up. And we've been adding some really nice names into our customer roster. Unfortunately, it's all private. I can tell you their industries and their scale, but I can't name them.

But that engagement helped drive us towards the feature set and the capabilities and building things up along that process. But it was frustrating. And some of them, especially at the time we were open-source, were very happy to say, "No, we are a super big brand name. We don't pay for software." I'm like, "Most profitable, highest valued companies in the world you don't want to pay for this operational software?" And they're like, "No, we don't have to." And that didn't sit very well with us. Very hard, as a starting startup, it was hard.

CHAD: At the time, everything you were doing was open source.

ROB: So in the Digital Rebar era, we were trying to do Open Core. Digital Rebar itself was open. And then we were trying to hold back the BIOS patches, integrate enterprise single sign-on. So there was a degree of integration pieces that we held back as RackN and then left the core open. So you could use Digital Rebar and run it, which we had actually had a lot of success with people downloading, installing, and running Digital Rebar, not as much success in getting them to pay us for that privilege.

CHAD: So, how did you adjust to that reality?

ROB: We inverted the license. After we landed a couple of big banks and we had several others and some hyperscalers too who were like, "This is really good software. We love it. We're embedding it in our service, but we're not going to pay you." And then they would show up with bugs and complaints and issues and all sorts of stuff still.

And what happened is we started seeing them replicating the closed pieces. The APIs were open. We actually looked at it and listening to our communities, they wanted to see what was in the closed pieces. That was actually operationally important for them to understand how that stuff worked. They never contributed or asked to see anything in the core. And, there's an important and here, and they needed performance improvements in the core that were radically different.

So the original open-source stuff went to maybe 500 machines, and then it started to cap out. And we were like, all right, we're going to have to really rewrite the data store mechanisms that go with this. And the team looked at each other and were like, "We're not going to open source that. That's really complex and challenging IP." And so we said the right model for us is going to be to make the core closed and then allow our community and users to see all the things that they are actually using to interact with their environment. And it ends up being a little bit of a filter.

There are people who only use open-source software. But those companies also don't necessarily want to pay. When I was an open-source evangelist, this was always a problem. You're pounding on the table saying, "If you're using open-source software, you need to understand who to pay for that service, that software that you're getting. If you're not paying for it, that software is going to go away." In a lot of cases, we're a walking example of that.

And it's funny, more of the codebase is open today than it was then. [chuckles] But the challenge is that it's really an open ecosystem now because none of that software is particularly useful without the core to run it and glue everything together.

CHAD: Was that a difficult decision to make? Was it controversial?

ROB: Incredibly difficult. It was something I spent a lot of time agonizing about. My CTO is much clear-eyed on this. From his perspective, he and the other engineers are blood, sweat, and tears putting this in. And it was very frustrating for them to see it running people's production data centers who told us, and this is I think the key, who just said to us, "You know, we're not going to pay money for that." And so for them, it was very clear-eyed it's their work, their sweat equity, very gut feeling for that.

For me, I watched communities with open-source routes, you know, the Kubernetes community. I was in OpenStack. I was on the board for that. And there is definitely a lift that you get from having free software and not having the strings. And I also like the idea that from a support perspective, if you're using open-source software, you could conceivably not care for the vendor that went away. You could find another life for it.

But years have gone by and that's not actually a truism that when you are using open-source software if you're getting it from a vendor, you're not necessarily protected from that vendor making decisions for you. CentOS is a great...the whole we're about to hit the CentOS deadlines, which is the Streams, and you can't get other versions. And we now have three versions of CentOS, at least three versions of CentOS with Rocky, and Alma, and CentOs Streams. Those are very challenging decisions for people running enterprise data centers, not that simple.

And nobody in our communities is running charity data centers. There's no goodwill charity. I'm running a data center out of the goodness of my heart. [laughs] They are all production systems, enterprise. They're doing real production work. And that's a commercial engagement. It's not a feel-good thing.

CHAD: So what did you do in your decision-making process? What pushed you, or what did you come to terms with in order to make that change?

ROB: I had to admit I was wrong. [laughter] I had to think back on statements I'd made and the enthusiasm that I'd had and give up some really hard beliefs. Being a CEO or a founder is the same process. So I wish I could say this was the only time [laughs] I had to question, you know, hard-made assumptions, or some core beliefs in what I thought.

I've had to get really good at questioning when am I projecting this is the way I want the world to be therefore it will be? That's a CEO skill set and a founder skill set...and when that projection is having you on thin ice. And so you constantly have to make that balance.

And this was one of those ones where I'm like, all right, let's do it. And I still wake up some mornings and look at people who are open source only and see how much press they get or how easy it is for them to get mentions and things like that. And I'm like, ah, God, that'd be great. It feels like it's much harder for us because we're commercial to get the amplification.

There are conferences that will amplify open-source TerraForm, great example. It gets tons of amplification for being a single vendor project that's really tightly controlled by HashiCorp. But nobody is afraid to go talk about TerraForm and mention TerraForm and do all this stuff, the amazing use of open source by that company. But they could turn it and twist it, and they could change it. It's not a guarantee by any stretch of the imagination.

CHAD: Well, one of the things that I've come to terms with, and maybe this is a very positive way of looking at it, instead of that you were wrong, [laughter] is to realize that well, you weren't necessarily wrong. It got you to where you were at that point. But maybe in order to go to the next level, you need to do something different. And that's how I come to terms with some things where I need to change my thinking.

ROB: [laughs] I like that. It's good. Sometimes you can look back and be like, yeah, that wasn't the right thing and just own it. But yeah, it does help you to know the path. Part of the reason why I love talking about it with you like this is it's not just Rob was wrong; we're actually walking the path through that decision. And it's easy to imagine us sitting in...we're in a tiny, little shared office listening to calls where...I'll tell you this as a story to make it incredibly concrete because it's exactly how this happened.

We were on a call. Everybody was in the room. And we were talking to a major bank saying, "We love your software." We're like, "Great, we're looking forward to working with you," all this stuff. And they're like, "Yeah, we need you to show us how you built this plugin because we want to write our own version of it."

CHAD: [chuckles]

ROB: We're like, "If you did that, you wouldn't need to buy our software." And they're like, "That's right. We're not going to buy your software."

CHAD: Exactly. [laughs]

ROB: And we're like, "Well, we won't show you how to use it. Then we won't show you how to do that." And they're like, "Well, okay. We'll figure it out ourselves." And so I'm the cheerful, sunny, positive, sort of managing the call, and I'm not just yelling at them. My CTO is sitting next to me literally tearing his hair. This was literally a tearing his hair out moment. And we hung up the call, and we went on a walk around the neighborhood. And he was just like, "What more do you need to hear for you to understand?"

And so it's moments like that. But instead of being like, no, you're wrong, we got to do it this way, I was ready to say, "Okay, what do you think we can do? How do we think we can do it?" And then he left me with a big pile of PR messaging to explain what we're doing, conversations like this. Two years ago when we made this change, almost three, I felt like I was being handed a really hard challenge.

As it turns out, it hasn't been as big a deal. The market has changed about how they perceive open source. And for enterprise customers, they're like, "All right, how do we deal with the licensing for this stuff?" And we're like, "You just buy it from us." And they're like, "That's it?" And I'm like, "Yes." And you guarantee every..." "Yes." They're like, "Oh. Well, that's pretty straightforward. I don't have to worry about..."

We could go way down an open-source rabbit hole and the consulting pieces and who owns the IP, and I used to deal with all that stuff. Now it's very straightforward. [laughs] Like, "You want to buy and use the software to run your data center?" "Yes, I do." "Great."

CHAD: Well, I think this is generally applicable even beyond your specific product but to products in general. It's like, when you're not talking to people who are good customers or who are even going to be your customers who are going to pay for what you want, you can spend a lot of time and energy trying to please them. But you're not going to be successful because they're not going to be your customers no matter what you do.

ROB: And that ends up being a bit of a filter with the open-source pieces is that there are customers who were dyed in the wool open source. And this used to be more true actually as the markets moved a lot. We ended up just not talking to many. But they do, they want a lot. They definitely would ask for features or things and additions and help, things like that. And it's hard to say no. Especially as a startup founder, you want to say yes a lot.

We try to not say yes to things that we don't...and this puts us at a disadvantage I feel like from a marketing perspective. If we don't do something, we tend to say we don't do it, or we could do it, but it would take whatever. I wish more people in the tech space were as disciplined about this does work, this doesn't work, this is a feature. This is something we're working on. It's not how tech marketing typically works sadly. That's why we focus on self-trials so people can use the product.

Mid-roll Ad

I wanted to tell you all about something I've been working on quietly for the past year or so, and that's AgencyU. AgencyU is a membership-based program where I work one-on-one with a small group of agency founders and leaders toward their business goals.

We do one-on-one coaching sessions and also monthly group meetings. We start with goal setting, advice, and problem-solving based on my experiences over the last 18 years of running thoughtbot. As we progress as a group, we all get to know each other more. And many of the AgencyU members are now working on client projects together and even referring work to each other.

Whether you're struggling to grow an agency, taking it to the next level and having growing pains, or a solo founder who just needs someone to talk to, in my 18 years of leading and growing thoughtbot, I've seen and learned from a lot of different situations, and I'd be happy to work with you. Learn more and sign up today at thoughtbot.com/agencyu. That's A-G-E-N-C-Y, the letter U.

CHAD: So you have the core and then you have the ecosystem. And you also mentioned earlier that it is an actual software package that people are buying and installing in their data center. But then you have the UI which is in the cloud and what's in the data center is reporting up to that.

ROB: Well, this is where I'm going to get very technical [laughs] so hang on for a second. We actually use a cross-domain approach. So the way this works...and our UX is written in React. And everything's...boy, there's like three or four things I have to say all at once. So forgive me as I circle.

Everything we do at Digital Rebar is API-first, really API only, so the Golang service with an API, which is amazing. It's the right way to do software. So for our UX, it is a React application that can talk to that what we call an endpoint, that Digital Rebar endpoint. And so the UX is designed to talk directly to the Digital Rebar endpoint, and all of the information that it gets comes from that Digital Rebar endpoint. We do not have to relay it. Like, you have to be inside that network to get access to that endpoint. And the UX just talks to it.

CHAD: Okay. And so the UX is just being served from your centralized servers, but you're just delivering the React for the JavaScript app. And that is talking to the local APIs.

ROB: Right. And so we do use that browser as a bridge. And so when you want to download new content packs...so Digital Rebar is a platform. So you have to download content and automation and pieces into it. The browser is actually your bridge to do that. So the browser can connect to our catalog, pull down our catalog, and then send things into that browser. So it's super handy for that.

But yeah, it's fundamentally...it's all behind your firewall software except...and this is where people get confused because you're downloading it from rackn.io. That download or the URL on the browser looks like it's a RackN URL even though all the traffic is network local.

CHAD: Do your customers tend to stay up to date? Are they updating to the latest version right away all the time?

ROB: [laughs] No, of course not.

CHAD: I figured that was the answer.

ROB: And we maintain patches on old versions and things like that. I wish they were a little faster. I'm not always sad that they're...I'm actually very glad when we do a release like we did yesterday...And in that release, I don't expect any of our production customers to go patch everything.

So in a SaaS, you might actually have to deal with the fact that you've got...and we're back to our heterogeneity story. And this is why it's important that we don't do this. If we were to push that, if we didn't handle every situation for every customer exactly right, there would be chaos. And it would all come back to our team. The way we do it means that we don't have to deal with that. Customers are in control of when they upgrade and when they migrate, except in the UX case.

CHAD: So how do you manage that if someone goes to the UI and their local thing is an old version? Are you detecting that and doing things differently?

ROB: Yes, one of the decisions we made that I'm really happy with is we embedded feature flags into the API. When you log in, it will pull back. We know what the versions are. But versions are really problematic as a way to determine what's in software, not what's not in software. So instead, we get an array back that has feature flags as we add features into the core. And we've been doing this for years. And it's an amazingly productive process.

And so what the UX does is as we add new things into the UX, it will look for those feature flags. And if the feature flag isn't there, it will show you a message that says, "This feature is not available for your endpoint," or show you the thing appropriate without that. And so the UX has gone through years of this process. And so there are literally just places where the UX changes behavior based on what you've installed on your system.

And remember, our customers it's multi-site. So our customers do have multiple versions of Digital Rebar installed across there. So this behavior is really important also for them to be able to do it. And it goes back to LaunchDarkly. I was talking to Edith back in the early days of LaunchDarkly and feature flags, and I got really excited about that. And that's why we embedded it into the product. Everybody should do it. It's amazing.

CHAD: One of the previous episodes a few ago was with actually the thoughtbot CTO, Joe Ferris. And we're on a project together where it's a different way of working but especially when you need it... so much of what I had done previously was versioned APIs. Maybe that works at a certain scale. But you get to a certain scale of software and way of working and wanting to do continuous deployment and continually update features and all that stuff. And it's a really good way of working when instead you are communicating on the level of feature availability.

ROB: And from an ops person's perspective, and this was true with OpenStack, they were adding feature flags down at the metadata for the...it was incredible. They went deep into the versioned API hellscape. It's the only way I can describe it [laughs] because we don't do that. But the thing that that does not help you with is a lot of times the changes that you're looking at from an API perspective are behavior changes, not API changes. Our API over years now has been additive.

And as long as you're okay with new objects showing up, new fields showing up in an object, you could go back to four-year-old software, talk to our API, and it would still work just fine. So all your integrations are going to be good, but the behavior might change. And that's what people don't...they're like, oh, I can make my API version, and everything's good. But the behavior that you're putting behind the scenes might be different. You need a way to express that even more than the APIs in my opinion.

CHAD: I do think you really see that when you...if you're just building a monolithic web app, it's harder to see. But once you separate your UI from your back end...and where I first hit this was with mobile applications. The problem becomes more obvious to you as a developer I think.

ROB: Yes.

CHAD: Because you have some people out there who are actually running different versions of your UI too. So your back end is the same for everybody but your UI is different.

ROB: [laughs]

CHAD: And so you need a back end that can respond to different clients. And a better way to do that rather than versioning your API is to have the clients tell you what they're capable of while they're making the requests and to respond differently. It's much more of a flexible way.

ROB: We do track what UX. We have customers who don't want to use that. They don't even want us changing the UX...or actually normal enterprise. And so they will run...the nice thing about a React app is you can just run it. The Digital Rebar can host its UX, and that's perfectly reasonable. We have customers who do that.

But every core adds more operational complexity. And then if they don't patch the UX, they can fall behind or not get features. So we see that it's...you're describing a real, you know, the more information you're exchanging between the clients and the servers, the better for you to track what's really going on.

CHAD: And I think overall once you can get a little...in my experience, especially people who haven't worked that way, joining the team, it can take a little bit for them to get comfortable with that approach and the flexibility you need to be building into your system. But once people are comfortable with it and the team is comfortable, it really starts to hum.

In my experience, a lot of what we've advocated for in terms of the way software should be built and deployed and that kind of thing is it actually makes it so that you can leave that even easier. And you can really be agile because you can roll things out in a very agile way.

ROB: So are you thinking like an actual rolling deployment where the deployed software has multiple versions coming through?

CHAD: Yep. And you can also have different users seeing different things at different times as well. You can say, "We're going to be doing continual deployment and have code continually deployed." But that doesn't mean that it's part of the release yet, that it's available to users to use.

ROB: Yeah, that ability to split and feature flag is a huge deal.

CHAD: Yeah. What I'm trying to figure out is does this apply to every project even the small like, this just changes the way you should build software? Or is there a time in a product to start introducing that thing?

ROB: I am a big fan of doing it first and fast. There are decisions that we made early that have proven out really well. Feature flags is one of them. We started right away knowing that this would be an important thing for us to do. And same thing with tracking dependencies and being able to say, "I need..." actually, it's helpful because you can write automation that says, "I need this feature in the product." This flag and the product it's not just a version thing. That makes the automation a little bit more portable, easier to maintain.

The other thing we did that I really like is all of our objects have documentation embedded in them. So as I write a parameter or an ask or really anything in the system, everything has a documentation field. And so I can write the documentation for that component right there. And then we modified our build scripts so that they will pull in all of that documentation and create an aggregated view. And so the ability to do just-in-time documentation is very, very high. And so I'm a huge fan of that.

Because then you have the burden of like, oh, I need to go back and write up a whole bunch of documentation really lessened when you can be like, okay, for this parameter, I can explain its behavior, or I can tell you what it does and know that it's going to show up as part of a documentation set that explains it. That's been something I've been a big fan of in what we build.

And not everybody [laughs] is as much a fan. And you can see people writing stuff without particularly crisp documentation behind it. But at least we can go back and add that documentation or lessons learned or things like that. And it's been hugely helpful to have a place to do that.

From a design perspective, one other thing I would say that we did that...and you can imagine the conversation. I have a UX usability focus. I'm out selling the product. So for me, it's how does it demo? How does it show? What's that first experience like? And so for me having icons and colors in the UX, in the experience is really important. Because there's a lot of semantic meaning that people get just looking down a list of icons and seeing that they are different colors and different shapes.

But from the CTO's perspective, that's window dressing. Who cares? It doesn't have functional purpose. And we're both right. There's a lot of times when to me, both people can be right. So we added that as a metafield into all of our objects. And so we have the functional part of the definition of the API. And then we have these metaobjects that you can add in or meta definitions that you can add in behind the scenes to drive icons and colors.

But sometimes UX rendering hints and things like that that from an API perspective, you're like, I don't care, not really an API thing. But from a do I show...this is sensitive information. Do I turn it into a password field? Or should this have a clipboard so I can clipboard icon it, or should I render it in this type of viewer or a plain text viewer? And all that stuff we have a place for.

CHAD: And so it's actually being delivered by the API that's saying that.

ROB: Correct.

CHAD: That's cool.

ROB: It's been very helpful. You can imagine the type of stuff we have, and it's easy to influence UX behaviors without asking for UI change.

CHAD: Now, are these GraphQL APIs?

ROB: No. We looked at doing that. That's probably a whole nother...I might get our CTO on the line for that.

CHAD: [laughs] It's a whole nother episode for that.

ROB: But we could do that. But we made some decisions that it wasn't going to provide a lot of lift for us in navigation at the moment. It's funny, there's stuff that we think is a really cool idea, but we've learned not to jump on them without having really specific customer use cases or validations.

CHAD: Well, like you said, you've got to say no. You've got to make decisions about what is important, and what isn't important now, and what you'll get to later, and that requires discipline.

ROB: This may be a way to bring it full circle. If you go back to the stories of every customer having a unique data center, there’s this heterogeneity and multi-vendor pieces that are really important. The unicycle we have to ride for this is we want our customers to have standard operating processes, standard infrastructure pipelines for this and use those and follow that process.

Because we know if they do, then they'll keep improving as we improve the pipelines. And they're all unique. So there has to be a way in those infrastructure pipelines to do extensions that allow somebody to say, "I need to make this call here in the middle of this pipeline." And we have ways to do that address those needs.

The challenge becomes providing enough opinionated like, this is how you should do things. And it's okay if you have to extend it or change it a little bit or tweak it without it just becoming an open-ended tool where people show up and they're like, "Oh, yeah, I get how to build something." And we have people do this, but they run out of gas in the long journey.

They end up writing bespoke workflows. They write their own pipelines; they do their own integrations. And for them, it's very hard to support them. It's very hard to upgrade them. It's very hard for them to survive the reorg, your nine-month reorg windows.

And so yeah, there's a balance between go do whatever you want, which you have to enable and do it our way because these processes are going to let your teams collaborate, let you reuse software. And we've actually over time been erring more and more on the side of you really need to do it the way we want you to do; reinforce the infrastructure as code processes.

And this is the key, right? I mean, you're coming from a development mindset. You want your tooling to reinforce good behavior, CICD, infrastructure as code, all these things. You need those to be easier to do [laughs] than writing it yourself. And over time, we've been progressing more and more towards the let's make it easier to do it within the opinionated way that we have and less easy to do it within the Wild West pattern.

CHAD: Cool. Well, I think with that, we'll start to wrap up. So if people want to find out more, where are some places that they could do that or get in touch with you?

ROB: The simplest thing is of course rackn.com is the website. We encourage people to just, if this is interesting, download and try the software. If they have a cloud account, it's super easy to play with it, all things RackN through that.

I am very active on Twitter under the handle @zehicle Z-E-H-I-C-L-E. And I'm happy to have conversations around these topics and data center and operations and even the future of cloud and edge computing. So please look me up. I'm excited to have conversations like that.

CHAD: Awesome. And you can subscribe to the show and find notes and transcripts for this episode at giantrobots.fm. If you have questions or comments, email us at hosts@giantrobots.fm. And you can find me on Twitter @cpytel.

This podcast is brought to you by thoughtbot and produced and edited by Mandy Moore. Thanks for listening and see you next time.

Announcer: This podcast was brought to you by thoughtbot. thoughtbot is your expert design and development partner. Let's make your product and team a success.

Support Giant Robots Smashing Into Other Giant Robots
Sponsors