
55 episodes

Software at Scale Utsav Shah
Technology
4.6 • 13 Ratings
Software at Scale is where we discuss the technical stories behind large software applications.
www.softwareatscale.dev
Software at Scale 55 - Troubleshooting and Operating K8s with Ben Ofiri
Ben Ofiri is the CEO and Co-Founder of Komodor, a Kubernetes troubleshooting platform.
Apple Podcasts | Spotify | Google Podcasts
We had an episode with the other founder of Komodor, Itiel, in 2021, and I thought it would be fun to revisit the topic.
Highlights (ChatGPT Generated)
[0:00] Introduction to the Software At Scale podcast and the guest speaker, Ben Ofiri, CEO and co-founder of Komodor.
- Discussion of why Ben decided to work on a Kubernetes platform and the potential impact of Kubernetes becoming the standard for managing microservices.
- Reasons why companies are interested in adopting Kubernetes, including the ability to scale quickly and cost-effectively, and the enterprise-ready features it offers.
- The different ways companies migrate to Kubernetes, either starting from a small team and gradually increasing usage, or a strategic decision from the top down.
- The flexibility of Kubernetes is its strength, but it also comes with complexity that can lead to increased time spent on alerts and managing incidents.
- The learning curve for developers to be able to efficiently troubleshoot and operate Kubernetes can be steep and is a concern for many organizations.
[8:17] Tools for Managing Kubernetes.
- The challenges that arise when trying to operate and manage Kubernetes.
- DevOps and SRE teams become the bottleneck due to their expertise in managing Kubernetes, leading to frustration for other teams.
- A report by the cloud native observability organization found that one out of five developers felt frustrated enough to want to quit their job due to friction between different teams.
- Ben's idea for Komodor was to take the knowledge and expertise of the DevOps and SRE teams and democratize it to the entire organization.
- The platform simplifies the operation, management, and troubleshooting aspects of Kubernetes for every engineer in the company, from junior developers to the head of engineering.
- One of the most frustrating issues for customers is identifying which teams should care about which issues in Kubernetes, which Komodor helps solve with automated checks and reports that indicate whether the problem is an infrastructure or application issue, among other things.
- Komodor provides suggestions for actions to take but leaves the decision-making and responsibility for taking the action to the users.
- The platform allows users to track how many times they take an action and how useful it is, allowing for optimization over time.
[12:03] The Challenge of Balancing Standardization and Flexibility.
- Kubernetes provides a lot of flexibility, but this can lead to fragmented infrastructure and inconsistent usage patterns.
- Komodor aims to strike a balance between standardization and flexibility, allowing for best practices and guidelines to be established while still allowing for customization and unique needs.
[16:14] Using Data to Improve Kubernetes Management.
- The platform tracks user actions and the effectiveness of those actions to make suggestions and fine-tune recommendations over time.
- The goal is to build a machine that knows what actions to take for almost all scenarios in Kubernetes, providing maximum benefit to customers.
[20:40] Why Kubernetes Doesn't Include All Management Functionality.
- Kubernetes is an open-source project with many different directions it can go in terms of adding functionality.
- Reliability, observability, and operational functionality are typically provided by vendors or cloud providers and not organically from the Kubernetes community.
- Different players in the ecosystem contribute different pieces to create a comprehensive experience for the end user.
[25:05] Keeping Up with Kubernetes Development and Adoption.
- How Komodor keeps up with Kubernetes development and adoption.
- The team is data-driven and closely tracks user feedback and needs, as well as new developments and changes in the ecosystem.
- The use and adoptio -
Software at Scale 54 - Community Trust with Vikas Agarwal
Vikas Agarwal is an engineering leader with over twenty years of experience leading engineering teams. We focused this episode on his experience as the Head of Community Trust at Amazon and dealing with the various challenges of fake reviews on Amazon products.
Apple Podcasts | Spotify | Google Podcasts
Highlights (GPT-3 generated)
[0:00:17] Vikas Agarwal's origin story.
[0:00:52] How Vikas learned to code.
[0:03:24] Vikas's first job out of college.
[0:04:30] Vikas's experience with the review business and community trust.
[0:06:10] Mission of the community trust team.
[0:07:14] How to start off with a problem.
[0:09:30] Different flavors of review abuse.
[0:10:15] The program for gift cards and fake reviews.
[0:12:10] Google search and FinTech.
[0:14:00] Fraud and ML models.
[0:15:51] Other things to consider when it comes to trust.
[0:17:42] Ryan Reynolds' funny review on his product.
[0:18:10] Reddit-like problems.
[0:21:03] Activism filters.
[0:23:03] Elon Musk's changing policy.
[0:23:59] False positives and appeals process.
[0:28:29] Stress levels and question mark emails from Jeff Bezos.
[0:30:32] Jeff Bezos' mathematical skills.
[0:31:45] Amazon's closed loop auditing process.
[0:32:24] Amazon's success and leadership principles.
[0:33:35] Operationalizing appeals at scale.
[0:35:45] Data science, metrics, and hackathons.
[0:37:14] Developer experience and iterating changes.
[0:37:52] Advice for tackling a problem of this scale.
[0:39:19] Striving for trust and external validation.
[0:40:01] Amazon's efforts to combat abuse.
[0:40:32] Conclusion.
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev -
Software at Scale 53 - Testing Culture with Mike Bland
Mike Bland is a software instigator - he helped drive adoption of automated testing at Google, and the Quality Culture Initiative at Apple.
Apple Podcasts | Spotify | Google Podcasts
Mike’s blog was instrumental in my decision to pick a job in developer productivity/platform engineering. We talk about the Rainbow of Death - the idea of driving cultural change in large engineering organizations - one of the key challenges of platform engineering teams. And we deep dive into the value of automated testing and the common pushbacks against it.
Highlights (GPT-3 generated)
[0:00 - 0:29] Welcome
[0:29 - 0:38] Explanation of Rainbow of Death
[0:38 - 0:52] Story of Testing Grouplet at Google
[0:52 - 5:52] Benefits of Writing Blogs and Engineering Culture Change
[5:52 - 6:48] Impact of Mike's Blog
[6:48 - 7:45] Automated Testing at Scale
[7:45 - 8:10] "I'm a Snowflake" Mentality
[8:10 - 8:59] Instigator Theory and Crossing the Chasm Model
[8:59 - 9:55] Discussion of Dependency Injection and Functional Decomposition
[9:55 - 16:19] Discussion of Testing and Testable Code
[16:19 - 24:30] Impact of Organizational and Cultural Change on Writing Tests
[24:30 - 26:04] Instigator Theory
[26:04 - 32:47] Strategies for Leaders to Foster and Support Testing
[32:47 - 38:50] Role of Leadership in Promoting Testing
[38:50 - 43:29] Philosophical Implications of Testing Practices
Software at Scale 52 - Building Build Systems with Benjy Weinberger
Benjy Weinberger is the co-founder of Toolchain, a build tool platform. He is one of the creators of the original Pants, an in-house Twitter build system focused on Scala, and was the VP of Infrastructure at Foursquare. Toolchain now focuses on Pants 2, a revamped build system.
Apple Podcasts | Spotify | Google Podcasts
In this episode, we go back to the basics, and discuss the technical details of scalable build systems, like Pants, Bazel and Buck. A common challenge with these build systems is that it is extremely hard to migrate to them, and to have them interoperate with open source tools that are built differently. Benjy’s team redesigned Pants with an initial hyper-focus on Python to fix these shortcomings, in an attempt to create a third generation of build tools - one that easily interoperates with differently built packages, but is still fast and scalable.
Machine-generated Transcript
[0:00] Hey, welcome to another episode of the Software at Scale podcast. Joining me here today is Benjy Weinberger, previously a software engineer at Google and Twitter, VP of Infrastructure at Foursquare, and now the founder and CEO of Toolchain. Thank you for joining us.

Thanks for having me. It's great to be here.

Yes. Right from the beginning, I saw that you worked at Google in 2002, which is forever ago, like 20 years ago at this point. What was that experience like? What kind of change did you see as you worked there for a few years?

[0:37] As you can imagine, it was absolutely fascinating. And I should mention that while I was at Google from 2002, that was not my first job. I have been a software engineer for over 25 years. And so there were five years before that where I worked at a couple of companies. I was living in Israel at the time, so my first job out of college was at Check Point, which was a big successful network security company. And then I worked for a small startup.

And then I moved to California and started working at Google. And so I had the experience that I think many people had in those days, and many people still do, of the work you're doing is fascinating, but the tools you're given to do it with as a software engineer are not great. I'd had five years of experience of sort of struggling with builds being slow, builds being flaky, with everything requiring a lot of effort. There was almost a hazing-ritual quality to it. Like, this is what makes you a great software engineer: struggling through the mud and through the quicksand with this awful substandard tooling. And we are not users, we are not people for whom products are meant, right? We make products for other people.

Then I got to Google.

[2:03] And Google, when I joined, was actually struggling with a very massive, very slow makefile that took forever to parse, let alone run. But the difference that I had not seen anywhere else was that Google paid a lot of attention to this problem and devoted a lot of resources to solving it. And Google was the first place I'd worked, and still I think in many ways the gold standard, where developers are first-class participants in the business and deserve the best products and the best tools, and if there's nothing out there for them to use, we will build it in house and we will put a lot of energy into that.

[2:53] And so for me, specifically as an engineer, a big part of watching that growth from the early to late 2000s was the growth of engineering process and best practices and the tools to enforce it. The thing I personally am passionate about is building CI, but I'm also talking about code review tools and all the tooling around source code management and revision control and just everything to do with engineering process.

It really was an object lesson, and so very, very fascinating, and really inspired a big chunk of the rest of my career.

I've heard all sorts of things like Python scripts that had to generate makefiles and finally they move the Python to -
Software at Scale 51 - Usage based Pricing with Puneet Gupta
Puneet Gupta is the co-founder and CEO of Amberflo, a cloud metering and usage based pricing platform.
Apple Podcasts | Spotify | Google Podcasts
In this episode, we discuss Puneet’s fascinating background early at AWS as a GM and his early experience at Oracle Cloud. We initially discuss why AWS shipped S3 as its first product before any other services. Afterwards, we go over the cultural differences between AWS and Oracle, and how usage based pricing and sales tied into the organization’s culture and efficiency.
Our episode covers all the different ways organizations align themselves better when pricing is directly tied to the usage metrics of customers. We discuss how SaaS subscription models are simply a reworking of traditional software licenses, how vendors can dispel fears around overages due to dynamic pricing models, and even why Netflix should be a usage-based-priced service :-)
We don’t have show notes, but I thought it would be interesting to link the initial PR newsletter for S3’s launch, to reflect on how our industry has completely changed over the last few years.
Software at Scale 50 - Redefining Labor with Akshay Buddiga
Akshay Buddiga is the co-founder and CTO of Traba, a labor management platform.
Apple Podcasts | Spotify | Google Podcasts
Sorry for the long hiatus in episodes!
Today’s episode covers a myriad of interesting topics - from being the star of one of the internet’s first viral videos, to experiencing the hyper-growth at the somewhat controversial Zenefits, scaling out the technology platform at Fanatics, starting a company, picking an accelerator, only permitting in-person work, facilitating career growth of gig workers, and more!
Highlights
[0:00] - The infamous Spelling Bee incident.
[06:30] - Why pivot to Computer Science after an undergraduate focus in biomedical engineering?
[09:30] - Going to Stanford for Management Science and getting an education in Computer Science.
[13:00] - Zenefits during hyper-growth. Learning from Parker Conrad.
[18:30] - Building an e-commerce platform with reasonably high scale (powering all NFL gear) as a first software engineering gig. Dealing with lots of constraints from the beginning - like multi-currency support - and delivering a complete solution over several years.
The interesting seasonality - like Game 7 of the NBA Finals - and the implications for the software engineers maintaining e-commerce systems. Watching all the Super Bowls with coworkers.
[26:00] - A large outage, obviously due to DNS routing.
[31:00] - Why start a company?
[37:30] - Why join OnDeck?
[41:00] - Contrary to the current trend, Traba only allows in-person work. Why is that?
We go on to talk about the implications of remote work and other decisions in an early startup’s product velocity.
[57:00] - On being competitive.
[58:30] - Velocity is really about not working on the incorrect stuff.
[68:00] - What’s next for Traba? What’s the vision?
[72:30] - Building two-sided marketplaces, and the career path for gig workers.
Customer Reviews
Excellent show!
Software at Scale has quickly become one of my favorite podcasts! I’m consistently impressed by the depth of insights and knowledge in each episode. No matter the topic, you’re guaranteed to learn something every time you listen. Highly recommend!