TCP Talks

Justin Brodley & Jonathan Baker

Join Justin Brodley and Jonathan Baker on TCP Talks, our show where we interview industry leaders, vendors, and technologists about cloud computing, robotics, FinOps, and more.

  1. TCP Talks: The David vs. Goliath of Cloud Storage: Chris Opat from Backblaze on Challenging Hyperscalers

    JUL 22

    TCP Talks: The David vs. Goliath of Cloud Storage: Chris Opat from Backblaze on Challenging Hyperscalers

    For this special edition of TCP Talks, Justin Brodley and Matthew Kohn are joined by Chris Opat, SVP of Cloud Operations at Backblaze, to discuss how the cloud storage innovator is reshaping the industry landscape. From their origins as a consumer backup company to becoming a major player in enterprise cloud storage, Chris shares insights on AI workloads, the true cost of egress fees, and why your data doesn’t have to live in a walled garden.

    About Backblaze

    Backblaze started in 2007 with a simple mission: make storage so affordable it’s almost free. The company gained early notoriety for their DIY approach to storage infrastructure, with founders literally bending metal in apartments and conducting “guerrilla storage purchasing” raids at Bay Area Best Buys and Fry’s Electronics to build their custom red storage pods. This scrappy, cost-conscious DNA remains central to the company’s identity today. In September 2015, Backblaze made their enterprise pivot with the launch of B2 Cloud Storage, entering the market at one-quarter the cost of Amazon S3. By December of that launch year, they had already attracted over 30,000 users. Today, Backblaze (NASDAQ: BLZE) manages approximately 4.7 exabytes of data across 310,000+ drives, serving over 500,000 customers in 175 countries.

    What sets Backblaze apart isn’t just their pricing—it’s their philosophy. While hyperscalers have built complex storage tiers with Byzantine billing structures, Backblaze offers one tier of hot storage with transparent, predictable pricing. Their recent push into AI workloads with B2 Overdrive demonstrates their ability to evolve with market demands while maintaining their core value proposition.

    About Chris Opat

    Chris Opat joined Backblaze as SVP of Cloud Operations in 2023, bringing over 25 years of experience in building teams and technology at startup and scale-up companies.
    Before Backblaze, he served as SVP of Platform Engineering and Operations at StackPath, specializing in edge technology and content delivery. His background includes extensive work with private equity portfolio companies, where he honed his skills in rapid transformation and growth. Chris describes himself as someone who thrives in “David vs. Goliath” scenarios, making Backblaze—with its mission to challenge the hyperscaler incumbents—a perfect fit. His passion for building exceptional technical teams and pushing technological boundaries aligns perfectly with Backblaze’s innovative culture.

    Interview Highlights

    The David vs. Goliath Mentality

    3:15 Chris: “Nothing makes me happier than to watch a customer choose us over the incumbent competitors and have an exceptionally good experience. It’s easy to work for the incumbents and kind of win all the time. It feels so much better when you do it as the upstart that people don’t see coming.”

    Chris emphasized how Backblaze offers a fundamentally different partner experience compared to hyperscalers. While AWS, Azure, and Google Cloud may provide excellent services, they often lack the personal touch and flexibility that smaller customers need. At Backblaze, customers can directly influence product strategy and speak with decision-makers who shape the company’s direction.

    Egress Fees: The Hidden Tax of Cloud Storage

    7:59 Chris: “Everybody who uses a hyperscaler is very familiar with the taxation of egress fees. It’s not a trivial subject… If you don’t know what you’re doing with a hyperscaler, egress fees can quickly sour your experience. They can drain your budget.”

    The discussion on egress fees revealed one of Backblaze’s key differentiators: their no-egress-fee policy through their Bandwidth Alliance partnerships. Chris shared a compelling example of a customer who saved hundreds of thousands of dollars on egress fees in their first year with Backblaze.
    This transparent pricing model contrasts sharply with hyperscalers, where egress costs can spiral out of control. When asked about recent announcements from Google and Amazon regarding “free” egress, Chris didn’t mince words:

    10:07 Chris: “The devil’s in the details… The only way that they honor the free egress for repatriating your data is if you cancel all the services, and the cancellation timeframe… it’s something pretty brisk. It’s like 90 days or something.”

    AI Workloads: The New Frontier

    The conversation revealed how dramatically Backblaze’s customer base has evolved, particularly with AI workloads:

    13:31 Chris: “We’ve got private connections to some customers where we’re serializing 400 gigabits per second line rate to them, so that they can very, very rapidly move libraries of data to tightly schedule back to back with perhaps maybe a GPU farm instance that they’ve got booked.”

    This represents a massive shift from their traditional backup use cases. The new B2 Overdrive product specifically addresses these high-bandwidth needs, offering performance levels that Chris claims most competitors “will not entertain giving you… They just flat out won’t.”

    The Scale Challenge

    Managing 4.7 exabytes across 310,000+ drives requires sophisticated capacity planning:

    12:00 Chris: “It’s a full-time science to be fair. We’ve got people who are fully dedicated to this, and that’s what they do all day… We do nightly linear regressions on the installed environment.”

    Chris described the “triangle of tension” between reads, writes, and deletes on drives, explaining how they must carefully balance IOPS to ensure no customer experiences performance degradation. The shift from predictable consumer backup patterns to volatile AI workload demands has made this exponentially more complex.
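    The nightly linear-regression capacity forecasting Chris mentions can be sketched in a few lines: fit a least-squares trend to daily used-capacity samples and project when the pool fills. This is a minimal illustration with made-up numbers, not Backblaze's actual model, which also weighs the read/write/delete mix.

```python
def fit_linear(days, used_tb):
    """Ordinary least-squares fit: used_tb ≈ slope * day + intercept."""
    n = len(days)
    mx = sum(days) / n
    my = sum(used_tb) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(days, used_tb)) / \
            sum((x - mx) ** 2 for x in days)
    return slope, my - slope * mx

def days_until_full(days, used_tb, capacity_tb):
    """Project when the fitted growth line crosses total capacity."""
    slope, intercept = fit_linear(days, used_tb)
    if slope <= 0:
        return None  # usage flat or shrinking; no exhaustion forecast
    return (capacity_tb - intercept) / slope

# Toy sample: ~2 TB/day of growth against a 1,000 TB pool.
days = [0, 1, 2, 3, 4]
used = [100, 102, 104, 106, 108]
print(days_until_full(days, used, 1000))  # 450.0 days
```

    A real planner would run this per cluster every night and re-order hardware when the forecast horizon drops below the procurement lead time.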
    Data Sovereignty and Compliance

    With increasing global regulations around data privacy, Backblaze has invested heavily in ensuring compliance:

    28:31 Chris: “When we designed our cluster in Toronto, data privacy and data sovereignty and security is taken extremely seriously there… We very, very carefully curated our network in market to ensure that Canadian customers would have their traffic ingress and egress through Canadian providers end to end.”

    This attention to data sovereignty extends beyond just storage location—it includes network routing, ensuring data never leaves jurisdictional boundaries unless explicitly requested by the customer.

    Green Initiatives and Sustainability

    The discussion touched on the growing importance of environmental considerations in data center operations:

    31:55 Chris: “When we do a site selection process, we want to make sure that… their PUE and their operating profile fits what matches our personality… Being able to pass along a high efficiency PUE to a customer, it’s great for business.”

    Chris highlighted their Stockton, California facility as an example of their commitment to efficiency, noting it has “one of the most incredible PUEs ever.”

    Key Takeaways

    1. Simple Pricing Wins: Backblaze’s flat-rate, single-tier storage model eliminates the complexity of hyperscaler billing. No glacial retrieval fees, no complex lifecycle policies—just predictable costs.

    2. Egress Freedom Matters: The Bandwidth Alliance and no-egress-fee policy aren’t just marketing—they represent fundamental architectural decisions that save customers real money, especially for AI/ML workloads that require frequent data access.

    3. Performance at Scale: B2 Overdrive’s ability to deliver 400 Gbps demonstrates that alternative cloud providers can match or exceed hyperscaler performance for specific use cases.

    4.
    Location Strategy is Key: Strategic placement near GPU compute farms and careful attention to data sovereignty requirements shows Backblaze understands modern workload requirements.

    5. The Human Touch: Unlike hyperscalers, where you need significant spend to get personal attention, Backblaze offers direct access to decision-makers and the ability to influence product direction.

    Technical Deep Dives

    Storage Architecture Evolution

    Chris revealed that Backblaze has moved beyond their famous DIY storage pods to work with OEM partners, enabling them to scale more efficiently while maintaining cost advantages. They’re also exploring flash storage integration to better serve high-IOPS workloads, particularly for AI inference use cases.

    Network Infrastructure

    The emphasis on peering relationships and private network interconnects (PNIs) demonstrates sophisticated network planning. Customers can specify routing preferences, including requirements for traffic to never transit the public internet.

    Capacity Planning for AI

    The shift from predictable backup workloads to volatile AI demands has required new approaches:

    - Nightly linear regression analysis
    - Real-time monitoring of the read/write/delete “triangle of tension”
    - Flexible OEM partnerships for rapid capacity expansion
    - Strategic geographic expansion to reduce latency to compute resources

    Industry Implications

    Chris’s insights suggest several important trends:

    - The Egress Fee Backlash is Real: Hyperscaler “free egress” offers come with significant strings attached, validating alternative providers’ criticism of these practices.
    - AI Changes Everything: Traditional storage patterns don’t apply to AI workloads. Providers must offer both massive bandwidth and flexible data lifecycle management.
    - Hybrid Cloud is the Future: Customers want to use best-of-breed solutions, not be locked into a single provider’s ecosystem.
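    The egress-fee economics discussed in this episode are easy to make concrete with a back-of-the-envelope estimate. The per-GB rate and free tier below are hypothetical placeholders, not any provider's actual pricing.

```python
def egress_cost(gb_transferred, rate_per_gb, free_tier_gb=0.0):
    """Cost of moving data out of a cloud: billable GB times the per-GB rate."""
    billable = max(0.0, gb_transferred - free_tier_gb)
    return billable * rate_per_gb

# Hypothetical scenario: repatriating 500 TB in a year.
gb = 500 * 1024
with_fees = egress_cost(gb, rate_per_gb=0.09, free_tier_gb=100)  # illustrative $0.09/GB
without_fees = egress_cost(gb, rate_per_gb=0.0)                  # zero-egress policy
print(f"metered egress: ${with_fees:,.0f}  vs  no-egress policy: ${without_fees:,.0f}")
```

    Even at modest rates, multi-hundred-terabyte transfers land in the tens of thousands of dollars, which is why egress policy matters as much as the storage price itself.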
    - Compliance Complexity is Growing: Data sovereignty isn’t just about location—it’s about network paths, audit trails, and provable data destruction.

    Quotes from Today’s Show

    On Customer Experience: “You probably can talk to people at Backblaze that are very influential, that are shaping the strategy of what we do. There’s every opportunity to help us guide part of our product strategy.”

    On Pricing Transparency: “Sometimes we’re not
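    As a reference for the PUE figures Chris cites during site selection: Power Usage Effectiveness is simply total facility power divided by the power delivered to IT equipment, with 1.0 as the theoretical ideal. A tiny illustration (the wattages are made up):

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power.
    1.0 means every watt goes to IT gear; higher values mean more overhead
    (cooling, power conversion, lighting)."""
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw

# A facility drawing 1,200 kW overall to run 1,000 kW of IT load:
print(pue(1200, 1000))  # 1.2
```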

    37min
  2. TCP Talks: The evolution of Finops & Why you should attend Finops-X

    26/05/2024 · BONUS

    TCP Talks: The evolution of Finops & Why you should attend Finops-X

    Summary – FinOps X

    In this conversation, Joe Daly and Rob Martin from the FinOps Foundation discuss the latest developments in the FinOps space and FinOps X. They talk about the evolution of FinOps practices, the growth of the FinOps community, and the importance of the FOCUS project, which aims to standardize billing data from different cloud providers. They also discuss the adoption of FinOps practices by SaaS companies and the future of the FinOps space. The conversation covers the updates and changes in the FinOps framework, including the addition of allied personas and the simplification of domains and capabilities. It also discusses the upcoming FinOps X conference and the value it provides for attendees, including deep and concrete content, networking opportunities, and career advancement.

    Keywords

    FinOps, FinOps Foundation, FinOps X conference, podcast, cloud providers, FOCUS project, billing data, cloud-agnostic, tool-agnostic, open source project, SaaS companies, FinOps framework, allied personas, domains and capabilities, deep content, networking, career advancement, FinOps X Europe

    Takeaways

    - FinOps practices have evolved to focus on making processes more operational and improving decision-making in businesses.
    - The FinOps Foundation has seen significant growth, with over 100 members, including major cloud providers.
    - The FOCUS project, an open billing standard, aims to consolidate billing data from different cloud providers and enable more effective cost allocation.
    - The adoption of FinOps practices by SaaS companies is increasing, with a focus on consumption-based licensing management.
    - The future of the FinOps space includes expanding the FOCUS project to include sustainability data and additional usage-based data.
    - The FinOps framework has been updated to include allied personas and simplified domains and capabilities.
    - The FinOps X conference provides valuable content, networking opportunities, and career advancement for attendees.
    - The FinOps X Europe conference in Barcelona offers a focused event for the European market.
    - The conversation also covers the importance of small businesses attending the conference and the success stories of attendees.

    Sound Bites

    “How do I make these processes much more operational? How do I affect the broader decision-making going on in my business?”
    “The Focus project… will consolidate or specify how billing data should come from the different cloud providers.”
    “The Focus project… essentially handles the data ingestion problem that has plagued a lot of organizations early on.”
    “The two big changes that happened this year were the addition of a lot of allied personas.”
    “We’ve simplified those down into four key domains.”
    “What other things are you guys excited about for Finops-X?”

    About Joe Daly & Rob Martin

    Joe Daly is Director of Community for the FinOps Foundation, which is kind of like sitting at the largest lunch table in middle school, but with less vaping. He’s had illustrious careers as a CPA (the statute of limitations has passed for all tax returns he prepared, and he has let his CPA expire), in corporate taxation, IT finance and accounting, and IT portfolio management, plus a regrettable stint as Manager of Server Operations, and has started two teams that perform what has come to be known as FinOps. He lives in Columbus, OH and enjoys copying off Rob. Go Captains!

    Rob Martin is a FinOps Principal at the FinOps Foundation, which is kind of like being a middle school principal, but with less vaping. He’s had illustrious careers at Accenture, the US Department of Justice, Amazon Web Services, and Cloudability, and less lustrious jobs at a few other places. He now spends his time collecting, developing, and distributing FinOps content among the huge global community of people who deliver value from cloud. He lives in Leesburg, VA, and enjoys games (including the FinOps Boardgame!), hiking, and announcing for his son’s high school soccer team. Go Captains!
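    The "data ingestion problem" FOCUS addresses is essentially column mapping: each provider exports billing rows under its own field names, and a common schema lets one pipeline handle them all. The sketch below maps provider rows onto a few FOCUS-style columns; the source field names are simplified stand-ins, not the providers' full export schemas or the complete FOCUS column set.

```python
# Map provider-specific billing rows onto a few FOCUS-style columns.
FIELD_MAPS = {
    "aws":   {"BilledCost": "lineItem/UnblendedCost",
              "ServiceName": "product/ProductName",
              "ChargePeriodStart": "lineItem/UsageStartDate"},
    "azure": {"BilledCost": "costInBillingCurrency",
              "ServiceName": "meterCategory",
              "ChargePeriodStart": "date"},
}

def to_focus(provider, row):
    """Return a row keyed by common column names, whatever the source cloud."""
    mapping = FIELD_MAPS[provider]
    return {focus_col: row[src_col] for focus_col, src_col in mapping.items()}

aws_row = {"lineItem/UnblendedCost": 1.25,
           "product/ProductName": "Amazon S3",
           "lineItem/UsageStartDate": "2024-05-01"}
print(to_focus("aws", aws_row))
```

    Once every provider's rows share one schema, cost allocation and reporting code no longer needs a per-cloud branch.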
    Chapters

    00:00 Introduction and Overview
    02:32 The Evolution of FinOps Practices
    05:19 The Growth of the FinOps Community
    06:18 The Importance of the Focus Project
    09:29 Adoption of FinOps Practices by SaaS Companies
    12:35 The Future of the FinOps Space
    24:29 The Value of Finops-X Conference
    28:29 Finops-X Europe: A Focused Event for the European Market
    29:32 Success Stories and Career Advancement at Finops-X

    Learn More:

    - FinOps Foundation
    - FinOps Foundation on Twitter
    - FinOps X
    - Subscribe to The Cloud Pod

    37min
  3. TCP Talks with Rackspace CTO of Public Cloud - Travis Runty

    07/05/2024 · BONUS

    TCP Talks with Rackspace CTO of Public Cloud - Travis Runty

    For this special edition of TCP Talks, Justin and Jonathan are joined by Travis Runty, CTO of Public Cloud with Rackspace Technology. In today’s interview, they discuss being accidentally multi-cloud, public vs. private cloud, cloud migration, and best practices for assisting clients with their cloud journeys.

    Background

    Rackspace Technology, commonly known as Rackspace, is a leading multi-cloud solutions provider headquartered in San Antonio, Texas. Founded in 1998, Rackspace has established itself as a trusted partner for businesses seeking expertise in managing and optimizing their cloud environments. The company offers a wide range of services aimed at helping organizations navigate the complexities of cloud computing, including cloud migration, managed hosting, security, data analytics, and application modernization. Rackspace supports various cloud platforms, including AWS, Azure, and GCP, among others.

    Rackspace prides itself on its “Fanatical Experience” approach, which emphasizes delivering exceptional customer support and service. This commitment to customer satisfaction has contributed to Rackspace’s reputation as a reliable and customer-centric provider in the cloud computing industry.

    Meet Travis Runty, CTO of Public Cloud for Rackspace Technology

    Beginning his career with Rackspace as a Linux engineer, Travis has spent the last 15 years working his way through multiple divisions of the company, including 10 years in senior and director-level positions. Most recently, Travis served as VP of Technical Support for Global Cloud Operations from 2020-2022. Travis is extremely passionate about building and leading high-performance engineering teams and delivering innovative solutions.
    Most recently, as a member of their technology council, Travis wrote an article for Forbes – Building a Cloud-Savvy Workforce: Empowering Your Team for Success – in which he discussed best practices for prioritizing workforce enablement, especially when it comes to training and transformation initiatives.

    Interview Notes

    In the main show, TCP has been talking a lot about cloud / hybrid cloud / multi-cloud and repatriating data back to on-prem, and today’s guest knows all about those topics.

    Rackspace has had quite a few phases in its journey to public cloud – including building a data center in an unused mall, introducing managed services, creating partnerships with VMware, an attempt to go head to head with the hyperscalers, and ultimately focusing on public cloud and partnering with the hyperscalers instead.

    Rackspace focuses on both private and public cloud: on the private cloud side it concentrates mainly on VMware and OpenStack, whereas on the public cloud side Rackspace partners with the hyperscalers to assist clients with their cloud journeys.

    Quotes from today’s show

    Travis: “We want to make sure that when a customer goes on their public cloud journey, that they actually have a robust strategy that is going to be effective. From there, we’re able to leverage our professional services teams to make sure that they can realize that transformation, and hopefully there *is* a transformation, and it’s not just a lift and shift.”

    Travis: “A conflict that we continuously have to strike the balance of is when do we apply a cloud native solution, and where do we apply the Rackspace elements on top. The hyperscalers’ technology is the best there is, and we’re probably not going to create a better version of ‘x’ than AWS does – nor do we want to.”

    Travis: “We favor cloud native. Every single time we’re going to favor the platform’s native solution, unless the customer has a really really strong opinion about being vendor locked. Which sometimes they do.
And if that’s the case we can establish a solution that gives them that portability. But for right now, the customers are generally preferring cloud native solutions.”

    40min
  4. Sonrai Security with Sandy Bird

    11/04/2024

    Sonrai Security with Sandy Bird

    A bonus episode of The Cloud Pod may be just what the doctor ordered, and this week Justin and Jonathan are here to bring you an interview with Sandy Bird of Sonrai Security. There’s so much going on in the IAM space, and we’re really happy to have an expert in the studio with us this week to talk about some of the specifics of least-privilege security.

    Background

    Sonrai (pronounced Son-ree, which means data in Gaelic) was founded in 2017. Sonrai provides Cloud Data Control, and seeks to deliver a complete risk model of all identity and data relationships, including activity and movement across cloud accounts, providers, and third-party data stores.

    Meet Sandy Bird, Co-founder of Sonrai Security

    Sandy is the co-founder and CTO of Sonrai, and has had a long career in the tech industry. He was the CTO and co-founder of Q1 Labs, which was acquired by IBM in 2011, and helped drive IBM security growth as CTO for global business security there.

    Interview Notes

    One of the big questions we start the interview with is how IAM has evolved – and what kind of effect those changes have had on identity models. Enterprises want things to be least privilege, but it’s hard to find the logs. In cloud, however, *most* things are logged – and so least privilege became an option.

    Sonrai offers the first cloud permissions firewall, which enables one-click least-privilege management – important in the current environment, where the platforms operate so differently from each other. With this solution, you have better control of your cloud access, limit your permissions and attack surface, and automate least privilege – all without slowing down DevOps.

    Is the perfect policy achievable? Sandy breaks it down between human identities and workload identities; they’re definitely separate. He claims that for workload identities the perfect policy is probably possible.
    Human identity is hugely sporadic; however, it’s important to at least try to get to that perfect policy, especially when dealing with sensitive information. One of the more interesting data points they found was that less than 10% of identities with sensitive permissions actually used them – and you can use that information to weigh actually handing out permissions against a one-time use case.

    Sonrai spent a lot of time looking at new solutions to permissions problems; part of this includes purpose-built integration, offering a flexible open GraphQL API with prebuilt integrations. Sonrai also offers continuous monitoring, providing ongoing intelligence on all permission usage – including excess permissions – and enabling the removal of unused permissions without any disruption. Policy automation automatically writes IAM policies tailored to access needs, and simplifies processes for teams. On-demand access is another tool that handles on-demand requests for restricted permissions with a quick and efficient process.

    Quotes from today’s show

    Sandy: “The unbelievably powerful model in AWS can do amazing things, especially when you get into some of the advanced conditions – but man, for a human to understand what all this stuff is, is super hard. Then you go to the Azure model, which is very different. It’s an allow-first model. If you have an allow anywhere in the tree, you can do whatever is asked, but there’s this hierarchy to the whole thing, and so when you think you want to remove something you may not even be removing it, because something above may have that permission anyway. It’s a whole different model to learn there.”

    Sandy: “Only like 8% of those identities actually use the sensitive parts of them; the other 92 just sit in the cloud, never being used, and so most likely during that break-glass scenario in the middle of the night, somebody’s troubleshooting, they have to create some stuff, and overpermission it.
    If we control this centrally, the sprawl doesn’t happen.”

    Sandy: “There is this fear that if I remove this identity, I may not be able to put it back the way it was, if it was supposed to be important… We came up with a secondary concept for the things that you were worried about… where we basically short-circuit them, and say these things can’t log in and be used anymore; however, we don’t delete the key material, we don’t delete the permissions. We leave those all intact.”
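    The "less than 10% of sensitive permissions are actually used" finding comes from comparing granted permissions against observed activity. A minimal sketch of that audit, assuming you have already extracted granted and used permission sets per identity from your IAM policies and activity logs (the identities and permission names below are hypothetical):

```python
def unused_sensitive(granted, used, sensitive):
    """For each identity, the sensitive permissions granted but never observed in use."""
    report = {}
    for identity, perms in granted.items():
        idle = (perms & sensitive) - used.get(identity, set())
        if idle:
            report[identity] = idle
    return report

granted = {"ci-bot": {"s3:GetObject", "iam:PassRole"},
           "app-svc": {"s3:GetObject"}}
used = {"ci-bot": {"s3:GetObject"}, "app-svc": {"s3:GetObject"}}
print(unused_sensitive(granted, used, sensitive={"iam:PassRole"}))
# {'ci-bot': {'iam:PassRole'}}
```

    The report is the raw material for the "short-circuit" approach Sandy describes: disable the idle sensitive grants while keeping the key material and policies intact, so they can be restored if something breaks.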

    40min
  5. Security & Observability with DataDog's Andrew Krug

    12/04/2023

    Security & Observability with DataDog's Andrew Krug

    Andrew Krug from Datadog

    In this episode, Andrew Krug talks about Datadog as a security observability tool, shedding light on some of its applications as well as its benefits to engineers. Andrew is the lead in Datadog Security Advocacy and Datadog Security Labs. Also a cloud security consultant, he started the Threat Response Project, a toolkit for Amazon Web Services first responders. Andrew has also spoken at Black Hat USA, DEF CON, re:Invent, and other venues.

    Datadog Product Overview

    Datadog is focused on bringing security to engineering teams, not just security people. One of the biggest advantages of Datadog and similar vendors is how they ingest and normalize various log sources; it can be very challenging to maintain a reasonable data structure for logs ingested from cloud providers. Vendors try to provide customers with enough signals that they feel they are getting value, while trying not to flood them with unactionable alerts. Cloud friendliness of the stack is also crucial for clients evaluating a new product.

    Datadog is active in the open-source community and gives back to groups like the Cloud Native Computing Foundation. One of their popular open-source security tools is Stratus Red Team, which simulates attacker techniques in a clean-room environment.

    The criticality of findings is becoming a major topic; when evaluating, criticality should be based on how much risk applies to the business, and what can be done about it. One thing that teams struggle with as high-maturity DevOps shops is trying to automate incident handling or response to critical alerts, as this can cause configuration drift – which is why there is a lot of hesitation to fully automate things. Having someone make the hard choices is at the heart of incident-handling processes. Datadog Cloud SIEM was created to help customers who were already sending logs to Datadog.
    Datadog SIEM is also easy to use: the UI is simple enough that you don’t need to be a security expert. It is quite difficult to deploy a SIEM on completely unstructured logs, so being able to extract and normalize data to a set of security attributes is highly beneficial. Interestingly, the typically boring hygiene issues that are easy to detect still cause major problems for very large companies. This is where posture management comes in, to address issues in time and prevent large breaches. Generally, Datadog is inclined toward moving these detections closer to the data they are securing, and examining the application runtime in real time to verify that there are no issues. Datadog can also help solve IAM challenges through CSPM, which evaluates policies. For engineering teams, the benefit is seen in how information surfaces in the places where they normally look, especially with Datadog Security products, where issues are sorted in order of importance.

    Security Observability Day is coming up on the 18th of April, when Datadog products will be highlighted; the link to sign up is available on the Datadog Twitter page and the Datadog community Slack. To find out more, reach out to Andrew on Twitter @andrewkrug or on the Datadog Security Labs website.

    Top Quotes

    “I think that great security solutions…start with alerts that you are hundred percent confident as a customer that you would act on”
    “When we talk about the context of ‘how critical is an alert?’ It is always nice to put that risk lens on it for the business”
    “Humans are awesome unless you want really consistent results, and that’s where automating part of the process comes into play”
    “More standardization always lends itself to better detection”
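    Normalizing a raw log line into a flat set of security attributes, the step that makes SIEM detections possible, can be illustrated with a classic sshd failure line. The regex and the attribute names below are illustrative (loosely ECS-style), not Datadog's actual parsing pipeline or schema.

```python
import re

# Parse an sshd authentication-failure line into normalized security attributes.
SSH_FAIL = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<src_ip>\d+\.\d+\.\d+\.\d+) port (?P<src_port>\d+)"
)

def normalize(line):
    """Return a flat attribute dict for a matching line, or None otherwise."""
    m = SSH_FAIL.search(line)
    if not m:
        return None
    return {"event.category": "authentication", "event.outcome": "failure",
            "user.name": m["user"], "source.ip": m["src_ip"],
            "source.port": int(m["src_port"])}

line = ("Jan 12 03:14:07 web1 sshd[911]: Failed password for invalid user "
        "admin from 203.0.113.9 port 52144 ssh2")
print(normalize(line))
```

    Once every source emits the same attribute names, a single detection rule (say, many `event.outcome: failure` events from one `source.ip`) works across all of them.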

    28min
  6. Evolution of NoSQL with Couchbase CTO, Ravi Mayuram

    24/03/2023

    Evolution of NoSQL with Couchbase CTO, Ravi Mayuram

    In this episode, Ravi Mayuram highlights the functionality of Couchbase as an evolutionary database platform, citing several simple day-to-day use cases and particular advantages of Couchbase. Ravi Mayuram is CTO of Couchbase. He is an accomplished engineering executive with a passion for creating and delivering game-changing products for startups as well as Fortune 500 industry-leading companies.

    Notes

    Couchbase set out to build a next-generation database. Data has evolved greatly with IT advancements, and the goal was to build a database that connects people to newer technologies, addressing problems that relational systems did not have to solve. The fundamental shift is that earlier systems were internally focused, built for trained users, but now systems are built directly for consumers. This shift also plays out in the vast difference in the number of consumers now interacting with these systems, compared to the far fewer trained users previously.

    One of the key factors that sets Couchbase apart is its NoSQL database design: a database that has evolved by combining five systems – a cache and key-value store, a document store, a relational document store, a search system, and an analytical system. Secondly, Couchbase performs well in a geo-distributed manner, such that with one click, data is made available across availability zones. Lastly, all of this can be done at large scale in seconds.

    Regarding the global database concept that Google talks about, a globally consistent database may not be needed by most companies; performance would be the biggest problem, as transaction speed would be considerably lower. Couchbase does these transactions locally within the data center and replicates them on the other side. The main issue with relational systems is that they make you pay the price of every transaction no matter how minor, but with Couchbase it is possible to pay that cost only for certain crucial transactions.
    Edge has become part of the enterprise architecture, to the point that people now have edge-based solutions. Two edges are emerging: the network edge and the tool edge, where people are interfacing. Couchbase has built a mobile database that is available on devices, with sync capability. For a consumer, the primary advantage of bringing data closer is latency: often, data has to go through firewalls and multiple hops, which delays it. With Couchbase, the user simply continues to have access to the data while Couchbase synchronizes it in the background.

    One application of Couchbase in healthcare is insulin tracking. With many devices that monitor insulin and must work everywhere you go, Couchbase Lite does the insulin tracking, keeps the data even in the absence of a network, and later syncs it for review by healthcare professionals. This is also useful in operating rooms where the network is not accessible. The real benefit is seen when the data eventually gets back to the server and can be interpreted to make decisions on patient care.

    The Couchbase Capella service runs in the cloud and allows clients to specify what data should be sent to the edge and what should not be. This offers privacy and security measures, such that even if a device is lost or damaged, the data is secure and can be recovered. To effectively manage edge devices, a lot of problems must be addressed to make it easier. One concern for anyone coming to Couchbase Capella is the expense of data extraction from the cloud; however, Couchbase is available on all three major cloud providers. Also, with Couchbase there is no need to keep replicating data, as you can work on the data without moving it, which largely saves costs.

    Other use cases for Couchbase include flight booking information, flight crew management systems, hotel reservations, and credit card payments. To learn more, visit the Couchbase website.
    There is also a free trial for the Couchbase Capella service.

    Top Quotes

    “The modern database has to do more than what the old database did”
    “Managing edge in devices is not an easy thing, and so you have to solve a lot of problems so it becomes easier”
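    The offline-first pattern described in the insulin-tracking example can be sketched in a few lines: writes land in a local store immediately and queue for replication, then sync pushes the queue when connectivity returns. This is a conceptual illustration of the idea behind a mobile/edge database with sync, not Couchbase Lite's actual API.

```python
class EdgeStore:
    """Toy offline-first store: writes are local and queue for later replication."""
    def __init__(self):
        self.local = {}      # documents, always readable on the device
        self.pending = []    # doc ids awaiting replication to the server

    def put(self, doc_id, doc):
        self.local[doc_id] = doc   # never blocks on the network
        self.pending.append(doc_id)

    def sync(self, server):
        """Replicate queued documents into the server-side store."""
        while self.pending:
            doc_id = self.pending.pop(0)
            server[doc_id] = self.local[doc_id]

server = {}                                # stand-in for the cloud database
store = EdgeStore()
store.put("reading-1", {"glucose": 5.4})   # captured with no connectivity
store.sync(server)                         # network returns; replicate
print(server)  # {'reading-1': {'glucose': 5.4}}
```

    The key property is that reads and writes never wait on the network; replication is an independent background step.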

    38min
  7. Revolutionizing Observability with New Relic featuring Daniel Kim

    02/03/2023

    Revolutionizing Observability with New Relic featuring Daniel Kim

    Revolutionizing Observability with New Relic

    In this episode, Daniel explains a new strategy for observability aimed at contextualizing large volumes of data, to make it easier for users to identify the root cause of problems with their systems. Daniel Kim is a Principal Developer Relations Engineer at New Relic and the founder of Bit Project, a 501(c)(3) nonprofit dedicated to making tech accessible to under-served communities. His job is basically to get developers excited about observability, and he hopes to inspire students to maximize their potential in tech through inclusive, accessible developer education. He is passionate about diversity and inclusion in tech, good food, and dad jokes.

    Show Notes

    First, it is important to differentiate between monitoring and observability. Monitoring is basically when code is instrumented to send data to a backend, to answer preconceived questions. With observability, the goal is to monitor your system so you can later ask questions that were not in mind when the system was instrumented. Hence, if something new comes up, you can find the root cause without modifying the code. There are so many levels of things to check when troubleshooting to find the cause of a problem, and this is where observability comes in.

    There are different use cases for logs, metrics, and traces. Logs are files that record events, warnings, or errors; however, logs are ephemeral, which means there is an increased risk of losing a lot of data, so a system needs to be in place to move logs to a central source. Another issue with logs is that they are often poorly structured data. Logs are good to have as the last step of observability; metrics and traces can help narrow down where to search in the logs to solve an issue. Metrics are measurements that reflect the performance or health of your applications.
They give an overview of how your systems are doing but tend not to be specific enough to find the root cause of a problem; other forms of data are needed to get a clear picture. This is where traces come in. Traces are pieces of data that track a request as it moves through the system, so they can identify the root cause of an error or the bottlenecks slowing the system down. Traces are expensive, however, so sampling is used, which reduces their accuracy. Correlating information from logs, metrics, and traces gives a full, clear picture so that debugging can be carried out successfully.

Many New Relic customers strive to collect more data in order to surface errors faster. To balance the right data at the right time with the right cost, the first step when collecting large amounts of data is to find out how your organization actually leverages it. A quick audit of the data, monthly or quarterly, to identify what is useful helps. Unstructured logs are difficult to aggregate.

In the cloud-native space, compatibility will determine the winners, because people run many different projects in production; projects that are compatible with many others are the way forward. APM is still very useful for understanding application performance, and in the future, data from all sources will be correlated to figure out the cause of a problem. Getting value early from the system involves having a solid infrastructure and installing APM. The real power of full-stack observability is getting data from different parts of your stack so you can diagnose which part of your system is going wrong. Leveraging AI to make sense of large amounts of data for engineers is going to be a huge plus. Many vendors claim that their alert systems will automatically generate all alerts for you, but this is not true, because they cannot know your team's needs.
It is ultimately up to your team to set up the alerts that make up an observability strategy; those who invest time in setting this up get the most ROI from New Relic. Engineers need to figure out which metrics are important to them.

About New Relic One: This was built as a single observability platform where people can correlate various pieces of data to get more context, making engineers' work easier. The goal was to help engineers find the information they need as fast as possible, especially during a crisis. A third-party solution like this is much better suited to processing millions of logs or larger data sets than native tools, and it brings deep expertise in observability along with curated experiences around machine-generated data. The future seems to have customers tilting toward open-source observability solutions. OpenTelemetry is one example of this, as it brings together the open-source observability offerings into a whole-stack observability experience. Visit the New Relic website to learn more about it, and check out the New Relic Blogs to learn more about ways to use New Relic.

Top Quotes

"Having so much data and information about your system, you're able to quickly figure and rule out issues that you may be having that's causing the issue"

"A really good practice when we think about controlling cost is getting a really good idea of how you're actually using the data that you're collecting"

"Having structured logs is really helpful when we're talking about observability"

"Something that I've realized in the tenure that I've been working in observability is that when something sounds too good to be true, it probably is"
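Daniel's points about structured logs and correlating them with traces can be sketched in a few lines. This is a generic Python illustration, not New Relic's API; the `trace_id` field and the service name are assumptions for the example, standing in for whatever identifier your tracing system propagates with each request:

```python
import json
import uuid

def make_logger(service, sink):
    """Return a logger that emits JSON-structured records into `sink`."""
    def log(level, message, **fields):
        record = {"service": service, "level": level, "message": message, **fields}
        sink.append(json.dumps(record))
    return log

# Simulate two requests; each log record carries its trace_id so logs
# can later be correlated with the matching distributed trace.
sink = []
log = make_logger("checkout", sink)

trace_a = str(uuid.uuid4())
trace_b = str(uuid.uuid4())
log("info", "payment authorized", trace_id=trace_a, amount=42.0)
log("error", "inventory lookup failed", trace_id=trace_b, sku="ABC-123")
log("info", "order placed", trace_id=trace_a, order_id=7)

def records_for_trace(sink, trace_id):
    """Aggregation is trivial because every record is machine-parseable."""
    return [r for r in map(json.loads, sink) if r.get("trace_id") == trace_id]

print(len(records_for_trace(sink, trace_a)))  # 2 records for trace A
```

Compare this with grepping free-form text: the structured form lets a backend filter, group, and join log lines against trace and metric data without brittle parsing, which is the practical reason "having structured logs is really helpful" for observability.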

    26min
  8. A New Approach To Spatial Simulations With Rahul Thakkar

    15/02/2023

    A New Approach To Spatial Simulations With Rahul Thakkar

Spatial Simulations with AWS SimSpace Weaver

In this episode, Peter sits down with Rahul Thakkar to discuss the revolutionary AWS SimSpace Weaver, highlighting its unique function and its applications across several industries. Rahul Thakkar is the Director and General Manager of Simulation Technologies at Amazon Web Services. Before AWS, he held multiple executive roles at Boeing, Brivo, PIXIA, and DreamWorks Animation. He is an inventor and global technology executive with a background in cloud computing, distributed and high-performance computing, media and entertainment, film, television, defense and intelligence, aerospace, and access control. His film credits include Shrek, Antz, and The Legend of Bagger Vance. In 2002, he was part of the team that won an Academy Award for Shrek as Best Animated Feature, and at the 88th Academy Awards in 2016 he received a Technical Achievement Award.

Notes

AWS SimSpace Weaver enables customers to run extremely large-scale spatial simulations without having to manage any of the underlying infrastructure. It also removes the complexity of managing the state of entities as they move about the simulation. Previously, such simulations were run sequentially, in a cumbersome process that took years; now they can run in parallel in weeks. Different organizations have tried the service for several scenarios, and the results have been amazing. This value was made possible largely by working from customer feedback.

Rahul's interest in the cloud came later in his career, which started in the R&D department of the motion picture industry, where he created many of the complex graphics in movies. He later moved to a small start-up developing technologies for satellite imagery and mapping, and from there he moved into aerospace.
Generally, he observed that it is very expensive for companies to maintain their own infrastructure for simulations; it also drains resources and distracts from the company's main focus. Eventually, he knew he had to use AWS, and now he works there. The service is built by consuming the other primitive tools within AWS. There is also the ability to write to S3 so that customers can write the simulations out, which helps them preserve how a simulation played out.

Relating this new service to the metaverse, Rahul believes each organization has its own vision of what the metaverse should be; AWS built the tools to empower those organizations to build their own metaverses. Despite possible competition from Azure or GCP, AWS's focus remains on the customer and their needs, innovating on their behalf. Identifying new problems the service is well suited for is a great challenge, and AWS relies on customers to help envision where to take the service. There are certainly many companies running simulations, but it is hard to predict how many will migrate to AWS SimSpace Weaver because it is still a new product. Nonetheless, many industries are interested in the service: smart cities, organizations ranging from local to federal or international, logistics and supply chains, large-scale event planning, or any situation where there is a need to simulate a large problem with digital replicas of the real world.

Top Quotes

"The fact that we worked from the customer backwards is something that allowed us to deliver the kind of value that they're getting right now with AWS SimSpace Weaver"

"Each one of these organizations have their own vision of a metaverse"
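The notes above mention writing simulation state out to S3 so a run can be revisited later. A minimal Python sketch of that idea follows; the snapshot format, bucket, and key names are hypothetical illustrations, not SimSpace Weaver's actual schema, and the S3 upload itself is shown only as a comment since it requires `boto3` and AWS credentials:

```python
import json

def snapshot_bytes(tick, entities):
    """Serialize one simulation tick (entity ids and positions) to JSON bytes."""
    payload = {"tick": tick, "entities": entities}
    return json.dumps(payload, sort_keys=True).encode("utf-8")

# Example: two entities captured at a given tick of a spatial simulation.
entities = [
    {"id": "veh-1", "x": 10.5, "y": 3.2},
    {"id": "ped-7", "x": -4.0, "y": 8.9},
]
blob = snapshot_bytes(tick=120, entities=entities)

# Uploading the snapshot so the run can be replayed later might look like:
#   import boto3
#   boto3.client("s3").put_object(
#       Bucket="my-sim-snapshots", Key="run-42/tick-000120.json", Body=blob)

restored = json.loads(blob)
print(restored["tick"], len(restored["entities"]))  # 120 2
```

Persisting periodic snapshots like this is what lets a team reconstruct "how the simulation played out" after the compute resources behind the run are gone.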

    32min

About

Join Justin Brodley and Jonathan Baker on TCP Talks, our show where we interview industry leaders, vendors, and technologists about Cloud Computing, Robotics, FinOps, and more.