Join Justin Brodley and Jonathan Baker on TCP Talks, our show where we interview industry leaders, vendors, and technologists about cloud computing, robotics, FinOps, and more.
Security & Observability with Datadog's Andrew Krug
Andrew Krug from Datadog
In this episode, Andrew Krug talks about Datadog as a security observability tool, shedding light on some of its applications as well as its benefits to engineers.
Andrew leads Datadog Security Advocacy and Datadog Security Labs. Also a cloud security consultant, he started the Threat Response Project, a toolkit for Amazon Web Services first responders. Andrew has also spoken at Black Hat USA, DEF CON, re:Invent, and other events.
Datadog Product Overview
Datadog focuses on bringing security to engineering teams, not just security specialists. One of the biggest advantages of Datadog and similar vendors is how they ingest and normalize varied log sources, since maintaining a reasonable data structure for logs ingested from cloud providers can be very challenging.
Vendors try to give customers enough signal that they feel they are getting value, without flooding them with unactionable alerts. How well a product fits a cloud-friendly stack is also crucial for clients evaluating it.
Datadog is active in the open-source community and gives back to groups like the Cloud Native Computing Foundation. One of its popular open-source security tools is Stratus Red Team, which simulates attacker techniques in a clean-room environment. The criticality of findings is becoming a major topic: when evaluating a finding, its criticality should reflect how much risk it poses to the business and what can be done about it.
One thing that even high-maturity DevOps teams struggle with is automating incident handling or the response to critical alerts, because automated changes can cause configuration drift. This is why there is a lot of hesitation to fully automate things. Having someone to make the hard choices is at the heart of incident handling processes.
Datadog Cloud SIEM was created to help customers who were already using Datadog for logs. Its UI is simple enough to use without being a security expert. Deploying a SIEM on completely unstructured logs is quite difficult, so being able to extract and normalize data into a set of security attributes is highly beneficial. Interestingly, the typical boring hygiene issues that are easy to detect still cause major problems for very large companies. This is where posture management comes in, addressing issues promptly and preventing large breaches.
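To make the normalization idea concrete, here is a minimal Python sketch. It is not Datadog's implementation. The CloudTrail input fields (eventTime, eventName, sourceIPAddress, userIdentity, errorCode) follow AWS's record format, while the second log shape and the output attribute names (actor, action, outcome, and so on) are assumptions invented for illustration.

```python
from collections import Counter

# Hypothetical common schema: timestamp, actor, action, source_ip, outcome.
# These names are illustrative and are not Datadog's security attributes.

def normalize_cloudtrail(event):
    """Map an AWS CloudTrail-style record onto the common schema."""
    identity = event.get("userIdentity", {})
    return {
        "timestamp": event.get("eventTime"),
        "actor": identity.get("arn") or identity.get("userName"),
        "action": event.get("eventName"),
        "source_ip": event.get("sourceIPAddress"),
        # CloudTrail records carry an errorCode only when the call failed.
        "outcome": "failure" if event.get("errorCode") else "success",
    }

def normalize_app_log(event):
    """Map a made-up application log shape onto the same schema."""
    return {
        "timestamp": event.get("ts"),
        "actor": event.get("user"),
        "action": event.get("op"),
        "source_ip": event.get("ip"),
        "outcome": "success" if event.get("ok") else "failure",
    }

def failed_actions_by_actor(events):
    """One detection written once, usable across every normalized source."""
    return Counter(e["actor"] for e in events if e["outcome"] == "failure")
```

Once both sources share the same attributes, a single rule such as failed_actions_by_actor can run over all of them, which is the practical payoff of normalizing before detecting.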
Generally, Datadog is inclined towards moving detections closer to the data they secure and examining the application runtime in real time to verify that there are no issues. Datadog can also help with IAM challenges through CSPM, which evaluates policies. For engineering teams, the benefit is that information surfaces in the places where they normally look, especially in Datadog Security products, where issues are sorted in order of importance.
Security Observability Day is coming up on the 18th of April when Datadog products will be highlighted; the link to sign up is available on the Datadog Twitter page and Datadog community Slack. To find out more, reach out to Andrew on Twitter @andrewkrug and on the Datadog Security Labs website.
💡 "I think that great security solutions…start with alerts that you are hundred percent confident as a customer that you would act on"
💡 "When we talk about the context of 'how critical is an alert?’ It is always nice to put that risk lens on it for the business"
💡 "Humans are awesome unless you want really consistent results, and that's where automating part of the process comes into play"
💡 "More standardization always lends itself to better detection"
Evolution of NoSQL with Couchbase CTO, Ravi Mayuram
In this episode, Ravi Mayuram highlights the functionality of Couchbase as an evolutionary database platform, citing several simple day-to-day use cases and particular advantages of Couchbase.
Ravi Mayuram is CTO of Couchbase. He is an accomplished engineering executive with a passion for creating and delivering game-changing products for startups as well as Fortune-500 industry-leading companies.
Couchbase set out to build a next-generation database. Data has evolved greatly with advances in IT, and the goal was to build a database that connects people to newer technologies, addressing problems that relational systems did not have to solve. The fundamental shift is that earlier systems were internally focused and built for trained users, whereas today's systems are built directly for consumers. As a result, far more consumers now interact with these systems than the relatively few trained users who did before.
One of the key factors that sets Couchbase apart is its NoSQL database, which has evolved by combining five systems: a cache and key-value store, a document store, a relational document store, a search system, and an analytical system. Secondly, Couchbase performs well in geo-distributed deployments: with one click, data is made available across availability zones. Lastly, all of this can be done at large scale in seconds.
Regarding the global database concept that Google talks about, most companies may not need a globally consistent database. Performance would be the biggest problem, as transactions would be considerably slower. Couchbase performs transactions locally within the data center and replicates them to the other side. The main issue with relational systems is that they make you pay the price on every transaction, no matter how minor; with Couchbase, it is possible to pay that cost only for certain crucial transactions.
Edge has become part of the enterprise architecture, to the point that people now have edge-based solutions. Two edges are emerging: the network edge and the tool edge, where people interface with the system. Couchbase has built a mobile database that runs on devices, with sync capability.
For the consumer, the primary advantage of bringing data closer is lower latency. Data often has to pass through firewalls and multiple steps, which delays it; keeping the data close to the user avoids this. The user simply continues to have access to the data while Couchbase synchronizes it in the background.
One application of Couchbase in healthcare is insulin tracking. Devices that monitor insulin must work everywhere you go, so Couchbase Lite does the insulin tracking, keeps the data even in the absence of a network, and later syncs it for review by healthcare professionals. This is also useful in operating rooms where the network is not accessible. The real benefit is seen when the data eventually gets back to the server and can be interpreted to make decisions on patient care.
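The offline-first pattern described here can be sketched generically in Python. This is an illustration of the idea only and does not use the Couchbase Lite API; the class names, the reading fields, and the FakeServer stand-in are all hypothetical.

```python
class FakeServer:
    """Stand-in for a remote endpoint; `online` simulates network availability."""

    def __init__(self):
        self.online = False
        self.docs = {}

    def push(self, doc_id, doc):
        if not self.online:
            raise ConnectionError("no network")
        self.docs[doc_id] = doc


class OfflineFirstStore:
    """Local store that accepts writes immediately and syncs them later."""

    def __init__(self):
        self.local = {}    # always readable on the device
        self.pending = []  # document IDs not yet pushed to the server

    def write(self, doc_id, doc):
        self.local[doc_id] = doc
        if doc_id not in self.pending:
            self.pending.append(doc_id)

    def read(self, doc_id):
        return self.local.get(doc_id)

    def sync(self, push):
        """Try to push pending documents; keep failures for the next attempt.

        Returns the number of documents still waiting to be synced.
        """
        still_pending = []
        for doc_id in self.pending:
            try:
                push(doc_id, self.local[doc_id])
            except ConnectionError:
                still_pending.append(doc_id)
        self.pending = still_pending
        return len(still_pending)
```

A device would call write for every reading regardless of connectivity and call sync(server.push) whenever a network might be available; reads never wait on the network.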
The Couchbase Capella Service runs in the cloud and allows clients to specify what data should be sent to the edge and what should not be. This offers privacy and security, such that even if a device is lost or damaged, the data is secure and can be recovered. Many problems must be addressed to make managing edge devices easier.
One concern for anyone coming to Couchbase Capella is the expense of extracting data from the cloud; however, Couchbase is available on all three cloud providers. Also, with Couchbase there is no need to keep replicating data, as you can work on the data without moving it, which largely saves costs.
Other use cases for Couchbase include information for flight bookings, flight crew management systems, hotel reservations, and credit card payments. To learn more, visit the Couchbase website.
Revolutionizing Observability with New Relic featuring Daniel Kim
Revolutionizing Observability with New Relic
In this episode, Daniel explains a new strategy towards observability aimed at contextualizing large volumes of data to make it easier for users to identify the root cause of problems with their systems.
Daniel Kim is a Principal Developer Relations Engineer at New Relic and the founder of Bit Project, a 501(c)(3) nonprofit dedicated to making tech accessible to under-served communities. His job is basically to get developers excited about Observability, and he hopes to inspire students to maximize their potential in tech through inclusive, accessible developer education. He is passionate about diversity and inclusion in tech, good food, and dad jokes.
First, it is important to differentiate between monitoring and observability. Monitoring is when code is instrumented to send data to a backend in order to answer preconceived questions. With observability, the goal is to instrument your system so that you can later ask questions that were not in mind when the system was instrumented. Hence, if something new comes up, you can find the root cause without modifying the code. There are many levels to check when troubleshooting a problem, and this is where observability comes in.
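One way to picture the difference is a minimal Python sketch. It assumes each request emits a single event with many fields; the field names and sample data are hypothetical, and this is not New Relic's query model. Because the grouping field is chosen at query time, a question nobody planned for can be answered without touching the instrumented code.

```python
from collections import defaultdict

def error_rate_by(events, field):
    """Group events by any field and compute the share of 5xx responses per group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for e in events:
        key = e.get(field, "<missing>")
        totals[key] += 1
        if e["status"] >= 500:
            errors[key] += 1
    return {k: errors[k] / totals[k] for k in totals}

# Hypothetical events, one per request, recorded with more context than
# any single dashboard needed at the time.
events = [
    {"route": "/checkout", "status": 200, "region": "us-east", "app_version": "1.4.0"},
    {"route": "/checkout", "status": 500, "region": "us-east", "app_version": "1.5.0"},
    {"route": "/cart", "status": 200, "region": "eu-west", "app_version": "1.4.0"},
    {"route": "/cart", "status": 502, "region": "eu-west", "app_version": "1.5.0"},
]
```

With this data, grouping by region shows the same 50% error rate everywhere, while error_rate_by(events, "app_version") returns {"1.4.0": 0.0, "1.5.0": 1.0}, pointing at the new release rather than a location.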
There are different use cases for logs, metrics, and traces. Logs are files that record events, warnings, or errors. However, logs are ephemeral, which increases the risk of losing a lot of data, so a system needs to be in place to move logs to a central source. Another issue is that logs are often poorly structured data. Logs are good to have as the last step of observability; metrics and traces can help narrow down where to search in the logs to solve an issue.
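A common remedy for poorly structured logs is to emit them as JSON so a central system can parse and aggregate them. The sketch below uses only Python's standard logging module; the convention of passing structured data under a "fields" key in `extra` is an assumption made for this example, not a standard.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "fields" is a hypothetical convention: callers attach structured
        # context via logger.warning(..., extra={"fields": {...}}).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)

logger = logging.getLogger("checkout")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

logger.warning("payment retry", extra={"fields": {"order_id": 42, "attempt": 2}})
```

Each line can then be shipped to a central store and filtered by key (for example, every record with a given order_id) instead of being matched with fragile text patterns.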
Metrics are measurements that reflect the performance or health of your applications. They give an overview of how the systems are doing but tend to not be very specific in finding the root cause of a problem; other forms of data have to be adopted to get a clear picture. This is where Traces come in.
Traces are pieces of data that track a request as it goes through the system. Because of this, they can identify the root cause of an error or the bottlenecks slowing down the system. However, traces are very expensive to collect, so sampling is used, at the cost of some accuracy. Correlating information from logs, metrics, and traces gives a full, clear picture for successful debugging. Many New Relic customers strive to collect more pieces of data in order to find errors faster.
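The cost and accuracy trade-off of sampling can be illustrated with a short Python sketch of ratio-based sampling keyed on the trace ID. This is one common approach, assumed here for illustration; the episode does not say which sampling strategy New Relic uses, and the function names are hypothetical.

```python
import hashlib

def keep_trace(trace_id, sample_rate):
    """Decide whether to keep a trace, given a sample_rate between 0.0 and 1.0.

    Hashing the trace ID makes the decision deterministic, so every service
    that sees the same trace ID keeps or drops the whole trace together.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(sample_rate * 2**64)

def estimate_total(kept_count, sample_rate):
    """Scale a count of sampled traces back up to an estimate of the true total."""
    return kept_count / sample_rate
```

At a 10% rate only about one trace in ten is stored, which cuts cost, but totals become estimates and a rare failing request may not be among the traces that were kept.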
To balance the right data at the right time at the right cost, the first step when collecting large amounts of data is to find out how your organization is using it. A quick audit to identify which data is useful helps, and can be done monthly or quarterly. Unstructured logs are difficult to aggregate.
In the cloud native space, compatibility with as many other projects as possible will determine the winners, because people use many different projects in production. Projects that are compatible with many other projects are the way forward.
APM is still very useful to understand application performance and in the future, data from all sources will be correlated to figure out the cause of a problem. Getting value very early from the system involves having a solid infrastructure and installing APM. The real power of full stack observability is getting data from different parts of your stack so you can diagnose what part of your system is going wrong. Leveraging AI to make sense of large amounts of data for engineers is going to be a huge plus.
A lot of vendors claim that their alert systems will automatically generate all alerts for you but this is not true because they would not know your team's needs. It is ultimately up to your team to set up alerts that create an observability strategy. Those who invest time into setting this
A New Approach To Spatial Simulations With Rahul Thakkar
Spatial Simulations with AWS SimSpace Weaver
In this episode, Peter sits with Rahul Thakkar to discuss the revolutionary AWS SimSpace Weaver, highlighting its unique function and applications across several industries.
Rahul Thakkar is the Director and General Manager of Simulation Technologies at Amazon Web Services. Before AWS, he held multiple executive roles at Boeing, Brivo, PIXIA, and DreamWorks Animation. He is an inventor, and global technology executive with a background in cloud computing, distributed and high-performance computing, media and entertainment, film, television, defense and intelligence, aerospace, and access control.
His film credits include Shrek, Antz, and The Legend of Bagger Vance. In 2002, he was part of the team that won an Academy Award for Shrek as the Best Animated Feature. Again in 2016, at the 88th Annual Academy Awards, Thakkar received a Technical Achievement Award.
AWS SimSpace Weaver enables customers to run extremely large-scale spatial simulations without having to manage any of the underlying infrastructure. It also removes the complexity of managing the state of entities as they move about the simulation. Previously, such simulations would be run sequentially, in a cumbersome manner over years; now they can be run in parallel in weeks. Different organizations have tried out this functionality for several scenarios and the results have been amazing. This value was made possible largely by working from customer feedback.
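To give a sense of what managing the state of moving entities involves, here is a generic Python sketch of grid-based spatial partitioning. It is not the SimSpace Weaver API: the cell size, the entity fields, and the function names are hypothetical. The world is split into cells that could each be simulated in parallel, and an entity must be handed to a new owner whenever it crosses a cell boundary.

```python
from collections import defaultdict

CELL_SIZE = 100.0  # hypothetical partition width in world units

def cell_of(x, y, cell_size=CELL_SIZE):
    """Return the grid cell that owns position (x, y)."""
    return (int(x // cell_size), int(y // cell_size))

def partition(entities):
    """Group entity IDs by owning cell; each group could run on its own worker."""
    cells = defaultdict(list)
    for e in entities:
        cells[cell_of(e["x"], e["y"])].append(e["id"])
    return dict(cells)

def step(entities, dt):
    """Advance every entity by dt and report which ones changed owning cell."""
    handoffs = []
    for e in entities:
        before = cell_of(e["x"], e["y"])
        e["x"] += e["vx"] * dt
        e["y"] += e["vy"] * dt
        after = cell_of(e["x"], e["y"])
        if after != before:
            handoffs.append((e["id"], before, after))
    return handoffs
```

Doing this bookkeeping reliably across many machines is the part that a managed service takes off the simulation author's hands.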
Rahul's interest in the cloud came much later in his career, which started in the R&D department of the motion picture industry, where he created many of the complex graphics in movies. He later moved to a small start-up that was developing technologies for satellite imagery and mapping, and from there he moved to aerospace. Throughout, he observed that it is very expensive for companies to maintain their own infrastructure when dealing with simulations; it drains resources and distracts from the main focus of the company. Eventually, he knew he had to use AWS, and now he works with them.
The service is built on top of the other primitive tools within AWS. It can also write to S3, so customers can write their simulations out and later review how a simulation played out.
Relating this new service to the metaverse, Rahul believes that each organization has its own vision of what the metaverse should be, so AWS built the tools to empower these organizations to build their own metaverses. Despite possible competition from Azure or GCP, AWS's focus would remain on customers and their needs, innovating on their behalf.
Identifying new problems the service could be applied to is a challenge for which AWS relies on customers, who help AWS envision where to take the service. There are definitely many companies running simulations, but it is hard to predict how many would migrate to AWS SimSpace Weaver because it is still a new product. Nonetheless, many industries are interested in this new service. These include smart cities, organizations ranging from local to federal or international, logistics and supply chains, large-scale event planning, and any situation where there is a need to simulate a large problem with digital replicas of the real world.
💡 "The fact that we worked from the customer backwards is something that allowed us to deliver the kind of value that they're getting right now with AWS SimSpace Weaver"
💡 "Each one of these organizations have their own vision of a metaverse"
Applying and Maximizing Observability with Christine Yen
Applying and Maximizing Observability
In this episode, Christine talks about her company, Honeycomb which runs on AWS, with the goal of promoting observability for clients interested in the performance of their code or those trying to identify problem areas that need to be corrected.
Christine Yen is the Co-Founder and CEO of Honeycomb. Before founding Honeycomb, she built analytics products at Parse/Facebook and loved writing software to separate signals from noise. Christine delights in being a developer in a room full of ops folks. Outside of work, Christine is kept busy by her two dogs and wants your sci-fi & fantasy book recommendations.
Honeycomb is an observability platform that helps customers understand why their code is behaving differently from what they expected. The inspiration came after Christine’s previous company was acquired by Facebook, where they saw how software could make it very easy to identify problems in large volumes of data within a short time. This encouraged them to build the tool and make it available to all engineers.
If the first wave of DevOps was Ops-people learning how to automate their working code, the second wave would be helping developers learn to operate their code. Honeycomb is designed intentionally to ensure that all types of engineers can make sense of the tool.
Honeycomb has always looked for ways to let customers use AWS products and have the data reflected in Honeycomb, where it can be manipulated. Over the last few months, they have made it possible for clients to plug into CloudWatch Logs and CloudWatch metrics and redirect data from AWS products straight into Honeycomb instead. Clients can also use Honeycomb to extract data based on what their applications are doing. This applies to performance optimization, experimentation, or any situation where a company wants to try out code to see how it performs in production. The focus remains on the application layer. Before Honeycomb, no one was using observability in this context.
Honeycomb's pricing is based on the volume of data, which makes it predictable and understandable, unlike pricing that scales with the fidelity of the data, which can be quite expensive.
Challenges within the observability space: The question is how to help new engineers learn from the seasoned engineers on the team through paper trails left by the seasoned engineers. This is a problem that can only be solved by enabling teams to orient new engineers on their systems without having to create another question as part of the code.
Building an AI approach in Honeycomb may not be suitable because of the context involved: training effective machine learning models relies on a vast amount of easily classifiable data, and this does not apply in the world of software, where every engineering team's systems are different from every other engineering team's systems. Honeycomb is interested in using AI to build models that help users know what questions to ask.
With Honeycomb, usage patterns are much more dependent on the curiosity and proficiency of the engineering team; while some engineers who are used to getting answers directly may just leave the software, those who have a culture of asking questions will benefit more from it.
💡 "Not having to predict ahead of time what matters, is making such a difference in our ability as engineers to get ahead of issues, identify them quickly, resolve them"
💡 "We're out of a world where any individual engineer holds the entire system in their head"
💡 "Observability is the only way forward as we make our worlds ever less predictable"
The Service Not the Software: Anthony Lye on Evolution and Revolution
In this TCP Talks episode, Justin Brodley and Jonathan Baker talk with Anthony Lye, Executive Vice President and General Manager of NetApp’s Public Cloud Services Business Unit. An industry veteran for over 25 years, Anthony has been at the forefront of cloud innovation for over half this time.
Anthony shares his insight on the importance of embracing disruption in the tech industry. He discusses how NetApp seized the right opportunities, got lucky, and came to dominate the Cloud space — even while younger app developers may have no idea what it was.
"They don't comprehend — nor should they — the complexities of infrastructure,” Anthony explains. “And I really love the fact that we've been able to democratize ONTAP, because it's cool, but you’ve got to be really smart to get the best out of it. And so we just decided we would be the smart ones.”
What’s really behind innovation in tech? “The context is where you are. And people like to think that the world operates through evolution. And sometimes it's revolution; sometimes, you have to do something radically different.”
Anthony also discusses cloud computing trends, the importance of customer focus, what NetApp does differently, and the multi-cloud.
👉 Name: Anthony Lye
👉 What he does: Anthony is Executive Vice President and General Manager of the Public Cloud Services business for NetApp
👉 Key quote: “You’ve got to put the customer in the middle of your business. And you’ve got to go where they want you to go. If you don't, your hold may last a while, but it won't last. And I still can't believe that what we did we got away with, and we've gotten so much time to build so aggressively. It's great.”
👉 Where to find him: LinkedIn
🚨 There are two halves of the cloud space: the IT half and the app half. IT people see huge opportunities in extending data centers. App people want to and can build and run their own stacks, and Anthony took advantage of this. “They don't have to wait for the IT people,” Anthony says. “And I wanted to build something for them — I didn't want to just hang out on the IT side. I went and asked a whole bunch of application people: what do you need?”
🚨 NetApp spies huge business growth potential on the horizon with recurring revenues. “Recurring revenues are the best kinds of revenues you can get,” Anthony clarifies. But people don’t always consider this. “Because they're different, they sort of ignore them — they don't like them. And before they know it, they're years behind and caught. And passed as if they're standing still.”
🚨 The customer should always be front and center of any business. For NetApp, the software and implementation are the same, but the unique integrations are what make the service stand out. With SaaS, it’s now the second “S”, the service, that matters most. “The rule of SaaS is the other Henry Ford thing: you can have it in any color you want, as long as it's black,” Anthony says. “We're going to run it for you as a service, and you're going to love it,” NetApp tells customers, increasing developer productivity and providing a much higher release cadence.
Here's what was mentioned in the episode 👉
✔️ ARM: the most widely used family of instruction set architectures with over 200 billion ARM chips produced.
✔️ CloudCheckr: an end-to-end cloud management platform with cost, security, resource and service functionality.
✔️ CI/CD (continuous integration/continuous delivery): software development approaches often used in tandem for rapid code delivery and deployment.
✔️ Databricks: a data warehouse and machine learning company.
✔️ Elastic Block Store (EBS): a scalable block storage service from AWS.
✔️ EMC: a data storage company acquired by Dell in 2016, now part of Dell Technologies.
✔️ Filament: a cloud-native platform for data analysis.