Streaming Audio features all things Apache Kafka®, Confluent, real-time data, and the cloud. We cover frequently asked questions, best practices, and use cases from the Kafka community—from Kafka connectors and distributed systems, to data mesh, data integration, modern data architectures, and data mesh built with Confluent and cloud Kafka as a service. Join host Kris Jenkins (Senior Developer Advocate, Confluent) as he streams through a series of interviews, stories, and use cases with guests from the data streaming industry. Apache®️, Apache Kafka, Kafka, and the Kafka logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
Common Apache Kafka Mistakes to Avoid
What are some of the common mistakes that you have seen with Apache Kafka® record production and consumption? Nikoleta Verbeck (Principal Solutions Architect at Professional Services, Confluent) has a role that specifically tasks her with performance tuning as well as troubleshooting Kafka installations of all kinds. Based on her field experience, she put together a comprehensive list of common issues with recommendations for building, maintaining, and improving Kafka systems that are applicable across use cases.
Kris and Nikoleta begin by discussing the fact that it is common for those migrating to Kafka from other message brokers to implement too many producers, rather than the one per service. Kafka is thread safe and one producer instance can talk to multiple topics, unlike with traditional message brokers, where you may tend to use a client per topic.
Monitoring is an unabashed good in any Kafka system. Nikoleta notes that it is better to monitor from the start of your installation as thoroughly as possible, even if you don't think you ultimately will require so much detail, because it will pay off in the long run. A major advantage of monitoring is that it lets you predict your potential resource growth in a more orderly fashion, as well as helps you to use your current resources more efficiently. Nikoleta mentions the many dashboards that have been built out by her team to accommodate leading monitoring platforms such as Prometheus, Grafana, New Relic, Datadog, and Splunk.
They also discuss a number of useful elements that are optional in Kafka so people tend to be unaware of them. Compression is the first of these, and Nikoleta absolutely recommends that you enable it. Another is producer callbacks, which you can use to catch exceptions. A third is setting a `ConsumerRebalanceListener`, which notifies you about rebalancing events, letting you prepare for any issues that may result from them.
Other topics covered in the episode are batching and the `linger.ms` Kafka producer setting, how to figure out your units of scale, and the metrics tool Trogdor.
5 Common Pitfalls when Using Apache KafkaKafka Internals courselinger.ms producer configs.Fault Injection—TrogdorFrom Apache Kafka to Performance in Confluent CloudKafka CompressionInterface ConsumerRebalanceListenerWatch the video version of this podcastNikoleta Verbeck’s TwitterKris Jenkins’ TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more on Confluent DeveloperUse PODCAST100 to get $100 of free Confluent Cloud usage (details)
Tips For Writing Abstracts and Speaking at Conferences
A well-written abstract is your ticket to conferences, but how do you write an excellent synopsis that will get accepted? As an experienced conference speaker, Robin Moffatt (Principal Developer Advocate, Confluent) often writes presentations that help the developer community to understand Apache Kafka® and its ecosystem. He is also the Program Committee Chair for Kafka Summit and Current 2022: The Next Generation of Kafka Summit. Having seen hundreds of conference submissions, Robin shares best practices for crafting abstracts that stand out, as well as tips for speaking at conferences.
So you want to answer the call for papers? Before writing your abstract, Robin and Kris recommend identifying a topic that you are enthusiastic about, or a topic that can be useful to others. Oftentimes, attendees go to conferences to learn about a given technology, which they may not have extensive knowledge of yet—so a fundamental topic is a good basis for a conference talk.
Once you’ve identified the topic you are interested in, there are key components to an effective write up:
Title: Come up with an enticing title that lets the conference organizers and audiences understand the content at a glance. There is a chance that a great topic could be rejected due to a poor title.Abstract: Summarize the topic you plan to talk about in the proper format and length. Usually, a polished abstract has three short paragraphs consisting of approximately 200 words.It’s essential to spend quality time writing and refining your abstract, while keeping two audience groups in mind—the program committee and the conference attendees. Robin shares that when reviewing submissions, the program committees have a few standards in mind, such as if the topic fits into the overall conference theme, and whether attendees would be interested in the talk. Then if the abstract is accepted, the attendees themselves will decide if they’ll attend a particular session based on the agenda and the brief.
Robin and Kris also discuss why you should submit to a conference in the first place and also give tips for preparing your talk once you are accepted. If you are a new speaker or just someone interested in getting feedback on your abstract, Robin and the conference committees for Current 2022: The Next Generation of Kafka Summit will be hosting office hours to provide feedback.
Current 2022: How to Become a SpeakerHow to Win at the Conference Abstract Submission GameCollection: How to Write a Good Conference AbstractPreparing a New TalkSo How Do you Make Those Cool Diagrams?Syntax Highlighting Code For Presentation Slides Watch Video VersionTwitter: Robin Moffatt | Kris JenkinsJoin the Confluent CommunityUse PODCAST100 to get $100 of Confluent Cloud usage (details)
How I Became a Developer Advocate
What is a developer advocate and how do you become one? In this episode, we have seasoned developer advocates, Kris Jenkins (Senior Developer Advocate, Confluent) and Danica Fine (Senior Developer Advocate, Confluent) answer the question by diving into how they got into the world of developer relations, what they enjoyed the most about their roles, and how you can become one.
Developer advocacy is at the heart of a developer community—helping developers and software engineers to get the most out of a given technology by providing support in form of blog posts, podcasts, conference talks, video tutorials, meetups, and other mediums.
Before stepping into the world of developer relations, both Danica and Kris were hands-on developers. While dedicating professional time, Kris also devoted personal time to supporting fellow developers, such as running local meetups, writing blogs, and organizing hackathons.
While Danica found her calling after learning more about Apache Kafka® and successfully implemented a mission-critical application for a financial services company—transforming 2,000 lines of codes into Kafka Streams. She enjoys building and sharing her knowledge with the community to make technology as accessible and as fun as possible.
Additionally, the duo previews their developer advocacy trip to Singapore and Australia in mid-June, where they will attend local conferences and host in-person meetups on Kafka and event streaming.
Data Mesh Architecture: A Modern Distributed Data Model
Data mesh isn’t software you can download and install, so how do you build a data mesh? In this episode, Adam Bellemare (Staff Technologist, Office of the CTO, Confluent) discusses his data mesh proof of concept and how it can help you conceptualize the ways in which implementing a data mesh could benefit your organization.
Adam begins by noting that while data mesh is a type of modern data architecture, it is only partially a technical issue. For instance, it encompasses the best way to enable various data sets to be stored and made accessible to other teams in a distributed organization. Equally, it’s also a social issue—getting the various teams in an organization to commit to publishing high-quality versions of their data and making them widely available to everyone else. Adam explains that the four data mesh concepts themselves provide the language needed to start discussing the necessary social transitions that must take place within a company to bring about a better, more effective, and efficient data strategy.
The data mesh proof of concept created by Adam's team showcases the possibilities of an event-stream based data mesh in a fully functional model. He explains that there is no widely accepted way to do data mesh, so it's necessarily opinionated. The proof of concept demonstrates what self-service data discovery looks like—you can see schemas, data owners, SLAs, and data quality for each data product. You can also model an app consuming data products, as well as publish your own data products.
In addition to discussing data mesh concepts and the proof of concept, Adam also shares some experiences with organizational data he had as a staff data platform engineer at Shopify. His primary focus was getting their main ecommerce data into Apache Kafka® topics from sharded MySQL—using Kafka Connect and Debezium. He describes how he really came to appreciate the flexibility of having access to important business data within Kafka topics. This allowed people to experiment with new data combinations, letting them come up with new products, novel solutions, and different ways of looking at problems. Such data sharing and experimentation certainly lie at the heart of data mesh.
Adam has been working in the data space for over a decade, with experience in big-data architecture, event-driven microservices, and streaming data platforms. He’s also the author of the book “Building Event-Driven Microservices.”
The Definitive Guide to Building a Data Mesh with Event StreamsWhat is data mesh? Saxo Bank’s Best Practices for Distributed Domain-Driven Architecture Founded on the Data MeshWatch the video version of this podcastKris Jenkins’ TwitterJoin the Confluent CommunityLearn more with Kafka tutorials at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of Confluent Cloud usage (details)
Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools
Stream processing can be hard or easy depending on the approach you take, and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka® PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). With immense collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink®, they delve into the types of stream processing operations and explain the different ways of solving for their respective issues.
The best stream processing tools they consider are Flink along with the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant—ksqlDB. Flink and ksqlDB tend to be used by divergent types of teams, since they differ in terms of both design and philosophy.
Why Use Apache Flink?
The teams using Flink are often highly specialized, with deep expertise, and with an absolute focus on stream processing. They tend to be responsible for unusually large, industry-outlying amounts of both state and scale, and they usually require complex aggregations. Flink can excel in these use cases, which potentially makes the difficulty of its learning curve and implementation worthwhile.
Why use ksqlDB/Kafka Streams?
Conversely, teams employing ksqlDB/Kafka Streams require less expertise to get started and also less expertise and time to manage their solutions. Jeff notes that the skills of a developer may not even be needed in some cases—those of a data analyst may suffice. ksqlDB and Kafka Streams seamlessly integrate with Kafka itself, as well as with external systems through the use of Kafka Connect. In addition to being easy to adopt, ksqlDB is also deployed on production stream processing applications requiring large scale and state.
There are also other considerations beyond the strictly architectural. Local support availability, the administrative overhead of using a library versus a separate framework, and the availability of stream processing as a fully managed service all matter.
Choosing a stream processing tool is a fraught decision partially because switching between them isn't trivial: the frameworks are different, the APIs are different, and the interfaces are different. In addition to the high-level discussion, Jeff and Matthias also share lots of details you can use to understand the options, covering employment models, transactions, batching, and parallelism, as well as a few interesting tangential topics along the way such as the tyranny of state and the Turing completeness of SQL.
The Future of SQL: Databases Meet Stream ProcessingBuilding Real-Time Event Streams in the Cloud, On PremisesKafka Streams 101 courseksqlDB 101 courseWatch the video version of this podcastKris Jenkins’ TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more on Confluent DeveloperUse PODCAST100 for additional $100 of Confluent Cloud usage (details)
Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB
Apache Kafka® isn’t just for day jobs according to Danica Fine (Senior Developer Advocate, Confluent). It can be used to make life easier at home, too!
Building out a practical Apache Kafka® data pipeline is not always complicated—it can be simple and fun. For Danica, the idea of building a Kafka-based data pipeline sprouted with the need to monitor the water level of her plants at home. In this episode, she explains the architecture of her hardware-oriented project and discusses how she integrates, processes, and enriches data using ksqlDB and Kafka Connect, a Raspberry Pi running Confluent's Python client, and a Telegram bot. Apart from the script on the Raspberry Pi, the entire project was coded within Confluent Cloud.
Danica's model Kafka pipeline begins with moisture sensors in her plants streaming data that is requested by an endless for-loop in a Python script on her Raspberry Pi. The Pi in turn connects to Kafka on Confluent Cloud, where the plant data is sent serialized as Avro. She carefully modeled her data, sending an ID along with a timestamp, a temperature reading, and a moisture reading. On Confluent Cloud, Danica enriches the streaming plant data, which enters as a ksqlDB stream, with metadata such as moisture threshold levels, which is stored in a ksqlDB table.
She windows the streaming data into 12-hour segments in order to avoid constant alerts when a threshold has been crossed. Alerts are sent at the end of the 12-hour period if a threshold has been traversed for a consistent time period within it (one hour, for example). These are sent to the Telegram API using Confluent Cloud's HTTP Sink Connector, which pings her phone when a plant's moisture level is too low.
Potential future project improvement plans include visualizations, adding another Telegram bot to register metadata for new plants, adding machine learning to anticipate watering needs, and potentially closing the loop by pushing data back
to the Raspberry Pi, which could power a visual indicator on the plants themselves.
GitHub: raspberrypi-houseplantsData Pipelines 101 courseTips for Streaming Data Pipelines ft. Danica FineWatch the video version of this podcastDanica Fine's TwitterKris Jenkins’ TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more with Kafka tutorials, resources, and guides at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Tim is probably the best tech speaker I have ever listened. Please keep doing them.
Can’t stop listening.
This podcast is so good that I started at episode 1 and am going through them all. It’s amazing how well these complex topics are explained via audio.
a perfect pod
the things i learn from listening this podcast is insurmountable. i’m eternally grateful to you all.