Hubert's Podcast

Hubert Dulay

"Streaming Data Mesh" OReilly. Currently writing his second book "Streaming Databases - Supporting Monolithic Data Engineers hubertdulay.substack.com

  1. 01/05/2024

    Interview with Kai Waehner

    In this podcast, Ralph and I interview a former colleague of mine, Kai Waehner, who has extensive experience in the data streaming and real-time events space. Kai highlights the top five trends for data streaming with Kafka and Flink: data sharing, data contracts for governance, serverless stream processing, multi-cloud adoption, and the use of generative AI in real-time contexts. We discuss the role of generative AI in providing accurate answers and the importance of real-time data integration for contextual recommendations, using the example of travel and flight cancellations. We also delve into the role of Flink as a stream processor in ensuring the accuracy and freshness of data for semantic search and generative AI applications, and into the idea of streaming databases and whether the market is ready to embrace them. We discuss the need for data contracts and data governance to understand the flow of data through systems, as well as the responsibility of the data engineering team in creating embeddings. We also cover integrating large language models with other applications using technologies like Kafka, with examples of how generative AI can be integrated into existing business processes. The interview touches on the concept of a "lakehouse" and the separation of compute and storage for real-time analytics. Kai also highlights Confluent's approach to building Kafka in a cloud-native way and its focus on the streaming side, while emphasizing the need for accessible stream processing solutions for ordinary database users. Get full access to SUP! Hubert’s Substack at hubertdulay.substack.com/subscribe

    51 min
  2. 11/22/2023

    Filipinos in Tech - Marlow and Ron

    Continuing the Filipinos in Tech series, in this episode I interview Marlo Carrillo and Ron Guerrero, who are currently at Databricks and were previously at Cloudera. We reflect on the significance of the Balikbayan box as a symbol of resilience and of the importance of remembering our roots. We share personal and emotional stories of our families' journeys to America, the struggles they faced, and the sacrifices they made for a better life. We also discuss the challenges of growing up Filipino in different communities, feeling different, and trying to find connections. We highlight how Filipinos assimilate into new cultures while holding onto their heritage, and how language can be a marker of both identity and assimilation. The episode explores the immigrant experience and the complexities of belonging to multiple worlds. Beyond our immigrant experiences, we focus on the impact of technology on the Filipino community. We speculate that more Filipinos, including members of our own families, will join the technology field in the future. We discuss the preference Filipinos may have for social and personal interactions, which could help explain their underrepresentation in the tech industry. We express gratitude towards America and its opportunities while acknowledging the unique charm of the Philippines. We also talk about retirement plans and the possibility of returning to the Philippines, with some of us preferring to visit rather than relocate permanently. SUP! Hubert’s Substack is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

    1 hr
  3. 10/27/2023

    Interview with Peter Corless

    In this podcast interview, I discuss federated systems with Peter Corless, the Director of Product Marketing for StarTree. Peter will be presenting at a meetup next Tuesday. Peter explains how federated systems emerged from the evolution of web development and the need to define the backend workings of front-end websites. We also explore the definitions of terms like stack, platform, and cluster in today's environment. The conversation highlights the shift from traditional stacks to clusters of systems and the distinction between federated systems and federated data. We delve into the challenges and limitations of federated systems and databases, emphasizing the trade-off between moving the data and moving the processing. We touch on the concept of federated learning in AI and ML and the importance of optimizing data for queries. We conclude the first part by discussing the need for new language and grammar to describe these complex architectures and the importance of collaboration between data science and data engineering teams. In the second part of the podcast, the conversation focuses on the interoperability and limitations of cloud computing systems, specifically AWS, Google Cloud, and Azure. Peter notes that while efforts have been made to make these systems interoperable, users still have to choose between the different ecosystems the providers offer. We then shift to the importance of replication in data systems and the concept of a data divide, emphasizing the need to choose the best database or system for each specific aspect of an application architecture. We also discuss the potential for a stack to span cloud regions and continents, allowing for global consistency and the ability to query data from different locations. Finally, we discuss Apache Pinot, describing it as a complex system that can act as a cluster of clusters. We highlight its ability to assimilate more components and scale out, as well as its powerful tools for organizing and storing data. We conclude by discussing the expectation of clusters of clusters in modern systems.

    55 min
