This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Declarative Machine Learning Without The Operational Overhead Using Continual
Building, scaling, and maintaining the operational components of a machine learning workflow are all hard problems. Add the work of creating the model itself, and it's not surprising that a majority of companies that could greatly benefit from machine learning have yet to either put it into production or see the value. Tristan Zajonc recognized the complexity that acts as a barrier to adoption and created the Continual platform in response. In this episode he shares his perspective on the benefits of declarative machine learning workflows as a means of accelerating adoption in businesses that don't have the time, money, or ambition to build everything from scratch. He also discusses the technical underpinnings of what he is building and how using the data warehouse as a shared resource drastically shortens the time required to see value. This is a fascinating episode and Tristan's work at Continual is likely to be the catalyst for a new stage in the machine learning community.
An Exploration Of The Data Engineering Requirements For Bioinformatics
Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and computational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities. In this episode Jillian Rowe shares her experience of working in the field and supporting teams of scientists and analysts with the data infrastructure that they need to get their work done. This is a fascinating exploration of the collaboration between data professionals and scientists.
Setting The Stage For The Next Chapter Of The Cassandra Database
The Cassandra database is one of the first open source options for globally scalable storage systems. Since its introduction in 2008 it has been powering systems at every scale. The community recently released a new major version that marks a milestone in its maturity and stability as a project and database. In this episode Ben Bromhead, CTO of Instaclustr, shares the challenges that the community has worked through, the work that went into the release, and how the stability and testing improvements are setting the stage for the future of the project.
A View From The Round Table Of Gartner's Cool Vendors
Gartner analysts are tasked with identifying promising companies each year that are making an impact in their respective categories. For businesses that are working in the data management and analytics space they recognized the efforts of Timbr.ai, Soda Data, Nexla, and Tada. In this episode the founders and leaders of each of these organizations share their perspective on the current state of the market, and the challenges facing businesses and data professionals today.
Designing And Building Data Platforms As A Product
The term "data platform" gets thrown around a lot, but have you stopped to think about what it actually means for you and your organization? In this episode Lior Gavish, Lior Solomon, and Atul Gupte share their view of what it means to have a data platform, discuss their experiences building them at various companies, and provide advice on how to treat them like a software product. This is a valuable conversation about how to approach the work of selecting the tools that you use to power your data systems and considerations for how they can be woven together for a unified experience across your various stakeholders.
Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana
The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. In recent months the community has focused their efforts on making it the fastest possible option for running your analytics in the cloud. In this episode Dipti Borkar discusses the work that she and her team are doing at Ahana to simplify the work of running your own PrestoDB environment in the cloud. She explains how they are optimizing the runtime to reduce latency and increase query throughput, the ways that they are contributing back to the open source community, and the exciting improvements that are in the works to make Presto an even more powerful option for all of your analytics.
This podcast makes me feel sane when I’m working late running Spark SQL queries like a monkey.
Sets the standard for ALL data podcasts
I have been hooked on this podcast. Tobias is phenomenal. Most data professionals such as company founders are astoundingly poor at explaining what they do. Tobias skillfully extracts this by asking great questions.
His ability to ask great questions and probe deep is what sets him apart.
I have become disillusioned with podcasts like DM Radio or the Architect Show. The hosts are lazy and don’t probe. They are happy to rehash marketing tropes about zettabytes of data, about how data is the new oil, etc., but leave us with no new insights.
This is a great podcast. Tobias always asks great questions. The topics and the guests are always really interesting. You can learn interesting and important aspects of the software development world that you won’t get (as easily) surfing the web or reading tech books. Also check out his Python podcast, which is equally awesome!