In this episode of the ODSC AIx Podcast, host Sheamus McGovern speaks with Alexandra Ebert, Chief AI and Data Democratization Officer at MOSTLY AI, and one of the foremost voices on privacy-preserving synthetic data and responsible AI. With a diverse background spanning AI ethics, data access policy, and generative AI regulation, Alexandra brings clarity to how synthetic data is not just a privacy tool but a lever for innovation, fairness, and scalable AI adoption.

The discussion explores what synthetic data is (and isn’t), its core advantages and limitations, its role in addressing fairness and access challenges in data-driven organizations, and how practitioners can actively shape better downstream model performance. The episode also dives into the MOSTLY AI Prize, a $100,000 global competition to advance privacy-safe, high-utility synthetic data generation.

Key Topics Covered:
- The different types and use cases of synthetic data (privacy-preserving, simulation-based, creative)
- How synthetic data helps solve the “data access paradox” in regulated industries
- Key advantages and limitations of synthetic data vs. real-world and legacy anonymized data
- Privacy mechanisms: outlier suppression, statistical mimicry, empirical differential privacy
- Real-world use cases in healthcare, finance, telco, and simulation environments
- Fairness-aware synthetic data generation using statistical parity constraints
- Imputing missing data with synthetic distributions
- Agentic AI and the role of synthetic data in enabling secure access layers for autonomous agents
- Up-sampling rare events (e.g. fraud) to support more explainable models
- Open innovation and the mission behind the MOSTLY AI Prize
- Tools, SDKs, and open-source workflows for getting started with synthetic data
- The MOSTLY AI Prize, a $100,000 global competition

Memorable Outtakes

“We need to move beyond thinking of real data as the gold standard. It’s often inaccessible, messy, biased, and by design, it's limited to how it was collected. Synthetic data lets us ask: what if our data was as inclusive as we needed it to be?”

“So much of synthetic data’s value is in unlocking what’s been locked away, allowing teams to safely build, test, and deploy where real data just isn’t viable.”

“It’s not just about boosting performance. Synthetic upsampling lets you use simpler, more explainable models, ones you can actually audit.”

References & Resources
- Alexandra Ebert – Chief AI & Data Democratization Officer, MOSTLY AI
  Industry profile: https://mostly.ai/team/alexandra-ebert
  LinkedIn: https://www.linkedin.com/in/alexandraebert/
- MOSTLY AI Prize – Global competition to advance privacy-preserving synthetic data
  Website: https://www.mostlyaiprize.com/
- MOSTLY AI GitHub & SDK – Open-source tools for structured synthetic data
  https://github.com/mostly-ai
  https://github.com/mostly-ai/mostly-ai-sdk
- Synthetic Data Fairness Paper (ICLR) – "Representative & Fair Synthetic Data"
  Paper link: https://arxiv.org/abs/2104.03007
- Synthetic Data Vault (SDV)
  https://sdv.dev
- TVAE under the SDV umbrella
  https://github.com/sdv-dev/SDV

Sponsored by

Agentic AI Summit 2025
Join the premier virtual event for AI builders from July 15–30. Gain hands-on skills in designing, deploying, and scaling autonomous AI agents.
🔥 Use code podcast for 10% off any ticket.
Register now: https://www.summit.ai/

ODSC West 2025 – The Leading AI Training Conference
Attend in San Francisco from October 28–30 for expert-led sessions on generative AI, LLMOps, and AI-driven automation.
🔥 Use code podcast for 10% off any ticket.
Learn more: https://odsc.com/california