Bot Nirvana | AI & Automation Podcast

Nandan Mullakara

Bot Nirvana is a podcast on all things Intelligent Automation. We cover RPA, AI, Process Intelligence, Process Mining, and a host of other tools and techniques for intelligent automation.

  1. Agentic Process Automation (APA)

    09/18/2024


    In this episode, we explore Agentic Process Automation (APA), a paradigm that could revolutionize digital automation by harnessing the power of AI agents. APA lets AI-driven agents analyze, decide, and execute complex tasks with minimal human intervention. The discussion focuses on the ProAgent system as a prime example, unpacking how its approach to workflow construction and execution showcases the true potential of AI agents.

    Key Topics Covered
    - Introduction to Agentic Process Automation (APA)
    - Comparison between traditional Robotic Process Automation (RPA) and APA
    - ProAgent: a prime example of APA implementation
    - Key innovations of ProAgent: agentic workflow construction and agentic workflow execution
    - Types of agents in ProAgent: data agents and control agents
    - Case study: using ProAgent with Google Sheets for business line management
    - Potential impacts and implications of APA on work and decision-making
    - Future developments and considerations for APA technology

    This episode was generated using Google Notebook LM, drawing insights from the paper "ProAgent: From Robotic Process Automation to Agentic Process Automation". Stay ahead in your AI journey with Bot Nirvana AI Mastermind.

    Podcast Transcript

    All right, everyone. Buckle up, because today's deep dive is going to be a wild ride through the future of automation. We're talking way beyond those basic "schedule this" kind of tasks. Yeah, we're diving headfirst into the realm where AI takes the wheel and handles the thinking for us. Oh yeah, the thinking part. If you could give your computer a really complex task, something that needs analysis, decision-making, maybe even a dash of creativity, that's what we're talking about. And right now, your typical automation tools would hit a wall. Hard. They're great at following those rigid step-by-step instructions. Like robots. Exactly. But when it comes to anything that requires actual brain power... We still have to do it ourselves. Well, that's where the research paper we're diving into today comes in. It's all about something called agentic process automation, or APA for short. And let me tell you, this stuff has the potential to completely change the game.

    OK, for those of us who haven't dedicated our lives to the art of automation, give us the lowdown. What is APA, and why is it such a big deal? Think about your current automation workhorse, RPA: robotic process automation. It's like that super reliable assistant who never complains but needs very specific instructions for every single step. Right. Amazing at those repetitive tasks, but needs you to hold their hand through every decision point. Exactly. Now, imagine that same assistant, but with a secret weapon: an AI sidekick whispering genius solutions in their ear. OK, now you're talking. That's APA in a nutshell. We're giving RPA a massive intelligence boost.

    So instead of just blindly following pre-programmed rules, we're talking about automation that can actually think. You got it. APA introduces the idea of agents, which are basically AI helpers embedded directly into the workflow. These agents can analyze data, make judgment calls based on that analysis, and even generate things like reports, all without a human meticulously laying out each step. So it's not just about automating tasks anymore. It's about automating the intelligence behind those tasks. You're catching on quickly.
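    To make that rules-versus-agents contrast concrete, here is a minimal sketch in Python. It is not from the paper; the llm_complete helper is a hypothetical stand-in for whichever LLM API you use. It shows the same routing decision done the RPA way and the APA way:

        # Hypothetical helper: a stand-in for whichever LLM API you use.
        def llm_complete(prompt: str) -> str:
            raise NotImplementedError("wire up your LLM provider here")

        # RPA style: every decision is a pre-programmed rule.
        def route_ticket_rpa(ticket: dict) -> str:
            if "refund" in ticket["subject"].lower():
                return "billing"
            return "general"

        # APA style: an embedded agent reads the ticket and makes the judgment call.
        def route_ticket_apa(ticket: dict) -> str:
            prompt = (
                "Route this support ticket to one of: billing, technical, general.\n"
                f"Subject: {ticket['subject']}\nBody: {ticket['body']}\n"
                "Answer with the queue name only."
            )
            return llm_complete(prompt).strip().lower()

    The RPA version breaks the moment a ticket doesn't match a keyword; the agentic version handles the judgment call itself.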
    And this paper focuses on a system called ProAgent as a prime example of APA in action. All right, lay it on us. What is ProAgent? ProAgent really highlights the potential of APA with two key innovations: agentic workflow construction and agentic workflow execution. OK, those are some pretty hefty terms. Can you break them down for us? Let's start with how ProAgent constructs workflows. What makes it so revolutionary? Well, with traditional RPA, you're stuck painstakingly designing every single step of the process. It's like writing a super detailed manual for a robot. Right, because you don't want the robot to deviate at all. Exactly. But ProAgent flips the script. Instead of you having to lay out every tiny detail... It can just, like, figure it out? You give it high-level instructions, and the LLM, that's the AI engine, actually builds the workflow for you. Wait, so it's like you're telling it what you want to achieve, and it figures out the how-to. Think of it like having an AI assistant who understands your goals and can translate them into a functional workflow.

    OK, that is seriously cool. And then agentic workflow execution: that's where those agents we talked about come in, right? They're the ones actually doing the heavy lifting. You got it. ProAgent uses two types of agents, data agents and control agents. They work together like specialized teams within your automated workflow. OK, I'm really curious about these specialist teams now. Let's start with the data agents. What's their area of expertise? Data agents are the masterminds behind complex data processing. We're not talking simple copying and pasting here. Imagine you need a report summarizing key trends from a massive spreadsheet. Yeah, that sounds fun. A data agent can analyze that data, extract the important bits, and generate a report for you, all within the automated workflow. OK, so if the data agents are the analysts, are the control agents like the project managers making sure it all comes together? That's a great analogy. Control agents handle the dynamic aspects of the workflow, those "if this, then that" scenarios. They can assess a situation and choose the best course of action, just like a human would. Wow, so they're not just following a predetermined path. They're making decisions on the fly. This is light years beyond basic automation.

    It really is. And to illustrate this, the researchers use a really interesting case study with Google Sheets. Imagine you're a manager, and you've got a spreadsheet with hundreds of different business lines. Hundreds of business lines. I can already feel the headache coming on. Right, and each one might have unique needs. Some need detailed reports emailed out. Others might just need a quick update on Slack. Traditionally, you'd need a human to look at each one and figure out the best way to handle it. Oh, for sure. You'd need a whole team just to manage that. But in this case study, ProAgent uses a control agent to do the reading and the decision making. So it's not just matching keywords or something. It's actually understanding the context of each business line. You got it. The control agent can analyze the description of a business line and say, OK, this one seems more business-to-customer, so it needs this kind of report. That's pretty impressive. So the control agent is like the conductor of an orchestra, making sure everything flows smoothly and each instrument plays its part at the right time. But what about the actual report writing?
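    The paper expresses this step in ProAgent's own workflow language, so treat the Python below as a loose illustration of the control-agent pattern only. The model name and the classify_business_line helper are our assumptions, not ProAgent's actual code:

        from openai import OpenAI

        client = OpenAI()  # assumes OPENAI_API_KEY is set; any chat-capable LLM would do

        def classify_business_line(description: str) -> str:
            """Control-agent step: read a business line's description, pick the branch."""
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # assumed model choice, not ProAgent's
                messages=[
                    {"role": "system",
                     "content": "You route business lines. Reply with exactly one of: "
                                "detailed_email_report, slack_update."},
                    {"role": "user", "content": description},
                ],
            )
            return resp.choices[0].message.content.strip()

        # e.g. one row pulled from the Google Sheet:
        print(classify_business_line(
            "Direct-to-consumer subscription box; leadership reviews KPIs weekly"))

    The point is that the branch condition is no longer a hand-written rule: the agent reads free-form context and decides which downstream path the workflow takes.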
    That's where those data agents step in, right? Exactly. Let's say the control agent flags a business line that requires a super detailed performance report. The data agent swoops in, pulls the relevant data points from the spreadsheet, crunches the numbers, and even adds in some insightful summaries. Hold on. It can actually generate insights? Like, it's not just spitting out numbers. It can analyze the data and tell me what's important. That's the really exciting part. This paper shows that ProAgent can tap into the power of LLMs to move beyond simple reporting. We're talking about identifying trends, comparing performance across different business lines. It could probably even make suggestions based on the data, right? Exactly. This is about real data-driven insights.

    OK, now I'm really seeing how this could be a game changer. Even for someone like me, who doesn't necessarily geek out over all the automation jargon, this has huge implications. It absolutely does. Think about all those tasks in your work day that could be handled by a system like ProAgent: the things that eat up your time because they involve gathering information from different places and making judgment calls. It's like those tasks that could theoretically be automated, but they require that extra bit of human touch. Precisely. APA has the potential to bridge that gap. Imagine the mental bandwidth you could free up. All that time you'd normally spend on tedious tasks, you could be focusing on the strategic stuff, the creative stuff, the work that really needs your unique human perspective. It's like having an army of AI assistants working tirelessly behind the scenes, handling all the heavy lifting so you can focus on the big picture. And it's not just about productivity. It's about reducing that feeling of information overload. APA could help us sift through all the noise, analyze data more effectively, and ultimately make better, more informed decisions.

    This all sounds incredibly promising, but where do we go from here? What's next for APA and ProAgent? That's the million dollar question. What's so exciting about this research is that it's really just the tip of the iceberg. As LLMs continue to evolve, we can expect to see even more sophisticated versions of APA, capable of handling increasingly complex tasks. So we could be talking about even more autonomy, even more intelligence baked into these systems. What kind of impact could that have on the way we work and live? Imagine a world where personalized automation is the norm. Systems like ProAgent could learn your specific preferences, anticipate your needs, and essentially become an extension of your own expertise. That's amazing. We're talking about a whole new level of human-AI collaboration, where technology augments our abilities instead of replacing them. This feels like a pivot...
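    Sketching the data-agent side the same way, again as an illustration rather than ProAgent's implementation, with invented column names throughout: the deterministic number-crunching can stay in pandas, and the LLM only writes the narrative on top of a compact digest:

        import pandas as pd
        from openai import OpenAI

        client = OpenAI()

        def performance_report(df: pd.DataFrame, line: str) -> str:
            """Data-agent step: crunch the numbers deterministically,
            then let the LLM write the insight narrative."""
            rows = df[df["business_line"] == line]  # invented column names throughout
            digest = {
                "revenue_total": float(rows["revenue"].sum()),
                "monthly_revenue": rows.sort_values("month")["revenue"].tolist(),
                "top_region": rows.groupby("region")["revenue"].sum().idxmax(),
            }
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # assumed model choice
                messages=[{"role": "user",
                           "content": f"Write a short performance report for {line}, "
                                      f"highlighting trends in this data: {digest}"}],
            )
            return resp.choices[0].message.content

    Keeping the arithmetic in code and handing the model only a small digest is one way to get the "insightful summary" behavior without asking the LLM to do raw math on hundreds of rows.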

    11 min
  2. OCR 2.0

    09/18/2024


    In this podcast, we dive into the new concept of OCR 2.0: the future of OCR with LLMs. We explore how this new approach addresses the limitations of traditional OCR by introducing a unified, versatile system capable of understanding various visual languages. We discuss the innovative GOT (General OCR Theory) model, which utilizes a smaller, more efficient language model. The podcast highlights GOT's impressive performance across multiple benchmarks, its ability to handle real-world challenges, and its capacity to preserve complex document structures. We also examine the potential implications of OCR 2.0 for future human-computer interactions and visual information processing across diverse fields.

    Key Points
    - Traditional OCR vs. OCR 2.0: current OCR limitations (multi-step process, prone to errors); OCR 2.0 as a unified, end-to-end approach
    - Principles of OCR 2.0: end-to-end processing; low cost and accessibility; versatility in recognizing various visual languages
    - GOT (General OCR Theory) model: uses a smaller, more efficient language model (Qwen); trained on diverse visual languages (text, math formulas, sheet music, etc.)
    - Training innovations: data engines for different visual languages, e.g. LaTeX for mathematical formulas
    - Performance and capabilities: state-of-the-art results on standard OCR benchmarks; outperforms larger models in some tests; handles real-world challenges (blurry images, odd angles, different lighting)
    - Advanced features: formatted document OCR (preserving structure and layout); fine-grained OCR (precise text selection); generalization to untrained languages

    This episode was generated using Google Notebook LM, drawing insights from the paper "General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model". Stay ahead in your AI journey with Bot Nirvana AI Mastermind.

    Podcast Transcript

    All right, so we're diving into the future of OCR today. Really interesting stuff. Yeah, and you know how sometimes you just scan a document, you just want the text, you don't really think twice about it. Right, right. But this paper, "General OCR Theory: Towards OCR 2.0 via a Unified End-to-end Model"... Catchy title. I know, right? But it's not just the title; they're proposing a whole new way of thinking about OCR. OCR 2.0, as they call it. Exactly. It's not just about text anymore. Yeah, it's really about understanding any kind of visual information, like humans do. So much bigger. It's a really ambitious goal.

    OK, so before we get ahead of ourselves, let's back up for a second. How does traditional OCR even work? Like, when you and I scan a document, what's actually going on? Well, imagine an assembly line, right? First, the system has to figure out where on the page the actual text is. Find it. Right, isolate it. Then it crops those bits out. OK. And then it tries to recognize the individual letters and words. So it's a multi-step process? Yeah, it's a whole process. And we've all been there, right? When one of those steps goes wrong. Oh, tell me about it. And you get that OCR output that's just... Gibberish, total gibberish. The worst. And the paper really digs into this. They're saying that whole assembly line approach isn't just prone to errors, it's clunky. Yeah, very inefficient. Like, different fonts can throw it off. Right. Different languages? Forget it. Oh yeah, if it's not basic printed text, OCR 1.0 really struggles. It's like it doesn't understand the context. Yeah, exactly.
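    To see the contrast in code: the sketch below uses TrOCR, a different, publicly available end-to-end vision-to-text model (it reads one cropped line at a time, so it is much narrower in scope than GOT), purely to show the "one model, pixels in, text out" pattern the paper argues for:

        from PIL import Image
        from transformers import TrOCRProcessor, VisionEncoderDecoderModel

        # OCR 1.0, conceptually: detect_regions() -> crop() -> recognize(),
        # with errors compounding at every hand-off between stages.

        # OCR 2.0 pattern: one unified model, image in, text out.
        processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
        model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

        image = Image.open("line_scan.png").convert("RGB")  # a single line of text
        pixel_values = processor(images=image, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])

    There is no separate detection or cropping stage to go wrong; the encoder-decoder maps the image straight to a character sequence in one forward pass.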
    It's treating information like it's just a bunch of isolated letters, instead of seeing the bigger picture, you know, the relationships between them. It doesn't get the human element of it. It's missing that human touch, that understanding of how we visually organize information. And that's a problem. A big one. Especially now, when we're drowning in visual information everywhere you look. It's true, we need something way more powerful than what we have now. We need a serious upgrade. Enter OCR 2.0. That's what they're proposing, yeah. So what's the magic formula? What makes it so different from what we're used to?

    Well, the paper lays out three main principles for OCR 2.0. OK. First, it has to be end-to-end. Second, low cost and accessible. Got it. And most importantly, it needs to be versatile. Versatile, that's a good one. So OK, let's break it down. End-to-end: does that mean ditching that whole assembly line thing we were talking about? Exactly, yeah. Instead of all those separate steps, OCR 2.0 should be one unified model. OK. One model that can handle the entire process. So much simpler. And much more efficient. OK, that makes sense. And easier to use, which is key. And then low cost, I mean... Oh, absolutely. That's got to be a priority. We want this to be accessible to everyone, not just companies with tons of resources. Exactly. And the researchers were really clever about this. They actually chose to use a smaller, more efficient language model. Oh, really? Yeah, it's called Qwen. Instead of one of the massive ones that's been in the news. Exactly. And they proved that you don't need a giant, energy-guzzling model to get really impressive results with OCR. So efficient and powerful. I like it. That's the goal.

    But versatile. That's the part that always gets me thinking, because... It's where things get really interesting. Yeah, we're not even just talking about recognizing text anymore. No, it's about recognizing any kind of... Visual information. Visual information that humans create, right? Yeah. Like, think about it. Math formulas, diagrams, even something like sheet music. Hold on. Sheet music. Like actually reading music. Yeah. And it's a really good example of how different this is. Because music isn't just about recognizing the notes themselves. Right. It's about understanding the timing, the rhythm, how those symbols all relate to each other. It's a whole system. That's wild.

    OK, so how do they even begin to teach a machine to do that? Well, they got really creative with the training data. Instead of just feeding it raw text and images, they built these data engines to teach GOT different visual languages. Data engines. That sounds intense. Yeah. For the sheet music, for instance, they used a notation format called Humdrum kern. OK. And essentially what that does is it turns musical notation into code. Oh, interesting. So GOT learned to connect those visual symbols to their actual musical meaning. So it's learning the language. Exactly. That's incredible, but sheet music's just one example, right? What other kind of crazy stuff did they throw at this thing? Oh, they really tried everything. Math formulas, those are always fun. I bet. Molecular formulas, even simple geometric shapes, squares and circles. Really? Yeah, they used all sorts of tricks to represent these visual elements as code. So GOT could understand it.
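    The data-engine idea itself is simple enough to sketch: render a visual element programmatically and keep the generating code as the label, giving you unlimited (image, symbolic representation) pairs. Here is a toy version for the geometric-shapes case, with all details invented; the paper's engines are far more elaborate:

        import random
        import matplotlib.pyplot as plt
        import matplotlib.patches as patches

        def make_shape_pair(idx: int) -> tuple[str, str]:
            """Render a random shape; the drawing parameters become the label."""
            kind = random.choice(["circle", "square"])
            x, y, s = random.random(), random.random(), 0.1 + random.random() * 0.2
            fig, ax = plt.subplots(figsize=(2, 2))
            if kind == "circle":
                ax.add_patch(patches.Circle((x, y), s, fill=False))
                label = f"circle(cx={x:.2f}, cy={y:.2f}, r={s:.2f})"
            else:
                ax.add_patch(patches.Rectangle((x, y), s, s, fill=False))
                label = f"square(x={x:.2f}, y={y:.2f}, side={s:.2f})"
            ax.set_xlim(-0.5, 1.5); ax.set_ylim(-0.5, 1.5); ax.axis("off")
            path = f"shape_{idx}.png"
            fig.savefig(path)
            plt.close(fig)
            return path, label  # one (image, symbolic label) training pair

    Swap the shape renderer for a LaTeX or Humdrum kern renderer and the same loop produces training pairs for formulas or sheet music, which is the trick the researchers describe.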
    Exactly. Like for the math formulas, they used a language called LaTeX. Have you heard of that one? Yeah, that's what a lot of scientists and mathematicians use to write equations. Exactly. It's how they write math so computers can understand it. It's like the code of math. Exactly. And so by training GOT on LaTeX, they weren't just teaching it to recognize what a formula looks like. Right, right. They were teaching it the underlying structure, the grammar of math itself. OK, now that is really cool. Yeah, and they found that GOT could actually generalize this knowledge. It could even recognize elements of formulas that it had never seen before. No way. It was like it was starting to understand the language of math, which is pretty incredible when you think about it. Yeah, that's wild.

    OK, so we've got this model. It can recognize text. It can recognize all these other complex visual languages. We're getting somewhere. But how does it actually perform? Does it actually live up to the hype? So this is it, huh? We've got this super OCR model that's been trained on everything but the kitchen sink. Time to put it to the test. They put it through the wringer. Yeah. What did they even start with? Well, the classics, right? Plain document OCR: PDFs, articles, that kind of thing. Basic but important. Exactly. And they tested it in both English and Chinese, just to see how well-rounded it was. And, drumroll, how'd it do? Crushed it. Absolutely crushed it. No way. State-of-the-art performance on all the standard document OCR benchmarks. That's amazing. Oh, and here's the really interesting part. It actually outperformed some much larger, more complex models in their tests. So it's efficient and it's powerful. That's a winning combo. Exactly. It shows you don't always have to go bigger to get better results.

    OK, that's awesome. But what about real-world stuff? You know, the messy stuff. Oh, they thought of that. Like trying to read a sign with a weird font, or a crumpled-up napkin with handwriting on it? Yep. All that. They have these data sets specifically designed to trip up OCR systems, with blurry images, weird angles, different lighting. The stuff nightmares are made of. Right. And GOT handled it all like a champ. It was really impressive. OK, so this isn't just some theoretical thing. It actually works. It's the real deal. I'm sold. But there was another thing they mentioned, something about formatted document OCR. What is that exactly? That's where things get really elegant. With formatted documents, it's not just about recognizing the words. Right. It's about understanding the structure of a document. OK, like the headings and bullet points? Exactly. Tables, the whole nine yards. It's about preserving the way information is organized. So it's like, imagine being able to convert a complex PDF in...
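    For the curious, GOT itself was released publicly. Its Hugging Face model card shows roughly the interface below, where ocr_type="format" requests the structure-preserving output discussed above; treat the exact arguments as an assumption drawn from that model card rather than a guaranteed API:

        from transformers import AutoModel, AutoTokenizer

        # Interface as shown on the ucaslcl/GOT-OCR2_0 model card (treat as an assumption).
        tokenizer = AutoTokenizer.from_pretrained("ucaslcl/GOT-OCR2_0", trust_remote_code=True)
        model = AutoModel.from_pretrained(
            "ucaslcl/GOT-OCR2_0", trust_remote_code=True,
            use_safetensors=True, pad_token_id=tokenizer.eos_token_id,
        ).eval().cuda()

        plain = model.chat(tokenizer, "page.png", ocr_type="ocr")         # raw text only
        formatted = model.chat(tokenizer, "page.png", ocr_type="format")  # keeps structure/layout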

    11 min
4.6 out of 5 (11 Ratings)
