DEV

Eric Lamanna

Technology

Software and AI development podcast. We cover all things software development, including today's advanced AI development tricks and techniques.

15h ago

AI-Powered Linting: Smarter Static Code Analysis With Machine Learning

Traditional linters are reliable workhorses, but they can only flag what someone thought to write a rule for. This episode of Development explores what happens when you pair static code analysis with machine learning, moving beyond deterministic rule enforcement toward a tool that can recognize subtle, historically problematic patterns the way a seasoned developer does. The discussion is grounded in a deep-dive on building an AI-powered linter with ML models, and goes further into the practical decisions teams face when taking this approach seriously.

Here's what the episode covers:

Why traditional linters hit a ceiling: Rule-based analysis is only as strong as the rules themselves; edge cases, framework-specific vulnerabilities, and nuanced anti-patterns routinely slip through.
How ML changes the analysis model: Instead of working from a checklist, a trained model learns the statistical shape of problematic code and surfaces issues that explicit rules miss.
Model selection and data quality: Bigger isn't always better. A focused model trained on well-labeled, domain-specific examples often outperforms a general-purpose LLM for targeted linting tasks.
The signal-to-noise problem: An AI linter that cries wolf too often gets ignored. Rolling out high-confidence, high-stakes catches first (such as security vulnerabilities) is the key to earning team trust before expanding scope.
Continuous improvement and feedback loops: Unlike a static ruleset, an ML-based linter drifts out of alignment if left untouched; developer overrides and missed bugs are valuable retraining signals.
Transparent onboarding: Introducing the tool with honest expectations about early false positives turns healthy skepticism into buy-in rather than resistance.

The episode makes a clear case that an AI-powered linter is best understood as an always-on first pass: not a replacement for human review, but a way to surface real problems earlier and more cheaply than a runtime error would.

If you're interested in where the intersection of AI and edge hardware is heading next, check out the recent episode Edge AI Explained: Running Smarter Models on Tiny Devices for a complementary look at deploying ML in constrained environments.
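As a rough illustration of the "high-confidence, high-stakes catches first" rollout described in this episode's notes, the sketch below filters model findings by category and confidence. The `Finding` fields, the `triage` helper, the rule names, and the 0.9 threshold are all hypothetical choices made for this example; the episode does not specify a data model or a cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str          # hypothetical label emitted by the model
    category: str      # e.g. "security" or "style"
    confidence: float  # model score in [0, 1]


def triage(findings, enabled_categories=frozenset({"security"}), min_confidence=0.9):
    """Keep only findings in an enabled category that clear the confidence bar."""
    return [
        f for f in findings
        if f.category in enabled_categories and f.confidence >= min_confidence
    ]


# Illustrative findings; in practice these would come from the trained model.
findings = [
    Finding("sql-injection", "security", 0.97),
    Finding("weak-hash", "security", 0.62),      # right category, too uncertain
    Finding("long-function", "style", 0.95),     # confident, but not enabled yet
]

reported = triage(findings)
print([f.rule for f in reported])
```

Widening `enabled_categories` or lowering `min_confidence` over time is one way to expand scope once the team trusts the early results.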
2d ago

Edge AI Explained: Running Smarter Models on Tiny Devices

Running intelligence directly on constrained hardware (smartwatches, industrial sensors, smart cameras) is no longer a niche research problem. It's a core skill for modern developers. This episode of Development digs into the practical side of edge AI, drawing on the in-depth guide to integrating AI in edge computing and IoT to explain what it actually takes to deploy capable models on devices with severe memory, power, and connectivity limits.

Here's what the episode covers:

Why edge AI matters now: The core case for moving inference closer to where data is generated: cutting latency, protecting user privacy, and reducing the real financial cost of constant cloud round-trips.
Model compression techniques: A clear breakdown of pruning (stripping low-value connections from a neural network), quantization (shrinking weight precision from 32-bit floats down to 8-bit integers), and knowledge distillation (training a compact "student" model to mimic a larger "teacher").
Choosing the right hardware: Why the physical layer is a first-class engineering decision, from bare microcontrollers to purpose-built edge TPUs and NVIDIA Jetson boards, and how hardware choice shapes every optimization decision downstream.
Frameworks built for constrained environments: How tools like TensorFlow Lite are designed to operate within tight memory budgets and deliver usable inference speeds without demanding resources that edge devices don't have.
Real-world applications: Concrete examples across predictive maintenance on industrial equipment, continuous health monitoring on wearables, and local inference on smart home devices, all cases where edge AI is already delivering measurable value.
The hybrid edge-cloud model: Why the choice between edge and cloud isn't binary, and how the most effective systems use edge devices for fast, continuous local decisions while escalating genuinely complex cases to the cloud for deeper analysis.

The episode also addresses a frequently overlooked dimension: security. Edge devices are often deployed in remote, unmonitored locations, sometimes with default credentials and unencrypted communication channels. The argument is direct: security has to be designed in from day one, not patched on after deployment.

For developers looking to go deeper on any of these topics, the source article on running AI models on IoT devices is a thorough companion read. And if you're thinking about the broader technology landscape surrounding these decisions, the Development episode Best Web Development Stacks to Use in 2026 is worth your time as well.
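The quantization step mentioned in this episode's notes (32-bit floats down to 8-bit integers) can be sketched in plain Python. This is a minimal affine-quantization example under stated assumptions: the `quantize_int8` and `dequantize` helpers and the sample weights are invented for illustration, and real toolchains such as TensorFlow Lite add calibration and per-channel handling that are not shown here.

```python
def quantize_int8(weights):
    """Affine quantization: map floats onto the int8 range [-128, 127]."""
    lo = min(min(weights), 0.0)   # widen the range so 0.0 stays representable
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255 or 1.0            # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)     # the int8 value that represents 0.0
    quantized = [
        max(-128, min(127, round(w / scale) + zero_point))  # clamp to int8
        for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]   # illustrative values only
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now fits in one byte instead of four, and the reconstruction error stays on the order of one quantization step (`scale`), which is the trade-off the episode describes.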
3d ago

Best Web Development Stacks to Use in 2026

Stack decisions are among the most consequential choices a developer or technical founder makes, and they're often made too quickly, too early, or for the wrong reasons. This episode of Development uses the best web development stacks guide for 2026 as its foundation, offering a structured tour of today's most relevant technologies and a practical framework for choosing between them before a single line of code is written. The episode moves through the three layers of the modern web stack (front end, back end, and full stack), comparing the leading options at each level, then closes with four decision-making criteria that should weigh more heavily than hype or habit.

Here's what's covered:

Front-end frameworks compared: React's component-based, virtual DOM approach for dynamic UIs; Angular's opinionated, TypeScript-driven architecture for large enterprise teams; and Vue's progressive, incremental adoption model for smaller teams prioritizing speed.
Back-end stacks examined: Node.js for unified JavaScript across the full application and real-time performance; Ruby on Rails for rapid early-stage iteration with convention-over-configuration defaults; and Django for Python-native projects involving data science, AI, or ML integration.
Full-stack combinations unpacked: MERN and MEAN as all-JavaScript ecosystems differentiated mainly by React vs. Angular on the front end; and LAMP as the battle-tested, cost-effective foundation still powering a large share of the web.
Scalability and performance: Why the right stack for a startup today may buckle under the demands of a scaled platform in two to three years.
Community, ecosystem, and hiring: How the activity level around a technology affects open-source tooling, bug support, and the size of the developer pool you can draw from.
Learning curve and total cost: The often-underestimated role of team expertise and honest infrastructure cost modeling in narrowing down viable options.

The episode's central argument is straightforward but easy to ignore under deadline pressure: the best stack is the one that fits your specific project requirements, team skills, timeline, and growth trajectory, not the one that's currently trending. Smart stack decisions start with requirements and work backward to tools, never the reverse.

For more from the show, check out the episode Why Your GPU Is Loafing: Optimizing Deep Learning Training at Scale, which digs into performance optimization at the infrastructure level.
3d ago

Why Your GPU Is Loafing: Optimizing Deep Learning Training at Scale

Provisioning a powerful GPU only to watch utilization flatline is one of the most common, and costly, frustrations in deep learning. This episode of Development digs into the systemic reasons why large-scale training runs underperform, drawing on an in-depth guide to optimizing GPU utilization for large-scale deep learning models. Rather than hunting for a single silver-bullet fix, the episode frames GPU performance as an interconnected system where small inefficiencies compound, and where targeted, methodical changes add up fast.

Here's what the episode covers:

Data pipeline bottlenecks: Why a slow or single-threaded data loader is often the first and most impactful culprit, and how parallel workers in PyTorch and TensorFlow keep the GPU fed between batches.
Storage layer choices: How the difference between spinning drives, SSDs, and network-attached storage quietly shapes throughput, especially on large datasets.
Batch size trade-offs: Why bigger isn't always better (very large batches can hurt generalization and destabilize training), and why incremental tuning beats guesswork.
Mixed precision training: How using 16-bit floats for most operations while preserving 32-bit precision where it counts can meaningfully boost throughput and reduce memory pressure, with minimal risk when using modern framework APIs.
Data and model parallelism: The distinction between splitting data across GPUs and splitting the model itself, when each approach is the right tool, and what to watch out for when scaling to multi-node setups.
Profiling and system balance: Why skipping built-in profiling tools leaves optimization as guesswork, and how CPU capacity, RAM, and network bandwidth all feed into the GPU performance equation.

The episode closes with a strong case for disciplined, documented iteration (changing one variable at a time and recording outcomes) as the practice that separates engineers who consistently improve training runs from those who spin their wheels.

For more on the data infrastructure side of ML performance, check out the Development episode Zero-Copy Data Pipelines: What Apache Arrow Actually Does for ML.
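To make the mixed precision point in this episode's notes concrete, here is a small standard-library sketch of why 32-bit precision is preserved "where it counts". It is not framework code: it simply rounds a running sum to half precision or single precision after every addition, using the `struct` module's IEEE 754 formats. The step size and iteration count are arbitrary values chosen for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


step = 0.001      # stands in for many small updates, e.g. tiny gradients
n_steps = 10_000  # the exact sum would be 10.0

acc16 = 0.0
acc32 = 0.0
for _ in range(n_steps):
    acc16 = to_fp16(acc16 + to_fp16(step))  # accumulator kept in 16 bits
    acc32 = to_fp32(acc32 + to_fp32(step))  # accumulator kept in 32 bits

print("fp16 accumulator:", acc16)
print("fp32 accumulator:", acc32)
```

The 16-bit accumulator stops growing once the step is smaller than half the spacing between representable values, while the 32-bit one stays close to 10. That is the usual motivation for keeping accumulations and master weights in 32-bit even when most arithmetic runs in 16-bit.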
5d ago

Zero-Copy Data Pipelines: What Apache Arrow Actually Does for ML

Data wrangling is the silent tax on every machine learning project: endless format conversions, redundant buffer copies, and library-to-library shuffling that eats hours without producing a single model improvement. This episode of Development takes a practical look at Apache Arrow and the zero-copy pipeline architecture it enables, drawing on a deep-dive article on zero-copy data pipelines for ML workloads to separate genuine capability from inflated hype. The episode walks through how Arrow works, why its columnar memory layout is a natural fit for ML workloads, and, most valuably, systematically dismantles five misconceptions that keep developers from adopting it.

Here's what's covered:

What zero-copy actually means: Arrow defines a standardized columnar in-memory format so multiple tools and languages can reference the same data block directly, bypassing redundant copies and format translations.
Why columnar storage suits ML: Machine learning operations typically sweep across feature columns rather than individual rows; a contiguous memory layout lets CPUs and GPUs apply vectorized operations efficiently.
Myth #1, zero-copy means zero setup: Sharing memory across environments requires deliberate configuration; expect upfront work in exchange for lasting pipeline gains.
Myth #2, Arrow is only for enterprise-scale data: Any pipeline with repeated transformations benefits, whether you're a solo developer or a startup engineering team.
Myth #3, Arrow fixes everything: It's a high-quality component, not a cure-all; model inefficiency and upstream data quality problems still need separate attention.
Myths #4 and #5, it requires C++ and isn't production-ready: Mature Python bindings (including pandas interoperability) make Arrow accessible at a high level, and its integration into systems like Apache Spark confirms it's well past the experimental stage.

The episode closes with a practical recommendation: isolate one data-conversion bottleneck in your existing pipeline, swap in Arrow-native operations, and measure the difference before committing to a broader rollout. Incremental gains on high-frequency data transfers compound quickly across training runs, cutting both iteration time and infrastructure costs.

If you enjoyed this episode, you might also want to check out Why Custom CUDA Kernels Could Be Your Deep Learning Secret Weapon for another look at squeezing real performance out of your ML stack.
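The zero-copy idea summarized in this episode's notes can be illustrated without Arrow itself. The sketch below is an analogy using only Python's standard library: each column is one contiguous typed buffer, and `memoryview` lets several consumers reference that buffer without copying it. The column names and the `column_view` helper are made up for this example, and none of this is the Arrow API.

```python
from array import array

# A tiny "columnar" table: each feature lives in one contiguous typed buffer.
columns = {
    "age": array("d", [31.0, 45.0, 27.0, 52.0]),
    "income": array("d", [48.0, 91.5, 39.0, 120.0]),
}


def column_view(table, name):
    """Return a zero-copy view over one column's underlying buffer."""
    return memoryview(table[name])


age = column_view(columns, "age")
first_two = age[:2]                # slicing a memoryview does not copy either
copied = columns["age"].tolist()   # tolist() builds an independent copy

columns["age"][0] = 32.0           # write through the owning array
columns["age"][1] = 46.0

print(first_two.tolist())  # the view sees both writes
print(copied[:2])          # the copy still holds the old values
```

The view and the owner share memory, so a write is visible to both, while the copy diverges. Arrow applies the same principle across libraries and languages by standardizing the buffer layout.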
5d ago

Why Custom CUDA Kernels Could Be Your Deep Learning Secret Weapon

GPU hardware is only as useful as the code running on it. For deep learning teams chasing faster training loops and tighter inference times, the bottleneck isn't always the model or the data pipeline; sometimes it's the abstraction layer between your workload and the silicon. This episode of Development explores building custom CUDA kernels for deep learning performance, making the case that going low-level isn't just for systems programmers: it's a practical tool for anyone serious about squeezing the most out of their GPU.

The episode walks through the full arc of writing, integrating, and optimizing a custom CUDA kernel, covering:

What CUDA kernels actually are: Functions that execute in parallel across thousands of GPU threads, each handling a small slice of your data, rather than running once on a single processor.
Why built-in library kernels fall short: PyTorch and TensorFlow ship highly tuned kernels for common operations, but those kernels must handle every possible edge case; a custom kernel only has to handle yours, and that specificity is where the speed lives.
The GPU execution model: Understanding how threads, blocks, grids, and shared memory fit together is the foundation for writing kernels that are efficient rather than just correct.
Key performance concepts: Memory coalescing (keeping consecutive threads on consecutive addresses), shared memory (loading data once for a whole block instead of hitting slow global memory repeatedly), and warp efficiency (minimizing branch divergence so no threads sit idle).
Integrating with existing frameworks: Both PyTorch and TensorFlow offer extension mechanisms so a custom kernel can be called from Python just like any native operation, keeping it inside your actual training pipeline.
Testing, debugging, and profiling: GPU bugs can be subtle, producing output that is nearly correct; rigorous output verification and tools like NVIDIA Nsight Systems and Nsight Compute are essential for catching errors and pinpointing the next bottleneck to fix.

The episode is candid about the trade-off: custom kernels mean taking on memory management, thread organization, and low-level error handling, real costs that generic library calls spare you from. But for teams working with novel architectures, non-standard data transformations, or production latency targets that off-the-shelf ops can't meet, that investment in control pays dividends that compound across every training run.

More from the show: What Your Food Truck Website Is Missing — And Why It Matters.
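The memory coalescing concept from this episode's notes can be reasoned about with simple arithmetic, sketched here in Python rather than CUDA. The model is deliberately simplified: it assumes a 1-D launch, 4-byte elements, and a 128-byte memory segment per transaction, and it only counts how many distinct segments one warp touches. The `segments_touched` helper and these constants are assumptions for illustration, not measurements from real hardware.

```python
WARP_SIZE = 32        # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128   # assumed size of one global-memory transaction
ELEM_BYTES = 4        # one float32 element


def global_thread_id(block_idx, block_dim, thread_idx):
    """Same arithmetic as blockIdx.x * blockDim.x + threadIdx.x in a 1-D launch."""
    return block_idx * block_dim + thread_idx


def segments_touched(stride, block_idx=0, block_dim=256):
    """Count distinct segments read by a block's first warp.

    Thread t reads the element at index global_thread_id(t) * stride.
    """
    addresses = [
        global_thread_id(block_idx, block_dim, t) * stride * ELEM_BYTES
        for t in range(WARP_SIZE)
    ]
    return len({addr // SEGMENT_BYTES for addr in addresses})


coalesced = segments_touched(stride=1)   # consecutive threads, consecutive addresses
strided = segments_touched(stride=32)    # each thread lands in its own segment

print("segments with stride 1:", coalesced)
print("segments with stride 32:", strided)
```

Under these assumptions the unit-stride warp is served by a single segment, while the strided warp needs one segment per thread, which is why access patterns matter as much as arithmetic in kernel design.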
6d ago

What Your Food Truck Website Is Missing — And Why It Matters

Great food alone doesn't keep customers coming back; they have to be able to find you first, and then stay connected between visits. This episode of Development explores how food truck owners can transform a bare-bones website into a 24/7 business-building tool, drawing on 11 essential elements for a food truck website to help operators close the gap between good food and loyal regulars. The conversation covers a wide range of practical, actionable improvements, from the basics that most food truck sites get wrong to the softer touches that quietly build community.

Here's what's discussed:

Real-time location and schedule visibility: Why an interactive, up-to-date map solves the "I can't find you" problem before it costs you a sale, and how your website and social channels should reinforce rather than duplicate each other.
The case for keeping old schedules published: Historical event listings signal consistency and reliability to new visitors, functioning as passive trust-building at zero extra cost.
Food photography done right: Why poor photos actively drive customers away, what a professional shoot is actually worth, and how to get strong results with a smartphone when the budget is tight.
Frictionless menu access and online ordering: Your menu link should go straight to your menu, not a third-party login page, and integrated ordering options can meaningfully convert browsers into buyers.
Authentic behind-the-scenes content and email newsletters: Candid glimpses of truck life build genuine loyalty, while a direct email list remains one of the most algorithm-proof tools a small food business has.
Rounding out a professional presence: Visible contact info on every page, embedded social proof, a simple feedback form, and even recipes all contribute to a site that gives visitors reasons to stay, share, and return.

The throughline of the episode is straightforward: the food trucks that build lasting followings aren't just the ones making the best food; they're the ones making it effortless to stay connected. The episode is based on a piece by Timothy Carter published at dev.co.

More from the show: Writing Efficient Memory Allocators for PyTorch Extensions.
Jul 13

Writing Efficient Memory Allocators for PyTorch Extensions

Building a custom PyTorch extension is hard enough, but for engineers targeting specialized hardware or unconventional data pipelines, the default memory management layer can quietly become the biggest performance bottleneck of all. This episode of Development draws on an in-depth guide to writing efficient memory allocators for PyTorch extensions to walk through everything from the fundamentals of PyTorch's memory model to practical pooling strategies, debugging techniques, and the discipline of knowing when not to over-engineer.

Here's what the episode covers:

When custom allocators are actually necessary: The specific scenarios (hardware alignment requirements, repetitive tensor shapes, unusual data structures) where PyTorch's excellent built-in caching still isn't enough.
How PyTorch's memory model works under the hood: Understanding the C++ Allocator interface and why any custom allocator must cooperate with PyTorch's reference tracking rather than work around it.
Alignment and layout as foundational performance levers: Why 64-byte CPU alignment and 256-byte GPU alignment can meaningfully reduce overhead, and how data layout choices affect memory streaming speed.
Memory pooling to fight fragmentation: How pre-allocating and reusing fixed-size blocks avoids the repeated cost of malloc/free cycles and keeps performance stable across long training runs.
Debugging strategies built in from day one: Using canary bytes to detect buffer overruns, verbose logging for allocation events, and PyTorch's own torch.cuda.memory_summary() to monitor custom allocator behavior alongside the default.
Hybrid approaches, pinned memory, and the transfer cost dimension: Why delegating irregular tensor shapes to PyTorch's default allocator often makes more sense than replacing it entirely, and how pinned memory and batched transfers reduce PCIe overhead.

The episode closes with a case for restraint: measure real bottlenecks before building complex pooling hierarchies, and let the data, not assumptions, drive how much custom logic you actually need.

For more from the show on machine learning engineering in practice, check out the episode AI-Assisted Data Labeling: How Active Learning Loops Change the Game.
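The pooling and alignment ideas in this episode's notes can be sketched in a few lines of plain Python. This is a toy model only: `align_up` and `FixedBlockPool` are hypothetical names, the pool hands out byte offsets into a `bytearray` rather than real device pointers, and it does not implement PyTorch's C++ Allocator interface. Offsets are aligned relative to the start of the slab; Python gives no guarantee about the slab's own base address.

```python
def align_up(nbytes, alignment):
    """Round nbytes up to the next multiple of alignment (a power of two)."""
    return (nbytes + alignment - 1) & ~(alignment - 1)


class FixedBlockPool:
    """Pre-allocates one slab and hands out fixed-size blocks from a free list."""

    def __init__(self, block_bytes, num_blocks, alignment=64):
        self.block_bytes = align_up(block_bytes, alignment)
        self._slab = bytearray(self.block_bytes * num_blocks)  # single upfront allocation
        self._free = [i * self.block_bytes for i in range(num_blocks)]
        self._in_use = set()

    def allocate(self):
        """Return the offset of a free block without allocating new memory."""
        if not self._free:
            raise MemoryError("pool exhausted")
        offset = self._free.pop()
        self._in_use.add(offset)
        return offset

    def release(self, offset):
        """Return a block to the free list so a later allocate() can reuse it."""
        if offset not in self._in_use:
            raise ValueError("double free or unknown offset")
        self._in_use.remove(offset)
        self._free.append(offset)

    def view(self, offset):
        """Zero-copy window onto one block of the slab."""
        return memoryview(self._slab)[offset:offset + self.block_bytes]


pool = FixedBlockPool(block_bytes=100, num_blocks=4)  # 100 rounds up to 128
first = pool.allocate()
pool.release(first)
second = pool.allocate()   # the block just released is handed out again
print(pool.block_bytes, first, second)
```

Because every block has the same size, releasing and reallocating never fragments the slab, which is the property the episode attributes to pooling. A real allocator would add the canary bytes and logging the notes mention.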



Creator: Eric Lamanna
Years Active: 2026
Episodes: 47
Rating: Clean
Show Website: DEV
