Advanced text-to-speech (TTS) technology has let Siri and Alexa answer questions and follow commands for roughly a decade. Now, the latest in TTS may mean TTFN – ta-ta for now! – for traditionally produced audiobooks.
Audiobooks are an increasingly important piece of the revenue pie for publishers. In 2020, audiobook sales topped $1.3 billion, a 12% jump over 2019. With few exceptions, these recordings are voiced by human narrators: authors themselves as well as actors and other artists.
AI-enabled automated audiobook creation, however, lies just beyond the horizon, says electronic publishing analyst Thad McIlroy. The shift from human to synthetic voices, he reports, would lower production costs and widen the choice of titles, and it would also mean a lot less work for Hollywood actors between pictures.
“For certain books, it is simply not economical to bring in talent at a high price level, to go into the studio, and to have sophisticated production values,” McIlroy says. “So the choice becomes an either/or. It’s a binary choice. Are we going to get an audiobook for a backlist title or not? Because if we have to go through traditional production methods, it’s simply not going to be financially possible.”
What stands in the way of the machines, though, is a contractual requirement of the leading self-publishing audiobook platform, ACX, part of Amazon’s Audible service. “Your submitted audiobook must be narrated by a human,” according to the submission requirements. “TTS recordings are not allowed.”
“That’s a big hurdle, and all the [TTS] vendors recognize it. That hurdle will likely go away as soon as Amazon makes a larger commitment itself into using this automated technology,” McIlroy tells CCC’s Chris Kenneally. “Every expert that I spoke to felt that Amazon’s going to come along and endorse it 12 months, 24 months from now. But in the meantime, vendors are stuck using alternate distribution channels.”