
Will audio ‘fact checkers’ be able to help when fake clips of prominent people are doing the rounds?

Today I read the unsurprising but kind of terrifying news that a machine learning model can turn text into a pretty spot-on recreation of a spoken voice, using Joe Rogan’s recognisable baritone as a test case.

I wanted to know if there’s a way to tell the difference between a voice that was actually recorded and one that was synthesised — could this fake Joe really be ‘perfect’? Would slight imperfections allow experts (or software) to act as fact-checkers for potentially synthesised voices?

My first port of call for any sound/music question is Bevan Smith, Wellington-based musician, composer and sound engineer (full disclosure: he also happens to be my husband). So, through the magic of a text conversation, here’s what I learned:

It’s possible for a human to tell the difference between a recording of Joe Rogan and the ‘faux’ Joe. Once Bevan had a baseline of what ‘real’ Joe sounded like, with some effort he could differentiate between the two. His initial reaction was “the faux sounds slightly higher pitched… it’s constricted and kinda more nasal”.
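If you wanted to put rough numbers on impressions like “slightly higher pitched” or “more nasal”, one way is to compare basic acoustic features of a known-real clip and a suspected synthetic one. Here’s a minimal sketch using the librosa audio library; the file names are hypothetical placeholders, and this is a crude comparison, not a validated fake-voice detector:

```python
import numpy as np
import librosa


def voice_stats(path, sr=16000):
    """Return median pitch and spectral centroid for an audio clip."""
    y, sr = librosa.load(path, sr=sr)
    # Track the fundamental frequency (perceived pitch) over the clip
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C5"), sr=sr
    )
    # Spectral centroid: a rough proxy for how "bright" or "nasal" a voice sounds
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    return {
        "median_f0_hz": float(np.nanmedian(f0)),
        "median_centroid_hz": float(np.median(centroid)),
    }


# Hypothetical file names for a real recording and a synthesised one
real = voice_stats("real_joe_clip.wav")
faux = voice_stats("synthesised_joe_clip.wav")
print("real:", real)
print("faux:", faux)
```

A consistently higher median pitch or centroid in the synthetic clip would line up with Bevan’s “higher pitched… more nasal” impression, though a serious analysis would need far more than two summary numbers.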

While the machine learning model might be improved over time to produce even more convincing results, Bevan said it’s very hard to fake an instrument or voice exactly. However, he added that by the time you could prove it was fake “it would be too late… people would believe it and that would be it. If you had a video of someone playing a violin but had ‘sampled’ violin playing, 999 people out of 1000 people would believe it was recorded. With the voice, maybe [it would be] 1000 out of 1000, I wouldn’t question it.”

So if this kind of technology became readily available and was used for nefarious purposes, it might be possible to ‘myth bust’ a recording, but by the time that happened, the damage would be done. Still, the fact that an expert can ‘tell the difference’ at all gives me a bit of hope.

Like OpenAI’s GPT-2 language model, this speech synthesis model represents a big step in what AI models can do, edging their capabilities into an ever more human-like realm. As with GPT-2, this new model won’t be released publicly, to mitigate potential misuse, but it’s likely this type of tech will make its way out into the world before long. Now’s the time to make a plan for what happens next.

-AP