March 30, 2026
Fix Bad Audio Instantly? Inside Waves’ Voice Regeneration Tool

What happens when audio is so bad… even the best plugins can’t save it?
In this episode of The Pro Audio Suite, we’re joined by Michael “Gomez” Pearce-Adams from Waves Audio to talk about Voice Regeneration, a new AI-driven tool designed to rebuild poor-quality dialogue into something actually usable.
And this isn’t just noise reduction or EQ… it’s a complete rethink of how broken audio gets fixed.
We dig into:
- Why this will never be a plugin
- How AI is actually “rebuilding” voice recordings
- Real-world use cases, from podcast disasters to phone interviews
- What this means for audio engineers and post-production
- The limits of AI audio, and where it still falls over
There’s also a bigger conversation here about where the industry is heading, and whether tools like this are helping or hurting the craft.
If you’ve ever been handed unusable audio and told “can you fix this?”, this one’s for you.
🎙️ What You’ll Hear in This Episode
- Why traditional tools fail on truly bad audio
- The rise of content creators as the biggest audio market
- Cloud processing vs plugin workflows
- Ethical concerns around AI and audio data
- Accent bias in AI models and why it matters
- Real examples of voice regeneration in action
- How fast AI processing is changing expectations
🔗 Links & Resources
- 🌐 Waves Voice Regeneration: https://www.waves.com/voice-regeneration
- 🎧 More episodes: https://www.proaudiosuite.com
- 💬 Join the conversation: https://www.facebook.com/groups/357898255543203
🙌 Sponsors
Thanks to our mates at:
- Tri-Booth – Portable vocal booths built for voice pros. Use code TRYPAS200 for $200 off.
- Austrian Audio – Making passion heard with world-class microphones like the OC818 and OC18.
1
00:00:00,080 --> 00:00:00,440
Y'all ready?
2
00:00:00,440 --> 00:00:01,800
Be history. Get started.
3
00:00:01,800 --> 00:00:02,320
Welcome.
4
00:00:02,320 --> 00:00:03,680
Hi. Hi. Hi.
5
00:00:03,680 --> 00:00:06,600
Hello, everyone, to The Pro Audio Suite.
6
00:00:06,600 --> 00:00:08,760
These guys are professional and motivated.
7
00:00:08,760 --> 00:00:09,680
Please take the video.
8
00:00:09,680 --> 00:00:12,960
Stars George Whittam,
founder of Source Elements Robert Marshall,
9
00:00:12,960 --> 00:00:16,960
international audio engineer Darren Robin
Roberts and global voice Andrew.
10
00:00:17,440 --> 00:00:21,240
Thanks to Tri-Booth, Austrian Audio, making
passion heard, Source Elements,
11
00:00:21,400 --> 00:00:25,360
George, the tech wisdom and Rob and APIs
international demos.
12
00:00:25,440 --> 00:00:26,520
Find out more about us.
13
00:00:26,520 --> 00:00:29,280
Check theproaudiosuite.com. Line up.
14
00:00:29,280 --> 00:00:32,280
Ready? Here we go. I'm.
15
00:00:32,720 --> 00:00:34,480
And welcome to another Pro Audio Suite.
16
00:00:34,480 --> 00:00:37,320
Thanks to Tri-Booth. Don't
forget the code.
17
00:00:37,320 --> 00:00:41,240
TRYPAS200 to get $200 off yours.
18
00:00:41,240 --> 00:00:45,000
And by the time this goes to air,
I'll have mine set up somewhere
19
00:00:45,480 --> 00:00:48,160
remotely in Australia
20
00:00:48,160 --> 00:00:51,160
and Austrian Audio, making passion heard.
21
00:00:51,880 --> 00:00:54,880
We're hearing from our special guest
today, Michael Pearce-Adams.
22
00:00:54,960 --> 00:00:56,840
Or as we like to call him down here.
23
00:00:56,840 --> 00:01:00,200
Gomez from Waves
Audio has a brand new product,
24
00:01:00,200 --> 00:01:04,000
which I have to admit,
I think I looked at, but I can't remember.
25
00:01:04,360 --> 00:01:09,360
It's all to do with hormone treatment
that is messing with my brain.
26
00:01:09,520 --> 00:01:12,520
I think the funniest thing
about that entire intro is
27
00:01:12,520 --> 00:01:16,440
it sounds like a testimonial on channel
ten at all. Yep.
28
00:01:16,480 --> 00:01:18,120
At 11:00 at night?
29
00:01:18,120 --> 00:01:18,480
Yeah.
30
00:01:18,480 --> 00:01:19,560
It's funny you should mention that.
31
00:01:19,560 --> 00:01:21,840
That's Andrew's pedigree. Yeah.
32
00:01:21,840 --> 00:01:24,400
I mean, a guy called Brad Turner
33
00:01:24,400 --> 00:01:28,560
used to film those in Melbourne years
and years ago, like, 1990.
34
00:01:28,920 --> 00:01:29,360
It's like.
35
00:01:29,360 --> 00:01:33,960
And we've got Brad and Michael here from
double TFM to talk about a new product.
36
00:01:34,480 --> 00:01:36,400
Hang on a minute.
37
00:01:36,400 --> 00:01:37,920
Were you doing that? Yeah,
38
00:01:38,960 --> 00:01:39,960
I used to go down there.
39
00:01:39,960 --> 00:01:42,960
I was with Brad. It was a double T.
40
00:01:43,040 --> 00:01:44,840
Yeah. BT in me.
41
00:01:44,840 --> 00:01:45,480
It was fun.
42
00:01:45,480 --> 00:01:47,080
Oh my God, we must have crossed paths.
43
00:01:47,080 --> 00:01:50,560
John Dramatis was also involved
at some point.
44
00:01:50,560 --> 00:01:52,440
We all used to make money like that.
45
00:01:52,440 --> 00:01:54,240
Those were the years indeed.
46
00:01:54,240 --> 00:01:56,480
Anyway, anyway,
thanks guys, for having me here.
47
00:01:56,480 --> 00:01:57,600
It's been a pleasure.
48
00:01:57,600 --> 00:01:59,800
I guess I'll see you next time, right?
Yeah. Thanks.
49
00:01:59,800 --> 00:02:01,880
Yeah. And, Good idea. Brad,
if you're listening to.
50
00:02:01,880 --> 00:02:06,520
Yeah, tell us about this new product,
because for some reason,
51
00:02:06,520 --> 00:02:09,360
I thought I would be good to help you
with the beta testing,
52
00:02:09,360 --> 00:02:11,840
so I know all about it, but
you should probably fill in our listeners.
53
00:02:11,840 --> 00:02:15,680
So the one of the first things that I need
54
00:02:15,680 --> 00:02:20,400
to mention about what we call
Voice Regen is it's
55
00:02:20,400 --> 00:02:24,600
not a plug in
and it will never be a plug in.
56
00:02:24,600 --> 00:02:26,920
And we can talk about that part
later down the line.
57
00:02:26,920 --> 00:02:31,840
But one of the largest and fastest
growing,
58
00:02:32,360 --> 00:02:35,520
audiences, or of users
59
00:02:35,520 --> 00:02:38,720
who create some kind of audio
in the world,
60
00:02:39,640 --> 00:02:44,920
not DAW users or video
editor users, the content creators.
61
00:02:44,920 --> 00:02:50,440
And, one of the things that I thought
really long and hard about was
62
00:02:51,960 --> 00:02:53,400
they might not
63
00:02:53,400 --> 00:02:56,840
understand how to fix bad audio,
64
00:02:56,840 --> 00:03:01,320
but they do understand
what bad audio sounds like.
65
00:03:01,320 --> 00:03:03,680
So let's give them a solution.
66
00:03:03,680 --> 00:03:09,200
So we spent about three and a half years
working on building a very large language
67
00:03:09,200 --> 00:03:13,880
model, and training it on
what good audio sounds like.
68
00:03:14,400 --> 00:03:17,840
And we've come up with this product
called Voice Regen,
69
00:03:17,840 --> 00:03:21,160
which can make even the worst
70
00:03:22,400 --> 00:03:25,680
near-irretrievable dialog sound
71
00:03:26,840 --> 00:03:28,040
nearly like a podcast.
72
00:03:28,040 --> 00:03:30,640
mic and very, very usable.
73
00:03:30,640 --> 00:03:33,640
And on the website,
some of the demos I use,
74
00:03:33,960 --> 00:03:37,440
actually most of the demos
are very Australian because, you know,
75
00:03:37,440 --> 00:03:41,480
I have access to a lot of content
creators here, but
76
00:03:41,760 --> 00:03:45,200
literally one of the cases was a podcast,
77
00:03:45,280 --> 00:03:49,160
recording from Riverside, where on video
78
00:03:49,680 --> 00:03:53,040
the guest looked like they were recording
into a nice microphone.
79
00:03:53,040 --> 00:03:54,440
But the reality was
80
00:03:54,440 --> 00:03:55,120
they were actually
81
00:03:55,120 --> 00:03:58,320
recording on their MacBook
microphone, which was across the room.
82
00:03:59,440 --> 00:04:02,640
And in that came a classic mistake.
83
00:04:02,640 --> 00:04:03,120
Common.
84
00:04:03,120 --> 00:04:06,680
Well, it's a hugely common problem,
but it's also one
85
00:04:06,680 --> 00:04:08,280
that is emotionally very,
86
00:04:08,280 --> 00:04:12,040
very stressful for somebody who thinks
they've got great content and realizes
87
00:04:12,400 --> 00:04:16,440
it could be completely and utterly useless
because of that fact.
88
00:04:17,040 --> 00:04:21,440
And this is kind of why I built
Voice Regen, not for the people who
89
00:04:22,320 --> 00:04:25,960
mostly control the content,
but for all of those out of control
90
00:04:25,960 --> 00:04:29,040
scenarios where bad things happen
91
00:04:29,680 --> 00:04:32,440
and you have to work out,
how do I retrieve this?
92
00:04:32,440 --> 00:04:33,560
Do I pay somebody?
93
00:04:33,560 --> 00:04:38,360
Do I learn how to use a tool
that I've never seen before?
94
00:04:38,400 --> 00:04:40,720
Is this something
I can just drag and drop it onto
95
00:04:40,720 --> 00:04:43,200
and have it fix it for me,
which is what we built.
96
00:04:43,200 --> 00:04:46,520
It's quite seriously some of
those examples that you have up there,
97
00:04:47,840 --> 00:04:48,600
the normal
98
00:04:48,600 --> 00:04:51,720
plug in would probably be of no help
anyway.
99
00:04:51,720 --> 00:04:53,120
They're so far gone.
100
00:04:53,120 --> 00:04:54,120
Yeah, that you would
101
00:04:54,120 --> 00:04:57,200
only be detrimental to the audio
you were trying to keep anyway.
102
00:04:57,280 --> 00:04:59,400
And obviously,
if you think about the amount of time
103
00:04:59,400 --> 00:05:00,920
that it would, it would take.
104
00:05:00,920 --> 00:05:04,960
So for example, Robbo, if you've been paid
to take one of those pieces of audio,
105
00:05:05,240 --> 00:05:06,880
you'd spend a few hours on it.
106
00:05:06,880 --> 00:05:09,880
You'd still go back to it
and go, no, I'm still not happy with it.
107
00:05:11,360 --> 00:05:15,400
And one of the first things that we had
when we put the first demos out
108
00:05:15,400 --> 00:05:19,560
was people from, you know, our existing
audience, which are all plugin users,
109
00:05:19,560 --> 00:05:22,560
which we're very grateful for, saying,
make this a plugin,
110
00:05:22,840 --> 00:05:24,200
and we're constantly explaining.
111
00:05:24,200 --> 00:05:27,240
Now, the reason
this will never be a plugin
112
00:05:27,240 --> 00:05:32,640
is because it would break the
CPU and GPU on any computer.
113
00:05:33,640 --> 00:05:34,160
It doesn't
114
00:05:34,160 --> 00:05:37,600
matter if it's an M5
that came out last week or,
115
00:05:38,120 --> 00:05:41,520
you know, a really powerful PC,
it just would break it.
116
00:05:42,640 --> 00:05:46,760
It's not possible. This takes
too much energy and too much power.
117
00:05:47,360 --> 00:05:49,120
So it's in the cloud.
118
00:05:49,120 --> 00:05:51,160
Tell us, what are we doing to the audio?
119
00:05:51,160 --> 00:05:53,800
I mean, it's called Voice Regen.
120
00:05:53,800 --> 00:05:56,360
We're literally
well we're literally regenerating.
121
00:05:56,360 --> 00:05:58,600
So we're listening to what it is.
122
00:05:58,600 --> 00:06:00,680
We're taking the tonality.
123
00:06:00,680 --> 00:06:05,120
As much as we can find,
and we're literally rebuilding it
124
00:06:05,120 --> 00:06:09,160
so that it is
what the LLM expects it to be.
125
00:06:09,600 --> 00:06:14,160
One of the examples
that displays that fairly clearly on
126
00:06:14,160 --> 00:06:17,520
the website is a Creative Commons
public recording
127
00:06:17,520 --> 00:06:21,320
from one of the 60s trips to the moon,
128
00:06:21,600 --> 00:06:24,880
where we take the NASA
129
00:06:24,880 --> 00:06:28,640
radio from the moon and make it sound
like he's talking into a podcast mic.
130
00:06:30,400 --> 00:06:32,000
And you can tell it's the same person.
131
00:06:32,000 --> 00:06:35,520
It's just, one minute he's
on the moon, and the next minute he's not.
132
00:06:35,960 --> 00:06:38,760
And so this.
133
00:06:38,760 --> 00:06:40,440
But was he that was he on the moon?
134
00:06:40,440 --> 00:06:43,800
I wasn't going to go there,
but I didn't want to, you know. So.
135
00:06:44,360 --> 00:06:48,120
And my point would be that, Robert,
there's hope for our podcast.
136
00:06:48,120 --> 00:06:49,040
Yes, man.
137
00:06:49,040 --> 00:06:50,040
Yes. That's right.
138
00:06:50,040 --> 00:06:52,880
Well, I mean,
I think I'll just stop recording right now
139
00:06:52,880 --> 00:06:54,920
and you can just capture
this directly off the phone.
140
00:06:54,920 --> 00:06:56,160
Right.
141
00:06:56,160 --> 00:06:56,960
Good enough.
142
00:06:56,960 --> 00:07:00,240
No, no you that Robert it it please,
please please don't.
143
00:07:00,280 --> 00:07:04,600
But take into account Robert that
I actually can do that with Voice Regen.
144
00:07:04,760 --> 00:07:05,680
Right. That's what I'm saying.
145
00:07:05,680 --> 00:07:07,800
I mean like live by the sword, die
by the sword.
146
00:07:07,800 --> 00:07:08,720
Let's let's do it, Rob.
147
00:07:08,720 --> 00:07:12,640
Oh, come on, I'm already doing it
with Gomez's audio actually going to be.
148
00:07:12,880 --> 00:07:15,480
And there's and there's
a couple of different ways to use this.
149
00:07:15,480 --> 00:07:17,440
And there's a couple of different things
150
00:07:17,440 --> 00:07:20,600
that I've built in
and that we're about to kind of add.
151
00:07:20,920 --> 00:07:26,080
One of them is I've given people
the ability to record directly on to the,
152
00:07:26,120 --> 00:07:29,120
in the app on the either on their phone
153
00:07:29,600 --> 00:07:31,880
or on a computer.
154
00:07:31,880 --> 00:07:33,600
And I've given them a place
to put this script.
155
00:07:33,600 --> 00:07:34,680
And you can either use it
156
00:07:34,680 --> 00:07:37,960
as a teleprompter and have it scroll,
or you can just kind of stick
157
00:07:38,000 --> 00:07:39,600
speed to zero and the script
158
00:07:39,600 --> 00:07:42,040
will stay there in front of you
so that you can actually
159
00:07:42,040 --> 00:07:43,800
look at the script
while you're talking to a mic,
160
00:07:43,800 --> 00:07:47,720
the same way a voiceover does,
and then for for a.
161
00:07:47,720 --> 00:07:49,320
And one of the reasons we did
162
00:07:49,320 --> 00:07:53,280
this is because a lot of the time,
if talent is remote,
163
00:07:53,960 --> 00:07:58,800
whether it's a podcast guest or somebody
else, you can still make them sound
164
00:07:59,320 --> 00:08:01,960
like they're,
in front of a better microphone
165
00:08:01,960 --> 00:08:05,760
and the room it
with just drag and drop and process.
166
00:08:06,320 --> 00:08:10,680
We're about to add in two weeks,
a pop out teleprompter
167
00:08:11,280 --> 00:08:12,560
so that you can actually put
168
00:08:12,560 --> 00:08:16,400
the teleprompter, on a script,
a separate screen, make it really large
169
00:08:16,680 --> 00:08:20,040
just underneath your camera
so that you can use a teleprompter.
170
00:08:20,040 --> 00:08:23,400
And we're also adding record video
onto the recording page
171
00:08:23,760 --> 00:08:26,760
so you can select a video camera,
select your microphone.
172
00:08:27,000 --> 00:08:29,800
And even if it's the MacBook microphone
and you're in the kitchen,
173
00:08:29,800 --> 00:08:32,640
we'll make it sound like you've got
you're wearing a lapel.
174
00:08:32,640 --> 00:08:34,920
So this all sets up on a server,
is that correct?
175
00:08:34,920 --> 00:08:35,640
Yeah.
176
00:08:35,640 --> 00:08:39,600
And is this part of the,
what's the bundle?
177
00:08:39,600 --> 00:08:42,160
You know, the the subscription.
178
00:08:42,160 --> 00:08:43,200
Do you get that? No, no.
179
00:08:43,200 --> 00:08:46,400
If you're
paying for Creative Access
180
00:08:46,400 --> 00:08:49,760
plugins, then you're not getting Regen.
181
00:08:49,760 --> 00:08:52,760
If you want Regen,
then just go and subscribe.
182
00:08:53,120 --> 00:08:54,920
I mean, I just made it super cheap.
183
00:08:54,920 --> 00:08:58,120
I mean, to get five hours of processing
per month,
184
00:08:58,120 --> 00:09:01,440
I'm charging $4.99 US right now.
185
00:09:02,200 --> 00:09:02,640
That's cheap.
186
00:09:02,640 --> 00:09:06,600
But I also give you,
everybody on a free account
187
00:09:06,600 --> 00:09:10,520
which needs no billing address,
no credit card or anything.
188
00:09:10,880 --> 00:09:12,600
You get five minutes free per day.
189
00:09:12,600 --> 00:09:16,160
So if you think about all the short
form content and then people could
190
00:09:16,160 --> 00:09:21,040
literally record on their phone,
with the video, not have to worry about a
191
00:09:21,040 --> 00:09:26,000
lapel, put the video straight into Voice
Regen and then post it on TikTok, etc.
192
00:09:27,480 --> 00:09:28,440
And by the way, we're
193
00:09:28,440 --> 00:09:32,680
finding the video
files are about 50% of the files
194
00:09:32,680 --> 00:09:35,680
being uploaded to Voice Regen
right now. So
195
00:09:36,080 --> 00:09:37,680
so it takes in video.
196
00:09:37,680 --> 00:09:40,480
Dymocks is it does the audio processing
remarks is a video.
197
00:09:40,480 --> 00:09:41,280
Absolutely.
198
00:09:41,280 --> 00:09:43,680
We give you back an MP4 with clean
audio.
199
00:09:43,680 --> 00:09:44,680
Here's a question then.
200
00:09:44,680 --> 00:09:48,480
And look I know it probably doesn't
need to be answered by you guys, but
201
00:09:48,480 --> 00:09:50,240
I'm sure plenty of people are asking it.
202
00:09:50,240 --> 00:09:54,440
Given the AI temperature at the moment,
shall we say,
203
00:09:54,440 --> 00:10:00,000
what happens to all these AI samples
once they've been processed?
204
00:10:00,000 --> 00:10:04,120
Your audio stays on your dashboard
for 14 days,
205
00:10:04,120 --> 00:10:06,680
and then at that point
it's automatically deleted.
206
00:10:06,680 --> 00:10:08,680
And if you look at the user dashboard, it
says,
207
00:10:08,680 --> 00:10:10,680
I think I've got it in like two places
right now.
208
00:10:10,680 --> 00:10:15,240
It says anything you have in
your files is deleted after 14 days.
209
00:10:15,800 --> 00:10:18,560
One of the things that we do not do
210
00:10:18,560 --> 00:10:21,240
is use any user's files
211
00:10:21,240 --> 00:10:24,040
to try and improve their model,
and model is our model.
212
00:10:24,040 --> 00:10:27,360
We improve it by paying for content
213
00:10:27,360 --> 00:10:31,120
from specific companies that do that
kind of thing for two reasons.
214
00:10:31,120 --> 00:10:32,920
Number one, it's because it's, you know,
215
00:10:32,920 --> 00:10:35,920
we want to stay 100% legal
with where our content comes from.
216
00:10:35,960 --> 00:10:39,400
Number two,
scraping is a really, really bad idea.
217
00:10:39,400 --> 00:10:42,400
It's unethical, it's not moral,
and it's lying.
218
00:10:42,760 --> 00:10:45,600
So we do not use any user's
219
00:10:45,600 --> 00:10:48,600
audio to help educate our model at all.
220
00:10:48,800 --> 00:10:51,800
And by the way,
there's another reason for that.
221
00:10:52,200 --> 00:10:55,920
When you've got a large language
model like this, you don't educate it
222
00:10:55,920 --> 00:10:57,520
by giving it terrible audio.
223
00:10:57,520 --> 00:11:01,040
You educate it
by giving it very high quality audio.
224
00:11:01,880 --> 00:11:05,240
It's not like
we can educate this by giving it
225
00:11:05,840 --> 00:11:09,000
Regen-processed audio,
because that's a bit like,
226
00:11:09,680 --> 00:11:13,400
you know, OpenAI's chatbot scraping
and finding more information
227
00:11:13,400 --> 00:11:15,040
that it created that was wrong.
228
00:11:15,040 --> 00:11:18,720
And it's, you know,
the beast is eating its tail.
229
00:11:18,720 --> 00:11:19,920
Yeah. It's exactly.
230
00:11:19,920 --> 00:11:21,840
Yeah. It's like it's not the way to do it.
231
00:11:21,840 --> 00:11:24,720
But you know, it's
nobody's information gets used.
232
00:11:24,720 --> 00:11:29,520
When does the 4.99 limited time end,
because it's normally 9.99, right?
233
00:11:29,520 --> 00:11:31,000
Well for the creator.
234
00:11:31,000 --> 00:11:31,360
Fine.
235
00:11:31,360 --> 00:11:35,880
Normally it's like ultimately
as far as I'm concerned,
236
00:11:36,320 --> 00:11:42,240
I will keep going on the 4.99
until I feel like I've reached,
237
00:11:42,320 --> 00:11:47,160
a point where, we're established.
238
00:11:47,200 --> 00:11:51,240
We released this on January 26th,
and in a very quiet launch
239
00:11:51,840 --> 00:11:55,920
so far, we we have about 35,000 members,
which, you know, for.
240
00:11:56,120 --> 00:11:58,480
Wow, it's a month and a half
I'm pretty proud of.
241
00:11:58,480 --> 00:11:59,920
Wow. Yes.
242
00:11:59,920 --> 00:12:03,400
We've processed
a ridiculous amount of audio
243
00:12:03,400 --> 00:12:06,720
in that time, which, I also keep tabs on,
244
00:12:07,160 --> 00:12:11,760
but I want people to keep on joining
because one of the things that happens
245
00:12:11,760 --> 00:12:16,160
when you have a free account
and you're using it is occasionally
246
00:12:16,160 --> 00:12:20,640
an error happens
or a file that you upload doesn't work.
247
00:12:20,760 --> 00:12:24,160
And those kind of errors do get flagged.
248
00:12:24,160 --> 00:12:26,320
And they help me and my team.
249
00:12:26,320 --> 00:12:30,120
And I'm looking at the data
every day to see, okay,
250
00:12:30,120 --> 00:12:33,120
I can't hear the file,
I can't see the file.
251
00:12:33,120 --> 00:12:36,840
But if it fails,
I can see what codec it was
252
00:12:37,200 --> 00:12:40,040
and kind of like
what length or what was in it,
253
00:12:40,040 --> 00:12:45,240
so that I actually can learn and improve
how we deal with errors
254
00:12:45,560 --> 00:12:49,680
and also keep on adding,
oh, there's that codec from that.
255
00:12:49,680 --> 00:12:50,400
Okay, cool.
256
00:12:50,400 --> 00:12:53,000
Let's add that.
I want to keep the price down
257
00:12:53,000 --> 00:12:57,000
so that we keep on bringing more people
in, by the way, process really fast.
258
00:12:57,000 --> 00:13:01,320
So I mean it immediately tells you
to download a WAV file, I guess that
259
00:13:01,600 --> 00:13:06,080
did you do that so that you could ensure
that there be a local copy for the user?
260
00:13:06,080 --> 00:13:06,760
Absolutely.
261
00:13:06,760 --> 00:13:10,280
And once you've done
that once, effectively what will happen
262
00:13:10,280 --> 00:13:13,280
is you press record as you just did.
263
00:13:13,440 --> 00:13:16,520
You can preview it
and go, yeah, I, I'm gonna like that.
264
00:13:16,520 --> 00:13:17,560
I'm going to process that.
265
00:13:17,560 --> 00:13:21,120
As soon as that happens, you press, okay,
it will take a copy of that
266
00:13:21,120 --> 00:13:25,200
and put it into your default downloads
folder so that you have a redundancy.
267
00:13:26,560 --> 00:13:29,880
And from my
perspective, anybody who wants to record
268
00:13:29,880 --> 00:13:34,200
anything should have a backup of that
that's not touched by any server.
269
00:13:34,200 --> 00:13:36,600
So if you haven't got a backup,
it doesn't cut it at all. Right. Yeah.
270
00:13:36,600 --> 00:13:37,680
Exactly. Exactly.
271
00:13:37,680 --> 00:13:40,920
I mean, remember this is developed
by somebody whose theory is
272
00:13:40,920 --> 00:13:43,880
there's no such thing as a backup
unless there's a backup of the backup.
273
00:13:43,880 --> 00:13:46,560
Right. You don't have a copy
unless you have two copies. Exactly.
274
00:13:46,560 --> 00:13:46,800
Yeah.
275
00:13:46,800 --> 00:13:49,080
And the other thing
is, once you click on a file,
276
00:13:49,080 --> 00:13:52,120
you have a toggle between the before
and after audio.
277
00:13:52,720 --> 00:13:56,040
And most of the time,
you can just tell by even looking
278
00:13:56,040 --> 00:13:59,040
at the WAV file
that something dramatic has happened.
279
00:13:59,040 --> 00:14:01,680
Is the processing
basically faster than real time?
280
00:14:01,680 --> 00:14:04,560
Way faster than real time. Really quick,
really quick.
281
00:14:04,560 --> 00:14:05,280
Yeah.
282
00:14:05,280 --> 00:14:06,640
So how'd you go, Robert?
283
00:14:06,640 --> 00:14:08,760
If you got something,
you can play I. George. Sorry, you guys.
284
00:14:08,760 --> 00:14:12,280
So here's here's, Robert on the phone,
I don't know, I'm going to ask Gomez.
285
00:14:12,760 --> 00:14:16,320
When does the 499 limited time not.
286
00:14:16,880 --> 00:14:19,560
It's because it's normally 999. Right.
287
00:14:19,560 --> 00:14:21,720
All right. And here's
the processed version I don't know.
288
00:14:21,720 --> 00:14:26,840
I'm going to ask Gomez,
when does the port 9900 time not.
289
00:14:27,360 --> 00:14:29,440
It's because it's normally 999, right.
290
00:14:29,440 --> 00:14:30,280
Isn't that bizarre?
291
00:14:30,280 --> 00:14:33,280
It sounds all like a phone to me.
292
00:14:33,360 --> 00:14:37,080
Yeah, yeah, let's let's hear
technology, man.
293
00:14:37,080 --> 00:14:39,360
Sorry. Yeah. That's right.
294
00:14:39,360 --> 00:14:41,200
Yeah, it is freaky.
295
00:14:41,200 --> 00:14:44,160
The first time I did that,
I was not on the phone.
296
00:14:44,160 --> 00:14:48,040
I haven't done a phone one like that,
but no, but the thing is, you know, it's
297
00:14:48,040 --> 00:14:51,120
like we've got we've got a couple of radio
networks in the States
298
00:14:51,120 --> 00:14:54,200
and a radio network in Australia
using this now.
299
00:14:54,480 --> 00:14:58,440
So that phone interviews,
even if it's down to an office to,
300
00:14:58,480 --> 00:15:01,800
you know, doing tags,
hey, it's such and such
301
00:15:01,800 --> 00:15:05,160
and you're listening to suddenly
it's not a phone interview anymore.
302
00:15:05,160 --> 00:15:07,760
Suddenly
you've got them sitting in a studio record
303
00:15:07,760 --> 00:15:10,760
doing the tags,
and they can be used in proper promos.
304
00:15:10,840 --> 00:15:12,880
Take all your old phone IDs and,
305
00:15:14,000 --> 00:15:14,800
hundred percent you
306
00:15:14,800 --> 00:15:17,920
could have you could have Jim Morrison
saying goodbye.
307
00:15:17,920 --> 00:15:21,320
This is Jim Morrison sounding
like he was sitting across the room.
308
00:15:21,320 --> 00:15:23,280
So I don't think Jim Morrison
that's like it.
309
00:15:23,280 --> 00:15:25,800
I just tend not to mind.
310
00:15:25,800 --> 00:15:30,400
So as a as an experiment, it's
just because I don't really have much of
311
00:15:30,400 --> 00:15:30,880
a life.
312
00:15:30,880 --> 00:15:34,320
About three weeks ago,
313
00:15:34,320 --> 00:15:39,400
I went looking for the oldest John
Lennon interviews,
314
00:15:39,400 --> 00:15:42,400
and I found a video of one
315
00:15:42,600 --> 00:15:45,120
that had so much ground.
316
00:15:45,120 --> 00:15:49,080
And, you know, it was like
the interview was in a white coat
317
00:15:49,080 --> 00:15:50,520
and all very, very. And.
318
00:15:50,520 --> 00:15:50,880
Yeah.
319
00:15:50,880 --> 00:15:54,160
And I put it through Voice Regen
and then I put it
320
00:15:54,160 --> 00:15:57,320
through a visual sharpener called Topaz
video.
321
00:15:57,560 --> 00:16:03,400
And I ended up watching this kind of like
Netflix HD video of John Lennon.
322
00:16:03,400 --> 00:16:06,400
And it's it sounded amazing.
323
00:16:06,640 --> 00:16:10,440
I mean, this has some fairly
it has a lot of use cases.
324
00:16:10,440 --> 00:16:12,360
This song
325
00:16:12,360 --> 00:16:15,480
is probably probably a bit advanced
for where you're at with the software.
326
00:16:15,480 --> 00:16:19,440
But just thinking as an audio engineer
now, like we sat to hear the
327
00:16:19,920 --> 00:16:22,080
John Lennon interview
that we're just talking about,
328
00:16:22,080 --> 00:16:24,880
can I pan left and right
or is it just all straight up?
329
00:16:24,880 --> 00:16:27,360
So right now
it's all up the center and in.
330
00:16:27,360 --> 00:16:31,920
And so for example, right now if you also
if you put in a multi-channel file,
331
00:16:32,360 --> 00:16:35,040
at this point in time,
332
00:16:35,040 --> 00:16:38,560
if it's only stereo,
we will mono it.
333
00:16:38,800 --> 00:16:43,240
If it's a multi-channel
more than two channels
334
00:16:43,240 --> 00:16:46,520
at this point in time,
we'll politely tell you we can't do it.
335
00:16:46,520 --> 00:16:49,520
Or if we think we can do it,
336
00:16:49,560 --> 00:16:52,200
we're only doing the first channel.
337
00:16:52,200 --> 00:16:53,600
Right.
338
00:16:53,600 --> 00:16:54,720
We're working on that.
339
00:16:54,720 --> 00:17:00,600
That's another reason why I study the data
so much, is so that we can learn from
340
00:17:01,080 --> 00:17:04,800
how the system deals
with different kinds of files.
341
00:17:05,360 --> 00:17:06,240
It's for spoken.
342
00:17:06,240 --> 00:17:08,160
This is for dialog only. Yes.
343
00:17:08,160 --> 00:17:09,640
Speech.
344
00:17:09,640 --> 00:17:12,400
Yeah. Yeah,
but I can see, like Peter Jackson.
345
00:17:13,360 --> 00:17:14,200
I'm surprised he hasn't
346
00:17:14,200 --> 00:17:17,200
been on the, on the hornpipe saying,
can I have a go?
347
00:17:17,280 --> 00:17:21,280
We already have a couple
of post-production studios who have said
348
00:17:21,280 --> 00:17:26,080
we used this for boom mics from a film
when one of the mics failed.
349
00:17:27,000 --> 00:17:28,560
So it's like.
350
00:17:28,560 --> 00:17:31,960
And one of the reasons
why we've got a pro account which has
351
00:17:32,760 --> 00:17:36,800
13 hours, and then we'll end up
with an enterprise account.
352
00:17:37,040 --> 00:17:40,160
The other thing that we're doing with
this is we're also about to launch
353
00:17:40,160 --> 00:17:43,160
an API version of Voice Regen,
354
00:17:43,440 --> 00:17:46,920
so that you can actually plug
Voice Regen into your own funnel,
355
00:17:47,000 --> 00:17:50,160
and kind of create your own service
out of it.
356
00:17:50,360 --> 00:17:54,080
The only person I can say, or part
of the industry suffering from this is
357
00:17:54,080 --> 00:17:57,840
if you're into audio dialog replacement,
because you won't be needed anymore.
358
00:17:57,840 --> 00:17:59,760
Well, well, yes or no.
359
00:17:59,760 --> 00:18:01,280
I mean, there's a couple of things.
360
00:18:01,280 --> 00:18:07,720
Firstly, a lot of the situations
that Voice Regen is useful to a user
361
00:18:08,120 --> 00:18:11,800
are the users who would never
go to a dialog editor in the first place.
362
00:18:12,320 --> 00:18:14,520
They don't even know what the term means.
363
00:18:14,520 --> 00:18:16,520
All they know is their audio is terrible.
364
00:18:16,520 --> 00:18:18,240
They need a solution. What?
365
00:18:18,240 --> 00:18:21,120
What I'm not selling them is a hammer
and a nail.
366
00:18:21,120 --> 00:18:25,320
I'm I'm I'm selling them
a nail hammered into a wall.
367
00:18:26,320 --> 00:18:27,080
Yeah.
368
00:18:27,080 --> 00:18:29,880
Okay. They don't need the process.
They don't understand the process.
369
00:18:29,880 --> 00:18:32,880
They will never know who to go to
or who to ask for.
370
00:18:33,200 --> 00:18:36,200
So therefore, I'm not really risking
anybody's career.
371
00:18:36,360 --> 00:18:39,880
The other thing that's important
here is the content creator
372
00:18:39,880 --> 00:18:42,960
market is like 150 million times
373
00:18:42,960 --> 00:18:45,960
bigger than the audio pro market.
374
00:18:46,200 --> 00:18:48,920
And, I feel like
375
00:18:48,920 --> 00:18:52,200
I'm giving it to like 1% of it.
376
00:18:52,200 --> 00:18:56,760
Maybe so I don't I'm not losing sleep,
feeling like I'm
377
00:18:56,800 --> 00:18:58,640
taking anybody's job away.
378
00:18:58,640 --> 00:19:00,600
I'll tell you someone who could
he could benefit from this.
379
00:19:00,600 --> 00:19:03,880
It's a podcast I was watching earlier
and I'll tell you exactly
380
00:19:03,880 --> 00:19:07,600
what it's called,
I think, RacingNews365.
381
00:19:09,480 --> 00:19:12,520
And they, they do a Formula One podcast
382
00:19:13,680 --> 00:19:17,520
every time there's someone beaming in
383
00:19:17,680 --> 00:19:21,200
to them, like a guest or something,
or the person's on location,
384
00:19:21,800 --> 00:19:24,480
they compressed the shit out of it
to a point
385
00:19:24,480 --> 00:19:27,720
where it's so gated
that half the stuff is missing.
386
00:19:27,720 --> 00:19:30,160
It's like, what the hell did you like?
Words have gone. Yeah.
387
00:19:30,160 --> 00:19:31,800
It's like, yeah.
388
00:19:31,800 --> 00:19:34,800
It's like,
come on, I've sent them a message saying
389
00:19:34,920 --> 00:19:37,680
you guys are too heavy
with the gating.
390
00:19:37,680 --> 00:19:39,600
It cuts off; half your dialog is gone.
391
00:19:39,600 --> 00:19:42,600
Well, you raise an interesting point
that I was going to make
392
00:19:42,680 --> 00:19:45,480
was that, you know,
for content creators,
393
00:19:45,480 --> 00:19:47,880
There actually is a direct correlation
394
00:19:47,880 --> 00:19:51,880
between the quality of your audio
and how much people believe you.
395
00:19:52,160 --> 00:19:52,520
Yeah.
396
00:19:52,520 --> 00:19:55,920
It's actually been studied
at the University of Queensland
397
00:19:55,920 --> 00:19:59,200
and that one in California,
when we talked about this before,
398
00:19:59,200 --> 00:20:04,720
you know, it's like you can draw a line
almost in terms of how listener perception
399
00:20:04,720 --> 00:20:08,560
of the information you're giving them
falls as your audio quality gets worse.
400
00:20:08,640 --> 00:20:12,960
Doesn't matter
if you've got an FX3 camera
401
00:20:12,960 --> 00:20:18,360
with a Sigma lens, anamorphic
perfect lighting, and sharp focus.
402
00:20:18,360 --> 00:20:21,280
If your audio sounds like you're recording
403
00:20:21,280 --> 00:20:25,120
on a Logitech
C920 webcam in a kitchen,
404
00:20:26,200 --> 00:20:26,760
which you
405
00:20:26,760 --> 00:20:30,440
probably, it's just a random side note
406
00:20:30,600 --> 00:20:36,280
the highest-selling webcam
still in 2026 is the Logitech C920.
407
00:20:36,440 --> 00:20:38,320
I've got one.
408
00:20:38,320 --> 00:20:39,800
I think I'm using it right now.
409
00:20:39,800 --> 00:20:40,960
Literally, literally.
410
00:20:40,960 --> 00:20:46,200
It's like, I
have a reporting software
411
00:20:46,240 --> 00:20:50,360
that I pay a lot for every month
that tells me how many things sell on
412
00:20:50,360 --> 00:20:52,000
Amazon per product.
413
00:20:52,000 --> 00:20:56,240
This thing just on Amazon
sells like at least 20,000 a day.
414
00:20:56,240 --> 00:20:57,240
It's insane.
415
00:20:57,240 --> 00:20:58,560
It's not a bad camera. Lenses.
416
00:20:58,560 --> 00:21:00,960
I've just got the C922.
417
00:21:00,960 --> 00:21:03,360
Oh, you've got the posh version
anyway, so.
418
00:21:03,360 --> 00:21:06,080
But yes, good audio is confidence.
419
00:21:06,080 --> 00:21:08,680
And that's kind of one of the things
that from my perspective,
420
00:21:08,680 --> 00:21:13,080
I wanted with Voice
Regen is to give people confidence
421
00:21:13,080 --> 00:21:16,920
because, you know, if you look amazing
but your sound is really rough,
422
00:21:17,480 --> 00:21:20,040
It does lower your self-esteem.
423
00:21:20,040 --> 00:21:22,640
And you think,
do I really want to put that out there?
424
00:21:22,640 --> 00:21:25,760
What I want is somebody to feel like
they can just literally
425
00:21:26,040 --> 00:21:29,280
pick up their phone, record
something, hit process, and
426
00:21:29,840 --> 00:21:32,840
send it out into the world,
and know that it sounds amazing.
427
00:21:33,200 --> 00:21:36,200
Without knowing how it was achieved.
428
00:21:36,960 --> 00:21:38,640
Well, focus on the content. Yeah.
429
00:21:38,640 --> 00:21:42,240
And let us look after
your polish. I agree.
430
00:21:42,240 --> 00:21:45,480
So I like what you've come up with
for people who are doing content,
431
00:21:45,480 --> 00:21:48,520
instead of worrying about all the shit
that they have no idea what it is,
432
00:21:49,080 --> 00:21:53,080
just do what you do best and let the,
you know, machine do the work for you.
433
00:21:53,080 --> 00:21:54,280
The other bit for you.
434
00:21:54,280 --> 00:21:58,640
The other thing is, and
this is something which I ummed and ahhed about,
435
00:21:58,640 --> 00:22:02,000
was how large a file
436
00:22:02,000 --> 00:22:05,320
I wanted to enable people to play with.
437
00:22:06,120 --> 00:22:08,880
So you can upload anything to Voice
Regen,
438
00:22:08,880 --> 00:22:12,720
up to 1.5 gig in size.
439
00:22:13,280 --> 00:22:16,440
So now, 1.5 gig.
440
00:22:16,440 --> 00:22:20,520
If you're at 720p, you know,
441
00:22:20,520 --> 00:22:24,840
something like that with video
you're talking, I mean, I ran
442
00:22:25,800 --> 00:22:28,600
one of the webinars
I did with, kind of like, Andrew Scheps.
443
00:22:28,600 --> 00:22:31,680
So somebody I ran the entire webinar,
444
00:22:31,680 --> 00:22:35,320
720p after I ripped it down off YouTube
through Voice Regen.
445
00:22:35,320 --> 00:22:39,360
And that was like two hours and it cleaned
up the vocals something chronic.
446
00:22:39,360 --> 00:22:40,440
It was amazing.
447
00:22:40,440 --> 00:22:42,600
But 1.5 gig I think is enough.
448
00:22:42,600 --> 00:22:46,200
And if you're just doing audio,
that's like four hours of audio
449
00:22:46,200 --> 00:22:46,800
right there.
450
00:22:46,800 --> 00:22:49,360
And if you're doing
four hours of podcasting,
451
00:22:49,360 --> 00:22:51,640
you're not going to have an audience,
right?
452
00:22:51,640 --> 00:22:53,200
Okay. They're gone.
453
00:22:53,200 --> 00:22:53,800
They're asleep.
454
00:22:53,800 --> 00:22:55,560
Yeah. We should know. Yeah.
455
00:22:55,560 --> 00:22:57,720
Yeah. That's that's right.
456
00:22:57,720 --> 00:22:59,160
Yeah. About not having an audience.
457
00:22:59,160 --> 00:23:02,160
I mean, yeah.
458
00:23:02,680 --> 00:23:04,800
I think yeah yeah yeah yeah yeah.
459
00:23:04,800 --> 00:23:07,280
So where do you think this leads Gomez?
Where do you think? What?
460
00:23:07,280 --> 00:23:11,440
What's the next horizon
if content creator focus is added
461
00:23:11,440 --> 00:23:13,280
to what Waves is already doing?
462
00:23:13,280 --> 00:23:16,160
I mean, not not specifically,
but in a broader sense.
463
00:23:16,160 --> 00:23:20,160
Do you think as an industry in general,
that focus is going to move
464
00:23:20,160 --> 00:23:21,000
in that direction?
465
00:23:21,000 --> 00:23:22,360
Not really.
466
00:23:22,360 --> 00:23:24,440
I mean, let me give you some perspective.
467
00:23:24,440 --> 00:23:29,960
So there are
three product managers at Waves
468
00:23:29,960 --> 00:23:34,760
who deal specifically in live sound,
because we have one of the most popular
469
00:23:35,280 --> 00:23:38,160
digital live
consoles in the world with the LV1.
470
00:23:39,640 --> 00:23:41,720
There are, ten
471
00:23:41,720 --> 00:23:46,040
product managers who deal specifically
with the plug in market
472
00:23:46,040 --> 00:23:49,240
and coming up with ideas for plug ins,
developing plug ins, etc.
473
00:23:49,840 --> 00:23:51,360
That's what a product manager does.
474
00:23:51,360 --> 00:23:56,080
A product manager comes up with
the concept for something, fleshes it out,
475
00:23:56,080 --> 00:24:00,000
and then works with the development team
all the way through to make it happen.
476
00:24:00,600 --> 00:24:05,960
There's one product manager at Waves
who deals with the content creator market.
477
00:24:05,960 --> 00:24:07,280
That's me.
478
00:24:07,280 --> 00:24:12,680
And for a company who has, you know,
we have about 5 million
479
00:24:12,680 --> 00:24:15,840
plug in users who have purchased
plug ins over the years.
480
00:24:15,840 --> 00:24:20,160
And in our database,
our focus is still very,
481
00:24:20,160 --> 00:24:24,480
very much on looking after, the market
that is our heritage.
482
00:24:24,880 --> 00:24:31,040
However, my goals are set on
how can we take advantage
483
00:24:31,040 --> 00:24:34,080
of the processing
that we have, the skills that we have
484
00:24:34,640 --> 00:24:39,440
and the technology that we have to make
a content creator's life easier
485
00:24:39,880 --> 00:24:45,240
so that they can focus on content and
not focus on trying to learn a new skill.
486
00:24:45,840 --> 00:24:46,880
And that does two things.
487
00:24:46,880 --> 00:24:51,560
Number one, it helps us retain
the people in the industry like yourselves
488
00:24:51,560 --> 00:24:56,520
who are voiceovers,
production experts, etc.
489
00:24:57,000 --> 00:25:01,240
And it means there are fewer people out there
on Fiverr saying,
490
00:25:01,240 --> 00:25:06,240
hey, I can do you a radio ad,
but it also means that
491
00:25:06,360 --> 00:25:09,600
the audience gets a better product
regardless.
492
00:25:10,160 --> 00:25:13,200
Is there a user feedback functionality?
493
00:25:13,200 --> 00:25:17,160
I remember kind of in the earlier days
of this type of technology, I'd
494
00:25:17,160 --> 00:25:22,400
watch YouTube videos
that had occasional audio randomness;
495
00:25:23,360 --> 00:25:24,280
you'd be just listening to
496
00:25:24,280 --> 00:25:27,560
the dialog, and then 1 or 2
words were just, you know, this.
497
00:25:27,640 --> 00:25:29,640
Yeah. I mean, yeah, yeah,
it wasn't that way.
498
00:25:29,640 --> 00:25:33,440
So if that, that like it was a discord
or one of those.
499
00:25:33,840 --> 00:25:38,160
So it used to crunch the crap
out of things.
500
00:25:38,520 --> 00:25:39,000
Yeah.
501
00:25:39,000 --> 00:25:42,400
So can you, can you give feedback
like if you get a, if you get bad output,
502
00:25:42,400 --> 00:25:46,320
which I would imagine you've done a lot
to prevent from ever happening.
503
00:25:46,320 --> 00:25:47,480
But can you give feedback.
504
00:25:47,480 --> 00:25:52,440
So, well, there's a support
page and a contact form,
505
00:25:52,440 --> 00:25:56,480
and I get a report from Tech Support
literally
506
00:25:56,760 --> 00:26:00,480
six days a week on
anything that comes from them.
507
00:26:00,480 --> 00:26:01,600
And we escalate it.
508
00:26:01,600 --> 00:26:05,880
So if it's something that's really easy
and it's a user issue, and 80% of the time
509
00:26:05,880 --> 00:26:08,320
a problem that comes to us
is a user issue.
510
00:26:08,320 --> 00:26:11,080
So something they didn't quite understand.
511
00:26:11,080 --> 00:26:12,480
That we clear up.
512
00:26:12,480 --> 00:26:15,560
But if there's a real problem with audio,
then I will contact them
513
00:26:15,560 --> 00:26:19,840
directly and say,
are you happy to let me listen
514
00:26:19,840 --> 00:26:23,200
to the audio file
and give you an explanation?
515
00:26:23,200 --> 00:26:27,800
Because at this point in time,
we're still at the point where
516
00:26:28,400 --> 00:26:31,560
we want to make sure that everybody
who has the chance
517
00:26:32,160 --> 00:26:36,680
to really get an idea of why their file
didn't work should know why.
518
00:26:36,720 --> 00:26:40,440
And one of the other things
we're about to do is we're about to open
519
00:26:40,440 --> 00:26:43,440
a whole new set of social media channels
520
00:26:43,760 --> 00:26:46,880
specifically for the content creator
market, because we can't
521
00:26:47,360 --> 00:26:52,520
we can't really run tutorials on Voice
Regen on, you know, Waves' YouTube.
522
00:26:52,520 --> 00:26:54,400
And then somebody will be scrolling
and they'll go from, oh,
523
00:26:54,400 --> 00:26:55,320
that was really useful.
524
00:26:55,320 --> 00:26:59,080
And then, oh, the next one is Andrew Scheps
talking about high pass and low
525
00:26:59,080 --> 00:27:00,200
pass and we lose them.
526
00:27:01,160 --> 00:27:02,000
Yeah, exactly.
527
00:27:02,000 --> 00:27:03,880
You know,
we're opening a set of new channels.
528
00:27:03,880 --> 00:27:08,880
We just employed a whole new marketing
department to deal specifically with this,
529
00:27:08,880 --> 00:27:14,160
so that we can actually give content
creators the attention they deserve
530
00:27:14,680 --> 00:27:18,280
without affecting anything else in Waves,
531
00:27:18,280 --> 00:27:23,320
or lessening the attention
that we give to our heritage audience.
532
00:27:23,320 --> 00:27:24,400
Did you say Discord?
533
00:27:24,400 --> 00:27:27,960
Was that the... Discord used
to bit-crunch everything too?
534
00:27:27,960 --> 00:27:31,760
Yeah, I was going to say because
it used to do this also with Discord
535
00:27:31,760 --> 00:27:35,840
when I put it in the system
that would try to clean up the audio,
536
00:27:36,240 --> 00:27:36,960
it would just like
537
00:27:36,960 --> 00:27:40,560
you're talking about Descript. Oh,
Discord... or sorry, Descript. Yes.
538
00:27:40,640 --> 00:27:46,400
Oh, Descript. The Descript
toggle called Studio Sound.
539
00:27:46,440 --> 00:27:48,480
So do you remember
the one that you sent us, George,
540
00:27:48,480 --> 00:27:49,920
where it was a real estate thing
541
00:27:49,920 --> 00:27:52,320
where it all of a sudden went
and started doing gibberish.
542
00:27:52,320 --> 00:27:54,120
So it sounded like AI.
543
00:27:54,120 --> 00:27:56,880
What I would like to do is get
that plug-in,
544
00:27:56,880 --> 00:28:00,040
and then I want to get the chef
from Sesame Street and run.
545
00:28:00,040 --> 00:28:01,440
That's really just
546
00:28:02,440 --> 00:28:04,760
a new legacy.
547
00:28:04,760 --> 00:28:08,640
So from what I mean,
obviously I haven't managed to get anybody
548
00:28:08,640 --> 00:28:13,560
from descript.com to talk to me about this
for obvious reasons, probably.
549
00:28:13,560 --> 00:28:17,840
But I have I have actually genuinely
I mean, you guys know me.
550
00:28:17,840 --> 00:28:19,760
I'm pretty genuine, an open book.
551
00:28:19,760 --> 00:28:20,640
I have reached out
552
00:28:20,640 --> 00:28:23,960
and said I would love to talk to you
about helping you improve that,
553
00:28:24,400 --> 00:28:27,800
because from my perspective,
what I want to do is I want to make sure
554
00:28:27,800 --> 00:28:30,800
that the content creators
who are using anything
555
00:28:30,920 --> 00:28:33,920
get the best and feel that's really good.
556
00:28:35,400 --> 00:28:36,960
What I feel like
557
00:28:36,960 --> 00:28:40,040
the audio process is doing is,
558
00:28:40,800 --> 00:28:43,200
number one,
I feel like it's giving the users a bit
559
00:28:43,200 --> 00:28:46,880
too much control without explaining
what the toggle is doing,
560
00:28:47,560 --> 00:28:50,880
but when it's on full,
which I think most people just put it on
561
00:28:50,880 --> 00:28:55,240
to full without really,
I think it bit-crunches to a point
562
00:28:55,240 --> 00:28:58,440
where, as you said,
some words just disappear.
563
00:28:59,240 --> 00:29:02,760
Now, one of the interesting things
about this, when I was investigating
564
00:29:02,760 --> 00:29:06,200
this kind of problem was
I found out around the same time
565
00:29:06,480 --> 00:29:09,360
that I created video.
566
00:29:09,360 --> 00:29:12,360
I generated video with dialog.
567
00:29:12,600 --> 00:29:15,720
The dialog with most video models
568
00:29:15,720 --> 00:29:19,520
comes out in the codec as 96 K,
569
00:29:20,120 --> 00:29:24,000
but is bit-crunched to hell and back.
570
00:29:24,960 --> 00:29:28,200
If you drag that video onto Voice Regen,
571
00:29:28,400 --> 00:29:32,400
it makes it sound like they're
talking into a boom mike or a lapel.
572
00:29:33,080 --> 00:29:35,440
So suddenly the AI video
573
00:29:35,440 --> 00:29:38,440
that was given away by the audio,
574
00:29:38,520 --> 00:29:41,880
irritatingly to some people,
really impressive to others,
575
00:29:42,040 --> 00:29:45,240
now looks really realistic
because the audio sounds realistic.
576
00:29:45,240 --> 00:29:50,400
I think all AI videos should always
put Waldo somewhere. Yes.
577
00:29:53,880 --> 00:29:55,240
You mean the Waldo watermark?
578
00:29:55,240 --> 00:29:58,320
Yeah, that that's how you know it's
AI because Waldo
579
00:29:58,680 --> 00:30:03,040
is somewhere in the video.
Seeing as you brought up Waldo,
580
00:30:03,360 --> 00:30:06,440
I'll share something which I share
with the team every week at Waves.
581
00:30:06,800 --> 00:30:09,960
Once a week
when we have our, engineering meeting,
582
00:30:10,440 --> 00:30:14,960
I create a new Where's Waldo,
but with me in it.
583
00:30:15,400 --> 00:30:18,400
Using Google's Nano Banana Pro,
584
00:30:18,840 --> 00:30:24,000
and I share it in the teams chat
every week, and the product managers are like,
585
00:30:24,000 --> 00:30:27,640
okay, where's MPX, where's MPX,
where's it put little.
586
00:30:30,160 --> 00:30:33,520
It's funny, I wear pajamas that are stripy.
587
00:30:34,000 --> 00:30:37,120
I wore I, we were bananas in pajamas.
588
00:30:37,520 --> 00:30:39,960
Did this bananas did that.
Does it make here?
589
00:30:39,960 --> 00:30:41,360
We were in a meeting.
590
00:30:41,360 --> 00:30:43,080
We were in a in the today.
591
00:30:43,080 --> 00:30:45,240
And I was wearing a collared shirt and,
592
00:30:45,240 --> 00:30:48,520
and, and I had three people say,
you look like you're a prisoner.
593
00:30:49,200 --> 00:30:50,680
And it's like I might.
594
00:30:52,080 --> 00:30:54,960
So because we all are prisoners.
595
00:30:54,960 --> 00:30:58,640
Well, for me, for me as a user of Waves,
the hardest thing for me to wrap
596
00:30:58,640 --> 00:31:01,640
my head around when I hadn't looked at it
yet was just understanding that,
597
00:31:01,640 --> 00:31:04,600
well, that's, you know,
so that's like number one, it's like
598
00:31:04,600 --> 00:31:07,760
you have to get out
of the ecosystem of Waves Central
599
00:31:08,200 --> 00:31:13,080
when your brain clicks and goes,
oh yeah, first it's the web app.
600
00:31:13,080 --> 00:31:19,200
Okay, the first video I did on Voice
Regen was to the large Waves market,
601
00:31:19,600 --> 00:31:23,760
saying and it literally is all about,
hey, we've just released this thing.
602
00:31:24,000 --> 00:31:26,880
Just so you know, it's
not a plug in, don't it?
603
00:31:26,880 --> 00:31:28,920
That's kind of the idea of the video.
604
00:31:28,920 --> 00:31:33,240
It's it's not, you know,
we're not we're not degrading or saying,
605
00:31:33,240 --> 00:31:35,280
you know,
our users couldn't understand stuff,
606
00:31:35,280 --> 00:31:38,400
but we wanted to make it clear
that for the first time ever, we're
607
00:31:38,400 --> 00:31:41,400
releasing an audio process
that isn't a plug in.
608
00:31:41,880 --> 00:31:45,360
You know, it's like,
I wouldn't have wanted to release this
609
00:31:45,360 --> 00:31:48,880
without letting the bigger
audience know first that we were about to.
610
00:31:49,000 --> 00:31:51,840
Is it also mobile friendly?
I didn't forget, oh, absolutely.
611
00:31:51,840 --> 00:31:52,400
To the point.
612
00:31:52,400 --> 00:31:57,920
In the next push, in the next few days,
we're adding a little banner that says,
613
00:31:57,920 --> 00:32:01,320
hey, do you want to kind of shortcut this
so it looks like an app on your phone?
614
00:32:01,640 --> 00:32:04,000
Teleprompter works on the phone.
615
00:32:04,000 --> 00:32:08,040
You can audio record,
to a script on your phone.
616
00:32:08,040 --> 00:32:09,240
You can drag a video
617
00:32:09,240 --> 00:32:12,240
or an audio file from your phone
straight in and press process.
618
00:32:12,680 --> 00:32:15,880
I'm not letting people on mobile
phone record video
619
00:32:16,040 --> 00:32:17,160
when we release that,
620
00:32:17,160 --> 00:32:20,760
because a mobile phone is perfectly good
at recording video by itself
621
00:32:21,280 --> 00:32:22,080
and you're holding it.
622
00:32:22,080 --> 00:32:24,720
Just yeah,
record the video with your standard app.
623
00:32:24,720 --> 00:32:27,320
Upload that. That's easy.
So here's a question.
624
00:32:27,320 --> 00:32:32,760
Then this just occurred to me
if I were a voiceover artist
625
00:32:33,920 --> 00:32:36,920
and I was putting together my travel rig
626
00:32:37,800 --> 00:32:41,760
and I had an interface like,
627
00:32:42,880 --> 00:32:46,640
you know, maybe something that some
certain podcast may have put together.
628
00:32:46,840 --> 00:32:48,280
Yeah, sold for a while.
629
00:32:48,280 --> 00:32:51,880
But I needed something to record onto.
630
00:32:52,560 --> 00:32:54,240
Could I use it just as a recorder?
631
00:32:54,240 --> 00:32:56,720
Like, I guess the question is,
do I have to process?
632
00:32:56,720 --> 00:32:58,560
Can I record on to there
and then just download?
633
00:32:58,560 --> 00:32:59,480
You don't have to process.
634
00:32:59,480 --> 00:33:01,320
You can just do it that way
if you want to. Yeah.
635
00:33:01,320 --> 00:33:05,360
So does it burn your hours
if you use that teleprompter
636
00:33:05,360 --> 00:33:09,240
and the recorder
or is that all without using the hours
637
00:33:09,240 --> 00:33:11,480
that the, you know, the,
the only thing the,
638
00:33:11,480 --> 00:33:15,880
the only thing that burns your
minutes is the processing.
639
00:33:15,880 --> 00:33:17,480
So that's why I recorded a call.
640
00:33:17,480 --> 00:33:20,040
It's a teleprompter and a recorder
right there. Yeah.
641
00:33:20,040 --> 00:33:22,640
And that's honestly
and it's actually intentional.
642
00:33:22,640 --> 00:33:25,880
And one of the reasons why I did
that is because
643
00:33:26,400 --> 00:33:29,160
of my heritage and my background from,
644
00:33:29,160 --> 00:33:32,440
you know, not just 20 years in music
production in the States, but,
645
00:33:32,640 --> 00:33:35,640
you know, 15
before that in radio production in Australia.
646
00:33:36,200 --> 00:33:38,080
One of the things
that when you've got headphones
647
00:33:38,080 --> 00:33:40,480
on, you've recorded a script
and you listen back, it's
648
00:33:40,480 --> 00:33:44,960
when you listen back that you realize,
yep, sounds good, but I'm in a hotel room.
649
00:33:44,960 --> 00:33:47,320
You know what? I'm
just going to click process in the room.
650
00:33:47,320 --> 00:33:48,880
That's exactly right there.
651
00:33:48,880 --> 00:33:51,120
I love that
you integrated the script reader.
652
00:33:51,120 --> 00:33:53,760
That was first. I was like, oh, I had to.
653
00:33:53,760 --> 00:33:57,720
I mean, it's like, I mean,
I've got two users who are on pro accounts
654
00:33:57,720 --> 00:34:01,400
because they
they are professional audiobook readers
655
00:34:01,840 --> 00:34:06,280
and they love me to death
because I got them to beta test this.
656
00:34:06,280 --> 00:34:10,760
And they were like, wait,
so I can have literally chapters in this?
657
00:34:11,080 --> 00:34:14,120
I just have it
right there in front of me,
658
00:34:14,120 --> 00:34:17,120
scrolling really short
and just do it at my own speed.
659
00:34:17,280 --> 00:34:19,920
It's like, yeah, and just pause
if you want to.
660
00:34:19,920 --> 00:34:22,560
This is going to be controversial,
but I guarantee you could
661
00:34:22,560 --> 00:34:25,800
literally just record an entire audiobook
on your phone full stop.
662
00:34:25,800 --> 00:34:29,880
If you if you give this really good audio,
like really good audio,
663
00:34:29,880 --> 00:34:32,200
will it still touch it
or will it go, that's good.
664
00:34:32,200 --> 00:34:33,160
Interesting question.
665
00:34:33,160 --> 00:34:38,600
I'm so glad you asked it. Now, because of
the fact that this is a very large model
666
00:34:38,960 --> 00:34:42,600
that's trained on high quality
files, it's trained on
667
00:34:43,000 --> 00:34:47,160
not the finest quality,
it's trained on quality files.
668
00:34:47,520 --> 00:34:53,120
So when somebody from a studio puts
in a file that's recorded in, you know, a
669
00:34:53,120 --> 00:34:57,480
Neumann and, in a soundproofed
670
00:34:57,480 --> 00:35:00,680
room, acute compressor preamp.
671
00:35:00,680 --> 00:35:04,080
And exactly if you put that into it,
it will degrade the file.
672
00:35:04,320 --> 00:35:06,400
Interesting enough. Yeah. Okay.
673
00:35:06,400 --> 00:35:10,080
That makes sense, because what it does
is it's still trying to go, okay,
674
00:35:10,080 --> 00:35:13,800
I have a quality,
but I'm trying to get this file to this.
675
00:35:13,800 --> 00:35:16,640
I don't know what to do with this,
so it's going to degrade it for you.
676
00:35:16,640 --> 00:35:19,280
But only if you hit process. Right.
677
00:35:19,280 --> 00:35:19,960
You can still read.
678
00:35:19,960 --> 00:35:22,600
Of course. Yes, I understand, yeah.
679
00:35:22,600 --> 00:35:25,920
At some point you decide
this model is baked.
680
00:35:26,120 --> 00:35:28,800
Let's build a new model and start
all over. We're on.
681
00:35:28,800 --> 00:35:30,800
We're on the third model. Yeah.
682
00:35:30,800 --> 00:35:33,600
Because as I said earlier
683
00:35:33,600 --> 00:35:38,120
on, we're also about to enable
this as an API
684
00:35:38,120 --> 00:35:41,400
that people can use as a feature
to put into their own website.
685
00:35:41,400 --> 00:35:43,360
And charge for, or whatever.
686
00:35:43,360 --> 00:35:47,120
So what this will end up being is,
687
00:35:47,520 --> 00:35:50,920
there'll be the initial model
688
00:35:50,920 --> 00:35:54,040
that will end up being kind of
like one of the most affordable ones.
689
00:35:54,040 --> 00:35:57,360
And then as we build
690
00:35:57,360 --> 00:36:01,240
new models
that will be more dynamic or brighter
691
00:36:01,240 --> 00:36:05,920
or this, then we'll decide
how we want to move forward
692
00:36:05,920 --> 00:36:09,560
with, the, the consumer app,
the one that we just released.
693
00:36:09,600 --> 00:36:13,400
You get a model that says this is a voice
that sounds like it was recorded
694
00:36:13,400 --> 00:36:14,920
with a U87.
695
00:36:14,920 --> 00:36:16,120
Yeah. I don't want to do that. Yeah.
696
00:36:16,120 --> 00:36:20,160
There's a couple of reasons
I don't want to go that way, exactly. Number
697
00:36:20,160 --> 00:36:22,080
one, it's a huge claim.
698
00:36:22,080 --> 00:36:26,320
Yeah, but number two,
one of the things that happens in
699
00:36:26,520 --> 00:36:30,560
if you're creating a tool
that content creators use is
700
00:36:30,800 --> 00:36:35,680
you run a risk of giving people who
701
00:36:36,840 --> 00:36:39,720
are focusing on recording their content
and creating content
702
00:36:39,720 --> 00:36:42,480
and don't want to be an audio engineer
or a video editor,
703
00:36:42,480 --> 00:36:43,960
you're asking them to make
704
00:36:43,960 --> 00:36:47,960
too many decisions without the experience
or knowledge to back it up.
705
00:36:48,320 --> 00:36:51,320
And that becomes what I call
feature creep.
706
00:36:51,320 --> 00:36:52,120
Yeah.
707
00:36:52,120 --> 00:36:56,320
You're asking them to make a decision
on control, on something
708
00:36:56,720 --> 00:37:00,320
that, they may not be able to hear.
709
00:37:00,320 --> 00:37:05,160
Well, you know, even that is like
but understanding what you're hearing,
710
00:37:05,160 --> 00:37:09,840
whereas my goal is still I don't want
to give you a hammer and a nail.
711
00:37:09,840 --> 00:37:12,240
I want to give you a nail
that's in a wall.
712
00:37:12,240 --> 00:37:16,160
I would imagine 99% of content
creators would ask, what is a U87?
713
00:37:16,400 --> 00:37:17,360
Yeah, exactly.
714
00:37:17,360 --> 00:37:17,560
Yeah.
715
00:37:17,560 --> 00:37:20,880
If you take a look at what
a lot of there's been a lot of,
716
00:37:22,040 --> 00:37:25,040
experience over the
years with different pro audio companies
717
00:37:25,480 --> 00:37:28,480
entering the content creator market
in one way or another.
718
00:37:28,560 --> 00:37:30,000
And I've studied most of them.
719
00:37:30,000 --> 00:37:33,560
And I'm not saying I or
we are perfect in any way, but
720
00:37:33,560 --> 00:37:37,360
one of the things I've learned from others'
mistakes is, number one,
721
00:37:37,560 --> 00:37:41,360
never use pro audio nomenclature
when you're talking to a market
722
00:37:41,360 --> 00:37:44,880
that could be a wedding event planner
doing a vlog.
723
00:37:46,480 --> 00:37:48,920
At what point in her life or his life
724
00:37:48,920 --> 00:37:52,760
have they come across a high
pass, low pass filter
725
00:37:52,760 --> 00:37:56,480
and a 2:1 compression
ratio?
726
00:37:56,960 --> 00:38:00,200
a it's, you know,
727
00:38:00,360 --> 00:38:04,680
even the term compression or EQ, you just
I want to stay away from them.
728
00:38:05,200 --> 00:38:06,080
Well, even in Clarity,
729
00:38:06,080 --> 00:38:10,640
if you have I think I haven't checked
in the last two months, but three models.
730
00:38:10,640 --> 00:38:11,760
Right.
731
00:38:11,760 --> 00:38:12,480
That's a user. Yeah.
732
00:38:12,480 --> 00:38:16,160
And there will be more,
you know, the Clarity
733
00:38:16,440 --> 00:38:19,200
development team, and it's a team.
734
00:38:19,200 --> 00:38:23,600
The they're constantly at work
improving, tweaking.
735
00:38:24,360 --> 00:38:27,680
And obviously there has been some,
you know, questions as well
736
00:38:27,680 --> 00:38:28,560
out in the marketplace.
737
00:38:28,560 --> 00:38:30,840
It's like,
well, you made Clarity a plug-in.
738
00:38:32,040 --> 00:38:34,520
Is Regen just a website of Clarity?
739
00:38:34,520 --> 00:38:36,440
No. Completely different models,
740
00:38:36,440 --> 00:38:38,920
completely different models,
completely different processes.
741
00:38:38,920 --> 00:38:41,920
Clarity can be a plug in region
742
00:38:42,000 --> 00:38:46,120
will never be a plug in
because it will melt a different products.
743
00:38:46,120 --> 00:38:51,680
I mean, Clarity, 100%, is not there
to change the character of the sound.
744
00:38:51,680 --> 00:38:54,840
It's there to isolate the voice
in the character that's given to it.
745
00:38:54,840 --> 00:38:55,080
Right?
746
00:38:55,080 --> 00:38:58,080
Whereas we literally regenerate a voice.
747
00:38:58,440 --> 00:39:01,840
You change the character, you're like,
re wouldn't say
748
00:39:02,040 --> 00:39:05,640
we're not really changing the character,
which was the
749
00:39:05,800 --> 00:39:11,200
the sonic character, like, this
was originally recorded on, whatever.
750
00:39:11,200 --> 00:39:14,200
Like a, like a carbon
microphone wire recording.
751
00:39:14,320 --> 00:39:14,640
Yeah.
752
00:39:14,640 --> 00:39:17,280
And now it sounds like it's
coming out of an SM7.
753
00:39:17,280 --> 00:39:18,360
Yeah. Yeah.
754
00:39:18,360 --> 00:39:20,600
You know,
like the character of the recording.
755
00:39:20,600 --> 00:39:22,280
Not not the person.
756
00:39:22,280 --> 00:39:24,560
Would rebuilding it
be a better way to say it?
757
00:39:24,560 --> 00:39:29,680
We're renovating, renovate,
kind of regenerating in a lot of cases,
758
00:39:29,680 --> 00:39:33,480
because it's like we're
we're kind of looking at the in a way,
759
00:39:33,480 --> 00:39:38,800
I guess the DNA of vocal tone, and going,
okay, what?
760
00:39:39,360 --> 00:39:41,640
Let's put some work in and improve this.
761
00:39:41,640 --> 00:39:45,680
So in the end, you know, how well
you know, how you go to various websites.
762
00:39:45,680 --> 00:39:46,440
We won't name them.
763
00:39:46,440 --> 00:39:50,320
And then you've put your voice in there
and there's the whole idea of a one shot.
764
00:39:50,400 --> 00:39:51,840
Oh, the voice model.
765
00:39:51,840 --> 00:39:54,160
Yeah. No we don't do that. Cloning. Yeah.
766
00:39:54,160 --> 00:39:55,640
So this is not a clone.
767
00:39:55,640 --> 00:39:58,640
Like, just to be clear, it's not learning.
768
00:39:58,640 --> 00:40:02,400
This person's voice
that someone uploads to it,
769
00:40:02,520 --> 00:40:04,800
and and those people
have nothing to worry about.
770
00:40:04,800 --> 00:40:07,160
Like, oh, now
you've uploaded my voice to some website.
771
00:40:07,160 --> 00:40:08,800
I don't want my voice uploaded to.
772
00:40:08,800 --> 00:40:10,680
No, we don't we don't retain.
773
00:40:10,680 --> 00:40:14,520
It's like we it goes through our process.
774
00:40:14,520 --> 00:40:15,920
It goes out the other end.
775
00:40:15,920 --> 00:40:18,200
It doesn't even stay in our cloud.
776
00:40:18,200 --> 00:40:20,640
It goes back to your dashboard
777
00:40:20,640 --> 00:40:23,640
and after 14 days,
we just delete them from the system.
778
00:40:24,200 --> 00:40:25,480
You know, I had one
779
00:40:25,480 --> 00:40:29,440
case named a year ago because I know
everything's happening so fast.
780
00:40:29,440 --> 00:40:32,440
So whenever I say I tried this thing,
781
00:40:32,840 --> 00:40:36,320
you sort of have to say when you know,
because, like, everything's changing.
782
00:40:36,920 --> 00:40:39,840
But the client had sent
me some audio from, like,
783
00:40:40,800 --> 00:40:41,360
Catholic
784
00:40:41,360 --> 00:40:45,920
priest or something, and it was all this
archival audio and it was recorded
785
00:40:45,920 --> 00:40:50,040
on very bad technology,
probably like a 12
786
00:40:50,040 --> 00:40:54,280
bit 32kHz or worse recorder
787
00:40:54,280 --> 00:40:58,360
that was built in to the,
I don't know, something pretty bad.
788
00:40:59,240 --> 00:41:02,680
And they sent it to me
and the audio was pretty bad.
789
00:41:02,680 --> 00:41:06,960
And then I threw it into one of these
AI tools that regenerate, essentially.
790
00:41:07,480 --> 00:41:10,800
And when she listened to it back, she was
she was kind of
791
00:41:11,160 --> 00:41:17,080
I think she didn't know what to expect
because it was a reconstruction.
792
00:41:17,080 --> 00:41:17,560
Right.
793
00:41:17,560 --> 00:41:21,520
And so she thought
it didn't sound authentic.
794
00:41:22,120 --> 00:41:25,880
So then I was like, well,
I don't know what to tell you.
795
00:41:25,880 --> 00:41:28,880
This is probably what it did sound like,
796
00:41:29,520 --> 00:41:34,440
but you've heard it for the last 30 years
sounding like it comes through a tin can.
797
00:41:35,240 --> 00:41:38,640
You know, out of us
it's the it's exactly the same thing
798
00:41:38,640 --> 00:41:42,560
that happens with,
799
00:41:42,560 --> 00:41:46,440
I, I was talking to the guy, producer,
composer guy
800
00:41:46,440 --> 00:41:50,640
called Greg Wells, a few months ago,
and we were talking about,
801
00:41:51,720 --> 00:41:54,720
a soundtrack that he's just composed.
802
00:41:55,560 --> 00:41:59,000
And one of the biggest problems
with composing a soundtrack
803
00:41:59,280 --> 00:42:02,320
and making the film director
and the company happy with
804
00:42:02,320 --> 00:42:05,720
it is when they're actually
doing the edit of the film.
805
00:42:05,720 --> 00:42:10,160
They use a temp soundtrack,
which is usually an existing soundtrack
806
00:42:10,160 --> 00:42:14,000
from something else that kind of fits,
but they spend so many months
807
00:42:14,000 --> 00:42:16,720
on the cutting room floor
with the temp soundtrack.
808
00:42:16,720 --> 00:42:17,200
No idea.
809
00:42:17,200 --> 00:42:20,320
So when the new soundtrack
that's being composed specifically
810
00:42:20,320 --> 00:42:23,320
for it comes up, a lot of the time
they're like, oh, that sucks.
811
00:42:23,760 --> 00:42:28,800
So yeah, I think the funniest example
of this is, is, some clients of mine
812
00:42:28,800 --> 00:42:33,560
were talking about a session and basically
the scratch track was done with an AI,
813
00:42:34,080 --> 00:42:37,080
and then they were
recording the real voice,
814
00:42:37,520 --> 00:42:41,160
and they began asking the voice talent
to read it more like the scratch track,
815
00:42:41,440 --> 00:42:44,440
which was basically saying,
please read it more like AI.
816
00:42:44,520 --> 00:42:47,520
yeah, it's like, yeah, like
817
00:42:47,520 --> 00:42:51,520
how to lose your soul in five seconds
and yeah,
818
00:42:51,560 --> 00:42:54,720
yeah,
I had that exact thing happen to me with a
819
00:42:54,720 --> 00:42:56,760
we talked about the session before,
820
00:42:56,760 --> 00:43:00,960
whether it was a female reading to this
short film that they wanted me to voice.
821
00:43:00,960 --> 00:43:03,840
And I was kind of thinking,
that sounds okay. Anyway,
822
00:43:04,880 --> 00:43:06,600
but they said, oh, no, no,
823
00:43:06,600 --> 00:43:09,560
the check out the three words
are not pronounced correctly.
824
00:43:09,560 --> 00:43:13,640
And I'm thinking, well, why don't
you just get her back to redo those lines?
825
00:43:13,800 --> 00:43:16,800
But it wasn't until later
I realized it was an AI,
826
00:43:17,120 --> 00:43:20,880
but they they were trying to get me like
the inflection on the on the guide track.
827
00:43:20,880 --> 00:43:21,720
Can you try and do that?
828
00:43:21,720 --> 00:43:25,080
And it wasn't until the next day,
I think, and I mentioned this before,
829
00:43:25,080 --> 00:43:29,040
another episode that I was playing it
to, to my son, who's 17.
830
00:43:29,040 --> 00:43:30,760
I said, oh, you know,
this is what happened, boy,
831
00:43:30,760 --> 00:43:34,200
because that's I, I'm like, yeah,
I can hear it a mile away.
832
00:43:34,200 --> 00:43:36,200
Usually two they can hear it a mile away.
833
00:43:36,200 --> 00:43:41,360
The can I, I'm, I'm 56, but I can hear it
a mile away because I've spent so many,
834
00:43:41,520 --> 00:43:46,080
I've spent the last few years
literally deep in AI research.
835
00:43:46,680 --> 00:43:51,200
And one of the things that's really hard,
even if you're using something as good
836
00:43:51,200 --> 00:43:52,920
as ElevenLabs.
837
00:43:52,920 --> 00:43:59,040
You have to make sure that words,
especially in English words
838
00:43:59,040 --> 00:44:02,040
that, the same.
839
00:44:02,440 --> 00:44:06,920
So so for example,
one, or one, they're so windy.
840
00:44:08,120 --> 00:44:10,040
Oh. And
841
00:44:10,040 --> 00:44:13,080
you have to work
out a way of spelling them phonetically
842
00:44:13,080 --> 00:44:16,040
so that they're separated.
843
00:44:16,040 --> 00:44:19,200
And there are so many other words
in the English language that AI just
844
00:44:19,200 --> 00:44:23,640
does not get its perceptive head around
because it doesn't have a perception
845
00:44:23,920 --> 00:44:27,920
version of the context in between
this word, this word and this word,
846
00:44:28,320 --> 00:44:32,040
when it's spelt one way
but it's pronounced another.
847
00:44:32,400 --> 00:44:36,040
So I remember doing this
with phoneme chips in college.
848
00:44:36,040 --> 00:44:39,200
And just to get this thing to say
electronics,
849
00:44:39,680 --> 00:44:43,640
you had to put like ten E's in a row
because it had to do like electronics.
850
00:44:43,640 --> 00:44:47,400
And if you wanted to say electronics,
you had to put like IEEE.
851
00:44:47,400 --> 00:44:50,400
L like electronics. Yeah.
852
00:44:50,520 --> 00:44:52,240
Because it just yeah.
853
00:44:52,240 --> 00:44:54,840
You have to, like, force it
to get out what you want
854
00:44:54,840 --> 00:44:58,240
because it's not always changing
but it's also a dialect.
855
00:44:58,240 --> 00:45:02,160
I mean, you know, I,
I've just heard a couple of things like,
856
00:45:02,160 --> 00:45:07,520
I heard Gomez say data instead of data,
but it's spelt the same.
857
00:45:07,520 --> 00:45:10,280
But it's pronounced differently,
like dance.
858
00:45:10,280 --> 00:45:11,120
Dance, you know.
859
00:45:11,120 --> 00:45:14,120
Yeah. 2020 is an American man.
860
00:45:14,280 --> 00:45:16,680
Yeah, yeah yeah, yeah. Tomato, tomato.
861
00:45:16,680 --> 00:45:18,400
Yeah, exactly.
862
00:45:18,400 --> 00:45:21,040
It's like when you get, someone to say it
863
00:45:21,040 --> 00:45:22,720
like English is not their first language.
864
00:45:22,720 --> 00:45:24,880
And then they're spelling,
they're writing on the script
865
00:45:24,880 --> 00:45:28,600
their phonetic version
of how you pronounce that word.
866
00:45:29,120 --> 00:45:31,680
But they forget that the fact is that
867
00:45:31,680 --> 00:45:33,560
their phonetics are going
to be different to yours
868
00:45:33,560 --> 00:45:35,320
because they say certain things
differently
869
00:45:35,320 --> 00:45:36,200
to the way you would say it.
870
00:45:36,200 --> 00:45:40,960
So when you read their phonetic version
of that word, it's completely wrong.
871
00:45:40,960 --> 00:45:42,960
And they come back and go, no, no, no,
that's not what I want, I want it,
872
00:45:42,960 --> 00:45:43,840
blah, blah, blah.
873
00:45:43,840 --> 00:45:45,080
It's like, well, that's what you just did.
874
00:45:45,080 --> 00:45:47,280
But, you know, because they will
pronounce it differently.
875
00:45:47,280 --> 00:45:50,560
So now this brings up something
which I think is interesting.
876
00:45:50,560 --> 00:45:55,440
I find it fascinating in
when building the model for Regen
877
00:45:55,920 --> 00:45:56,760
was happening.
878
00:45:56,760 --> 00:46:01,360
The first model is a term that we realized
879
00:46:01,360 --> 00:46:06,000
early on that
we had to rectify, which is accent bias.
880
00:46:06,960 --> 00:46:07,880
If a model
881
00:46:07,880 --> 00:46:12,200
is educated
on too much of a specific accent,
882
00:46:12,200 --> 00:46:15,360
then it doesn't understand how to deal
883
00:46:15,360 --> 00:46:18,640
with other pronunciations or accents.
884
00:46:18,640 --> 00:46:21,640
So initially, early on,
885
00:46:22,040 --> 00:46:26,440
an Australian accent was an anomaly,
and some of the first tests we did
886
00:46:26,800 --> 00:46:30,080
was frustrating
because the Americans would sound amazing,
887
00:46:30,320 --> 00:46:33,000
the Brits would sound amazing,
but any time I tried it,
888
00:46:33,000 --> 00:46:36,000
it would sound glitchy as hell
and we worked out
889
00:46:36,400 --> 00:46:40,200
it just doesn't understand
this half drunk way we talk.
890
00:46:40,560 --> 00:46:43,920
Hey, careful when
891
00:46:45,200 --> 00:46:46,240
we're not half drunk.
892
00:46:46,240 --> 00:46:49,480
It's like we tend to kind
of take shortcuts with the way we speak.
893
00:46:50,040 --> 00:46:55,520
So we had to kind of make sure that we
spent a lot of time avoiding accent bias.
894
00:46:55,520 --> 00:46:59,840
So this is only for English
at this, English language,
895
00:46:59,840 --> 00:47:04,360
at this point? No, works with Hebrew, works
with French, German, Spanish, Japanese.
896
00:47:05,240 --> 00:47:06,800
But it is language specific.
897
00:47:06,800 --> 00:47:07,440
Basically.
898
00:47:07,440 --> 00:47:10,600
No, it's human voice specific.
899
00:47:10,600 --> 00:47:12,240
Yes. It's to understand it.
900
00:47:12,240 --> 00:47:15,240
So so if like does it work with pig Latin?
901
00:47:15,920 --> 00:47:16,640
I have no idea.
902
00:47:16,640 --> 00:47:20,480
I, I it's not high on my list
of priorities, to be honest.
903
00:47:21,440 --> 00:47:22,760
Not trained on pig Latin.
904
00:47:22,760 --> 00:47:25,040
Yet Robert Young recorded this thing.
905
00:47:25,040 --> 00:47:29,040
Like, if it's like some unknown language
that it's never, ever heard before,
906
00:47:29,400 --> 00:47:32,920
here's here's the key,
here's the key with Voice Regen.
907
00:47:33,360 --> 00:47:35,400
If you're putting a file into Voice
Regen,
908
00:47:35,400 --> 00:47:40,640
and, in the language that you speak,
you can understand the words
909
00:47:40,640 --> 00:47:43,200
that are coming
out of the speaker's mouth.
910
00:47:43,200 --> 00:47:46,440
Regardless of the noise level,
we have a chance
911
00:47:46,440 --> 00:47:48,720
of saving that file for you.
912
00:47:48,720 --> 00:47:53,880
If there are words in that file
that you cannot comprehend yourself,
913
00:47:53,880 --> 00:47:57,400
then our chance goes down by about 80%.
914
00:47:57,640 --> 00:47:58,280
Okay, yeah.
915
00:47:58,280 --> 00:48:01,200
Is this is this forensics proof?
916
00:48:01,200 --> 00:48:05,400
Like, like, could someone go to court
and be like, here's this awful recording.
917
00:48:05,400 --> 00:48:07,920
But look, he said I murdered her.
918
00:48:07,920 --> 00:48:11,160
And then Voice Regen's like, perfect,
like their cat.
919
00:48:11,160 --> 00:48:12,480
You hear that? 100%.
920
00:48:12,480 --> 00:48:14,400
And it's like this. Yeah, absolutely.
921
00:48:14,400 --> 00:48:16,080
Yeah. We're not making up. We're not.
922
00:48:16,080 --> 00:48:19,080
There's no point in the system
where we make stuff up.
923
00:48:19,280 --> 00:48:21,680
We regenerate what the system understands.
924
00:48:21,680 --> 00:48:25,480
So a word that is really garbled
because there was a technical,
925
00:48:26,720 --> 00:48:27,280
a word that
926
00:48:27,280 --> 00:48:30,280
is really garbled is still going
to stay really garbled.
927
00:48:30,440 --> 00:48:31,560
Yeah, yeah.
928
00:48:31,560 --> 00:48:34,840
So one of the early cases
and because, I mean, one of the things
929
00:48:34,840 --> 00:48:38,920
which we did is we spent a lot of time
throwing files at it
930
00:48:38,920 --> 00:48:42,680
that were really, really challenging,
like two people
931
00:48:42,800 --> 00:48:48,000
filmed on an iPhone, but they're 30ft away
on the edge of a cliff with wind.
932
00:48:48,400 --> 00:48:50,680
One's talking, one's further away.
933
00:48:50,680 --> 00:48:53,000
She's yelling back at him with wind.
934
00:48:53,000 --> 00:48:53,880
Yeah, you've got the wind.
935
00:48:53,880 --> 00:48:56,880
And we learned very quickly the
936
00:48:56,960 --> 00:48:59,440
if you're listening to a file
and you're listening to the audio
937
00:48:59,440 --> 00:49:02,040
and you can't understand
what they're saying,
938
00:49:02,040 --> 00:49:05,560
then our chance of getting
there is a lot lower.
939
00:49:05,560 --> 00:49:06,800
It's not every case.
940
00:49:06,800 --> 00:49:09,240
There are some. There we go. Oh,
that's what she said.
941
00:49:09,240 --> 00:49:12,000
But most of the time it's at that point
942
00:49:12,000 --> 00:49:16,800
where the audio dialog frequencies
are so mixed down
943
00:49:16,800 --> 00:49:20,680
into the noise frequencies
that we can't regenerate
944
00:49:20,680 --> 00:49:23,480
because we have nothing there
to work with.
945
00:49:23,480 --> 00:49:27,480
So what it does not do,
it does not interpolate.
946
00:49:27,960 --> 00:49:30,720
No, it's not using context.
947
00:49:30,720 --> 00:49:34,000
It's not, it's not,
it's not, it's not there's no kind of
948
00:49:34,040 --> 00:49:37,040
decoding the language
and re encoding the language.
949
00:49:37,120 --> 00:49:41,200
It's just decoding the voice,
the human voice in there and then
950
00:49:41,200 --> 00:49:45,080
performing the voice. The examples
I have on the landing page on the website
951
00:49:45,080 --> 00:49:46,920
are very intentional.
952
00:49:46,920 --> 00:49:52,000
They're all bad quality,
but they're also all very if you listen
953
00:49:52,000 --> 00:49:55,000
to them, you can understand the words,
but you're like, oh, that's terrible.
954
00:49:55,120 --> 00:49:56,240
Yeah.
955
00:49:56,240 --> 00:49:58,960
The and the reason for
that is exactly what I just said.
956
00:49:58,960 --> 00:50:01,960
It's like there is no point in the system
where,
957
00:50:02,040 --> 00:50:05,200
we just guess because that's,
958
00:50:05,920 --> 00:50:09,800
you know, we already have enough of that,
which in other AI systems,
959
00:50:09,800 --> 00:50:13,200
it's like what I did was we
we created a model
960
00:50:14,120 --> 00:50:17,520
that can bring back
nearly irretrievable audio.
961
00:50:17,920 --> 00:50:20,960
But it's also not going to happen
in every single case.
962
00:50:20,960 --> 00:50:25,040
There are some cases, obviously,
where people put in audio and go, yeah,
963
00:50:25,040 --> 00:50:25,640
it's, you know,
964
00:50:26,760 --> 00:50:27,320
so is a
965
00:50:27,320 --> 00:50:32,240
future possible pro version or an advanced
version would be to literally regenerate
966
00:50:32,240 --> 00:50:36,280
a word that was totally garbled
so the user could type in the word foxy
967
00:50:36,280 --> 00:50:39,280
and they will regenerate the word
foxy from that.
968
00:50:39,560 --> 00:50:41,400
It's been it it's it's been discussed.
969
00:50:41,400 --> 00:50:45,280
One of the things that comes up with this,
that is a big problem for me,
970
00:50:45,840 --> 00:50:48,360
because as a product manager,
one of the things that you're thinking
971
00:50:48,360 --> 00:50:49,920
isn't is something possible.
972
00:50:49,920 --> 00:50:52,920
You're thinking, is this possible in a way
973
00:50:53,280 --> 00:50:58,200
that the result will be something
that the user will take for
974
00:50:58,200 --> 00:51:01,560
granted as something that happened
and can move on with their life
975
00:51:01,840 --> 00:51:03,360
without worrying about it?
976
00:51:03,360 --> 00:51:05,840
And my answer at this point in
time is still no,
977
00:51:05,840 --> 00:51:08,360
because what you're doing
is you're saying, okay, right.
978
00:51:08,360 --> 00:51:12,320
So we have to, first of all, clone
a certain amount of the voice
979
00:51:12,320 --> 00:51:15,480
in between, around the word
that's illegible.
980
00:51:15,840 --> 00:51:20,320
We then have to make sure that, okay,
they're typing in a word.
981
00:51:20,320 --> 00:51:21,840
That's what the word was.
982
00:51:21,840 --> 00:51:25,160
We have to get that voice to
then say that we then have to get it
983
00:51:25,160 --> 00:51:29,400
in the right intonation,
the right emotion, the right,
984
00:51:29,400 --> 00:51:34,520
the right quality to fit the rest
of the regenerated audio around it.
985
00:51:34,800 --> 00:51:38,560
And if you're creating one word, it's
actually a lot harder than it sounds
986
00:51:38,560 --> 00:51:39,360
to create one
987
00:51:39,360 --> 00:51:41,400
that's average quality,
988
00:51:41,400 --> 00:51:46,440
that's even with the rest of it, than it
is to create one that's perfect quality.
989
00:51:46,800 --> 00:51:49,920
Yeah, it's a different product, really.
990
00:51:50,280 --> 00:51:53,280
It's I mean, it's adding
a feature to the product.
991
00:51:53,480 --> 00:51:55,080
I'm going to drop in an old school
solution.
992
00:51:55,080 --> 00:51:57,920
Remember in the day when if there was.
993
00:51:57,920 --> 00:52:01,080
And it can only work
if it's a small like one word problem,
994
00:52:01,560 --> 00:52:04,560
but you would get a male or a female,
995
00:52:05,120 --> 00:52:07,200
not the same voice to do it.
996
00:52:07,200 --> 00:52:08,160
And then you would cut it
997
00:52:08,160 --> 00:52:11,320
in and it's so fast that the mind doesn't
actually perceive it.
998
00:52:11,320 --> 00:52:14,120
Yeah. Who were the people that were best
at doing that?
999
00:52:14,120 --> 00:52:15,720
It was always the musicians
1000
00:52:15,720 --> 00:52:18,680
because the musicians
could follow the pitch of the person.
1001
00:52:18,680 --> 00:52:21,560
Yeah, I've done it a few times.
1002
00:52:21,560 --> 00:52:25,000
I'll be honest to save things I can't
remember and I never got paid for it.
1003
00:52:25,000 --> 00:52:27,160
Like like The Voice.
I'm sure AP could do it.
1004
00:52:27,160 --> 00:52:29,520
And I know Paul Davies and I used to do it
all the time.
1005
00:52:29,520 --> 00:52:30,360
It'd be 105.
1006
00:52:30,360 --> 00:52:32,320
Yeah, yeah, I used to do it.
1007
00:52:32,320 --> 00:52:35,240
Change. Promise.
I've done this, Robert. Where?
1008
00:52:35,240 --> 00:52:38,640
He's like, the audio is like chopped
the end of a word off.
1009
00:52:38,640 --> 00:52:38,960
And I've
1010
00:52:40,200 --> 00:52:41,040
recorded the end of that.
1011
00:52:41,040 --> 00:52:42,120
The rest of it, I mean.
1012
00:52:42,120 --> 00:52:43,320
And you would.
1013
00:52:43,320 --> 00:52:45,840
You can't pick it in the old days
if you actually,
1014
00:52:45,840 --> 00:52:48,000
if you dropped in
and you dropped the end off
1015
00:52:48,000 --> 00:52:50,320
like an ING off
the end of a word or something.
1016
00:52:50,320 --> 00:52:51,000
Yeah.
1017
00:52:51,000 --> 00:52:55,120
My I know my daughter can can match pitch
in a kind of supernatural way. When my
1018
00:52:55,200 --> 00:52:57,200
when my girlfriend
first introduced herself.
1019
00:52:57,200 --> 00:53:02,200
My girlfriend's name is Firoozeh
and my daughter immediately picked it up
1020
00:53:02,200 --> 00:53:06,240
and I said, I said, hey, honey,
do you want to do a drawing for Fear-issa?
1021
00:53:06,240 --> 00:53:11,400
And my daughter immediately says,
it's not Fear-issa, Dad, it's Firoozeh.
1022
00:53:11,920 --> 00:53:14,800
And she like immediate,
like a tape recorder
1023
00:53:14,800 --> 00:53:18,080
played back, you know, in her brain
the way it's supposed to sound.
1024
00:53:18,080 --> 00:53:19,800
And I was like, Holy crap.
1025
00:53:19,800 --> 00:53:23,640
Yeah, you know, my 12 year
old can play back audio from her brain.
1026
00:53:23,880 --> 00:53:26,040
That's the key.
That's the key to her being a mimic.
1027
00:53:27,360 --> 00:53:28,920
Yeah, you can mimic stuff.
1028
00:53:28,920 --> 00:53:32,160
Like I remember going to,
Because my other half,
1029
00:53:32,160 --> 00:53:35,000
she speaks Italian, and it was like,
you should learn Italian
1030
00:53:35,000 --> 00:53:38,440
so we can when we go there, we can both,
you know, communicate.
1031
00:53:38,440 --> 00:53:39,600
And I was like, oh, all right. Okay.
1032
00:53:39,600 --> 00:53:42,000
So I got, you know,
1033
00:53:42,000 --> 00:53:43,960
convinced I would, go and do this thing.
1034
00:53:43,960 --> 00:53:48,600
So I went to do my first Italian lesson
in Melbourne, and,
1035
00:53:48,600 --> 00:53:52,080
I finish the thing, they go, oh, well,
you know, you were really good.
1036
00:53:52,080 --> 00:53:55,000
You're really good.
And I was just mimicking.
1037
00:53:55,000 --> 00:53:58,280
But when it came to actually building
structure and knowing what the words meant
1038
00:53:58,280 --> 00:54:00,600
and how you put them all together,
I had no fucking idea.
1039
00:54:00,600 --> 00:54:03,840
So I'm just going to ask JP,
you know, to like an AI.
1040
00:54:03,840 --> 00:54:06,960
So to learn to, to learn Italian
in Melbourne, what do you do?
1041
00:54:06,960 --> 00:54:09,400
You just go to a restaurant
on like Elm Street or something.
1042
00:54:09,400 --> 00:54:10,920
And do you know what?
1043
00:54:10,920 --> 00:54:13,840
You know where the Italian school
was like on the street? Yeah,
1044
00:54:14,920 --> 00:54:17,560
mom. I mean, yeah,
1045
00:54:17,560 --> 00:54:19,560
exactly.
1046
00:54:19,560 --> 00:54:21,440
Just order an appetizer and just. Wow.
1047
00:54:21,440 --> 00:54:22,240
It's cool.
1048
00:54:22,240 --> 00:54:23,000
Yeah, yeah, yeah.
1049
00:54:23,000 --> 00:54:26,040
So just go into your lesson,
go faster and stuff after.
1050
00:54:26,040 --> 00:54:26,320
Yeah.
1051
00:54:26,320 --> 00:54:28,200
Well, you know,
I've been getting more and more teaching
1052
00:54:28,200 --> 00:54:31,640
production for podcasters and creators,
so I'll definitely be
1053
00:54:32,160 --> 00:54:36,000
mentioning this as a tool, you know, and,
and letting people know to try it out
1054
00:54:36,320 --> 00:54:40,080
because I have, you know, I'm working
more with corporate, you know, creators.
1055
00:54:40,080 --> 00:54:42,360
That's another whole, you know, category.
1056
00:54:42,360 --> 00:54:44,840
You know, people
that do content for corporate.
1057
00:54:44,840 --> 00:54:48,720
So C-suite, you know, let's let's talk
for a second just quickly about that.
1058
00:54:48,720 --> 00:54:52,480
So for example, somebody who's done
a course that they've recorded
1059
00:54:52,480 --> 00:54:56,240
their screen and the little hover over
camera in Loom or something like that,
1060
00:54:56,880 --> 00:55:01,200
but the camera is a webcam and their
microphone is their MacBook or something.
1061
00:55:01,640 --> 00:55:05,840
It's like literally if you're lucky,
lucky that, would you if you're lucky.
1062
00:55:05,840 --> 00:55:09,840
But just literally drag the course
video onto Voice Regen,
1063
00:55:10,000 --> 00:55:12,000
hit process and you're done. Yeah.
1064
00:55:12,000 --> 00:55:13,440
No. Quick question for you.
1065
00:55:13,440 --> 00:55:15,720
Where can people go and check this out.
1066
00:55:15,720 --> 00:55:19,680
Waves.com/voice-regeneration.
1067
00:55:20,640 --> 00:55:22,200
Or just type
1068
00:55:22,200 --> 00:55:26,320
in Voice Regen Waves into your Google
1069
00:55:26,320 --> 00:55:30,400
or whatever
you want and go off and find it useful.
1070
00:55:30,600 --> 00:55:34,920
This could be, this could be an integral
part of my, road case.
1071
00:55:34,920 --> 00:55:38,400
Well, I think I was going to suggest that
even if you're not a content creator,
1072
00:55:38,400 --> 00:55:40,680
if you're a voiceover artist going,
nah, never needed.
1073
00:55:40,680 --> 00:55:42,480
Just do yourself a favor.
1074
00:55:42,480 --> 00:55:45,840
Go and just have a play with it
because you'll be blown away.
1075
00:55:45,880 --> 00:55:50,160
And one of the reasons
that I put the teleprompter in there and,
1076
00:55:50,160 --> 00:55:53,280
is because of voiceover
1077
00:55:53,560 --> 00:55:56,960
artists like you, AP, and also,
1078
00:55:57,400 --> 00:56:01,160
how often you find yourself in a position
when you're,
1079
00:56:01,560 --> 00:56:06,520
you're not with your rig and you need to
record something and you need to get it
1080
00:56:06,520 --> 00:56:09,920
back to the station or to the studio,
and have a way to do it.
1081
00:56:10,560 --> 00:56:14,480
Now, in that case, it's not taking
a producer's job away from them.
1082
00:56:14,520 --> 00:56:18,840
What it does is it gives you, Robbo,
less work to do at the other end
1083
00:56:19,200 --> 00:56:20,880
than you would have had to.
1084
00:56:20,880 --> 00:56:26,000
And that's also why I put a a teleprompter
script reader in there with zero scroll.
1085
00:56:26,280 --> 00:56:30,360
Because if you're pasting in a 30-second
script, then you know you don't need it
1086
00:56:30,400 --> 00:56:30,840
to scroll.
1087
00:56:30,840 --> 00:56:34,040
You just need it in front of your face
or additions to it
1088
00:56:34,200 --> 00:56:37,960
if you've got one. You're at a restaurant,
for example, or something like that.
1089
00:56:37,960 --> 00:56:39,320
You could just jump in the bathroom
1090
00:56:39,320 --> 00:56:43,080
on your phone and record it, stick
it up, do it and send it straight back.
1091
00:56:43,560 --> 00:56:48,160
You could, but I wouldn't because I
don't think it gets rid of slurring.
1092
00:56:48,520 --> 00:56:49,360
I'm guessing
1093
00:56:50,560 --> 00:56:51,640
the red wine.
1094
00:56:51,640 --> 00:56:52,280
Yeah. Yeah.
1095
00:56:52,280 --> 00:56:54,760
That's not good.
That's not going to cut it. See.
1096
00:56:54,760 --> 00:56:58,560
You see that's the other thing is like
I worked very hard to try and make sure.
1097
00:56:58,560 --> 00:57:00,520
And I did a lot of the coding for the,
1098
00:57:00,520 --> 00:57:03,400
for the,
for the teleprompter here in Brisbane.
1099
00:57:03,400 --> 00:57:07,680
I worked very hard
to try and make sure that the scroll,
1100
00:57:07,720 --> 00:57:10,720
especially on a mobile
phone, was not jittery
1101
00:57:11,080 --> 00:57:13,960
and it's slow enough
that you can look at it on a,
1102
00:57:13,960 --> 00:57:17,600
you know, normal phone,
which is like a tablet these days
1103
00:57:18,040 --> 00:57:23,680
and be able to read things, without
constantly moving your eyes up and down.
1104
00:57:24,120 --> 00:57:25,080
I'm really proud of this.
1105
00:57:25,080 --> 00:57:28,560
It's like and so and if anybody out
there as well is interested
1106
00:57:28,560 --> 00:57:32,480
in using an API version of this
for your own service,
1107
00:57:32,480 --> 00:57:36,120
reach out to these wonderful hosts
and they'll get in contact with me.
1108
00:57:36,440 --> 00:57:38,720
That picture on the floor behind you.
1109
00:57:38,720 --> 00:57:41,280
Lichtenstein.
Yeah. Can you can you do me a favor?
1110
00:57:41,280 --> 00:57:42,960
Can you hang the fucking thing up?
1111
00:57:42,960 --> 00:57:46,400
You know how much it frustrates me
to see looking at that picture
1112
00:57:46,720 --> 00:57:49,080
just sitting on the floor?
Because it's really cool.
1113
00:57:49,080 --> 00:57:51,160
It is really cool,
but it's also really heavy.
1114
00:57:51,160 --> 00:57:54,240
It's actually, a wooden cargo,
1115
00:57:55,000 --> 00:57:59,320
slat crate that, was painted on.
1116
00:58:00,560 --> 00:58:01,760
So if I hang
1117
00:58:01,760 --> 00:58:05,560
this or if I hang this on the wall,
it's it'll bring the wall down.
1118
00:58:05,560 --> 00:58:06,960
So, yeah,
1119
00:58:06,960 --> 00:58:09,240
I looked at the picture of the girl
and of course,
1120
00:58:09,240 --> 00:58:11,000
the context of what we're doing here.
1121
00:58:11,000 --> 00:58:14,160
Podcasting, I thought, was what it's like.
1122
00:58:14,480 --> 00:58:17,480
It's having.
1123
00:58:17,880 --> 00:58:19,960
Yeah, I know it is.
1124
00:58:19,960 --> 00:58:20,760
But I like her hair.
1125
00:58:20,760 --> 00:58:22,960
Her hair.
1126
00:58:22,960 --> 00:58:24,320
Now I'm looking at it. Now.
1127
00:58:24,320 --> 00:58:25,840
I won't be able to unsee that.
1128
00:58:25,840 --> 00:58:26,520
You can't.
1129
00:58:26,520 --> 00:58:28,200
You can't unsee it. See it?
1130
00:58:28,200 --> 00:58:28,720
That's right.
1131
00:58:28,720 --> 00:58:31,720
When she said, sorry, man. So.
1132
00:58:32,320 --> 00:58:33,360
Well, that was fun.
1133
00:58:33,360 --> 00:58:36,600
Is it over? The Pro Audio Suite.
1134
00:58:36,800 --> 00:58:40,720
Thanks to Tri-Booth and Austrian Audio.
Recorded using Source
1135
00:58:40,720 --> 00:58:44,560
Connect, edited by Andrew Peters
and mixed by Robbo. Got
1136
00:58:44,560 --> 00:58:45,840
your own audio issues?
1137
00:58:45,840 --> 00:58:49,440
Just ask Robbo. Tech support
from George the Tech Whittam.
1138
00:58:49,560 --> 00:58:51,360
Don't forget to subscribe to the show
1139
00:58:51,360 --> 00:58:54,680
and join the conversation
on our Facebook group. To leave a comment,
1140
00:58:54,760 --> 00:58:58,520
suggest a topic or just say g'day,
drop us a note at our website,
1141
00:58:58,560 --> 00:59:00,400
proaudiosuite.com.
Yeah. Thanks.
49
00:01:59,800 --> 00:02:01,880
Yeah. And, Good idea. Brad,
if you're listening to.
50
00:02:01,880 --> 00:02:06,520
Yeah, tell us about this new product,
because for some reason,
51
00:02:06,520 --> 00:02:09,360
I thought I would be good to help you
with the beta testing,
52
00:02:09,360 --> 00:02:11,840
so I know all about it, but
you should probably feel our listeners.
53
00:02:11,840 --> 00:02:15,680
So the one of the first things that I need
54
00:02:15,680 --> 00:02:20,400
to mention about what we call
this region is it's
55
00:02:20,400 --> 00:02:24,600
not a plug in
and it will never be a plug in.
56
00:02:24,600 --> 00:02:26,920
And we can talk about that part
later down the line.
57
00:02:26,920 --> 00:02:31,840
But one of the largest and fastest
growing,
58
00:02:32,360 --> 00:02:35,520
audiences, or of users
59
00:02:35,520 --> 00:02:38,720
who create some kind of audio
in the world,
60
00:02:39,640 --> 00:02:44,920
not w users or video
editor users, the content creators.
61
00:02:44,920 --> 00:02:50,440
And, one of the things that I thought
really long and hard about was
62
00:02:51,960 --> 00:02:53,400
they might not
63
00:02:53,400 --> 00:02:56,840
understand how to fix, bad audio,
64
00:02:56,840 --> 00:03:01,320
but they do understand
what bad audio sounds like.
65
00:03:01,320 --> 00:03:03,680
So let's give them a solution.
66
00:03:03,680 --> 00:03:09,200
So we spent about three and a half years
working on building a very large language
67
00:03:09,200 --> 00:03:13,880
model, and training it on
what good audio sounds like.
68
00:03:14,400 --> 00:03:17,840
And we've come up with this product
called Voice Region,
69
00:03:17,840 --> 00:03:21,160
which can make even the worst
70
00:03:22,400 --> 00:03:25,680
meal or retrievable dialog sound
71
00:03:26,840 --> 00:03:28,040
nearly like a podcast.
72
00:03:28,040 --> 00:03:30,640
Mike and very, very usable.
73
00:03:30,640 --> 00:03:33,640
And on the website,
some of the demos I use,
74
00:03:33,960 --> 00:03:37,440
actually most of the demos
are very Australian because, you know,
75
00:03:37,440 --> 00:03:41,480
I have access to a lot of content
creators here, but
76
00:03:41,760 --> 00:03:45,200
literally one of the cases was a podcast,
77
00:03:45,280 --> 00:03:49,160
recording from Riverside, where on video
78
00:03:49,680 --> 00:03:53,040
the guest looked like they were recording
into a nice microphone.
79
00:03:53,040 --> 00:03:54,440
But the reality was
80
00:03:54,440 --> 00:03:55,120
they were actually
81
00:03:55,120 --> 00:03:58,320
recording on their MacBook
microphone, which was across the room.
82
00:03:59,440 --> 00:04:02,640
And in that came a classic mistake.
83
00:04:02,640 --> 00:04:03,120
Common.
84
00:04:03,120 --> 00:04:06,680
Well, it's it's a hugely common problem,
but it's also one
85
00:04:06,680 --> 00:04:08,280
that is emotionally very,
86
00:04:08,280 --> 00:04:12,040
very stressful for somebody who thinks
they've got great content and realizes
87
00:04:12,400 --> 00:04:16,440
it could be completely and utterly useless
because of that fact.
88
00:04:17,040 --> 00:04:21,440
And this is kind of why I built
voice Region, not for the people who
89
00:04:22,320 --> 00:04:25,960
mostly control the content,
but for all of those out of control
90
00:04:25,960 --> 00:04:29,040
scenarios where bad things happen
91
00:04:29,680 --> 00:04:32,440
and you have to work out,
how do I retrieve this?
92
00:04:32,440 --> 00:04:33,560
Do I pay somebody?
93
00:04:33,560 --> 00:04:38,360
Do I learn how to use, tool
that I've never seen before?
94
00:04:38,400 --> 00:04:40,720
Is this something
I can just drag and drop it onto
95
00:04:40,720 --> 00:04:43,200
and have it fix it for me,
which is what we built.
96
00:04:43,200 --> 00:04:46,520
It's quite seriously some of
those examples that you have up there,
97
00:04:47,840 --> 00:04:48,600
the normal
98
00:04:48,600 --> 00:04:51,720
plug in would probably be of no help
anyway.
99
00:04:51,720 --> 00:04:53,120
They're so far gone.
100
00:04:53,120 --> 00:04:54,120
Yeah, that you would
101
00:04:54,120 --> 00:04:57,200
only be detrimental to the audio
you were trying to keep anyway.
102
00:04:57,280 --> 00:04:59,400
And obviously,
if you think about the amount of time
103
00:04:59,400 --> 00:05:00,920
that it would, it would take.
104
00:05:00,920 --> 00:05:04,960
So for example, Robbo, if you've been paid
to take one of those pieces of audio,
105
00:05:05,240 --> 00:05:06,880
you'd spend a few hours on it.
106
00:05:06,880 --> 00:05:09,880
You'd still go back to it
and go, no, I'm still not happy with it.
107
00:05:11,360 --> 00:05:15,400
And one of the first things that we had
when we put the first demos out
108
00:05:15,400 --> 00:05:19,560
was people from, you know, our existing
audience, which are all plugin users,
109
00:05:19,560 --> 00:05:22,560
which we're very grateful for, saying,
make this a plugin,
110
00:05:22,840 --> 00:05:24,200
and we're constantly explaining.
111
00:05:24,200 --> 00:05:27,240
Now, the reason
this will never be a plugin
112
00:05:27,240 --> 00:05:32,640
is because it would break the
CPU and GPU on any computer.
113
00:05:33,640 --> 00:05:34,160
It doesn't
114
00:05:34,160 --> 00:05:37,600
matter if it's an M5
that came out last week or,
115
00:05:38,120 --> 00:05:41,520
you know, a really powerful PC,
it just would break it.
116
00:05:42,640 --> 00:05:46,760
It's not possible if this takes
too much energy and too much power.
117
00:05:47,360 --> 00:05:49,120
So it's in the cloud.
118
00:05:49,120 --> 00:05:51,160
Tell us, what are we doing to the audio?
119
00:05:51,160 --> 00:05:53,800
I mean it's called voice region.
120
00:05:53,800 --> 00:05:56,360
We're literally
well we're literally regenerating.
121
00:05:56,360 --> 00:05:58,600
So we're listening to what it is.
122
00:05:58,600 --> 00:06:00,680
We're taking the tonality.
123
00:06:00,680 --> 00:06:05,120
As much as we can find,
and we're literally rebuilding it
124
00:06:05,120 --> 00:06:09,160
so that it is
what the LM expects it to be.
125
00:06:09,600 --> 00:06:14,160
One of the examples
that displays that fairly clearly on
126
00:06:14,160 --> 00:06:17,520
the website is a Creative Commons
public recording
127
00:06:17,520 --> 00:06:21,320
from one of the 60s trips to the moon,
128
00:06:21,600 --> 00:06:24,880
where we take the NASA
129
00:06:24,880 --> 00:06:28,640
radio from the moon and make it sound
like he's talking into a podcast mic.
130
00:06:30,400 --> 00:06:32,000
And you can tell it's the same person.
131
00:06:32,000 --> 00:06:35,520
It's just it just one minute he's
on the moon, in the next minute he's not.
132
00:06:35,960 --> 00:06:38,760
And so this.
133
00:06:38,760 --> 00:06:40,440
But was he that was he on the moon?
134
00:06:40,440 --> 00:06:43,800
I wasn't going to go there,
but I didn't want to, you know. So.
135
00:06:44,360 --> 00:06:48,120
And my point would be that, Robert,
there's hope for our podcast.
136
00:06:48,120 --> 00:06:49,040
Yes, man.
137
00:06:49,040 --> 00:06:50,040
Yes. That's right.
138
00:06:50,040 --> 00:06:52,880
Well, I mean,
I think I'll just stop recording right now
139
00:06:52,880 --> 00:06:54,920
and you can just capture
this directly off the phone.
140
00:06:54,920 --> 00:06:56,160
Right.
141
00:06:56,160 --> 00:06:56,960
Good enough.
142
00:06:56,960 --> 00:07:00,240
No, no you that Robert it it please,
please please don't.
143
00:07:00,280 --> 00:07:04,600
But take into account Robert that
I actually can do that with voice region.
144
00:07:04,760 --> 00:07:05,680
Right. That's what I'm saying.
145
00:07:05,680 --> 00:07:07,800
I mean like live by the sword, die
by the sword.
146
00:07:07,800 --> 00:07:08,720
Let's let's do it, Rob.
147
00:07:08,720 --> 00:07:12,640
Oh, come on, I'm already doing it
with Gomezs audio actually going to be.
148
00:07:12,880 --> 00:07:15,480
And there's and there's
a couple of different ways to use this.
149
00:07:15,480 --> 00:07:17,440
And there's a couple of different things
150
00:07:17,440 --> 00:07:20,600
that I've built in
and that we're about to kind of add.
151
00:07:20,920 --> 00:07:26,080
One of them is I've given, people
the ability to record directly on to the,
152
00:07:26,120 --> 00:07:29,120
in the app on the either on their phone
153
00:07:29,600 --> 00:07:31,880
or on a computer.
154
00:07:31,880 --> 00:07:33,600
And I've given them a place
to put this script.
155
00:07:33,600 --> 00:07:34,680
And you can either use it
156
00:07:34,680 --> 00:07:37,960
as a teleprompter and have it scroll,
or you can just kind of stick
157
00:07:38,000 --> 00:07:39,600
speed to zero in the script,
158
00:07:39,600 --> 00:07:42,040
will stay there in front of you
so that you can actually
159
00:07:42,040 --> 00:07:43,800
look at the script
while you're talking to a mic,
160
00:07:43,800 --> 00:07:47,720
the same way a voiceover does,
and then for for a.
161
00:07:47,720 --> 00:07:49,320
And one of the reasons we did
162
00:07:49,320 --> 00:07:53,280
this is because a lot of the time,
if talent is remote,
163
00:07:53,960 --> 00:07:58,800
whether it's a podcast guest or somebody
else, you can still make them sound
164
00:07:59,320 --> 00:08:01,960
like they're,
in front of a better microphone
165
00:08:01,960 --> 00:08:05,760
and the room it
with just drag and drop and process.
166
00:08:06,320 --> 00:08:10,680
We're about to add in two weeks,
a pop out teleprompter
167
00:08:11,280 --> 00:08:12,560
so that you can actually put
168
00:08:12,560 --> 00:08:16,400
the teleprompter, on a script,
a separate screen, make it really large
169
00:08:16,680 --> 00:08:20,040
just underneath your camera
so that you can use a teleprompter.
170
00:08:20,040 --> 00:08:23,400
And we're also adding record video
onto the recording page
171
00:08:23,760 --> 00:08:26,760
so you can select a video camera,
select your microphone.
172
00:08:27,000 --> 00:08:29,800
And even if it's the MacBook microphone
and you're in the kitchen,
173
00:08:29,800 --> 00:08:32,640
we'll make it sound like you've got
you're wearing a lapel.
174
00:08:32,640 --> 00:08:34,920
So this all sets up on a server,
is that correct?
175
00:08:34,920 --> 00:08:35,640
Yeah.
176
00:08:35,640 --> 00:08:39,600
And is this part of the,
what's the bundle?
177
00:08:39,600 --> 00:08:42,160
You know, the the subscription.
178
00:08:42,160 --> 00:08:43,200
Do you get that? No, no.
179
00:08:43,200 --> 00:08:46,400
If you're if you're if you're
if you're paying for creative access
180
00:08:46,400 --> 00:08:49,760
plug ins, then you're not getting region.
181
00:08:49,760 --> 00:08:52,760
If you want region
then just go and subscribe.
182
00:08:53,120 --> 00:08:54,920
I mean, I just made it super cheap.
183
00:08:54,920 --> 00:08:58,120
I mean, to get five hours of processing
per month,
184
00:08:58,120 --> 00:09:01,440
I'm charging $4.99 for us right now.
185
00:09:02,200 --> 00:09:02,640
That's cheap.
186
00:09:02,640 --> 00:09:06,600
But I also give you,
everybody on a free account
187
00:09:06,600 --> 00:09:10,520
which needs no billing address,
no credit card or anything.
188
00:09:10,880 --> 00:09:12,600
You get five minutes free per day.
189
00:09:12,600 --> 00:09:16,160
So if you think about all the short
form content and then people could
190
00:09:16,160 --> 00:09:21,040
literally record on their phone,
with the video, not have to worry.
191
00:09:21,040 --> 00:09:26,000
Lapel put the video straight into voice
region and then post it on TikTok, etc..
192
00:09:27,480 --> 00:09:28,440
And by the way, we're
193
00:09:28,440 --> 00:09:32,680
finding the video
files are about 50% of the files
194
00:09:32,680 --> 00:09:35,680
being uploaded to voice region
right now. So
195
00:09:36,080 --> 00:09:37,680
so it takes in video.
196
00:09:37,680 --> 00:09:40,480
Dymocks is it does the audio processing
remarks is a video.
197
00:09:40,480 --> 00:09:41,280
Absolutely.
198
00:09:41,280 --> 00:09:43,680
We give you back in MP4 with clean
audience.
199
00:09:43,680 --> 00:09:44,680
Here's a question then.
200
00:09:44,680 --> 00:09:48,480
And look I know it probably doesn't
need to be answered by you guys, but
201
00:09:48,480 --> 00:09:50,240
I'm sure plenty of people have asking it.
202
00:09:50,240 --> 00:09:54,440
Given the AI temperature at the moment,
shall we say,
203
00:09:54,440 --> 00:10:00,000
what happens to all these AI samples
once they've been processed?
204
00:10:00,000 --> 00:10:04,120
Your audio stays on your dashboard
until 14 days,
205
00:10:04,120 --> 00:10:06,680
and then at that point
it's automatically deleted.
206
00:10:06,680 --> 00:10:08,680
And if you look at the user dashboard, it
says,
207
00:10:08,680 --> 00:10:10,680
I think I've got it in like two places
right now.
208
00:10:10,680 --> 00:10:15,240
It says anything you have in
your files is deleted after 14 days.
209
00:10:15,800 --> 00:10:18,560
One of the things that we do not do
210
00:10:18,560 --> 00:10:21,240
is use any user's files
211
00:10:21,240 --> 00:10:24,040
to try and improve their model,
and model is our model.
212
00:10:24,040 --> 00:10:27,360
We improve it by paying for content
213
00:10:27,360 --> 00:10:31,120
from specific companies that do that
kind of thing for two reasons.
214
00:10:31,120 --> 00:10:32,920
Number one, it's because it's, you know,
215
00:10:32,920 --> 00:10:35,920
we want to stay 100% legal
with where our content comes from.
216
00:10:35,960 --> 00:10:39,400
Number two,
scraping is a really, really bad idea.
217
00:10:39,400 --> 00:10:42,400
It's unethical, it's not moral,
and it's lying.
218
00:10:42,760 --> 00:10:45,600
So we do not use any user's
219
00:10:45,600 --> 00:10:48,600
audio to help educate our model at all.
220
00:10:48,800 --> 00:10:51,800
And by the way,
there's another reason for that.
221
00:10:52,200 --> 00:10:55,920
When you've got a large language
model like this, you don't educate it
222
00:10:55,920 --> 00:10:57,520
by giving it terrible audio.
223
00:10:57,520 --> 00:11:01,040
You you educate it
by giving it very high quality audio.
224
00:11:01,880 --> 00:11:05,240
It's not like
we can educate this by giving it
225
00:11:05,840 --> 00:11:09,000
region processed audio,
because that's a bit like,
226
00:11:09,680 --> 00:11:13,400
you know, open AI chat about scraping
and finding more information
227
00:11:13,400 --> 00:11:15,040
that it created that was wrong.
228
00:11:15,040 --> 00:11:18,720
And it's it's, you know, eating it's
the beast is eating its tail.
229
00:11:18,720 --> 00:11:19,920
Yeah. It's exactly.
230
00:11:19,920 --> 00:11:21,840
Yeah. It's like it's not the way to do it.
231
00:11:21,840 --> 00:11:24,720
But you know, it's
nobody's information gets used.
232
00:11:24,720 --> 00:11:29,520
When does the 499 limited time
because it's normally 999 right.
233
00:11:29,520 --> 00:11:31,000
Well for the creator.
234
00:11:31,000 --> 00:11:31,360
Fine.
235
00:11:31,360 --> 00:11:35,880
Normally it's like ultimately
as far as I'm just, unconcerned,
236
00:11:36,320 --> 00:11:42,240
I will keep going on the full 99
until I feel like I've reached,
237
00:11:42,320 --> 00:11:47,160
a point where, we're established.
238
00:11:47,200 --> 00:11:51,240
We released this on January 26th,
and in a very quiet launch
239
00:11:51,840 --> 00:11:55,920
so far, we we have about 35,000 members,
which, you know, for.
240
00:11:56,120 --> 00:11:58,480
Wow, it's a month and a half
I'm pretty proud of.
241
00:11:58,480 --> 00:11:59,920
Wow. Yes.
242
00:11:59,920 --> 00:12:03,400
We've processed
a ridiculous amount of audio
243
00:12:03,400 --> 00:12:06,720
in that time, which, I also keep tabs on,
244
00:12:07,160 --> 00:12:11,760
but I want people to keep on joining
because one of the things that happens
245
00:12:11,760 --> 00:12:16,160
when you have a free account
and you're using it is occasionally
246
00:12:16,160 --> 00:12:20,640
an error happens
or a file that you upload doesn't work.
247
00:12:20,760 --> 00:12:24,160
And those kind of errors do get flagged.
248
00:12:24,160 --> 00:12:26,320
And they help me and my team.
249
00:12:26,320 --> 00:12:30,120
And I'm looking at the data
every day to see, okay,
250
00:12:30,120 --> 00:12:33,120
I can't hear the file,
I can't see the file.
251
00:12:33,120 --> 00:12:36,840
But if it fails,
I can see what codec it was
252
00:12:37,200 --> 00:12:40,040
and kind of like
what links or what was in it,
253
00:12:40,040 --> 00:12:45,240
so that I actually can learn and improve
how we deal with errors
254
00:12:45,560 --> 00:12:49,680
and also keep on adding,
oh, there's that codec from that.
255
00:12:49,680 --> 00:12:50,400
Okay, cool.
256
00:12:50,400 --> 00:12:53,000
Let's add that
I want to keep the price down
257
00:12:53,000 --> 00:12:57,000
so that we keep on bringing more people
in, by the way, process really fast.
258
00:12:57,000 --> 00:13:01,320
So I mean it immediately tells you
to download a WAV file, I guess that
259
00:13:01,600 --> 00:13:06,080
did you do that so that you could ensure
that there be a local copy for the user?
260
00:13:06,080 --> 00:13:06,760
Absolutely.
261
00:13:06,760 --> 00:13:10,280
And if you once you've done
that once, effectively what will happen
262
00:13:10,280 --> 00:13:13,280
is you press record as you just did.
263
00:13:13,440 --> 00:13:16,520
You can preview it
and go, yeah, I, I'm gonna like that.
264
00:13:16,520 --> 00:13:17,560
I'm going to process that.
265
00:13:17,560 --> 00:13:21,120
As soon as that happens, you press, okay,
it will take a copy of that
266
00:13:21,120 --> 00:13:25,200
and put it into your default downloads
folder so that you have a redundancy.
267
00:13:26,560 --> 00:13:29,880
And for my
perspective, anybody who wants to record
268
00:13:29,880 --> 00:13:34,200
anything should have a backup of that
that's not touched by any server.
269
00:13:34,200 --> 00:13:36,600
So if you haven't got it back out,
it doesn't cut it at all. Right. Yeah.
270
00:13:36,600 --> 00:13:37,680
Exactly. Exactly.
271
00:13:37,680 --> 00:13:40,920
I mean, remember this is developed
by somebody whose theory is
272
00:13:40,920 --> 00:13:43,880
there's no such thing as a backup
unless as the backup of the backup.
273
00:13:43,880 --> 00:13:46,560
Right. You don't have a copy
unless you have two copies. Exactly.
274
00:13:46,560 --> 00:13:46,800
Yeah.
275
00:13:46,800 --> 00:13:49,080
And the other thing
is, once you click on a file,
276
00:13:49,080 --> 00:13:52,120
you have a toggle between the before
and after audio.
277
00:13:52,720 --> 00:13:56,040
And most of the time,
you can just tell by even looking
278
00:13:56,040 --> 00:13:59,040
at the WAV file
that something dramatic has happened.
279
00:13:59,040 --> 00:14:01,680
Is the processing
basically faster than real time?
280
00:14:01,680 --> 00:14:04,560
Way faster than really quick,
really quick?
281
00:14:04,560 --> 00:14:05,280
Yeah.
282
00:14:05,280 --> 00:14:06,640
So how'd you go, Robert?
283
00:14:06,640 --> 00:14:08,760
If you got something,
you can play I. George. Sorry, you guys.
284
00:14:08,760 --> 00:14:12,280
So here's here's, Robert on the phone,
I don't know, I'm going to ask Gomes.
285
00:14:12,760 --> 00:14:16,320
When does the 499 limited time not.
286
00:14:16,880 --> 00:14:19,560
It's because it's normally 999. Right.
287
00:14:19,560 --> 00:14:21,720
All right. And here's
the processed version I don't know.
288
00:14:21,720 --> 00:14:26,840
I'm going to ask Gomes,
when does the port 9900 time not.
289
00:14:27,360 --> 00:14:29,440
It's because it's normally 999, right.
290
00:14:29,440 --> 00:14:30,280
Isn't that bizarre?
291
00:14:30,280 --> 00:14:33,280
It sounds all like a phone to me.
292
00:14:33,360 --> 00:14:37,080
Yeah, yeah, let's let's hear
technology, man.
293
00:14:37,080 --> 00:14:39,360
Sorry. Yeah. That's right.
294
00:14:39,360 --> 00:14:41,200
Yeah, it is freaky.
295
00:14:41,200 --> 00:14:44,160
The first time I did that,
I was not on the phone.
296
00:14:44,160 --> 00:14:48,040
I haven't done a phone one like that,
but no, but the thing is, you know, it's
297
00:14:48,040 --> 00:14:51,120
like we've got we've got a couple of radio
networks in the States
298
00:14:51,120 --> 00:14:54,200
and a radio network in Australia
using this now.
299
00:14:54,480 --> 00:14:58,440
So that phone interviews,
even if it's down to an office to,
300
00:14:58,480 --> 00:15:01,800
you know, doing tags,
hey, it's such and such
301
00:15:01,800 --> 00:15:05,160
and you're listening to suddenly
it's not a phone interview anymore.
302
00:15:05,160 --> 00:15:07,760
Suddenly
you've got them sitting in a studio record
303
00:15:07,760 --> 00:15:10,760
doing the tags,
and they can be used in proper promos.
304
00:15:10,840 --> 00:15:12,880
Take all your old phone IDs and,
305
00:15:14,000 --> 00:15:14,800
hundred percent you
306
00:15:14,800 --> 00:15:17,920
could have you could have Jim Morrison
saying goodbye.
307
00:15:17,920 --> 00:15:21,320
This is Jim Morrison sounding
like he was sitting across the room.
308
00:15:21,320 --> 00:15:23,280
So I don't think Jim Morrison
that's like it.
309
00:15:23,280 --> 00:15:25,800
I just tend not to mind.
310
00:15:25,800 --> 00:15:30,400
So as a as an experiment, it's
just because I don't really have much of
311
00:15:30,400 --> 00:15:30,880
a life.
312
00:15:30,880 --> 00:15:34,320
I about to about three weeks ago,
313
00:15:34,320 --> 00:15:39,400
I went looking for the oldest John
Lennon interviews,
314
00:15:39,400 --> 00:15:42,400
and I found a video of one
315
00:15:42,600 --> 00:15:45,120
that had so much ground.
316
00:15:45,120 --> 00:15:49,080
And, you know, it was like
the interview was in a white coat
317
00:15:49,080 --> 00:15:50,520
and all very, very. And.
318
00:15:50,520 --> 00:15:50,880
Yeah.
319
00:15:50,880 --> 00:15:54,160
And and I put it through voice reach in
and then I put it
320
00:15:54,160 --> 00:15:57,320
through a visual sharpener called Topaz
video.
321
00:15:57,560 --> 00:16:03,400
And I ended up watching this kind of like
Netflix HD video of John Lennon.
322
00:16:03,400 --> 00:16:06,400
And it's it sounded amazing.
323
00:16:06,640 --> 00:16:10,440
I mean, this has some fairly
it has a lot of use cases.
324
00:16:10,440 --> 00:16:12,360
This song
325
00:16:12,360 --> 00:16:15,480
is probably probably a bit advanced
for where you're at with the software.
326
00:16:15,480 --> 00:16:19,440
But just thinking as an audio engineer
now, like we sat to hear the
327
00:16:19,920 --> 00:16:22,080
John Lennon interview
that we're just talking about,
328
00:16:22,080 --> 00:16:24,880
can I pan left and right
or is it just all straight up?
329
00:16:24,880 --> 00:16:27,360
So right now
it's all up the center and in.
330
00:16:27,360 --> 00:16:31,920
And so for example, right now if you also
if you put in a multi-channel file,
331
00:16:32,360 --> 00:16:35,040
at this point in time,
332
00:16:35,040 --> 00:16:38,560
if it's only stereo,
we will, will mono it.
333
00:16:38,800 --> 00:16:43,240
If it's a multi-channel
more than two channels
334
00:16:43,240 --> 00:16:46,520
at this point in time,
we'll politely tell you we can't do it.
335
00:16:46,520 --> 00:16:49,520
Or if we think we can do it,
336
00:16:49,560 --> 00:16:52,200
we're only doing the first channel.
337
00:16:52,200 --> 00:16:53,600
Right.
338
00:16:53,600 --> 00:16:54,720
We're working on that.
339
00:16:54,720 --> 00:17:00,600
That's another reason why I study the data
so much, is so that we can learn from
340
00:17:01,080 --> 00:17:04,800
how the system deals
with, different kinds of files.
341
00:17:05,360 --> 00:17:06,240
It's for spoken.
342
00:17:06,240 --> 00:17:08,160
This is for dialog only. Yes.
343
00:17:08,160 --> 00:17:09,640
Speech.
344
00:17:09,640 --> 00:17:12,400
Yeah. Yeah,
but I can see, like Peter Jackson.
345
00:17:13,360 --> 00:17:14,200
I'm surprised he hasn't
346
00:17:14,200 --> 00:17:17,200
been on the, on the hornpipe saying,
can I have a go?
347
00:17:17,280 --> 00:17:21,280
We already have a couple
of post-production studios who have said
348
00:17:21,280 --> 00:17:26,080
we used this for boom mics from a film
when one of the mics failed.
349
00:17:27,000 --> 00:17:28,560
So it's like.
350
00:17:28,560 --> 00:17:31,960
And one of the reasons
why we've got a pro account which has
351
00:17:32,760 --> 00:17:36,800
13 hours, and then we'll end up
with an enterprise account.
352
00:17:37,040 --> 00:17:40,160
The other thing that we're doing with
this is we're also about to launch
353
00:17:40,160 --> 00:17:43,160
an API version of Voice region,
354
00:17:43,440 --> 00:17:46,920
so that you can actually plug
a voice region into your own funnel,
355
00:17:47,000 --> 00:17:50,160
and kind of create your own service
out of it.
356
00:17:50,360 --> 00:17:54,080
The only person I can say, or part
of the industry suffering from this is
357
00:17:54,080 --> 00:17:57,840
if you're into audio dialog replacement,
because you won't be needed anymore.
358
00:17:57,840 --> 00:17:59,760
Well, well, yes or no.
359
00:17:59,760 --> 00:18:01,280
I mean, there's a couple of things.
360
00:18:01,280 --> 00:18:07,720
Firstly, a lot of the situations
that voice region is useful to a user
361
00:18:08,120 --> 00:18:11,800
are the users who would never
go to a dialog editor in the first place.
362
00:18:12,320 --> 00:18:14,520
They don't even know what the term means.
363
00:18:14,520 --> 00:18:16,520
All they know is their audio is terrible.
364
00:18:16,520 --> 00:18:18,240
They need a solution. What?
365
00:18:18,240 --> 00:18:21,120
What I'm not selling them is a hammer
and a nail.
366
00:18:21,120 --> 00:18:25,320
I'm I'm I'm selling them
a nail hammered into a wall.
367
00:18:26,320 --> 00:18:27,080
Yeah.
368
00:18:27,080 --> 00:18:29,880
Okay. They don't need the process.
They don't understand the process.
369
00:18:29,880 --> 00:18:32,880
They will never know who to go to
or who to ask for.
370
00:18:33,200 --> 00:18:36,200
So therefore, I'm not really risking
anybody's career.
371
00:18:36,360 --> 00:18:39,880
The other thing that's important
here is the content creator
372
00:18:39,880 --> 00:18:42,960
market is like 150 million times
373
00:18:42,960 --> 00:18:45,960
bigger than the audio pro market.
374
00:18:46,200 --> 00:18:48,920
And, I feel like
375
00:18:48,920 --> 00:18:52,200
I'm giving it to like 1% of it.
376
00:18:52,200 --> 00:18:56,760
Maybe so I don't I'm not losing sleep,
feeling like I'm
377
00:18:56,800 --> 00:18:58,640
taking anybody's job away.
378
00:18:58,640 --> 00:19:00,600
I'll tell you someone who could
he could benefit from this.
379
00:19:00,600 --> 00:19:03,880
And as a podcast, I was watching earlier
and I'll tell you exactly
380
00:19:03,880 --> 00:19:07,600
what it's called,
I think, racing news three, six, five.
381
00:19:09,480 --> 00:19:12,520
And they, they do a Formula One podcast
382
00:19:13,680 --> 00:19:17,520
every time there's someone beaming in
383
00:19:17,680 --> 00:19:21,200
to them, like a guest or something,
or the person's on location,
384
00:19:21,800 --> 00:19:24,480
they compressed the shit out of it
to a point
385
00:19:24,480 --> 00:19:27,720
where it's so gated
that half the stuff is missing.
386
00:19:27,720 --> 00:19:30,160
It's like, what the hell did you like?
Woods have gone. Yeah.
387
00:19:30,160 --> 00:19:31,800
It's like, yeah.
388
00:19:31,800 --> 00:19:34,800
It's like,
come on, I've sent them a message saying
389
00:19:34,920 --> 00:19:37,680
your gates count is too heavy
with the gating.
390
00:19:37,680 --> 00:19:39,600
I cut off half your dialog is gone.
391
00:19:39,600 --> 00:19:42,600
Well, you raise an interesting point
that I was going to make
392
00:19:42,680 --> 00:19:45,480
was that, you know,
they're they're fed content creators.
393
00:19:45,480 --> 00:19:47,880
There actually is a direct correlation
394
00:19:47,880 --> 00:19:51,880
between the quality of your audio
and how much people believe you.
395
00:19:52,160 --> 00:19:52,520
Yeah.
396
00:19:52,520 --> 00:19:55,920
It's actually been studied
in at the University of Queensland
397
00:19:55,920 --> 00:19:59,200
and that one in California,
when we talked about this before,
398
00:19:59,200 --> 00:20:04,720
you know, it's like you can draw a line
almost in terms of how listener perception
399
00:20:04,720 --> 00:20:08,560
of the information you're giving them
falls as your audio quality gets more.
400
00:20:08,640 --> 00:20:12,960
Doesn't matter
if you've got an ethics three camera
401
00:20:12,960 --> 00:20:18,360
with a Sigma lens, anamorphic
perfect lighting, and your sharp focus.
402
00:20:18,360 --> 00:20:21,280
If your audio sounds like you're recording
403
00:20:21,280 --> 00:20:25,120
on a Logitech
C nine to a webcam in a kitchen,
404
00:20:26,200 --> 00:20:26,760
which you
405
00:20:26,760 --> 00:20:30,440
probably, it's just a random side note
406
00:20:30,600 --> 00:20:36,280
the highest selling webcams
still in 2026 is the Logitech C9 two.
407
00:20:36,440 --> 00:20:38,320
I've got one.
408
00:20:38,320 --> 00:20:39,800
I think I'm using it right now.
409
00:20:39,800 --> 00:20:40,960
Literally, literally.
410
00:20:40,960 --> 00:20:46,200
It's like I did
I did, I have, a report software
411
00:20:46,240 --> 00:20:50,360
that I pay for a lot for every month
that tells me how many things sell on
412
00:20:50,360 --> 00:20:52,000
Amazon per product.
413
00:20:52,000 --> 00:20:56,240
This thing just on Amazon
sells like at least 20,000 a day.
414
00:20:56,240 --> 00:20:57,240
It's insane.
415
00:20:57,240 --> 00:20:58,560
It's not a bad camera. Lenses.
416
00:20:58,560 --> 00:21:00,960
I just see nine, two, two.
417
00:21:00,960 --> 00:21:03,360
Oh, you've got the posh version
anyway, so.
418
00:21:03,360 --> 00:21:06,080
But yes, good audio is confidence.
419
00:21:06,080 --> 00:21:08,680
And that's kind of one of the things
that from my perspective,
420
00:21:08,680 --> 00:21:13,080
I wanted with Voice
Regen is to give people confidence
421
00:21:13,080 --> 00:21:16,920
because, you know, if you look amazing
but your sound is really rough,
422
00:21:17,480 --> 00:21:20,040
it does lower your self-esteem.
423
00:21:20,040 --> 00:21:22,640
And you think,
do I really want to put that out there?
424
00:21:22,640 --> 00:21:25,760
What I want is somebody to feel like
they can just literally
425
00:21:26,040 --> 00:21:29,280
pick up their phone, record
something, hit process, and
426
00:21:29,840 --> 00:21:32,840
send it out into the world,
know that it sounds amazing.
427
00:21:33,200 --> 00:21:36,200
Without knowing how it was achieved.
428
00:21:36,960 --> 00:21:38,640
You focus on the content. Yeah.
429
00:21:38,640 --> 00:21:42,240
And let us look after
your polish. I agree.
430
00:21:42,240 --> 00:21:45,480
So I like what you've come up with
for people who are doing content,
431
00:21:45,480 --> 00:21:48,520
instead of worrying about all the shit
that they have no idea what it is,
432
00:21:49,080 --> 00:21:53,080
just do what you do best and let the,
you know, machine do the work for you.
433
00:21:53,080 --> 00:21:54,280
The other bit for you.
434
00:21:54,280 --> 00:21:58,640
The other thing is that, and I
this is something which I, denied about
435
00:21:58,640 --> 00:22:02,000
was how large a file
436
00:22:02,000 --> 00:22:05,320
I wanted to enable people to play with.
437
00:22:06,120 --> 00:22:08,880
So you can upload anything to Voice
Regen,
438
00:22:08,880 --> 00:22:12,720
up to 1.5 gig in size.
439
00:22:13,280 --> 00:22:16,440
So now, 1.5 gig.
440
00:22:16,440 --> 00:22:20,520
If you're at 720p, you know,
441
00:22:20,520 --> 00:22:24,840
something like that with video
you're talking, I mean, I ran
442
00:22:25,800 --> 00:22:28,600
one of the webinars
I did with, kind of like, Andrew Scheps.
443
00:22:28,600 --> 00:22:31,680
So somebody I ran the entire webinar,
444
00:22:31,680 --> 00:22:35,320
720p after I ripped it down off YouTube
through Voice Regen.
445
00:22:35,320 --> 00:22:39,360
And that was like two hours and it cleaned
up the vocals something chronic.
446
00:22:39,360 --> 00:22:40,440
It was amazing.
447
00:22:40,440 --> 00:22:42,600
But 1.5 gig I think is enough.
448
00:22:42,600 --> 00:22:46,200
And if you're just doing audio, you could...
that's like four hours of audio
449
00:22:46,200 --> 00:22:46,800
right there.
450
00:22:46,800 --> 00:22:49,360
And if you're doing
four hours of podcasting,
451
00:22:49,360 --> 00:22:51,640
you're not going to have an audience,
right?
452
00:22:51,640 --> 00:22:53,200
Okay. They're gone.
453
00:22:53,200 --> 00:22:53,800
They're asleep.
454
00:22:53,800 --> 00:22:55,560
Yeah. We should know. Yeah.
455
00:22:55,560 --> 00:22:57,720
Yeah. That's that's right.
456
00:22:57,720 --> 00:22:59,160
Yeah. About not having an audience.
457
00:22:59,160 --> 00:23:02,160
I mean, yeah.
458
00:23:02,680 --> 00:23:04,800
I think yeah yeah yeah yeah yeah.
459
00:23:04,800 --> 00:23:07,280
So where do you think this leads Gomez?
Where do you think? What?
460
00:23:07,280 --> 00:23:11,440
What's the next horizon
if content creator focus is added
461
00:23:11,440 --> 00:23:13,280
to what Waves is already doing?
462
00:23:13,280 --> 00:23:16,160
I mean, not not specifically,
but in a broader sense.
463
00:23:16,160 --> 00:23:20,160
Do you think as an industry in general,
that focus is going to move
464
00:23:20,160 --> 00:23:21,000
in that direction?
465
00:23:21,000 --> 00:23:22,360
Not really.
466
00:23:22,360 --> 00:23:24,440
I mean, let me give you some perspective.
467
00:23:24,440 --> 00:23:29,960
So there are
three product managers at Waves
468
00:23:29,960 --> 00:23:34,760
who deal specifically in live sound,
because we have one of the most popular
469
00:23:35,280 --> 00:23:38,160
digital live
consoles in the world with the LV1.
470
00:23:39,640 --> 00:23:41,720
There are ten
471
00:23:41,720 --> 00:23:46,040
product managers who deal specifically
with the plugin market
472
00:23:46,040 --> 00:23:49,240
and coming up with ideas for plugins,
developing plugins, etc.
473
00:23:49,840 --> 00:23:51,360
That's what a product manager does.
474
00:23:51,360 --> 00:23:56,080
A product manager comes up with
the concept for something, fleshes it out,
475
00:23:56,080 --> 00:24:00,000
and then works with the development team
all the way through to make it happen.
476
00:24:00,600 --> 00:24:05,960
There's one product manager at Waves
who deals with the content creator market.
477
00:24:05,960 --> 00:24:07,280
That's me.
478
00:24:07,280 --> 00:24:12,680
And for a company who has, you know,
we have about 5 million
479
00:24:12,680 --> 00:24:15,840
plugin users who have purchased
plugins over the years.
480
00:24:15,840 --> 00:24:20,160
And in our database,
our focus is still very,
481
00:24:20,160 --> 00:24:24,480
very much on looking after the market
that is our heritage.
482
00:24:24,880 --> 00:24:31,040
However, my goals are set on
how can we take advantage
483
00:24:31,040 --> 00:24:34,080
of the processing
that we have, the skills that we have
484
00:24:34,640 --> 00:24:39,440
and the technology that we have to make
a content creator's life easier
485
00:24:39,880 --> 00:24:45,240
so that they can focus on content and
not focus on trying to learn a new skill.
486
00:24:45,840 --> 00:24:46,880
And that does two things.
487
00:24:46,880 --> 00:24:51,560
Number one, it helps us retain
the people in the industry like yourselves
488
00:24:51,560 --> 00:24:56,520
who are voiceovers,
production experts, etc.
489
00:24:57,000 --> 00:25:01,240
And it means there are fewer people out there
on Fiverr saying,
490
00:25:01,240 --> 00:25:06,240
hey, I can do you a radio ad,
but it also means that
491
00:25:06,360 --> 00:25:09,600
the audience gets a better product
regardless.
492
00:25:10,160 --> 00:25:13,200
Is there a user feedback functionality?
493
00:25:13,200 --> 00:25:17,160
I remember kind of in the earlier days
of this type of technology, I'd
494
00:25:17,160 --> 00:25:22,400
watch YouTube videos
that had occasional audio randomness,
495
00:25:23,360 --> 00:25:24,280
you'd be just listening to
496
00:25:24,280 --> 00:25:27,560
the dialog, and then one or two
words were just, you know, this.
497
00:25:27,640 --> 00:25:29,640
Yeah. I mean, yeah, yeah,
it wasn't that way.
498
00:25:29,640 --> 00:25:33,440
So if that, that like it was a discord
or one of those.
499
00:25:33,840 --> 00:25:38,160
So it used to crunch the crap
out of things.
500
00:25:38,520 --> 00:25:39,000
Yeah.
501
00:25:39,000 --> 00:25:42,400
So can you, can you give feedback
like if you get a, if you get bad output,
502
00:25:42,400 --> 00:25:46,320
which I would imagine you've done a lot
to prevent from ever happening.
503
00:25:46,320 --> 00:25:47,480
But can you give feedback.
504
00:25:47,480 --> 00:25:52,440
So, with the... there's a support
page and a contact form,
505
00:25:52,440 --> 00:25:56,480
and I get a report from Tech Support
literally
506
00:25:56,760 --> 00:26:00,480
six days a week on
anything that comes from them.
507
00:26:00,480 --> 00:26:01,600
And we escalate it.
508
00:26:01,600 --> 00:26:05,880
So if it's something that's really easy
and it's a user issue, and 80% of the time,
509
00:26:05,880 --> 00:26:08,320
a problem that comes to us
is a user issue.
510
00:26:08,320 --> 00:26:11,080
So something they didn't quite understand.
511
00:26:11,080 --> 00:26:12,480
That we clear up.
512
00:26:12,480 --> 00:26:15,560
But if there's a real problem with audio,
then I will contact them
513
00:26:15,560 --> 00:26:19,840
directly and say,
are you happy to let me listen
514
00:26:19,840 --> 00:26:23,200
to the audio file
and give you an explanation?
515
00:26:23,200 --> 00:26:27,800
Because at this point in time,
we're still at the point where
516
00:26:28,400 --> 00:26:31,560
we want to make sure that everybody
who has the chance
517
00:26:32,160 --> 00:26:36,680
to really get an idea of why their file
didn't work should know why.
518
00:26:36,720 --> 00:26:40,440
And one of the other things
we're about to do is we're about to open
519
00:26:40,440 --> 00:26:43,440
a whole new set of social media channels
520
00:26:43,760 --> 00:26:46,880
specifically for the content creator
market, because we can't
521
00:26:47,360 --> 00:26:52,520
we can't really run tutorials on Voice
Regen on, you know, Waves' YouTube.
522
00:26:52,520 --> 00:26:54,400
And then somebody will be scrolling
and they'll go from, oh,
523
00:26:54,400 --> 00:26:55,320
that was really useful.
524
00:26:55,320 --> 00:26:59,080
And then, oh, the next one is Andrew Scheps
talking about high pass and low
525
00:26:59,080 --> 00:27:00,200
pass and we lose them.
526
00:27:01,160 --> 00:27:02,000
Yeah, exactly.
527
00:27:02,000 --> 00:27:03,880
You know,
we're opening a set of new channels.
528
00:27:03,880 --> 00:27:08,880
We just employed a whole new marketing
department to deal specifically with this,
529
00:27:08,880 --> 00:27:14,160
so that we can actually give content
creators the attention they deserve
530
00:27:14,680 --> 00:27:18,280
without affecting anything else in Waves,
531
00:27:18,280 --> 00:27:23,320
or lessening the attention
that we give to our heritage audience.
532
00:27:23,320 --> 00:27:24,400
Did you say Discord?
533
00:27:24,400 --> 00:27:27,960
Was that the... Discord used
to bit-crunch everything too?
534
00:27:27,960 --> 00:27:31,760
Yeah, I was going to say because
it used to do this with also discord
535
00:27:31,760 --> 00:27:35,840
when I put it in the system
that would try to clean up the audio,
536
00:27:36,240 --> 00:27:36,960
it would just like
537
00:27:36,960 --> 00:27:40,560
you're talking about a secret, oh,
Discord... Discord, or, sorry, Descript. Yes.
538
00:27:40,640 --> 00:27:46,400
Oh, Descript. The Descript studio
toggle called Studio Sound.
539
00:27:46,440 --> 00:27:48,480
So do you remember
the one that you sent us, George,
540
00:27:48,480 --> 00:27:49,920
where it was a real estate thing
541
00:27:49,920 --> 00:27:52,320
where it all of a sudden went
and started doing gibberish.
542
00:27:52,320 --> 00:27:54,120
So it sounded like I.
543
00:27:54,120 --> 00:27:56,880
What I would like to do is get that
that plugin,
544
00:27:56,880 --> 00:28:00,040
and then I want to get the chef
from Sesame Street and run.
545
00:28:00,040 --> 00:28:01,440
That's really just
546
00:28:02,440 --> 00:28:04,760
a new legacy.
547
00:28:04,760 --> 00:28:08,640
So from what I mean,
obviously I haven't managed to get anybody
548
00:28:08,640 --> 00:28:13,560
from Descript.com to talk to me about this
for obvious reasons, probably.
549
00:28:13,560 --> 00:28:17,840
But I have I have actually genuinely
I mean, you guys know me.
550
00:28:17,840 --> 00:28:19,760
I'm pretty genuine, an open book.
551
00:28:19,760 --> 00:28:20,640
I have reached out
552
00:28:20,640 --> 00:28:23,960
and said I would love to talk to you
about helping you improve that,
553
00:28:24,400 --> 00:28:27,800
because from my perspective,
what I want to do is I want to make sure
554
00:28:27,800 --> 00:28:30,800
that the content creators
who are using anything
555
00:28:30,920 --> 00:28:33,920
get the best and feel that's really good.
556
00:28:35,400 --> 00:28:36,960
What I feel like
557
00:28:36,960 --> 00:28:40,040
the audio process is doing is,
558
00:28:40,800 --> 00:28:43,200
number one,
I feel like it's giving the users a bit
559
00:28:43,200 --> 00:28:46,880
too much control without explaining
what the toggle is doing,
560
00:28:47,560 --> 00:28:50,880
but when it's on full,
which I think most people just put it on
561
00:28:50,880 --> 00:28:55,240
to full without really,
I think it bit-crunches to a point
562
00:28:55,240 --> 00:28:58,440
where, as you said,
some words just disappear.
563
00:28:59,240 --> 00:29:02,760
Now, one of the interesting things
about this, when I was investigating
564
00:29:02,760 --> 00:29:06,200
this kind of problem was
I found out around the same time
565
00:29:06,480 --> 00:29:09,360
that I created video.
566
00:29:09,360 --> 00:29:12,360
I generated video with dialog.
567
00:29:12,600 --> 00:29:15,720
The dialog with most video models
568
00:29:15,720 --> 00:29:19,520
comes out in the codec as 96 K,
569
00:29:20,120 --> 00:29:24,000
but is bit-crunched to hell and back.
570
00:29:24,960 --> 00:29:28,200
If you drag that video onto Voice Regen,
571
00:29:28,400 --> 00:29:32,400
it makes it sound like they're
talking into a boom mic or a lapel.
572
00:29:33,080 --> 00:29:35,440
So suddenly the AI video
573
00:29:35,440 --> 00:29:38,440
that was given away by the audio,
574
00:29:38,520 --> 00:29:41,880
irritatingly to some people,
really impressive to others,
575
00:29:42,040 --> 00:29:45,240
now looks really realistic
because the audio sounds realistic.
576
00:29:45,240 --> 00:29:50,400
I think all AI videos should always
put Waldo somewhere. Yes.
577
00:29:53,880 --> 00:29:55,240
You mean the Waldo watermark?
578
00:29:55,240 --> 00:29:58,320
Yeah, that that's how you know it's
AI because Waldo
579
00:29:58,680 --> 00:30:03,040
is somewhere in the video.
Seeing as you brought up Waldo,
580
00:30:03,360 --> 00:30:06,440
I'll share something which I share
with the team every week at Waves.
581
00:30:06,800 --> 00:30:09,960
Once a week
when we have our engineering meeting,
582
00:30:10,440 --> 00:30:14,960
I create a new Where's Waldo,
but with me in it.
583
00:30:15,400 --> 00:30:18,400
Using Google's Nano Banana Pro,
584
00:30:18,840 --> 00:30:24,000
and I share it in the Teams chat
every week, and the product managers are like,
585
00:30:24,000 --> 00:30:27,640
okay, where's MPX, where's MPX,
where's it put little.
586
00:30:30,160 --> 00:30:33,520
It's funny, I wear pajamas that are stripy.
587
00:30:34,000 --> 00:30:37,120
I wore I, we were bananas in pajamas.
588
00:30:37,520 --> 00:30:39,960
Did this bananas did that.
Does it make here?
589
00:30:39,960 --> 00:30:41,360
We were in a meeting.
590
00:30:41,360 --> 00:30:43,080
We were in a in the today.
591
00:30:43,080 --> 00:30:45,240
And I was wearing a collared shirt and,
592
00:30:45,240 --> 00:30:48,520
and, and I had three people say,
you look like you're a prisoner.
593
00:30:49,200 --> 00:30:50,680
And it's like I might.
594
00:30:52,080 --> 00:30:54,960
So because we all are prisoners.
595
00:30:54,960 --> 00:30:58,640
Well, for me, for me as a user of Waves,
the hardest thing for me to wrap
596
00:30:58,640 --> 00:31:01,640
my head around when I hadn't looked at it
yet was just understanding that,
597
00:31:01,640 --> 00:31:04,600
well, that's, you know,
so that's like number one, it's like
598
00:31:04,600 --> 00:31:07,760
you have to get out
of the ecosystem of Waves Central
599
00:31:08,200 --> 00:31:13,080
when your brain clicks and goes,
oh yeah, first it's the web app.
600
00:31:13,080 --> 00:31:19,200
Okay, the first video I did on Voice
Regen was to the large Waves market,
601
00:31:19,600 --> 00:31:23,760
saying and it literally is all about,
hey, we've just released this thing.
602
00:31:24,000 --> 00:31:26,880
Just so you know, it's
not a plugin, don't it?
603
00:31:26,880 --> 00:31:28,920
That's kind of the idea of the video.
604
00:31:28,920 --> 00:31:33,240
It's it's not, you know,
we're not we're not degrading or saying,
605
00:31:33,240 --> 00:31:35,280
you know,
our users couldn't understand stuff,
606
00:31:35,280 --> 00:31:38,400
but we wanted to make it clear
that for the first time ever, we're
607
00:31:38,400 --> 00:31:41,400
releasing an audio process
that isn't a plugin.
608
00:31:41,880 --> 00:31:45,360
You know, it's like,
I wouldn't have wanted to release this
609
00:31:45,360 --> 00:31:48,880
without letting the bigger
audience know first we're about to.
610
00:31:49,000 --> 00:31:51,840
Is it also mobile friendly?
I didn't forget, oh, absolutely.
611
00:31:51,840 --> 00:31:52,400
To the point
612
00:31:52,400 --> 00:31:57,920
that in the next push, in the next few days,
we're adding a little banner that says,
613
00:31:57,920 --> 00:32:01,320
hey, do you want to kind of shortcut this
so it looks like an app on your phone?
614
00:32:01,640 --> 00:32:04,000
Teleprompter works on the phone.
615
00:32:04,000 --> 00:32:08,040
You can audio record
to a script on your phone.
616
00:32:08,040 --> 00:32:09,240
You can drag a video
617
00:32:09,240 --> 00:32:12,240
or an audio file from your phone
straight in and press process.
618
00:32:12,680 --> 00:32:15,880
I'm not letting people on mobile
phone record video
619
00:32:16,040 --> 00:32:17,160
when we release that,
620
00:32:17,160 --> 00:32:20,760
because a mobile phone is perfectly good
at recording video by itself
621
00:32:21,280 --> 00:32:22,080
and you're holding it.
622
00:32:22,080 --> 00:32:24,720
Just yeah,
record the video with your standard app.
623
00:32:24,720 --> 00:32:27,320
Upload that. That's easy.
So here's a question.
624
00:32:27,320 --> 00:32:32,760
Then this just occurred to me
if I were a voiceover artist
625
00:32:33,920 --> 00:32:36,920
and I was putting together my travel rig
626
00:32:37,800 --> 00:32:41,760
and I had an interface like,
627
00:32:42,880 --> 00:32:46,640
you know, maybe something that some
certain podcast may have put together.
628
00:32:46,840 --> 00:32:48,280
Yeah, sold for a while.
629
00:32:48,280 --> 00:32:51,880
But I needed something to record onto.
630
00:32:52,560 --> 00:32:54,240
Could I use it just as a recorder?
631
00:32:54,240 --> 00:32:56,720
Like, I guess the question is,
do I have to process?
632
00:32:56,720 --> 00:32:58,560
Can I record on to there
and then just download?
633
00:32:58,560 --> 00:32:59,480
You don't have to process.
634
00:32:59,480 --> 00:33:01,320
You can just do it that way
if you want to. Yeah.
635
00:33:01,320 --> 00:33:05,360
So so so does it burn your hours
if you use that that teleprompter
636
00:33:05,360 --> 00:33:09,240
and the recorder
or is that all without using the hours
637
00:33:09,240 --> 00:33:11,480
that the, you know, the,
the only thing the,
638
00:33:11,480 --> 00:33:15,880
the only thing that burns your
minutes is the processing.
639
00:33:15,880 --> 00:33:17,480
So that's why I recorded a call.
640
00:33:17,480 --> 00:33:20,040
It's a teleprompter and a recorder
right there. Yeah.
641
00:33:20,040 --> 00:33:22,640
And that's honestly
and it's actually intentional.
642
00:33:22,640 --> 00:33:25,880
And one of the reasons why I did
that is because
643
00:33:26,400 --> 00:33:29,160
of my heritage and my background from,
644
00:33:29,160 --> 00:33:32,440
you know, not just 20 years in music
production in the States, but,
645
00:33:32,640 --> 00:33:35,640
you know, 15
before that in radio production in Australia.
646
00:33:36,200 --> 00:33:38,080
One of the things
that when you've got headphones
647
00:33:38,080 --> 00:33:40,480
on, you've recorded a script
and you listen back, it's
648
00:33:40,480 --> 00:33:44,960
when you listen back that you realize,
yep, sounds good, but I'm in a hotel room.
649
00:33:44,960 --> 00:33:47,320
You know what? I'm
just going to click process in the room.
650
00:33:47,320 --> 00:33:48,880
That's exactly right there.
651
00:33:48,880 --> 00:33:51,120
I love that
you integrated the script reader.
652
00:33:51,120 --> 00:33:53,760
That was first. I was like, oh, I had to.
653
00:33:53,760 --> 00:33:57,720
I mean, it's like, I mean,
I've got two users who are on pro accounts
654
00:33:57,720 --> 00:34:01,400
because they
they are professional audiobook readers
655
00:34:01,840 --> 00:34:06,280
and they love me to death
because I got them to beta test this.
656
00:34:06,280 --> 00:34:10,760
And they were like, wait,
so I can have literally chapters in this?
657
00:34:11,080 --> 00:34:14,120
I just have it
right there in front of me,
658
00:34:14,120 --> 00:34:17,120
scrolling really short
and just do it at my own speed.
659
00:34:17,280 --> 00:34:19,920
It's like, yeah, and just pause
if you want to.
660
00:34:19,920 --> 00:34:22,560
This is going to be controversial,
but I guarantee you could
661
00:34:22,560 --> 00:34:25,800
literally just record an entire audiobook
on your phone, full stop.
662
00:34:25,800 --> 00:34:29,880
If you if you give this really good audio,
like really good audio,
663
00:34:29,880 --> 00:34:32,200
will it still touch it
or will it go, that's good.
664
00:34:32,200 --> 00:34:33,160
Interesting question.
665
00:34:33,160 --> 00:34:38,600
I'm so glad you asked it. Now, because of
the fact that this is a very large model
666
00:34:38,960 --> 00:34:42,600
that's trained on high quality
files, it's trained on
667
00:34:43,000 --> 00:34:47,160
not the finest quality,
it's trained on quality files.
668
00:34:47,520 --> 00:34:53,120
So when somebody from a studio puts
in a file that's recorded in, you know, a
669
00:34:53,120 --> 00:34:57,480
Neumann, in a soundproofed
670
00:34:57,480 --> 00:35:00,680
room, acute compressor preamp.
671
00:35:00,680 --> 00:35:04,080
And exactly if you put that into it,
it will degrade the file.
672
00:35:04,320 --> 00:35:06,400
Interestingly enough. Yeah. Okay.
673
00:35:06,400 --> 00:35:10,080
That makes sense, because what it does
is it's still trying to go, okay,
674
00:35:10,080 --> 00:35:13,800
I have a quality,
but I'm trying to get this file to this.
675
00:35:13,800 --> 00:35:16,640
I don't know what to do with this,
so it's going to degrade it for you.
676
00:35:16,640 --> 00:35:19,280
But only if you hit process. Right.
677
00:35:19,280 --> 00:35:19,960
You can still read.
678
00:35:19,960 --> 00:35:22,600
Of course. Yes, I understand, yeah.
679
00:35:22,600 --> 00:35:25,920
At some point you decide
this model is baked.
680
00:35:26,120 --> 00:35:28,800
Let's build a new model and start
all over. We're on.
681
00:35:28,800 --> 00:35:30,800
We're on the third model. Yeah.
682
00:35:30,800 --> 00:35:33,600
Because as I said earlier
683
00:35:33,600 --> 00:35:38,120
on, we're also about to enable
this as an API
684
00:35:38,120 --> 00:35:41,400
that people can use as a feature
to put into their own website.
685
00:35:41,400 --> 00:35:43,360
And charge for, or whatever.
686
00:35:43,360 --> 00:35:47,120
So what this will end up being is,
687
00:35:47,520 --> 00:35:50,920
there'll be the initial model
688
00:35:50,920 --> 00:35:54,040
that will end up being kind of
like one of the most affordable ones.
689
00:35:54,040 --> 00:35:57,360
And then as we build
690
00:35:57,360 --> 00:36:01,240
new models
that will be more dynamic or brighter
691
00:36:01,240 --> 00:36:05,920
or this, then we'll decide
how we want to move forward
692
00:36:05,920 --> 00:36:09,560
with the consumer app,
the one that we just released.
693
00:36:09,600 --> 00:36:13,400
You get a model that says, this is a voice
that sounds like it was recorded
694
00:36:13,400 --> 00:36:14,920
with the U87.
695
00:36:14,920 --> 00:36:16,120
Yeah. I don't want to do that. Yeah.
696
00:36:16,120 --> 00:36:20,160
There's a couple of reasons
I don't want to go that exact... Number
697
00:36:20,160 --> 00:36:22,080
one, it's a huge claim.
698
00:36:22,080 --> 00:36:26,320
Yeah, but number two,
one of the things that happens in
699
00:36:26,520 --> 00:36:30,560
if you're creating a tool
that content creators use is
700
00:36:30,800 --> 00:36:35,680
you run a risk of giving people who
701
00:36:36,840 --> 00:36:39,720
are focusing on recording their content
and creating content
702
00:36:39,720 --> 00:36:42,480
and don't want to be an audio engineer
or a video editor,
703
00:36:42,480 --> 00:36:43,960
you're asking them to make
704
00:36:43,960 --> 00:36:47,960
too many decisions without the experience
or knowledge to back it up.
705
00:36:48,320 --> 00:36:51,320
And that becomes what I call
feature creep.
706
00:36:51,320 --> 00:36:52,120
Yeah.
707
00:36:52,120 --> 00:36:56,320
You're asking them to make a decision
on control, on something
708
00:36:56,720 --> 00:37:00,320
that they may not be able to hear.
709
00:37:00,320 --> 00:37:05,160
Well, you know, even that is like
but understanding what you're hearing,
710
00:37:05,160 --> 00:37:09,840
whereas my goal is still I don't want
to give you a hammer and a nail.
711
00:37:09,840 --> 00:37:12,240
I want to give you a nail
that's in a wall.
712
00:37:12,240 --> 00:37:16,160
I would imagine 99% of content
creators would ask, what is a U87?
713
00:37:16,400 --> 00:37:17,360
Yeah, exactly.
714
00:37:17,360 --> 00:37:17,560
Yeah.
715
00:37:17,560 --> 00:37:20,880
If you take a look at what
a lot of there's been a lot of,
716
00:37:22,040 --> 00:37:25,040
experience over the
years with different pro audio companies
717
00:37:25,480 --> 00:37:28,480
entering the content creator market
in one way or another.
718
00:37:28,560 --> 00:37:30,000
And I've studied most of them.
719
00:37:30,000 --> 00:37:33,560
And I'm not saying I or
we are perfect in any way, but
720
00:37:33,560 --> 00:37:37,360
one of the things I've learned from others'
mistakes is, number one,
721
00:37:37,560 --> 00:37:41,360
never use pro audio nomenclature
when you're talking to a market
722
00:37:41,360 --> 00:37:44,880
that could be a wedding event planner
doing a vlog.
723
00:37:46,480 --> 00:37:48,920
At what point in her life or his life
724
00:37:48,920 --> 00:37:52,760
have they come across a high-pass,
low-pass filter
725
00:37:52,760 --> 00:37:56,480
and a 2:1 compression
ratio?
726
00:37:56,960 --> 00:38:00,200
a it's, you know,
727
00:38:00,360 --> 00:38:04,680
even the term compression or EQ, you just
I want to stay away from them.
728
00:38:05,200 --> 00:38:06,080
Well, even in Clarity,
729
00:38:06,080 --> 00:38:10,640
if you have I think I haven't checked
in the last two months, but three models.
730
00:38:10,640 --> 00:38:11,760
Right.
731
00:38:11,760 --> 00:38:12,480
That's a user. Yeah.
732
00:38:12,480 --> 00:38:16,160
And there will be more,
you know, the Clarity
733
00:38:16,440 --> 00:38:19,200
development team, and it's a team.
734
00:38:19,200 --> 00:38:23,600
The they're constantly at work
improving, tweaking.
735
00:38:24,360 --> 00:38:27,680
And obviously there has been some,
you know, questions as well
736
00:38:27,680 --> 00:38:28,560
out in the marketplace.
737
00:38:28,560 --> 00:38:30,840
It's like,
well, you made Clarity a plugin.
738
00:38:32,040 --> 00:38:34,520
Is Regen just a website of Clarity?
739
00:38:34,520 --> 00:38:36,440
No. Completely different models,
740
00:38:36,440 --> 00:38:38,920
completely different models,
completely different processes.
741
00:38:38,920 --> 00:38:41,920
Clarity can be a plugin. Regen
742
00:38:42,000 --> 00:38:46,120
will never be a plugin
because it will melt... Different products.
743
00:38:46,120 --> 00:38:51,680
I mean, Clarity 100 is a... is not there
to change the character of the sound.
744
00:38:51,680 --> 00:38:54,840
It's there to isolate the voice
in the character that's given to it.
745
00:38:54,840 --> 00:38:55,080
Right?
746
00:38:55,080 --> 00:38:58,080
Whereas we literally regenerate a voice.
747
00:38:58,440 --> 00:39:01,840
You change the character, you're like,
re wouldn't say
748
00:39:02,040 --> 00:39:05,640
we're not really changing the character,
which was the
749
00:39:05,800 --> 00:39:11,200
the sonic character, like, this, this, this
was originally recorded on, whatever.
750
00:39:11,200 --> 00:39:14,200
Like a, like a carbon
microphone wire recording.
751
00:39:14,320 --> 00:39:14,640
Yeah.
752
00:39:14,640 --> 00:39:17,280
And now it sounds like it's
coming out of an SM7.
753
00:39:17,280 --> 00:39:18,360
Yeah. Yeah.
754
00:39:18,360 --> 00:39:20,600
You know,
like the character of the recording.
755
00:39:20,600 --> 00:39:22,280
Not not the person.
756
00:39:22,280 --> 00:39:24,560
Would rebuilding it
be a better way to say it?
757
00:39:24,560 --> 00:39:29,680
We're renovating, renovate,
kind of regenerating in a lot of cases,
758
00:39:29,680 --> 00:39:33,480
because it's like we're
we're kind of looking at the in a way,
759
00:39:33,480 --> 00:39:38,800
I guess the DNA of vocal tone, and going,
okay, what?
760
00:39:39,360 --> 00:39:41,640
Let's put some work in and improve this.
761
00:39:41,640 --> 00:39:45,680
So in the end, you know, how well
you know, how you go to various websites.
762
00:39:45,680 --> 00:39:46,440
We won't name them.
763
00:39:46,440 --> 00:39:50,320
And then you've put your voice in there
and there's the whole idea of a one shot.
764
00:39:50,400 --> 00:39:51,840
Oh, the voice model.
765
00:39:51,840 --> 00:39:54,160
Yeah. No we don't do that. Cloning. Yeah.
766
00:39:54,160 --> 00:39:55,640
So this is not a clone.
767
00:39:55,640 --> 00:39:58,640
Like, just to be clear, it's not learning
768
00:39:58,640 --> 00:40:02,400
this person's voice
that someone uploads to it,
769
00:40:02,520 --> 00:40:04,800
and and those people
have nothing to worry about.
770
00:40:04,800 --> 00:40:07,160
Like, oh, now
you've uploaded my voice to some website.
771
00:40:07,160 --> 00:40:08,800
I don't want my voice uploaded to.
772
00:40:08,800 --> 00:40:10,680
No, we don't we don't retain.
773
00:40:10,680 --> 00:40:14,520
It's like we it goes through our process.
774
00:40:14,520 --> 00:40:15,920
It goes out the other end.
775
00:40:15,920 --> 00:40:18,200
It doesn't even stay in our cloud.
776
00:40:18,200 --> 00:40:20,640
It goes back to your dashboard
777
00:40:20,640 --> 00:40:23,640
and after 14 days,
we just delete them from the system.
778
00:40:24,200 --> 00:40:25,480
You know, I had one
779
00:40:25,480 --> 00:40:29,440
case named a year ago because I know
everything's happening so fast.
780
00:40:29,440 --> 00:40:32,440
So whenever I say I tried this thing,
781
00:40:32,840 --> 00:40:36,320
you sort of have to say when you know,
because, like, everything's changing.
782
00:40:36,920 --> 00:40:39,840
But the client had sent
me some audio from, like,
783
00:40:40,800 --> 00:40:41,360
Catholic
784
00:40:41,360 --> 00:40:45,920
priest or something, and it was all this
archival audio and it was recorded
785
00:40:45,920 --> 00:40:50,040
on very bad technology,
probably like a 12
786
00:40:50,040 --> 00:40:54,280
bit, 32 kHz or worse recorder
787
00:40:54,280 --> 00:40:58,360
that was built in to the,
I don't know, something pretty bad.
788
00:40:59,240 --> 00:41:02,680
And they sent it to me
and the audio was pretty bad.
789
00:41:02,680 --> 00:41:06,960
And then I threw it into one of these
AI tools that regenerate, essentially.
790
00:41:07,480 --> 00:41:10,800
And when she listened to it back, she was
she was kind of
791
00:41:11,160 --> 00:41:17,080
I think she didn't know what to expect
because it was a reconstruction.
792
00:41:17,080 --> 00:41:17,560
Right.
793
00:41:17,560 --> 00:41:21,520
And so she thought
it didn't sound authentic.
794
00:41:22,120 --> 00:41:25,880
So then I was like, well,
I don't know what to tell you.
795
00:41:25,880 --> 00:41:28,880
This is probably what it did sound like,
796
00:41:29,520 --> 00:41:34,440
but you've heard it for the last 30 years
sounding like it comes through a tin can.
797
00:41:35,240 --> 00:41:38,640
You know, out of us
it's the it's exactly the same thing
798
00:41:38,640 --> 00:41:42,560
that happens with,
799
00:41:42,560 --> 00:41:46,440
I, I was talking to the guy, producer,
composer guy
800
00:41:46,440 --> 00:41:50,640
called Greg Wells, a few months ago,
and we were talking about,
801
00:41:51,720 --> 00:41:54,720
a soundtrack that he's just composed.
802
00:41:55,560 --> 00:41:59,000
And one of the biggest problems
with composing a soundtrack
803
00:41:59,280 --> 00:42:02,320
and making the film director
and the company happy with
804
00:42:02,320 --> 00:42:05,720
it is when they're actually
doing the edit of the film.
805
00:42:05,720 --> 00:42:10,160
They use a temp soundtrack,
which is usually an existing soundtrack
806
00:42:10,160 --> 00:42:14,000
from something else that kind of fits,
but they spend so many months
807
00:42:14,000 --> 00:42:16,720
on the cutting room floor
with the temp soundtrack.
808
00:42:16,720 --> 00:42:17,200
No idea.
809
00:42:17,200 --> 00:42:20,320
So when the new soundtrack
that's being composed specifically
810
00:42:20,320 --> 00:42:23,320
for it comes up, a lot of the time
they're like, oh, that sucks.
811
00:42:23,760 --> 00:42:28,800
So yeah, I think the funniest example
of this is, is, some clients of mine
812
00:42:28,800 --> 00:42:33,560
were talking about a session and basically
the scratch track was done with an AI,
813
00:42:34,080 --> 00:42:37,080
and then they were
recording the real voice,
814
00:42:37,520 --> 00:42:41,160
and they began asking the voice talent
to read it more like the scratch track,
815
00:42:41,440 --> 00:42:44,440
which was basically saying,
please read it more like AI.
816
00:42:44,520 --> 00:42:47,520
yeah, it's like, yeah, like
817
00:42:47,520 --> 00:42:51,520
how to lose your soul in five seconds
and yeah,
818
00:42:51,560 --> 00:42:54,720
yeah,
I had that exact thing happen to me with a
819
00:42:54,720 --> 00:42:56,760
we talked about the session before,
820
00:42:56,760 --> 00:43:00,960
where there was a female reading to this
short film that they wanted me to voice.
821
00:43:00,960 --> 00:43:03,840
And I was kind of thinking,
that sounds okay. Anyway,
822
00:43:04,880 --> 00:43:06,600
but they said, oh, no, no,
823
00:43:06,600 --> 00:43:09,560
the check out the three words
are not pronounced correctly.
824
00:43:09,560 --> 00:43:13,640
And I'm thinking, well, why don't
you just get her back to redo those lines?
825
00:43:13,800 --> 00:43:16,800
But it wasn't until later
I realized it was an AI,
826
00:43:17,120 --> 00:43:20,880
but they they were trying to get me like
the inflection on the on the guide track.
827
00:43:20,880 --> 00:43:21,720
Can you try and do that?
828
00:43:21,720 --> 00:43:25,080
And it wasn't until the next day,
I think, and I mentioned this before,
829
00:43:25,080 --> 00:43:29,040
another episode that I was playing it
to, to my son, who's 17.
830
00:43:29,040 --> 00:43:30,760
I said, oh, you know,
this is what happened, boy,
831
00:43:30,760 --> 00:43:34,200
because that's AI. I'm like, yeah,
I can hear it a mile away.
832
00:43:34,200 --> 00:43:36,200
Usually two they can hear it a mile away.
833
00:43:36,200 --> 00:43:41,360
The can I, I'm, I'm 56, but I can hear it
a mile away because I've spent so many,
834
00:43:41,520 --> 00:43:46,080
I've spent the last few years
literally deep in AI research.
835
00:43:46,680 --> 00:43:51,200
And one of the things that's really hard,
even if you're using something as good
836
00:43:51,200 --> 00:43:52,920
as ElevenLabs.
837
00:43:52,920 --> 00:43:59,040
You have to make sure that words,
especially in English words
838
00:43:59,040 --> 00:44:02,040
that are spelt the same.
839
00:44:02,440 --> 00:44:06,920
So so for example,
one, or one, they're so windy.
840
00:44:08,120 --> 00:44:10,040
Oh. And
841
00:44:10,040 --> 00:44:13,080
you have to work
out a way of spelling them phonetically
842
00:44:13,080 --> 00:44:16,040
so that they're separated.
843
00:44:16,040 --> 00:44:19,200
And there are so many other words
in the English language that AI just
844
00:44:19,200 --> 00:44:23,640
does not get its perceptive head around
because it doesn't have a perception
845
00:44:23,920 --> 00:44:27,920
version of the context in between
this word, this word and this word,
846
00:44:28,320 --> 00:44:32,040
when it's spelt one way
but it's pronounced another.
847
00:44:32,400 --> 00:44:36,040
So I remember doing this
with phoneme chips in college.
848
00:44:36,040 --> 00:44:39,200
And just to get this thing to say
electronics,
849
00:44:39,680 --> 00:44:43,640
you had to put like ten E's in a row
because it had to do like electronics.
850
00:44:43,640 --> 00:44:47,400
And if you wanted to say electronics,
you had to put like IEEE.
851
00:44:47,400 --> 00:44:50,400
L like electronics. Yeah.
852
00:44:50,520 --> 00:44:52,240
Because it just yeah.
853
00:44:52,240 --> 00:44:54,840
If to like force it
to get out what you want
854
00:44:54,840 --> 00:44:58,240
because it's not always changing
but it's also a dialect.
855
00:44:58,240 --> 00:45:02,160
I mean, you know, I,
I've just heard a couple of things like,
856
00:45:02,160 --> 00:45:07,520
I heard Gomez say "dah-ta" instead of "day-ta",
but it's spelt the same.
857
00:45:07,520 --> 00:45:10,280
But it's pronounced differently,
like dance.
858
00:45:10,280 --> 00:45:11,120
Dance, you know.
859
00:45:11,120 --> 00:45:14,120
Yeah. 2020 is an American man.
860
00:45:14,280 --> 00:45:16,680
Yeah, yeah yeah, yeah. Tomato, tomato.
861
00:45:16,680 --> 00:45:18,400
Yeah, exactly.
862
00:45:18,400 --> 00:45:21,040
It's like when you get, someone to say it
863
00:45:21,040 --> 00:45:22,720
like English is not their first language.
864
00:45:22,720 --> 00:45:24,880
And then they're spelling,
they're writing on the script
865
00:45:24,880 --> 00:45:28,600
Their phonetic version
of how you pronounce that word.
866
00:45:29,120 --> 00:45:31,680
But they forget that the fact is that
867
00:45:31,680 --> 00:45:33,560
their phonetics are going
to be different to yours
868
00:45:33,560 --> 00:45:35,320
because they say certain things
differently.
869
00:45:35,320 --> 00:45:36,200
The way you would say it.
870
00:45:36,200 --> 00:45:40,960
So when you read their phonetic version
of that word, it's completely wrong.
871
00:45:40,960 --> 00:45:42,960
And they come back and go, no, no, no,
that's not what I want, I want it,
872
00:45:42,960 --> 00:45:43,840
blah, blah, blah.
873
00:45:43,840 --> 00:45:45,080
It's like, well, that's what you just did.
874
00:45:45,080 --> 00:45:47,280
But, you know, because they will
pronounce it differently.
875
00:45:47,280 --> 00:45:50,560
So now this brings up something
which I think is interesting.
876
00:45:50,560 --> 00:45:55,440
I find it fascinating in
when building the model for Regen
877
00:45:55,920 --> 00:45:56,760
was happening.
878
00:45:56,760 --> 00:46:01,360
The first model is a term that we realized
879
00:46:01,360 --> 00:46:06,000
early on that
we had to rectify, which is accent bias.
880
00:46:06,960 --> 00:46:07,880
If a model
881
00:46:07,880 --> 00:46:12,200
is educated
on too many of a specific accent,
882
00:46:12,200 --> 00:46:15,360
then it doesn't understand how to deal
883
00:46:15,360 --> 00:46:18,640
with other pronunciations or accents.
884
00:46:18,640 --> 00:46:21,640
So initially, early on,
885
00:46:22,040 --> 00:46:26,440
an Australian accent was an anomaly,
and some of the first tests we did
886
00:46:26,800 --> 00:46:30,080
were frustrating
because the Americans would sound amazing,
887
00:46:30,320 --> 00:46:33,000
the Brits would sound amazing,
but any time I tried it,
888
00:46:33,000 --> 00:46:36,000
it would sound glitchy as hell
and we worked out
889
00:46:36,400 --> 00:46:40,200
it just doesn't understand
this half drunk way we talk.
890
00:46:40,560 --> 00:46:43,920
Hey, careful when
891
00:46:45,200 --> 00:46:46,240
we're not half drunk.
892
00:46:46,240 --> 00:46:49,480
It's like we tend to kind
of take shortcuts with the way we speak.
893
00:46:50,040 --> 00:46:55,520
So we had to kind of make sure that we
spent a lot of time avoiding accent bias.
894
00:46:55,520 --> 00:46:59,840
So this is only for English,
English language,
895
00:46:59,840 --> 00:47:04,360
at this point? No, it works with Hebrew, works
with French, German, Spanish, Japanese.
896
00:47:05,240 --> 00:47:06,800
But is it language specific,
897
00:47:06,800 --> 00:47:07,440
basically?
898
00:47:07,440 --> 00:47:10,600
No, it's human voice specific.
899
00:47:10,600 --> 00:47:12,240
Yes. It's to understand it.
900
00:47:12,240 --> 00:47:15,240
So so if like does it work with pig Latin?
901
00:47:15,920 --> 00:47:16,640
I have no idea.
902
00:47:16,640 --> 00:47:20,480
I, I it's not high on my list
of priorities, to be honest.
903
00:47:21,440 --> 00:47:22,760
Not trained on pig Latin.
904
00:47:22,760 --> 00:47:25,040
Yet Robert Young recorded this thing.
905
00:47:25,040 --> 00:47:29,040
Like, if it's like some unknown language
that it's never, ever heard before,
906
00:47:29,400 --> 00:47:32,920
here's here's the key,
here's the key with Voice Regen.
907
00:47:33,360 --> 00:47:35,400
If you're putting a file into Voice
Regen,
908
00:47:35,400 --> 00:47:40,640
and, in the language that you speak,
you can understand the words
909
00:47:40,640 --> 00:47:43,200
that are coming
out of the speaker's mouth.
910
00:47:43,200 --> 00:47:46,440
Regardless of the noise level,
we have a chance
911
00:47:46,440 --> 00:47:48,720
of saving that file for you.
912
00:47:48,720 --> 00:47:53,880
If there are words in that file
that you cannot comprehend yourself,
913
00:47:53,880 --> 00:47:57,400
then our chance goes down by about 80%.
914
00:47:57,640 --> 00:47:58,280
Okay, yeah.
915
00:47:58,280 --> 00:48:01,200
Is this is this forensics proof?
916
00:48:01,200 --> 00:48:05,400
Like, like, could someone go to court
and be like, here's this awful recording.
917
00:48:05,400 --> 00:48:07,920
But look, he said I murdered her.
918
00:48:07,920 --> 00:48:11,160
And then Voice Regen's like, perfect,
like their cat.
919
00:48:11,160 --> 00:48:12,480
You hear that? 100%.
920
00:48:12,480 --> 00:48:14,400
And it's like this. Yeah, absolutely.
921
00:48:14,400 --> 00:48:16,080
Yeah. We're not making up. We're not.
922
00:48:16,080 --> 00:48:19,080
There's no point in the system
where we make stuff up.
923
00:48:19,280 --> 00:48:21,680
We regenerate what the system understands.
924
00:48:21,680 --> 00:48:25,480
So a word that is really garbled
because there was a technical,
925
00:48:26,720 --> 00:48:27,280
a word that
926
00:48:27,280 --> 00:48:30,280
is really garbled is still going
to stay really garbled.
927
00:48:30,440 --> 00:48:31,560
Yeah, yeah.
928
00:48:31,560 --> 00:48:34,840
So one of the early cases
and because, I mean, one of the things
929
00:48:34,840 --> 00:48:38,920
which we did is we spent a lot of time
throwing files at it
930
00:48:38,920 --> 00:48:42,680
that were really, really challenging,
like two people
931
00:48:42,800 --> 00:48:48,000
filmed on an iPhone, but they're 30ft away
on the edge of a cliff with wind.
932
00:48:48,400 --> 00:48:50,680
One's talking, one's further away.
933
00:48:50,680 --> 00:48:53,000
She's yelling back at him with wind.
934
00:48:53,000 --> 00:48:53,880
Yeah, you've got the wind.
935
00:48:53,880 --> 00:48:56,880
And we learned very quickly the
936
00:48:56,960 --> 00:48:59,440
if you're listening to a file
and you're listening to the audio
937
00:48:59,440 --> 00:49:02,040
and you can't understand
what they're saying,
938
00:49:02,040 --> 00:49:05,560
then our chance of getting
there is a lot lower.
939
00:49:05,560 --> 00:49:06,800
It's not every case.
940
00:49:06,800 --> 00:49:09,240
There are some. There we go. Oh,
that's what she said.
941
00:49:09,240 --> 00:49:12,000
But most of the time it's at that point
942
00:49:12,000 --> 00:49:16,800
where the audio dialog frequencies
are so mixed down
943
00:49:16,800 --> 00:49:20,680
into the noise frequencies
that we can't regenerate
944
00:49:20,680 --> 00:49:23,480
because we have nothing there
to work with.
945
00:49:23,480 --> 00:49:27,480
So what it does not do,
it does not interpolate.
946
00:49:27,960 --> 00:49:30,720
No, it's not using context.
947
00:49:30,720 --> 00:49:34,000
It's not, it's not,
it's not, it's not there's no kind of
948
00:49:34,040 --> 00:49:37,040
decoding the language
and re encoding the language.
949
00:49:37,120 --> 00:49:41,200
It's just decoding the voice,
the human voice in there and then
950
00:49:41,200 --> 00:49:45,080
performing the voice. The examples
I have on the landing page on the website
951
00:49:45,080 --> 00:49:46,920
are very intentional.
952
00:49:46,920 --> 00:49:52,000
They're all bad quality,
but they're also all very if you listen
953
00:49:52,000 --> 00:49:55,000
to them, you can understand the words,
but you're like, oh, that's terrible.
954
00:49:55,120 --> 00:49:56,240
Yeah.
955
00:49:56,240 --> 00:49:58,960
The and the reason for
that is exactly what I just said.
956
00:49:58,960 --> 00:50:01,960
It's like there is no point in the system
where,
957
00:50:02,040 --> 00:50:05,200
we just guess because that's,
958
00:50:05,920 --> 00:50:09,800
you know, we already have enough of that,
which in other AI systems,
959
00:50:09,800 --> 00:50:13,200
it's like what I did was we
we created a model
960
00:50:14,120 --> 00:50:17,520
that can bring back
nearly irretrievable audio.
961
00:50:17,920 --> 00:50:20,960
But it's also not going to happen
in every single case.
962
00:50:20,960 --> 00:50:25,040
There are some cases, obviously,
where people put in audio and go, yeah,
963
00:50:25,040 --> 00:50:25,640
it's, you know,
964
00:50:26,760 --> 00:50:27,320
so is a
965
00:50:27,320 --> 00:50:32,240
future possible pro version or an advanced
version would be to literally regenerate
966
00:50:32,240 --> 00:50:36,280
a word that was totally garbled
so the user could type in the word foxy
967
00:50:36,280 --> 00:50:39,280
and they will regenerate the word
foxy from that.
968
00:50:39,560 --> 00:50:41,400
It's been it it's it's been discussed.
969
00:50:41,400 --> 00:50:45,280
One of the things that comes up with this,
that is a big problem for me,
970
00:50:45,840 --> 00:50:48,360
because as a product manager,
one of the things that you're thinking
971
00:50:48,360 --> 00:50:49,920
isn't is something possible.
972
00:50:49,920 --> 00:50:52,920
You're thinking, is this possible in a way
973
00:50:53,280 --> 00:50:58,200
that the result will be something
that the user will take for
974
00:50:58,200 --> 00:51:01,560
granted, as something that happened
and can move on with their life
975
00:51:01,840 --> 00:51:03,360
without worrying about it?
976
00:51:03,360 --> 00:51:05,840
And my answer at this point in
time is still no,
977
00:51:05,840 --> 00:51:08,360
because what you're doing
is you're saying, okay, right.
978
00:51:08,360 --> 00:51:12,320
So we have to, first of all, clone
a certain amount of the voice
979
00:51:12,320 --> 00:51:15,480
in between, around the word
that's illegible.
980
00:51:15,840 --> 00:51:20,320
We then have to make sure that, okay,
they're typing in a word.
981
00:51:20,320 --> 00:51:21,840
That's what the word was.
982
00:51:21,840 --> 00:51:25,160
We have to get that voice to
then say that we then have to get it
983
00:51:25,160 --> 00:51:29,400
in the right intonation,
the right emotion, the right,
984
00:51:29,400 --> 00:51:34,520
the right quality to fit the rest
of the regenerated audio around it.
985
00:51:34,800 --> 00:51:38,560
And if you're creating one word, it's
actually a lot harder than it sounds
986
00:51:38,560 --> 00:51:39,360
to create one
987
00:51:39,360 --> 00:51:41,400
that's average quality,
988
00:51:41,400 --> 00:51:46,440
that's even with the rest of it, than it
is to create one that's perfect quality.
989
00:51:46,800 --> 00:51:49,920
Yeah, it's a different product, really.
990
00:51:50,280 --> 00:51:53,280
It's I mean, it's adding
a feature to the product.
991
00:51:53,480 --> 00:51:55,080
I'm going to drop in an old school
solution.
992
00:51:55,080 --> 00:51:57,920
Remember in the day when if there was.
993
00:51:57,920 --> 00:52:01,080
And it can only work
if it's a small like one word problem,
994
00:52:01,560 --> 00:52:04,560
but you would get a male or a female,
995
00:52:05,120 --> 00:52:07,200
not the same voice to do it.
996
00:52:07,200 --> 00:52:08,160
And then you would cut it
997
00:52:08,160 --> 00:52:11,320
in and it's so fast that the mind doesn't
actually perceive it.
998
00:52:11,320 --> 00:52:14,120
Yeah. Who were the people that were best
at doing that?
999
00:52:14,120 --> 00:52:15,720
It was always the musicians
1000
00:52:15,720 --> 00:52:18,680
because the musicians
could follow the pitch of the person.
1001
00:52:18,680 --> 00:52:21,560
Yeah, I've done it a few times.
1002
00:52:21,560 --> 00:52:25,000
I'll be honest to save things I can't
remember and I never got paid for it.
1003
00:52:25,000 --> 00:52:27,160
Like like The Voice.
I'm sure Opie could do it.
1004
00:52:27,160 --> 00:52:29,520
And I know Paul Davies and I used to do it
all the time.
1005
00:52:29,520 --> 00:52:30,360
It'd be 105.
1006
00:52:30,360 --> 00:52:32,320
Yeah, yeah, I used to do it.
1007
00:52:32,320 --> 00:52:35,240
Change. Promise.
I've done this, Robert. Where?
1008
00:52:35,240 --> 00:52:38,640
He's like, the audio is like chopped
the end of a word off.
1009
00:52:38,640 --> 00:52:38,960
And I've
1010
00:52:40,200 --> 00:52:41,040
recorded the end of that.
1011
00:52:41,040 --> 00:52:42,120
The rest of it, I mean.
1012
00:52:42,120 --> 00:52:43,320
And you would.
1013
00:52:43,320 --> 00:52:45,840
You can't pick it in the old days
if you actually,
1014
00:52:45,840 --> 00:52:48,000
if you dropped in
and you dropped the end off
1015
00:52:48,000 --> 00:52:50,320
like an ING off
the end of a word or something.
1016
00:52:50,320 --> 00:52:51,000
Yeah.
1017
00:52:51,000 --> 00:52:55,120
I know my daughter can match pitch
in a kind of supernatural way. When my
1018
00:52:55,200 --> 00:52:57,200
when my girlfriend
first introduced herself.
1019
00:52:57,200 --> 00:53:02,200
My girlfriend's name is Firoozeh
and my daughter immediately picked it up
1020
00:53:02,200 --> 00:53:06,240
and I said, I said, hey, honey,
do you want to do a drawing for "Fear-issa"?
1021
00:53:06,240 --> 00:53:11,400
And my daughter immediately says,
it's not "Fear-issa", Dad, it's Firoozeh.
1022
00:53:11,920 --> 00:53:14,800
And she like immediate,
like a tape recorder
1023
00:53:14,800 --> 00:53:18,080
played back, you know, in her brain
the way it's supposed to sound.
1024
00:53:18,080 --> 00:53:19,800
And I was like, Holy crap.
1025
00:53:19,800 --> 00:53:23,640
Yeah, you know, my 12 year
old can playback audio from her brain.
1026
00:53:23,880 --> 00:53:26,040
That's the key.
That's the key to her being a mimic.
1027
00:53:27,360 --> 00:53:28,920
Yeah, you can mimic stuff.
1028
00:53:28,920 --> 00:53:32,160
Like I remember going to,
Because my other half,
1029
00:53:32,160 --> 00:53:35,000
she speaks Italian, and it was like,
you should learn Italian
1030
00:53:35,000 --> 00:53:38,440
so we can when we go there, we can both,
you know, communicate.
1031
00:53:38,440 --> 00:53:39,600
And I was like, oh, all right. Okay.
1032
00:53:39,600 --> 00:53:42,000
So I got, you know,
1033
00:53:42,000 --> 00:53:43,960
convinced I would, go and do this thing.
1034
00:53:43,960 --> 00:53:48,600
So I went to do my first Italian lesson
in Melbourne, and,
1035
00:53:48,600 --> 00:53:52,080
I finish the thing, they go, oh, well,
you know, you were really good.
1036
00:53:52,080 --> 00:53:55,000
You're really good.
And I was just mimicking.
1037
00:53:55,000 --> 00:53:58,280
But when it came to actually building
structure and knowing what the words meant
1038
00:53:58,280 --> 00:54:00,600
and how you put them all together,
I had no fucking idea.
1039
00:54:00,600 --> 00:54:03,840
So I'm just going to ask JP,
you know, to like an AI.
1040
00:54:03,840 --> 00:54:06,960
So to learn to, to learn Italian
in Melbourne, what do you do?
1041
00:54:06,960 --> 00:54:09,400
You just go to a restaurant
on like Elm Street or something.
1042
00:54:09,400 --> 00:54:10,920
And do you know what?
1043
00:54:10,920 --> 00:54:13,840
You know where the Italian school
was like on the street? Yeah,
1044
00:54:14,920 --> 00:54:17,560
mom. I mean, yeah,
1045
00:54:17,560 --> 00:54:19,560
exactly.
1046
00:54:19,560 --> 00:54:21,440
Just order an appetizer and just. Wow.
1047
00:54:21,440 --> 00:54:22,240
It's cool.
1048
00:54:22,240 --> 00:54:23,000
Yeah, yeah, yeah.
1049
00:54:23,000 --> 00:54:26,040
So just go into your lesson,
go faster and stuff after.
1050
00:54:26,040 --> 00:54:26,320
Yeah.
1051
00:54:26,320 --> 00:54:28,200
Well, you know,
I've been getting more and more teaching
1052
00:54:28,200 --> 00:54:31,640
production for podcasters and creators,
so I'll definitely be
1053
00:54:32,160 --> 00:54:36,000
mentioning this as a tool, you know, and,
and letting people know to try it out
1054
00:54:36,320 --> 00:54:40,080
because I have, you know, I'm working
more with corporate, you know, creators.
1055
00:54:40,080 --> 00:54:42,360
That's another whole, you know, category.
1056
00:54:42,360 --> 00:54:44,840
You know, people
that do content for corporate.
1057
00:54:44,840 --> 00:54:48,720
So C-suite, you know, let's let's talk
for a second just quickly about that.
1058
00:54:48,720 --> 00:54:52,480
So for example, somebody who's done
a course that they've recorded
1059
00:54:52,480 --> 00:54:56,240
their screen and the little hover over
camera in Loom or something like that,
1060
00:54:56,880 --> 00:55:01,200
but the camera is a webcam and their
microphone is their MacBook or something.
1061
00:55:01,640 --> 00:55:05,840
It's like literally if you're lucky,
lucky that, would you if you're lucky.
1062
00:55:05,840 --> 00:55:09,840
But just literally drag the course
video onto Voice Regen,
1063
00:55:10,000 --> 00:55:12,000
hit process and you're done. Yeah.
1064
00:55:12,000 --> 00:55:13,440
No. Quick question for you.
1065
00:55:13,440 --> 00:55:15,720
Where can people go and check this out.
1066
00:55:15,720 --> 00:55:19,680
Waves.com/voice region.
1067
00:55:20,640 --> 00:55:22,200
Or just type
1068
00:55:22,200 --> 00:55:26,320
in Voice Regen Waves into your Google
1069
00:55:26,320 --> 00:55:30,400
or whatever
you want and go off and find it useful.
1070
00:55:30,600 --> 00:55:34,920
This could be, this could be an integral
part of my, road case.
1071
00:55:34,920 --> 00:55:38,400
Well, I think I was going to suggest that
even if you're not a content creator,
1072
00:55:38,400 --> 00:55:40,680
if you're a voiceover artist going,
nah, never needed.
1073
00:55:40,680 --> 00:55:42,480
Just do yourself a favor.
1074
00:55:42,480 --> 00:55:45,840
Go and just have a play with it
because you'll be blown away.
1075
00:55:45,880 --> 00:55:50,160
And one of the reasons
that I put the teleprompter in there and,
1076
00:55:50,160 --> 00:55:53,280
is because of voiceover
1077
00:55:53,560 --> 00:55:56,960
artists like you, AP, and also,
1078
00:55:57,400 --> 00:56:01,160
how often you find yourself in a position
when you're,
1079
00:56:01,560 --> 00:56:06,520
you're not with your rig and you need to
record something and you need it to get it
1080
00:56:06,520 --> 00:56:09,920
back to the station or to the studio,
and have a way to do it.
1081
00:56:10,560 --> 00:56:14,480
Now, in that case, it's not taking
a producer's job away from them.
1082
00:56:14,520 --> 00:56:18,840
What it does is it gives you, Robbo,
less work to do at the other end
1083
00:56:19,200 --> 00:56:20,880
than you would have had to.
1084
00:56:20,880 --> 00:56:26,000
And that's also why I put a a teleprompter
script reader in there with zero scroll.
1085
00:56:26,280 --> 00:56:30,360
Because if you're pasting in a 30-second
script, then you know you don't need it
1086
00:56:30,400 --> 00:56:30,840
to scroll.
1087
00:56:30,840 --> 00:56:34,040
You just need it in front of your face
or additions to it
1088
00:56:34,200 --> 00:56:37,960
if you got one, you I added a restaurant,
for example, or something like that.
1089
00:56:37,960 --> 00:56:39,320
You could just jump in the bathroom
1090
00:56:39,320 --> 00:56:43,080
on your phone and record it, stick
it up, do it and send it straight back.
1091
00:56:43,560 --> 00:56:48,160
You could, but I wouldn't because I
don't think it gets rid of slurring.
1092
00:56:48,520 --> 00:56:49,360
I'm guessing
1093
00:56:50,560 --> 00:56:51,640
the red wine.
1094
00:56:51,640 --> 00:56:52,280
Yeah. Yeah.
1095
00:56:52,280 --> 00:56:54,760
That's not good.
That's not going to cut it. See.
1096
00:56:54,760 --> 00:56:58,560
You see that's the other thing is like
I worked very hard to try and make sure.
1097
00:56:58,560 --> 00:57:00,520
And I did a lot of the coding for the,
1098
00:57:00,520 --> 00:57:03,400
for the,
for the teleprompter here in Brisbane.
1099
00:57:03,400 --> 00:57:07,680
I worked very hard
to try and make sure that the scroll,
1100
00:57:07,720 --> 00:57:10,720
especially on a mobile
phone, was not jittery
1101
00:57:11,080 --> 00:57:13,960
and it's slow enough
that you can look at it on a,
1102
00:57:13,960 --> 00:57:17,600
you know, normal phone,
which is like a tablet these days
1103
00:57:18,040 --> 00:57:23,680
and be able to read things, without
constantly moving your eyes up and down.
1104
00:57:24,120 --> 00:57:25,080
I'm really proud of this.
1105
00:57:25,080 --> 00:57:28,560
It's like and so and if anybody out
there as well is interested
1106
00:57:28,560 --> 00:57:32,480
in using an API version of this
for your own service,
1107
00:57:32,480 --> 00:57:36,120
reach out to these wonderful hosts
and they'll get in contact with me.
1108
00:57:36,440 --> 00:57:38,720
That picture on the floor behind you.
1109
00:57:38,720 --> 00:57:41,280
Lichtenstein.
Yeah. Can you can you do me a favor?
1110
00:57:41,280 --> 00:57:42,960
Can you hang the fucking thing up?
1111
00:57:42,960 --> 00:57:46,400
Know how much it frustrates me
to see looking at that picture
1112
00:57:46,720 --> 00:57:49,080
just sitting on the floor?
Because it's really cool.
1113
00:57:49,080 --> 00:57:51,160
It is really cool,
but it's also really heavy.
1114
00:57:51,160 --> 00:57:54,240
It's actually, a wooden cargo,
1115
00:57:55,000 --> 00:57:59,320
slat crate that, was painted on.
1116
00:58:00,560 --> 00:58:01,760
So if I hang
1117
00:58:01,760 --> 00:58:05,560
this or if I hang this on the wall,
it's it'll bring the wall down.
1118
00:58:05,560 --> 00:58:06,960
So, yeah,
1119
00:58:06,960 --> 00:58:09,240
I looked at the picture of the girl
and of course,
1120
00:58:09,240 --> 00:58:11,000
the context of what we're doing here.
1121
00:58:11,000 --> 00:58:14,160
Podcasting, I thought, was what it's like.
1122
00:58:14,480 --> 00:58:17,480
It's having.
1123
00:58:17,880 --> 00:58:19,960
Yeah, I know it is.
1124
00:58:19,960 --> 00:58:20,760
But I like her hair.
1125
00:58:20,760 --> 00:58:22,960
Her hair.
1126
00:58:22,960 --> 00:58:24,320
Now I'm looking at it. Now.
1127
00:58:24,320 --> 00:58:25,840
I won't be able to unsee that.
1128
00:58:25,840 --> 00:58:26,520
You can't.
1129
00:58:26,520 --> 00:58:28,200
You can't unsee it. See it?
1130
00:58:28,200 --> 00:58:28,720
That's right.
1131
00:58:28,720 --> 00:58:31,720
When she said, sorry, man. So.
1132
00:58:32,320 --> 00:58:33,360
Well, that was fun.
1133
00:58:33,360 --> 00:58:36,600
Is it over? The Pro Audio Suite.
1134
00:58:36,800 --> 00:58:40,720
Thanks to Tri-Booth and Austrian Audio.
Recorded using Source
1135
00:58:40,720 --> 00:58:44,560
Connect, edited by Andrew Peters
and mixed by Robbo. Got
1136
00:58:44,560 --> 00:58:45,840
Your own audio issues,
1137
00:58:45,840 --> 00:58:49,440
just ask Robbo. Tech support
from George the Tech Whittam.
1138
00:58:49,560 --> 00:58:51,360
Don't forget to subscribe to the show
1139
00:58:51,360 --> 00:58:54,680
and join the conversation
on our Facebook group. To leave a comment,
1140
00:58:54,760 --> 00:58:58,520
suggest a topic or just say g'day,
drop us a note at our website.
1141
00:58:58,560 --> 00:59:00,400
proaudiosuite.com.












