voice recognition systems - Help!!


Rhys Carter
November 22nd, 2009, 05:07 AM
I am considering buying a voice recognition programme and am looking for any and all the advice I can get. I would need to use it NOT to recognise my voice but to recognise someone else's. Here's the deal: I recently shot some interviews with my father, and rather than transcribe it all by hand I need it to recognise his voice from playback. I am currently using Edius Neo for editing purposes but am considering upgrading my system (possibly CS4). Any advice would be much appreciated. Thanks.

Paul Cascio
November 22nd, 2009, 07:12 AM
Won't work, but here's an alternative.

Dragon Naturally Speaking is the standard amongst voice recognition programs. It's very good but it's not perfect. You need to speak a bit differently than you normally would to get maximum accuracy. You'll enunciate more, and you need to train the system to your voice.

My suggestion is that you listen to the interviews through an earphone and repeat them (audio prompting) as you are listening, using DNS. Your results will be much better than using the interview audio, although you may want to try it.

I use DNS for creative and instructional writing. It's great for that, once you learn its idiosyncrasies and get used to composing verbally.

Pete Cofrancesco
November 22nd, 2009, 10:42 AM
I've used Dragon Naturally Speaking and I don't like it. The problem is it takes just as long to go through and fix all the mistakes as it does to simply type it yourself. If you can't type quickly or don't want to do it then pay someone. Go cheap and find a high school/college student to do it.

Eric Stemen
November 23rd, 2009, 10:25 PM
I'm also thinking about getting Dragon. The basic version is only $45 on their site right now. I just wish I could use a trial first.

Rhys Carter
November 26th, 2009, 12:54 PM
Thanks for your replies guys - indeed I must agree a trial would be ideal.
Having done a little more research into DNS, I have read some positive reviews but have also seen some not so positive.

The main complaint seems to be lack of technical support from the manufacturers.
The other complaint, understandably I suppose, is the mistakes it makes - this is inevitable if you ask me, as no voice rec system can be 100% correct 100% of the time.

Buyers of version 10 seem to feel let down & they suggest hanging on to version 9 or 9.5 if you already have it.
Oh, what to do???!

Paul Cascio
November 27th, 2009, 09:59 PM
Pete, I disagree. DNS is not good for everything, but it's great for getting your ideas into written form. I find that it's even good for later revisions. Consider giving it another chance. If you're patient, learn its idiosyncrasies, and accept the limitations of voice recognition, it's a great creative tool.

Perrone Ford
November 27th, 2009, 10:28 PM
We've used DNS since its inception. In fact, the creator of the system did a demo for my company when the product was still in beta. Last year, we bought the latest version of it to try and use it as a way to transcribe video for us. That failed miserably. Then we used the audio-prompting and that was better, but only after spending considerable time training it.

It's just not the correct solution. If you REALLY want to get this done properly, with good accuracy, and you don't mind paying a bit of money, PM me, and I'll get you the info you really need to make this work.

In case you are wondering, my company has to be compliant with Federal Rule 508 which dictates that we provide captioning for ANY video we create for public consumption or for use within the company. Essentially, everything I shoot at the office has be be transcribed and captioned. We do this a lot.

Pete Cofrancesco
November 28th, 2009, 05:35 PM
I stand by what I said. For his purposes it makes even less sense because he wants to transcribe someone else's voice. I've taken the time in the past to train it to my voice and the worst thing is it commonly misinterprets an entire phrase. So you might say "I took my dog for a walk" and it will transcribe that as "I draw with chalk". In addition, say any non-dictionary word like someone's name, technical term, street, town, abbreviation, short phrase, etc., and it spits out the wrong words. When you go back and proofread later it's easy to miss because both the grammar and spelling are correct but it has transcribed something completely different from what was said. If DNS was that accurate and saved time, court reporters would use it, and they don't because it's simply better to spend the time to type it correctly the first time.

Perrone Ford
November 28th, 2009, 05:48 PM
A properly trained DNS system can be faster than a court reporter on a longhand keyboard, but not even close for a certified reporter on a shorthand machine.

In any event DNS is not the correct solution to this problem.

Pete Cofrancesco
November 28th, 2009, 05:57 PM
That's an odd caveat to add, as if a certified reporter didn't use shorthand. That's like saying I could outplay Tiger Woods at golf if he used a stick instead of a golf club. Besides, even if a court reporter used a keyboard they're going to be 98+% accurate. Can you make that claim with DNS? Because who cares how fast DNS can transcribe if it's not accurate. That's my point.

Perrone Ford
November 28th, 2009, 06:15 PM
Pete,

I understand your point. However, I also understand some of the limitations of having court reporters do certain work. And in some instances they cannot use their steno machines. Like when they have to feed computer systems that are captioning on the fly. If we can tie them to dedicated CC hardware it's no issue, but that isn't always the case.

Pete Cofrancesco
November 28th, 2009, 07:04 PM
I'm not familiar with CC but I think technology has progressed to allow steno machines to interface digitally. But often ppl come here looking for an impossibly cheap solution, like "I need a good shotgun mic for $50"...

I work with court reporters and they often have their steno machine plugged into a laptop which converts the shorthand to ordinary text which then can be sent real time to netbooks for clients to view.

Les Wilson
November 28th, 2009, 08:50 PM
Rhys,
I worked in this field for many years with some of the best technology around. Google me.

Fundamentally, systems that can recognize large vocabularies with accuracy in the 95% and up range are systems that are trained to a person's voice, vocabulary and speaking habits. Even then, the high success rates are achieved with skilled speakers using systems trained to their voice. Yes, inevitably, getting a dictation system to have low errors comes with help from the speaker adapting to the system.

The best systems will only get about 70% accuracy out of the box. Over time, training of the system by reading to it, dictating to it, and telling it the errors it made will raise the accuracy.

If you are willing and your father is able, you can have him go through the training procedure. If your system allows you to run your recorded audio through it, you can then start the process of correcting and feeding back corrections.

Like others have said here, by the time you are done, even with a system running at 90% accuracy, you are fixing one out of every 10 words and you may have been better off doing it yourself. YMMV
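To put rough numbers on that last point, here is a minimal Python sketch. The function name, the 150 words-per-minute speaking rate and the one-hour interview length are illustrative assumptions, not figures from this thread, and it simply treats "accuracy" as the fraction of words recognised correctly.

```python
def words_to_fix(total_words: int, accuracy: float) -> int:
    """Estimate how many words need correcting at a given word accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(total_words * (1.0 - accuracy))


# Assumption: one hour of interview at roughly 150 spoken words per minute.
interview_words = 60 * 150

# The 70%, 90% and 95% levels are the ones mentioned in the post above.
for accuracy in (0.70, 0.90, 0.95):
    fixes = words_to_fix(interview_words, accuracy)
    print(f"{accuracy:.0%} accurate: about {fixes} of {interview_words} words to fix")
```

The estimate only counts words; it says nothing about how long each fix takes, which is what decides whether typing it yourself would have been quicker.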

Michael Nistler
November 28th, 2009, 09:34 PM
Hmm, I have DNS 10 on my laptop and version 7 on this old PC. Still, I've had good results not knowing much about how to correctly use it. Below is an example reading the text of a prior post. For me, it's an easy read through to make a few minor corrections (I see I don't know the command to capitalize a word - it's not "cap that" but I'm sure I could easily look it up if I had the need). Also, I noticed that when I captured the original text in MS-Word before copying it over here, it caught the typo by the original poster (double words "be be" at the end of the post), but I left it as-is.

So when I've got hours of text to transcribe, I'll continue to use DNS to help. One thing I've learned is not to worry about how DNS is transcribing as you read - just let it do its thing and make the corrections later (much faster workflow).

However, if we were doing a project requiring closed captions for a customer and had the original text/script, I'd be the first to think about using a transcription service.

Regards, Michael

We've used Dragon NaturallySpeaking since its inception. In fact, the creator of the system did a demo for my company when the product was still in beta. Last year, we bought the latest version of it to try and use it as a way to transcribe video for us. That failed miserably. Then we used the audio-prompting and that was better, but only after spending considerable time training at.

It's just not the correct solution. But If You Really want to get this done properly, with good accuracy, and you don't mind paying a bit of money, P. M. may, and all get two the info you really need to make this work.

In case you're wondering, my company has To Be Compliant with Federal Role 508 which dictates that we provide captioning for Any video we create for public consumption or four years within the company. Essentially, everything I shoot at the office has be be transcribed and captioned. We do this a lot.
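For anyone who wants to score a sample like this rather than eyeball it, one common measure is word error rate: the word-level edit distance between the original text and the dictated text, divided by the number of words in the original. The sketch below is a rough Python illustration, not anything DNS itself provides; the helper names are hypothetical, and the two sentences are copied from the original post and from the dictated version above.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the recogniser
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


original = "PM me, and I'll get you the info you really need to make this work"
dictated = "P. M. may, and all get two the info you really need to make this work"
print(f"word error rate: {word_error_rate(original, dictated):.2f}")
```

Capitalisation and punctuation are ignored here, so the stray capitals in the sample do not count as errors; a stricter comparison would give a higher figure.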

Eric Stemen
November 29th, 2009, 05:57 PM
Thanks for the example Michael. It looks like that would be pretty quick to go back through and fix the mistakes.