Speech to Text Software? at DVinfo.net

All Things Audio
Everything Audio, from acquisition to postproduction.

August 26th, 2010, 10:07 PM   #1
Major Player
Join Date: Apr 2010
Location: Colorado Springs, CO
Posts: 230
Speech to Text Software?

I have some audio interviews that I would like to convert to text so that I can use them in a variety of different documents.

Any recommendations?
Chris Sgaraglino
Widow Creek
August 27th, 2010, 12:42 AM   #2
Join Date: Oct 2009
Location: Mesa, AZ
Posts: 1,384
Nuance - Dragon Naturally Speaking

I have the iPhone version and it's super accurate. My aunt has been using it on her XP machine for years and raves about it.
A7RII, C100, 1Dx, 5Dmk3, 70D, Kessler goodies, Adobe, Pro Tools and more!
Robert Turchick
August 27th, 2010, 09:59 AM   #3
Join Date: Sep 2004
Location: Bristol, CT (Home of ESPN)
Posts: 1,173
I use Dragon for writing, especially first drafts. It's great once you understand how it works and adjust your speech pattern slightly.

The Dragon iPhone app is my favorite iPhone app.
Paul Cascio
August 28th, 2010, 01:52 PM   #4
Major Player
Join Date: Mar 2010
Location: Red Lodge, Montana
Posts: 889
I haven't tried the iPhone version but thought it was primarily a dictation app. (Also, we don't yet have iPhone coverage in the area where I live and work.) I have worked a bit with the Nuance/Dragon apps but found their usefulness was limited to dictation (where you can train them to your voice). I never had much luck with them for transcribing meetings, interviews, etc.

But if you've got Adobe's Premiere Pro CS4 or CS5 or Soundbooth CS4 or CS5, you've got a speech-to-text program in their "metadata speech analysis function" which may do a decent job with interview transcription.

The operative word is "may." I do a fair amount of work for lawyers. This sometimes involves video of depositions and sometimes mp3 or wav audio recordings of meetings and interviews. I've used the CS4/CS5 program when they've needed something immediately: say, when we are on the road, or it's the end of the day and the secretaries have gone home and the lawyers need something to study that evening.

I import the audio or video file into CS5, put it on the timeline, click on the audio track, go to the source monitor window and click on the Metadata tab, and then click the "Analyze" button. The last time I did this, it transcribed about 6 hours of deposition into a text file in about an hour.
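For planning purposes, that one data point (about 6 hours of audio in about 1 hour) can be turned into a rough estimate for other recording lengths. This is only a sketch in Python: it assumes analysis time scales linearly with audio length, which the post does not claim, and the function name and the 6x default are made up for illustration from that single run.

```python
def estimated_analysis_minutes(audio_minutes, speedup=6.0):
    """Rough analysis time in minutes for a recording of the given length.

    Assumption (not from Adobe): processing time scales linearly, at the
    roughly 6x-faster-than-real-time rate seen in the one run described above.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must not be negative")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_minutes / speedup


# A 90-minute interview at the same rate as the deposition run.
print(estimated_analysis_minutes(90))
```

Actual times will vary with the machine, the file format, and the recording quality, so treat the result as a ballpark figure.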

This could not replace a stenographer's transcription, but it was good enough to be usable for what the lawyer needed.

Adobe Speech Analysis does a very good job of distinguishing one deposition/interview voice from another for purposes of separating questions and answers. It gives you a script in which it identifies the speaker (with a label/name) and gives the text of what was said. The response/answer is identified as coming from another speaker. Although it does a good job of separating the text from different speakers, it sometimes identifies the same voice as a different speaker.

It has trouble with technical terms.

As you would expect, much depends on the clarity of the recording, the absence of ambient noise, and timbre of the speaker's voice.

Accents can throw it for a loop. I think of an example of this kind of thing from a Prairie Home Companion/Guy Noir episode. Guy is in North Carolina looking for a town called Boiling Springs. Minnesotan Guy (using a sort of New York accent) pronounces it "boyle-ng" but the North Carolina locals say "bye-lyn." It is hard enough for us to figure out what was meant when a Texan pronounces "cookie" with four syllables or a Minnesotan says, "you betcha" ... well, you get the picture.

So, "usable" and "accurate enough to immediately drop into a publishable document" can be different things. Proofreading and corrections are a must. The system can learn and can work with "reference scripts." Say you have several interview sessions with the same speaker. You use the corrected first text conversion as a "reference script" to help with accuracy in transcribing later interviews.

Actually, this software reminds me of what it was like with PC-based OCR from 15 and 20 years ago. Sometimes, the results were excellent. Sometimes, the scan worked well enough to be usable. Sometimes you were better off just typing the thing from scratch.

Hope this helps.
Jay West
August 29th, 2010, 07:32 AM   #5
Major Player
Join Date: Aug 2007
Location: El Cerrito, CA
Posts: 266

my two cents:
- used Nuance's Dragon software with barely acceptable results;
not a bad product at all, but it's dictation software, so it needs to
be fine-tuned to ONE particular voice; you can't expect great results if it's used
with a variety of voices;
- switched to outsourcing the task to transcription services:
I get fairly accurate transcriptions for around $1.55/minute of recorded audio
(the company I outsource to charges between $1.55 and $3.20,
depending on sound quality, accents, interview style, etc.).
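To budget for a batch of interviews, the quoted rates can be turned into a low/high range. A minimal sketch in Python, assuming billing is strictly per minute of audio with no minimum charge, setup fee, or rounding (the post doesn't say either way); the names below are made up for illustration.

```python
# Rates quoted above, stored in cents to avoid floating-point rounding.
LOW_CENTS_PER_MINUTE = 155   # $1.55: clean audio, easy interview
HIGH_CENTS_PER_MINUTE = 320  # $3.20: poor audio, accents, etc.


def transcription_cost_range_cents(audio_minutes):
    """Return (low, high) estimated cost in cents for whole minutes of audio."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must not be negative")
    return (audio_minutes * LOW_CENTS_PER_MINUTE,
            audio_minutes * HIGH_CENTS_PER_MINUTE)


def format_dollars(cents):
    """Format an integer number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


low, high = transcription_cost_range_cents(45)
print(format_dollars(low), "to", format_dollars(high))  # $69.75 to $144.00
```

Where a given recording falls in that range depends on the factors listed above (sound quality, accents, interview style).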
Hope this helps.

All the best

bricioledamerica.blogspot.com (in Italian)
Vasco Dones

All times are GMT -6.

DV Info Net -- Real Names, Real People, Real Info!
1998-2015 The Digital Video Information Network