Record to Text?
I want to be able to record an interview, either in person or over the phone, then somehow have that recording converted to text.
That is Speech-to-Text, (not the other way around).
Google Voice is only so-so for this; Plan B is to take MP3 files and have someone in India transcribe them for $10 per hour.
I’d love to find a software solution to this.
Any ideas?








May 13th, 2010 at 5:06 pm
M Hoffer turned me on to this company: http://www.smartcode.com/
They may have something you could use. Also, it wasn’t Dylan who trapped the Beatles in a hemp mesh…it was Hoffer. All those commas are just him eating Doritos!!! : P
May 13th, 2010 at 5:06 pm
http://www.nuance.com/naturallyspeaking/
Nuance is the Market Leader.. http://www.nuance.com/naturallyspeaking/landing/small-business.asp
many other Warez available http://clusty.com/search?input-form=clusty-simple&v%3Asources=webplus&query=Speech+to+Text+Software
if one was interested in ‘Road-Testing’/ doing a compare and contrast, could be a worthwhile project for an intern..
~~~
BR: I have Dragon Naturally Speaking. I upgraded both my PC at the office and the Mac at home, and the damned software insists on being retrained. It’s frickin’ exhausting.
May 13th, 2010 at 5:12 pm
learn to type 100wpm? I am going to try nuance myself very soon…
May 13th, 2010 at 5:12 pm
Check with NSA. They’ve been doing speech and voice recognition for some time. But if you want to get even better check into the semantic/intent analysis they have been funding at major universities.
Perhaps all those commercial companies got their start there.
As a last resort backtrack the work Lucent (now Alcatel-Lucent) has been doing. My recent info suggests it has been spun off in a separate corp, just like the micro-cameras they invented.
May 13th, 2010 at 5:12 pm
Ten bucks an hour isn’t such a bad deal; voice-recognition software isn’t going to be flawless anyway.
May 13th, 2010 at 5:15 pm
Arequipa,
do this http://www.smartcode.com/downloads/voice-to-text.html , at least (:
May 13th, 2010 at 5:17 pm
I would agree with Mr. Hoffer. Naturally speaking is the market leader in text to speech.
You will have to clean it up afterwards, but it does a decent job.
Note also: It is designed for a person to train the software for their personal use, so your results would likely be worse for a random person and no training, particularly if an accent is involved.
May 13th, 2010 at 5:31 pm
That is Speech-to-Text, (not the other way around).
May 13th, 2010 at 5:38 pm
Unfortunately innovation in the speech recognition area has been stagnant for many years.
May 13th, 2010 at 5:39 pm
Barry, all versions of Windows since XP have some pretty good speech recognition built in. Worth a spin. From my experience, the built-in Windows speech recognition works about as well as the paid Dragon product.
In fact it is probably worth the experiment to try out the Google speech-to-text with the MP3 file. Google keeps getting better and the batch interface may be way better. Google seems to have better non-trained recognition. Have not tried it, though.
May 13th, 2010 at 5:55 pm
Barry, there is a recent article on exactly why you should just send them to India for transcription.
http://robertfortner.posterous.com/the-unrecognized-death-of-speech-recognition
Best of luck.
May 13th, 2010 at 6:01 pm
Try this (it may be more humorous than accurate): Google Voice has a voicemail-to-text feature. Call someone with Google Voice, pipe the message through and see how well it translates! Let me know if you need an invite.
May 13th, 2010 at 6:08 pm
dolbydog,
that’s a good article, should give insight into why Nuance is so focused on “Doctors”, and “Lawyers”..at the min..
May 13th, 2010 at 6:09 pm
Never used this stuff – but here’s a review of that dragonvoice. Seem to remember IBM was using it.
http://www.consumersearch.com/voice-recognition-software
May 13th, 2010 at 6:13 pm
My job is developing speech recognition systems for large companies. Unfortunately the technology just isn’t there yet for very accurate random untrained voice recognition. It’s very good when you have knowledge of what is going to be said, but for random speech in an uncontrolled setting, it’s still quite a ways off.
If accuracy is important at all, transcription is your only option. If you just need the general gist of the conversation, the google speech to text is probably sufficient if you do it quickly after the interview so you can remember what was said and correct the bad recognitions.
May 13th, 2010 at 6:24 pm
http://en.wikipedia.org/wiki/Speech_recognition
Suspect you’re just slightly ahead of the “good stuff”…probably on its way to a Walmart near you using off-the-shelf DSPs. Surprisingly (?), few must have bought those PC packages…so your first practical speech-to-text converter may appear embedded within a kid’s toy?
Military
High-performance fighter aircraft
Some important conclusions from the work were as follows:
Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently.
Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful — *with lower recognition rates, pilots would not use the system*.
More natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained.
May 13th, 2010 at 6:31 pm
Get one of our long term unemployed to do it for $5.
May 13th, 2010 at 6:32 pm
http://www.nch.com.au/scribe/
Aussies usually make good solid stuff. Know nothing about this one, but these guys built the answering machine software I bought 7(?) years ago.
I’m a hardware guy – so entrusting *anything* to software was very difficult. Runs on an old thinkpad 600E and never breaks.
If you can’t get Microsoft’s latest OS to run without annoyance, find an old copy of Win 2K and do this to it:
http://www.litepc.com/xplite.html
Uses Microsoft’s own hidden built-ins to rip out the bloat. MS becomes very stable.
May 13th, 2010 at 6:34 pm
http://www.nch.com.au/software/dictation.html
PC audio is their specialty.
May 13th, 2010 at 6:36 pm
Send it to me. I’ll do it for a large coffee. :)
May 13th, 2010 at 6:37 pm
If you have the interview in a courtroom, they record it all for you on the taxpayer’s dime!
May 13th, 2010 at 6:38 pm
…and transcribe it too!
May 13th, 2010 at 6:51 pm
What the hell is wrong with you Barry? It’s called HIRE A HOT TRANSCRIPTIONIST.
f*cking guy
May 13th, 2010 at 7:01 pm
Who needs THAT around the office!
More trouble — no thanks.
May 13th, 2010 at 7:18 pm
Barry, I’ll do it for $9.50/hr. in Cdn. dollars and a signed copy of your book. Oh, and a mention in your blog.
May 13th, 2010 at 7:42 pm
You really can’t find a transcriptionist in the USA who would do this for $10/hour? I find that hard to believe in this economy. If true, then this country truly is doomed.
May 13th, 2010 at 7:53 pm
er, there’s an app for that?
I think it’s called jott but it probably sucks for anything more complicated than run fido run but what do I know…
May 13th, 2010 at 8:14 pm
Dragon / IBM is what’s used for half of the closed-captioning TV services, the other half being live trained operators using courtroom-style Stenotype captioning equipment.
I installed a Dragon system and the tweak was to trash the original crappy mike that comes with it, and use a decent omni dynamic mike with some compression and bandwidth limiting from 400 Hz to around 8 kHz. It gets it right around 95% of the time.
May 13th, 2010 at 8:21 pm
“Dragon dictation” app for iPhone or iPod Touch/iPad?
It’s free & it’s pretty accurate.
May 13th, 2010 at 8:29 pm
The Nuance products (speech to text) are speaker dependent, so of limited use in this situation. More speaker-independent tech is coming out regularly, mostly for telecom voice access apps.
Try your question at SpeechTek magazine. They are on top of the software apps for speech.
Speech.Technology@emediapro.com
May 13th, 2010 at 8:38 pm
I haven’t tried it myself, but I know people that have used Amazon Mechanical Turk for this. You take your MP3, split it into short segments, and offer a small amount for each transcription. You have each segment done several times to avoid errors. It works out to be cheaper than the $10 per hour.
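The split-and-repeat idea in this comment could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names `segment_bounds` and `pick_consensus`, the 30-second segment length, and the similarity-based voting rule are all hypothetical and are not anything Mechanical Turk itself provides. Cutting the audio and posting the jobs are left out.

```python
from difflib import SequenceMatcher


def segment_bounds(total_seconds, segment_seconds=30):
    """Return (start, end) second offsets covering the whole recording."""
    bounds = []
    start = 0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds


def pick_consensus(transcripts):
    """Pick the transcript that agrees most with the others.

    Agreement is measured word by word with difflib's similarity ratio.
    This is a hypothetical voting rule, chosen for simplicity.
    """
    def words(text):
        return text.lower().split()

    def agreement(i):
        return sum(
            SequenceMatcher(None, words(transcripts[i]), words(other)).ratio()
            for j, other in enumerate(transcripts)
            if j != i
        )

    best = max(range(len(transcripts)), key=agreement)
    return transcripts[best]


# Made-up example: three workers transcribe the same short segment.
attempts = [
    "the market fell today",
    "the market fell today",
    "the marked fell two day",
]
print(segment_bounds(95))
print(pick_consensus(attempts))
```

Picking the attempt closest to all the others is only one possible rule; with just two attempts per segment it cannot break a tie, so an odd number of attempts is the safer assumption.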
May 13th, 2010 at 9:10 pm
It hasn’t gotten any better in the 20 years since I worked for a voice recognition company? Huh. Would’ve thought we could have solved this one by now….
Oh well, just like all that artificial intelligence software I worked on, I suppose! ;^)
At least the Internet worked out well.
May 13th, 2010 at 9:19 pm
Before you go for DragonDictate, you might want to see Brad DeLong’s current post:
http://delong.typepad.com/sdj/2010/05/the-beatings-will-continue-until-morale-improves-dragondictate-for-iphone-department.html
I have no experience, so I have no further comment :-)
May 13th, 2010 at 9:48 pm
I gave up on dragon years ago and haven’t heard that it’s gotten substantially better. I vote for India, or maybe someone wearing a sandwich board who’d do it for little more.
May 13th, 2010 at 9:51 pm
I use a service called copytalk: you dial in, play your tape or speak, and you get an email back.
http://www.copytalk.com/mobilescribe.po?
May 13th, 2010 at 10:32 pm
This is a pretty great VM service…
http://www.simulscribe.com/
May 13th, 2010 at 10:48 pm
filter out everyone who does not have first-hand experience and does not currently use software to do this … it’s a pretty small set of responders.
Myself, I take this to mean that the technology has not yet arrived in a professional sort of way. When you think about all the peripheral recognition skills that go into speech recognition, and the sophisticated semantic analysis that goes into resolving homonyms, it’s not surprising. This is a VERY difficult problem.
If software were able to accomplish speech recognition (with an acceptably low error rate), we would have perfect grammar checkers, and yet we do not.
If a certain number of errors is acceptable, go ahead and try out the best software packages recommended. But if you don’t want to have to manually review the result, playing the audio while you read it back, stick with a quick-response web service with a human being at the other end of the connection.
Some things are still best done by the puny humans.
May 14th, 2010 at 12:32 am
Mac Speech Dictate
is 100% improvement over dragon
Don’t know if it’ll do untrained voices
but it’s exact, where Dragon Dictate fails
May 14th, 2010 at 1:57 am
Whoa, $10 per hour? Don’t go to India, in Germany you will find lots of qualified people who work for a lot less.
May 14th, 2010 at 7:12 am
I looked into this a few years ago and wasn’t really satisfied with price, quality and turnaround time so I abandoned the project. It will get better with time, but technology isn’t there yet.
If you need a relatively quick turnaround and are willing to pay up, check out:
http://www.speak-write.com
You can phone in your dictation, upload recordings or download iPhone or Android apps for it and they get your work back to you in under 3 hours. They’ll charge you 1-2 cents per word.
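To compare that per-word rate with an hourly quote, a back-of-the-envelope calculation helps. The sketch below is hypothetical: the function name is made up, and the speaking rate of 120 words per minute is an assumption borrowed from the Nuance marketing copy quoted further down this thread, not a figure from speak-write.

```python
def transcription_cost_dollars(audio_minutes, words_per_minute, cents_per_word):
    """Estimate the per-word transcription bill for a recording."""
    total_words = audio_minutes * words_per_minute
    return total_words * cents_per_word / 100


# One hour of speech at an assumed 120 words per minute,
# priced at the quoted 1 and 2 cents per word.
low = transcription_cost_dollars(60, 120, 1)
high = transcription_cost_dollars(60, 120, 2)
print(f"${low:.2f} to ${high:.2f} per hour of audio")
```

Under those assumptions an hour of audio runs about 7,200 words, so the bill lands somewhere between $72 and $144. Note that the $10 figure in the original post is per hour of a transcriptionist’s time, not per hour of audio, so the two numbers are not directly comparable.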
May 14th, 2010 at 7:22 am
BR, sadly, dancin has it right.
About a year ago I wanted to do essentially what you describe for a very large project involving customer feedback for a huge multinational corporation. We investigated and tried just about every readily available option, and in the end it had to be done the brute-force way: transcription. Even that was not as simple as it sounds.
Speech-to-text conversion for random speech isn’t yet capable of dealing with the ginormous multitude of dialect variations. Even our transcribers had difficulty with the wide range of dialects seen in the U.S. alone, and when we threw in non-American native speakers it got really interesting. Add to it the complications of a recorded session, the loss of fidelity and the added noise, and it becomes nearly impossible. We even tried STT conversion as a first step, with a transcriptionist to “clean it up” afterwards, thinking it would save time. The transcriptionists went nuts and all told us it would take them less time to just do it from scratch, brute force.
The best deal we came up with was using college and talented high-school co-ops for about the same as what you quote for India. Sourcing your transcription locally will give you much better control over the product (meaning less do-over and polishing when you get it, and it WILL need polishing even with good transcription), and I suspect you can find some very capable talent via local education co-op programs.
May 14th, 2010 at 8:17 am
Try Amazon’s Mechanical Turk Community – for a reasonable rate you can have the audio transcribed several times and use the compare feature in a word processor to confirm an accurate transcription. You can set qualifications for those who want to do your work, and over time, you will develop a group of “turkers” who will do your work with the precision you want. Often these are college students who can’t work a fixed schedule or live in college towns with little available employment.
https://www.mturk.com
User Community
http://turkers.proboards.com/index.cgi?
Worth a few minutes of your time
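The compare step described in this comment does not strictly need a word processor. As a minimal sketch, assuming two plain-text transcripts of the same audio, Python’s standard `difflib` module can list the spots where they disagree. The function name `disagreements` and the sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def disagreements(transcript_a, transcript_b):
    """List the word runs where two transcripts of the same audio differ."""
    a = transcript_a.split()
    b = transcript_b.split()
    diffs = []
    for tag, a1, a2, b1, b2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":
            # Keep the differing words from each side for manual review.
            diffs.append((" ".join(a[a1:a2]), " ".join(b[b1:b2])))
    return diffs


# Made-up example: two workers disagree on a single word.
first = "rates will rise next year"
second = "rates will rice next year"
for left, right in disagreements(first, second):
    print(f"check the audio: '{left}' vs '{right}'")
```

Only the flagged spots then need a listen against the recording, which is the point of having the audio transcribed more than once.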
May 14th, 2010 at 9:23 am
the industry leader (when i looked into it a long time ago) is NUANCE
http://www.nuance.com/naturallyspeaking/products/product-comparison.asp
i used to have their stock (NUAN), hard to sleep w/ a PE of 450….
May 14th, 2010 at 10:39 am
India would be more like 2 bucks per hour. For $10, I’m sure you could find a ton of people in the US.
BTW, the nuance/dragon software sucks.
Seeking Alpha puts up earnings call transcripts in a matter of a few hours. They are generally very accurate. I don’t know what technology they use but I’m sure you could pull a few strings to find out.
May 14th, 2010 at 8:03 pm
Dragon Speech Recognition Family of Products
Turn Talk into Type
Most people speak over 120 words per minute but type less than 40 words per minute. What if you could create email, documents and spreadsheets simply by speaking? What if you could control your PC just by talking to it? This includes launching applications, opening files, managing e-mail and working on the Web — all by voice.
With speech recognition software from Nuance Communications, you can turn your voice into text three times faster than most people type. Just start talking, and the software will recognize your voice instantly, delivering up to 99% accuracy as soon as you get started. Accuracy will continually improve the more you use the software.
It’s easy to get started with speech recognition, whether you’re using a PC or a Mac. Each edition of our speech recognition software delivers the same fast and accurate transcription of spoken words. But some editions include more advanced features to make interacting with your computer – regardless of whether it’s a PC or a Mac — easier than ever.
May 14th, 2010 at 11:45 pm
Amy Goodman at Democracy Now! has been providing very good quality rush transcripts of her interviews for years. You should ask her who she gets to do it… here’s a link to her site
http://www.democracynow.org/
and a link to her contact info which has a NYC phone
http://www.democracynow.org/contact
May 16th, 2010 at 10:11 am
Why would you send this to India for $10 an hour when you can easily hire an undergrad to do it for $8? I know several undergrads who have taken informal jobs like this (as an added bonus, they were all easy on the eyes too!). Just craigslist it, or go to your local university and ask them if they have a jobs board, or ask a friend if they know any responsible undergrads they can recommend.