Record to Text ?

By Barry Ritholtz - May 13th, 2010, 5:00PM

Quick question:

I want to be able to record an interview, either in person or over the phone, then somehow have that recording converted to text.

That is Speech-to-Text (not the other way around).

Google Voice is only so-so for this; Plan B is to take MP3 files and have someone in India transcribe them for $10 per hour.

I’d love to find a software solution to this.

Any ideas?

Comments


47 Responses to “Record to Text ?”

  1. Arequipa01 Says:

    M Hoffer turned me on to this company: http://www.smartcode.com/

    They may have something you could use. Also, it wasn’t Dylan who trapped the Beatles in a hemp mesh…it was Hoffer. All those commas are just him eating Doritos!!! : P

  2. Mark E Hoffer Says:

    http://www.nuance.com/naturallyspeaking/

    Nuance is the Market Leader.. http://www.nuance.com/naturallyspeaking/landing/small-business.asp

    many other Warez available http://clusty.com/search?input-form=clusty-simple&v%3Asources=webplus&query=Speech+to+Text+Software

    if one was interested in ‘Road-Testing’/ doing a compare and contrast, could be a worthwhile project for an intern..

    ~~~

    BR: I have Dragon NaturallySpeaking. I upgraded both my PC at the office and the Mac at home, and the damned software insists on being retrained. It’s frickin exhausting.

  3. Mike S Says:

    learn to type 100wpm? I am going to try nuance myself very soon…

  4. alfred e Says:

    Check with NSA. They’ve been doing speech and voice recognition for some time. But if you want to get even better check into the semantic/intent analysis they have been funding at major universities.

    Perhaps all those commercial companies got their start there.

    As a last resort backtrack the work Lucent (now Alcatel-Lucent) has been doing. My recent info suggests it has been spun off in a separate corp, just like the micro-cameras they invented.

  5. DL Says:

    Ten bucks an hour isn’t such a bad deal; voice-recognition software isn’t going to be flawless anyway.

  6. Mark E Hoffer Says:

    Arequipa,

    do this http://www.smartcode.com/downloads/voice-to-text.html , at least (:

  7. msaroff Says:

    I would agree with Mr. Hoffer. Naturally speaking is the market leader in text to speech.

    You will have to clean it up afterwards, but it does a decent job.

    Note also: It is designed for a person to train the software for their personal use, so your results would likely be worse for a random person and no training, particularly if an accent is involved.

  8. Barry Ritholtz Says:

    That is Speech-to-Text (not the other way around).

  9. patient renter Says:

    Unfortunately innovation in the speech recognition area has been stagnant for many years.

  10. Tbrander Says:

    Barry, all versions of Windows since XP have some pretty good speech recognition built in. Worth a spin. From my experience, the built-in Windows speech recognition works about as well as the paid Dragon product.

    In fact, it is probably worth the experiment to try out the Google speech-to-text with the MP3 file. Google keeps getting better, and the batch interface may be way better. Google seems to have better non-trained recognition. Have not tried it, though.

  11. dolbydog Says:

    Barry, there is a recent article on exactly why you should just send them to India for transcription.

    http://robertfortner.posterous.com/the-unrecognized-death-of-speech-recognition

    Best of luck.

  12. changja Says:

    Try this; it may be more humorous than accurate. Google Voice has a voicemail-to-text feature. Call someone with Google Voice, pipe the message through, and see how well it transcribes! Let me know if you need an invite.

  13. Mark E Hoffer Says:

    dolbydog,

    that’s a good article, should give insight into why Nuance is so focused on “Doctors”, and “Lawyers”..at the min..

  14. subscriptionblocker Says:

    Never used this stuff – but here’s a review of that dragonvoice. Seem to remember IBM was using it.

    http://www.consumersearch.com/voice-recognition-software

  15. dancin Says:

    My job is developing speech recognition systems for large companies. Unfortunately the technology just isn’t there yet for very accurate random untrained voice recognition. It’s very good when you have knowledge of what is going to be said, but for random speech in an uncontrolled setting, it’s still quite a ways off.

    If accuracy is important at all, transcription is your only option. If you just need the general gist of the conversation, the google speech to text is probably sufficient if you do it quickly after the interview so you can remember what was said and correct the bad recognitions.

  16. subscriptionblocker Says:

    http://en.wikipedia.org/wiki/Speech_recognition

    Suspect you’re just slightly ahead of the “good stuff”…probably on its way to a Walmart near you using off-the-shelf DSPs. Surprisingly (?), few must have bought those PC packages…so your first practical speech-to-text converter may appear embedded within a kid’s toy?

    Military
    High-performance fighter aircraft

    Some important conclusions from the work were as follows:

    Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently.
    Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful — *with lower recognition rates, pilots would not use the system*.
    More natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained.

  17. Myr Says:

    Get one of our long term unemployed to do it for $5.

  18. subscriptionblocker Says:

    http://www.nch.com.au/scribe/

    Aussies usually make good solid stuff. Know nothing about this one, but these guys built the answering machine software I bought 7(?) years ago.

    I’m a hardware guy – so entrusting *anything* to software was very difficult. Runs on an old thinkpad 600E and never breaks.

    If you can’t get Microsoft’s latest OS to run without annoyance, find an old copy of Win 2K and do this to it:

    http://www.litepc.com/xplite.html

    Uses Microsoft’s own hidden built-ins to rip out the bloat. MS becomes very stable.

  19. subscriptionblocker Says:

    http://www.nch.com.au/software/dictation.html

    PC audio is their specialty.

  20. vachon Says:

    Send it to me. I’ll do it for a large coffee. :)

  21. Jonathan Says:

    If you have the interview in a courtroom, they record it all for you on the taxpayer’s dime!

  22. Jonathan Says:

    …and transcribe it too!

  23. Evoo Kermartin Says:

    What the hell is wrong with you Barry? It’s called HIRE A HOT TRANSCRIPTIONIST.

    f*cking guy

  24. Barry Ritholtz Says:

    Who needs THAT around the office!

    More trouble — no thanks.

  25. greg Says:

    Barry..I’ll do it for $9.50/hr. in Cdn. dollars and a signed copy of your book. Oh, and a mention in your blog.

  26. rgc Says:

    You really can’t find a transcriptionist in the USA who would do this for $10/hour? I find that hard to believe in this economy. If true, then this country truly is doomed.

  27. lalaland Says:

    er, there’s an app for that?

    I think it’s called jott but it probably sucks for anything more complicated than run fido run but what do I know…

  28. gloppie Says:

    Dragon / IBM is what’s used for half of closed-captioning TV services, the other half being live trained operators using courtroom-style Stenotype captioning equipment.
    I installed a Dragon system, and the tweak was to trash the original crappy mike that comes with it and use a decent omni dynamic mike with some compression and bandwidth limiting from 400 Hz to around 8 kHz. It gets it right around 95% of the time.

  29. panchog Says:

    “Dragon dictation” app for iPhone or iPod Touch/iPad?

    It’s free & it’s pretty accurate.

  30. Rehabengineer Says:

    The Nuance products (speech to text) are speaker dependent so limited use in this situation. More speaker independent tech is coming out regularly-mostly for telecom voice access apps.

    Try your question at SpeechTek magazine. They are on top of the software apps for speech.

    Speech.Technology@emediapro.com

  31. blu Says:

    I haven’t tried it myself, but I know people that have used Amazon Mechanical Turk for this. You take your MP3, split it into short segments, and offer a small amount for each transcription. You do each segment several times to avoid errors. It works out to be cheaper than the $10 per hour.
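
A minimal sketch of the split-and-repeat idea described in this comment, under stated assumptions: the function names `plan_segments` and `pick_consensus`, the 30-second segment length, and the 2-second overlap are all hypothetical choices for illustration, not anything Mechanical Turk prescribes. It only plans the cut points and picks the most common answer among repeated transcriptions; cutting the audio and posting the tasks are left out.

```python
from collections import Counter


def plan_segments(total_seconds, segment_seconds=30, overlap_seconds=2):
    """Return (start, end) times covering the recording.

    Consecutive segments overlap slightly so a word cut at a boundary
    appears whole in at least one segment. All defaults are assumptions.
    """
    if segment_seconds <= overlap_seconds:
        raise ValueError("segment_seconds must exceed overlap_seconds")
    segments = []
    start = 0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        segments.append((start, end))
        if end == total_seconds:
            break
        start = end - overlap_seconds
    return segments


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not split votes."""
    return " ".join(text.lower().split())


def pick_consensus(transcripts):
    """Return (text, votes) for the most common normalized transcript."""
    counts = Counter(normalize(t) for t in transcripts)
    text, votes = counts.most_common(1)[0]
    return text, votes


if __name__ == "__main__":
    print(plan_segments(70))
    print(pick_consensus(["Hello  world", "hello world", "hollow world"]))
```

A segment whose winning transcript gets only one vote is a natural candidate for a human second look, since none of the workers agreed.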

  32. donna Says:

    It hasn’t gotten any better in the 20 years since I worked for a voice recognition company? Huh. Would’ve thought we could have solved this one by now….

    Oh well, just like all that artificial intelligence software I worked on, I suppose! ;^)

    At least the Internet worked out well.

  33. apikoros Says:

    Before you go for DragonDictate, you might want to see Brad DeLong’s current post:

    http://delong.typepad.com/sdj/2010/05/the-beatings-will-continue-until-morale-improves-dragondictate-for-iphone-department.html

    I have no experience, so I have no further comment :-)

  34. Mike in Nola Says:

    I gave up on dragon years ago and haven’t heard that it’s gotten substantially better. I vote for India, or maybe someone wearing a sandwich board who’d do it for little more.

  35. The MacDaddy Says:

    I use a service called copytalk, you dial, play your tape, speak etc and you get an email back.

    http://www.copytalk.com/mobilescribe.po?

  36. thehofa Says:

    This is a pretty great VM service…

    http://www.simulscribe.com/

  37. constantnormal Says:

    filter out everyone who does not have first-hand experience and does not currently use software to do this … it’s a pretty small set of responders.

    Myself, I take this to mean that the technology has not yet arrived in a professional sort of way. When you think about all the peripheral recognition skills that go into speech recognition, and the sophisticated semantic analysis that goes into resolving homonyms, it’s not surprising. This is a VERY difficult problem.

    If software was able to accomplish speech recognition (with an acceptably low error rate), we would have perfect grammar checkers — and yet we do not.

    If a certain number of errors is acceptable, go ahead and try out the best software packages recommended. But if you don’t want to have to manually review the result, playing the audio while you read it back, stick with a quick-response web service with a human being at the other end of the connection.

    Some things are still best done by the puny humans.

  38. dcsos Says:

    Mac Speech Dictate
    is 100% improvement over dragon

    Don’t know if it’ll do untrained voices
    but it’s exact, where Dragon Dictate fails

  39. Chris Says:

    Whoa, $10 per hour? Don’t go to India; in Germany you will find lots of qualified people who work for a lot less.

  40. Mike Says:

    I looked into this a few years ago and wasn’t really satisfied with price, quality and turnaround time so I abandoned the project. It will get better with time, but technology isn’t there yet.

    If you need a relatively quick turnaround and are willing to pay up, check out:
    http://www.speak-write.com

    You can phone in your dictation, upload recordings or download iPhone or Android apps for it and they get your work back to you in under 3 hours. They’ll charge you 1-2 cents per word.
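
A rough back-of-the-envelope sketch of what a per-word rate means per hour of recorded speech. The speaking rate of 120 words per minute is an assumption (actual interview pace varies), and `cost_per_audio_hour` is a hypothetical helper name.

```python
def cost_per_audio_hour(cents_per_word, words_per_minute=120):
    """Dollar cost of transcribing one hour of speech at a per-word rate.

    words_per_minute is an assumed average speaking rate.
    """
    words = words_per_minute * 60
    return words * cents_per_word / 100


if __name__ == "__main__":
    for rate in (1, 2):
        print(f"{rate} cent(s)/word: ${cost_per_audio_hour(rate):.2f} per hour of audio")
```

Note that this is a cost per hour of audio, while an hourly transcription wage is a cost per hour of work, so the two figures are not directly comparable without knowing how long a transcriber takes per hour of audio.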

  41. Mr.E. Says:

    BR, sadly, dancin has it right.

    About a year ago I wanted to do essentially what you describe for a very large project involving customer feedback for a huge multinational corporation. We investigated and tried about every readily available option, and in the end it had to be done the brute force way: transcription. Even that was not as simple as it sounds.

    Speech-to-text conversion for random speech isn’t yet capable of dealing with the ginormous multitude of dialect variations. Even our transcribers had difficulty with the wide range of dialects seen in the U.S. alone, and when we threw in non-American native speakers it got really interesting. Add to it the complications of a recorded session and the loss of fidelity plus added noise, and it becomes nearly impossible.

    We even tried STT conversion as a first step and then had a transcriptionist “clean it up,” thinking it would save time. The transcriptionists went nuts, and all told us it would take them less time to just do it from scratch, brute force.

    The best deal we came up with was using college and talented high-school co-ops for about the same as what you quote for India. Sourcing your transcription locally will give you much better control over the product (meaning less do-over and polishing when you get it, and it WILL need polishing even with good transcription), and I suspect you can find some very capable talent via local education co-op programs.

  42. tmmike Says:

    Try Amazon’s Mechanical Turk Community – for a reasonable rate you can have the audio transcribed several times and use the compare feature in a word processor to confirm an accurate transcription. You can set qualifications for those who want to do your work, and over time, you will develop a group of “turkers” who will do your work with the precision you want. Often these are college students who can’t work a fixed schedule or live in college towns with little available employment.
    https://www.mturk.com

    User Community
    http://turkers.proboards.com/index.cgi?

    Worth a few minutes of your time
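
A small sketch of the "compare" step mentioned in this comment, using Python's standard `difflib` in place of a word processor's compare feature (a swapped-in technique, not what the commenter used). The helper names `word_agreement` and `disagreements` are hypothetical.

```python
import difflib


def word_agreement(a, b):
    """Similarity ratio in [0, 1] between two transcripts, compared word by word."""
    words_a = a.lower().split()
    words_b = b.lower().split()
    return difflib.SequenceMatcher(None, words_a, words_b).ratio()


def disagreements(a, b):
    """List (words_in_a, words_in_b) pairs for every span where the transcripts differ."""
    words_a, words_b = a.split(), b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b)
    return [
        (" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    first = "the fed raised rates today"
    second = "the fed raised rate today"
    print(word_agreement(first, second))
    print(disagreements(first, second))
```

Only the spans returned by `disagreements` need to be checked against the audio, which is the time saving the commenter is describing.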

  43. carrottop Says:

    the industry leader (when i looked into it a long time ago) is NUANCE
    http://www.nuance.com/naturallyspeaking/products/product-comparison.asp

    i used to have their stock (NUAN), hard to sleep w/ a PE of 450….

  44. pseudboy Says:

    India would be more like 2 bucks per hour. For $10, I’m sure you could find a ton of people in the US.

    BTW, the nuance/dragon software sucks.

    Seeking alpha puts up earnings call transcripts in a matter of a few hours. They are generally very accurate. I don’t know what technology they use but I’m sure you could pull a few strings to find out.

  45. Deborah Says:

    Dragon Speech Recognition Family of Products
    Turn Talk into Type

    Most people speak over 120 words per minute but type less than 40 words per minute. What if you could create email, documents and spreadsheets simply by speaking? What if you could control your PC just by talking to it? This includes launching applications, opening files, managing e-mail and working on the Web — all by voice.

    With speech recognition software from Nuance Communications, you can turn your voice into text three times faster than most people type. Just start talking, and the software will recognize your voice instantly, delivering up to 99% accuracy as soon as you get started. Accuracy will continually improve the more you use the software.

    It’s easy to get started with speech recognition, whether you’re using a PC or a Mac. Each edition of our speech recognition software delivers the same fast and accurate transcription of spoken words. But some editions include more advanced features to make interacting with your computer – regardless of whether it’s a PC or a Mac — easier than ever.

  46. harryappenzeller Says:

    Amy Goodman at Democracy Now! has been providing very good quality rush transcripts of her interviews for years. You should ask her who she gets to do it. Here’s a link to her site:
    http://www.democracynow.org/
    and a link to her contact info which has a NYC phone
    http://www.democracynow.org/contact

  47. franklin411 Says:

    Why would you send this to India for $10 an hour when you can easily hire an undergrad to do it for $8? I know several undergrads who have taken informal jobs like this (as an added bonus, they were all easy on the eyes too!). Just craigslist it, or go to your local university and ask them if they have a jobs board, or ask a friend if they know any responsible undergrads they can recommend.
