macOS Catalina introduces Voice Control, a new way to control your Mac entirely with your voice. Voice Control uses the Siri speech-recognition engine to improve on the Enhanced Dictation feature available in earlier versions of macOS.¹

How to turn on Voice Control

After upgrading to macOS Catalina, follow these steps to turn on Voice Control:

  1. Choose Apple menu  > System Preferences, then click Accessibility.
  2. Click Voice Control in the sidebar.
  3. Select Enable Voice Control. When you turn on Voice Control for the first time, your Mac completes a one-time download from Apple.²

Aug 02, 2014: In Lion, you use the same keystroke to start and stop text-to-speech. By default it is set to Option-Esc, but it can be changed in the Speech pane of System Preferences. The behavior in Snow Leopard is the same (confirmed by booting into 10.6.8).

Dragon Dictate is by far the most accurate. While the built-in dictation in Mac OS X is good, it doesn't learn when you correct it. Dragon Dictate gets better the…

Dictation is Apple’s own free dictation software on the Mac (the equivalent of WSR, Windows Speech Recognition) and has been a feature since OS X Mountain Lion. By default it’s only suitable for dictations of 30 seconds or less, but you can turn on Enhanced Dictation for unlimited transcriptions.

When Voice Control is enabled, you see an onscreen microphone representing the mic selected in Voice Control preferences.

To pause Voice Control and stop it from listening, say “Go to sleep” or click Sleep. To resume Voice Control, say or click “Wake up.”

How to use Voice Control

Get to know Voice Control by reviewing the list of voice commands available to you: Say “Show commands” or “Show me what I can say.” The list varies based on context, and you may discover variations not listed. To make it easier to know whether Voice Control heard your phrase as a command, you can select “Play sound when command is recognized” in Voice Control preferences.

Basic navigation

Voice Control recognizes the names of many apps, labels, controls, and other onscreen items, so you can navigate by combining those names with certain commands. Here are some examples:

  • Open Pages: “Open Pages.” Then create a new document: “Click New Document.” Then choose one of the letter templates: “Click Letter. Click Classic Letter.” Then save your document: “Save document.”
  • Start a new message in Mail: “Click New Message.” Then address it: “John Appleseed.”
  • Turn on Dark Mode: “Open System Preferences. Click General. Click Dark.” Then quit System Preferences: “Quit System Preferences” or “Close window.”
  • Restart your Mac: “Click Apple menu. Click Restart” (or use the number overlay and say “Click 8”).

You can also create your own voice commands.

Number overlays

Use number overlays to quickly interact with parts of the screen that Voice Control recognizes as clickable, such as menus, checkboxes, and buttons. To turn on number overlays, say “Show numbers.” Then just say a number to click it.

Number overlays make it easy to interact with complex interfaces, such as web pages. For example, in your web browser you could say “Search for Apple stores near me.” Then use the number overlay to choose one of the results: “Show numbers. Click 64.” (If the name of the link is unique, you might also be able to click it without overlays by saying “Click” and the name of the link.)

Voice Control automatically shows numbers in menus and wherever you need to distinguish between items that have the same name.
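
The difference between clicking an item by name and falling back to numbers can be sketched in a few lines of Python. This is only an illustration of the idea described above, not how Voice Control is implemented; the function name `resolve_click` and its return values are hypothetical.

```python
def resolve_click(spoken_name, labels):
    """Decide what a spoken "Click <name>" should do (hypothetical model).

    Returns ("click", index) when exactly one label matches,
    ("numbers", {number: index}) when several items share the name,
    and ("none", None) when nothing matches.
    """
    wanted = spoken_name.strip().lower()
    matches = [i for i, label in enumerate(labels) if label.lower() == wanted]
    if len(matches) == 1:
        return ("click", matches[0])
    if matches:
        # Several items share the name, so number them 1, 2, 3, ...
        return ("numbers", {n: i for n, i in enumerate(matches, start=1)})
    return ("none", None)


if __name__ == "__main__":
    buttons = ["Open", "Save", "Open", "Cancel"]
    print(resolve_click("Save", buttons))
    print(resolve_click("Open", buttons))
```

A unique name resolves directly, while a repeated name returns a numbered set of candidates for the speaker to choose from.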


Grid overlays

Use grid overlays to interact with parts of the screen that don't have a control, or that Voice Control doesn't recognize as clickable.

Say “Show grid” to show a numbered grid on your screen, or “Show window grid” to limit the grid to the active window. Say a grid number to subdivide that area of the grid, and repeat as needed to continue refining your selection.

To click the item behind a grid number, say “Click” and the number. Or say “Zoom” and the number to zoom in on that area of the grid, then automatically hide the grid. You can also use grid numbers to drag a selected item from one area of the grid to another: “Drag 3 to 14.”

To hide grid numbers, say “Hide numbers.” To hide both numbers and grid, say “Hide grid.”
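
To see why repeating grid numbers narrows the selection so quickly, here is a minimal Python sketch of grid subdivision. It is an illustration under stated assumptions, not Apple's implementation: the 4 x 4 grid size, the left-to-right numbering, and the names `Rect`, `grid_cell`, and `refine` are all hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def grid_cell(area, number, cols=4, rows=4):
    """Return the cell with the given 1-based number.

    Cells are assumed to be numbered left to right, top to bottom.
    """
    if not 1 <= number <= cols * rows:
        raise ValueError(f"grid number must be between 1 and {cols * rows}")
    row, col = divmod(number - 1, cols)
    cell_width = area.width / cols
    cell_height = area.height / rows
    return Rect(area.x + col * cell_width, area.y + row * cell_height,
                cell_width, cell_height)


def refine(area, numbers, cols=4, rows=4):
    """Apply a sequence of spoken grid numbers, subdividing each time."""
    for number in numbers:
        area = grid_cell(area, number, cols, rows)
    return area


def center(area):
    """Point that a "Click <number>" would target in this sketch."""
    return (area.x + area.width / 2, area.y + area.height / 2)


if __name__ == "__main__":
    screen = Rect(0, 0, 1600, 800)
    target = refine(screen, [6, 1])  # say "6", then "1"
    print(target, center(target))
```

With these assumed dimensions, two spoken numbers shrink a 1600 x 800 screen to a 100 x 50 region.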

Dictation

When the cursor is in a document, email message, text message, or other text field, you can dictate continuously. Dictation converts your spoken words into text.

  • To enter a punctuation mark, symbol, or emoji, just speak its name, such as “question mark” or “percent sign” or “happy emoji.” These may vary by language or dialect.
  • To move around and select text, you can use commands like “Move up two sentences” or “Move forward one paragraph” or “Select previous word” or “Select next paragraph.”
  • To format text, try “Bold that” or “Capitalize that,” for example. Say “numeral” to format your next phrase as a number.
  • To delete text, you can choose from many delete commands. For example, say “Delete that” and Voice Control knows to delete what you just typed. Or say “Delete all” to delete everything and start over.

Voice Control understands contextual cues, so you can seamlessly transition between text dictation and commands. For example, to dictate and then send a birthday greeting in Messages, you could say “Happy Birthday. Click Send.” Or to replace a phrase, say “Replace I’m almost there with I just arrived.”
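
As a rough illustration of how a “Replace … with …” utterance differs from plain dictation, the Python sketch below treats a phrase as a command only when it matches that pattern. The function name `apply_replace` and the regular expression are assumptions for this example; Voice Control's actual recognition is far more capable.

```python
import re

# Hypothetical pattern: "replace <old phrase> with <new phrase>"
_REPLACE_COMMAND = re.compile(
    r"^replace (?P<old>.+?) with (?P<new>.+)$", re.IGNORECASE
)


def apply_replace(text, utterance):
    """Apply a spoken replace command to text.

    If the utterance is not a replace command, the text is returned
    unchanged (a real system would treat it as dictation instead).
    """
    match = _REPLACE_COMMAND.match(utterance.strip())
    if match is None:
        return text
    # Replace only the first occurrence of the old phrase.
    return text.replace(match.group("old"), match.group("new"), 1)


if __name__ == "__main__":
    message = "I'm almost there, see you soon"
    print(apply_replace(message, "Replace I'm almost there with I just arrived"))
```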

You can also create your own vocabulary for use with dictation.

Create your own voice commands and vocabulary

Create your own voice commands

  1. Open Voice Control preferences, such as by saying “Open Voice Control preferences.”
  2. Click Commands or say “Click Commands.” The complete list of all commands opens.
  3. To add a new command, click the add button (+) or say “Click add.” Then configure these options to define the command:
    • When I say: Enter the word or phrase that you want to be able to speak to perform the action.
    • While using: Choose whether your Mac performs the action only when you're using a particular app.
    • Perform: Choose the action to perform. You can open a Finder item, open a URL, paste text, paste data from the clipboard, press a keyboard shortcut, select a menu item, or run an Automator workflow.
  4. Use the checkboxes to turn commands on or off. You can also select a command to find out whether other phrases work with that command. For example, “Undo that” works with several phrases, including “Undo this” and “Scratch that.”

To quickly add a new command, you can say “Make this speakable.” Voice Control will help you configure the new command based on the context. For example, if you speak this command while a menu item is selected, Voice Control helps you make a command for choosing that menu item.
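
One way to picture a custom command is as a small record with the three options from the steps above: When I say, While using, and Perform. The Python sketch below is a hypothetical model only. The class `VoiceCommand`, the action names, and the rule that an app-specific command wins over a general one are assumptions for illustration, not documented Voice Control behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceCommand:
    when_i_say: str                      # phrase to speak
    perform: str                         # hypothetical action name, e.g. "open_url"
    argument: str                        # what the action operates on
    while_using: Optional[str] = None    # None means any application
    enabled: bool = True                 # mirrors the checkbox in the list


def find_command(commands, phrase, active_app):
    """Return the enabled command matching the phrase, or None.

    Assumption: a command scoped to the active app is preferred over
    one that applies to any application.
    """
    spoken = phrase.strip().lower()
    fallback = None
    for command in commands:
        if not command.enabled or command.when_i_say.lower() != spoken:
            continue
        if command.while_using == active_app:
            return command
        if command.while_using is None and fallback is None:
            fallback = command
    return fallback


if __name__ == "__main__":
    commands = [
        VoiceCommand("Open support", "open_url", "https://support.apple.com"),
        VoiceCommand("Insert signature", "paste_text", "Best regards", while_using="Mail"),
    ]
    print(find_command(commands, "insert signature", "Mail"))
```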

Create your own dictation vocabulary

  1. Open Voice Control preferences, such as by saying “Open Voice Control preferences.”
  2. Click Vocabulary, or say “Click Vocabulary.”
  3. Click the add button (+) or say “Click add.”
  4. Type a new word or phrase as you want it to be entered when spoken.

Learn more

  • For the best performance when using Voice Control with a Mac notebook computer and an external display, keep your notebook lid open or use an external microphone.
  • All audio processing for Voice Control happens on your device, so your personal data is always kept private.
  • Use Voice Control on your iPhone or iPod touch.
  • Learn more about accessibility features in Apple products.

1. Voice Control uses the Siri speech-recognition engine for U.S. English only. Other languages and dialects use the speech-recognition engine previously available with Enhanced Dictation.

2. If you're on a business or school network that uses a proxy server, Voice Control might not be able to download. Have your network administrator refer to the network ports used by Apple software products.

June 2017: A key component for these instructions is no longer actively maintained, so these instructions are no longer valid for modern Mac configurations.

I listen to podcasts. I watch videos. I watch podcasts in different languages. But more than anything I read and write. I practice languages. That’s just how I roll. And sometimes my ramblings take me as far as working out the English meaning of specific Kikuyu translation texts.

Frequently I want to save an audio snippet or video clip for future reference. Sure, I could save the source media file if I had unlimited disk space. But what I usually do is keep a link to the original source and a text synopsis of the snippet. That both saves on storage and makes future searches for that particular item simpler.

If you’re like me, you really want the original text more than a synopsis. It takes a bit of extra effort, but I have a nice solution that uses only a Mac and open source software. Read below for instructions on converting an MP3 audio file to a text document.


The Basics of Configuring Your Mac to Transcribe .MP3 Audio


Here’s what you need:

  • The original media (.mp3 file, for example)
  • Soundflower. Soundflower is an application that creates a virtual audio channel and directs audio input and output to physical or virtual devices.
  • Audacity. Audacity is a free application for recording and editing sounds.
  • TextEdit.app. TextEdit is the default text editor/word processor that is included in Mac OS X.

Follow the instructions on the developer websites to get all of the software installed and working on your system. Once you have the software installed, the next step is to configure your Mac to use Soundflower for dictation.

  • Open System Preferences and click on “Dictation & Speech”
  • Select the Dictation tab
  • Select “Soundflower (2ch)” as the dictation input source
  • Set Dictation to “On”
  • Tick the “Use Enhanced Dictation” box

Your Mac is ready for dictation. When dictation is turned on in TextEdit (or another word processing app), your Mac will transcribe sound from the Soundflower input source.

Getting Your Audio and Text Files Ready

Next, you need to queue up the audio file in Audacity and direct its output to Soundflower. For those who are new to Audacity, this will be the trickiest step. But relax, you don’t need to learn much about Audacity beyond deciding what section of sound to play and how to switch the audio output from the default speakers to Soundflower.

  • Launch Audacity
  • Import your audio file into Audacity (File –> Import, or simply drag the file into the center of the Audacity screen).
  • Click the play button to give it a listen, then click stop once you’re confident you have the right sound clip/transcription area.
  • Choose Audacity –> Preferences –> Devices. Under Playback, choose “Soundflower (2ch)” to switch the output from the onboard speakers to Soundflower. Click “OK.”

With Audacity and your sound file queued up, it’s time to turn your attention to TextEdit.

  • Launch TextEdit
  • Create a “New Document”
  • You may want to add some metadata to the document, such as the podcast name, episode #, publish date and URL, to go along with the key transcript.
  • Position the cursor in the file where you want the transcript to appear.
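
If you transcribe often, the metadata block can be generated instead of typed by hand. The following Python sketch builds a plain-text header with the fields mentioned above. The function name `transcript_header`, the field labels, and the example values are my own assumptions, so adjust them to taste.

```python
from datetime import date


def transcript_header(podcast, episode, published, url):
    """Build a plain-text metadata header to paste at the top of a transcript."""
    lines = [
        f"Podcast: {podcast}",
        f"Episode: {episode}",
        f"Published: {published.isoformat()}",
        f"Source: {url}",
        "",
        "Transcript:",
        "",  # leaves the cursor on a fresh line for dictation
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values are placeholders, not a real podcast.
    print(transcript_header("Example Show", 12, date(2016, 3, 1),
                            "https://example.com/ep12"))
```

Paste the output into the new TextEdit document, then position the cursor on the empty line after “Transcript:”.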

And … Action!

It’s time to start audio playback and dictation transcription. Here both sequence and timing are important:

  1. In Audacity, move the scrubber start location 10-15 seconds before the key transcription area.
  2. Press “Play.” The scrubber and meters will start moving, though you won’t hear any sound. The audio signal is going to Soundflower instead of to the speakers.
  3. Put focus on TextEdit and position the cursor where you want the transcription to begin.
  4. Select Edit –> Start Dictation (or use the hot key combination, Fn Fn). A microphone icon with a “Done” button will appear to the left of your document.
  5. Text will start appearing in the document. It will likely lag by about 3-5 seconds.
  6. After approximately 30 seconds, press the “Done” button. Transcription will continue until complete.

This is the fun part: watch as transcription happens in real time right in the document window. Look Ma, no hands!

And now you have the original text (and most likely a few errors) as text to save. In the future you can easily search and retrieve the information.

An Excellent Alternative: Google Docs Voice Typing

While the solution above works great for offline work, one alternative with a lot of promise is Google Docs. The Voice Typing feature works much like the dictation service in Mac OS. It has the crowdsourcing advantages and privacy disadvantages of other Google products. If you’re OK with that, I found Voice Typing to do a very good job with accuracy, and it can go longer than Mac OS dictation.

To use Google Voice Typing, follow all of the steps above for Soundflower, Dictation preferences, and Audacity. Instead of using TextEdit, start the Chrome browser and create a Google Doc. Once you are in the document, select Tools –> Voice typing.

The user interface and process of starting and stopping transcription is the same as with TextEdit.

Dictation and Transcription Limitations

This process sets you well on your way to the goal of a high-fidelity audio transcription, but it will fall short of perfect. Here’s what you can do to go from good to perfect:


  • Understand that Mac OS dictation transcription works for a maximum of 30 seconds at a time. If you need longer, you may want to use an alternate technology such as Dragon.
  • Audio playback needs to start before dictation/transcription begins in TextEdit. TextEdit needs to be in focus for dictation to work. If you set the Audacity scrubber a few seconds ahead of the target snippet, you’ll be fine.
  • Transcription cannot intuit punctuation. You’ll need to add that after the fact.
  • If you have multiple speakers or a noisy background, you may need to complete one additional step of creating a pristine audio file to work from. This can be done by listening to the sound through headphones and speaking the text into an audio recorder. Use the recording of your voice to drive the transcription.
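
For clips longer than the 30-second limit, it helps to plan the passes before you start. The Python sketch below splits a snippet into windows of at most 30 seconds and works out where to place the Audacity scrubber for each pass, using a 10-second lead-in. It is a planning aid under my own assumptions: the function name `plan_segments` and the default values are hypothetical, and it does not control Audacity or Dictation.

```python
def plan_segments(start, end, max_len=30.0, lead_in=10.0):
    """Plan transcription passes for a snippet given in seconds.

    Returns a list of (scrubber_position, segment_start, segment_end).
    The scrubber position is lead_in seconds before each segment,
    but never before the beginning of the file.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    plan = []
    segment_start = start
    while segment_start < end:
        segment_end = min(segment_start + max_len, end)
        scrubber = max(0.0, segment_start - lead_in)
        plan.append((scrubber, segment_start, segment_end))
        segment_start = segment_end
    return plan


if __name__ == "__main__":
    # A snippet running from 0:05 to 1:10 in the audio file.
    for scrubber, seg_start, seg_end in plan_segments(5, 70):
        print(f"scrub to {scrubber:.0f}s, transcribe {seg_start:.0f}s to {seg_end:.0f}s")
```

Each tuple is one pass through the steps above: set the scrubber, press Play, start dictation, and press “Done” when the segment ends.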