When a child can't speak, a text-to-speech app becomes their voice. The quality of that voice matters more than most people think. It affects whether the child wants to use the app, whether listeners take them seriously, and whether communication feels natural or mechanical.
This guide covers how TTS works in AAC apps, why voice quality has improved dramatically, and what to look for when choosing an app for a nonverbal or minimally verbal child.
What Text-to-Speech Actually Does in AAC
In an AAC context, text-to-speech converts symbols or text into spoken words. The child taps a symbol for "I," then "want," then "water." The app combines those words and speaks the sentence aloud: "I want water."
This is different from how most people encounter TTS. Siri and Alexa use TTS for short responses. Screen readers use TTS to read web pages. But in AAC, TTS is someone's primary voice. It's how they ask for help, tell a joke, say "I love you," and argue with their siblings.
That difference changes what matters. A voice that's fine for reading a Wikipedia article aloud may not be acceptable as a child's personal voice.
Neural Voices vs. Robotic Voices
TTS technology has gone through several generations.
Robotic/concatenative voices
Early TTS systems, and many budget apps today, use concatenative synthesis. This approach records a human speaker saying thousands of short sound fragments, then stitches them together. The result often sounds choppy and obviously artificial: pauses land in odd places and intonation is flat.
Neural voices
Modern TTS uses neural networks trained on large datasets of human speech. The output sounds far more natural: inflection, rhythm, and emphasis follow the patterns of real speech. With short phrases, many listeners have trouble telling a high-quality neural voice from a recording of a human speaker.
The difference is not subtle. Play the same sentence in a robotic voice and then in a neural voice, and most listeners hear the gap immediately.
Why Voice Quality Matters for Identity
For a speaking child, their voice is part of who they are. It carries personality, emotion, and identity. The same is true for an AAC user's synthesized voice. It just gets overlooked more often.
Consider a 7-year-old girl using an AAC app with a deep adult male voice. Or a teenager using a voice that sounds like a GPS unit from 2008. These mismatches affect how the user feels about their communication device and how others respond to them.
Good AAC apps offer multiple voice options so users can find one that matches their age, gender identity, and personality. Some apps now offer voices in multiple accents and languages, which matters for bilingual families using AAC.
Voice banking
Some people who are losing their speech due to progressive conditions like ALS can record their natural voice and create a synthetic version of it. This is called voice banking. It's a separate process from standard TTS, but some AAC apps support importing custom voice profiles.
How TTS Works with Symbol Boards
In most AAC apps, TTS and symbols work together in a pipeline:
- The user taps symbols on their communication board.
- Each symbol maps to a word or phrase stored in the app's vocabulary database.
- Selected words appear in a message bar at the top of the screen.
- The user taps a "speak" button (or the message bar itself).
- The app sends the text string to the TTS engine.
- The TTS engine generates audio and plays it through the device speaker.
On modern devices this entire process typically takes less than a second. The delay between tapping "speak" and hearing the voice is barely noticeable.
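The steps above can be sketched in a few lines of Python. This is an illustration only, not how any particular app is built: the `VOCABULARY` table, the `MessageBar` class, and the `speak_fn` callback (a stand-in for a real TTS engine) are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical vocabulary: symbol ID -> word or phrase (not any real app's data).
VOCABULARY = {
    "sym_i": "I",
    "sym_want": "want",
    "sym_water": "water",
}


@dataclass
class MessageBar:
    """Collects the words for tapped symbols until the user taps "speak"."""

    speak_fn: Callable[[str], None]  # stand-in for a call into a TTS engine
    words: list[str] = field(default_factory=list)

    def tap(self, symbol_id: str) -> None:
        # Look up the word for the tapped symbol and add it to the bar.
        self.words.append(VOCABULARY[symbol_id])

    def speak(self) -> str:
        # Join the words, hand the text to the TTS engine, then clear the bar.
        sentence = " ".join(self.words)
        self.speak_fn(sentence)
        self.words.clear()
        return sentence


spoken: list[str] = []  # records what the fake engine was asked to say
bar = MessageBar(speak_fn=spoken.append)
for symbol in ("sym_i", "sym_want", "sym_water"):
    bar.tap(symbol)
result = bar.speak()
print(result)  # I want water
```

Passing the engine in as a callback keeps the board logic separate from the voice, so the same message bar could drive an offline or a cloud engine.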
Grammar and sentence construction
Some AAC apps, including SabiKo, apply basic grammar rules before sending text to the TTS engine. If a child taps "I," "want," "go," "park," the app can output "I want to go to the park" with appropriate articles and prepositions inserted. This makes the spoken output sound more natural and grammatically correct.
Without grammar support, TTS reads exactly what's there, including awkward phrasing like "I want go park."
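As a rough illustration of what such a rule might look like, the sketch below inserts "to" between two consecutive verbs and "to the" before a place that follows a motion verb. The word lists and the `expand_telegraphic` function are assumptions made for this example. They do not describe SabiKo's or any other app's actual grammar engine, which would need a far larger lexicon and many more rules.

```python
# Hypothetical word categories; a real app would use a much larger lexicon.
VERBS = {"want", "go", "eat", "play", "need"}
MOTION_VERBS = {"go"}
PLACES = {"park", "store", "playground"}


def expand_telegraphic(words: list[str]) -> str:
    """Insert "to" between consecutive verbs and "to the" before a place
    that follows a motion verb. All other words pass through unchanged."""
    out: list[str] = []
    for i, word in enumerate(words):
        prev = words[i - 1] if i > 0 else None
        if prev in VERBS and word in VERBS:
            out.append("to")            # want + go -> want to go
        elif prev in MOTION_VERBS and word in PLACES:
            out.extend(["to", "the"])   # go + park -> go to the park
        out.append(word)
    return " ".join(out)


print(expand_telegraphic(["I", "want", "go", "park"]))  # I want to go to the park
```

Input that matches neither rule, such as "I want water", is returned unchanged, so the correction step never rewrites a sentence that is already well formed under these two rules.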
Offline vs. Cloud TTS
This is a critical distinction for AAC users.
| Feature | Offline TTS | Cloud TTS |
|---|---|---|
| Requires internet | No | Yes |
| Voice quality | Good (neural voices available) | Excellent |
| Latency | Very low | Depends on connection |
| Privacy | Data stays on device | Text is sent to servers for synthesis |
| Availability | Always works | Fails without an internet connection |
| Voice variety | Limited selection | Larger selection |
Why offline matters for AAC
A child needs to communicate at the playground, in the car, at grandma's house, and at the grocery store. None of these places guarantees internet access. If the TTS engine requires a cloud connection, the child literally loses their voice whenever the connection drops, whether that is Wi-Fi or cellular data.
This is not a minor inconvenience. It's a safety issue. A child who can't call for help because they're out of cell range is a child without communication.
The best approach: both
The ideal setup uses offline TTS as the default, so communication is always available, with the option to switch to higher-quality cloud voices when the device is online. Some apps handle this switch automatically.
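One way to picture that automatic handling is a small fallback routine: try the cloud voice only when the device is online, and return to the offline voice if anything goes wrong. The `Synth` type, the `synthesize` function, and the three toy engines below are hypothetical, written only to show the control flow. A real app would call its platform's TTS APIs instead.

```python
from typing import Callable, Optional

# Hypothetical engine interface: text in, audio bytes out.
Synth = Callable[[str], bytes]


def synthesize(
    text: str, offline: Synth, cloud: Optional[Synth], online: bool
) -> tuple[bytes, str]:
    """Return (audio, source). Prefer the cloud voice when online, but
    always fall back to the offline voice so speech is never lost."""
    if online and cloud is not None:
        try:
            return cloud(text), "cloud"
        except Exception:
            # Any cloud failure (timeout, dropped connection) falls through.
            pass
    return offline(text), "offline"


# Toy engines for demonstration; they return labeled bytes, not real audio.
def offline_engine(text: str) -> bytes:
    return b"offline:" + text.encode()


def cloud_engine(text: str) -> bytes:
    return b"cloud:" + text.encode()


def flaky_cloud(text: str) -> bytes:
    raise ConnectionError("network dropped")


print(synthesize("help", offline_engine, cloud_engine, online=True)[1])
print(synthesize("help", offline_engine, flaky_cloud, online=True)[1])
print(synthesize("help", offline_engine, cloud_engine, online=False)[1])
```

The key design choice is that the offline engine is not optional: every code path ends in a voice that works without a network.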
Voice Options Across AAC Apps
Here's how major AAC apps compare on voice features:
| App | Voice Type | Languages | Offline | Free Voices |
|---|---|---|---|---|
| SabiKo | Neural | 5 languages, 37 voices | Yes | 6 neural voices free |
| Proloquo2Go | Neural | 18 languages | Yes | Included with purchase |
| TouchChat | Acapela/NeoSpeech | Multiple | Yes | Included with purchase |
| TD Snap | Acapela | Multiple | Yes | Included with purchase |
| CoughDrop | System TTS | Device-dependent | Varies | System voices only |
| LetMeTalk | System TTS | Device-dependent | Varies | System voices only |
A few things stand out. Premium apps like Proloquo2Go and TouchChat include voices, but the apps themselves cost $100 to $250. Budget options like LetMeTalk and CoughDrop rely on whatever voices the device's operating system provides, which vary widely in quality.
SabiKo includes 6 neural voices in the free tier, with 37 total voices across 5 languages available with Pro. That's an unusual amount of voice variety for a free app.
Choosing the Right Voice for Your Child
Selecting a voice is a personal decision. Here are practical guidelines:
Match age range. Use a child's voice for a child. Adult voices on children's devices create a disconnect that affects how peers and adults interact with the user.
Match language and accent. If your family speaks a specific language at home, find a voice in that language. If you speak English with a regional accent, look for voice options that reflect that.
Let the user choose. Whenever possible, let the AAC user listen to voice samples and pick the one they prefer. Self-determination matters, even in seemingly small choices. This is their voice.
Consider volume and clarity. Some voices sound great through headphones but are hard to hear in noisy environments. Test voices in real-world settings, like a busy kitchen or a classroom.
Test full sentences. A voice that sounds good saying single words may sound odd in full sentences. Listen to how it handles "Can I have more please" and "I don't want that" before deciding.
SabiKo's Voice System
SabiKo uses neural text-to-speech voices that work fully offline. Every voice is downloaded to the device, so communication is never dependent on internet access. For a detailed look at SabiKo's voice options, including how to choose the right one, see our neural voices feature guide.
The free tier includes 6 voices. Pro users get access to all 37 voices across 5 languages: English, Spanish, French, German, and Portuguese. Each language has multiple voice options so users can find one that fits.
All voices support the message bar and grammar correction features, so spoken output sounds natural even when the user selects symbols in a simplified order.
Common Questions
Will TTS replace my child's natural speech? No. Research consistently shows that AAC does not prevent or delay speech development. A 2006 research review by Millar, Light, and Schlosser examined 23 studies and found that, among the cases with the strongest evidence, most participants gained speech after an AAC intervention and none showed a decrease. TTS gives your child a way to communicate now while speech skills continue to develop.
Can my child use TTS at school? In most cases, yes. In the US, the Individuals with Disabilities Education Act (IDEA) requires IEP teams to consider a child's assistive technology needs, which can include AAC devices and apps on tablets. Once AAC is written into the IEP, the school is required to support its use. If you encounter resistance, request an IEP meeting and involve your child's SLP.
What if my child doesn't like any of the voices? Try different apps. Voice preferences are personal and valid. Some children prefer a higher pitch. Some prefer a lower one. Some want a voice that sounds like their parent. Keep trying until you find one that feels right.
Do I need to buy an expensive app to get good voices? Not anymore. Neural voices were once exclusive to premium apps costing hundreds of dollars. SabiKo offers 6 neural voices for free. The gap between free and paid voice quality has narrowed significantly.
Getting Started
- Download a free AAC app with neural voices. SabiKo is a good place to start.
- Let your child listen to the available voices and pick a favorite.
- Set up a basic communication board with core words.
- Practice in a quiet environment first, then gradually use it in noisier settings.
- Work with your child's SLP to integrate the app into therapy and daily routines.
A text-to-speech app can be the most important tool in a nonverbal child's life. The technology exists. The voices sound natural. Many options are free. There's no reason to wait.
Download SabiKo free and give your child a voice today.