panot-speech: an iOS speech-to-text Expo module for real-time transcription


A native iOS speech-to-text module for Expo applications, built on Apple's Speech framework. It provides real-time speech recognition with multi-language support, audio visualization, and comprehensive event handling, and will be used in the PANOT app.

Heavily based on the https://github.com/jamsch/expo-speech-recognition repo, which is more complete and much more stable; check it out as a better alternative.

  • Real-time speech recognition with interim results
  • Multi-language support (English, Spanish, French, Italian, German, Portuguese, and more)
  • Audio level monitoring for visualizations and animations
  • Confidence scores for transcription accuracy
  • iOS native implementation using Apple's Speech framework
  • Comprehensive permission handling with Expo's permission system
  • Event-driven architecture with real-time updates
  • Thread-safe implementation using Swift actors
  • TypeScript support with full type definitions
  • Performance optimized with DSP-accelerated audio processing


Add the following permissions to your app.json or app.config.js:

{ "expo": { "ios": { "infoPlist": { "NSMicrophoneUsageDescription": "This app needs access to microphone for speech recognition.", "NSSpeechRecognitionUsageDescription": "This app needs speech recognition to convert your speech to text." } } } }

After installing, rebuild your iOS app (for example, npx expo run:ios).

Basic usage:

```tsx
import PanotSpeechModule from "panot-speech";
import { useEffect, useState } from "react";
import { Button, Text } from "react-native";

function App() {
  const [transcript, setTranscript] = useState("");

  useEffect(() => {
    // Listen for transcript updates
    const sub = PanotSpeechModule.addListener("onTranscriptUpdate", (event) => {
      setTranscript(event.transcript);
      console.log("Confidence:", event.confidence);
      console.log("Is Final:", event.isFinal);
    });
    return () => sub.remove();
  }, []);

  const startRecording = async () => {
    // Request permissions
    const result = await PanotSpeechModule.requestPermissions();
    if (result.status === "granted") {
      // Start transcribing with interim results in English
      PanotSpeechModule.startTranscribing(true, "en-US");
    }
  };

  const stopRecording = () => {
    PanotSpeechModule.stopTranscribing();
  };

  return (
    <>
      <Text>{transcript}</Text>
      <Button title="Start" onPress={startRecording} />
      <Button title="Stop" onPress={stopRecording} />
    </>
  );
}
```

requestPermissions(): Promise<PermissionResponse>

Requests both microphone and speech recognition permissions.

```ts
const result = await PanotSpeechModule.requestPermissions();
if (result.status === "granted") {
  // Permissions granted
}
```

getPermissions(): Promise<PermissionResponse>

Checks the current permission status without requesting.

```ts
const result = await PanotSpeechModule.getPermissions();
```
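
A common pattern is to check silently first and prompt only when needed; ensurePermissions is a hypothetical helper that uses the PermissionStatus enum from expo-modules-core (also used in the complete example below):

```ts
import { PermissionStatus } from "expo-modules-core";
import PanotSpeechModule from "panot-speech";

// Hypothetical helper: check silently first, prompt only if needed
async function ensurePermissions(): Promise<boolean> {
  const current = await PanotSpeechModule.getPermissions();
  if (current.status === PermissionStatus.GRANTED) {
    return true;
  }
  const requested = await PanotSpeechModule.requestPermissions();
  return requested.status === PermissionStatus.GRANTED;
}
```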

startTranscribing(interimResults?: boolean, lang?: string): void

Starts speech recognition.

Parameters:

  • interimResults (optional): Show partial results as you speak (default: true)
  • lang (optional): Language code (default: "en-US")

Examples:

```ts
// Basic usage (English with interim results)
PanotSpeechModule.startTranscribing();

// Spanish with interim results
PanotSpeechModule.startTranscribing(true, "es-ES");

// French without interim results (only final)
PanotSpeechModule.startTranscribing(false, "fr-FR");
```

stopTranscribing(): void

Stops the current speech recognition session.

```ts
PanotSpeechModule.stopTranscribing();
```

resetTranscript(): void

Stops recognition and clears the current transcript.

```ts
PanotSpeechModule.resetTranscript();
```

getState(): Promise<RecognitionState>

Returns the current recognition state.

```ts
const state = await PanotSpeechModule.getState();
// Returns: "inactive" | "starting" | "recognizing" | "stopping"
```

getSupportedLocales(): Promise<SupportedLocalesResponse>

Returns all languages supported by the device.

```ts
const { locales, installedLocales } =
  await PanotSpeechModule.getSupportedLocales();
console.log(locales); // ["en-US", "es-ES", "fr-FR", ...]
```

isLocaleSupported(locale: string): boolean

Checks if a specific language is supported.

```ts
const isSupported = PanotSpeechModule.isLocaleSupported("es-ES");
```

onTranscriptUpdate

Fired when the transcript is updated (partial or final results).

```ts
interface TranscriptUpdateEvent {
  transcript: string; // The recognized text
  isFinal: boolean; // Whether this is a final result
  confidence: number; // Confidence score (0.0 to 1.0)
}

PanotSpeechModule.addListener("onTranscriptUpdate", (event) => {
  console.log(event.transcript);
});
```
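
If you only care about finished utterances, key off isFinal; FinalSegments below is a hypothetical component sketch:

```tsx
import { useEffect, useState } from "react";
import PanotSpeechModule from "panot-speech";

// Hypothetical component: collect only finished utterances
function FinalSegments() {
  const [segments, setSegments] = useState<string[]>([]);

  useEffect(() => {
    const sub = PanotSpeechModule.addListener("onTranscriptUpdate", (event) => {
      if (event.isFinal) {
        // Commit the final result; interim updates are ignored
        setSegments((prev) => [...prev, event.transcript]);
      }
    });
    return () => sub.remove();
  }, []);

  return null; // render `segments` however your UI requires
}
```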

onError

Fired when a speech recognition error occurs.

```ts
interface ErrorEvent {
  error: string; // Error code
  message: string; // Human-readable error message
}

PanotSpeechModule.addListener("onError", (event) => {
  console.error(event.error, event.message);
});
```

Error Codes (see the handler sketch after this list):

  • "not-allowed" - Permissions not granted
  • "language-not-supported" - Language not supported
  • "audio-capture" - Audio capture failed
  • "no-speech" - No speech detected
  • "service-not-allowed" - Siri/Dictation disabled

onStatusChange

Fired when the transcription status changes.

```ts
interface StatusChangeEvent {
  isTranscribing: boolean;
}

PanotSpeechModule.addListener("onStatusChange", (event) => {
  console.log("Recording:", event.isTranscribing);
});
```

onStart

Fired when speech recognition starts.

PanotSpeechModule.addListener("onStart", () => { console.log("Started!"); });

onEnd

Fired when speech recognition ends.

PanotSpeechModule.addListener("onEnd", () => { console.log("Ended!"); });

onVolumeChange

Fired periodically with the audio input level (for visualizations).

```ts
interface VolumeChangeEvent {
  volume: number; // Range: -2 to 10 (normalized audio level)
}

PanotSpeechModule.addListener("onVolumeChange", (event) => {
  const normalized = (event.volume + 2) / 12; // Convert to 0-1
  // Use for animations, visualizations, etc.
});
```
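
Since the documented range is -2 to 10, it can help to clamp the normalized value before driving animations; normalizeVolume is a hypothetical helper:

```ts
// Hypothetical helper: map the reported volume (-2..10) onto 0..1,
// clamped so outliers never push animations out of range
function normalizeVolume(volume: number): number {
  return Math.min(1, Math.max(0, (volume + 2) / 12));
}
```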

Language Support

Check available languages:

```ts
const { locales } = await PanotSpeechModule.getSupportedLocales();
```
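
Combined with isLocaleSupported(), you can fall back gracefully when a preferred language is unavailable; pickLocale is a hypothetical helper:

```ts
import PanotSpeechModule from "panot-speech";

// Hypothetical helper: prefer the requested locale, fall back to
// en-US when the device does not support it
function pickLocale(preferred: string): string {
  return PanotSpeechModule.isLocaleSupported(preferred) ? preferred : "en-US";
}

PanotSpeechModule.startTranscribing(true, pickLocale("pt-BR"));
```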

Audio Visualization Example

Create stunning audio visualizations using the volume events:

```tsx
import { Animated } from "react-native";
import { useRef, useEffect } from "react";
import PanotSpeechModule from "panot-speech";

function AudioVisualizer() {
  const scaleAnim = useRef(new Animated.Value(1)).current;

  useEffect(() => {
    const sub = PanotSpeechModule.addListener("onVolumeChange", (event) => {
      const normalized = (event.volume + 2) / 12; // 0 to 1
      Animated.spring(scaleAnim, {
        toValue: 1 + normalized * 0.5,
        useNativeDriver: true,
      }).start();
    });
    return () => sub.remove();
  }, []);

  return (
    <Animated.View
      style={{
        width: 100,
        height: 100,
        borderRadius: 50,
        backgroundColor: "red",
        transform: [{ scale: scaleAnim }],
      }}
    />
  );
}
```

A simpler bar-style meter using the same event:

```tsx
import { View } from "react-native";
import { useState, useEffect } from "react";
import PanotSpeechModule from "panot-speech";

function VolumeBar() {
  const [volume, setVolume] = useState(0);

  useEffect(() => {
    const sub = PanotSpeechModule.addListener("onVolumeChange", (event) => {
      setVolume((event.volume + 2) / 12);
    });
    return () => sub.remove();
  }, []);

  return (
    <View style={{ height: 100, width: "100%" }}>
      <View
        style={{
          height: `${volume * 100}%`,
          backgroundColor: volume > 0.7 ? "red" : "green",
        }}
      />
    </View>
  );
}
```

Complete React Component Example

```tsx
import React, { useState, useEffect } from "react";
import { View, Text, TouchableOpacity, StyleSheet } from "react-native";
import PanotSpeechModule from "panot-speech";
import { PermissionStatus } from "expo-modules-core";

export default function SpeechToText() {
  const [hasPermissions, setHasPermissions] = useState(false);
  const [isTranscribing, setIsTranscribing] = useState(false);
  const [transcript, setTranscript] = useState("");
  const [confidence, setConfidence] = useState(0);
  const [selectedLanguage, setSelectedLanguage] = useState("en-US");

  useEffect(() => {
    // Check permissions
    checkPermissions();

    // Set up event listeners
    const transcriptSub = PanotSpeechModule.addListener(
      "onTranscriptUpdate",
      (event) => {
        setTranscript(event.transcript);
        setConfidence(event.confidence);
      }
    );
    const statusSub = PanotSpeechModule.addListener(
      "onStatusChange",
      (event) => {
        setIsTranscribing(event.isTranscribing);
      }
    );
    const errorSub = PanotSpeechModule.addListener("onError", (event) => {
      console.error(event.error, event.message);
      alert(`Error: ${event.message}`);
    });

    return () => {
      transcriptSub.remove();
      statusSub.remove();
      errorSub.remove();
    };
  }, []);

  const checkPermissions = async () => {
    const result = await PanotSpeechModule.getPermissions();
    setHasPermissions(result.status === PermissionStatus.GRANTED);
  };

  const requestPermissions = async () => {
    const result = await PanotSpeechModule.requestPermissions();
    setHasPermissions(result.status === PermissionStatus.GRANTED);
  };

  const startRecording = () => {
    if (!hasPermissions) {
      requestPermissions();
      return;
    }
    PanotSpeechModule.startTranscribing(true, selectedLanguage);
  };

  const stopRecording = () => {
    PanotSpeechModule.stopTranscribing();
  };

  return (
    <View style={styles.container}>
      <Text style={styles.title}>Speech to Text</Text>

      {/* Permissions */}
      <Text>
        Permissions: {hasPermissions ? "✅ Granted" : "❌ Not Granted"}
      </Text>

      {/* Transcript */}
      <View style={styles.transcriptBox}>
        <Text>{transcript || "Start speaking..."}</Text>
        {transcript !== "" && (
          <Text style={styles.confidence}>
            Confidence: {(confidence * 100).toFixed(0)}%
          </Text>
        )}
      </View>

      {/* Controls */}
      <View style={styles.controls}>
        {!isTranscribing ? (
          <TouchableOpacity style={styles.button} onPress={startRecording}>
            <Text style={styles.buttonText}>🎙️ Start</Text>
          </TouchableOpacity>
        ) : (
          <TouchableOpacity style={styles.stopButton} onPress={stopRecording}>
            <Text style={styles.buttonText}>⏹️ Stop</Text>
          </TouchableOpacity>
        )}
      </View>

      {isTranscribing && <Text style={styles.status}>Recording...</Text>}
    </View>
  );
}

const styles = StyleSheet.create({
  container: { flex: 1, padding: 20 },
  title: { fontSize: 24, fontWeight: "bold", marginBottom: 20 },
  transcriptBox: {
    backgroundColor: "#f5f5f5",
    padding: 16,
    borderRadius: 8,
    marginVertical: 20,
    minHeight: 100,
  },
  confidence: { marginTop: 8, fontSize: 12, color: "#666" },
  controls: { flexDirection: "row", gap: 12 },
  button: {
    backgroundColor: "#4CAF50",
    padding: 16,
    borderRadius: 8,
    flex: 1,
  },
  stopButton: {
    backgroundColor: "#f44336",
    padding: 16,
    borderRadius: 8,
    flex: 1,
  },
  buttonText: {
    color: "white",
    fontSize: 18,
    fontWeight: "bold",
    textAlign: "center",
  },
  status: {
    marginTop: 16,
    textAlign: "center",
    color: "#f44336",
    fontWeight: "600",
  },
});
```

Switching Languages Dynamically

```ts
const [language, setLanguage] = useState("en-US");

const switchToSpanish = () => {
  setLanguage("es-ES");
  PanotSpeechModule.stopTranscribing();
  PanotSpeechModule.startTranscribing(true, "es-ES");
};
```
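
Note that getState() can report "stopping", so a restart issued immediately after stopTranscribing() may race with native teardown. A defensive sketch (restartWithLanguage is hypothetical; the 50 ms poll and retry count are arbitrary):

```ts
import PanotSpeechModule from "panot-speech";

// Hypothetical helper: wait for the recognizer to settle before
// starting a new session in another language
async function restartWithLanguage(lang: string): Promise<void> {
  PanotSpeechModule.stopTranscribing();
  for (let attempt = 0; attempt < 20; attempt++) {
    const state = await PanotSpeechModule.getState();
    if (state === "inactive") break;
    await new Promise((resolve) => setTimeout(resolve, 50)); // arbitrary 50 ms poll
  }
  PanotSpeechModule.startTranscribing(true, lang);
}
```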

Getting Only Final Results

```ts
// Don't show interim results, only final transcriptions
PanotSpeechModule.startTranscribing(false, "en-US");
```

Checking Recognition State

```ts
const state = await PanotSpeechModule.getState();
if (state === "recognizing") {
  console.log("Currently recording");
} else if (state === "inactive") {
  console.log("Not recording");
}
```
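
One practical use is guarding against double-starts; startIfIdle is a hypothetical helper:

```ts
import PanotSpeechModule from "panot-speech";

// Hypothetical helper: only begin a session when the recognizer is idle
async function startIfIdle(lang: string): Promise<void> {
  const state = await PanotSpeechModule.getState();
  if (state === "inactive") {
    PanotSpeechModule.startTranscribing(true, lang);
  }
}
```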

Performance

  • Audio Processing: DSP-accelerated using Apple's Accelerate framework
  • Memory: Optimized with Swift actors for thread safety
  • CPU Usage: Minimal (~2-5% on modern devices)
  • Battery: Efficient audio pipeline with proper lifecycle management
  • Latency: <100ms for interim results
  • Accuracy: Leverages Apple's ML models (depends on language and audio quality)

Requirements

  • iOS: 13.4+
  • Expo SDK: 49+
  • React Native: 0.72+
  • Swift: 5.4+

Troubleshooting

Permissions Not Granted

  • Ensure you've added both NSMicrophoneUsageDescription and NSSpeechRecognitionUsageDescription to your Info.plist
  • Rebuild the app after adding permissions
  • Check iOS Settings → Privacy → Microphone/Speech Recognition

Language Not Supported

  • Use getSupportedLocales() to check available languages on the device
  • Some languages may not be available on all iOS versions
  • Download language packs in iOS Settings → General → Keyboard → Keyboards

Speech Recognition Not Working

  • Verify internet connection (required for cloud-based recognition)
  • Check that Siri and Dictation are enabled in iOS Settings
  • Ensure the microphone is not being used by another app
  • Try speaking more clearly or increasing volume

App Crashes on Permission Request

  • Make sure you've added the required usage descriptions to Info.plist
  • iOS will crash immediately if these are missing

Audio Visualization Not Updating

  • Ensure you're listening to the onVolumeChange event
  • Check that speech recognition is actively running
  • Volume updates occur ~10 times per second

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT

Built using:

  • Apple's Speech Framework
  • Expo Modules API
  • Swift Actors for concurrency
  • Accelerate framework for DSP

Note: This module currently supports iOS only. Android support may be added in future versions.
