## Introduction
In the wave of digital education transformation, HarmonyOS 5 has opened up a new paradigm of intelligent interaction for educational software through its innovative distributed capabilities and AI technology stack. Taking the *K12 oral training scenario* as an entry point, this article deeply analyzes how to use the ArkUI framework and AI voice services to create smart education solutions with functions such as real-time speech evaluation and intelligent transcription of classroom content, achieving three major breakthroughs:
*Technical Highlights*
- Multimodal Interaction: dual-channel input of voice and touch, supporting teaching scenarios such as classroom quick response and oral follow-up
- Educational-Level Latency: 1.2-second edge-side speech recognition response to keep classroom interaction smooth
- Accessibility Support: real-time subtitle generation to assist in special education scenarios
*Value in Educational Scenarios*
- *Language Learning*: AI speech evaluation scores pronunciation accuracy in real time
- *Classroom Recording*: automatically generates timestamped transcripts of teaching content
- *Homework Grading*: quickly invokes question bank resources via voice commands
*Goal*: build a real-time speech-to-text feature in which long-pressing a button triggers recording and recognition results are displayed dynamically. It suits scenarios such as voice input and real-time subtitles.
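Before touching any platform API, the press-to-record interaction above can be modeled as a small state machine. The sketch below is illustrative only — `RecordingSession` and its method names are hypothetical, not part of the HarmonyOS SDK:

```typescript
// Illustrative state model for press-to-record speech input.
// RecordingSession, pressDown, etc. are hypothetical names, not HarmonyOS APIs.
type SessionState = 'idle' | 'recording' | 'finished';

class RecordingSession {
  state: SessionState = 'idle';
  transcript = '';

  // Long-press begins: start capturing and clear the previous text.
  pressDown(): void {
    this.state = 'recording';
    this.transcript = '';
  }

  // Partial recognition results arrive while the button is held.
  onPartialResult(text: string): void {
    if (this.state === 'recording') {
      this.transcript = text; // engines typically resend the full hypothesis
    }
  }

  // Button released: stop recording and keep the final transcript.
  pressUp(): void {
    if (this.state === 'recording') {
      this.state = 'finished';
    }
  }
}
```

The ArkUI layer then only has to forward `onTouch` down/up events and partial-result callbacks into this object, which keeps the UI and the recognition logic separable.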
---
## Detailed Development Process
### 1. Environment Preparation
*System Requirements*: HarmonyOS 5, API 9+
*Device Support*: verify the device's microphone hardware capability before use
```typescript
// Device capability detection
if (!canIUse('SystemCapability.AI.SpeechRecognizer')) {
  promptAction.showToast({ message: 'Device does not support speech recognition' });
}
```
### 2. Permission Configuration
*Step Description*:
1. Declare permissions: Add to `module.json5`:
```json
"requestPermissions": [
  {
    "name": "ohos.permission.MICROPHONE",
    "reason": "$string:microphone_permission_reason",
    "usedScene": {
      "abilities": ["EntryAbility"],
      "when": "always"
    }
  }
]
```
2. Request the permission dynamically at runtime:
```typescript
private async requestPermissions() {
  const atManager = abilityAccessCtrl.createAtManager();
  try {
    const result = await atManager.requestPermissionsFromUser(
      getContext(),
      ['ohos.permission.MICROPHONE']
    );
    this.hasPermissions = result.authResults.every(
      status => status === abilityAccessCtrl.GrantStatus.PERMISSION_GRANTED
    );
  } catch (err) {
    console.error(`Permission request failed: ${err.code}, ${err.message}`);
  }
}
```
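The `every` call above reduces the per-permission grant results to a single boolean. That reduction can be isolated and checked without any device API; the numeric constant below mirrors `abilityAccessCtrl.GrantStatus` under the assumption that GRANTED is `0` (verify against your SDK):

```typescript
// Standalone version of the grant check used in requestPermissions().
// Assumption: GrantStatus.PERMISSION_GRANTED === 0 on this SDK version.
const PERMISSION_GRANTED = 0;

function allGranted(authResults: number[]): boolean {
  // Every requested permission must be granted; an empty result array
  // means nothing was granted, so treat it as a failure too.
  return authResults.length > 0 &&
         authResults.every(status => status === PERMISSION_GRANTED);
}
```

Guarding against the empty-array case matters because `[].every(...)` is vacuously `true` in JavaScript.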
### 3. Speech Engine Management
*Lifecycle Control*:
```typescript
// Engine initialization
private async initEngine() {
  this.asrEngine = await speechRecognizer.createEngine({
    language: 'zh-CN', // Also supports other languages such as en-US
    online: 1          // Online recognition mode
  });
  this.configureCallbacks();
}

// Resource release
private releaseEngine() {
  this.asrEngine?.finish('10000');
  this.asrEngine?.cancel('10000');
  this.asrEngine?.shutdown();
  this.asrEngine = undefined;
}
```
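The `configureCallbacks()` method referenced above wires a listener into the engine. The sketch below shows one way to structure that wiring as a plain object; the callback names loosely follow the HarmonyOS `speechRecognizer` listener (`onStart` / `onResult` / `onComplete` / `onError`), but treat the exact field shapes as an assumption and check the API reference for your SDK version:

```typescript
// Sketch of the listener wiring behind configureCallbacks().
// Callback names/shapes are assumed from the speechRecognizer docs,
// not guaranteed for every SDK version.
interface RecognitionResult {
  result: string;   // current best hypothesis for the utterance
  isFinal: boolean; // true once the utterance is complete
}

class TranscriptCollector {
  text = '';
  done = false;
  lastError = '';

  listener = {
    onStart: (_sessionId: string) => {
      this.text = '';
      this.done = false;
    },
    onResult: (_sessionId: string, r: RecognitionResult) => {
      this.text = r.result; // the engine resends the full hypothesis each time
      if (r.isFinal) this.done = true;
    },
    onComplete: (_sessionId: string) => {
      this.done = true;
    },
    onError: (_sessionId: string, code: number, message: string) => {
      this.lastError = `${code}: ${message}`;
    },
  };
}
```

On device, an object of this shape would be passed to `this.asrEngine.setListener(...)` inside `configureCallbacks()`; keeping the state in a separate collector makes the callback logic testable off-device.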
### 4. Core Configuration Parameters
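As a starting point, the parameter object passed when starting recognition typically looks like the sketch below. Field names and values follow the HarmonyOS `speechRecognizer` documentation as best understood (16 kHz mono PCM is the common ASR input format); verify every field against your SDK version before relying on it:

```typescript
// Hedged example of a startListening() parameter object.
// Field names/values are assumptions from the speechRecognizer docs.
const recognitionParams = {
  sessionId: '10000',   // must match the id later used in finish()/cancel()
  audioInfo: {
    audioType: 'pcm',   // raw PCM captured from the microphone
    sampleRate: 16000,  // 16 kHz is the usual ASR sampling rate
    soundChannel: 1,    // mono channel
    sampleBit: 16       // 16-bit samples
  }
};
```

Note that the session id `'10000'` deliberately matches the one passed to `finish()` and `cancel()` in the release code above, since those calls target a specific recognition session.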