## Introduction
In the wave of digital education transformation, HarmonyOS 5 has opened up a new paradigm of intelligent interaction for educational software through its innovative distributed capabilities and AI technology stack. Taking the *K12 oral training scenario* as an entry point, this article deeply analyzes how to use the ArkUI framework and AI voice services to create smart education solutions with functions such as real-time speech evaluation and intelligent transcription of classroom content, achieving three major breakthroughs:
*Technical Highlights*
- Multimodal Interaction: dual-channel input of voice and touch, supporting teaching scenarios such as classroom quick response and oral follow-up
- Educational-Level Latency: 1.2-second edge-side speech recognition response to keep classroom interaction smooth
- Accessibility Support: real-time subtitle generation to assist in special education scenarios
*Value in Educational Scenarios*
- *Language Learning*: AI speech evaluation scores pronunciation accuracy in real time
- *Classroom Recording*: automatically generates timestamped transcripts of teaching content
- *Homework Grading*: quickly invokes question bank resources via voice commands
*Goal*: build a real-time speech-to-text feature in which long-pressing a button triggers recording and recognition results are displayed dynamically. It suits scenarios such as voice input and real-time subtitles.
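Before touching any platform API, the press-to-record interaction above can be modeled as a small state machine. The sketch below is illustrative only — `RecordingSession` and its method names are hypothetical, not part of the HarmonyOS SDK:

```typescript
// Illustrative state model for press-to-record speech input.
// RecordingSession, pressDown, etc. are hypothetical names, not HarmonyOS APIs.
type SessionState = 'idle' | 'recording' | 'finished';

class RecordingSession {
  state: SessionState = 'idle';
  transcript = '';

  // Long-press begins: start capturing and clear the previous text.
  pressDown(): void {
    this.state = 'recording';
    this.transcript = '';
  }

  // Partial recognition results arrive while the button is held.
  onPartialResult(text: string): void {
    if (this.state === 'recording') {
      this.transcript = text; // engines typically resend the full hypothesis
    }
  }

  // Button released: stop recording and keep the final transcript.
  pressUp(): void {
    if (this.state === 'recording') {
      this.state = 'finished';
    }
  }
}
```

The ArkUI layer then only has to forward `onTouch` down/up events and partial-result callbacks into this object, which keeps the UI and the recognition logic separable.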
---
## Detailed Development Process
### 1. Environment Preparation
*System Requirements*: HarmonyOS 5, API 9+
*Device Support*: verify the device's microphone hardware capability before use
```typescript
// Device capability detection
if (!canIUse('SystemCapability.AI.SpeechRecognizer')) {
  promptAction.showToast({ message: 'Device does not support speech recognition' });
}
```
### 2. Permission Configuration
*Step Description*:
1. Declare permissions: Add to `module.json5`:
```json
"requestPermissions": [
  {
    "name": "ohos.permission.MICROPHONE",
    "reason": "$string:microphone_permission_reason",
    "usedScene": {
      "abilities": ["EntryAbility"],
      "when": "always"
    }
  }
]
```
2. Request the permission dynamically at runtime:
```typescript
private async requestPermissions() {
  const atManager = abilityAccessCtrl.createAtManager();
  try {
    const result = await atManager.requestPermissionsFromUser(
      getContext(),
      ['ohos.permission.MICROPHONE']
    );
    this.hasPermissions = result.authResults.every(
      status => status === abilityAccessCtrl.GrantStatus.PERMISSION_GRANTED
    );
  } catch (err) {
    console.error(`Permission request failed: ${err.code}, ${err.message}`);
  }
}
```
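The `every` call above reduces the per-permission grant results to a single boolean. That reduction can be isolated and checked without any device API; the numeric constant below mirrors `abilityAccessCtrl.GrantStatus` under the assumption that GRANTED is `0` (verify against your SDK):

```typescript
// Standalone version of the grant check used in requestPermissions().
// Assumption: GrantStatus.PERMISSION_GRANTED === 0 on this SDK version.
const PERMISSION_GRANTED = 0;

function allGranted(authResults: number[]): boolean {
  // Every requested permission must be granted; an empty result array
  // means nothing was granted, so treat it as a failure too.
  return authResults.length > 0 &&
         authResults.every(status => status === PERMISSION_GRANTED);
}
```

Guarding against the empty-array case matters because `[].every(...)` is vacuously `true` in JavaScript.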
### 3. Speech Engine Management
*Lifecycle Control*:
```typescript
// Engine initialization
private async initEngine() {
  this.asrEngine = await speechRecognizer.createEngine({
    language: 'zh-CN', // Also supports other languages such as en-US
    online: 1          // Online recognition mode
  });
  this.configureCallbacks();
}

// Resource release
private releaseEngine() {
  this.asrEngine?.finish('10000');
  this.asrEngine?.cancel('10000');
  this.asrEngine?.shutdown();
  this.asrEngine = undefined;
}
```
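The `configureCallbacks()` method referenced above wires a listener into the engine. The sketch below shows one way to structure that wiring as a plain object; the callback names loosely follow the HarmonyOS `speechRecognizer` listener (`onStart` / `onResult` / `onComplete` / `onError`), but treat the exact field shapes as an assumption and check the API reference for your SDK version:

```typescript
// Sketch of the listener wiring behind configureCallbacks().
// Callback names/shapes are assumed from the speechRecognizer docs,
// not guaranteed for every SDK version.
interface RecognitionResult {
  result: string;   // current best hypothesis for the utterance
  isFinal: boolean; // true once the utterance is complete
}

class TranscriptCollector {
  text = '';
  done = false;
  lastError = '';

  listener = {
    onStart: (_sessionId: string) => {
      this.text = '';
      this.done = false;
    },
    onResult: (_sessionId: string, r: RecognitionResult) => {
      this.text = r.result; // the engine resends the full hypothesis each time
      if (r.isFinal) this.done = true;
    },
    onComplete: (_sessionId: string) => {
      this.done = true;
    },
    onError: (_sessionId: string, code: number, message: string) => {
      this.lastError = `${code}: ${message}`;
    },
  };
}
```

On device, an object of this shape would be passed to `this.asrEngine.setListener(...)` inside `configureCallbacks()`; keeping the state in a separate collector makes the callback logic testable off-device.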
### 4. Core Configuration Parameters
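As a starting point, the parameter object passed when starting recognition typically looks like the sketch below. Field names and values follow the HarmonyOS `speechRecognizer` documentation as best understood (16 kHz mono PCM is the common ASR input format); verify every field against your SDK version before relying on it:

```typescript
// Hedged example of a startListening() parameter object.
// Field names/values are assumptions from the speechRecognizer docs.
const recognitionParams = {
  sessionId: '10000',   // must match the id later used in finish()/cancel()
  audioInfo: {
    audioType: 'pcm',   // raw PCM captured from the microphone
    sampleRate: 16000,  // 16 kHz is the usual ASR sampling rate
    soundChannel: 1,    // mono channel
    sampleBit: 16       // 16-bit samples
  }
};
```

Note that the session id `'10000'` deliberately matches the one passed to `finish()` and `cancel()` in the release code above, since those calls target a specific recognition session.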