[AIGC] Implementing Azure Speech Service Batch Transcription in Java: A Complete Guide

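Table of Contents

- Introduction
- Technical Background
- Environment Setup
- Detailed Implementation
  - 1. Basic Architecture Design
  - 2. Implementing File Upload
  - 3. Submitting the Transcription Job
  - curl
  - 4. Retrieving the Transcription Result
- Usage Example
- Sample Result
- Best Practices and Considerations
- Summary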

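Introduction

Converting audio into text is an increasingly common requirement. Meeting minutes, video subtitles, and speech content analysis all depend on a high-quality speech-to-text service. Azure Speech Service offers a batch transcription feature that processes large numbers of audio files efficiently. This article walks through how to use that feature from Java.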

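Technical Background

Batch transcription in Azure Speech Service is asynchronous. The whole process consists of three main steps:

1. Upload the audio file to Azure Blob Storage
2. Submit a transcription job to Speech Service
3. Retrieve and process the transcription result

Because the client only submits a job and polls for its status, this design handles large audio files well and lets several transcription jobs run at the same time.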
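Since each job runs asynchronously, several audio files can be pushed through the three steps in parallel. The following is a minimal sketch under stated assumptions: the pipeline function is a hypothetical stand-in for "upload, submit, and poll for one file", and the class name ParallelTranscriber and the pool size are illustrative only, not part of any Azure API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs one transcription pipeline per file on a fixed-size pool.
final class ParallelTranscriber {

    // Returns results keyed by file path, in the order the files were given.
    static Map<String, String> transcribeAll(List<String> files,
                                             Function<String, String> pipeline,
                                             int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Start every job before waiting on any of them
            List<Future<String>> futures = new ArrayList<>();
            for (String file : files) {
                Callable<String> task = () -> pipeline.apply(file);
                futures.add(pool.submit(task));
            }
            // Collect results in input order
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < files.size(); i++) {
                results.put(files.get(i), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A transcription job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real program the pipeline argument would chain the upload, submit, and polling methods developed later in this article. Service-side quotas may limit how many jobs are worth running at once, so the pool size is best kept small.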

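Environment Setup

Before starting, make sure the following are in place:

1. An Azure subscription with the required services:
   - An Azure Speech Service resource
   - An Azure Blob Storage account
2. A Java development environment:
   - JDK 11 or later
   - Maven or Gradle as the build tool
3. The required dependencies. The code in this article calls the batch transcription REST API through java.net.http.HttpClient, so it does not itself use the Speech SDK artifact. In a Maven project, add: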

<dependencies>
    <dependency>
        <groupId>com.microsoft.cognitiveservices.speech</groupId>
        <artifactId>client-sdk</artifactId>
        <version>1.24.0</version>
    </dependency>
    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-storage-blob</artifactId>
        <version>12.20.0</version>
    </dependency>
    <dependency>
        <groupId>org.json</groupId>
        <artifactId>json</artifactId>
        <version>20220924</version>
    </dependency>
</dependencies>

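Detailed Implementation

1. Basic Architecture Design

First, create a main class that encapsulates all the related functionality. The methods shown in the following sections are members of this class: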

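import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.blob.sas.BlobSasPermission;
import com.azure.storage.blob.sas.BlobServiceSasSignatureValues;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.OffsetDateTime;
import org.json.JSONArray;
import org.json.JSONObject;

public class AzureSpeechBatchTranscription {
    private static final String SUBSCRIPTION_KEY = "YOUR_SUBSCRIPTION_KEY";
    private static final String REGION = "YOUR_REGION";
    private static final String STORAGE_CONNECTION_STRING = "YOUR_STORAGE_CONNECTION_STRING";
    private static final String CONTAINER_NAME = "audio-files";
    
    private final BlobServiceClient blobServiceClient;
    private final HttpClient httpClient;
    
    public AzureSpeechBatchTranscription() {
        this.blobServiceClient = new BlobServiceClientBuilder()
            .connectionString(STORAGE_CONNECTION_STRING).buildClient();
        this.httpClient = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(30))
            .build();
    }
}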
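The class above hardcodes its credentials for brevity. As one possible alternative, the sketch below reads them from environment variables. The variable names (AZURE_SPEECH_KEY, AZURE_SPEECH_REGION, AZURE_STORAGE_CONNECTION_STRING) and the SpeechSettings class are assumptions made for illustration; they are not names that Azure defines or requires.

```java
import java.util.Map;

// Hypothetical settings holder; the environment variable names are illustrative.
final class SpeechSettings {
    final String subscriptionKey;
    final String region;
    final String storageConnectionString;

    private SpeechSettings(String subscriptionKey, String region, String storageConnectionString) {
        this.subscriptionKey = subscriptionKey;
        this.region = region;
        this.storageConnectionString = storageConnectionString;
    }

    // Builds the settings from an environment map, failing fast on missing values.
    static SpeechSettings fromEnv(Map<String, String> env) {
        return new SpeechSettings(
            require(env, "AZURE_SPEECH_KEY"),
            require(env, "AZURE_SPEECH_REGION"),
            require(env, "AZURE_STORAGE_CONNECTION_STRING"));
    }

    // Returns the trimmed value, or throws if it is absent or blank.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }
}
```

A caller would typically pass System.getenv() to fromEnv and hand the resulting values to the constructor, which keeps keys out of source control.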

2. Implementing File Upload

The first step is to upload the audio file to Azure Blob Storage and generate a URL that carries a SAS token:

public String uploadAudioFile(String localFilePath, String fileName) {
    try {
        // Create the container if it does not exist
        BlobContainerClient containerClient = blobServiceClient
            .createBlobContainerIfNotExists(CONTAINER_NAME);
        
        // Get a blob client and upload the file
        BlobClient blobClient = containerClient.getBlobClient(fileName);
        blobClient.uploadFromFile(localFilePath);
        
        // Generate a read-only SAS token valid for 24 hours
        BlobSasPermission permission = new BlobSasPermission()
            .setReadPermission(true);
        
        OffsetDateTime expiryTime = OffsetDateTime.now().plusDays(1);
        
        String sasToken = blobClient.generateSas(
            new BlobServiceSasSignatureValues(expiryTime, permission)
        );
        
        return blobClient.getUrl() + "?" + sasToken;
    } catch (Exception e) {
        throw new RuntimeException("Failed to upload audio file: " + e.getMessage(), e);
    }
}

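The key points of this code are:

- The storage container is created automatically if it does not exist
- BlobClient handles the file upload
- A SAS token with read permission is generated so that Speech Service can access the audio file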
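The URL returned by uploadAudioFile embeds the SAS token, and anyone who holds that URL can read the blob until the token expires. If the URL is written to logs, it may be worth redacting the query string first. The helper below is a minimal sketch of that idea; the class name SasUrls is hypothetical.

```java
// Hypothetical helper for logging blob URLs without leaking the SAS token.
final class SasUrls {

    // Replaces everything after the first '?' with a placeholder;
    // URLs without a query string are returned unchanged.
    static String redact(String sasUrl) {
        int queryStart = sasUrl.indexOf('?');
        if (queryStart < 0) {
            return sasUrl;
        }
        return sasUrl.substring(0, queryStart) + "?<redacted>";
    }
}
```

For example, a log statement could print SasUrls.redact(audioFileUrl) while the unredacted URL is still passed to the transcription request.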

3. Submitting the Transcription Job

With the audio file URL in hand, we can submit the transcription job:

public String submitTranscriptionJob(String audioFileUrl) {
    try {
        String endpoint = String.format(
            "https://%s.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions",
            REGION
        );
        
        // Build the request body
        JSONObject requestBody = new JSONObject()
            .put("contentUrls", new JSONArray().put(audioFileUrl))
            .put("locale", "zh-CN")
            .put("displayName", "Batch transcription")
            .put("properties", new JSONObject()
                .put("wordLevelTimestampsEnabled", true)
                .put("punctuationMode", "DictatedAndAutomatic")
                .put("profanityFilterMode", "Masked")
            );
        
        // Send the HTTP request
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(endpoint))
            .header("Content-Type", "application/json")
            .header("Ocp-Apim-Subscription-Key", SUBSCRIPTION_KEY)
            .POST(HttpRequest.BodyPublishers.ofString(requestBody.toString()))
            .build();
        
        HttpResponse<String> response = httpClient.send(
            request,
            HttpResponse.BodyHandlers.ofString()
        );
        
        if (response.statusCode() != 201) {
            throw new RuntimeException("Failed to submit transcription job: " + response.body());
        }
        
        JSONObject responseJson = new JSONObject(response.body());
        return responseJson.getString("self");
    } catch (Exception e) {
        throw new RuntimeException("Failed to submit transcription job: " + e.getMessage(), e);
    }
}

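curl

A similar request can be sent with curl. Note that this sample targets the v3.2 endpoint and the en-us locale, whereas the Java code above uses v3.0 and zh-CN: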

curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "displayName": "My Transcription",
  "description": "Speech Studio Batch speech to text",
  "locale": "en-us",
  "contentUrls": [
    "https://crbn.us/hello.wav",
    "https://crbn.us/whatstheweatherlike.wav"
  ],
  "model": {
    "self": "https://yourserviceregion.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/92237890-4ac5-49c4-9181-0105bd9bc92d"
  },
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": true,
    "diarizationEnabled": false,
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked"
  },
  "customProperties": {}
}' "https://yourserviceregion.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions"

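The important configuration parameters here are:

- locale: the language of the audio
- wordLevelTimestampsEnabled: enables word-level timestamps
- punctuationMode: how punctuation is handled
- profanityFilterMode: how profanity is handled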
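Passing these options as raw strings makes typos easy to miss until the service rejects the request. One way to guard against that is to model them as enums, as in the sketch below. The TranscriptionOptions class is hypothetical, and the enum constants reflect the values I understand the v3.x REST API to accept; they should be checked against the current API reference before use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the "properties" object of a transcription request.
final class TranscriptionOptions {

    // Assumed to mirror the values accepted by the v3.x REST API.
    enum PunctuationMode { None, Dictated, Automatic, DictatedAndAutomatic }

    enum ProfanityFilterMode { None, Removed, Tags, Masked }

    // Returns the properties as an insertion-ordered map keyed by the REST field names.
    static Map<String, Object> properties(boolean wordLevelTimestamps,
                                          PunctuationMode punctuation,
                                          ProfanityFilterMode profanity) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("wordLevelTimestampsEnabled", wordLevelTimestamps);
        props.put("punctuationMode", punctuation.name());
        props.put("profanityFilterMode", profanity.name());
        return props;
    }
}
```

The resulting map can be wrapped with org.json's JSONObject(Map) constructor and placed under the "properties" key of the request body.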

4. Retrieving the Transcription Result

The last step is to poll the job until it finishes and then download the result:

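public String getTranscriptionResult(String transcriptionUrl) {
    try {
        while (true) {
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(transcriptionUrl))
                .header("Ocp-Apim-Subscription-Key", SUBSCRIPTION_KEY)
                .GET()
                .build();
            
            HttpResponse<String> response = httpClient.send(
                request,
                HttpResponse.BodyHandlers.ofString()
            );
            JSONObject status = new JSONObject(response.body());
            
            String currentStatus = status.getString("status");
            if ("Failed".equals(currentStatus)) {
                throw new RuntimeException("Transcription job failed");
            } else if ("Succeeded".equals(currentStatus)) {
                // The job only links to its file list, so fetch that list first
                String filesUrl = status.getJSONObject("links").getString("files");
                HttpRequest filesRequest = HttpRequest.newBuilder()
                    .uri(URI.create(filesUrl))
                    .header("Ocp-Apim-Subscription-Key", SUBSCRIPTION_KEY)
                    .GET()
                    .build();
                JSONArray files = new JSONObject(
                    httpClient.send(filesRequest, HttpResponse.BodyHandlers.ofString()).body()
                ).getJSONArray("values");
                
                // Pick the transcript itself rather than the job report entry
                String resultUrl = null;
                for (int i = 0; i < files.length(); i++) {
                    JSONObject file = files.getJSONObject(i);
                    if ("Transcription".equals(file.getString("kind"))) {
                        resultUrl = file.getJSONObject("links").getString("contentUrl");
                        break;
                    }
                }
                if (resultUrl == null) {
                    throw new RuntimeException("No transcription file found in the job output");
                }
                
                // contentUrl is a pre-signed link, so no subscription key is sent
                HttpRequest resultRequest = HttpRequest.newBuilder()
                    .uri(URI.create(resultUrl))
                    .GET()
                    .build();
                
                HttpResponse<String> resultResponse = httpClient.send(
                    resultRequest,
                    HttpResponse.BodyHandlers.ofString()
                );
                
                return resultResponse.body();
            }
            
            // Check the status every 10 seconds
            Thread.sleep(10000);
        }
    } catch (Exception e) {
        throw new RuntimeException("Failed to get transcription result: " + e.getMessage(), e);
    }
}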

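This method:

- Checks the job status at a fixed interval
- Looks up the URL of the result file once the job has succeeded
- Downloads and returns the transcription result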
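The polling loop above runs until the job either succeeds or fails, so a job that never reaches a terminal state would block the caller indefinitely. A bounded variant is sketched below under stated assumptions: the status supplier is a hypothetical stand-in for "GET the transcription URL and read its status field", and the StatusPoller class name is illustrative.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper: polls a status source until it is terminal or a deadline passes.
final class StatusPoller {

    // Returns "Succeeded" or "Failed"; throws if neither is seen before the timeout.
    static String awaitTerminal(Supplier<String> statusSupplier,
                                Duration interval,
                                Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            String status = statusSupplier.get();
            if ("Succeeded".equals(status) || "Failed".equals(status)) {
                return status;
            }
            // Give up if the next sleep would run past the deadline
            if (System.nanoTime() + interval.toNanos() > deadline) {
                throw new IllegalStateException("Timed out; last status: " + status);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while polling", e);
            }
        }
    }
}
```

With this shape, getTranscriptionResult could delegate its waiting to awaitTerminal and only fetch the file list once "Succeeded" is returned. The timeout should comfortably exceed the expected processing time for the audio being transcribed.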

Usage Example

Here is a complete usage example:

public static void main(String[] args) {
    AzureSpeechBatchTranscription transcription = new AzureSpeechBatchTranscription();
    
    try {
        // 1. Upload the audio file
        String audioFileUrl = transcription.uploadAudioFile(
            "path/to/your/audio.wav",
            "audio.wav"
        );
        System.out.println("Audio file uploaded: " + audioFileUrl);
        
        // 2. Submit the transcription job
        String transcriptionUrl = transcription.submitTranscriptionJob(audioFileUrl);
        System.out.println("Transcription job submitted: " + transcriptionUrl);
        
        // 3. Retrieve the transcription result
        String result = transcription.getTranscriptionResult(transcriptionUrl);
        System.out.println("Transcription result: " + result);
        
    } catch (Exception e) {
        e.printStackTrace();
    }
}