Build Your Own Voice Assistant with 80 Lines of JavaScript

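In this tutorial we build a virtual assistant (like Siri or Google Assistant) in the browser with 80 lines of JavaScript. You can try the application here; it listens for the user's voice commands and replies with a synthesized voice.

What you need:

Google Chrome (version 25 or later)
A text editor

Because the Web Speech API is still experimental, the application only runs in supported browsers: Chrome (version 25 or later) and Edge (version 79 or later).

Which components do we need to build?

To build this web application we need to implement four components:

A simple user interface that displays what the user said and the assistant's reply.
Speech to text.
Processing the text and performing an action.
Text to speech.

User interface

The first step is to create a simple user interface: a button that triggers the assistant, a div that displays the user's commands and the assistant's responses, and a p element that displays processing information.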

const startBtn = document.createElement("button");
startBtn.innerHTML = "Start listening";
const result = document.createElement("div");
const processing = document.createElement("p");
document.write("<body><h1>My Siri</h1><p>Give it a try with 'hello', 'how are you', 'what's your name', 'what time is it', 'stop', ... </p></body>");
document.body.append(startBtn);
document.body.append(result);
document.body.append(processing);


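Speech to text

We need a component that captures the voice command and converts it to text for further processing. In this tutorial we use SpeechRecognition from the Web Speech API. Because this API is only available in supported browsers, we display a warning message and hide the Start button from users on unsupported browsers.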

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
if (typeof SpeechRecognition === "undefined") {
  // Unsupported browser: hide the button and show a warning instead.
  startBtn.remove();
  result.innerHTML = "<b>Browser does not support Speech API. Please download latest chrome.</b>";
}


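We need to create an instance of SpeechRecognition, which exposes a set of properties for customizing speech recognition. In this application we set continuous and interimResults to true so that the transcript is displayed in real time.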

const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;
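
The settings above can also be grouped into one helper, together with a handler for recognition errors. The following is only a sketch: configureRecognition and describeRecognitionError are hypothetical names, the "en-US" default and the message texts are assumptions, and only a few of the error codes defined for the error event are mapped.

```javascript
// Hypothetical helper: applies recognition settings to any object that
// exposes the SpeechRecognition properties.
function configureRecognition(recognition, options = {}) {
  const settings = {
    lang: "en-US",        // assumed default language
    continuous: true,     // keep listening after each phrase
    interimResults: true, // deliver partial transcripts while the user speaks
    maxAlternatives: 1,   // only the top hypothesis is used in this tutorial
    ...options,           // caller-supplied values override the defaults
  };
  Object.assign(recognition, settings);
  return recognition;
}

// Hypothetical helper: turns the code carried by an "error" event into a message.
const recognitionErrorMessages = new Map([
  ["no-speech", "No speech was detected."],
  ["audio-capture", "No microphone was found."],
  ["not-allowed", "Microphone permission was denied."],
  ["network", "The recognition service could not be reached."],
]);

function describeRecognitionError(code) {
  return recognitionErrorMessages.get(code) ?? `Speech recognition error: ${code}`;
}

// Browser-only usage; skipped where the API is missing.
if (typeof window !== "undefined") {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (Recognition) {
    const configured = configureRecognition(new Recognition(), { lang: "en-US" });
    configured.onerror = (event) => console.warn(describeRecognitionError(event.error));
  }
}
```

Passing the options as an object keeps the defaults in one place, so switching the language only requires a different lang value.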


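We add a handler for the onresult event from the speech API. In this handler we display the user's voice command as text and call the function process to perform an action. The process function is implemented in the next step.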

// Placeholder; the real implementation comes in the next step.
function process(speech_text) {
  return "....";
}

recognition.onresult = event => {
  // The newest result is the last entry in the result list.
  const last = event.results.length - 1;
  const res = event.results[last];
  const text = res[0].transcript;
  if (res.isFinal) {
    processing.innerHTML = "processing ....";
    const response = process(text);
    const p = document.createElement("p");
    p.innerHTML = `You said: ${text} <br>Siri said: ${response}`;
    processing.innerHTML = "";
    result.appendChild(p);
    // add text to speech later
  } else {
    processing.innerHTML = `listening: ${text}`;
  }
};
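
Because the handler writes the transcript into the page through innerHTML, it can help to separate reading the result from rendering it. The sketch below assumes two hypothetical helpers, latestTranscript and escapeHtml, that are not part of the original code. The first reads the newest result from an event shaped like the one onresult receives, and the second escapes HTML special characters before the text is interpolated into markup.

```javascript
// Hypothetical helper: reads the newest result from a recognition event.
function latestTranscript(event) {
  const res = event.results[event.results.length - 1];
  return { text: res[0].transcript.trim(), isFinal: res.isFinal };
}

// Hypothetical helper: escapes characters that have a special meaning in HTML.
// The ampersand is replaced first so later replacements are not escaped twice.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

Inside the handler, the template string could then use escapeHtml(text) in place of text. Assigning to textContent is another option when no markup is needed.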


We also need to link the button in the user interface to the recognition object so that it starts and stops speech recognition.

let listening = false;
// Declared with const so the function is not created as an implicit global.
const toggleBtn = () => {
  if (listening) {
    recognition.stop();
    startBtn.textContent = "Start listening";
  } else {
    recognition.start();
    startBtn.textContent = "Stop listening";
  }
  listening = !listening;
};
startBtn.addEventListener("click", toggleBtn);
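
One limitation of tracking the state in a local flag is that recognition can also end on its own, for example after an error, which would leave the button label out of step. A possible variant, sketched here under the assumption of a hypothetical createToggle factory, derives the state from the onstart and onend handlers of the recognition object instead.

```javascript
// Hypothetical factory: wires start/stop toggling to a recognition-like object
// and reports every state change through the onChange callback.
function createToggle(recognition, onChange) {
  let listening = false;
  recognition.onstart = () => {
    listening = true;
    onChange(listening);
  };
  recognition.onend = () => {
    // Fires after stop() and also when recognition ends by itself.
    listening = false;
    onChange(listening);
  };
  return () => {
    if (listening) {
      recognition.stop();
    } else {
      recognition.start();
    }
  };
}

// Intended usage in the page (names taken from the tutorial code):
// const toggle = createToggle(recognition, (on) => {
//   startBtn.textContent = on ? "Stop listening" : "Start listening";
// });
// startBtn.addEventListener("click", toggle);
```

This sketch does not guard against a second click arriving before onstart fires; a real page might disable the button in that short window.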


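Processing the text and performing an action

In this step we build simple conversation logic and handle a few basic actions. The assistant can reply to "hello", "what's your name?", and "how are you?", tell the current time, "stop" listening, or open a new tab to search for questions it cannot answer. You can extend this process function further with AI libraries to make the assistant smarter.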

function process(rawText) {
  // remove whitespace and lowercase the text
  let text = rawText.replace(/\s/g, "");
  text = text.toLowerCase();
  let response = null;
  switch (text) {
    case "hello":
      response = "hi, how are you doing?"; break;
    case "what'syourname":
      response = "My name's Siri."; break;
    case "howareyou":
      response = "I'm good."; break;
    case "whattimeisit":
      response = new Date().toLocaleTimeString(); break;
    case "stop":
      response = "Bye!!";
      toggleBtn(); // stop listening
      break;
  }
  if (!response) {
    // Unknown command: search for it in a new tab, with the query URL-encoded.
    const query = encodeURIComponent(rawText.replace("search", "").trim());
    window.open(`https://www.google.com/search?q=${query}`, "_blank");
    return "I found some information for " + rawText;
  }
  return response;
}
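
The switch statement only matches exact strings, so a transcript that arrives with punctuation or capital letters (for example "What's your name?") would fall through to the search branch. One way to make the matching more forgiving is sketched below. The names normalizeCommand, commands, lookupReply, and buildSearchUrl are hypothetical, and the normalization assumes English commands written with ASCII apostrophes.

```javascript
// Hypothetical helper: lowercases the transcript and drops everything except
// ASCII letters, digits, and apostrophes.
function normalizeCommand(rawText) {
  return rawText.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// Hypothetical command table; each handler returns the reply text.
const commands = new Map([
  ["hello", () => "hi, how are you doing?"],
  ["what'syourname", () => "My name's Siri."],
  ["howareyou", () => "I'm good."],
  ["whattimeisit", () => new Date().toLocaleTimeString()],
]);

// Returns the reply for a known command, or null when nothing matches.
function lookupReply(rawText) {
  const handler = commands.get(normalizeCommand(rawText));
  return handler ? handler() : null;
}

// Hypothetical helper: builds the fallback search URL with an encoded query.
function buildSearchUrl(rawText) {
  const query = rawText.replace("search", "").trim();
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}
```

A table of handlers keeps each command on one line, so adding a command means adding one entry rather than another case branch.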


Text to speech

In the final step we use the speechSynthesis controller of the Web Speech API to give our assistant a voice. The API is simple and straightforward.

speechSynthesis.speak(new SpeechSynthesisUtterance(response));
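
The one-line call above uses the default voice settings. If the reply should use a specific language or speaking rate, the utterance can be configured before it is spoken. The sketch below assumes a hypothetical speak helper; the "en-US" default is an assumption, and the synth and Utterance parameters exist only so the helper can be exercised outside a browser.

```javascript
// Hypothetical helper: speaks `text` with optional voice settings.
// `synth` and `Utterance` default to the browser globals.
function speak(
  text,
  options = {},
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  if (!synth || !Utterance) return null; // speech synthesis is unavailable
  const utterance = new Utterance(text);
  utterance.lang = options.lang ?? "en-US"; // assumed default language
  utterance.rate = options.rate ?? 1;       // 1 is the normal speaking rate
  utterance.pitch = options.pitch ?? 1;     // 1 is the normal pitch
  synth.cancel();                           // drop replies that are still queued
  synth.speak(utterance);
  return utterance;
}
```

In the result handler, speak(response) would then take the place of the direct speechSynthesis.speak call. Calling cancel first is a design choice that favors the newest reply over finishing older ones.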


That's it! With only 80 lines of code we have a cool assistant. A demo of the program can be found here.

// UI comp
const startBtn = document.createElement("button");
startBtn.innerHTML = "Start listening";
const result = document.createElement("div");
const processing = document.createElement("p");
document.write("<body><h1>My Siri</h1><p>Give it a try with 'hello', 'how are you', 'what's your name', 'what time is it', 'stop', ... </p></body>");
document.body.append(startBtn);
document.body.append(result);
document.body.append(processing);

// speech to text
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
let toggleBtn = null;
if (typeof SpeechRecognition === "undefined") {
  startBtn.remove();
  result.innerHTML = "<b>Browser does not support Speech API. Please download latest chrome.</b>";
} else {
  const recognition = new SpeechRecognition();
  recognition.continuous = true;
  recognition.interimResults = true;
  recognition.onresult = event => {
    const last = event.results.length - 1;
    const res = event.results[last];
    const text = res[0].transcript;
    if (res.isFinal) {
      processing.innerHTML = "processing ....";
      const response = process(text);
      const p = document.createElement("p");
      p.innerHTML = `You said: ${text} <br>Siri said: ${response}`;
      processing.innerHTML = "";
      result.appendChild(p);
      // text to speech
      speechSynthesis.speak(new SpeechSynthesisUtterance(response));
    } else {
      processing.innerHTML = `listening: ${text}`;
    }
  };
  let listening = false;
  toggleBtn = () => {
    if (listening) {
      recognition.stop();
      startBtn.textContent = "Start listening";
    } else {
      recognition.start();
      startBtn.textContent = "Stop listening";
    }
    listening = !listening;
  };
  startBtn.addEventListener("click", toggleBtn);
}

// processor
function process(rawText) {
  let text = rawText.replace(/\s/g, "");
  text = text.toLowerCase();
  let response = null;
  switch (text) {
    case "hello":
      response = "hi, how are you doing?"; break;
    case "what'syourname":
      response = "My name's Siri."; break;
    case "howareyou":
      response = "I'm good."; break;
    case "whattimeisit":
      response = new Date().toLocaleTimeString(); break;
    case "stop":
      response = "Bye!!";
      toggleBtn();
      break;
  }
  if (!response) {
    // Unknown command: search for it in a new tab, with the query URL-encoded.
    const query = encodeURIComponent(rawText.replace("search", "").trim());
    window.open(`https://www.google.com/search?q=${query}`, "_blank");
    return `I found some information for ${rawText}`;
  }
  return response;
}


About the author:

Tuan Nhu Dinh is a software engineer at Facebook.

Original article:

https://medium.com/swlh/build-your-own-hi-siri-with-80-lines-of-javascript-code-653540c77502
