
Whole Home Voice Control


Problem

Typing is not always the best way to get information (e.g. what’s the weather?) or give commands (e.g. turn the kitchen lights off!) in the application. 

 

Solution

 

The idea is to create a voice assistant that controls devices in the SeaPod and answers questions using the information available in the Ocean Builders applications.

 

The voice assistant should be smart enough to recognize users and follow their commands while respecting each user’s settings and permissions. For example, when a user asks to turn on the shower, the shower is turned on and adjusted automatically to that user’s preferences; if the user doesn’t have permission to use that shower, the command is ignored.
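
The shower example above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `User` and `Device` classes, the `handle_command` function, and the `temperature_c` setting are hypothetical names invented here, not part of any Ocean Builders application.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Names of the devices this user may control (hypothetical model).
    permissions: set = field(default_factory=set)
    # Per-device settings to apply automatically when the device is turned on.
    preferences: dict = field(default_factory=dict)


@dataclass
class Device:
    name: str
    is_on: bool = False
    settings: dict = field(default_factory=dict)


def handle_command(user, device, action):
    """Apply `action` ("on" or "off") to `device` if `user` is permitted.

    Returns True if the command was carried out, False if it was ignored.
    """
    if device.name not in user.permissions:
        return False  # no permission: the command is ignored
    if action == "on":
        device.is_on = True
        # Adjust the device to the user's stored preferences, if any.
        device.settings.update(user.preferences.get(device.name, {}))
    elif action == "off":
        device.is_on = False
    else:
        return False  # unknown action
    return True


alice = User("alice", permissions={"shower"},
             preferences={"shower": {"temperature_c": 38}})
bob = User("bob")  # no permissions at all
shower = Device("shower")

print(handle_command(bob, shower, "on"))    # False: bob may not use the shower
print(handle_command(alice, shower, "on"))  # True
print(shower.settings)                      # {'temperature_c': 38}
```

Keeping the permission check ahead of any state change means a refused command never leaves the device half-configured.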



Prize

    • Turn this into your own entrepreneurial business venture: we will be your first customers and help bring you media attention and more customers

    • Get Entrepreneurial Business Coaching to start this as a business



    And here are some potential benefits:

     

    • Mass exposure with highly visible project

    • Build reputation

    • Recognized as an official collaborator and/or on GitHub

    • Get noticed

    • Product development experience

    • Work on projects you are passionate about

    • Get your project built and working in the real world

    • Participate in interesting work

    • Get grants (possibly by partnering with someone who can help with this, or through exposure to grant writers)

    • Change the world

     

    Industry

    Current technological level

     

    You’ve probably heard of voice assistants like Alexa, Siri, Google Assistant, and Cortana. These voice assistants are essentially built on three components: speech recognition, natural language processing (NLP), and speech synthesis (see picture below).
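
To make the three stages concrete, here is a rough Python sketch of how they chain together. It is not how any of the assistants named above works internally: the command grammar, the function names, and the reply texts are all invented for illustration, and the recognition and synthesis stages are passed in as plain functions so that any engine could be plugged in.

```python
import re

# Hypothetical command grammar, made up for this sketch:
#   "turn [the] <device> on|off"  or  "turn on|off [the] <device>"
PATTERNS = [
    re.compile(r"^turn (?:the )?(?P<device>.+?) (?P<state>on|off)$"),
    re.compile(r"^turn (?P<state>on|off) (?:the )?(?P<device>.+)$"),
]


def parse_intent(text):
    """NLP step: map recognized text to a structured intent."""
    # Lowercase and drop punctuation so "Off!" and "off" look the same.
    cleaned = re.sub(r"[^\w\s]", "", text.lower()).strip()
    for pattern in PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return {"intent": "set_power",
                    "device": match["device"],
                    "state": match["state"]}
    if "weather" in cleaned:
        return {"intent": "get_weather"}
    return {"intent": "unknown"}


def build_reply(intent):
    """Produce the text that the synthesis step would speak."""
    if intent["intent"] == "set_power":
        return f"Turning the {intent['device']} {intent['state']}."
    if intent["intent"] == "get_weather":
        return "Looking up the weather."
    return "Sorry, I did not understand that."


def run_pipeline(audio, recognize, synthesize):
    """Chain the stages: speech recognition -> NLP -> speech synthesis."""
    text = recognize(audio)
    reply = build_reply(parse_intent(text))
    return synthesize(reply)


print(parse_intent("Turn the kitchen lights off!"))
```

A real assistant would replace the regular expressions with a trained intent classifier, but the data flow between the three stages stays the same.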

     

    There are also many open source projects like:

     

    1. Mycroft

    2. OpenAssistant

    3. Jasper

    4. LinTO

    5. Rhasspy

    6. Aimybox

    7. Leon

     

    Many of these open source voice assistants are quite recent and will probably need some time to mature into more sophisticated solutions.

     

    The problem with the majority of these platforms is that they are not local and not private enough.



    Some projects such as Mycroft offer solutions built around Google Home or Alexa. However, certain characteristics of these systems (no data protection and no adaptation to business vocabulary) limit them to a B2C market that is not (yet) concerned with data sensitivity and criticality issues.

     

    There are also platforms, like LinTO, that embrace these challenges from the start in order to serve as the engine behind a professional product.

     

    One of the biggest challenges might be implementing a voice authenticator. Below are some of the more interesting open source projects (several are listed on awesomeopensource.com, a website about open source projects) to check out and see whether any of them could be integrated with our system:

     

    https://awesomeopensource.com/project/pyannote/pyannote-audio

     

    https://codeocean.com/capsule/7271435/tree/v1

     

    https://github.com/mravanelli/pytorch-kaldi

     

    https://awesomeopensource.com/project/google/uis-rnn

     

    https://alize.univ-avignon.fr/

    ALIZE has a Java version as well.
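
Whichever project is chosen, a common way to build a voice authenticator on top of it is to compare a fixed-length "voice embedding" of the incoming speech against embeddings enrolled for each user. The sketch below shows only that comparison step, under loud assumptions: the embeddings are taken as given (a real system would get them from a speaker-embedding model), the 3-dimensional vectors are toy values, and the 0.75 threshold is arbitrary and would need tuning on real recordings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(embedding, enrolled, threshold=0.75):
    """Return the enrolled user whose voice is closest, or None.

    `enrolled` maps user names to reference embeddings. A speaker is only
    accepted if the similarity reaches `threshold` (an arbitrary value here).
    """
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy enrollment data, invented for this example.
enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}

print(identify_speaker([0.9, 0.1, 0.0], enrolled))  # alice
print(identify_speaker([0.5, 0.5, 0.7], enrolled))  # None: no one is close enough
```

Returning None for an unknown voice lets the assistant refuse the command instead of guessing, which matters once permissions depend on who is speaking.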



    Information

     

    Repository

    <text>

     

    License Requirement

    Open Source: Can be used for private or commercial projects

    Software: GNU General Public License (GNU GPL v3)

    Non-Software: Creative Commons (CC BY-SA 4.0)

     

    Project Areas
    • IoT Development (sensors, Arduino and Raspberry Pi)

    • Software Development (Python)?

    • <text>



    Keywords: <text>



    Project requirements

     

    Stages and deadlines

     

    Stage                                | Deadline
    Project Start                        | date
    Team Formed                          | date
    Market Research Summary (Report)     | date
    Project Plan Complete                | date
    Preliminary Product Design Complete  | date
    Prototype Development Complete       | date
    Prototype Evaluation Complete        | date
    Product Presentation                 | date
    Project Completion                   | date

     

    Project plan should cover the following:

     

    • stages / milestones of the project (not all stages are listed in the table above)

    • activities or tasks in each phase

    • task start and end dates

    • interdependencies between tasks

     

    Also:

     

    • skills needed

    • responsibilities of each team member (identify as many as you can).

     

    Product’s general requirements

    https://docs.google.com/spreadsheets/d/1u0Ca9NZvKY6ex5JPtpl8M-HoaM-K8VBF4W4NGoJWlSo/edit?usp=sharing

    (Will remove URL before publishing)



    Basic | Advanced | Function
          |          | Part I
          |          | Can it identify people via voice ID?
          |          | Can it easily understand people’s accents?
          |          | Does it know users’ preferences for using home appliances?
          |          | Can it adjust device settings based on the users’ preferences?
          |          | Can it take commands only from people who have permission?
          |          | Can you set permissions for commands and information (e.g. select who can open doors)?
          |          | Is the data sandboxed so that personal data is not sent to a public cloud for AI/ML?
          |          | Can you switch between online and offline queries?
          |          | Part II
          |          | Can I ask about all the information that is available in the Ocean Builders user app?
          |          | Can I ask about all the information that is available in the Ocean Builders admin panel app?
          |          | Can I give all the commands that are available in the Ocean Builders user app?
          |          | Can I give all the commands that are available in the Ocean Builders admin panel app?
          |          | Does it support <text> language?



    Tips

     

    Below you can find some examples of tools you can use to build a voice assistant:

     

    gTTS (Google Text-to-Speech) is a speech synthesis library that converts text to speech.

     

    SpeechRecognition is a library for performing speech recognition, with support for several engines and APIs, both online and offline.

     

    CMU Sphinx (PocketSphinx) is an offline recognition engine that the SpeechRecognition library can call.

     

    Packt: a resource on speaker recognition, i.e. identifying the person who is speaking (Packt is a publisher rather than a library).
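
As a starting point, the SpeechRecognition and gTTS libraries mentioned above could be wired together roughly as follows. This is a sketch, not a tested integration: `transcribe` and `speak_to_file` are hypothetical helper names, the libraries are imported inside the functions so that the file loads even when they are not installed, and the offline path additionally needs the `pocketsphinx` package. Note that `recognize_google` and gTTS both send data to online services, so only the Sphinx path keeps audio on the device.

```python
def transcribe(audio_path, offline=True):
    """Transcribe an audio file (e.g. WAV) with the SpeechRecognition library.

    offline=True  uses CMU Sphinx locally.
    offline=False sends the audio to Google's web speech API.
    Returns the recognized text, or None if the speech was unintelligible.
    """
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        if offline:
            return recognizer.recognize_sphinx(audio)
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None


def speak_to_file(text, out_path="reply.mp3", lang="en"):
    """Synthesize `text` to an MP3 file with gTTS (needs internet access)."""
    from gtts import gTTS  # third-party: pip install gTTS

    gTTS(text=text, lang=lang).save(out_path)
    return out_path
```

A call such as `transcribe("command.wav", offline=True)` followed by `speak_to_file("Turning the kitchen lights off.")` would cover the listening and answering ends of the assistant; `command.wav` is a placeholder file name. The `offline` flag maps directly onto the requirement to switch between online and offline queries.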

     

    Voice Authentication 

    https://courses.csail.mit.edu/6.857/2016/files/31.pdf

     

    Not open source:

     

    https://docs.google.com/document/d/1V-cyxivxKFXwYVUO21oleuXcAoXzR8QTKkpI5ug6E0A/edit

     

    Project video link:

    https://www.dropbox.com/s/j44y0z574mt1ohh/VoiceControl.mp4?dl=0