How to Build Your Personal AI Assistant Using Python

Part I: A guide on setting up our J.A.R.V.I.S using Python.

Ashutosh Krishna

Published in Python in Plain English · 7 min read · Nov 18, 2021

Do you remember J.A.R.V.I.S., Tony Stark’s virtual personal assistant? I’m sure you do!

Have you ever wondered about creating your own assistant? Yes? Tony Stark could have helped us with that! Oops, did you forget he is no more? Sadly, he cannot save us anymore.

But hey, your favorite language Python can help you with that. Yes, you heard it right. We can create our own J.A.R.V.I.S. using Python. Let’s roll it!

Create Virtual Environment

During the development of the project, we’ll come across various modules and external libraries. Let’s learn and install them. But before we install them, let’s create a virtual environment and activate it.

We are going to create a virtual environment using venv. Python 3 ships with the venv module in its standard library, so there is nothing extra to install. To create a virtual environment, use the command below:

$ python -m venv env

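The above command will create a virtual environment named env. Now, we need to activate the environment. On Windows with a Bash-style shell such as Git Bash, use the command below; on macOS or Linux, the script lives at env/bin/activate instead: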

$ . env/Scripts/activate

To verify that the environment is active, look for the (env) prefix in your terminal prompt. Now, we can install the libraries.
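
Besides looking at the prompt, we can also ask Python itself. Here is a minimal sketch, assuming the environment was created with venv as above; the helper name in_virtualenv is made up for this illustration:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```

If this prints False, the activate script was probably not sourced in the current shell.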

pyttsx3: A cross-platform text-to-speech library. Its major advantage is that it works offline. To install it, run the command below in the terminal:
$ pip install pyttsx3
SpeechRecognition: It allows us to convert audio into text for further processing. To install it, run the command below in the terminal:
$ pip install SpeechRecognition
pywhatkit: An easy-to-use library that helps us interact with the browser. To install it, run the command below in the terminal:
$ pip install pywhatkit
wikipedia: It is used to fetch a variety of information from the Wikipedia website. To install it, run the command below in the terminal:
$ pip install wikipedia
requests: An elegant and simple HTTP library for Python that makes sending HTTP/1.1 requests easy. To install it, run the command below in the terminal:
$ pip install requests
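
Once the installs finish, a quick sanity check can confirm that every library is importable from the active environment. The following is only a sketch: missing_modules and REQUIRED_MODULES are names invented for this example, and the list simply mirrors the libraries installed above.

```python
from importlib.util import find_spec

# Import names can differ from pip package names:
# SpeechRecognition is imported as speech_recognition.
REQUIRED_MODULES = ["pyttsx3", "speech_recognition", "pywhatkit", "wikipedia", "requests"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", missing if missing else "none")
```

The check uses find_spec, which only looks a top-level module up and does not import it, so it will not trigger side effects such as opening a browser or an audio device.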

.env File

We need this file to store private data related to the project, such as API keys and passwords. For now, let’s store the name of the user and the bot.

Create a file named .env and add the following content there:

USER=Ashutosh
BOTNAME=JARVIS

To read the contents of the .env file, we'll install another module called python-decouple:

$ pip install python-decouple

Learn more about Environment Variables in Python here.
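
To make it clearer what a .env file amounts to, here is a rough, standard-library-only sketch of turning KEY=VALUE lines into a dictionary. This is not how python-decouple is implemented, and parse_env_text is a hypothetical helper; in the project itself we will simply call config from decouple.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        # Strip stray whitespace so "USER=Ashutosh " still yields "Ashutosh".
        values[key.strip()] = value.strip()
    return values


sample = "USER=Ashutosh \nBOTNAME=JARVIS\n"
settings = parse_env_text(sample)
print(settings["USER"], settings["BOTNAME"])
```

The point of the sketch is only that the file is plain text with one setting per line, which is why it is easy to keep out of version control.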

Before we start defining a few important functions, let’s create a speech engine first.

import pyttsx3
from decouple import config

USERNAME = config('USER')
BOTNAME = config('BOTNAME')

engine = pyttsx3.init('sapi5')
# Set Rate
engine.setProperty('rate', 190)
# Set Volume
engine.setProperty('volume', 1.0)
# Set Voice (Female)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)

Let’s analyze the above script. First of all, we initialize an engine using the pyttsx3 module. sapi5 is the Microsoft Speech API, which gives us access to the voices installed on Windows. Learn more about it here. Next, we set the rate and volume properties of the speech engine using the setProperty method. Then we get the voices from the engine using the getProperty method. voices is a list of the voices available on our system. If we print it, we see something like this:

[<pyttsx3.voice.Voice object at 0x000001AB9FB834F0>, <pyttsx3.voice.Voice object at 0x000001AB9FB83490>]

On my system, the first one is a male voice and the other one is a female voice. JARVIS was a male assistant in the movies, but I’ve chosen to set the voice property to the female voice for this tutorial using the setProperty method.

Note: If you get an error related to PyAudio, download the PyAudio wheel from here and install it within the virtual environment.

Also, using the config function from decouple, we read the values of USER and BOTNAME from the .env file.
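
One thing to watch out for: voices[1] assumes that at least two voices are installed, and it raises an IndexError otherwise. Below is a small sketch of a safer selection. pick_voice_id is a hypothetical helper, and the SimpleNamespace objects are only stand-ins for the voice objects that pyttsx3 returns, so the example runs without a speech driver.

```python
from types import SimpleNamespace


def pick_voice_id(voices, preferred_index=1):
    """Return the id of the preferred voice, or of the first voice as a fallback."""
    if not voices:
        raise ValueError("No text-to-speech voices are installed")
    if 0 <= preferred_index < len(voices):
        return voices[preferred_index].id
    # Fewer voices than expected: fall back to the first one.
    return voices[0].id


# Stand-in objects that only carry an `id` attribute, like the real voices do.
two_voices = [SimpleNamespace(id="voice-0"), SimpleNamespace(id="voice-1")]
one_voice = two_voices[:1]

print(pick_voice_id(two_voices))  # preferred index 1 exists
print(pick_voice_id(one_voice))   # falls back to index 0
```

With the real engine, the call would look like engine.setProperty('voice', pick_voice_id(engine.getProperty('voices'))).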

1. Speak Function

The speak function is responsible for speaking whatever text is passed to it. Let’s see the code:

# Text to Speech Conversion
def speak(text):
    """Used to speak whatever text is passed to it"""
    
    engine.say(text)
    engine.runAndWait()

In the speak() function, the engine queues whatever text is passed to it using the say() method. The runAndWait() method then blocks while the engine processes all queued commands and returns once the queue is empty.

2. Greet Function

This function greets the user whenever the program is run. Depending on the current time, it wishes the user Good Morning, Good Afternoon, or Good Evening.

from datetime import datetime

# Greet the user
def greet_user():
    """Greets the user according to the time"""
    
    hour = datetime.now().hour
    if (hour >= 6) and (hour < 12):
        speak(f"Good Morning {USERNAME}")
    elif (hour >= 12) and (hour < 16):
        speak(f"Good afternoon {USERNAME}")
    elif (hour >= 16) and (hour < 19):
        speak(f"Good Evening {USERNAME}")
    speak(f"I am {BOTNAME}. How may I assist you?")

First, we get the current hour; for example, if the current time is 11:15 AM, the hour will be 11. If the hour is at least 6 and less than 12, we wish the user “Good Morning”. If it is at least 12 and less than 16, we wish “Good Afternoon”, and if it is at least 16 and less than 19, we wish “Good Evening”. Outside these ranges, no time-based greeting is spoken. In every case, the assistant then introduces itself. We use the speak function for all of this.
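
The hour-to-greeting mapping can also be pulled out into a pure function, which makes it easy to check without a speaker attached. This is just a sketch: greeting_for_hour is a hypothetical name, and returning None at night mirrors the fact that greet_user() speaks no time-based greeting then.

```python
def greeting_for_hour(hour):
    """Map an hour (0-23) to the greeting used by greet_user(), or None."""
    if 6 <= hour < 12:
        return "Good Morning"
    if 12 <= hour < 16:
        return "Good afternoon"
    if 16 <= hour < 19:
        return "Good Evening"
    # From 19:00 to 05:59 the function speaks no time-based greeting.
    return None


for hour in (11, 15, 18, 23):
    print(hour, greeting_for_hour(hour))
```

Keeping the decision separate from the speech call means the boundaries (6, 12, 16, 19) can be tested by passing plain integers.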

3. Take User Input

This function is for taking the commands from the user and recognizing the command using the speech_recognition module.

import speech_recognition as sr
from random import choice
from utils import opening_text

# Takes Input from User
def take_user_input():
    """Takes user input, recognizes it using Speech Recognition module and converts it into text"""

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Listening....')
        r.pause_threshold = 1
        audio = r.listen(source)

    try:
        print('Recognizing...')
        query = r.recognize_google(audio, language='en-in')
        # Acknowledge the command unless the user asked to exit or stop
        if 'exit' not in query and 'stop' not in query:
            speak(choice(opening_text))
        else:
            hour = datetime.now().hour
            # The night range wraps around midnight, so the tests are joined with `or`
            if hour >= 21 or hour < 6:
                speak("Good night sir, take care!")
            else:
                speak('Have a good day sir!')
            exit()
    except Exception:
        speak('Sorry, I could not understand. Could you please say that again?')
        query = 'None'
    return query

We have imported the speech_recognition module as sr. The Recognizer class within the speech_recognition module helps us recognize the audio. The same module has a Microphone class that gives us access to the microphone of the device. So with the microphone as the source, we listen to the audio using the listen() method of the Recognizer class. We have also set the pause_threshold to 1, i.e., the recognizer waits for one second of silence before it treats a phrase as complete, so a short pause while we speak will not cut us off.

Next, using the recognize_google() method of the Recognizer class, we try to recognize the audio. The recognize_google() method performs speech recognition on the audio passed to it, using the Google Speech Recognition API. We have set the language to en-in, i.e., English (India). It returns the transcript of the audio as a string, which we store in a variable called query.

If the query has “exit” or “stop” in it, it means we’re asking the assistant to stop immediately. So, before stopping, we say goodbye to the user according to the current hour: if the hour is 21 or later, or earlier than 6, we wish the user Good Night; otherwise, we speak a different message. After speaking, we exit from the program. For all other queries, we create a utils.py file that has just one list containing a few statements:

opening_text = [
    "Cool, I'm on it sir.",
    "Okay sir, I'm working on it.",
    "Just a second sir.",
]

If the query doesn’t have those two words (exit or stop), we speak something to tell the user that we have heard them. For that, we use the choice function from the random module to randomly select a statement from the opening_text list.

During this entire process, if we encounter an exception, we apologize to the user and set the query to the string 'None'. In the end, we return the query.
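
Two details of this logic are easy to get wrong. A condition written as not 'exit' in query or 'stop' in query is parsed by Python as (not 'exit' in query) or ('stop' in query), so a query containing "stop" would be acknowledged instead of ending the program. Likewise, no hour is both 21 or later and earlier than 6, so the night check needs or rather than and. The sketch below isolates both decisions; wants_to_exit and farewell_for_hour are hypothetical helper names.

```python
def wants_to_exit(query):
    """Return True if the query contains 'exit' or 'stop' as a substring."""
    text = query.lower()
    return "exit" in text or "stop" in text


def farewell_for_hour(hour):
    """Pick the farewell; the night range 21:00-05:59 wraps around midnight."""
    # An hour cannot be both >= 21 and < 6, so the two tests are joined with `or`.
    if hour >= 21 or hour < 6:
        return "Good night sir, take care!"
    return "Have a good day sir!"


print(wants_to_exit("please stop now"))
print(wants_to_exit("open youtube"))
print(farewell_for_hour(23))
print(farewell_for_hour(14))
```

Note that this is a plain substring match, so a word such as "stopwatch" would also count as an exit request; that limitation is shared with the check in take_user_input().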

To run the project, we use the standard if __name__ == '__main__' guard as the entry point.

if __name__ == '__main__':
    greet_user()
    while True:
        query = take_user_input().lower()
        print(query)

As we know, the first thing we need to do is to greet the user using the greet_user() function. Next, we run a while loop to continuously take input from the user using the take_user_input() function. For now, we're just printing the query.
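
Because take_user_input() returns the string 'None' when recognition fails, the loop above will see 'none' after lowercasing. Once real commands are added, we will probably want to skip those entries. Here is a sketch of that filtering, using a plain list in place of the microphone; recognized_queries and FAILED are names made up for this example.

```python
FAILED = "none"  # take_user_input() returns 'None', which the loop lowercases


def recognized_queries(queries):
    """Yield lowercased queries, skipping the failed-recognition marker."""
    for query in queries:
        query = query.lower()
        if query == FAILED:
            continue
        yield query


heard = ["What time is it", "None", "Open YouTube"]
print(list(recognized_queries(heard)))
```

One trade-off of using a string as the marker is that a user who actually says "none" is skipped as well; returning the None object from take_user_input() would avoid that, at the cost of changing the .lower() call in the loop.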

For now, the complete code in main.py looks like this:

import pyttsx3
import speech_recognition as sr
from decouple import config
from datetime import datetime
from random import choice
from utils import opening_text

USERNAME = config('USER')
BOTNAME = config('BOTNAME')

engine = pyttsx3.init('sapi5')

# Set Rate
engine.setProperty('rate', 190)

# Set Volume
engine.setProperty('volume', 1.0)

# Set Voice (Female)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)


# Text to Speech Conversion
def speak(text):
    """Used to speak whatever text is passed to it"""

    engine.say(text)
    engine.runAndWait()


# Greet the user
def greet_user():
    """Greets the user according to the time"""

    hour = datetime.now().hour
    if (hour >= 6) and (hour < 12):
        speak(f"Good Morning {USERNAME}")
    elif (hour >= 12) and (hour < 16):
        speak(f"Good afternoon {USERNAME}")
    elif (hour >= 16) and (hour < 19):
        speak(f"Good Evening {USERNAME}")
    speak(f"I am {BOTNAME}. How may I assist you?")


# Takes Input from User
def take_user_input():
    """Takes user input, recognizes it using Speech Recognition module and converts it into text"""

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Listening....')
        r.pause_threshold = 1
        audio = r.listen(source)

    try:
        print('Recognizing...')
        query = r.recognize_google(audio, language='en-in')
        # Acknowledge the command unless the user asked to exit or stop
        if 'exit' not in query and 'stop' not in query:
            speak(choice(opening_text))
        else:
            hour = datetime.now().hour
            # The night range wraps around midnight, so the tests are joined with `or`
            if hour >= 21 or hour < 6:
                speak("Good night sir, take care!")
            else:
                speak('Have a good day sir!')
            exit()
    except Exception:
        speak('Sorry, I could not understand. Could you please say that again?')
        query = 'None'
    return query


if __name__ == '__main__':
    greet_user()
    while True:
        query = take_user_input().lower()
        print(query)

You can run and test the application now.

$ python main.py

In this part, we have completed the setup of our virtual personal assistant. We have not added any functionality to it yet. We’ll work on those functionalities in the next part of the blog. Stay Tuned!

Thank you for reading.

Originally published at https://ireadblog.com
