Train a custom ChatGPT to answer questions on your resume.

Pavlos Bosmalis
Python in Plain English
6 min read · Apr 24, 2023


Due to ChatGPT’s tremendous capabilities, similar customized AI chatbots will grow in popularity at both the personal and the enterprise level.

Introduction

ChatGPT needs no introduction. Interestingly though, and frankly thankfully, OpenAI offers a GPT-3 API (application programming interface) that can be applied to virtually any task involving the understanding or generation of natural language, code, or images. In this piece, we are going to leverage the power of this API to train an AI chatbot on personalized data, i.e., a personal resume. The same approach can also be followed for other applications, such as summarizing a book, a report, or a financial statement.

Prerequisites

Here are a few prerequisites for following this article:

  • You already have Python 3 installed (if not, check out this installation guide).
  • You already have an OpenAI account. If you have ever used ChatGPT, then you already have one. If not, go to platform.openai.com/signup and create a free account.
  • To get the best results, the dataset should be in English. However, according to OpenAI, it will also work with popular international languages like French, Spanish, and German. So go ahead and give it a try in your own language.
  • In this article we are going to feed a rather small PDF document (just 4 pages) to our model. If you want to use a larger dataset, make sure that your computer comes with a sufficiently powerful CPU and GPU.

Create an OpenAI API Key

1. Log in to OpenAI.

2. Go to Personal > View API keys.

3. Select Create new secret key.

4. Copy the API key. Be aware that you will not be able to view or copy the key later on, so make sure to paste it into your notebook or working file right away; otherwise you will have to create a new key.

5. Now, visit the usage page to verify that you have adequate remaining credit. By default, upon creation of a new account, OpenAI offers $18 of free credit to use within a certain period. If your credit has expired or been drained, you can purchase additional credit through the usage page. Alternatively, you can create a new OpenAI account with a different phone number to receive additional free credits. This will help you avoid Error 429 (“You exceeded your current quota, please check your plan and billing details”) while running the code.

6. As the word “secret” suggests, the API key is strictly personal. Do not share it with others or expose it in the browser or other client-side code. To protect the security of your account, OpenAI may automatically rotate any API key that is found to have leaked publicly.
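One common way to keep the key out of your source code entirely is to store it in an environment variable and read it at runtime. A minimal sketch, assuming the pre-1.0 openai Python package and an environment variable named OPENAI_API_KEY:

import os
import openai

# Read the key from an environment variable instead of hardcoding it,
# so the script can be shared without exposing the secret
openai.api_key = os.environ["OPENAI_API_KEY"]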

Install libraries

Let’s install our libraries. Open the Terminal and type the following commands, waiting for each installation to finish before starting the next one.

We are going to use the openai library to interact with the LLM behind our AI chatbot. Later on, we will also import the LangChain framework, which gpt_index uses to wrap the GPT-3 model.

pip install openai

Next, we are installing gpt_index (a package since renamed to LlamaIndex), which will allow us to connect our chatbot to the external data it will be trained on.

pip install gpt_index==0.4.24

In this application we are going to work with a PDF document, so we will install PyPDF2 and PyCryptodome to parse PDF files without errors.

pip install PyPDF2
pip install PyCryptodome

Lastly, we will use gradio to build a simple interactive UI for our AI chatbot.

pip install gradio
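If you prefer, the same five packages can also be installed with a single command:

pip install openai gpt_index==0.4.24 PyPDF2 PyCryptodome gradio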

Set up the data and code

We will use the summary of my LinkedIn profile as training data. It can be downloaded by selecting More > Save to PDF on the main profile page.

The code that is used to train the chatbot is shown below:
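What follows is a minimal sketch of that script, assuming the gpt_index 0.4.x API; the function names create_index and chat, as well as the parameter values, mirror those described in the walkthrough below:

import os
import gradio as gr
from gpt_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain import OpenAI

# Replace the placeholder with your personal OpenAI API key
os.environ["OPENAI_API_KEY"] = 'Your API key'

def create_index(directory_path):
    # Parameters that control chunking and prompt construction
    max_input_size = 4096
    num_outputs = 512
    max_chunk_overlap = 20
    chunk_size_limit = 600

    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap,
                                 chunk_size_limit=chunk_size_limit)
    # "davinci" (text-davinci-003) is the most capable GPT-3 model
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.7,
                                            model_name="text-davinci-003",
                                            max_tokens=num_outputs))

    # Read every document in the directory and build a vector index over them
    documents = SimpleDirectoryReader(directory_path).load_data()
    index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor,
                                 prompt_helper=prompt_helper)
    index.save_to_disk('index.json')
    return index

def chat(input_text):
    # Load the saved index and query it with the user's input
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    response = index.query(input_text, response_mode="tree_summarize")
    return response.response

iface = gr.Interface(fn=chat,
                     inputs=gr.components.Textbox(lines=7, label="Enter your text"),
                     outputs="text",
                     title="Custom-trained AI Chatbot")

index = create_index("docs")
iface.launch(share=True)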

The chatbot is based on the GPT-3 language model and, more specifically, uses “davinci”, the most capable GPT-3 model, which can do any task the other models can do, often with higher quality. The code also uses the gpt_index package for building and managing the index of documents from which responses are generated.

The ‘Your API key’ placeholder should be replaced by the personal API key that was created previously.

The create_index function uses the imported modules to build an index over the documents in a specified directory path, which the chatbot uses to generate responses. The function sets several parameters, such as max_input_size, num_outputs, max_chunk_overlap, and chunk_size_limit, to configure the index. It also uses a PromptHelper to assist in generating prompts and an LLMPredictor that wraps the GPT-3 model used to generate responses.

Once the index is created, it is saved to disk in JSON format using the save_to_disk method. The chat function loads the saved index from disk and queries it with the user's input text. Empirically, setting response_mode="tree_summarize" leads to better summarization results. The function then returns the chatbot's response.
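As a quick sanity check, the saved index can also be queried directly from a Python shell, without the Gradio UI (the question text below is just an example):

from gpt_index import GPTSimpleVectorIndex

# Load the previously built index and ask a test question
index = GPTSimpleVectorIndex.load_from_disk('index.json')
response = index.query("Summarize the candidate's work experience.",
                       response_mode="tree_summarize")
print(response.response)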

Finally, the script creates a Gradio interface for the chatbot, which allows users to enter text and receive responses. The interface launches with the launch method, and the share parameter is set to True to allow others to access the interface.

Now, we will create the following simple structure in any directory of our preference; the easiest option is to save the files on the Desktop. The “docs” folder contains the PDF of the LinkedIn profile summary, and “custom_gpt.py” is our script.
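The layout looks like this (the PDF file name here is just a placeholder):

Desktop/
├── docs/
│   └── linkedin_profile_summary.pdf
└── custom_gpt.py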

Train the AI chatbot with resume data

We will execute the script from the command line.

1. Open the Terminal.

2. Navigate to the directory where the “docs” folder and the source code are saved, using the cd command.

3. Run the following command:

python custom_gpt.py

4. Copy and paste the local URL shown in the Terminal into your web browser (a sample of the output appears after this list). It will load the interface of our chatbot.

5. The interface loads and is ready to answer any questions relevant to the training data.
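When the script starts, Gradio prints output along these lines in the Terminal (the exact URLs will differ):

Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://xxxxxxxxxxxxxxxx.gradio.live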

Let’s try it out!

Pretty good job for such a simple, yet rather sophisticated chatbot. Considering the low volume of data on which it was trained, it can adequately summarize some key areas of the resume. If the resumes of several people were added to the “docs” folder, the chatbot would be able to differentiate between them and generate the appropriate response. The current implementation works with both PDF and text files: simply add them to the “docs” folder and rerun the script from the Terminal.

To stop the custom-trained AI chatbot, press “Ctrl + C” in the Terminal window. If that does not work, press “Ctrl + C” again. Also, visit the usage page again to review the usage you have incurred and to keep track of your tokens.

Conclusion

In this piece we explored how a customized AI chatbot can be trained with one’s own data and then answer relevant questions. The same approach can be used to summarize books, articles or anything else in a PDF or text format. The possibilities are endless, though limitations still exist.

Pavlos

Data Scientist | Artificial Intelligence Consultant | Electrical & Computer Engineer