GPT-2 chatbot with Raspberry Pi, Flask and Gunicorn

Chatbots have evolved substantially with the advent of sophisticated deep learning models. In particular, transformers are currently revolutionising natural language processing (NLP). If you would like to better understand language models and NLP in general, I can recommend courses from CS50AI, Hugging Face and Coursera as a starting point. Be aware that some prior intermediate programming and machine learning knowledge is helpful before you embark on these courses.

For today’s project, though, it won’t be necessary to understand how a chatbot works, as you will simply use an existing model: DialoGPT, a pretrained dialogue response generation model based on the huggingface pytorch-transformer and OpenAI GPT-2. The main task will be to get it running on your Raspberry Pi Server with the help of Flask, Gunicorn and Apache server and make it accessible online via your domain.


What you will need

  • A Raspberry Pi 4 with a self-hosted Apache server (follow this tutorial to set it up)
  • [optional] A registered domain name. A step-by-step guide on how to get your domain and install SSL certificates can be found here.

Setting up the system and virtual environment

This tutorial was heavily inspired by a fantastic DeepLearning.AI FourthBrain workshop on how to build and deploy a chatbot in no time. I encourage you to watch the original videos for more information.

First you will have to ensure that all required packages are installed on your Raspberry Pi.

sudo apt-get update
sudo apt-get install python3-pip python3-dev build-essential libssl-dev libffi-dev python3-setuptools
sudo -H pip3 install --upgrade pip
sudo apt-get install python3-venv

We will deploy the chatbot in a virtual environment to ensure it’s appropriately separated from system processes.

As we are going to deploy the bot on our Apache server later, I’m already creating my folder in the /var/www/html directory (more information here). If you plan to use your bot locally on your machine, you can keep it in your home directory. Next we will initiate the virtual environment so that we can load all further libraries into it.

mkdir /var/www/html/chatbot
cd /var/www/html/chatbot
python3 -m venv chatbotenv
source chatbotenv/bin/activate

Make sure that you are in the virtual environment: a (chatbotenv) tag should be visible in your command line.
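If you want a second check beyond the prompt tag, the Python interpreter itself can tell you whether it is running inside a virtual environment (a small sketch; `in_virtualenv` is a hypothetical helper, based on the fact that `sys.prefix` differs from `sys.base_prefix` inside a venv):

```python
import sys

def in_virtualenv():
    # Inside a venv, sys.prefix points into the environment directory,
    # while sys.base_prefix still points at the system Python install.
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
```

Run this with the venv's `python3`; it should print `True` once the environment is active.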

Now you can install the remaining libraries you will require for the chatbot to work. Regardless of which version of Python you are using, once the virtual environment is active, always use the pip command (not pip3).

pip install wheel
pip install gunicorn flask

Installing PyTorch is a bit more difficult, as the newest version did not run on the RPI when I tried it. You can fall back to an older, stable version instead.

python3 -m pip install torch==1.9.0 -f

Finally we will need the Hugging Face transformers library, which will allow us to download and run the pre-trained language model. If you run into trouble installing it, please check the Troubleshooting section below for some tips and tricks on how to get it working.

pip install transformers

Creating the Flask application

For this part of the tutorial we will use two invaluable helpers that will save us a lot of time and effort by providing frameworks for the virtual chatbot and its implementation as a Flask web app.

The bot itself is based on DialoGPT, a large-scale dialogue response generation model trained on 147M multi-turn dialogues from Reddit discussion threads. You can find the full details on Microsoft’s project page. The medium model is hosted on the Hugging Face website, which also provides very handy instructions on how to use it. All that is needed are a few lines of code, which you can find on the model card here.

Once we have the code for the language model, we still need to integrate it into a Flask application. For this, again, we will take some inspiration from an already existing Flask web implementation, the Gotham Chatbot, which was developed to run ChatterBot, another open source chatbot based on machine learning. If you prefer, you can also simply use the Gotham Chatbot code instead of the transformer-based language model. The advantage of that model is that it will save and learn from your input.

Download the code for the Gotham Chatbot from github into your chatbot folder using git. If this doesn’t work you can also manually download the zip file from github. Then install the required modules directly from the requirements.txt file (please note that you actually won’t need many of these modules, as we will change the application code to use the transformer model).

Ensure that you are still in the virtual environment before you install any additional modules using pip.

sudo apt install git
cd /var/www/html/chatbot
git clone <Gotham Chatbot repository URL> .
pip install -r requirements.txt

Now we are going to make a few changes to app.py, the main file where our code for the chatbot lives.

sudo nano app.py
An example chatbot

In the following block I’ve combined the transformer model with the Flask implementation. Update the code in app.py to include all of the below.

from flask import Flask, render_template, request
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

app = Flask(__name__)

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

@app.route("/")
def home():
    return render_template("index.html")

@app.route("/get")
def get_bot_response():
    userText = request.args.get('msg')

    # Chat for 5 lines
    for step in range(5):
        # encode the new user input, add the eos_token and return a tensor in PyTorch
        new_user_input_ids = tokenizer.encode(userText + tokenizer.eos_token, return_tensors='pt')

        # append the new user input tokens to the chat history
        bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

        # generate a response while limiting the total chat history to 1000 tokens
        chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

        # pretty print last output tokens from bot
        return str("Bot: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

if __name__ == "__main__":
    app.run("0.0.0.0", debug=True, use_reloader=False)

This is the main code that will run your chatbot. The script will terminate after 5 lines, as this is the range declared in the code. If you prefer a lengthier dialog you can increase the range. Save and close the file when you are finished.
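The history bookkeeping inside the loop can be sketched with plain Python lists (toy token ids, purely illustrative of the torch.cat pattern; `accumulate_history` is a hypothetical helper, not part of the chatbot code):

```python
def accumulate_history(turns):
    """Concatenate each turn's token ids onto the running chat history,
    mirroring the torch.cat call in the chatbot code above."""
    chat_history_ids = []
    for step, new_user_input_ids in enumerate(turns):
        # the first turn starts the history; later turns are appended to it
        chat_history_ids = chat_history_ids + new_user_input_ids if step > 0 else list(new_user_input_ids)
    return chat_history_ids

print(accumulate_history([[101, 102], [103], [104, 105]]))  # [101, 102, 103, 104, 105]
```

In the real model the ids are PyTorch tensors and the bot's own replies become part of the history too, which is how DialoGPT keeps track of the conversation.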

Importantly, note the use_reloader=False argument at the end, which is key to getting the transformer-based bot to work: the app will otherwise keep reloading and you will not receive any bot responses after your initial prompt. If you are working with the ChatterBot model, use use_reloader=True.

Testing the Flask application

First off, a huge thank you to this Digital Ocean tutorial on Flask applications, which has made my life a lot easier!

We now have to confirm that everything is working using Flask on our local machine. You will need to open port 5000 to be able to test the application using the Flask development server.

sudo ufw allow 5000

Now you are ready to test your chatbot Flask application locally.

(chatbotenv) python3 app.py

You should see a confirmation message in your terminal. It will provide a range of details, noting that the Flask app is being served and is active on the localhost. If you type your RPI’s IP followed by :5000 into your browser, you should be able to see your chatbot and start a first conversation with it.

In case this approach does not work, try changing the host in the last line of app.py to your RPI’s IP directly.


Serving the Flask application with Gunicorn

Next we have to start configuring Gunicorn, the Python WSGI HTTP Server we will use to serve our Flask application. First we have to define the WSGI file to tell Gunicorn where to find the app. Create the wsgi.py file in your chatbot directory.

(chatbotenv) sudo nano /var/www/html/chatbot/wsgi.py

Now enter the following details. If you have used a different name for app.py, enter it instead of 'app'.

from app import app

if __name__ == "__main__":
    app.run()
Now it’s time to test whether Gunicorn is able to serve your chatbot (it should!). For this you will have to bind it to the wsgi file we just created. We also need a few extra tweaks for the gunicorn configuration to make this specific project work.

cd /var/www/html/chatbot
gunicorn --bind 0.0.0.0:5000 --preload --workers 3 --threads 3 --proxy-protocol wsgi:app

Again, if you can’t find your application by typing your RPI’s IP:5000 in a browser, try specifying the IP directly in the --bind command.

The --preload configuration is necessary to download the language model onto your server and initialise it. Without it your website will not load: the Gunicorn workers will be terminated before they can finish the initialisation. The number of workers does not make much of a difference at this point, as we are just testing. However, having at least a few threads assigned to each worker is key to getting your chatbot to respond. More on this later. In addition, the --proxy-protocol option will be required later so that Gunicorn can communicate with the Apache server.
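If you prefer not to type the flags every time, the same settings can live in a gunicorn.conf.py file in the project directory, which Gunicorn picks up automatically (a hedged equivalent of the command-line options above; all setting names are from Gunicorn's standard configuration):

```python
# gunicorn.conf.py — equivalent of the command-line flags used above
bind = "0.0.0.0:5000"
preload_app = True       # load DialoGPT once, before the workers are forked
workers = 3
threads = 3
proxy_protocol = True    # needed later for communication with Apache
```

With this file in place, plain `gunicorn wsgi:app` from the chatbot directory behaves the same as the full command.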

You should now see the Gunicorn output, confirming that it is listening at the specified port and booting up workers.

Visit your server’s IP:5000 and have a friendly chat with your bot to confirm that it is up and running. The initial load can take a few minutes so don’t be discouraged if you get an error message at first.

Well done.

You have completed the most important part. We can now deactivate the virtual environment.

(chatbotenv) deactivate

Next we have to create the systemd Gunicorn service file to initialise it as a service and add all the extra configuration we discussed before.

sudo nano /etc/systemd/system/chatbot.service

You can copy the specifics for this file below. Please ensure to change any user names/paths as required for your own setup. It is recommended to use the www-data group to ensure successful communication between servers. We will specify the path to our virtual environment in the Gunicorn executable, which we have already set up with all the required libraries and tested.

[Unit]
Description=Gunicorn instance to serve the chatbot
After=network.target

[Service]
User=pi
Group=www-data
WorkingDirectory=/var/www/html/chatbot
ExecStart=/var/www/html/chatbot/chatbotenv/bin/gunicorn --preload --workers 3 --threads 3 --proxy-protocol --bind unix:chatbot.sock -m 007 wsgi:app

[Install]
WantedBy=multi-user.target

A word on workers

The Gunicorn documentation recommends trying a number of workers between 2-4x the number of CPU cores [2-4 x $(NUM_CORES)] for the initial setup and then varying this number depending on specific requirements. For our little project 3 workers should be enough, but feel free to adjust if your chatbot is being accessed a lot. In case you run into trouble with your app, reading the documentation is usually a helpful starting point. Many of the issues that came up for me during the initial setup could be solved by tweaking the configuration.
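As a quick sanity check, the rule of thumb above can be computed in Python (`suggested_workers` is a hypothetical helper name, not part of Gunicorn):

```python
import multiprocessing

def suggested_workers(multiplier=2):
    # Gunicorn's rule of thumb: start with 2-4 workers per CPU core,
    # then tune based on load and available memory.
    return multiplier * multiprocessing.cpu_count()

print(suggested_workers())
```

On a quad-core Raspberry Pi 4 this suggests 8-16 workers, but with a model as memory-hungry as DialoGPT the practical limit is much lower, hence the 3 workers used here.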

Excellent! Now we can start the chatbot service, enable it at boot and check its status.

sudo systemctl start chatbot
sudo systemctl enable chatbot
sudo systemctl status chatbot

You should now see a confirmation that your service is active (running), the number of workers handling the requests and their output.

If you see any errors at this stage, make sure to resolve them before you progress to the next step. Take a look at the troubleshooting section below for some tips and tricks.

Configuring Apache

The Gunicorn server is finally up and running, waiting for requests on the socket file in the chatbot directory. Now we need Apache to handle the requests that come in from the web and connect the two. You can do this with a few tweaks in the configuration.

First, create a new server block configuration file in Apache’s sites-available directory, similarly to what we did for our project with multiple websites. We will create a new virtual host so that we can link the domain to our files on the server.

For this tutorial I’m assuming you already have your own domain on which to host the chatbot (referred to as your_domain below). In case you don’t, you can find a few more pieces of advice in this guide.

To keep things easy, we will call the configuration file chatbot.conf.

sudo nano /etc/apache2/sites-available/chatbot.conf

At first we will only configure port 80, which is the standard HTTP connection. Check out this tutorial if you want to learn more about the difference between HTTP and HTTPS. It is important to include information about the proxying parameters to set, as we need to pass the requests to the socket using ProxyPass and ProxyPassReverse directives.

You might have noticed the --proxy-protocol command earlier in the Gunicorn configuration – this is necessary to ensure successful communication with Apache, as Gunicorn is now placed behind the Apache web server.

<VirtualHost *:80>
    ServerName your_domain

    ErrorLog ${APACHE_LOG_DIR}/chatbot-error.log
    CustomLog ${APACHE_LOG_DIR}/chatbot-access.log combined
    DocumentRoot /var/www/html/chatbot

    <Location />
        ProxyPass unix:/var/www/html/chatbot/chatbot.sock|http://127.0.0.1/
        ProxyPassReverse unix:/var/www/html/chatbot/chatbot.sock|http://127.0.0.1/
    </Location>
</VirtualHost>


To enable the new Apache server block configuration you have to link the file to the sites-enabled directory.

sudo ln -s /etc/apache2/sites-available/chatbot.conf /etc/apache2/sites-enabled

You are now ready to restart Apache and test your chatbot by typing your domain name.

sudo systemctl restart apache2

Don’t forget to adjust your firewall and close port 5000 again, as it is no longer needed.

sudo ufw delete allow 5000

Congratulations, your chatbot should now be up and running! Something not working? Have a look at the Troubleshooting section below.

Upgrading to an SSL connection

As before with the WordPress website, we will change the HTTP to an HTTPS connection to keep our web traffic secure. For this we can obtain a free Let's Encrypt certificate for our domain.

The trusted certbot takes care of this easily. If you need more detailed instructions on how to get a free certificate, check out the previous blog entry.

sudo certbot --apache -d your_domain

During this process, certbot will add the SSL configuration to the chatbot.conf file automatically, but sometimes this produces errors. Just to be on the safe side check your chatbot.conf file again after you have the certificate in place.

I have removed the virtual host instructions for port 80, as all traffic will be re-routed to https via port 443.

sudo nano /etc/apache2/sites-available/chatbot.conf

The instructions in your file should look like this.

<IfModule mod_ssl.c>

<VirtualHost *:443>
    ServerName your_domain

    SSLEngine On
    SSLProxyEngine On

    ErrorLog ${APACHE_LOG_DIR}/chatbot-error.log
    CustomLog ${APACHE_LOG_DIR}/chatbot-access.log combined
    DocumentRoot /var/www/html/chatbot

    <Location />
        ProxyPass unix:/var/www/html/chatbot/chatbot.sock|http://127.0.0.1/
        ProxyPassReverse unix:/var/www/html/chatbot/chatbot.sock|http://127.0.0.1/
    </Location>

    SSLCertificateFile /etc/letsencrypt/live/your_domain/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/your_domain/privkey.pem
</VirtualHost>

</IfModule>


Reload Apache once this is done.

sudo systemctl restart apache2

You should now be able to reach your bot at https://your_domain and the browser should recognise your website as secure. Well done!


Troubleshooting

Did you run into trouble during any of the steps above?

  • Error logs

You might want to start by checking a few error logs to find out what the problem might be. The most relevant ones are:

sudo less /var/log/apache2/error.log
sudo less /var/log/apache2/access.log
sudo journalctl -u apache2
sudo journalctl -u chatbot

  • Installing the libraries on the Raspberry Pi

Some of the modules you need might not be straightforward to install on a Raspberry Pi. If the current transformers version does not install due to an error during the tokenizer install, you need to install Rust first.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Alternatively, you can also try to install an older version of transformers.

pip install transformers==2.5.1

Image credit: Designed by vectorjuice / Freepik
