# Discord Bot using pgml Python SDK, Langchain, Instructor-xl, and Falcon 7B

In this tutorial, we will build a Discord bot that can use markdown files to help answer user inquiries. We will ingest the files, convert their contents into vector embeddings, and save them to Postgres. After indexing the data, the bot will query the collection to retrieve the documents that are most likely to answer the user's question. Then, we will use a simple SQL query utilizing PostgresML to retrieve a completion from the open source Falcon-7B-Instruct text generation model. Finally, we will return this completion to the user in the Discord channel. We will be using the [pgml python SDK](https://pypi.org/project/pgml/) to simplify the process.
In this project, we will be working with three files:

To create a Discord bot, you will need to create a Discord bot account and get a token.
Next, set the name of the Discord channel you would like the bot to listen to. Set this to the variable `DISCORD_CHANNEL` in your .env file.

We will be using the pgml Python SDK to create, store, and query our vectors. So, if you don't already have an account there, you can create one here: https://postgresml.org/. You can select the free serverless option and will be given a connection string. Set this connection string to the variable `pgml_CONNECTION_STR` in your .env file.
Next, you will want to add the markdown files you would like to use into the `./content` folder. Set the path to this folder to the variable `CONTENT_PATH` in your .env file.
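
With everything configured, a .env file would look something like this (all values are placeholders, and the token variable name is an assumption):

```
DISCORD_TOKEN=your-bot-token
DISCORD_CHANNEL=your-channel-name
pgml_CONNECTION_STR=postgres://user:password@host:port/dbname
CONTENT_PATH=./content
```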

Open and run the cells in the `./ingest.ipynb` notebook. If you have set all of the environment variables correctly, the notebook will ingest and index your documents.
Let's take a look at what is happening in the notebook.

1. We load in the markdown files from the path we passed in, using Langchain's document loader.
2. We convert this array of documents to an array of dictionaries in the format expected by the pgml SDK.

```
docs = [{"text": "foo"}, {"text": "bar"}, ...]
```

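For illustration, building that list from a folder of markdown files can be sketched in plain Python (this helper is an assumption for illustration, not the notebook's exact code):

```python
from pathlib import Path

def markdown_to_docs(content_path):
    # Wrap each markdown file's text in the dictionary shape shown above
    docs = []
    for md_file in sorted(Path(content_path).glob("*.md")):
        docs.append({"text": md_file.read_text()})
    return docs
```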
3. We create a pgml collection and upsert those documents into it.
4. We chunk those documents into smaller sizes and embed those chunks using the Instructor-XL model.

```
collection.generate_chunks()
collection.generate_embeddings()
```

Now that our data is properly indexed, we can start our bot server to handle incoming requests, using the data we just ingested to help answer questions.

For our bot server, we are using the popular library [discord.py](https://discordpy.readthedocs.io/).

To start the bot server, you can run the following command in your terminal:

```
python start.py
```

If everything was set up correctly in earlier steps, your bot should be fully functional.

But since it's good to know how things are working, let's take a look at the code.

In the `start.py` file, the code initializes the bot class with your PostgreSQL connection string and then starts the Discord bot with your Discord token and the name of the collection you saved your data to in the previous step.
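
A minimal sketch of what such a bot class might look like (the class and method names here are assumptions, not the repository's exact code):

```python
class PGMLBot:
    """Hypothetical wrapper mirroring the description above."""

    def __init__(self, connection_str):
        # Store the PostgresML connection string for later queries
        self.connection_str = connection_str

    def start(self, collection_name, discord_token):
        # Remember which collection to search, then hand control to discord.py
        self.collection_name = collection_name
        self.discord_token = discord_token
        # In the real bot, this is where discord.py's client.run(token) is invoked.
```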
We also declared the `on_message` function that is called when a message is sent in the channel.

When a message is handled by this `on_message` function, we do a few things:

1. Using the pgml SDK, we run:

```
collection.vector_search(
    query,
    top_k=3,
    model_id=2,
    splitter_id=2,
    query_parameters={"instruction": "Represent the question for retrieving supporting documents: "},
)
```

This is going to return the top 3 documents that are most similar to the user's message.
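
Under the hood, a vector search like this embeds the question and ranks stored chunks by similarity; a toy version of that ranking in plain Python (the vectors involved are made up for illustration):

```python
def top_k_similar(query_vec, chunk_vecs, k=3):
    # Score each chunk embedding by cosine similarity to the query embedding
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = lambda v: sum(x * x for x in v) ** 0.5
        return dot / (norm(a) * norm(b))

    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]  # indices of the k most similar chunks
```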

2. We then concatenate the text of those documents into a single string and add it to our prompt text, which looks like:

```
Answer the question as truthfully as possible using the provided text, and if the answer is not contained within the text below, say "I don't know!"

Context:
{context}

QUESTION<<{message_content}
ANSWER<<
```


3. Now that we have our prompt ready, we can make a Falcon completion. We will get this completion by executing a SQL query that uses the `pgml.transform` function.
4. Now that we have the response from Falcon, we need to clean the response text up a bit before returning the bot's answer. Since the completion text includes the original prompt, we will remove that from the generated text in the `prepare_response` function.
5. Finally, we will send the response back to the Discord channel.
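
Step 3 can be sketched as a parameterized query; the task payload and model name below are assumptions based on the tutorial's description, not the repository's exact SQL:

```python
def falcon_completion_sql():
    # SQL that asks PostgresML to run text generation in-database;
    # the prompt itself is bound separately as a query parameter.
    return (
        "SELECT pgml.transform("
        "task => '{\"task\": \"text-generation\", "
        "\"model\": \"tiiuae/falcon-7b-instruct\"}'::JSONB, "
        "inputs => ARRAY[%s]) AS completion"
    )
```

The returned string would then be executed with a Postgres driver such as psycopg, binding the prompt text to the `%s` placeholder.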

```
return f"""Answer the question as truthfully as possible using the provided text, and if the answer is not contained within the text below, say "I don't know my lord!"

Context:
{context}

QUESTION<<{message_content}
ANSWER<<"""
```

```
# Prepare the bot's response by removing the original prompt from the generated text
```
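
A minimal sketch of the behavior that comment describes (the exact signature is an assumption):

```python
def prepare_response(generated_text, prompt):
    # Falcon's completion echoes the original prompt, so strip it before replying
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()
```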