Commit 6695538

clean up

1 parent e59df3c
File tree

3 files changed: +57 −197 lines

‎pgml-apps/discord-bot/README.md

+38 −74 lines changed
@@ -1,6 +1,6 @@
 # Discord Bot using pgml Python SDK, Langchain, Instructor-xl, and Falcon 7B

-In this tutorial, we will build a Discord bot that can use markdown files to help answer user inquiries. We will ingest the files, convert their contents into vector embeddings, and save them to Postgres. After indexing the data, the bot will query the collection to retrieve the documents that are most likely to answer the user's question. Then, we will use a simple SQL query utilizing PostgresML to retrieve a completion from the open source Falcon 7b text generation model. Finally, we will return this completion to the user in the Discord channel. We will be using the pgml SDK to simplify the process.
+In this tutorial, we will build a Discord bot that can use markdown files to help answer user inquiries. We will ingest the files, convert their contents into vector embeddings, and save them to Postgres. After indexing the data, the bot will query the collection to retrieve the documents that are most likely to answer the user's question. Then, we will use a simple SQL query utilizing PostgresML to retrieve a completion from the open source Falcon-7B-Instruct text generation model. Finally, we will return this completion to the user in the Discord channel. We will be using the [pgml python SDK](https://pypi.org/project/pgml/) to simplify the process.

 In this project, we will be working with three files:
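The retrieve-then-generate flow the intro describes can be sketched end to end with stub components. Every function and name below is illustrative, not the pgml SDK's real API; the "retrieval" is a naive keyword match standing in for vector search.

```python
# Illustrative sketch of the bot's flow: ingest -> retrieve -> build prompt.
# All names here are stand-ins, not the pgml SDK's actual API.

def embed_and_store(markdown_texts, store):
    """Ingest: pretend to embed each document and save it to the 'collection'."""
    for text in markdown_texts:
        store.append({"text": text})

def retrieve(store, query, top_k=3):
    """Search: naive keyword match standing in for vector search."""
    hits = [d for d in store if any(w in d["text"] for w in query.lower().split())]
    return hits[:top_k]

def answer(query, store):
    """Build a prompt from retrieved context; the real bot would then call pgml.transform."""
    context = "\n".join(d["text"] for d in retrieve(store, query))
    return f"Context:\n{context}\nQUESTION<<{query}\nANSWER<<"

store = []
embed_and_store(["postgresml runs ML in postgres", "falcon is a text model"], store)
print(answer("what is falcon", store))
```

A real deployment swaps each stub for the SDK calls shown later in this README.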

@@ -20,7 +20,7 @@ To create a Discord bot, you will need to create a Discord bot account and get a

 Next, set the name of the Discord channel you would like the bot to listen to. Set this to the variable `DISCORD_CHANNEL` in your .env file.

-We will be using the pgml Python SDK to create, store, and query our vectors. So, if you don't already have an account there, you can create one here: https://postgresml.org/. You can select the free serverless option and will be given a connection string. Set this connection string to the variable `PGML_CONNECTION_STR` in your .env file.
+We will be using the pgml Python SDK to create, store, and query our vectors. So, if you don't already have an account there, you can create one here: https://postgresml.org/. You can select the free serverless option and will be given a connection string. Set this connection string to the variable `pgml_CONNECTION_STR` in your .env file.

 Next, you will want to add the markdown files you would like to use into the `./content` folder. Set the path to this folder to the variable `CONTENT_PATH` in your .env file.
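Put together, a hypothetical `.env` for these steps might look like the following. Only `DISCORD_CHANNEL`, `pgml_CONNECTION_STR`, and `CONTENT_PATH` are named in this README; `DISCORD_TOKEN` is an assumed name for the bot token variable, and all values are placeholders.

```shell
# Hypothetical .env sketch; DISCORD_TOKEN is an assumed variable name,
# and all values below are placeholders.
DISCORD_TOKEN=your-bot-token-here
DISCORD_CHANNEL=help
pgml_CONNECTION_STR=postgres://user:password@host:5432/pgml
CONTENT_PATH=./content
```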

@@ -34,33 +34,25 @@ Open and run the cells in the `./ingest.ipynb` notebook. If you have set all of

 Let's take a look at what is happening in the notebook.

-1. We load in the markdown files from the path we passed in, using Lanchain's document loader.
-2. We convert this array of documents to an array of dictionaries in the format expected by the PGML SDK.
+1. We load in the markdown files from the path we passed in, using Langchain's document loader.
+2. We convert this array of documents to an array of dictionaries in the format expected by the pgml SDK.

 ```
-
 docs = [{text: 'foo'}, {text: 'bar'}, ...]
-
 ```

-1. We create a PGML collection upserting those documents into a PostgreML Collection.
+1. We create a pgml collection upserting those documents into a collection.

 ```
-
 collection = pgml.create_or_get_collection(collection_name)
-
 collection.upsert_documents(docs)
-
 ```

 1. We chunk those documents into smaller sizes and embed those chunks using the Instructor-XL model.

 ```
-
 collection.generate_chunks()
-
 collection.generate_embeddings()
-
 ```

 Now that our data is properly indexed, we can start our bot server to handle incoming requests, using the data we just ingested to help answer questions.
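The conversion in step 2 can be sketched in plain Python. The `Document` class below is a stand-in for the objects a Langchain document loader returns; `to_pgml_docs` is a hypothetical helper, not part of the SDK.

```python
# Minimal sketch: convert loader output into the [{"text": ...}] shape the SDK expects.
# `Document` is a stand-in for a Langchain loader's output; `to_pgml_docs` is hypothetical.
from dataclasses import dataclass

@dataclass
class Document:
    page_content: str

def to_pgml_docs(documents):
    """Map loader documents to the list-of-dicts format shown in the README."""
    return [{"text": d.page_content} for d in documents]

docs = to_pgml_docs([Document("foo"), Document("bar")])
print(docs)  # [{'text': 'foo'}, {'text': 'bar'}]
```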
@@ -72,9 +64,7 @@ For our bot server, we are using the popular library [discord.py](https://discor

 To start the bot server, you can run the following command in your terminal:

 ```
-
 python start.py
-
 ```

 If everything was set up correctly in earlier steps, your bot should be fully functional.
@@ -84,10 +74,9 @@ But since it's good to know how things are working, let's take a look at the cod

 In the `start.py` file, you will see the following code:

 ```
-
 # get environment variables
-pg_connection_string = os.getenv("PGML_CONNECTION_STR")
+pg_connection_string = os.getenv("pgml_CONNECTION_STR")

 # ...

@@ -98,7 +87,6 @@ pgml_bot = Bot(conninfo=pg_connection_string)

 ## start discord bot
 pgml_bot.start(collection_name, discord_token)
-
 ```

 This code will initialize the bot class with your PostgreSQL connection string and then start the Discord bot with the collection name, from which you previously saved your data in the previous step, and Discord token.
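A defensive variant of the environment lookup above fails fast when a variable is missing rather than passing `None` downstream. This is a sketch, not the repo's code; `require_env` is a hypothetical helper.

```python
# Sketch of a fail-fast environment lookup; `require_env` is hypothetical.
import os

def require_env(name):
    """Return the environment variable's value, or raise a clear error if unset."""
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

os.environ["pgml_CONNECTION_STR"] = "postgres://localhost/pgml"  # stand-in value
pg_connection_string = require_env("pgml_CONNECTION_STR")
print(pg_connection_string)  # postgres://localhost/pgml
```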
@@ -109,80 +97,56 @@ We also declared the `on_message` function that is called when a message is sent

 When a message is handled by this `on_message` function, we do a few things:

-1. Using the PGML SDK, we run:
+1. Using the pgml SDK, we run:

 ```
-
 collection.vector_search(
-
-query,
-
-top_k=3,
-
-model_id=2,
-
-splitter_id=2,
-
-query_parameters={"instruction": "Represent the question for retrieving supporting documents: "},
-
+    query,
+    top_k=3,
+    model_id=2,
+    splitter_id=2,
+    query_parameters={"instruction": "Represent the question for retrieving supporting documents: "},
 )
-
 ```

 This is going to return the top 3 documents that are most similar to the user's message.

-1. We then concatenate the text of those documents into a single string and add it to our prompt text, which looks like:
+2. We then concatenate the text of those documents into a single string and add it to our prompt text, which looks like:

-````
-
-Use the context, which is delimited by three back ticks, below to help answer the question.
-
-context: ```{context}```
+```
+Answer the question as truthfully as possible using the provided text, and if the answer is not contained within the text below, say "I don't know!"

-{user_message}
+Context:
+{context}

-````
+QUESTION<<{message_content}
+ANSWER<<
+```

-1. Now that we have our prompt ready, we can make a Falcon completion. We will get this completion by executing a SQL query that uses `pgml.transform` function.
+3. Now that we have our prompt ready, we can make a Falcon completion. We will get this completion by executing a SQL query that uses `pgml.transform` function.

 ```
-
 async def run_transform_sql(self, context, message_content):
-
-prompt = self.prepare_prompt(context, message_content)
-
-sql_query = """SELECT pgml.transform(
-
-task => '{
-
-"model": "tiiuae/falcon-7b-instruct",
-
-"device_map": "auto",
-
-"torch_dtype": "bfloat16",
-
-"trust_remote_code": true
-
-}'::JSONB,
-
-args => '{
-
-"max_new_tokens": 100
-
-}'::JSONB,
-
-inputs => ARRAY[%s]
-
-) AS result"""
-
-sql_params = (prompt,)
-
-return await self.run_query(sql_query, sql_params)
+    prompt = self.prepare_prompt(context, message_content)
+    sql_query = """SELECT pgml.transform(
+        task => '{
+            "model": "tiiuae/falcon-7b-instruct",
+            "device_map": "auto",
+            "torch_dtype": "bfloat16",
+            "trust_remote_code": true
+        }'::JSONB,
+        args => '{
+            "max_new_tokens": 100
+        }'::JSONB,
+        inputs => ARRAY[%s]
+    ) AS result"""
+    sql_params = (prompt,)
+    return await self.run_query(sql_query, sql_params)
 ```

-1. Now that we have the response from Falcon, we need to clean the response text up a bit before returning the bot's answer. Since the completion text includes the original prompt, we will remove that from the generated text in the `prepare_response` function.
-2. Finally, we will send the response back to the Discord channel.
+4. Now that we have the response from Falcon, we need to clean the response text up a bit before returning the bot's answer. Since the completion text includes the original prompt, we will remove that from the generated text in the `prepare_response` function.
+5. Finally, we will send the response back to the Discord channel.

 ## Final Remarks
‎pgml-apps/discord-bot/bot.py

+14 −4 lines changed
@@ -94,7 +94,9 @@ async def on_ready():
     print(f'We have logged in as {self.discord_client.user}')

 @self.discord_client.event
-async def on_message(message):
+async def on_message(message):
+    print(f"Message from {message.author}: {message.content}")
+
     if message.author != self.discord_client.user and message.channel.name == channel_name:
         await self.handle_message(collection_name, message)

@@ -107,13 +109,16 @@ async def handle_message(self, collection_name, message):
     res = await self.query_collection(collection_name, message.content)
     print(f"Found {len(res)} results")
     context = self.build_context(res)
+    print("Running Completion query")
     completion = await self.run_transform_sql(context, message.content)
+    print("Preparing response")
     response = self.prepare_response(completion, context, message.content)
+    print("Sending response")
     await message.channel.send(response)

 # Build the context for the message from search results
 def build_context(self, res):
-    return '\n'.join([f'***{r["chunk"]}***' for r in res])
+    return '\n'.join([f'{r["chunk"]}' for r in res])
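The updated `build_context` simply joins the raw chunks with newlines (the old version wrapped each chunk in `***`). Standalone, it behaves like this sketch:

```python
# Standalone sketch of the updated build_context.
def build_context(res):
    # res is a list of search results; each dict carries a "chunk" of document text.
    return '\n'.join(f'{r["chunk"]}' for r in res)

results = [{"chunk": "first passage"}, {"chunk": "second passage"}]
print(build_context(results))  # prints the two chunks on separate lines
```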
117122

118123
# Run a SQL function 'pgml.transform' to get a generated answer for the message
119124
async def run_transform_sql(self, context, message_content):
@@ -126,7 +131,7 @@ async def run_transform_sql(self, context, message_content):
126131
"trust_remote_code": true
127132
}'::JSONB,
128133
args => '{
129-
"max_new_tokens": 100
134+
"max_new_tokens": 200
130135
}'::JSONB,
131136
inputs => ARRAY[%s]
132137
) AS result"""
@@ -135,7 +140,12 @@ async def run_transform_sql(self, context, message_content):

 # Prepare the prompt to be used in the SQL function
 def prepare_prompt(self, context, message_content):
-    return f"""Use the context, which is delimited by three *'s, below to help answer the question.\ncontext: {context}\n{message_content}"""
+    return f"""Answer the question as truthfully as possible using the provided text, and if the answer is not contained within the text below, say "I don't know my lord!"
+
+Context:
+{context}
+QUESTION<<{message_content}
+ANSWER<<"""

 # Prepare the bot's response by removing the original prompt from the generated text
 def prepare_response(self, completion, context, message_content):
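Because the `pgml.transform` completion echoes the prompt, `prepare_response` strips it back out before replying. A simplified standalone sketch (the repo's method also takes `context` and `message_content` so it can rebuild the prompt internally):

```python
# Simplified sketch of prompt-stripping; the repo's signature differs.
def prepare_response(completion, prompt):
    # Drop the echoed prompt prefix from the generated text, keep only the answer.
    if completion.startswith(prompt):
        completion = completion[len(prompt):]
    return completion.strip()

prompt = "QUESTION<<hi\nANSWER<<"
completion = prompt + " Hello there! "
print(prepare_response(completion, prompt))  # Hello there!
```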
