From 891519b634645176ae9f902e3b1ceaec686686a0 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Tue, 7 Nov 2023 05:46:59 +0000 Subject: [PATCH 01/26] docs: add llm kmeans notebook as an included example --- .../bq_dataframes_llm_kmeans.ipynb | 941 ++++++++++++++++++ 1 file changed, 941 insertions(+) create mode 100644 notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb new file mode 100644 index 0000000000..0ba0561b7c --- /dev/null +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -0,0 +1,941 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# Copyright 2023 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Use BigQuery DataFrames to cluster and characterize complaints\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Colab Run in Colab\n", + " \n", + " \n", + " \n", + " \"GitHub\n", + " View on GitHub\n", + " \n", + " \n", + " \n", + " \"Vertex\n", + " Open in Vertex AI Workbench\n", + " \n", + "
" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "The goal of this notebook is to demonstrate a comment characterization algorithm for an online business. We will accomplish this using [Google's PaLM 2](https://ai.google/discover/palm2/) and [KMeans clustering](https://en.wikipedia.org/wiki/K-means_clustering) in three steps:\n", + "\n", + "1. Use PaLM2TextEmbeddingGenerator to [generate text embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) for each of 10,000 complaints sent to an online bank. If you're not familiar with what a text embedding is, it's a list of numbers that are like coordinates in an imaginary \"meaning space\" for sentences. (It's like [word embeddings](https://en.wikipedia.org/wiki/Word_embedding), but for more general text.) The important point for our purposes is that similar sentences are close to each other in this imaginary space.\n", + "2. Use KMeans clustering to group together complaints whose text embeddings are close to each other. This will give us sets of similar complaints, but we don't yet know _why_ these complaints are similar.\n", + "3. Simply ask PaLM2TextGenerator in English what the difference is between the groups of complaints that we got. Thanks to the power of modern LLMs, the response might give us a very good idea of what these complaints are all about, but remember to [\"understand the limitations of your dataset and model.\"](https://ai.google/responsibility/responsible-ai-practices/#:~:text=Understand%20the%20limitations%20of%20your%20dataset%20and%20model)\n", + "\n", + "We will tie these pieces together in Python using BigQuery DataFrames. See the [BigQuery DataFrames quickstart](https://cloud.google.com/bigquery/docs/dataframes-quickstart) to learn more."
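Conceptually, steps 1 and 2 boil down to placing each complaint at a point in a vector space and grouping nearby points. A minimal sketch of that grouping idea, using invented 2-D "embeddings" and a hand-rolled k-means in pure Python (this touches neither BigQuery nor PaLM; in the notebook itself the real work is done by PaLM2TextEmbeddingGenerator and bigframes.ml.cluster.KMeans):

```python
import math

def kmeans(points, k, iters=20):
    """Tiny k-means sketch: return the centroid index for each point."""
    # Deterministic init: pick k points spread evenly through the list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels

# Toy "embeddings": two clearly separated groups in a 2-D meaning space.
embeddings = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1),   # one topic of complaint
              (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]   # a different topic
labels = kmeans(embeddings, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Real text embeddings have hundreds of dimensions rather than two, but the mechanics are the same: complaints about the same kind of problem land near each other, so k-means assigns them the same `CENTROID_ID`.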
+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Dataset\n", + "\n", + "This notebook uses the [CFPB Consumer Complaint Database](https://console.cloud.google.com/marketplace/product/cfpb/complaint-database)." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Costs\n", + "\n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* BigQuery (compute)\n", + "* BigQuery ML\n", + "\n", + "Learn about [BigQuery compute pricing](https://cloud.google.com/bigquery/pricing#analysis_pricing_models)\n", + "and [BigQuery ML pricing](https://cloud.google.com/bigquery/pricing#bqml),\n", + "and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)\n", + "to generate a cost estimate based on your projected usage." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "xckgWno6ouHY" + }, + "source": [ + "## Step 1: Text embedding" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Project Setup" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "R7STCS8xB5d2" + }, + "outputs": [], + "source": [ + "import bigframes.pandas as bpd\n", + "\n", + "bpd.options.bigquery.project = \"your-project-id\"  # Replace with your Google Cloud project ID\n", + "bpd.options.bigquery.location = \"us\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v6FGschEowht" + }, + "source": [ + "Data Input" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "zDSwoBo1CU3G" + }, + "outputs": [ + { + "data": { + "text/html": [ + "Query job ca9487e2-aac1-466d-a74c-bf1d414b7557 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 311d2026-8f38-4c76-a4eb-40f6a1810fd4 is DONE. 2.3 GB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "input_df = bpd.read_gbq(\"bigquery-public-data.cfpb_complaints.complaint_database\")" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "tYDoaKgJChiq" + }, + "outputs": [ + { + "data": { + "text/html": [ + "Query job e9a7abc7-6fca-4a91-a68c-8feb3ac9b942 is DONE. 1.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 4e0125c0-bd85-4449-a9b8-a68ea3407919 is DONE. 1.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
consumer_complaint_narrative
0Those Accounts Are Not mine, I never authorize...
11Legal Department, This credit dispute is being...
12Hello my name is XXXX XXXX, I have looked into...
15I HAVE REVIEWED MY CREDIT REPORT AND FOUND SOM...
16On my credit report these are not my items rep...
\n", + "

5 rows × 1 columns

\n", + "
[5 rows x 1 columns in total]" + ], + "text/plain": [ + " consumer_complaint_narrative\n", + "0 Those Accounts Are Not mine, I never authorize...\n", + "11 Legal Department, This credit dispute is being...\n", + "12 Hello my name is XXXX XXXX, I have looked into...\n", + "15 I HAVE REVIEWED MY CREDIT REPORT AND FOUND SOM...\n", + "16 On my credit report these are not my items rep...\n", + "\n", + "[5 rows x 1 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "issues_df = input_df[[\"consumer_complaint_narrative\"]].dropna()\n", + "issues_df.head(n=5) # View the first five complaints" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "OltYSUEcsSOW" + }, + "outputs": [], + "source": [ + "# Choose 10,000 complaints randomly\n", + "downsampled_issues_df = issues_df.sample(n=10000)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "Wl2o-NYMoygb" + }, + "source": [ + "Generate the text embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "li38q8FzDDMu" + }, + "outputs": [ + { + "data": { + "text/html": [ + "Query job 5422de4b-789d-4430-ab73-3a238d7b5238 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from bigframes.ml.llm import PaLM2TextEmbeddingGenerator\n", + "\n", + "model = PaLM2TextEmbeddingGenerator() # No connection id needed" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "cOuSOQ5FDewD" + }, + "outputs": [ + { + "data": { + "text/html": [ + "Query job 25ed7dd8-829b-4418-8f52-2ba9c5c51dec is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job fde1380f-a308-440d-a9a3-a7c3db902e0a is DONE. 1.3 GB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job c9bec87e-524a-4206-84d6-f9f87fc12e35 is DONE. 80.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 0ac2dee7-1c50-4b63-aa44-16ad43265c5d is DONE. 80.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job bd615e5d-8153-45bf-a14e-e5997cbaa962 is DONE. 61.5 MB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
text_embedding
355[0.0032048337161540985, 0.018182063475251198, ...
414[-0.025085292756557465, -0.05178036540746689, ...
650[0.0020703477784991264, -0.027994778007268906,...
969[-0.009529653936624527, -0.03827650472521782, ...
1009[0.0190849881619215, -0.026688968762755394, 0....
\n", + "

5 rows × 1 columns

\n", + "
[5 rows x 1 columns in total]" + ], + "text/plain": [ + " text_embedding\n", + "355 [0.0032048337161540985, 0.018182063475251198, ...\n", + "414 [-0.025085292756557465, -0.05178036540746689, ...\n", + "650 [0.0020703477784991264, -0.027994778007268906,...\n", + "969 [-0.009529653936624527, -0.03827650472521782, ...\n", + "1009 [0.0190849881619215, -0.026688968762755394, 0....\n", + "\n", + "[5 rows x 1 columns]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Will take ~5 minutes to compute the embeddings\n", + "predicted_embeddings = model.predict(downsampled_issues_df)\n", + "# Notice the lists of numbers that are our text embeddings for each complaint\n", + "predicted_embeddings.head() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4H_etYfsEOFP" + }, + "outputs": [], + "source": [ + "# Join the complaints with their embeddings in the same DataFrame\n", + "combined_df = downsampled_issues_df.join(predicted_embeddings)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "OUZ3NNbzo1Tb" + }, + "source": [ + "## Step 2: KMeans clustering" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "AhNTnEC5FRz2" + }, + "outputs": [], + "source": [ + "from bigframes.ml.cluster import KMeans\n", + "\n", + "cluster_model = KMeans(n_clusters=10) # We will divide our complaints into 10 groups" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "6poSxh-fGJF7" + }, + "outputs": [ + { + "data": { + "text/html": [ + "Query job 803f2250-b38d-4215-8941-b668dc18c023 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 04fe13b0-d07c-4490-ace5-7602830538f4 is DONE. 0 Bytes processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 3735b6fd-0c0c-4ad1-83b1-77c09e7c4c68 is DONE. 1.4 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 1c597324-756b-4c96-9520-966f839c3e14 is DONE. 80.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job acc3f4ab-71e1-4e51-938d-a447db70dd73 is DONE. 80.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 1d11a4e3-7dd3-4619-bf46-f9d842abe83a is DONE. 160.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CENTROID_ID
3554
4142
6501
9695
10095
\n", + "

5 rows × 1 columns

\n", + "
[5 rows x 1 columns in total]" + ], + "text/plain": [ + " CENTROID_ID\n", + "355 4\n", + "414 2\n", + "650 1\n", + "969 5\n", + "1009 5\n", + "\n", + "[5 rows x 1 columns]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Use KMeans clustering to calculate our groups. Will take ~5 minutes.\n", + "cluster_model.fit(combined_df[[\"text_embedding\"]])\n", + "clustered_result = cluster_model.predict(combined_df[[\"text_embedding\"]])\n", + "# Notice the CENTROID_ID column, which is the ID number of the group that\n", + "# each complaint belongs to.\n", + "clustered_result.head(n=5)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "# Join the group number to the complaints and their text embeddings\n", + "combined_clustered_result = combined_df.join(clustered_result)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "id": "21rNsFMHo8hO" + }, + "source": [ + "## Step 3: Summarize the complaints" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "2E7wXM_jGqo6" + }, + "outputs": [ + { + "data": { + "text/html": [ + "Query job cf667104-32c3-4ca9-96ac-d044823096c4 is DONE. 1.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 8088b224-bf24-4cf8-9858-c5bb47c0d3ee is DONE. 1.3 GB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Using bigframes, with syntax identical to pandas,\n", + "# filter out the first and second groups\n", + "cluster_1_result = combined_clustered_result[\n", + " combined_clustered_result[\"CENTROID_ID\"] == 1][[\"consumer_complaint_narrative\"]\n", + "]\n", + "cluster_1_result_pandas = cluster_1_result.head(5).to_pandas()\n", + "\n", + "cluster_2_result = combined_clustered_result[\n", + " combined_clustered_result[\"CENTROID_ID\"] == 2][[\"consumer_complaint_narrative\"]\n", + "]\n", + "cluster_2_result_pandas = cluster_2_result.head(5).to_pandas()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "ZNDiueI9IP5e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "comment list 1:\n", + "1. I bought my home XX/XX/XXXX for the amount of {$220000.00}. The home was appraised closing with a value of {$260000.00} at closing. When purchasing the home I did not provide a downpayment in the amount of 20 % of the home value, therefore I had to purchase private mortgage insurance ( P.M.I. ) on the home until 20 % of the home value was paid off. 20 % of {$260000.00} is {$53000.00}. This means I would have to owe ( $ XXXX- {$53000.00} ) {$210000.00} or less for the P.M.I. to be taken off of my monthly mortgage payments. According to law, the lender should take the P.M.I. off of my loan once the 20 % is met. At the time of closing my borrower did not provide me a PMI disclosure form to identify when the 20 % mark would be met. \n", + "\n", + "When closing on my home my loan was thru XXXX XXXX XXXX, for the past 5+ years my loan was taken over by Wells Fargo Home Mortgage and they are my current lenders. I have never missed or been late on a mortgage payment. \n", + "\n", + "In XX/XX/XXXX I reached the 20 % mark on the value of my home. 
Starting XX/XX/XXXX I have owed {$210000.00} or less on my home. As of XX/XX/XXXX I owe {$190000.00}, this is far beyond owning 80 % or less of my home value. \n", + "\n", + "I have reached out to Wells Fargo to have the PMI removed from my mortgage payments, but they refuse. Wells Fargo has stated that I must pay off 22 % of the \" loan '' before PMI can be taken off, and that the 20 % is not based on the value of the home at closing. \n", + "\n", + "It was never identified to me at closing that the 20 % was based the \" value of the loan '' and not the appraised value of the home at closing. This information is new to me and in XX/XX/XXXX I do not believe this was the agreement at closing. I would like to receive some evidence that I agreed to an otherwise condition against having the PMI be based on the value of the home at closing.\n", + "2. I paid my two months mortgage amount of {$3300.00} ( XX/XX/2020 ) and {$3300.00} ( XX/XX/2020 ) to my lender ( XXXX XXXX - XXXX XXXX ). Then I also received another payment notice from TIAA Bank that said my loan was sold to them on XX/XX/2020. I have not received any Goodbye Letter from my lender, nor did my Welcome Letter from TIAA Bank. I provided my bank statement for those two payments to TIAA Bank shows the proof of payments, and never got a reply from them. I called XXXX XXXX and request the Goodbye letter, which indicates - The servicing of your mortgage loan is being transferred, effective XX/XX/2020. My complaint is the lack of information when the loan was transferred to one servicer to another. I am not properly informed my loan had been transferred. As a result, payments made to either the prior or current servicer around the time of the transfer were not applied to the account.\n", + "3. On XX/XX/XXXX I called Quicken Loans to inquire about Refinancing options. I was in the process of an application with another lender and I was unhappy with their terms, etc. 
I spoke with XXXX XXXX of Quicken who convinced me to move my business over to Quicken. He stated the refinance would only take approx 30 days to complete. I did so that day and within 2 days had completed all paperwork requirements etc. I was now waiting for the appraisal to take place. Weeks went by and i never heard from anyone. I placed numerous calls, emails, chats to different people and was told that it was a delay due to \" volume ''. Finally, on XXXX the appraisal was completed. Again, silence for days/weeks afterwards. On XXXX I called and spoke with a rep who said the appraisal was \" in hand '' and Quicken and the appraiser have been in constant contact discussing some issues in the report, specifically regarding a \" capping of a water line '' and a possible \" apartment ''. I asked to see the report and she said it will post shortly to my account. It never posted. I again never heard from anyone at Quicken. I called, I emailed, I chatted with Quicken and was repeatedly told the report is not yet finalized. They also told me there were NO ISSUES regarding the appraisal -- the delay was purely due to volume. I explained that someone already told me of a possible issue and every person i spoke to denied this fact. Finally, on XX/XX/XXXX the \" dashboard '' for my account with Quicken was updated ( still no word from anyone and no copy of my appraisal was provided ) and it showed a drastic change to my refinance # s. My loan amount was reduced by {$30000.00} approx. and my debt to be paid off with the loan were removed. The entire formulation of the refinance was changed without any explanation or notice to me. I called my banker, I called customer solutions, etc. again and now i was told, \" Oh, the appraisal came in low '' So, bottom line is ... .you need {$13000.00} to {$15000.00} cash to settle this loan at the new rate. '' Again, no mention of the issue with the apartment. I asked for a copy of the appraisal and was finally sent a copy on XXXX. 
My issue with Quicken is : 1. They took 4 weeks to send an appraiser. 2. They had the results of the appraisal since early XXXX but repeatedly lied to me stating the appraisal had never been seen and was still being created. They strung me along for 8 weeks. I lost my other offer. They \" low balled '' my appraisal with XXXX in order to \" kill my deal '' for reasons other than the apartment conditions. They did not want to take on this loan but they knew they had strung me along for 8 weeks and figured the low appraisal would be their ticket out. I have a XXXX bed XXXX bath home and comps are all running in my immediate area for $ XXXX and they came back with an appraisal of {$400000.00}. Absurd. My kitchen and bathrooms were completely remodeled. {$400000.00} is a ridiculous appraisal and they know it. I sent them XXXX recent comps in my immediate neighborhood of $ XXXX sold values. When i did that, then and only then did they say, \" well ... you have to satisfy the conditions of the appraisal also. Those conditions are ... rip out the cabinets, sink etc in lower level or obtain a C/O/permit. \" Quicken had no intention of closing on this refinance ... they knew that was the course they were taking in early XXXX but they chose to string me along for a total of 8 weeks. It was ONLY due to my insistence on XX/XX/XXXX that this issue be addressed that they finally showed me a copy of the appraisal. I lost my other connection/relationship and find their practices unfair and self-serving. No one should have to go through this again, hopefully. I have copies of emails, chats, etc. showing how they lied to me continuously and misled me. I hope this casts a shadow on their reputation and makes them reconsider their business practices. Thanks\n", + "4. Hi, I was looking into buying a home. I never took the step to get pre qualified for a loan because it obviously needed more thought. 
I would look on XXXX everyday to see if houses were within my budget and eventually I started dealing with a real estate agent. Before she can look for homes she suggested that I get prequalified for a loan with an associate of hers in that field. I started the application but never finished it just because I was unsure if I would get approved or not then come to find out the house I was interested in ( XXXX ) had gotten sold so I let the idea of buying a house go a bit. I get a message from the loan officer that she ran my credit she has a couple questions about my income. I call back instantly wondering why did you run my credit without authorization theres a reason I never finished the application I did not want a hard inquiry and I had kind of backed off. She proceeded to ask do I want to know the results and annoyed obviously I asked well you ran my credit without permission im going to have a hard inquiry on my report now. She never gave me an explanation on WHY she ran my credit & now I lost a lot of points that I work hard for. I really want a solution to this problem because its not right to just run someones credit after the application is obviously not completed.\n", + "5. I called on XXXX XXXX requesting options on how to lower my principal amount. Unfortunately, I went into Income dri ven Repayment program but instant of seeing a deduction of my principal, I see an increase over and over. The amount stills $ XXXX since my graduation date back in XXXX . I mentioned that I worked for th e XXXX XXXX XXXX for 8 years if any portion of my loan could be forgive, they said no. Their option was to add more funds to my payments, which is totally ridiculously. Specially, now that I found out that my position will be eliminated in XXXX XXXX . Therefore, I will be unemployed after XXXX XXXX . It saddens my heart that something that I did to better myself, pursue an XXXX had cost me such of major debt and nobody is willing to help me. 
I did n't said that I was n't going to pay, I was seeking for assistance on how to lowered my principal and eventually payoff my debt. The education format from XXXX is heavily criticized, all funds paid and was n't top notch education!! Please help me understand why I could n't get any positive outcome fro m Navient.\n", + "\n", + "comment list 2:\n", + "1. There is a charge on my credit report from HSBC that is over 10 years old. I have contacted the company and asked for the contract showing this is a valid debit and they have refused to send what I am asking. All they have sent me is a statement telling me this is a valid date, but no signed contract.\n", + "2. Convergent Outsourcing is attempting to collect on an account that I have no knowledge of and that I have already reported to the credit bureaus as not being my account. I have contacted them asking that they validate not verify the debt that they are attempting to collect from me and derogatorily reporting on my credit reports. I specifically requested signed contracts or other supporting documentation, the only thing that I keep receiving back is that the account has been verified which does not prove that I am obligated to pay them anything which is not the truth because this account does not belong to me. They have reported delinquent information to the credit bureaus since XX/XX/2016 I am asking be deleted.\n", + "3. Hi I am submitting this XXXX XXXX XXXX this isn't any influence and this is not a third party. XXXX has low and unfair credit number for me in their report. I have complained. The problem has not been resolved. my fico has me at a credit score over 719XXXX has me at a score around 590. That is a huge difference. XXXX paints me as a XXXX. my fico say I have good credit. What the heck is going on here. 
i have almost no debt and my identity was stolen causing my score to drop XXXX i made this clear for 60 days straight with XXXX i spoke to a representative agent name XXXX and XXXX and XXXX from the fraud department I prefer to speak to a XXXX rept but they refused they had me on mute for 4 hours which was hurtful I have a perfect repayment record. I have very low credit utilization. I have three negative credit items outstanding debt now. I have modest but ok income. Social Security. Something is wrong with XXXX. I do not understand why they are abusing consumers .This was a fist step towards attempting resolution. They kept lying telling me they disputed n its not reporting but it keep reporting this inaccurate information without my authorization. They refused or were unable to verify n remove the inquiries and its been 60days n they record the calls n admitted they had my police report n ftc and affidavit That was after attempting to contact XXXX more than 21 times. XXXX is an abusive company. They are supposed to be protecting consumers. They need to be reigned in. they are causing me severe XXXX and stopping me from getting this job offer XXXX now XXXX XXXX XXXX cant provide to my XXXXXXXX XXXX XXXX daughter PLEASE HELP ME PLEASE XXXX XXXX now.with no help.\n", + "4. On XX/XX/XXXX, I recieved a report from XXXX XXXX XXXX XXXX XXXX, XXXX, MD XXXX XXXX ) XXXX, which indicated a closed account from XXXX XXXX auto opened in XX/XX/XXXX, but was removed from my credit report in XXXX, due to being older than 7 years I recieved this credit alert from equifax, XXXX in XXXX that it fell off my credit report. 
on XX/XX/XXXX, I see it's been placed back on my credit report in XX/XX/XXXX by this agency and when I logged in to see Equifax credit report and look at my closed accounts, XXXX XXXX shows ( Closed Account ) but it's there to view and it's been there for XXXX year and XXXX months, so my complaint is why is it there it shouldn't even show, for XXXX years XXXX months they've had this on my report, I want it removed because eventhough these account show closed, they are still sending out old information that should not be reported. This is causing me to pay more and keeping my credit score down, please enforce this and make them remove any and all closed accounts. Their disclaimer even states that these accounts are removed after 7 years, it's been XXXX, they should remove all of those closed accounts that way this will not happen again, and I'm asking that they be sued because this keeps certain groups of people credit scores down and that's discrimination, its also fraudulent because on one hand their telling us that this information is not being reported it's closed yet it shows up, so they are lying.\n", + "5. -In XX/XX/XXXX, I was sent a notice to my address in Michigan by XXXX XXXX XXXX XXXX XXXX that my debt ( collected after having to live off my card due to house and joblessness ) had been sold to Portfolio Recovery Associates , LLC for {$2200.00}. As I had already moved to Florida, this letter was not forwarded to me in a timely manner. At this time, it showed up on my credit report as a collection debt. \n", + "\n", + "-Once I had obtained the supplemental information provided by Portfolio Recovery Associates LLC as proof of veracity of claim after disputing the collection, I was able to see the aforementioned notice, as well as a statement from Portfolio Recovery Associates containing my account number, the ( now corrected after an updated credit report request by them ) Florida address it was sent to, amount owed, contact information, etc. 
When I contacted Portfolio Recovery Associates, I was told they could not give me any further information because it had been transferred to be litigated. \n", + "\n", + "-Three years to the day XXXX XX/XX/XXXX XXXX and two states later, apparently a lawsuit was filed against me in an attempt to collect the debt. I was never served a summons. Once I found out about and looked up the case, I saw they had the correct address but an incorrect name, one that was corrected with the credit bureaus two months after said debt was sold to PRA. Thus, the notice of 'summons returned served ' in the court review is incorrect. Once I learned of the suit, I submitted the necessary paper work on my behalf. After that, I didn't hear from them, nor did an updated search return anything. \n", + "\n", + "-In XXXX of this year, I received a notice from the XXXX XXXX XXXX XXXX XXXX stating that the case was to be closed in a month due to lack of prosecution if no action takes place before then. Just before that dead line, the attorneys for PRA XXXX XXXX XXXX XXXX XXXX filed a motion to transfer the case to XXXX XXXX, citing that this was where they ( in fact did not ) serve the original summons. Once this was granted, they received a letter from the Clerk of Court stating they had 30 days to pay the transfer fees or the case would be dismissed. Two days before that due date, they submitted payment at the last minute. As of yet, I have not seen nor received anything from the XXXX XXXX XXXX XXXX regarding the matter.\n", + "\n" + ] + } + ], + "source": [ + "# Build plain-text prompts to send to PaLM 2. Use only 5 complaints from each group.\n", + "prompt1 = 'comment list 1:\\n'\n", + "for i in range(5):\n", + " prompt1 += str(i + 1) + '. ' + \\\n", + " cluster_1_result_pandas[\"consumer_complaint_narrative\"].iloc[i] + '\\n'\n", + "\n", + "prompt2 = 'comment list 2:\\n'\n", + "for i in range(5):\n", + " prompt2 += str(i + 1) + '. 
' + \\\n", + " cluster_2_result_pandas[\"consumer_complaint_narrative\"].iloc[i] + '\\n'\n", + "\n", + "print(prompt1)\n", + "print(prompt2)\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "BfHGJLirzSvH" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Please highlight the most obvious difference between the two lists of comments:\n", "comment list 1:\n", "1. I bought my home XX/XX/XXXX for the amount of {$220000.00}. The home was appraised closing with a value of {$260000.00} at closing. When purchasing the home I did not provide a downpayment in the amount of 20 % of the home value, therefore I had to purchase private mortgage insurance ( P.M.I. ) on the home until 20 % of the home value was paid off. 20 % of {$260000.00} is {$53000.00}. This means I would have to owe ( $ XXXX- {$53000.00} ) {$210000.00} or less for the P.M.I. to be taken off of my monthly mortgage payments. According to law, the lender should take the P.M.I. off of my loan once the 20 % is met. At the time of closing my borrower did not provide me a PMI disclosure form to identify when the 20 % mark would be met. \n", + "\n", + "When closing on my home my loan was thru XXXX XXXX XXXX, for the past 5+ years my loan was taken over by Wells Fargo Home Mortgage and they are my current lenders. I have never missed or been late on a mortgage payment. \n", + "\n", + "In XX/XX/XXXX I reached the 20 % mark on the value of my home. Starting XX/XX/XXXX I have owed {$210000.00} or less on my home. As of XX/XX/XXXX I owe {$190000.00}, this is far beyond owning 80 % or less of my home value. \n", + "\n", + "I have reached out to Wells Fargo to have the PMI removed from my mortgage payments, but they refuse. Wells Fargo has stated that I must pay off 22 % of the \" loan '' before PMI can be taken off, and that the 20 % is not based on the value of the home at closing. 
\n", + "\n", + "It was never identified to me at closing that the 20 % was based the \" value of the loan '' and not the appraised value of the home at closing. This information is new to me and in XX/XX/XXXX I do not believe this was the agreement at closing. I would like to receive some evidence that I agreed to an otherwise condition against having the PMI be based on the value of the home at closing.\n", + "2. I paid my two months mortgage amount of {$3300.00} ( XX/XX/2020 ) and {$3300.00} ( XX/XX/2020 ) to my lender ( XXXX XXXX - XXXX XXXX ). Then I also received another payment notice from TIAA Bank that said my loan was sold to them on XX/XX/2020. I have not received any Goodbye Letter from my lender, nor did my Welcome Letter from TIAA Bank. I provided my bank statement for those two payments to TIAA Bank shows the proof of payments, and never got a reply from them. I called XXXX XXXX and request the Goodbye letter, which indicates - The servicing of your mortgage loan is being transferred, effective XX/XX/2020. My complaint is the lack of information when the loan was transferred to one servicer to another. I am not properly informed my loan had been transferred. As a result, payments made to either the prior or current servicer around the time of the transfer were not applied to the account.\n", + "3. On XX/XX/XXXX I called Quicken Loans to inquire about Refinancing options. I was in the process of an application with another lender and I was unhappy with their terms, etc. I spoke with XXXX XXXX of Quicken who convinced me to move my business over to Quicken. He stated the refinance would only take approx 30 days to complete. I did so that day and within 2 days had completed all paperwork requirements etc. I was now waiting for the appraisal to take place. Weeks went by and i never heard from anyone. I placed numerous calls, emails, chats to different people and was told that it was a delay due to \" volume ''. 
Finally, on XXXX the appraisal was completed. Again, silence for days/weeks afterwards. On XXXX I called and spoke with a rep who said the appraisal was \" in hand '' and Quicken and the appraiser have been in constant contact discussing some issues in the report, specifically regarding a \" capping of a water line '' and a possible \" apartment ''. I asked to see the report and she said it will post shortly to my account. It never posted. I again never heard from anyone at Quicken. I called, I emailed, I chatted with Quicken and was repeatedly told the report is not yet finalized. They also told me there were NO ISSUES regarding the appraisal -- the delay was purely due to volume. I explained that someone already told me of a possible issue and every person i spoke to denied this fact. Finally, on XX/XX/XXXX the \" dashboard '' for my account with Quicken was updated ( still no word from anyone and no copy of my appraisal was provided ) and it showed a drastic change to my refinance # s. My loan amount was reduced by {$30000.00} approx. and my debt to be paid off with the loan were removed. The entire formulation of the refinance was changed without any explanation or notice to me. I called my banker, I called customer solutions, etc. again and now i was told, \" Oh, the appraisal came in low '' So, bottom line is ... .you need {$13000.00} to {$15000.00} cash to settle this loan at the new rate. '' Again, no mention of the issue with the apartment. I asked for a copy of the appraisal and was finally sent a copy on XXXX. My issue with Quicken is : 1. They took 4 weeks to send an appraiser. 2. They had the results of the appraisal since early XXXX but repeatedly lied to me stating the appraisal had never been seen and was still being created. They strung me along for 8 weeks. I lost my other offer. They \" low balled '' my appraisal with XXXX in order to \" kill my deal '' for reasons other than the apartment conditions. 
They did not want to take on this loan but they knew they had strung me along for 8 weeks and figured the low appraisal would be their ticket out. I have a XXXX bed XXXX bath home and comps are all running in my immediate area for $ XXXX and they came back with an appraisal of {$400000.00}. Absurd. My kitchen and bathrooms were completely remodeled. {$400000.00} is a ridiculous appraisal and they know it. I sent them XXXX recent comps in my immediate neighborhood of $ XXXX sold values. When i did that, then and only then did they say, \" well ... you have to satisfy the conditions of the appraisal also. Those conditions are ... rip out the cabinets, sink etc in lower level or obtain a C/O/permit. \" Quicken had no intention of closing on this refinance ... they knew that was the course they were taking in early XXXX but they chose to string me along for a total of 8 weeks. It was ONLY due to my insistence on XX/XX/XXXX that this issue be addressed that they finally showed me a copy of the appraisal. I lost my other connection/relationship and find their practices unfair and self-serving. No one should have to go through this again, hopefully. I have copies of emails, chats, etc. showing how they lied to me continuously and misled me. I hope this casts a shadow on their reputation and makes them reconsider their business practices. Thanks\n", + "4. Hi, I was looking into buying a home. I never took the step to get pre qualified for a loan because it obviously needed more thought. I would look on XXXX everyday to see if houses were within my budget and eventually I started dealing with a real estate agent. Before she can look for homes she suggested that I get prequalified for a loan with an associate of hers in that field. I started the application but never finished it just because I was unsure if I would get approved or not then come to find out the house I was interested in ( XXXX ) had gotten sold so I let the idea of buying a house go a bit. 
I get a message from the loan officer that she ran my credit she has a couple questions about my income. I call back instantly wondering why did you run my credit without authorization theres a reason I never finished the application I did not want a hard inquiry and I had kind of backed off. She proceeded to ask do I want to know the results and annoyed obviously I asked well you ran my credit without permission im going to have a hard inquiry on my report now. She never gave me an explanation on WHY she ran my credit & now I lost a lot of points that I work hard for. I really want a solution to this problem because its not right to just run someones credit after the application is obviously not completed.\n", + "5. I called on XXXX XXXX requesting options on how to lower my principal amount. Unfortunately, I went into Income dri ven Repayment program but instant of seeing a deduction of my principal, I see an increase over and over. The amount stills $ XXXX since my graduation date back in XXXX . I mentioned that I worked for th e XXXX XXXX XXXX for 8 years if any portion of my loan could be forgive, they said no. Their option was to add more funds to my payments, which is totally ridiculously. Specially, now that I found out that my position will be eliminated in XXXX XXXX . Therefore, I will be unemployed after XXXX XXXX . It saddens my heart that something that I did to better myself, pursue an XXXX had cost me such of major debt and nobody is willing to help me. I did n't said that I was n't going to pay, I was seeking for assistance on how to lowered my principal and eventually payoff my debt. The education format from XXXX is heavily criticized, all funds paid and was n't top notch education!! Please help me understand why I could n't get any positive outcome fro m Navient.\n", + "comment list 2:\n", + "1. There is a charge on my credit report from HSBC that is over 10 years old. 
I have contacted the company and asked for the contract showing this is a valid debit and they have refused to send what I am asking. All they have sent me is a statement telling me this is a valid date, but no signed contract.\n", + "2. Convergent Outsourcing is attempting to collect on an account that I have no knowledge of and that I have already reported to the credit bureaus as not being my account. I have contacted them asking that they validate not verify the debt that they are attempting to collect from me and derogatorily reporting on my credit reports. I specifically requested signed contracts or other supporting documentation, the only thing that I keep receiving back is that the account has been verified which does not prove that I am obligated to pay them anything which is not the truth because this account does not belong to me. They have reported delinquent information to the credit bureaus since XX/XX/2016 I am asking be deleted.\n", + "3. Hi I am submitting this XXXX XXXX XXXX this isn't any influence and this is not a third party. XXXX has low and unfair credit number for me in their report. I have complained. The problem has not been resolved. my fico has me at a credit score over 719XXXX has me at a score around 590. That is a huge difference. XXXX paints me as a XXXX. my fico say I have good credit. What the heck is going on here. i have almost no debt and my identity was stolen causing my score to drop XXXX i made this clear for 60 days straight with XXXX i spoke to a representative agent name XXXX and XXXX and XXXX from the fraud department I prefer to speak to a XXXX rept but they refused they had me on mute for 4 hours which was hurtful I have a perfect repayment record. I have very low credit utilization. I have three negative credit items outstanding debt now. I have modest but ok income. Social Security. Something is wrong with XXXX. I do not understand why they are abusing consumers .This was a fist step towards attempting resolution. 
They kept lying telling me they disputed n its not reporting but it keep reporting this inaccurate information without my authorization. They refused or were unable to verify n remove the inquiries and its been 60days n they record the calls n admitted they had my police report n ftc and affidavit That was after attempting to contact XXXX more than 21 times. XXXX is an abusive company. They are supposed to be protecting consumers. They need to be reigned in. they are causing me severe XXXX and stopping me from getting this job offer XXXX now XXXX XXXX XXXX cant provide to my XXXXXXXX XXXX XXXX daughter PLEASE HELP ME PLEASE XXXX XXXX now.with no help.\n", + "4. On XX/XX/XXXX, I recieved a report from XXXX XXXX XXXX XXXX XXXX, XXXX, MD XXXX XXXX ) XXXX, which indicated a closed account from XXXX XXXX auto opened in XX/XX/XXXX, but was removed from my credit report in XXXX, due to being older than 7 years I recieved this credit alert from equifax, XXXX in XXXX that it fell off my credit report. on XX/XX/XXXX, I see it's been placed back on my credit report in XX/XX/XXXX by this agency and when I logged in to see Equifax credit report and look at my closed accounts, XXXX XXXX shows ( Closed Account ) but it's there to view and it's been there for XXXX year and XXXX months, so my complaint is why is it there it shouldn't even show, for XXXX years XXXX months they've had this on my report, I want it removed because eventhough these account show closed, they are still sending out old information that should not be reported. This is causing me to pay more and keeping my credit score down, please enforce this and make them remove any and all closed accounts. 
Their disclaimer even states that these accounts are removed after 7 years, it's been XXXX, they should remove all of those closed accounts that way this will not happen again, and I'm asking that they be sued because this keeps certain groups of people credit scores down and that's discrimination, its also fraudulent because on one hand their telling us that this information is not being reported it's closed yet it shows up, so they are lying.\n", + "5. -In XX/XX/XXXX, I was sent a notice to my address in Michigan by XXXX XXXX XXXX XXXX XXXX that my debt ( collected after having to live off my card due to house and joblessness ) had been sold to Portfolio Recovery Associates , LLC for {$2200.00}. As I had already moved to Florida, this letter was not forwarded to me in a timely manner. At this time, it showed up on my credit report as a collection debt. \n", + "\n", + "-Once I had obtained the supplemental information provided by Portfolio Recovery Associates LLC as proof of veracity of claim after disputing the collection, I was able to see the aforementioned notice, as well as a statement from Portfolio Recovery Associates containing my account number, the ( now corrected after an updated credit report request by them ) Florida address it was sent to, amount owed, contact information, etc. When I contacted Portfolio Recovery Associates, I was told they could not give me any further information because it had been transferred to be litigated. \n", + "\n", + "-Three years to the day XXXX XX/XX/XXXX XXXX and two states later, apparently a lawsuit was filed against me in an attempt to collect the debt. I was never served a summons. Once I found out about and looked up the case, I saw they had the correct address but an incorrect name, one that was corrected with the credit bureaus two months after said debt was sold to PRA. Thus, the notice of 'summons returned served ' in the court review is incorrect. 
Once I learned of the suit, I submitted the necessary paper work on my behalf. After that, I didn't hear from them, nor did an updated search return anything. \n", + "\n", + "-In XXXX of this year, I received a notice from the XXXX XXXX XXXX XXXX XXXX stating that the case was to be closed in a month due to lack of prosecution if no action takes place before then. Just before that dead line, the attorneys for PRA XXXX XXXX XXXX XXXX XXXX filed a motion to transfer the case to XXXX XXXX, citing that this was where they ( in fact did not ) serve the original summons. Once this was granted, they received a letter from the Clerk of Court stating they had 30 days to pay the transfer fees or the case would be dismissed. Two days before that due date, they submitted payment at the last minute. As of yet, I have not seen nor received anything from the XXXX XXXX XXXX XXXX regarding the matter.\n", + "\n" ] } ], "source": [ "# The plain English request we will make of PaLM 2\n", "prompt = (\n", " \"Please highlight the most obvious difference between \"\n", " \"the two lists of comments:\\n\" + prompt1 + prompt2\n", ")\n", "print(prompt)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "mL5P0_3X04dE" }, "outputs": [ { "data": { "text/html": [ "Query job 67a85808-9741-4ffa-9ac5-677a558bb5d7 is DONE. 0 Bytes processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from bigframes.ml.llm import PaLM2TextGenerator\n", + "\n", + "q_a_model = PaLM2TextGenerator(connection_name=\"bigframes-dev.us.bigframes-ml\")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "id": "ICWHsqAW1FNk" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/usr/local/google/home/henryjsolberg/bq/src/bigframes/venv/lib/python3.9/site-packages/pyarrow/pandas_compat.py:373: FutureWarning: is_sparse is deprecated and will be removed in a future version. Check `isinstance(dtype, pd.SparseDtype)` instead.\n", + " if _pandas_api.is_sparse(col):\n" + ] + } + ], + "source": [ + "# Make a DataFrame containing only a single row with our prompt for PaLM 2\n", + "df = bpd.DataFrame({\"prompt\": [prompt]})" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "gB7e1LXU1pst" + }, + "outputs": [ + { + "ename": "BadRequest", + "evalue": "400 POST https://bigquery.googleapis.com/bigquery/v2/projects/bigframes-dev/jobs?prettyPrint=false: Syntax error: Unclosed string literal at [5:104]\n\nLocation: us\nJob ID: 9b28df64-af3c-4dcc-b679-4300c3deab88\n [{'@type': 'type.googleapis.com/google.rpc.DebugInfo', 'detail': '[INVALID_INPUT] message=QUERY_ERROR: [Syntax error: Unclosed string literal at [5:104]] errorProto=code: \"QUERY_ERROR\"\\nargument: \"Syntax error: Unclosed string literal at [5:104]\"\\nlocation_type: OTHER\\nlocation: \"query\"\\n\\n\\tat com.google.cloud.helix.common.Exceptions.fromProto(Exceptions.java:2072)\\n\\tat com.google.cloud.helix.server.job.DremelErrorUtil.checkStatusWithDremelDetails(DremelErrorUtil.java:162)\\n\\tat com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.parseQueryUncached(GoogleSqlQueryTransformer.java:527)\\n\\tat 
com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.parseQuery(GoogleSqlQueryTransformer.java:511)\\n\\tat com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.validateQuery(GoogleSqlQueryTransformer.java:251)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkQuery(LocalQueryJobController.java:4331)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkInternal(LocalQueryJobController.java:4461)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkAsync(LocalQueryJobController.java:4415)\\n\\tat com.google.cloud.helix.server.job.LocalSqlJobController.checkAsync(LocalSqlJobController.java:125)\\n\\tat com.google.cloud.helix.server.job.LocalJobController.check(LocalJobController.java:1247)\\n\\tat com.google.cloud.helix.server.job.JobControllerModule$1.check(JobControllerModule.java:461)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine$1.check(JobStateMachine.java:3585)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine.dryRunJob(JobStateMachine.java:2515)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine.execute(JobStateMachine.java:2494)\\n\\tat com.google.cloud.helix.server.job.ApiJobStateChanger.execute(ApiJobStateChanger.java:33)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertNormalizedJob(HelixJobRosy.java:1998)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertJobInternal(HelixJobRosy.java:2467)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertInternal(HelixJobRosy.java:2492)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertRequestInternal(HelixJobRosy.java:3918)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insert(HelixJobRosy.java:3892)\\n\\tat jdk.internal.reflect.GeneratedMethodAccessor305.invoke(Unknown Source)\\n\\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)\\n\\tat java.base/java.lang.reflect.Method.invoke(Unknown Source)\\n\\tat 
com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$innerContinuation$3(RpcRequestProxy.java:435)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestDapperHookFactory$TracingRequestHook.call(RosyRequestDapperHookFactory.java:88)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestCredsHookFactory$1.call(RosyRequestCredsHookFactory.java:56)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestConcurrentCallsHookFactory$Hook.call(RosyRequestConcurrentCallsHookFactory.java:101)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestVarzHookFactory$Hook.call(RosyRequestVarzHookFactory.java:464)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.server.rosy.RosyRequestAuditHookFactory$1.call(RosyRequestAuditHookFactory.java:110)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RequestSecurityExtensionForGwsHookFactory$1.call(RequestSecurityExtensionForGwsHookFactory.java:69)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestSecurityContextHookFactory$1.call(RosyRequestSecurityContextHookFactory.java:80)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.server.rosy.RosyRequestContextHookFactory.call(RosyRequestContextHookFactory.java:58)\\n\\tat 
com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.invoke(RpcRequestProxy.java:666)\\n\\tat com.sun.proxy.$Proxy52.insert(Unknown Source)\\n\\tat com.google.cloud.helix.proto.proto2api.HelixJobService$ServiceParameters$1.handleRequest(HelixJobService.java:917)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInterceptor2Util$RpcApplicationHandlerAdaptor.handleRequest(RpcServerInterceptor2Util.java:82)\\n\\tat com.google.net.rpc3.impl.server.AggregatedRpcServerInterceptors.interceptRpc(AggregatedRpcServerInterceptors.java:97)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInterceptor2Util$InterceptedApplicationHandlerImpl.handleRequest(RpcServerInterceptor2Util.java:67)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.runRpcInApplicationWithCancellation(RpcServerInternalContext.java:686)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.lambda$runRpcInApplication$0(RpcServerInternalContext.java:651)\\n\\tat io.grpc.Context.run(Context.java:536)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.runRpcInApplication(RpcServerInternalContext.java:651)\\n\\tat com.google.net.rpc3.util.RpcInProcessConnector$ServerInternalContext.lambda$runWithExecutor$1(RpcInProcessConnector.java:1964)\\n\\tat com.google.common.context.ContextRunnable.runInContext(ContextRunnable.java:83)\\n\\tat io.grpc.Context.run(Context.java:536)\\n\\tat com.google.tracing.GenericContextCallback.runInInheritedContext(GenericContextCallback.java:75)\\n\\tat com.google.common.context.ContextRunnable.run(ContextRunnable.java:74)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\\n\\tat java.base/java.lang.Thread.run(Unknown Source)\\n\\tSuppressed: java.lang.Exception: Including call stack from HelixFutures\\n\\t\\tat 
com.google.cloud.helix.common.HelixFutures.getHelixException(HelixFutures.java:76)\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.get(HelixFutures.java:42)\\n\\t\\tat com.google.cloud.helix.server.job.JobStateMachine.dryRunJob(JobStateMachine.java:2514)\\n\\t\\t... 45 more\\n\\tSuppressed: java.lang.Exception: Including call stack from HelixFutures\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.getHelixException(HelixFutures.java:76)\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.get(HelixFutures.java:42)\\n\\t\\t... 41 more\\n'}]", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mBadRequest\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[19], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m major_difference \u001b[39m=\u001b[39m q_a_model\u001b[39m.\u001b[39;49mpredict(df)\n\u001b[1;32m 2\u001b[0m major_difference\n\u001b[1;32m 3\u001b[0m \u001b[39m#major_difference[\"ml_generate_text_llm_result\"].iloc[0]\u001b[39;00m\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/ml/llm.py:178\u001b[0m, in \u001b[0;36mPaLM2TextGenerator.predict\u001b[0;34m(self, X, temperature, max_output_tokens, top_k, top_p)\u001b[0m\n\u001b[1;32m 169\u001b[0m X \u001b[39m=\u001b[39m X\u001b[39m.\u001b[39mrename(columns\u001b[39m=\u001b[39m{col_label: \u001b[39m\"\u001b[39m\u001b[39mprompt\u001b[39m\u001b[39m\"\u001b[39m})\n\u001b[1;32m 171\u001b[0m options \u001b[39m=\u001b[39m {\n\u001b[1;32m 172\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mtemperature\u001b[39m\u001b[39m\"\u001b[39m: temperature,\n\u001b[1;32m 173\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mmax_output_tokens\u001b[39m\u001b[39m\"\u001b[39m: max_output_tokens,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 176\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mflatten_json_output\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39mTrue\u001b[39;00m,\n\u001b[1;32m 177\u001b[0m }\n\u001b[0;32m--> 
178\u001b[0m df \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_bqml_model\u001b[39m.\u001b[39;49mgenerate_text(X, options)\n\u001b[1;32m 179\u001b[0m \u001b[39mreturn\u001b[39;00m cast(\n\u001b[1;32m 180\u001b[0m bpd\u001b[39m.\u001b[39mDataFrame,\n\u001b[1;32m 181\u001b[0m df[[_TEXT_GENERATE_RESULT_COLUMN]],\n\u001b[1;32m 182\u001b[0m )\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/ml/core.py:105\u001b[0m, in \u001b[0;36mBqmlModel.generate_text\u001b[0;34m(self, input_data, options)\u001b[0m\n\u001b[1;32m 99\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mgenerate_text\u001b[39m(\n\u001b[1;32m 100\u001b[0m \u001b[39mself\u001b[39m,\n\u001b[1;32m 101\u001b[0m input_data: bpd\u001b[39m.\u001b[39mDataFrame,\n\u001b[1;32m 102\u001b[0m options: Mapping[\u001b[39mstr\u001b[39m, \u001b[39mint\u001b[39m \u001b[39m|\u001b[39m \u001b[39mfloat\u001b[39m],\n\u001b[1;32m 103\u001b[0m ) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m bpd\u001b[39m.\u001b[39mDataFrame:\n\u001b[1;32m 104\u001b[0m \u001b[39m# TODO: validate input data schema\u001b[39;00m\n\u001b[0;32m--> 105\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_apply_sql(\n\u001b[1;32m 106\u001b[0m input_data,\n\u001b[1;32m 107\u001b[0m \u001b[39mlambda\u001b[39;49;00m source_df: \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_model_manipulation_sql_generator\u001b[39m.\u001b[39;49mml_generate_text(\n\u001b[1;32m 108\u001b[0m source_df\u001b[39m=\u001b[39;49msource_df,\n\u001b[1;32m 109\u001b[0m struct_options\u001b[39m=\u001b[39;49moptions,\n\u001b[1;32m 110\u001b[0m ),\n\u001b[1;32m 111\u001b[0m )\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/ml/core.py:80\u001b[0m, in \u001b[0;36mBqmlModel._apply_sql\u001b[0;34m(self, input_data, func)\u001b[0m\n\u001b[1;32m 77\u001b[0m _, index_col_ids, index_labels \u001b[39m=\u001b[39m 
input_data\u001b[39m.\u001b[39m_to_sql_query(include_index\u001b[39m=\u001b[39m\u001b[39mTrue\u001b[39;00m)\n\u001b[1;32m 79\u001b[0m sql \u001b[39m=\u001b[39m func(input_data)\n\u001b[0;32m---> 80\u001b[0m df \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_session\u001b[39m.\u001b[39;49mread_gbq(sql, index_col\u001b[39m=\u001b[39;49mindex_col_ids)\n\u001b[1;32m 81\u001b[0m df\u001b[39m.\u001b[39mindex\u001b[39m.\u001b[39mnames \u001b[39m=\u001b[39m index_labels\n\u001b[1;32m 83\u001b[0m \u001b[39mreturn\u001b[39;00m df\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/session/__init__.py:290\u001b[0m, in \u001b[0;36mSession.read_gbq\u001b[0;34m(self, query_or_table, index_col, col_order, max_results)\u001b[0m\n\u001b[1;32m 279\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mread_gbq\u001b[39m(\n\u001b[1;32m 280\u001b[0m \u001b[39mself\u001b[39m,\n\u001b[1;32m 281\u001b[0m query_or_table: \u001b[39mstr\u001b[39m,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 287\u001b[0m ) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m dataframe\u001b[39m.\u001b[39mDataFrame:\n\u001b[1;32m 288\u001b[0m \u001b[39m# TODO(b/281571214): Generate prompt to show the progress of read_gbq.\u001b[39;00m\n\u001b[1;32m 289\u001b[0m \u001b[39mif\u001b[39;00m _is_query(query_or_table):\n\u001b[0;32m--> 290\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_read_gbq_query(\n\u001b[1;32m 291\u001b[0m query_or_table,\n\u001b[1;32m 292\u001b[0m index_col\u001b[39m=\u001b[39;49mindex_col,\n\u001b[1;32m 293\u001b[0m col_order\u001b[39m=\u001b[39;49mcol_order,\n\u001b[1;32m 294\u001b[0m max_results\u001b[39m=\u001b[39;49mmax_results,\n\u001b[1;32m 295\u001b[0m api_name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mread_gbq\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m 296\u001b[0m )\n\u001b[1;32m 297\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 298\u001b[0m \u001b[39m# TODO(swast): Query the snapshot table but mark 
it as a\u001b[39;00m\n\u001b[1;32m 299\u001b[0m \u001b[39m# deterministic query so we can avoid serializing if we have a\u001b[39;00m\n\u001b[1;32m 300\u001b[0m \u001b[39m# unique index.\u001b[39;00m\n\u001b[1;32m 301\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_read_gbq_table(\n\u001b[1;32m 302\u001b[0m query_or_table,\n\u001b[1;32m 303\u001b[0m index_col\u001b[39m=\u001b[39mindex_col,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 306\u001b[0m api_name\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mread_gbq\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m 307\u001b[0m )\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/session/__init__.py:432\u001b[0m, in \u001b[0;36mSession._read_gbq_query\u001b[0;34m(self, query, index_col, col_order, max_results, api_name)\u001b[0m\n\u001b[1;32m 429\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 430\u001b[0m index_cols \u001b[39m=\u001b[39m \u001b[39mlist\u001b[39m(index_col)\n\u001b[0;32m--> 432\u001b[0m destination, query_job \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_query_to_destination(\n\u001b[1;32m 433\u001b[0m query,\n\u001b[1;32m 434\u001b[0m index_cols,\n\u001b[1;32m 435\u001b[0m api_name\u001b[39m=\u001b[39;49mapi_name,\n\u001b[1;32m 436\u001b[0m )\n\u001b[1;32m 438\u001b[0m \u001b[39m# If there was no destination table, that means the query must have\u001b[39;00m\n\u001b[1;32m 439\u001b[0m \u001b[39m# been DDL or DML. 
Return some job metadata, instead.\u001b[39;00m\n\u001b[1;32m 440\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m destination:\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/session/__init__.py:319\u001b[0m, in \u001b[0;36mSession._query_to_destination\u001b[0;34m(self, query, index_cols, api_name)\u001b[0m\n\u001b[1;32m 317\u001b[0m dry_run_config \u001b[39m=\u001b[39m bigquery\u001b[39m.\u001b[39mQueryJobConfig()\n\u001b[1;32m 318\u001b[0m dry_run_config\u001b[39m.\u001b[39mdry_run \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 319\u001b[0m _, dry_run_job \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_start_query(query, job_config\u001b[39m=\u001b[39;49mdry_run_config)\n\u001b[1;32m 320\u001b[0m \u001b[39mif\u001b[39;00m dry_run_job\u001b[39m.\u001b[39mstatement_type \u001b[39m!=\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mSELECT\u001b[39m\u001b[39m\"\u001b[39m:\n\u001b[1;32m 321\u001b[0m _, query_job \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_start_query(query)\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/session/__init__.py:1523\u001b[0m, in \u001b[0;36mSession._start_query\u001b[0;34m(self, sql, job_config, max_results)\u001b[0m\n\u001b[1;32m 1519\u001b[0m \u001b[39m\u001b[39m\u001b[39m\"\"\"\u001b[39;00m\n\u001b[1;32m 1520\u001b[0m \u001b[39mStarts query job and waits for results.\u001b[39;00m\n\u001b[1;32m 1521\u001b[0m \u001b[39m\"\"\"\u001b[39;00m\n\u001b[1;32m 1522\u001b[0m job_config \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_prepare_job_config(job_config)\n\u001b[0;32m-> 1523\u001b[0m query_job \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mbqclient\u001b[39m.\u001b[39;49mquery(sql, job_config\u001b[39m=\u001b[39;49mjob_config)\n\u001b[1;32m 1525\u001b[0m opts \u001b[39m=\u001b[39m bigframes\u001b[39m.\u001b[39moptions\u001b[39m.\u001b[39mdisplay\n\u001b[1;32m 1526\u001b[0m \u001b[39mif\u001b[39;00m 
opts\u001b[39m.\u001b[39mprogress_bar \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m \u001b[39mand\u001b[39;00m \u001b[39mnot\u001b[39;00m query_job\u001b[39m.\u001b[39mconfiguration\u001b[39m.\u001b[39mdry_run:\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/client.py:3403\u001b[0m, in \u001b[0;36mClient.query\u001b[0;34m(self, query, job_config, job_id, job_id_prefix, location, project, retry, timeout, job_retry, api_method)\u001b[0m\n\u001b[1;32m 3392\u001b[0m \u001b[39mreturn\u001b[39;00m _job_helpers\u001b[39m.\u001b[39mquery_jobs_query(\n\u001b[1;32m 3393\u001b[0m \u001b[39mself\u001b[39m,\n\u001b[1;32m 3394\u001b[0m query,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 3400\u001b[0m job_retry,\n\u001b[1;32m 3401\u001b[0m )\n\u001b[1;32m 3402\u001b[0m \u001b[39melif\u001b[39;00m api_method \u001b[39m==\u001b[39m enums\u001b[39m.\u001b[39mQueryApiMethod\u001b[39m.\u001b[39mINSERT:\n\u001b[0;32m-> 3403\u001b[0m \u001b[39mreturn\u001b[39;00m _job_helpers\u001b[39m.\u001b[39;49mquery_jobs_insert(\n\u001b[1;32m 3404\u001b[0m \u001b[39mself\u001b[39;49m,\n\u001b[1;32m 3405\u001b[0m query,\n\u001b[1;32m 3406\u001b[0m job_config,\n\u001b[1;32m 3407\u001b[0m job_id,\n\u001b[1;32m 3408\u001b[0m job_id_prefix,\n\u001b[1;32m 3409\u001b[0m location,\n\u001b[1;32m 3410\u001b[0m project,\n\u001b[1;32m 3411\u001b[0m retry,\n\u001b[1;32m 3412\u001b[0m timeout,\n\u001b[1;32m 3413\u001b[0m job_retry,\n\u001b[1;32m 3414\u001b[0m )\n\u001b[1;32m 3415\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 3416\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mGot unexpected value for api_method: \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mrepr\u001b[39m(api_method)\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m)\n", + "File 
\u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/_job_helpers.py:114\u001b[0m, in \u001b[0;36mquery_jobs_insert\u001b[0;34m(client, query, job_config, job_id, job_id_prefix, location, project, retry, timeout, job_retry)\u001b[0m\n\u001b[1;32m 111\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 112\u001b[0m \u001b[39mreturn\u001b[39;00m query_job\n\u001b[0;32m--> 114\u001b[0m future \u001b[39m=\u001b[39m do_query()\n\u001b[1;32m 115\u001b[0m \u001b[39m# The future might be in a failed state now, but if it's\u001b[39;00m\n\u001b[1;32m 116\u001b[0m \u001b[39m# unrecoverable, we'll find out when we ask for it's result, at which\u001b[39;00m\n\u001b[1;32m 117\u001b[0m \u001b[39m# point, we may retry.\u001b[39;00m\n\u001b[1;32m 118\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m job_id_given:\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/_job_helpers.py:91\u001b[0m, in \u001b[0;36mquery_jobs_insert..do_query\u001b[0;34m()\u001b[0m\n\u001b[1;32m 88\u001b[0m query_job \u001b[39m=\u001b[39m job\u001b[39m.\u001b[39mQueryJob(job_ref, query, client\u001b[39m=\u001b[39mclient, job_config\u001b[39m=\u001b[39mjob_config)\n\u001b[1;32m 90\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m---> 91\u001b[0m query_job\u001b[39m.\u001b[39;49m_begin(retry\u001b[39m=\u001b[39;49mretry, timeout\u001b[39m=\u001b[39;49mtimeout)\n\u001b[1;32m 92\u001b[0m \u001b[39mexcept\u001b[39;00m core_exceptions\u001b[39m.\u001b[39mConflict \u001b[39mas\u001b[39;00m create_exc:\n\u001b[1;32m 93\u001b[0m \u001b[39m# The thought is if someone is providing their own job IDs and they get\u001b[39;00m\n\u001b[1;32m 94\u001b[0m \u001b[39m# their job ID generation wrong, this could end up returning results for\u001b[39;00m\n\u001b[1;32m 95\u001b[0m \u001b[39m# the wrong query. 
We thus only try to recover if job ID was not given.\u001b[39;00m\n\u001b[1;32m 96\u001b[0m \u001b[39mif\u001b[39;00m job_id_given:\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py:1310\u001b[0m, in \u001b[0;36mQueryJob._begin\u001b[0;34m(self, client, retry, timeout)\u001b[0m\n\u001b[1;32m 1290\u001b[0m \u001b[39m\u001b[39m\u001b[39m\"\"\"API call: begin the job via a POST request\u001b[39;00m\n\u001b[1;32m 1291\u001b[0m \n\u001b[1;32m 1292\u001b[0m \u001b[39mSee\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1306\u001b[0m \u001b[39m ValueError: If the job has already begun.\u001b[39;00m\n\u001b[1;32m 1307\u001b[0m \u001b[39m\"\"\"\u001b[39;00m\n\u001b[1;32m 1309\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m-> 1310\u001b[0m \u001b[39msuper\u001b[39;49m(QueryJob, \u001b[39mself\u001b[39;49m)\u001b[39m.\u001b[39;49m_begin(client\u001b[39m=\u001b[39;49mclient, retry\u001b[39m=\u001b[39;49mretry, timeout\u001b[39m=\u001b[39;49mtimeout)\n\u001b[1;32m 1311\u001b[0m \u001b[39mexcept\u001b[39;00m exceptions\u001b[39m.\u001b[39mGoogleAPICallError \u001b[39mas\u001b[39;00m exc:\n\u001b[1;32m 1312\u001b[0m exc\u001b[39m.\u001b[39mmessage \u001b[39m=\u001b[39m _EXCEPTION_FOOTER_TEMPLATE\u001b[39m.\u001b[39mformat(\n\u001b[1;32m 1313\u001b[0m message\u001b[39m=\u001b[39mexc\u001b[39m.\u001b[39mmessage, location\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mlocation, job_id\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mjob_id\n\u001b[1;32m 1314\u001b[0m )\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py:693\u001b[0m, in \u001b[0;36m_AsyncJob._begin\u001b[0;34m(self, client, retry, timeout)\u001b[0m\n\u001b[1;32m 690\u001b[0m \u001b[39m# jobs.insert is idempotent because we ensure that every new\u001b[39;00m\n\u001b[1;32m 691\u001b[0m \u001b[39m# job has an ID.\u001b[39;00m\n\u001b[1;32m 692\u001b[0m 
span_attributes \u001b[39m=\u001b[39m {\u001b[39m\"\u001b[39m\u001b[39mpath\u001b[39m\u001b[39m\"\u001b[39m: path}\n\u001b[0;32m--> 693\u001b[0m api_response \u001b[39m=\u001b[39m client\u001b[39m.\u001b[39;49m_call_api(\n\u001b[1;32m 694\u001b[0m retry,\n\u001b[1;32m 695\u001b[0m span_name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mBigQuery.job.begin\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m 696\u001b[0m span_attributes\u001b[39m=\u001b[39;49mspan_attributes,\n\u001b[1;32m 697\u001b[0m job_ref\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m,\n\u001b[1;32m 698\u001b[0m method\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mPOST\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m 699\u001b[0m path\u001b[39m=\u001b[39;49mpath,\n\u001b[1;32m 700\u001b[0m data\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mto_api_repr(),\n\u001b[1;32m 701\u001b[0m timeout\u001b[39m=\u001b[39;49mtimeout,\n\u001b[1;32m 702\u001b[0m )\n\u001b[1;32m 703\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_set_properties(api_response)\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/client.py:813\u001b[0m, in \u001b[0;36mClient._call_api\u001b[0;34m(self, retry, span_name, span_attributes, job_ref, headers, **kwargs)\u001b[0m\n\u001b[1;32m 809\u001b[0m \u001b[39mif\u001b[39;00m span_name \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[1;32m 810\u001b[0m \u001b[39mwith\u001b[39;00m create_span(\n\u001b[1;32m 811\u001b[0m name\u001b[39m=\u001b[39mspan_name, attributes\u001b[39m=\u001b[39mspan_attributes, client\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m, job_ref\u001b[39m=\u001b[39mjob_ref\n\u001b[1;32m 812\u001b[0m ):\n\u001b[0;32m--> 813\u001b[0m \u001b[39mreturn\u001b[39;00m call()\n\u001b[1;32m 815\u001b[0m \u001b[39mreturn\u001b[39;00m call()\n", + "File 
\u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/retry.py:349\u001b[0m, in \u001b[0;36mRetry.__call__..retry_wrapped_func\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 345\u001b[0m target \u001b[39m=\u001b[39m functools\u001b[39m.\u001b[39mpartial(func, \u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 346\u001b[0m sleep_generator \u001b[39m=\u001b[39m exponential_sleep_generator(\n\u001b[1;32m 347\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_initial, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_maximum, multiplier\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_multiplier\n\u001b[1;32m 348\u001b[0m )\n\u001b[0;32m--> 349\u001b[0m \u001b[39mreturn\u001b[39;00m retry_target(\n\u001b[1;32m 350\u001b[0m target,\n\u001b[1;32m 351\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_predicate,\n\u001b[1;32m 352\u001b[0m sleep_generator,\n\u001b[1;32m 353\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_timeout,\n\u001b[1;32m 354\u001b[0m on_error\u001b[39m=\u001b[39;49mon_error,\n\u001b[1;32m 355\u001b[0m )\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/retry.py:191\u001b[0m, in \u001b[0;36mretry_target\u001b[0;34m(target, predicate, sleep_generator, timeout, on_error, **kwargs)\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[39mfor\u001b[39;00m sleep \u001b[39min\u001b[39;00m sleep_generator:\n\u001b[1;32m 190\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m--> 191\u001b[0m \u001b[39mreturn\u001b[39;00m target()\n\u001b[1;32m 193\u001b[0m \u001b[39m# pylint: disable=broad-except\u001b[39;00m\n\u001b[1;32m 194\u001b[0m \u001b[39m# This function explicitly must deal with broad exceptions.\u001b[39;00m\n\u001b[1;32m 195\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mException\u001b[39;00m \u001b[39mas\u001b[39;00m exc:\n", + "File 
\u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/_http/__init__.py:494\u001b[0m, in \u001b[0;36mJSONConnection.api_request\u001b[0;34m(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout, extra_api_info)\u001b[0m\n\u001b[1;32m 482\u001b[0m response \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_make_request(\n\u001b[1;32m 483\u001b[0m method\u001b[39m=\u001b[39mmethod,\n\u001b[1;32m 484\u001b[0m url\u001b[39m=\u001b[39murl,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 490\u001b[0m extra_api_info\u001b[39m=\u001b[39mextra_api_info,\n\u001b[1;32m 491\u001b[0m )\n\u001b[1;32m 493\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39m200\u001b[39m \u001b[39m<\u001b[39m\u001b[39m=\u001b[39m response\u001b[39m.\u001b[39mstatus_code \u001b[39m<\u001b[39m \u001b[39m300\u001b[39m:\n\u001b[0;32m--> 494\u001b[0m \u001b[39mraise\u001b[39;00m exceptions\u001b[39m.\u001b[39mfrom_http_response(response)\n\u001b[1;32m 496\u001b[0m \u001b[39mif\u001b[39;00m expect_json \u001b[39mand\u001b[39;00m response\u001b[39m.\u001b[39mcontent:\n\u001b[1;32m 497\u001b[0m \u001b[39mreturn\u001b[39;00m response\u001b[39m.\u001b[39mjson()\n", + "\u001b[0;31mBadRequest\u001b[0m: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/bigframes-dev/jobs?prettyPrint=false: Syntax error: Unclosed string literal at [5:104]\n\nLocation: us\nJob ID: 9b28df64-af3c-4dcc-b679-4300c3deab88\n [{'@type': 'type.googleapis.com/google.rpc.DebugInfo', 'detail': '[INVALID_INPUT] message=QUERY_ERROR: [Syntax error: Unclosed string literal at [5:104]] errorProto=code: \"QUERY_ERROR\"\\nargument: \"Syntax error: Unclosed string literal at [5:104]\"\\nlocation_type: OTHER\\nlocation: \"query\"\\n\\n\\tat com.google.cloud.helix.common.Exceptions.fromProto(Exceptions.java:2072)\\n\\tat 
com.google.cloud.helix.server.job.DremelErrorUtil.checkStatusWithDremelDetails(DremelErrorUtil.java:162)\\n\\tat com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.parseQueryUncached(GoogleSqlQueryTransformer.java:527)\\n\\tat com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.parseQuery(GoogleSqlQueryTransformer.java:511)\\n\\tat com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.validateQuery(GoogleSqlQueryTransformer.java:251)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkQuery(LocalQueryJobController.java:4331)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkInternal(LocalQueryJobController.java:4461)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkAsync(LocalQueryJobController.java:4415)\\n\\tat com.google.cloud.helix.server.job.LocalSqlJobController.checkAsync(LocalSqlJobController.java:125)\\n\\tat com.google.cloud.helix.server.job.LocalJobController.check(LocalJobController.java:1247)\\n\\tat com.google.cloud.helix.server.job.JobControllerModule$1.check(JobControllerModule.java:461)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine$1.check(JobStateMachine.java:3585)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine.dryRunJob(JobStateMachine.java:2515)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine.execute(JobStateMachine.java:2494)\\n\\tat com.google.cloud.helix.server.job.ApiJobStateChanger.execute(ApiJobStateChanger.java:33)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertNormalizedJob(HelixJobRosy.java:1998)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertJobInternal(HelixJobRosy.java:2467)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertInternal(HelixJobRosy.java:2492)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertRequestInternal(HelixJobRosy.java:3918)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insert(HelixJobRosy.java:3892)\\n\\tat 
jdk.internal.reflect.GeneratedMethodAccessor305.invoke(Unknown Source)\\n\\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)\\n\\tat java.base/java.lang.reflect.Method.invoke(Unknown Source)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$innerContinuation$3(RpcRequestProxy.java:435)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestDapperHookFactory$TracingRequestHook.call(RosyRequestDapperHookFactory.java:88)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestCredsHookFactory$1.call(RosyRequestCredsHookFactory.java:56)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestConcurrentCallsHookFactory$Hook.call(RosyRequestConcurrentCallsHookFactory.java:101)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestVarzHookFactory$Hook.call(RosyRequestVarzHookFactory.java:464)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.server.rosy.RosyRequestAuditHookFactory$1.call(RosyRequestAuditHookFactory.java:110)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RequestSecurityExtensionForGwsHookFactory$1.call(RequestSecurityExtensionForGwsHookFactory.java:69)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestSecurityContextHookFactory$1.call(RosyRequestSecurityContextHookFactory.java:80)\\n\\tat 
com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.server.rosy.RosyRequestContextHookFactory.call(RosyRequestContextHookFactory.java:58)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.invoke(RpcRequestProxy.java:666)\\n\\tat com.sun.proxy.$Proxy52.insert(Unknown Source)\\n\\tat com.google.cloud.helix.proto.proto2api.HelixJobService$ServiceParameters$1.handleRequest(HelixJobService.java:917)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInterceptor2Util$RpcApplicationHandlerAdaptor.handleRequest(RpcServerInterceptor2Util.java:82)\\n\\tat com.google.net.rpc3.impl.server.AggregatedRpcServerInterceptors.interceptRpc(AggregatedRpcServerInterceptors.java:97)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInterceptor2Util$InterceptedApplicationHandlerImpl.handleRequest(RpcServerInterceptor2Util.java:67)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.runRpcInApplicationWithCancellation(RpcServerInternalContext.java:686)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.lambda$runRpcInApplication$0(RpcServerInternalContext.java:651)\\n\\tat io.grpc.Context.run(Context.java:536)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.runRpcInApplication(RpcServerInternalContext.java:651)\\n\\tat com.google.net.rpc3.util.RpcInProcessConnector$ServerInternalContext.lambda$runWithExecutor$1(RpcInProcessConnector.java:1964)\\n\\tat com.google.common.context.ContextRunnable.runInContext(ContextRunnable.java:83)\\n\\tat io.grpc.Context.run(Context.java:536)\\n\\tat com.google.tracing.GenericContextCallback.runInInheritedContext(GenericContextCallback.java:75)\\n\\tat com.google.common.context.ContextRunnable.run(ContextRunnable.java:74)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\\n\\tat 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\\n\\tat java.base/java.lang.Thread.run(Unknown Source)\\n\\tSuppressed: java.lang.Exception: Including call stack from HelixFutures\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.getHelixException(HelixFutures.java:76)\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.get(HelixFutures.java:42)\\n\\t\\tat com.google.cloud.helix.server.job.JobStateMachine.dryRunJob(JobStateMachine.java:2514)\\n\\t\\t... 45 more\\n\\tSuppressed: java.lang.Exception: Including call stack from HelixFutures\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.getHelixException(HelixFutures.java:76)\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.get(HelixFutures.java:42)\\n\\t\\t... 41 more\\n'}]" + ] + } + ], + "source": [ + "# Send the request for PaLM 2 to generate a response to our prompt\n", + "major_difference = q_a_model.predict(df)\n", + "# PaLM 2's response is the only row in the dataframe result \n", + "major_difference[\"ml_generate_text_llm_result\"].iloc[0]" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} From dc3c4d8110b0900a0d9872f3ce68aae835c8d3fc Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Tue, 7 Nov 2023 05:49:42 +0000 Subject: [PATCH 02/26] removed cached cell output --- .../bq_dataframes_llm_kmeans.ipynb | 591 +----------------- 1 file changed, 27 insertions(+), 564 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 0ba0561b7c..eb521a182e 100644 --- 
a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -114,7 +114,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "id": "R7STCS8xB5d2" }, @@ -137,137 +137,22 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { "id": "zDSwoBo1CU3G" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job ca9487e2-aac1-466d-a74c-bf1d414b7557 is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 311d2026-8f38-4c76-a4eb-40f6a1810fd4 is DONE. 2.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "input_df = bpd.read_gbq(\"bigquery-public-data.cfpb_complaints.complaint_database\")" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": { "id": "tYDoaKgJChiq" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job e9a7abc7-6fca-4a91-a68c-8feb3ac9b942 is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 4e0125c0-bd85-4449-a9b8-a68ea3407919 is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
consumer_complaint_narrative
0Those Accounts Are Not mine, I never authorize...
11Legal Department, This credit dispute is being...
12Hello my name is XXXX XXXX, I have looked into...
15I HAVE REVIEWED MY CREDIT REPORT AND FOUND SOM...
16On my credit report these are not my items rep...
\n", - "

5 rows × 1 columns

\n", - "
[5 rows x 1 columns in total]" - ], - "text/plain": [ - " consumer_complaint_narrative\n", - "0 Those Accounts Are Not mine, I never authorize...\n", - "11 Legal Department, This credit dispute is being...\n", - "12 Hello my name is XXXX XXXX, I have looked into...\n", - "15 I HAVE REVIEWED MY CREDIT REPORT AND FOUND SOM...\n", - "16 On my credit report these are not my items rep...\n", - "\n", - "[5 rows x 1 columns]" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "issues_df = input_df[[\"consumer_complaint_narrative\"]].dropna()\n", "issues_df.head(n=5) # View the first five complaints" @@ -275,7 +160,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": { "id": "OltYSUEcsSOW" }, @@ -297,24 +182,11 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": { "id": "li38q8FzDDMu" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job 5422de4b-789d-4430-ab73-3a238d7b5238 is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "from bigframes.ml.llm import PaLM2TextEmbeddingGenerator\n", "\n", @@ -323,137 +195,11 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": { "id": "cOuSOQ5FDewD" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job 25ed7dd8-829b-4418-8f52-2ba9c5c51dec is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job fde1380f-a308-440d-a9a3-a7c3db902e0a is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job c9bec87e-524a-4206-84d6-f9f87fc12e35 is DONE. 80.0 kB processed. 
Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 0ac2dee7-1c50-4b63-aa44-16ad43265c5d is DONE. 80.0 kB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job bd615e5d-8153-45bf-a14e-e5997cbaa962 is DONE. 61.5 MB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
text_embedding
355[0.0032048337161540985, 0.018182063475251198, ...
414[-0.025085292756557465, -0.05178036540746689, ...
650[0.0020703477784991264, -0.027994778007268906,...
969[-0.009529653936624527, -0.03827650472521782, ...
1009[0.0190849881619215, -0.026688968762755394, 0....
\n", - "

5 rows × 1 columns

\n", - "
[5 rows x 1 columns in total]" - ], - "text/plain": [ - " text_embedding\n", - "355 [0.0032048337161540985, 0.018182063475251198, ...\n", - "414 [-0.025085292756557465, -0.05178036540746689, ...\n", - "650 [0.0020703477784991264, -0.027994778007268906,...\n", - "969 [-0.009529653936624527, -0.03827650472521782, ...\n", - "1009 [0.0190849881619215, -0.026688968762755394, 0....\n", - "\n", - "[5 rows x 1 columns]" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "# Will take ~5 minutes to compute the embeddings\n", "predicted_embeddings = model.predict(downsampled_issues_df)\n", @@ -485,7 +231,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": { "id": "AhNTnEC5FRz2" }, @@ -498,149 +244,11 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": { "id": "6poSxh-fGJF7" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job 803f2250-b38d-4215-8941-b668dc18c023 is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 04fe13b0-d07c-4490-ace5-7602830538f4 is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 3735b6fd-0c0c-4ad1-83b1-77c09e7c4c68 is DONE. 1.4 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 1c597324-756b-4c96-9520-966f839c3e14 is DONE. 80.0 kB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job acc3f4ab-71e1-4e51-938d-a447db70dd73 is DONE. 80.0 kB processed. 
Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 1d11a4e3-7dd3-4619-bf46-f9d842abe83a is DONE. 160.0 kB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
CENTROID_ID
3554
4142
6501
9695
10095
\n", - "

5 rows × 1 columns

\n", - "
[5 rows x 1 columns in total]" - ], - "text/plain": [ - " CENTROID_ID\n", - "355 4\n", - "414 2\n", - "650 1\n", - "969 5\n", - "1009 5\n", - "\n", - "[5 rows x 1 columns]" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "# Use KMeans clustering to calculate our groups. Will take ~5 minutes.\n", "cluster_model.fit(combined_df[[\"text_embedding\"]])\n", @@ -652,7 +260,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -672,36 +280,11 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": { "id": "2E7wXM_jGqo6" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job cf667104-32c3-4ca9-96ac-d044823096c4 is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 8088b224-bf24-4cf8-9858-c5bb47c0d3ee is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "# Using bigframes, with syntax identical to pandas,\n", "# filter out the first and second groups\n", @@ -718,46 +301,11 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": { "id": "ZNDiueI9IP5e" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "comment list 1:\n", - "1. I bought my home XX/XX/XXXX for the amount of {$220000.00}. The home was appraised closing with a value of {$260000.00} at closing. When purchasing the home I did not provide a downpayment in the amount of 20 % of the home value, therefore I had to purchase private mortgage insurance ( P.M.I. ) on the home until 20 % of the home value was paid off. 20 % of {$260000.00} is {$53000.00}. 
This means I would have to owe ( $ XXXX- {$53000.00} ) {$210000.00} or less for the P.M.I. to be taken off of my monthly mortgage payments. According to law, the lender should take the P.M.I. off of my loan once the 20 % is met. At the time of closing my borrower did not provide me a PMI disclosure form to identify when the 20 % mark would be met. \n", - "\n", - "When closing on my home my loan was thru XXXX XXXX XXXX, for the past 5+ years my loan was taken over by Wells Fargo Home Mortgage and they are my current lenders. I have never missed or been late on a mortgage payment. \n", - "\n", - "In XX/XX/XXXX I reached the 20 % mark on the value of my home. Starting XX/XX/XXXX I have owed {$210000.00} or less on my home. As of XX/XX/XXXX I owe {$190000.00}, this is far beyond owning 80 % or less of my home value. \n", - "\n", - "I have reached out to Wells Fargo to have the PMI removed from my mortgage payments, but they refuse. Wells Fargo has stated that I must pay off 22 % of the \" loan '' before PMI can be taken off, and that the 20 % is not based on the value of the home at closing. \n", - "\n", - "It was never identified to me at closing that the 20 % was based the \" value of the loan '' and not the appraised value of the home at closing. This information is new to me and in XX/XX/XXXX I do not believe this was the agreement at closing. I would like to receive some evidence that I agreed to an otherwise condition against having the PMI be based on the value of the home at closing.\n", - "2. I paid my two months mortgage amount of {$3300.00} ( XX/XX/2020 ) and {$3300.00} ( XX/XX/2020 ) to my lender ( XXXX XXXX - XXXX XXXX ). Then I also received another payment notice from TIAA Bank that said my loan was sold to them on XX/XX/2020. I have not received any Goodbye Letter from my lender, nor did my Welcome Letter from TIAA Bank. I provided my bank statement for those two payments to TIAA Bank shows the proof of payments, and never got a reply from them. 
I called XXXX XXXX and request the Goodbye letter, which indicates - The servicing of your mortgage loan is being transferred, effective XX/XX/2020. My complaint is the lack of information when the loan was transferred to one servicer to another. I am not properly informed my loan had been transferred. As a result, payments made to either the prior or current servicer around the time of the transfer were not applied to the account.\n", - "3. On XX/XX/XXXX I called Quicken Loans to inquire about Refinancing options. I was in the process of an application with another lender and I was unhappy with their terms, etc. I spoke with XXXX XXXX of Quicken who convinced me to move my business over to Quicken. He stated the refinance would only take approx 30 days to complete. I did so that day and within 2 days had completed all paperwork requirements etc. I was now waiting for the appraisal to take place. Weeks went by and i never heard from anyone. I placed numerous calls, emails, chats to different people and was told that it was a delay due to \" volume ''. Finally, on XXXX the appraisal was completed. Again, silence for days/weeks afterwards. On XXXX I called and spoke with a rep who said the appraisal was \" in hand '' and Quicken and the appraiser have been in constant contact discussing some issues in the report, specifically regarding a \" capping of a water line '' and a possible \" apartment ''. I asked to see the report and she said it will post shortly to my account. It never posted. I again never heard from anyone at Quicken. I called, I emailed, I chatted with Quicken and was repeatedly told the report is not yet finalized. They also told me there were NO ISSUES regarding the appraisal -- the delay was purely due to volume. I explained that someone already told me of a possible issue and every person i spoke to denied this fact. 
Finally, on XX/XX/XXXX the \" dashboard '' for my account with Quicken was updated ( still no word from anyone and no copy of my appraisal was provided ) and it showed a drastic change to my refinance # s. My loan amount was reduced by {$30000.00} approx. and my debt to be paid off with the loan were removed. The entire formulation of the refinance was changed without any explanation or notice to me. I called my banker, I called customer solutions, etc. again and now i was told, \" Oh, the appraisal came in low '' So, bottom line is ... .you need {$13000.00} to {$15000.00} cash to settle this loan at the new rate. '' Again, no mention of the issue with the apartment. I asked for a copy of the appraisal and was finally sent a copy on XXXX. My issue with Quicken is : 1. They took 4 weeks to send an appraiser. 2. They had the results of the appraisal since early XXXX but repeatedly lied to me stating the appraisal had never been seen and was still being created. They strung me along for 8 weeks. I lost my other offer. They \" low balled '' my appraisal with XXXX in order to \" kill my deal '' for reasons other than the apartment conditions. They did not want to take on this loan but they knew they had strung me along for 8 weeks and figured the low appraisal would be their ticket out. I have a XXXX bed XXXX bath home and comps are all running in my immediate area for $ XXXX and they came back with an appraisal of {$400000.00}. Absurd. My kitchen and bathrooms were completely remodeled. {$400000.00} is a ridiculous appraisal and they know it. I sent them XXXX recent comps in my immediate neighborhood of $ XXXX sold values. When i did that, then and only then did they say, \" well ... you have to satisfy the conditions of the appraisal also. Those conditions are ... rip out the cabinets, sink etc in lower level or obtain a C/O/permit. \" Quicken had no intention of closing on this refinance ... 
they knew that was the course they were taking in early XXXX but they chose to string me along for a total of 8 weeks. It was ONLY due to my insistence on XX/XX/XXXX that this issue be addressed that they finally showed me a copy of the appraisal. I lost my other connection/relationship and find their practices unfair and self-serving. No one should have to go through this again, hopefully. I have copies of emails, chats, etc. showing how they lied to me continuously and misled me. I hope this casts a shadow on their reputation and makes them reconsider their business practices. Thanks\n", - "4. Hi, I was looking into buying a home. I never took the step to get pre qualified for a loan because it obviously needed more thought. I would look on XXXX everyday to see if houses were within my budget and eventually I started dealing with a real estate agent. Before she can look for homes she suggested that I get prequalified for a loan with an associate of hers in that field. I started the application but never finished it just because I was unsure if I would get approved or not then come to find out the house I was interested in ( XXXX ) had gotten sold so I let the idea of buying a house go a bit. I get a message from the loan officer that she ran my credit she has a couple questions about my income. I call back instantly wondering why did you run my credit without authorization theres a reason I never finished the application I did not want a hard inquiry and I had kind of backed off. She proceeded to ask do I want to know the results and annoyed obviously I asked well you ran my credit without permission im going to have a hard inquiry on my report now. She never gave me an explanation on WHY she ran my credit & now I lost a lot of points that I work hard for. I really want a solution to this problem because its not right to just run someones credit after the application is obviously not completed.\n", - "5. 
I called on XXXX XXXX requesting options on how to lower my principal amount. Unfortunately, I went into Income dri ven Repayment program but instant of seeing a deduction of my principal, I see an increase over and over. The amount stills $ XXXX since my graduation date back in XXXX . I mentioned that I worked for th e XXXX XXXX XXXX for 8 years if any portion of my loan could be forgive, they said no. Their option was to add more funds to my payments, which is totally ridiculously. Specially, now that I found out that my position will be eliminated in XXXX XXXX . Therefore, I will be unemployed after XXXX XXXX . It saddens my heart that something that I did to better myself, pursue an XXXX had cost me such of major debt and nobody is willing to help me. I did n't said that I was n't going to pay, I was seeking for assistance on how to lowered my principal and eventually payoff my debt. The education format from XXXX is heavily criticized, all funds paid and was n't top notch education!! Please help me understand why I could n't get any positive outcome fro m Navient.\n", - "\n", - "comment list 2:\n", - "1. There is a charge on my credit report from HSBC that is over 10 years old. I have contacted the company and asked for the contract showing this is a valid debit and they have refused to send what I am asking. All they have sent me is a statement telling me this is a valid date, but no signed contract.\n", - "2. Convergent Outsourcing is attempting to collect on an account that I have no knowledge of and that I have already reported to the credit bureaus as not being my account. I have contacted them asking that they validate not verify the debt that they are attempting to collect from me and derogatorily reporting on my credit reports. 
I specifically requested signed contracts or other supporting documentation, the only thing that I keep receiving back is that the account has been verified which does not prove that I am obligated to pay them anything which is not the truth because this account does not belong to me. They have reported delinquent information to the credit bureaus since XX/XX/2016 I am asking be deleted.\n", - "3. Hi I am submitting this XXXX XXXX XXXX this isn't any influence and this is not a third party. XXXX has low and unfair credit number for me in their report. I have complained. The problem has not been resolved. my fico has me at a credit score over 719XXXX has me at a score around 590. That is a huge difference. XXXX paints me as a XXXX. my fico say I have good credit. What the heck is going on here. i have almost no debt and my identity was stolen causing my score to drop XXXX i made this clear for 60 days straight with XXXX i spoke to a representative agent name XXXX and XXXX and XXXX from the fraud department I prefer to speak to a XXXX rept but they refused they had me on mute for 4 hours which was hurtful I have a perfect repayment record. I have very low credit utilization. I have three negative credit items outstanding debt now. I have modest but ok income. Social Security. Something is wrong with XXXX. I do not understand why they are abusing consumers .This was a fist step towards attempting resolution. They kept lying telling me they disputed n its not reporting but it keep reporting this inaccurate information without my authorization. They refused or were unable to verify n remove the inquiries and its been 60days n they record the calls n admitted they had my police report n ftc and affidavit That was after attempting to contact XXXX more than 21 times. XXXX is an abusive company. They are supposed to be protecting consumers. They need to be reigned in. 
they are causing me severe XXXX and stopping me from getting this job offer XXXX now XXXX XXXX XXXX cant provide to my XXXXXXXX XXXX XXXX daughter PLEASE HELP ME PLEASE XXXX XXXX now.with no help.\n", - "4. On XX/XX/XXXX, I recieved a report from XXXX XXXX XXXX XXXX XXXX, XXXX, MD XXXX XXXX ) XXXX, which indicated a closed account from XXXX XXXX auto opened in XX/XX/XXXX, but was removed from my credit report in XXXX, due to being older than 7 years I recieved this credit alert from equifax, XXXX in XXXX that it fell off my credit report. on XX/XX/XXXX, I see it's been placed back on my credit report in XX/XX/XXXX by this agency and when I logged in to see Equifax credit report and look at my closed accounts, XXXX XXXX shows ( Closed Account ) but it's there to view and it's been there for XXXX year and XXXX months, so my complaint is why is it there it shouldn't even show, for XXXX years XXXX months they've had this on my report, I want it removed because eventhough these account show closed, they are still sending out old information that should not be reported. This is causing me to pay more and keeping my credit score down, please enforce this and make them remove any and all closed accounts. Their disclaimer even states that these accounts are removed after 7 years, it's been XXXX, they should remove all of those closed accounts that way this will not happen again, and I'm asking that they be sued because this keeps certain groups of people credit scores down and that's discrimination, its also fraudulent because on one hand their telling us that this information is not being reported it's closed yet it shows up, so they are lying.\n", - "5. -In XX/XX/XXXX, I was sent a notice to my address in Michigan by XXXX XXXX XXXX XXXX XXXX that my debt ( collected after having to live off my card due to house and joblessness ) had been sold to Portfolio Recovery Associates , LLC for {$2200.00}. 
As I had already moved to Florida, this letter was not forwarded to me in a timely manner. At this time, it showed up on my credit report as a collection debt. \n", - "\n", - "-Once I had obtained the supplemental information provided by Portfolio Recovery Associates LLC as proof of veracity of claim after disputing the collection, I was able to see the aforementioned notice, as well as a statement from Portfolio Recovery Associates containing my account number, the ( now corrected after an updated credit report request by them ) Florida address it was sent to, amount owed, contact information, etc. When I contacted Portfolio Recovery Associates, I was told they could not give me any further information because it had been transferred to be litigated. \n", - "\n", - "-Three years to the day XXXX XX/XX/XXXX XXXX and two states later, apparently a lawsuit was filed against me in an attempt to collect the debt. I was never served a summons. Once I found out about and looked up the case, I saw they had the correct address but an incorrect name, one that was corrected with the credit bureaus two months after said debt was sold to PRA. Thus, the notice of 'summons returned served ' in the court review is incorrect. Once I learned of the suit, I submitted the necessary paper work on my behalf. After that, I didn't hear from them, nor did an updated search return anything. \n", - "\n", - "-In XXXX of this year, I received a notice from the XXXX XXXX XXXX XXXX XXXX stating that the case was to be closed in a month due to lack of prosecution if no action takes place before then. Just before that dead line, the attorneys for PRA XXXX XXXX XXXX XXXX XXXX filed a motion to transfer the case to XXXX XXXX, citing that this was where they ( in fact did not ) serve the original summons. Once this was granted, they received a letter from the Clerk of Court stating they had 30 days to pay the transfer fees or the case would be dismissed. 
Two days before that due date, they submitted payment at the last minute. As of yet, I have not seen nor received anything from the XXXX XXXX XXXX XXXX regarding the matter.\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "# Build plain-text prompts to send to PaLM 2. Use only 5 complaints from each group.\n", "prompt1 = 'comment list 1:\\n'\n", @@ -776,46 +324,11 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": { "id": "BfHGJLirzSvH" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Please highlight the most obvious difference betweenthe two lists of comments:\n", - "comment list 1:\n", - "1. I bought my home XX/XX/XXXX for the amount of {$220000.00}. The home was appraised closing with a value of {$260000.00} at closing. When purchasing the home I did not provide a downpayment in the amount of 20 % of the home value, therefore I had to purchase private mortgage insurance ( P.M.I. ) on the home until 20 % of the home value was paid off. 20 % of {$260000.00} is {$53000.00}. This means I would have to owe ( $ XXXX- {$53000.00} ) {$210000.00} or less for the P.M.I. to be taken off of my monthly mortgage payments. According to law, the lender should take the P.M.I. off of my loan once the 20 % is met. At the time of closing my borrower did not provide me a PMI disclosure form to identify when the 20 % mark would be met. \n", - "\n", - "When closing on my home my loan was thru XXXX XXXX XXXX, for the past 5+ years my loan was taken over by Wells Fargo Home Mortgage and they are my current lenders. I have never missed or been late on a mortgage payment. \n", - "\n", - "In XX/XX/XXXX I reached the 20 % mark on the value of my home. Starting XX/XX/XXXX I have owed {$210000.00} or less on my home. As of XX/XX/XXXX I owe {$190000.00}, this is far beyond owning 80 % or less of my home value. 
\n", - "\n", - "I have reached out to Wells Fargo to have the PMI removed from my mortgage payments, but they refuse. Wells Fargo has stated that I must pay off 22 % of the \" loan '' before PMI can be taken off, and that the 20 % is not based on the value of the home at closing. \n", - "\n", - "It was never identified to me at closing that the 20 % was based the \" value of the loan '' and not the appraised value of the home at closing. This information is new to me and in XX/XX/XXXX I do not believe this was the agreement at closing. I would like to receive some evidence that I agreed to an otherwise condition against having the PMI be based on the value of the home at closing.\n", - "2. I paid my two months mortgage amount of {$3300.00} ( XX/XX/2020 ) and {$3300.00} ( XX/XX/2020 ) to my lender ( XXXX XXXX - XXXX XXXX ). Then I also received another payment notice from TIAA Bank that said my loan was sold to them on XX/XX/2020. I have not received any Goodbye Letter from my lender, nor did my Welcome Letter from TIAA Bank. I provided my bank statement for those two payments to TIAA Bank shows the proof of payments, and never got a reply from them. I called XXXX XXXX and request the Goodbye letter, which indicates - The servicing of your mortgage loan is being transferred, effective XX/XX/2020. My complaint is the lack of information when the loan was transferred to one servicer to another. I am not properly informed my loan had been transferred. As a result, payments made to either the prior or current servicer around the time of the transfer were not applied to the account.\n", - "3. On XX/XX/XXXX I called Quicken Loans to inquire about Refinancing options. I was in the process of an application with another lender and I was unhappy with their terms, etc. I spoke with XXXX XXXX of Quicken who convinced me to move my business over to Quicken. He stated the refinance would only take approx 30 days to complete. 
I did so that day and within 2 days had completed all paperwork requirements etc. I was now waiting for the appraisal to take place. Weeks went by and i never heard from anyone. I placed numerous calls, emails, chats to different people and was told that it was a delay due to \" volume ''. Finally, on XXXX the appraisal was completed. Again, silence for days/weeks afterwards. On XXXX I called and spoke with a rep who said the appraisal was \" in hand '' and Quicken and the appraiser have been in constant contact discussing some issues in the report, specifically regarding a \" capping of a water line '' and a possible \" apartment ''. I asked to see the report and she said it will post shortly to my account. It never posted. I again never heard from anyone at Quicken. I called, I emailed, I chatted with Quicken and was repeatedly told the report is not yet finalized. They also told me there were NO ISSUES regarding the appraisal -- the delay was purely due to volume. I explained that someone already told me of a possible issue and every person i spoke to denied this fact. Finally, on XX/XX/XXXX the \" dashboard '' for my account with Quicken was updated ( still no word from anyone and no copy of my appraisal was provided ) and it showed a drastic change to my refinance # s. My loan amount was reduced by {$30000.00} approx. and my debt to be paid off with the loan were removed. The entire formulation of the refinance was changed without any explanation or notice to me. I called my banker, I called customer solutions, etc. again and now i was told, \" Oh, the appraisal came in low '' So, bottom line is ... .you need {$13000.00} to {$15000.00} cash to settle this loan at the new rate. '' Again, no mention of the issue with the apartment. I asked for a copy of the appraisal and was finally sent a copy on XXXX. My issue with Quicken is : 1. They took 4 weeks to send an appraiser. 2. 
They had the results of the appraisal since early XXXX but repeatedly lied to me stating the appraisal had never been seen and was still being created. They strung me along for 8 weeks. I lost my other offer. They \" low balled '' my appraisal with XXXX in order to \" kill my deal '' for reasons other than the apartment conditions. They did not want to take on this loan but they knew they had strung me along for 8 weeks and figured the low appraisal would be their ticket out. I have a XXXX bed XXXX bath home and comps are all running in my immediate area for $ XXXX and they came back with an appraisal of {$400000.00}. Absurd. My kitchen and bathrooms were completely remodeled. {$400000.00} is a ridiculous appraisal and they know it. I sent them XXXX recent comps in my immediate neighborhood of $ XXXX sold values. When i did that, then and only then did they say, \" well ... you have to satisfy the conditions of the appraisal also. Those conditions are ... rip out the cabinets, sink etc in lower level or obtain a C/O/permit. \" Quicken had no intention of closing on this refinance ... they knew that was the course they were taking in early XXXX but they chose to string me along for a total of 8 weeks. It was ONLY due to my insistence on XX/XX/XXXX that this issue be addressed that they finally showed me a copy of the appraisal. I lost my other connection/relationship and find their practices unfair and self-serving. No one should have to go through this again, hopefully. I have copies of emails, chats, etc. showing how they lied to me continuously and misled me. I hope this casts a shadow on their reputation and makes them reconsider their business practices. Thanks\n", - "4. Hi, I was looking into buying a home. I never took the step to get pre qualified for a loan because it obviously needed more thought. I would look on XXXX everyday to see if houses were within my budget and eventually I started dealing with a real estate agent. 
Before she can look for homes she suggested that I get prequalified for a loan with an associate of hers in that field. I started the application but never finished it just because I was unsure if I would get approved or not then come to find out the house I was interested in ( XXXX ) had gotten sold so I let the idea of buying a house go a bit. I get a message from the loan officer that she ran my credit she has a couple questions about my income. I call back instantly wondering why did you run my credit without authorization theres a reason I never finished the application I did not want a hard inquiry and I had kind of backed off. She proceeded to ask do I want to know the results and annoyed obviously I asked well you ran my credit without permission im going to have a hard inquiry on my report now. She never gave me an explanation on WHY she ran my credit & now I lost a lot of points that I work hard for. I really want a solution to this problem because its not right to just run someones credit after the application is obviously not completed.\n", - "5. I called on XXXX XXXX requesting options on how to lower my principal amount. Unfortunately, I went into Income dri ven Repayment program but instant of seeing a deduction of my principal, I see an increase over and over. The amount stills $ XXXX since my graduation date back in XXXX . I mentioned that I worked for th e XXXX XXXX XXXX for 8 years if any portion of my loan could be forgive, they said no. Their option was to add more funds to my payments, which is totally ridiculously. Specially, now that I found out that my position will be eliminated in XXXX XXXX . Therefore, I will be unemployed after XXXX XXXX . It saddens my heart that something that I did to better myself, pursue an XXXX had cost me such of major debt and nobody is willing to help me. I did n't said that I was n't going to pay, I was seeking for assistance on how to lowered my principal and eventually payoff my debt. 
The education format from XXXX is heavily criticized, all funds paid and was n't top notch education!! Please help me understand why I could n't get any positive outcome fro m Navient.\n", - "comment list 2:\n", - "1. There is a charge on my credit report from HSBC that is over 10 years old. I have contacted the company and asked for the contract showing this is a valid debit and they have refused to send what I am asking. All they have sent me is a statement telling me this is a valid date, but no signed contract.\n", - "2. Convergent Outsourcing is attempting to collect on an account that I have no knowledge of and that I have already reported to the credit bureaus as not being my account. I have contacted them asking that they validate not verify the debt that they are attempting to collect from me and derogatorily reporting on my credit reports. I specifically requested signed contracts or other supporting documentation, the only thing that I keep receiving back is that the account has been verified which does not prove that I am obligated to pay them anything which is not the truth because this account does not belong to me. They have reported delinquent information to the credit bureaus since XX/XX/2016 I am asking be deleted.\n", - "3. Hi I am submitting this XXXX XXXX XXXX this isn't any influence and this is not a third party. XXXX has low and unfair credit number for me in their report. I have complained. The problem has not been resolved. my fico has me at a credit score over 719XXXX has me at a score around 590. That is a huge difference. XXXX paints me as a XXXX. my fico say I have good credit. What the heck is going on here. 
i have almost no debt and my identity was stolen causing my score to drop XXXX i made this clear for 60 days straight with XXXX i spoke to a representative agent name XXXX and XXXX and XXXX from the fraud department I prefer to speak to a XXXX rept but they refused they had me on mute for 4 hours which was hurtful I have a perfect repayment record. I have very low credit utilization. I have three negative credit items outstanding debt now. I have modest but ok income. Social Security. Something is wrong with XXXX. I do not understand why they are abusing consumers .This was a fist step towards attempting resolution. They kept lying telling me they disputed n its not reporting but it keep reporting this inaccurate information without my authorization. They refused or were unable to verify n remove the inquiries and its been 60days n they record the calls n admitted they had my police report n ftc and affidavit That was after attempting to contact XXXX more than 21 times. XXXX is an abusive company. They are supposed to be protecting consumers. They need to be reigned in. they are causing me severe XXXX and stopping me from getting this job offer XXXX now XXXX XXXX XXXX cant provide to my XXXXXXXX XXXX XXXX daughter PLEASE HELP ME PLEASE XXXX XXXX now.with no help.\n", - "4. On XX/XX/XXXX, I recieved a report from XXXX XXXX XXXX XXXX XXXX, XXXX, MD XXXX XXXX ) XXXX, which indicated a closed account from XXXX XXXX auto opened in XX/XX/XXXX, but was removed from my credit report in XXXX, due to being older than 7 years I recieved this credit alert from equifax, XXXX in XXXX that it fell off my credit report. 
on XX/XX/XXXX, I see it's been placed back on my credit report in XX/XX/XXXX by this agency and when I logged in to see Equifax credit report and look at my closed accounts, XXXX XXXX shows ( Closed Account ) but it's there to view and it's been there for XXXX year and XXXX months, so my complaint is why is it there it shouldn't even show, for XXXX years XXXX months they've had this on my report, I want it removed because eventhough these account show closed, they are still sending out old information that should not be reported. This is causing me to pay more and keeping my credit score down, please enforce this and make them remove any and all closed accounts. Their disclaimer even states that these accounts are removed after 7 years, it's been XXXX, they should remove all of those closed accounts that way this will not happen again, and I'm asking that they be sued because this keeps certain groups of people credit scores down and that's discrimination, its also fraudulent because on one hand their telling us that this information is not being reported it's closed yet it shows up, so they are lying.\n", - "5. -In XX/XX/XXXX, I was sent a notice to my address in Michigan by XXXX XXXX XXXX XXXX XXXX that my debt ( collected after having to live off my card due to house and joblessness ) had been sold to Portfolio Recovery Associates , LLC for {$2200.00}. As I had already moved to Florida, this letter was not forwarded to me in a timely manner. At this time, it showed up on my credit report as a collection debt. \n", - "\n", - "-Once I had obtained the supplemental information provided by Portfolio Recovery Associates LLC as proof of veracity of claim after disputing the collection, I was able to see the aforementioned notice, as well as a statement from Portfolio Recovery Associates containing my account number, the ( now corrected after an updated credit report request by them ) Florida address it was sent to, amount owed, contact information, etc. 
When I contacted Portfolio Recovery Associates, I was told they could not give me any further information because it had been transferred to be litigated. \n", - "\n", - "-Three years to the day XXXX XX/XX/XXXX XXXX and two states later, apparently a lawsuit was filed against me in an attempt to collect the debt. I was never served a summons. Once I found out about and looked up the case, I saw they had the correct address but an incorrect name, one that was corrected with the credit bureaus two months after said debt was sold to PRA. Thus, the notice of 'summons returned served ' in the court review is incorrect. Once I learned of the suit, I submitted the necessary paper work on my behalf. After that, I didn't hear from them, nor did an updated search return anything. \n", - "\n", - "-In XXXX of this year, I received a notice from the XXXX XXXX XXXX XXXX XXXX stating that the case was to be closed in a month due to lack of prosecution if no action takes place before then. Just before that dead line, the attorneys for PRA XXXX XXXX XXXX XXXX XXXX filed a motion to transfer the case to XXXX XXXX, citing that this was where they ( in fact did not ) serve the original summons. Once this was granted, they received a letter from the Clerk of Court stating they had 30 days to pay the transfer fees or the case would be dismissed. Two days before that due date, they submitted payment at the last minute. As of yet, I have not seen nor received anything from the XXXX XXXX XXXX XXXX regarding the matter.\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "# The plain English request we will make of PaLM 2\n", "prompt = (\n", @@ -827,24 +340,11 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": { "id": "mL5P0_3X04dE" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job 67a85808-9741-4ffa-9ac5-677a558bb5d7 is DONE. 0 Bytes processed. 
Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "from bigframes.ml.llm import PaLM2TextGenerator\n", "\n", @@ -853,20 +353,11 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": { "id": "ICWHsqAW1FNk" }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/google/home/henryjsolberg/bq/src/bigframes/venv/lib/python3.9/site-packages/pyarrow/pandas_compat.py:373: FutureWarning: is_sparse is deprecated and will be removed in a future version. Check `isinstance(dtype, pd.SparseDtype)` instead.\n", - " if _pandas_api.is_sparse(col):\n" - ] - } - ], + "outputs": [], "source": [ "# Make a DataFrame containing only a single row with our prompt for PaLM 2\n", "df = bpd.DataFrame({\"prompt\": [prompt]})" @@ -874,39 +365,11 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": { "id": "gB7e1LXU1pst" }, - "outputs": [ - { - "ename": "BadRequest", - "evalue": "400 POST https://bigquery.googleapis.com/bigquery/v2/projects/bigframes-dev/jobs?prettyPrint=false: Syntax error: Unclosed string literal at [5:104]\n\nLocation: us\nJob ID: 9b28df64-af3c-4dcc-b679-4300c3deab88\n [{'@type': 'type.googleapis.com/google.rpc.DebugInfo', 'detail': '[INVALID_INPUT] message=QUERY_ERROR: [Syntax error: Unclosed string literal at [5:104]] errorProto=code: \"QUERY_ERROR\"\\nargument: \"Syntax error: Unclosed string literal at [5:104]\"\\nlocation_type: OTHER\\nlocation: \"query\"\\n\\n\\tat com.google.cloud.helix.common.Exceptions.fromProto(Exceptions.java:2072)\\n\\tat com.google.cloud.helix.server.job.DremelErrorUtil.checkStatusWithDremelDetails(DremelErrorUtil.java:162)\\n\\tat com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.parseQueryUncached(GoogleSqlQueryTransformer.java:527)\\n\\tat 
com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.parseQuery(GoogleSqlQueryTransformer.java:511)\\n\\tat com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.validateQuery(GoogleSqlQueryTransformer.java:251)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkQuery(LocalQueryJobController.java:4331)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkInternal(LocalQueryJobController.java:4461)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkAsync(LocalQueryJobController.java:4415)\\n\\tat com.google.cloud.helix.server.job.LocalSqlJobController.checkAsync(LocalSqlJobController.java:125)\\n\\tat com.google.cloud.helix.server.job.LocalJobController.check(LocalJobController.java:1247)\\n\\tat com.google.cloud.helix.server.job.JobControllerModule$1.check(JobControllerModule.java:461)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine$1.check(JobStateMachine.java:3585)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine.dryRunJob(JobStateMachine.java:2515)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine.execute(JobStateMachine.java:2494)\\n\\tat com.google.cloud.helix.server.job.ApiJobStateChanger.execute(ApiJobStateChanger.java:33)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertNormalizedJob(HelixJobRosy.java:1998)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertJobInternal(HelixJobRosy.java:2467)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertInternal(HelixJobRosy.java:2492)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertRequestInternal(HelixJobRosy.java:3918)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insert(HelixJobRosy.java:3892)\\n\\tat jdk.internal.reflect.GeneratedMethodAccessor305.invoke(Unknown Source)\\n\\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)\\n\\tat java.base/java.lang.reflect.Method.invoke(Unknown Source)\\n\\tat 
com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$innerContinuation$3(RpcRequestProxy.java:435)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestDapperHookFactory$TracingRequestHook.call(RosyRequestDapperHookFactory.java:88)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestCredsHookFactory$1.call(RosyRequestCredsHookFactory.java:56)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestConcurrentCallsHookFactory$Hook.call(RosyRequestConcurrentCallsHookFactory.java:101)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestVarzHookFactory$Hook.call(RosyRequestVarzHookFactory.java:464)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.server.rosy.RosyRequestAuditHookFactory$1.call(RosyRequestAuditHookFactory.java:110)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RequestSecurityExtensionForGwsHookFactory$1.call(RequestSecurityExtensionForGwsHookFactory.java:69)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestSecurityContextHookFactory$1.call(RosyRequestSecurityContextHookFactory.java:80)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.server.rosy.RosyRequestContextHookFactory.call(RosyRequestContextHookFactory.java:58)\\n\\tat 
com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.invoke(RpcRequestProxy.java:666)\\n\\tat com.sun.proxy.$Proxy52.insert(Unknown Source)\\n\\tat com.google.cloud.helix.proto.proto2api.HelixJobService$ServiceParameters$1.handleRequest(HelixJobService.java:917)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInterceptor2Util$RpcApplicationHandlerAdaptor.handleRequest(RpcServerInterceptor2Util.java:82)\\n\\tat com.google.net.rpc3.impl.server.AggregatedRpcServerInterceptors.interceptRpc(AggregatedRpcServerInterceptors.java:97)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInterceptor2Util$InterceptedApplicationHandlerImpl.handleRequest(RpcServerInterceptor2Util.java:67)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.runRpcInApplicationWithCancellation(RpcServerInternalContext.java:686)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.lambda$runRpcInApplication$0(RpcServerInternalContext.java:651)\\n\\tat io.grpc.Context.run(Context.java:536)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.runRpcInApplication(RpcServerInternalContext.java:651)\\n\\tat com.google.net.rpc3.util.RpcInProcessConnector$ServerInternalContext.lambda$runWithExecutor$1(RpcInProcessConnector.java:1964)\\n\\tat com.google.common.context.ContextRunnable.runInContext(ContextRunnable.java:83)\\n\\tat io.grpc.Context.run(Context.java:536)\\n\\tat com.google.tracing.GenericContextCallback.runInInheritedContext(GenericContextCallback.java:75)\\n\\tat com.google.common.context.ContextRunnable.run(ContextRunnable.java:74)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\\n\\tat java.base/java.lang.Thread.run(Unknown Source)\\n\\tSuppressed: java.lang.Exception: Including call stack from HelixFutures\\n\\t\\tat 
com.google.cloud.helix.common.HelixFutures.getHelixException(HelixFutures.java:76)\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.get(HelixFutures.java:42)\\n\\t\\tat com.google.cloud.helix.server.job.JobStateMachine.dryRunJob(JobStateMachine.java:2514)\\n\\t\\t... 45 more\\n\\tSuppressed: java.lang.Exception: Including call stack from HelixFutures\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.getHelixException(HelixFutures.java:76)\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.get(HelixFutures.java:42)\\n\\t\\t... 41 more\\n'}]", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mBadRequest\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[19], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m major_difference \u001b[39m=\u001b[39m q_a_model\u001b[39m.\u001b[39;49mpredict(df)\n\u001b[1;32m 2\u001b[0m major_difference\n\u001b[1;32m 3\u001b[0m \u001b[39m#major_difference[\"ml_generate_text_llm_result\"].iloc[0]\u001b[39;00m\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/ml/llm.py:178\u001b[0m, in \u001b[0;36mPaLM2TextGenerator.predict\u001b[0;34m(self, X, temperature, max_output_tokens, top_k, top_p)\u001b[0m\n\u001b[1;32m 169\u001b[0m X \u001b[39m=\u001b[39m X\u001b[39m.\u001b[39mrename(columns\u001b[39m=\u001b[39m{col_label: \u001b[39m\"\u001b[39m\u001b[39mprompt\u001b[39m\u001b[39m\"\u001b[39m})\n\u001b[1;32m 171\u001b[0m options \u001b[39m=\u001b[39m {\n\u001b[1;32m 172\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mtemperature\u001b[39m\u001b[39m\"\u001b[39m: temperature,\n\u001b[1;32m 173\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mmax_output_tokens\u001b[39m\u001b[39m\"\u001b[39m: max_output_tokens,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 176\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mflatten_json_output\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39mTrue\u001b[39;00m,\n\u001b[1;32m 177\u001b[0m }\n\u001b[0;32m--> 
178\u001b[0m df \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_bqml_model\u001b[39m.\u001b[39;49mgenerate_text(X, options)\n\u001b[1;32m 179\u001b[0m \u001b[39mreturn\u001b[39;00m cast(\n\u001b[1;32m 180\u001b[0m bpd\u001b[39m.\u001b[39mDataFrame,\n\u001b[1;32m 181\u001b[0m df[[_TEXT_GENERATE_RESULT_COLUMN]],\n\u001b[1;32m 182\u001b[0m )\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/ml/core.py:105\u001b[0m, in \u001b[0;36mBqmlModel.generate_text\u001b[0;34m(self, input_data, options)\u001b[0m\n\u001b[1;32m 99\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mgenerate_text\u001b[39m(\n\u001b[1;32m 100\u001b[0m \u001b[39mself\u001b[39m,\n\u001b[1;32m 101\u001b[0m input_data: bpd\u001b[39m.\u001b[39mDataFrame,\n\u001b[1;32m 102\u001b[0m options: Mapping[\u001b[39mstr\u001b[39m, \u001b[39mint\u001b[39m \u001b[39m|\u001b[39m \u001b[39mfloat\u001b[39m],\n\u001b[1;32m 103\u001b[0m ) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m bpd\u001b[39m.\u001b[39mDataFrame:\n\u001b[1;32m 104\u001b[0m \u001b[39m# TODO: validate input data schema\u001b[39;00m\n\u001b[0;32m--> 105\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_apply_sql(\n\u001b[1;32m 106\u001b[0m input_data,\n\u001b[1;32m 107\u001b[0m \u001b[39mlambda\u001b[39;49;00m source_df: \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_model_manipulation_sql_generator\u001b[39m.\u001b[39;49mml_generate_text(\n\u001b[1;32m 108\u001b[0m source_df\u001b[39m=\u001b[39;49msource_df,\n\u001b[1;32m 109\u001b[0m struct_options\u001b[39m=\u001b[39;49moptions,\n\u001b[1;32m 110\u001b[0m ),\n\u001b[1;32m 111\u001b[0m )\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/ml/core.py:80\u001b[0m, in \u001b[0;36mBqmlModel._apply_sql\u001b[0;34m(self, input_data, func)\u001b[0m\n\u001b[1;32m 77\u001b[0m _, index_col_ids, index_labels \u001b[39m=\u001b[39m 
input_data\u001b[39m.\u001b[39m_to_sql_query(include_index\u001b[39m=\u001b[39m\u001b[39mTrue\u001b[39;00m)\n\u001b[1;32m 79\u001b[0m sql \u001b[39m=\u001b[39m func(input_data)\n\u001b[0;32m---> 80\u001b[0m df \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_session\u001b[39m.\u001b[39;49mread_gbq(sql, index_col\u001b[39m=\u001b[39;49mindex_col_ids)\n\u001b[1;32m 81\u001b[0m df\u001b[39m.\u001b[39mindex\u001b[39m.\u001b[39mnames \u001b[39m=\u001b[39m index_labels\n\u001b[1;32m 83\u001b[0m \u001b[39mreturn\u001b[39;00m df\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/session/__init__.py:290\u001b[0m, in \u001b[0;36mSession.read_gbq\u001b[0;34m(self, query_or_table, index_col, col_order, max_results)\u001b[0m\n\u001b[1;32m 279\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mread_gbq\u001b[39m(\n\u001b[1;32m 280\u001b[0m \u001b[39mself\u001b[39m,\n\u001b[1;32m 281\u001b[0m query_or_table: \u001b[39mstr\u001b[39m,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 287\u001b[0m ) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m dataframe\u001b[39m.\u001b[39mDataFrame:\n\u001b[1;32m 288\u001b[0m \u001b[39m# TODO(b/281571214): Generate prompt to show the progress of read_gbq.\u001b[39;00m\n\u001b[1;32m 289\u001b[0m \u001b[39mif\u001b[39;00m _is_query(query_or_table):\n\u001b[0;32m--> 290\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_read_gbq_query(\n\u001b[1;32m 291\u001b[0m query_or_table,\n\u001b[1;32m 292\u001b[0m index_col\u001b[39m=\u001b[39;49mindex_col,\n\u001b[1;32m 293\u001b[0m col_order\u001b[39m=\u001b[39;49mcol_order,\n\u001b[1;32m 294\u001b[0m max_results\u001b[39m=\u001b[39;49mmax_results,\n\u001b[1;32m 295\u001b[0m api_name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mread_gbq\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m 296\u001b[0m )\n\u001b[1;32m 297\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 298\u001b[0m \u001b[39m# TODO(swast): Query the snapshot table but mark 
it as a\u001b[39;00m\n\u001b[1;32m 299\u001b[0m \u001b[39m# deterministic query so we can avoid serializing if we have a\u001b[39;00m\n\u001b[1;32m 300\u001b[0m \u001b[39m# unique index.\u001b[39;00m\n\u001b[1;32m 301\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_read_gbq_table(\n\u001b[1;32m 302\u001b[0m query_or_table,\n\u001b[1;32m 303\u001b[0m index_col\u001b[39m=\u001b[39mindex_col,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 306\u001b[0m api_name\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mread_gbq\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m 307\u001b[0m )\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/session/__init__.py:432\u001b[0m, in \u001b[0;36mSession._read_gbq_query\u001b[0;34m(self, query, index_col, col_order, max_results, api_name)\u001b[0m\n\u001b[1;32m 429\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 430\u001b[0m index_cols \u001b[39m=\u001b[39m \u001b[39mlist\u001b[39m(index_col)\n\u001b[0;32m--> 432\u001b[0m destination, query_job \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_query_to_destination(\n\u001b[1;32m 433\u001b[0m query,\n\u001b[1;32m 434\u001b[0m index_cols,\n\u001b[1;32m 435\u001b[0m api_name\u001b[39m=\u001b[39;49mapi_name,\n\u001b[1;32m 436\u001b[0m )\n\u001b[1;32m 438\u001b[0m \u001b[39m# If there was no destination table, that means the query must have\u001b[39;00m\n\u001b[1;32m 439\u001b[0m \u001b[39m# been DDL or DML. 
Return some job metadata, instead.\u001b[39;00m\n\u001b[1;32m 440\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m destination:\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/session/__init__.py:319\u001b[0m, in \u001b[0;36mSession._query_to_destination\u001b[0;34m(self, query, index_cols, api_name)\u001b[0m\n\u001b[1;32m 317\u001b[0m dry_run_config \u001b[39m=\u001b[39m bigquery\u001b[39m.\u001b[39mQueryJobConfig()\n\u001b[1;32m 318\u001b[0m dry_run_config\u001b[39m.\u001b[39mdry_run \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 319\u001b[0m _, dry_run_job \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_start_query(query, job_config\u001b[39m=\u001b[39;49mdry_run_config)\n\u001b[1;32m 320\u001b[0m \u001b[39mif\u001b[39;00m dry_run_job\u001b[39m.\u001b[39mstatement_type \u001b[39m!=\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mSELECT\u001b[39m\u001b[39m\"\u001b[39m:\n\u001b[1;32m 321\u001b[0m _, query_job \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_start_query(query)\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/session/__init__.py:1523\u001b[0m, in \u001b[0;36mSession._start_query\u001b[0;34m(self, sql, job_config, max_results)\u001b[0m\n\u001b[1;32m 1519\u001b[0m \u001b[39m\u001b[39m\u001b[39m\"\"\"\u001b[39;00m\n\u001b[1;32m 1520\u001b[0m \u001b[39mStarts query job and waits for results.\u001b[39;00m\n\u001b[1;32m 1521\u001b[0m \u001b[39m\"\"\"\u001b[39;00m\n\u001b[1;32m 1522\u001b[0m job_config \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_prepare_job_config(job_config)\n\u001b[0;32m-> 1523\u001b[0m query_job \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mbqclient\u001b[39m.\u001b[39;49mquery(sql, job_config\u001b[39m=\u001b[39;49mjob_config)\n\u001b[1;32m 1525\u001b[0m opts \u001b[39m=\u001b[39m bigframes\u001b[39m.\u001b[39moptions\u001b[39m.\u001b[39mdisplay\n\u001b[1;32m 1526\u001b[0m \u001b[39mif\u001b[39;00m 
opts\u001b[39m.\u001b[39mprogress_bar \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m \u001b[39mand\u001b[39;00m \u001b[39mnot\u001b[39;00m query_job\u001b[39m.\u001b[39mconfiguration\u001b[39m.\u001b[39mdry_run:\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/client.py:3403\u001b[0m, in \u001b[0;36mClient.query\u001b[0;34m(self, query, job_config, job_id, job_id_prefix, location, project, retry, timeout, job_retry, api_method)\u001b[0m\n\u001b[1;32m 3392\u001b[0m \u001b[39mreturn\u001b[39;00m _job_helpers\u001b[39m.\u001b[39mquery_jobs_query(\n\u001b[1;32m 3393\u001b[0m \u001b[39mself\u001b[39m,\n\u001b[1;32m 3394\u001b[0m query,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 3400\u001b[0m job_retry,\n\u001b[1;32m 3401\u001b[0m )\n\u001b[1;32m 3402\u001b[0m \u001b[39melif\u001b[39;00m api_method \u001b[39m==\u001b[39m enums\u001b[39m.\u001b[39mQueryApiMethod\u001b[39m.\u001b[39mINSERT:\n\u001b[0;32m-> 3403\u001b[0m \u001b[39mreturn\u001b[39;00m _job_helpers\u001b[39m.\u001b[39;49mquery_jobs_insert(\n\u001b[1;32m 3404\u001b[0m \u001b[39mself\u001b[39;49m,\n\u001b[1;32m 3405\u001b[0m query,\n\u001b[1;32m 3406\u001b[0m job_config,\n\u001b[1;32m 3407\u001b[0m job_id,\n\u001b[1;32m 3408\u001b[0m job_id_prefix,\n\u001b[1;32m 3409\u001b[0m location,\n\u001b[1;32m 3410\u001b[0m project,\n\u001b[1;32m 3411\u001b[0m retry,\n\u001b[1;32m 3412\u001b[0m timeout,\n\u001b[1;32m 3413\u001b[0m job_retry,\n\u001b[1;32m 3414\u001b[0m )\n\u001b[1;32m 3415\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 3416\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mGot unexpected value for api_method: \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mrepr\u001b[39m(api_method)\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m)\n", - "File 
\u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/_job_helpers.py:114\u001b[0m, in \u001b[0;36mquery_jobs_insert\u001b[0;34m(client, query, job_config, job_id, job_id_prefix, location, project, retry, timeout, job_retry)\u001b[0m\n\u001b[1;32m 111\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m 112\u001b[0m \u001b[39mreturn\u001b[39;00m query_job\n\u001b[0;32m--> 114\u001b[0m future \u001b[39m=\u001b[39m do_query()\n\u001b[1;32m 115\u001b[0m \u001b[39m# The future might be in a failed state now, but if it's\u001b[39;00m\n\u001b[1;32m 116\u001b[0m \u001b[39m# unrecoverable, we'll find out when we ask for it's result, at which\u001b[39;00m\n\u001b[1;32m 117\u001b[0m \u001b[39m# point, we may retry.\u001b[39;00m\n\u001b[1;32m 118\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m job_id_given:\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/_job_helpers.py:91\u001b[0m, in \u001b[0;36mquery_jobs_insert..do_query\u001b[0;34m()\u001b[0m\n\u001b[1;32m 88\u001b[0m query_job \u001b[39m=\u001b[39m job\u001b[39m.\u001b[39mQueryJob(job_ref, query, client\u001b[39m=\u001b[39mclient, job_config\u001b[39m=\u001b[39mjob_config)\n\u001b[1;32m 90\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m---> 91\u001b[0m query_job\u001b[39m.\u001b[39;49m_begin(retry\u001b[39m=\u001b[39;49mretry, timeout\u001b[39m=\u001b[39;49mtimeout)\n\u001b[1;32m 92\u001b[0m \u001b[39mexcept\u001b[39;00m core_exceptions\u001b[39m.\u001b[39mConflict \u001b[39mas\u001b[39;00m create_exc:\n\u001b[1;32m 93\u001b[0m \u001b[39m# The thought is if someone is providing their own job IDs and they get\u001b[39;00m\n\u001b[1;32m 94\u001b[0m \u001b[39m# their job ID generation wrong, this could end up returning results for\u001b[39;00m\n\u001b[1;32m 95\u001b[0m \u001b[39m# the wrong query. 
We thus only try to recover if job ID was not given.\u001b[39;00m\n\u001b[1;32m 96\u001b[0m \u001b[39mif\u001b[39;00m job_id_given:\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/job/query.py:1310\u001b[0m, in \u001b[0;36mQueryJob._begin\u001b[0;34m(self, client, retry, timeout)\u001b[0m\n\u001b[1;32m 1290\u001b[0m \u001b[39m\u001b[39m\u001b[39m\"\"\"API call: begin the job via a POST request\u001b[39;00m\n\u001b[1;32m 1291\u001b[0m \n\u001b[1;32m 1292\u001b[0m \u001b[39mSee\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1306\u001b[0m \u001b[39m ValueError: If the job has already begun.\u001b[39;00m\n\u001b[1;32m 1307\u001b[0m \u001b[39m\"\"\"\u001b[39;00m\n\u001b[1;32m 1309\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m-> 1310\u001b[0m \u001b[39msuper\u001b[39;49m(QueryJob, \u001b[39mself\u001b[39;49m)\u001b[39m.\u001b[39;49m_begin(client\u001b[39m=\u001b[39;49mclient, retry\u001b[39m=\u001b[39;49mretry, timeout\u001b[39m=\u001b[39;49mtimeout)\n\u001b[1;32m 1311\u001b[0m \u001b[39mexcept\u001b[39;00m exceptions\u001b[39m.\u001b[39mGoogleAPICallError \u001b[39mas\u001b[39;00m exc:\n\u001b[1;32m 1312\u001b[0m exc\u001b[39m.\u001b[39mmessage \u001b[39m=\u001b[39m _EXCEPTION_FOOTER_TEMPLATE\u001b[39m.\u001b[39mformat(\n\u001b[1;32m 1313\u001b[0m message\u001b[39m=\u001b[39mexc\u001b[39m.\u001b[39mmessage, location\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mlocation, job_id\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mjob_id\n\u001b[1;32m 1314\u001b[0m )\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py:693\u001b[0m, in \u001b[0;36m_AsyncJob._begin\u001b[0;34m(self, client, retry, timeout)\u001b[0m\n\u001b[1;32m 690\u001b[0m \u001b[39m# jobs.insert is idempotent because we ensure that every new\u001b[39;00m\n\u001b[1;32m 691\u001b[0m \u001b[39m# job has an ID.\u001b[39;00m\n\u001b[1;32m 692\u001b[0m 
span_attributes \u001b[39m=\u001b[39m {\u001b[39m\"\u001b[39m\u001b[39mpath\u001b[39m\u001b[39m\"\u001b[39m: path}\n\u001b[0;32m--> 693\u001b[0m api_response \u001b[39m=\u001b[39m client\u001b[39m.\u001b[39;49m_call_api(\n\u001b[1;32m 694\u001b[0m retry,\n\u001b[1;32m 695\u001b[0m span_name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mBigQuery.job.begin\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m 696\u001b[0m span_attributes\u001b[39m=\u001b[39;49mspan_attributes,\n\u001b[1;32m 697\u001b[0m job_ref\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m,\n\u001b[1;32m 698\u001b[0m method\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mPOST\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m 699\u001b[0m path\u001b[39m=\u001b[39;49mpath,\n\u001b[1;32m 700\u001b[0m data\u001b[39m=\u001b[39;49m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mto_api_repr(),\n\u001b[1;32m 701\u001b[0m timeout\u001b[39m=\u001b[39;49mtimeout,\n\u001b[1;32m 702\u001b[0m )\n\u001b[1;32m 703\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_set_properties(api_response)\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/bigquery/client.py:813\u001b[0m, in \u001b[0;36mClient._call_api\u001b[0;34m(self, retry, span_name, span_attributes, job_ref, headers, **kwargs)\u001b[0m\n\u001b[1;32m 809\u001b[0m \u001b[39mif\u001b[39;00m span_name \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m:\n\u001b[1;32m 810\u001b[0m \u001b[39mwith\u001b[39;00m create_span(\n\u001b[1;32m 811\u001b[0m name\u001b[39m=\u001b[39mspan_name, attributes\u001b[39m=\u001b[39mspan_attributes, client\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m, job_ref\u001b[39m=\u001b[39mjob_ref\n\u001b[1;32m 812\u001b[0m ):\n\u001b[0;32m--> 813\u001b[0m \u001b[39mreturn\u001b[39;00m call()\n\u001b[1;32m 815\u001b[0m \u001b[39mreturn\u001b[39;00m call()\n", - "File 
\u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/retry.py:349\u001b[0m, in \u001b[0;36mRetry.__call__..retry_wrapped_func\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 345\u001b[0m target \u001b[39m=\u001b[39m functools\u001b[39m.\u001b[39mpartial(func, \u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 346\u001b[0m sleep_generator \u001b[39m=\u001b[39m exponential_sleep_generator(\n\u001b[1;32m 347\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_initial, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_maximum, multiplier\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_multiplier\n\u001b[1;32m 348\u001b[0m )\n\u001b[0;32m--> 349\u001b[0m \u001b[39mreturn\u001b[39;00m retry_target(\n\u001b[1;32m 350\u001b[0m target,\n\u001b[1;32m 351\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_predicate,\n\u001b[1;32m 352\u001b[0m sleep_generator,\n\u001b[1;32m 353\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_timeout,\n\u001b[1;32m 354\u001b[0m on_error\u001b[39m=\u001b[39;49mon_error,\n\u001b[1;32m 355\u001b[0m )\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/retry.py:191\u001b[0m, in \u001b[0;36mretry_target\u001b[0;34m(target, predicate, sleep_generator, timeout, on_error, **kwargs)\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[39mfor\u001b[39;00m sleep \u001b[39min\u001b[39;00m sleep_generator:\n\u001b[1;32m 190\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m--> 191\u001b[0m \u001b[39mreturn\u001b[39;00m target()\n\u001b[1;32m 193\u001b[0m \u001b[39m# pylint: disable=broad-except\u001b[39;00m\n\u001b[1;32m 194\u001b[0m \u001b[39m# This function explicitly must deal with broad exceptions.\u001b[39;00m\n\u001b[1;32m 195\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mException\u001b[39;00m \u001b[39mas\u001b[39;00m exc:\n", - "File 
\u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/_http/__init__.py:494\u001b[0m, in \u001b[0;36mJSONConnection.api_request\u001b[0;34m(self, method, path, query_params, data, content_type, headers, api_base_url, api_version, expect_json, _target_object, timeout, extra_api_info)\u001b[0m\n\u001b[1;32m 482\u001b[0m response \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_make_request(\n\u001b[1;32m 483\u001b[0m method\u001b[39m=\u001b[39mmethod,\n\u001b[1;32m 484\u001b[0m url\u001b[39m=\u001b[39murl,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 490\u001b[0m extra_api_info\u001b[39m=\u001b[39mextra_api_info,\n\u001b[1;32m 491\u001b[0m )\n\u001b[1;32m 493\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39m200\u001b[39m \u001b[39m<\u001b[39m\u001b[39m=\u001b[39m response\u001b[39m.\u001b[39mstatus_code \u001b[39m<\u001b[39m \u001b[39m300\u001b[39m:\n\u001b[0;32m--> 494\u001b[0m \u001b[39mraise\u001b[39;00m exceptions\u001b[39m.\u001b[39mfrom_http_response(response)\n\u001b[1;32m 496\u001b[0m \u001b[39mif\u001b[39;00m expect_json \u001b[39mand\u001b[39;00m response\u001b[39m.\u001b[39mcontent:\n\u001b[1;32m 497\u001b[0m \u001b[39mreturn\u001b[39;00m response\u001b[39m.\u001b[39mjson()\n", - "\u001b[0;31mBadRequest\u001b[0m: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/bigframes-dev/jobs?prettyPrint=false: Syntax error: Unclosed string literal at [5:104]\n\nLocation: us\nJob ID: 9b28df64-af3c-4dcc-b679-4300c3deab88\n [{'@type': 'type.googleapis.com/google.rpc.DebugInfo', 'detail': '[INVALID_INPUT] message=QUERY_ERROR: [Syntax error: Unclosed string literal at [5:104]] errorProto=code: \"QUERY_ERROR\"\\nargument: \"Syntax error: Unclosed string literal at [5:104]\"\\nlocation_type: OTHER\\nlocation: \"query\"\\n\\n\\tat com.google.cloud.helix.common.Exceptions.fromProto(Exceptions.java:2072)\\n\\tat 
com.google.cloud.helix.server.job.DremelErrorUtil.checkStatusWithDremelDetails(DremelErrorUtil.java:162)\\n\\tat com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.parseQueryUncached(GoogleSqlQueryTransformer.java:527)\\n\\tat com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.parseQuery(GoogleSqlQueryTransformer.java:511)\\n\\tat com.google.cloud.helix.server.job.GoogleSqlQueryTransformer.validateQuery(GoogleSqlQueryTransformer.java:251)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkQuery(LocalQueryJobController.java:4331)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkInternal(LocalQueryJobController.java:4461)\\n\\tat com.google.cloud.helix.server.job.LocalQueryJobController.checkAsync(LocalQueryJobController.java:4415)\\n\\tat com.google.cloud.helix.server.job.LocalSqlJobController.checkAsync(LocalSqlJobController.java:125)\\n\\tat com.google.cloud.helix.server.job.LocalJobController.check(LocalJobController.java:1247)\\n\\tat com.google.cloud.helix.server.job.JobControllerModule$1.check(JobControllerModule.java:461)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine$1.check(JobStateMachine.java:3585)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine.dryRunJob(JobStateMachine.java:2515)\\n\\tat com.google.cloud.helix.server.job.JobStateMachine.execute(JobStateMachine.java:2494)\\n\\tat com.google.cloud.helix.server.job.ApiJobStateChanger.execute(ApiJobStateChanger.java:33)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertNormalizedJob(HelixJobRosy.java:1998)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertJobInternal(HelixJobRosy.java:2467)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertInternal(HelixJobRosy.java:2492)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insertRequestInternal(HelixJobRosy.java:3918)\\n\\tat com.google.cloud.helix.server.job.rosy.HelixJobRosy.insert(HelixJobRosy.java:3892)\\n\\tat 
jdk.internal.reflect.GeneratedMethodAccessor305.invoke(Unknown Source)\\n\\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)\\n\\tat java.base/java.lang.reflect.Method.invoke(Unknown Source)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$innerContinuation$3(RpcRequestProxy.java:435)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestDapperHookFactory$TracingRequestHook.call(RosyRequestDapperHookFactory.java:88)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestCredsHookFactory$1.call(RosyRequestCredsHookFactory.java:56)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestConcurrentCallsHookFactory$Hook.call(RosyRequestConcurrentCallsHookFactory.java:101)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestVarzHookFactory$Hook.call(RosyRequestVarzHookFactory.java:464)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.server.rosy.RosyRequestAuditHookFactory$1.call(RosyRequestAuditHookFactory.java:110)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RequestSecurityExtensionForGwsHookFactory$1.call(RequestSecurityExtensionForGwsHookFactory.java:69)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RosyRequestSecurityContextHookFactory$1.call(RosyRequestSecurityContextHookFactory.java:80)\\n\\tat 
com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.server.rosy.RosyRequestContextHookFactory.call(RosyRequestContextHookFactory.java:58)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.lambda$makeContinuation$4(RpcRequestProxy.java:461)\\n\\tat com.google.cloud.helix.common.rosy.RpcRequestProxy.invoke(RpcRequestProxy.java:666)\\n\\tat com.sun.proxy.$Proxy52.insert(Unknown Source)\\n\\tat com.google.cloud.helix.proto.proto2api.HelixJobService$ServiceParameters$1.handleRequest(HelixJobService.java:917)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInterceptor2Util$RpcApplicationHandlerAdaptor.handleRequest(RpcServerInterceptor2Util.java:82)\\n\\tat com.google.net.rpc3.impl.server.AggregatedRpcServerInterceptors.interceptRpc(AggregatedRpcServerInterceptors.java:97)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInterceptor2Util$InterceptedApplicationHandlerImpl.handleRequest(RpcServerInterceptor2Util.java:67)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.runRpcInApplicationWithCancellation(RpcServerInternalContext.java:686)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.lambda$runRpcInApplication$0(RpcServerInternalContext.java:651)\\n\\tat io.grpc.Context.run(Context.java:536)\\n\\tat com.google.net.rpc3.impl.server.RpcServerInternalContext.runRpcInApplication(RpcServerInternalContext.java:651)\\n\\tat com.google.net.rpc3.util.RpcInProcessConnector$ServerInternalContext.lambda$runWithExecutor$1(RpcInProcessConnector.java:1964)\\n\\tat com.google.common.context.ContextRunnable.runInContext(ContextRunnable.java:83)\\n\\tat io.grpc.Context.run(Context.java:536)\\n\\tat com.google.tracing.GenericContextCallback.runInInheritedContext(GenericContextCallback.java:75)\\n\\tat com.google.common.context.ContextRunnable.run(ContextRunnable.java:74)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\\n\\tat 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\\n\\tat java.base/java.lang.Thread.run(Unknown Source)\\n\\tSuppressed: java.lang.Exception: Including call stack from HelixFutures\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.getHelixException(HelixFutures.java:76)\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.get(HelixFutures.java:42)\\n\\t\\tat com.google.cloud.helix.server.job.JobStateMachine.dryRunJob(JobStateMachine.java:2514)\\n\\t\\t... 45 more\\n\\tSuppressed: java.lang.Exception: Including call stack from HelixFutures\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.getHelixException(HelixFutures.java:76)\\n\\t\\tat com.google.cloud.helix.common.HelixFutures.get(HelixFutures.java:42)\\n\\t\\t... 41 more\\n'}]" - ] - } - ], + "outputs": [], "source": [ "# Send the request for PaLM 2 to generate a response to our prompt\n", "major_difference = q_a_model.predict(df)\n", From 07493c7af76fdfc51f7f31d26c9b1dccd3b3b133 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Tue, 7 Nov 2023 06:04:05 +0000 Subject: [PATCH 03/26] Add more subtitles --- .../bq_dataframes_llm_kmeans.ipynb | 26 ++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index eb521a182e..f5b6cc1ccd 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -109,7 +109,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Project Setup" + "Project Setup" ] }, { @@ -242,6 +242,14 @@ "cluster_model = KMeans(n_clusters=10) # We will divide our complaints into 10 groups" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Perform KMeans clustering" + ] + }, { "cell_type": "code", "execution_count": null, @@ -278,6 +286,14 @@ "## Step 3: Summarize the complaints" ] }, + { + "attachments": {}, 
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Build prompts"
+ ]
+ },
 {
 "cell_type": "code",
 "execution_count": null,
@@ -338,6 +354,14 @@
 "print(prompt)"
 ]
 },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Get a response from PaLM 2 LLM"
+ ]
+ },
 {
 "cell_type": "code",
 "execution_count": null,

From 66bae52e76f7e01cdd2947b54295fa99a53de1af Mon Sep 17 00:00:00 2001
From: Henry J Solberg
Date: Tue, 7 Nov 2023 06:07:18 +0000
Subject: [PATCH 04/26] Improve one comment

---
 notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb
index f5b6cc1ccd..2151f5d8d2 100644
--- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb
+++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb
@@ -166,7 +166,7 @@
 },
 "outputs": [],
 "source": [
- "# Choose 10,000 complaints randomly\n",
+ "# Choose 10,000 complaints randomly and store them in a column in a DataFrame\n",
 "downsampled_issues_df = issues_df.sample(n=10000)"
 ]
 },

From 124b1460b84d31d052e5bf7c8477899befe7aabb Mon Sep 17 00:00:00 2001
From: Henry J Solberg
Date: Tue, 7 Nov 2023 17:28:56 +0000
Subject: [PATCH 05/26] code formatting fix for one cell

---
 notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb
index 2151f5d8d2..2ea8713ee1 100644
--- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb
+++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb
@@ -195,7 +195,7 @@
 },
 {
 "cell_type": "code",
- "execution_count": null,
+ "execution_count": 7,
 "metadata": {
 "id": "cOuSOQ5FDewD"
 },
@@ -305,13 +305,13 @@
 "# Using bigframes, with syntax identical to pandas,\n",
 "# filter out the first and second
groups\n", "cluster_1_result = combined_clustered_result[\n", - " combined_clustered_result[\"CENTROID_ID\"] == 1][[\"consumer_complaint_narrative\"]\n", - "]\n", + " combined_clustered_result[\"CENTROID_ID\"] == 1\n", + "][[\"consumer_complaint_narrative\"]]\n", "cluster_1_result_pandas = cluster_1_result.head(5).to_pandas()\n", "\n", "cluster_2_result = combined_clustered_result[\n", - " combined_clustered_result[\"CENTROID_ID\"] == 2][[\"consumer_complaint_narrative\"]\n", - "]\n", + " combined_clustered_result[\"CENTROID_ID\"] == 2\n", + "][[\"consumer_complaint_narrative\"]]\n", "cluster_2_result_pandas = cluster_2_result.head(5).to_pandas()" ] }, From f32cce2bc180524406b09044b81a43620d0af208 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Tue, 7 Nov 2023 18:33:17 +0000 Subject: [PATCH 06/26] add some debug asserts --- .../bq_dataframes_llm_kmeans.ipynb | 73 ++++++++++++++++++- 1 file changed, 70 insertions(+), 3 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 2ea8713ee1..4f4bbab384 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -231,15 +231,82 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 18, "metadata": { "id": "AhNTnEC5FRz2" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 910407c4-0edf-44d9-a6ea-c05a52e858fd is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job b2cb73f8-c8d4-4eab-9a83-a6f00617577e is DONE. 1.4 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 204dbe3e-7044-4d96-a6e8-9323d9059e29 is DONE. 1.4 GB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 23e88166-2913-4409-b4b5-b29198530cd0 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job d7ebebd7-bd23-4fd5-8289-6a62ac30d0b6 is DONE. 1.4 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "from bigframes.ml.cluster import KMeans\n", "\n", - "cluster_model = KMeans(n_clusters=10) # We will divide our complaints into 10 groups" + "cluster_model = KMeans(n_clusters=10) # We will divide our complaints into 10 groups\n", + "\n", + "assert len(combined_df[\"text_embedding\"].iloc[0]) == 768\n", + "assert len(combined_df[\"text_embedding\"].iloc[10]) == 768\n", + "assert len(combined_df[\"text_embedding\"].iloc[100]) == 768\n", + "assert len(combined_df[\"text_embedding\"].iloc[234]) == 768\n", + "assert len(combined_df[\"text_embedding\"].iloc[9999]) == 768" ] }, { From 392488cdf059f355bf770034004b8fbd4d0391f6 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 01:27:56 +0000 Subject: [PATCH 07/26] some fixes --- .../bq_dataframes_llm_kmeans.ipynb | 55 +++++++++++++++++-- 1 file changed, 49 insertions(+), 6 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 4f4bbab384..89d56c73ae 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -61,7 +61,7 @@ "\n", "1. Use PaLM2TextEmbeddingGenerator to [generate text embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) for each of 10000 complaints sent to an online bank. 
If you're not familiar with what a text embedding is, it's a list of numbers that are like coordinates in an imaginary \"meaning space\" for sentences. (It's like [word embeddings](https://en.wikipedia.org/wiki/Word_embedding), but for more general text.) The important point for our purposes is that similar sentences are close to each other in this imaginary space.\n", "2. Use KMeans clustering to group together complaints whose text embeddings are near to eachother. This will give us sets of similar complaints, but we don't yet know _why_ these complaints are similar.\n", - "3. Simply ask PaLM2TextGenerator in English what the difference is between the groups of complaints that we got. Thanks to the power of modern LLMs, the response might give us a very good idea of what these complaints are all about, but remember to [\"understand the limits of your dataset and model.\"](https://ai.google/responsibility/responsible-ai-practices/#:~:text=Understand%20the%20limitations%20of%20your%20dataset%20and%20model)\n", + "3. Prompt PaLM2TextGenerator in English asking what the difference is between the groups of complaints that we got. Thanks to the power of modern LLMs, the response might give us a very good idea of what these complaints are all about, but remember to [\"understand the limits of your dataset and model.\"](https://ai.google/responsibility/responsible-ai-practices/#:~:text=Understand%20the%20limitations%20of%20your%20dataset%20and%20model)\n", "\n", "We will tie these pieces together in Python using BigQuery DataFrames. [Click here](https://cloud.google.com/bigquery/docs/dataframes-quickstart) to learn more about BigQuery DataFrames!" 
] @@ -87,13 +87,51 @@ "\n", "* BigQuery (compute)\n", "* BigQuery ML\n", + "* Generative AI support on Vertex AI\n", "\n", - "Learn about [BigQuery compute pricing](https://cloud.google.com/bigquery/pricing#analysis_pricing_models),\n", + "Learn about [BigQuery compute pricing](https://cloud.google.com/bigquery/pricing#analysis_pricing_models), [Generative AI support on Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing#generative_ai_models),\n", "and [BigQuery ML pricing](https://cloud.google.com/bigquery/pricing#bqml),\n", "and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)\n", "to generate a cost estimate based on your projected usage." ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Before you begin\n", + "\n", + "Complete the tasks in this section to set up your environment." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set up your Google Cloud project\n", + "\n", + "**The following steps are required, regardless of your notebook environment.**\n", + "\n", + "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 credit towards your compute/storage costs.\n", + "\n", + "2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", + "\n", + "3. 
[Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,cloudfunctions.googleapis.com,run.googleapis.com,artifactregistry.googleapis.com,cloudbuild.googleapis.com,cloudresourcemanager.googleapis.com) to enable the following APIs:\n", + "\n", + " * BigQuery API\n", + " * BigQuery Connection API\n", + " * Cloud Functions API\n", + " * Cloud Run API\n", + " * Artifact Registry API\n", + " * Cloud Build API\n", + " * Cloud Resource Manager API\n", + " * Vertex AI API\n", + "\n", + "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)." + ] + }, { "attachments": {}, "cell_type": "markdown", @@ -120,10 +158,7 @@ }, "outputs": [], "source": [ - "import bigframes.pandas as bpd\n", - "\n", - "bpd.options.bigquery.project = \"bigframes-dev\"\n", - "bpd.options.bigquery.location = \"us\"" + "import bigframes.pandas as bpd" ] }, { @@ -219,6 +254,14 @@ "combined_df = downsampled_issues_df.join(predicted_embeddings)" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We now have the complaints and their text embeddings as two columns in our combined_df. Recall that complaints with numerically similar text embeddings should have similar meanings semantically. We will now group similar complaints together." 
+ ] + }, { "attachments": {}, "cell_type": "markdown", From 0b283588a21b7b1a88a060d5d124e1f30eeaa5bf Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 01:32:16 +0000 Subject: [PATCH 08/26] add PROJECT_ID --- .../bq_dataframes_llm_kmeans.ipynb | 37 ++++++++++++++----- 1 file changed, 28 insertions(+), 9 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 89d56c73ae..1089056f64 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -118,20 +118,36 @@ "\n", "2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", "\n", - "3. [Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,cloudfunctions.googleapis.com,run.googleapis.com,artifactregistry.googleapis.com,cloudbuild.googleapis.com,cloudresourcemanager.googleapis.com) to enable the following APIs:\n", + "3. [Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com) to enable the following APIs:\n", "\n", " * BigQuery API\n", " * BigQuery Connection API\n", - " * Cloud Functions API\n", - " * Cloud Run API\n", - " * Artifact Registry API\n", - " * Cloud Build API\n", - " * Cloud Resource Manager API\n", - " * Vertex AI API\n", - "\n", + " \n", "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)." 
] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Set your project ID\n", + "\n", + "**If you don't know your project ID**, try the following:\n", + "* Run `gcloud config list`.\n", + "* Run `gcloud projects list`.\n", + "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "PROJECT_ID = \"\" # @param {type:\"string\"}" + ] + }, { "attachments": {}, "cell_type": "markdown", @@ -158,7 +174,10 @@ }, "outputs": [], "source": [ - "import bigframes.pandas as bpd" + "import bigframes.pandas as bpd\n", + "\n", + "bpd.options.bigquery.project = PROJECT_ID\n", + "bpd.options.bigquery.location = \"us\"" ] }, { From 58bb925d376cb219610db0f11abbed16f168a72f Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 01:42:45 +0000 Subject: [PATCH 09/26] Add incomplete cleaning up step --- .../bq_dataframes_llm_kmeans.ipynb | 42 +++++++++++++++---- 1 file changed, 35 insertions(+), 7 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 1089056f64..84e5f09d81 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -131,12 +131,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Set your project ID\n", + "#### Set your project ID and location\n", "\n", - "**If you don't know your project ID**, try the following:\n", - "* Run `gcloud config list`.\n", - "* Run `gcloud projects list`.\n", - "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)" + "**If you don't know your project ID**, see the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)" ] }, { @@ -145,7 +142,8 @@ "metadata": {}, "outputs": [], 
"source": [ - "PROJECT_ID = \"\" # @param {type:\"string\"}" + "PROJECT_ID = \"\" # @param {type:\"string\"}\n", + "LOCATION = \"us\" # or your project location" ] }, { @@ -177,7 +175,7 @@ "import bigframes.pandas as bpd\n", "\n", "bpd.options.bigquery.project = PROJECT_ID\n", - "bpd.options.bigquery.location = \"us\"" + "bpd.options.bigquery.location = LOCATION" ] }, { @@ -529,6 +527,36 @@ "# PaLM 2's response is the only row in the dataframe result \n", "major_difference[\"ml_generate_text_llm_result\"].iloc[0]" ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We now see PaLM2TextGenerator's characterization of the different comment groups. Thanks for using BigQuery DataFrames!" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleaning up\n", + "\n", + "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", + "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", + "\n", + "Otherwise, you can uncomment the remaining cells and run them to delete the individual resources you created in this tutorial:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO" + ] } ], "metadata": { From e9ae860b708a5e797d41edad795a4252fec1985f Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 01:44:05 +0000 Subject: [PATCH 10/26] use bigframes-dev project id as default --- notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 84e5f09d81..d0ed01b940 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -142,7 +142,7 @@ 
"metadata": {}, "outputs": [], "source": [ - "PROJECT_ID = \"\" # @param {type:\"string\"}\n", + "PROJECT_ID = \"bigframes-dev\" # your project name goes here\n", "LOCATION = \"us\" # or your project location" ] }, From 0a0b096d8c8c102a043a682ab790a0e8a066c3e1 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 01:44:59 +0000 Subject: [PATCH 11/26] clear all outputs --- .../bq_dataframes_llm_kmeans.ipynb | 67 +------------------ 1 file changed, 3 insertions(+), 64 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index d0ed01b940..471120f310 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -247,7 +247,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": { "id": "cOuSOQ5FDewD" }, @@ -291,72 +291,11 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": { "id": "AhNTnEC5FRz2" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job 910407c4-0edf-44d9-a6ea-c05a52e858fd is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job b2cb73f8-c8d4-4eab-9a83-a6f00617577e is DONE. 1.4 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 204dbe3e-7044-4d96-a6e8-9323d9059e29 is DONE. 1.4 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 23e88166-2913-4409-b4b5-b29198530cd0 is DONE. 0 Bytes processed. 
Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job d7ebebd7-bd23-4fd5-8289-6a62ac30d0b6 is DONE. 1.4 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "from bigframes.ml.cluster import KMeans\n", "\n", From fe43376666d7e3f04df63cab4ba9b72afea6c8f9 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 01:47:14 +0000 Subject: [PATCH 12/26] change set up md cell --- notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 471120f310..dcd65a745e 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -118,11 +118,8 @@ "\n", "2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", "\n", - "3. [Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com) to enable the following APIs:\n", + "3. [Enable the BigQuery API](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com).\n", "\n", - " * BigQuery API\n", - " * BigQuery Connection API\n", - " \n", "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)." 
] }, From c9c63c1871822850f40d44306f51aef13f8d181b Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 01:49:42 +0000 Subject: [PATCH 13/26] add set the region --- .../bq_dataframes_llm_kmeans.ipynb | 25 ++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index dcd65a745e..674cc0c340 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -128,7 +128,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Set your project ID and location\n", + "#### Set your project ID\n", "\n", "**If you don't know your project ID**, see the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)" ] @@ -139,8 +139,27 @@ "metadata": {}, "outputs": [], "source": [ - "PROJECT_ID = \"bigframes-dev\" # your project name goes here\n", - "LOCATION = \"us\" # or your project location" + "# set your project ID below\n", + "PROJECT_ID = \"\" # @param {type:\"string\"}" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Set the region\n", + "\n", + "You can also change the `REGION` variable used by BigQuery. Learn more about [BigQuery regions](https://cloud.google.com/bigquery/docs/locations#supported_locations)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "REGION = \"US\" # @param {type: \"string\"}" ] }, { From 80dd62be7ba323bbb6f96a1a785ece9ac4b726cf Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 02:14:07 +0000 Subject: [PATCH 14/26] add another narrative md cell --- .../generative_ai/bq_dataframes_llm_kmeans.ipynb | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 674cc0c340..7c223204f4 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -315,13 +315,7 @@ "source": [ "from bigframes.ml.cluster import KMeans\n", "\n", - "cluster_model = KMeans(n_clusters=10) # We will divide our complaints into 10 groups\n", - "\n", - "assert len(combined_df[\"text_embedding\"].iloc[0]) == 768\n", - "assert len(combined_df[\"text_embedding\"].iloc[10]) == 768\n", - "assert len(combined_df[\"text_embedding\"].iloc[100]) == 768\n", - "assert len(combined_df[\"text_embedding\"].iloc[234]) == 768\n", - "assert len(combined_df[\"text_embedding\"].iloc[9999]) == 768" + "cluster_model = KMeans(n_clusters=10) # We will divide our complaints into 10 groups" ] }, { @@ -358,6 +352,14 @@ "combined_clustered_result = combined_df.join(clustered_result)" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our dataframe combined_clustered_result now has three columns: the complaints, their text embeddings, and an ID from 1-10 (inclusive) indicating which semantically similar group they belong to." 
+ ] + }, { "attachments": {}, "cell_type": "markdown", From 299800fe9847131a353fae26713438cf1931eb55 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 04:12:08 +0000 Subject: [PATCH 15/26] add connection and cleanup --- .../bq_dataframes_llm_kmeans.ipynb | 157 +++++++++++++++++- 1 file changed, 153 insertions(+), 4 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 7c223204f4..1cae24d37c 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -140,7 +140,10 @@ "outputs": [], "source": [ "# set your project ID below\n", - "PROJECT_ID = \"\" # @param {type:\"string\"}" + "PROJECT_ID = \"\" # @param {type:\"string\"}\n", + "\n", + "# Set the project id in gcloud\n", + "! gcloud config set project {PROJECT_ID}" ] }, { @@ -162,6 +165,146 @@ "REGION = \"US\" # @param {type: \"string\"}" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Authenticate your Google Cloud account\n", + "\n", + "Depending on your Jupyter environment, you might have to manually authenticate. Follow the relevant instructions below." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Vertex AI Workbench**\n", + "\n", + "Do nothing, you are already authenticated." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Local JupyterLab instance**\n", + "\n", + "Uncomment and run the following cell:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# ! 
gcloud auth login" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Colab**\n", + "\n", + "Uncomment and run the following cell:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# from google.colab import auth\n", + "# auth.authenticate_user()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you want to reset the location of the created DataFrame or Series objects, reset the session by executing `bf.close_session()`. After that, you can reuse `bf.options.bigquery.location` to specify another location." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Connect to Vertex AI\n", + "\n", + "In order to use PaLM2TextGenerator, we will need to set up a [cloud resource connection](https://cloud.google.com/bigquery/docs/create-cloud-resource-connection)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from google.cloud import bigquery_connection_v1 as bq_connection\n", + "\n", + "CONN_NAME = \"bqdf-llm\"\n", + "\n", + "client = bq_connection.ConnectionServiceClient()\n", + "new_conn_parent = f\"projects/{PROJECT_ID}/locations/{REGION}\"\n", + "exists_conn_parent = f\"projects/{PROJECT_ID}/locations/{REGION}/connections/{CONN_NAME}\"\n", + "cloud_resource_properties = bq_connection.CloudResourceProperties({})\n", + "\n", + "try:\n", + " request = client.get_connection(\n", + " request=bq_connection.GetConnectionRequest(name=exists_conn_parent)\n", + " )\n", + " CONN_SERVICE_ACCOUNT = f\"serviceAccount:{request.cloud_resource.service_account_id}\"\n", + "except Exception:\n", + " connection = bq_connection.types.Connection(\n", + " {\"friendly_name\": CONN_NAME, \"cloud_resource\": cloud_resource_properties}\n", + " )\n", + " request = bq_connection.CreateConnectionRequest(\n", + " {\n", + " 
\"parent\": new_conn_parent,\n", + " \"connection_id\": CONN_NAME,\n", + " \"connection\": connection,\n", + " }\n", + " )\n", + " response = client.create_connection(request)\n", + " CONN_SERVICE_ACCOUNT = (\n", + " f\"serviceAccount:{response.cloud_resource.service_account_id}\"\n", + " )\n", + "print(CONN_SERVICE_ACCOUNT)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set permissions for the service account\n", + "\n", + "The resource connection service account requires certain project-level permissions:\n", + " - `roles/aiplatform.user` and `roles/bigquery.connectionUser`: These roles are required for the connection to create a model definition using the LLM model in Vertex AI ([documentation](https://cloud.google.com/bigquery/docs/generate-text#give_the_service_account_access)).\n", + " - `roles/run.invoker`: This role is required for the connection to have read-only access to Cloud Run services that back custom/remote functions ([documentation](https://cloud.google.com/bigquery/docs/remote-functions#grant_permission_on_function)).\n", + "\n", + "Set these permissions by running the following `gcloud` commands:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/bigquery.connectionUser'\n", + "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/aiplatform.user'\n", + "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'" + ] + }, { "attachments": {}, "cell_type": "markdown", @@ -191,7 +334,7 @@ "import bigframes.pandas as bpd\n", "\n", "bpd.options.bigquery.project = PROJECT_ID\n", - "bpd.options.bigquery.location = 
LOCATION" + "bpd.options.bigquery.location = REGION" ] }, { @@ -456,7 +599,8 @@ "source": [ "from bigframes.ml.llm import PaLM2TextGenerator\n", "\n", - "q_a_model = PaLM2TextGenerator(connection_name=\"bigframes-dev.us.bigframes-ml\")" + "connection = f\"{PROJECT_ID}.{REGION}.{CONN_NAME}\"\n", + "q_a_model = PaLM2TextGenerator(connection_name=connection)" ] }, { @@ -512,7 +656,12 @@ "metadata": {}, "outputs": [], "source": [ - "# TODO" + "# # Delete the BigQuery Connection\n", + "# from google.cloud import bigquery_connection_v1 as bq_connection\n", + "# client = bq_connection.ConnectionServiceClient()\n", + "# CONNECTION_ID = f\"projects/{PROJECT_ID}/locations/{REGION}/connections/{CONN_NAME}\"\n", + "# client.delete_connection(name=CONNECTION_ID)\n", + "# print(f\"Deleted connection '{CONNECTION_ID}'.\")" ] } ], From ba3c0b310c6c62a41f93b2f5ad72e288c1868693 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 04:15:48 +0000 Subject: [PATCH 16/26] add encouraging md cell --- .../bq_dataframes_llm_kmeans.ipynb | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 1cae24d37c..484f12d9ba 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -118,7 +118,16 @@ "\n", "2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", "\n", - "3. [Enable the BigQuery API](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com).\n", + "3. 
[Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,cloudfunctions.googleapis.com,run.googleapis.com,artifactregistry.googleapis.com,cloudbuild.googleapis.com,cloudresourcemanager.googleapis.com) to enable the following APIs:\n", + "\n", + " * BigQuery API\n", + " * BigQuery Connection API\n", + " * Cloud Functions API\n", + " * Cloud Run API\n", + " * Artifact Registry API\n", + " * Cloud Build API\n", + " * Cloud Resource Manager API\n", + " * Vertex AI API\n", "\n", "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)." ] @@ -305,6 +314,14 @@ "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we are ready to use BigQuery DataFrames!" + ] + }, { "attachments": {}, "cell_type": "markdown", From 606826a66c80a9d6b7ccb6e93e3927b91c932a1c Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 05:24:25 +0000 Subject: [PATCH 17/26] use bigframes-dev as default PROJECT_ID --- notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 484f12d9ba..c2e10e17cd 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -149,7 +149,7 @@ "outputs": [], "source": [ "# set your project ID below\n", - "PROJECT_ID = \"\" # @param {type:\"string\"}\n", + "PROJECT_ID = \"bigframes-dev\" # @param {type:\"string\"}\n", "\n", "# Set the project id in gcloud\n", "! 
gcloud config set project {PROJECT_ID}" From bbbf66d98120c6e41f67a967ad327a224856f2b4 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 05:30:11 +0000 Subject: [PATCH 18/26] reduce number of complaints to 5000 --- notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index c2e10e17cd..0b5bfc0271 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -59,7 +59,7 @@ "\n", "The goal of this notebook is to demonstrate a comment characterization algorithm for an online business. We will accomplish this using [Google's PaLM 2](https://ai.google/discover/palm2/) and [KMeans clustering](https://en.wikipedia.org/wiki/K-means_clustering) in three steps:\n", "\n", - "1. Use PaLM2TextEmbeddingGenerator to [generate text embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) for each of 10000 complaints sent to an online bank. If you're not familiar with what a text embedding is, it's a list of numbers that are like coordinates in an imaginary \"meaning space\" for sentences. (It's like [word embeddings](https://en.wikipedia.org/wiki/Word_embedding), but for more general text.) The important point for our purposes is that similar sentences are close to each other in this imaginary space.\n", + "1. Use PaLM2TextEmbeddingGenerator to [generate text embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) for each of 5000 complaints sent to an online bank. If you're not familiar with what a text embedding is, it's a list of numbers that are like coordinates in an imaginary \"meaning space\" for sentences. (It's like [word embeddings](https://en.wikipedia.org/wiki/Word_embedding), but for more general text.) 
The important point for our purposes is that similar sentences are close to each other in this imaginary space.\n", "2. Use KMeans clustering to group together complaints whose text embeddings are near to each other. This will give us sets of similar complaints, but we don't yet know _why_ these complaints are similar.\n", "3. Prompt PaLM2TextGenerator in English, asking what the difference is between the groups of complaints that we got. Thanks to the power of modern LLMs, the response might give us a very good idea of what these complaints are all about, but remember to [\"understand the limits of your dataset and model.\"](https://ai.google/responsibility/responsible-ai-practices/#:~:text=Understand%20the%20limitations%20of%20your%20dataset%20and%20model)\n", "\n", @@ -394,8 +394,8 @@ }, "outputs": [], "source": [ - "# Choose 10,000 complaints randomly and store them in a column in a DataFrame\n", - "downsampled_issues_df = issues_df.sample(n=10000)" + "# Choose 5,000 complaints randomly and store them in a column in a DataFrame\n", + "downsampled_issues_df = issues_df.sample(n=5000)" ] }, { @@ -429,7 +429,7 @@ }, "outputs": [], "source": [ - "# Will take ~5 minutes to compute the embeddings\n", + "# Will take ~3 minutes to compute the embeddings\n", "predicted_embeddings = model.predict(downsampled_issues_df)\n", "# Notice the lists of numbers that are our text embeddings for each complaint\n", "predicted_embeddings.head() " ] }, @@ -494,7 +494,7 @@ }, "outputs": [], "source": [ - "# Use KMeans clustering to calculate our groups. Will take ~5 minutes.\n", + "# Use KMeans clustering to calculate our groups. 
Will take ~3 minutes.\n", "cluster_model.fit(combined_df[[\"text_embedding\"]])\n", "clustered_result = cluster_model.predict(combined_df[[\"text_embedding\"]])\n", "# Notice the CENTROID_ID column, which is the ID number of the group that\n", From 39d888d039f73f0111d71ecb88576df52fa794eb Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 16:23:35 +0000 Subject: [PATCH 19/26] add session to PaLM2TextGenerator --- notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 0b5bfc0271..5379f3c7f2 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -616,8 +616,9 @@ "source": [ "from bigframes.ml.llm import PaLM2TextGenerator\n", "\n", + "session = bf.get_global_session()\n", "connection = f\"{PROJECT_ID}.{REGION}.{CONN_NAME}\"\n", - "q_a_model = PaLM2TextGenerator(connection_name=connection)" + "q_a_model = PaLM2TextGenerator(session=session, connection_name=connection)" ] }, { From a0556ccb9869f26d59b1c5162cd162ec256e72ef Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 17:01:03 +0000 Subject: [PATCH 20/26] bpd -> bf --- notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 5379f3c7f2..50a0cf85dd 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -348,10 +348,10 @@ }, "outputs": [], "source": [ - "import bigframes.pandas as bpd\n", + "import bigframes.pandas as bf\n", "\n", - "bpd.options.bigquery.project = PROJECT_ID\n", - "bpd.options.bigquery.location = REGION" + "bf.options.bigquery.project = 
PROJECT_ID\n", + "bf.options.bigquery.location = REGION" ] }, { @@ -371,7 +371,7 @@ }, "outputs": [], "source": [ - "input_df = bpd.read_gbq(\"bigquery-public-data.cfpb_complaints.complaint_database\")" + "input_df = bf.read_gbq(\"bigquery-public-data.cfpb_complaints.complaint_database\")" ] }, { @@ -630,7 +630,7 @@ "outputs": [], "source": [ "# Make a DataFrame containing only a single row with our prompt for PaLM 2\n", - "df = bpd.DataFrame({\"prompt\": [prompt]})" + "df = bf.DataFrame({\"prompt\": [prompt]})" ] }, { From 578b38727e90df65dc4cbd89d190e9095e36281b Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 18:02:50 +0000 Subject: [PATCH 21/26] add resourcemanager.projects.setIamPolicy --- .../generative_ai/bq_dataframes_llm_code_generation.ipynb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb b/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb index 0f113b84c6..5cc9207561 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb @@ -444,7 +444,8 @@ "source": [ "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/bigquery.connectionUser'\n", "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/aiplatform.user'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'" + "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'\n", + "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} 
--role='roles/resourcemanager.projects.setIamPolicy'" ] }, { From 613cb20c5655b69d44b08f6f6ef1474117ed7f66 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 19:08:47 +0000 Subject: [PATCH 22/26] remove cleanup step --- .../bq_dataframes_llm_kmeans.ipynb | 33 ++----------------- 1 file changed, 3 insertions(+), 30 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 50a0cf85dd..1f0426112e 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -118,11 +118,10 @@ "\n", "2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", "\n", - "3. [Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,cloudfunctions.googleapis.com,run.googleapis.com,artifactregistry.googleapis.com,cloudbuild.googleapis.com,cloudresourcemanager.googleapis.com) to enable the following APIs:\n", + "3. 
[Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,run.googleapis.com,artifactregistry.googleapis.com,cloudbuild.googleapis.com,cloudresourcemanager.googleapis.com) to enable the following APIs:\n", "\n", " * BigQuery API\n", " * BigQuery Connection API\n", - " * Cloud Functions API\n", " * Cloud Run API\n", " * Artifact Registry API\n", " * Cloud Build API\n", @@ -311,7 +310,8 @@ "source": [ "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/bigquery.connectionUser'\n", "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/aiplatform.user'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'" + "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'\n", + "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/resourcemanager.projects.setIamPolicy'" ] }, { @@ -654,33 +654,6 @@ "source": [ "We now see PaLM2TextGenerator's characterization of the different comment groups. Thanks for using BigQuery DataFrames!" 
] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Cleaning up\n", - "\n", - "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", - "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", - "\n", - "Otherwise, you can uncomment the remaining cells and run them to delete the individual resources you created in this tutorial:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# # Delete the BigQuery Connection\n", - "# from google.cloud import bigquery_connection_v1 as bq_connection\n", - "# client = bq_connection.ConnectionServiceClient()\n", - "# CONNECTION_ID = f\"projects/{PROJECT_ID}/locations/{REGION}/connections/{CONN_NAME}\"\n", - "# client.delete_connection(name=CONNECTION_ID)\n", - "# print(f\"Deleted connection '{CONNECTION_ID}'.\")" - ] } ], "metadata": { From 2e52e21e94a3be62f1f193df7a83268fd2f8ec09 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 22:23:43 +0000 Subject: [PATCH 23/26] per testing requirement, block this notebook. 
(to be unblocked with future work) --- .../bq_dataframes_llm_code_generation.ipynb | 3 +- .../bq_dataframes_llm_kmeans.ipynb | 615 +++++++++++++++++- noxfile.py | 1 + 3 files changed, 581 insertions(+), 38 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb b/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb index 5cc9207561..0f113b84c6 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb @@ -444,8 +444,7 @@ "source": [ "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/bigquery.connectionUser'\n", "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/aiplatform.user'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/resourcemanager.projects.setIamPolicy'" + "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'" ] }, { diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 1f0426112e..3d7adad44f 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ @@ -143,12 +143,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": {}, - "outputs": [], + "outputs": [ + { + 
"name": "stdout", + "output_type": "stream", + "text": [ + "Updated property [core/project].\n" + ] + } + ], "source": [ "# set your project ID below\n", - "PROJECT_ID = \"bigframes-dev\" # @param {type:\"string\"}\n", + "PROJECT_ID = \"\" # @param {type:\"string\"}\n", "\n", "# Set the project id in gcloud\n", "! gcloud config set project {PROJECT_ID}" @@ -166,7 +174,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ @@ -205,7 +213,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ @@ -224,7 +232,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ @@ -252,9 +260,17 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "serviceAccount:bqcx-1084210331973-vl8v@gcp-sa-bigquery-condel.iam.gserviceaccount.com\n" + ] + } + ], "source": [ "from google.cloud import bigquery_connection_v1 as bq_connection\n", "\n", @@ -304,14 +320,40 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[1;31mERROR:\u001b[0m (gcloud.projects.add-iam-policy-binding) User [henryjsolberg@google.com] does not have permission to access projects instance [bigframes-dev:setIamPolicy] (or it may not exist): Policy update access denied.\n", + "- '@type': type.googleapis.com/google.rpc.DebugInfo\n", + " detail: |-\n", + " [ORIGINAL ERROR] generic::permission_denied: Policy update access denied.\n", + " com.google.apps.framework.request.StatusException: generic::PERMISSION_DENIED: Policy update access denied. 
[google.rpc.error_details_ext] { code: 7 message: \"Policy update access denied.\" }\n", + "\u001b[1;31mERROR:\u001b[0m (gcloud.projects.add-iam-policy-binding) User [henryjsolberg@google.com] does not have permission to access projects instance [bigframes-dev:setIamPolicy] (or it may not exist): Policy update access denied.\n", + "- '@type': type.googleapis.com/google.rpc.DebugInfo\n", + " detail: |-\n", + " [ORIGINAL ERROR] generic::permission_denied: Policy update access denied.\n", + " com.google.apps.framework.request.StatusException: generic::PERMISSION_DENIED: Policy update access denied. [google.rpc.error_details_ext] { code: 7 message: \"Policy update access denied.\" }\n", + "\u001b[1;31mERROR:\u001b[0m (gcloud.projects.add-iam-policy-binding) User [henryjsolberg@google.com] does not have permission to access projects instance [bigframes-dev:setIamPolicy] (or it may not exist): Policy update access denied.\n", + "- '@type': type.googleapis.com/google.rpc.DebugInfo\n", + " detail: |-\n", + " [ORIGINAL ERROR] generic::permission_denied: Policy update access denied.\n", + " com.google.apps.framework.request.StatusException: generic::PERMISSION_DENIED: Policy update access denied. [google.rpc.error_details_ext] { code: 7 message: \"Policy update access denied.\" }\n", + "\u001b[1;31mERROR:\u001b[0m (gcloud.projects.add-iam-policy-binding) User [henryjsolberg@google.com] does not have permission to access projects instance [bigframes-dev:setIamPolicy] (or it may not exist): Policy update access denied.\n", + "- '@type': type.googleapis.com/google.rpc.DebugInfo\n", + " detail: |-\n", + " [ORIGINAL ERROR] generic::permission_denied: Policy update access denied.\n", + " com.google.apps.framework.request.StatusException: generic::PERMISSION_DENIED: Policy update access denied. 
[google.rpc.error_details_ext] { code: 7 message: \"Policy update access denied.\" }\n" + ] + } + ], "source": [ "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/bigquery.connectionUser'\n", "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/aiplatform.user'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/resourcemanager.projects.setIamPolicy'" + "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'" ] }, { @@ -342,7 +384,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": { "id": "R7STCS8xB5d2" }, @@ -365,22 +407,137 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "metadata": { "id": "zDSwoBo1CU3G" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job d6e63245-f4af-4a62-a5a8-121c8c553270 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job ae376738-e474-4855-94de-07cdacc5b321 is DONE. 2.3 GB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "input_df = bf.read_gbq(\"bigquery-public-data.cfpb_complaints.complaint_database\")" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 10, "metadata": { "id": "tYDoaKgJChiq" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job e1ea942c-456d-462a-ad85-7a522123f84b is DONE. 1.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 992afe58-15b8-4941-9c54-e917bc552ee6 is DONE. 1.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
consumer_complaint_narrative
0In XXXX, Citimortgage coerced a voluntary judg...
4I have 2 credit cards from Citi bank as well a...
5This is in regards to the Government taking ac...
8I write to dispute {$32.00} in late charges ( ...
9I decided to close my Citibank checking accoun...
\n", + "

5 rows × 1 columns

\n", + "
[5 rows x 1 columns in total]" + ], + "text/plain": [ + " consumer_complaint_narrative\n", + "0 In XXXX, Citimortgage coerced a voluntary judg...\n", + "4 I have 2 credit cards from Citi bank as well a...\n", + "5 This is in regards to the Government taking ac...\n", + "8 I write to dispute {$32.00} in late charges ( ...\n", + "9 I decided to close my Citibank checking accoun...\n", + "\n", + "[5 rows x 1 columns]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "issues_df = input_df[[\"consumer_complaint_narrative\"]].dropna()\n", "issues_df.head(n=5) # View the first five complaints" @@ -388,7 +545,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, "metadata": { "id": "OltYSUEcsSOW" }, @@ -410,11 +567,24 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 12, "metadata": { "id": "li38q8FzDDMu" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 1105c42a-ef82-41f9-a7af-2c6c432782b3 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "from bigframes.ml.llm import PaLM2TextEmbeddingGenerator\n", "\n", @@ -423,11 +593,137 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 13, "metadata": { "id": "cOuSOQ5FDewD" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 282305dc-0934-4e7f-8f2c-5d0dc4b27113 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 778c4dc1-af17-4bbc-b240-6041e1ab56f4 is DONE. 1.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 36de51e5-db56-4e13-bc73-deb684ea890e is DONE. 40.0 kB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 1f737da5-ab6d-4849-8b43-edfdd5a97ab1 is DONE. 40.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 3342d18e-1190-41cd-9a19-ae1e352f686d is DONE. 30.8 MB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
text_embedding
109[-0.010294735431671143, -0.017596377059817314,...
181[-0.004606005270034075, -0.0029765090439468622...
690[-0.023824475705623627, -0.03503825515508652, ...
1068[-0.005357897840440273, 0.024292852729558945, ...
1613[0.023095030337572098, -0.016921309754252434, ...
\n", + "

5 rows × 1 columns

\n", + "
[5 rows x 1 columns in total]" + ], + "text/plain": [ + " text_embedding\n", + "109 [-0.010294735431671143, -0.017596377059817314,...\n", + "181 [-0.004606005270034075, -0.0029765090439468622...\n", + "690 [-0.023824475705623627, -0.03503825515508652, ...\n", + "1068 [-0.005357897840440273, 0.024292852729558945, ...\n", + "1613 [0.023095030337572098, -0.016921309754252434, ...\n", + "\n", + "[5 rows x 1 columns]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Will take ~3 minutes to compute the embeddings\n", "predicted_embeddings = model.predict(downsampled_issues_df)\n", @@ -437,7 +733,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 14, "metadata": { "id": "4H_etYfsEOFP" }, @@ -467,7 +763,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 15, "metadata": { "id": "AhNTnEC5FRz2" }, @@ -488,11 +784,149 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 16, "metadata": { "id": "6poSxh-fGJF7" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 3bcd701d-638d-428f-9fef-4283f246a4b8 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 01dd30db-5768-4965-8fbf-28aab2eea0a0 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job a3f70a16-c2c5-4d60-9a62-66700c9c5135 is DONE. 1.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 32d2bc5f-4532-4a94-af03-3def73268827 is DONE. 40.0 kB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 70192818-2fe2-46e2-bf9a-3c765609fed1 is DONE. 40.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 37af51f2-87df-430d-a456-75ab3681cf01 is DONE. 80.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CENTROID_ID
1091
1813
6909
106810
16136
\n", + "

5 rows × 1 columns

\n", + "
[5 rows x 1 columns in total]" + ], + "text/plain": [ + " CENTROID_ID\n", + "109 1\n", + "181 3\n", + "690 9\n", + "1068 10\n", + "1613 6\n", + "\n", + "[5 rows x 1 columns]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Use KMeans clustering to calculate our groups. Will take ~3 minutes.\n", "cluster_model.fit(combined_df[[\"text_embedding\"]])\n", @@ -504,7 +938,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 17, "metadata": {}, "outputs": [], "source": [ @@ -540,11 +974,36 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 18, "metadata": { "id": "2E7wXM_jGqo6" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 5ba8559f-4730-422f-93d4-0743418573bb is DONE. 1.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job ed452d2f-3d9d-4eaa-95b5-4227275d01bd is DONE. 1.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "# Using bigframes, with syntax identical to pandas,\n", "# filter out the first and second groups\n", @@ -561,11 +1020,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 19, "metadata": { "id": "ZNDiueI9IP5e" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "comment list 1:\n", + "1. On or about XXXX XXXX, 2016, I submitted to my servicer a loan modification request. They accepted and when I called back approximately two weeks later, Ocwen told me very rudely that they can not accept my request for workout assistance because the owner of the loan does not participate in modifications. They are not offering me any sort of options to foreclosure. How can the investor not abide by California laws designed to help people who are in distress. 
Also, each time I call Ocwen they are very rude and I have asked for Spanish speaking agents and they flat out tell me there is none. I believe Ocwen is headquartered XXXX and they simply do not understand Ca laws, nor that they are discriminating against people like me who feel more comfortable to speak in native tongue.\n", + "2. I have a mortgage loan which is interest only and the balance is {$700000.00}. The interest rate is 5.75 and since the real estate market crash of XX/XX/XXXX/XX/XX/XXXX I have been trying to get it refinanced to the much lower rates that mortgages have been at since then. My mortgage lender is Bank of America. Everytime I have asked them to refinance the loan, they say that my debt/income ratio is high and they can not do it. \n", + "\n", + "That is ridicules since for the past 9 years I have been able to make my payments at 5.75 % rate and they tell me that I do n't qualify to make the payments at 3 to 3.5 % rates. \n", + "\n", + "Of course by delaying this matter for years they have benefited nicely by collecting 5.75 % on a good size mortgage.\n", + "3. I have signed the refinance paperwork with PrimeChoiceFunding Mortgage on XX/XX/XXXX for a XXXX days lock based on the advertisement flyer that came in the mail in XXXX of XXXX. As of today XX/XX/XXXX XXXX 3 months later ) I have not received any updates on my closing date and have not been able to receive any updates on the status of my file. It is misleading to keep the consumer on hold for such a long period of time with no updates and no end date after offering a XXXX days lock in. It is a desperate situation. Please assist.\n", + "4. Our loan had been transferred to Freedom Mortgage, we were on a biweekly payment plan as we wanted to payoff the loan faster. We ended up moving from XXXX Texas to XXXX and sold the home. 
During closing a payoff amount was requested and we contacted them to notify that the loan would close on XX/XX/XXXX and the payment for XX/XX/XXXX would be included in the payoff amount. A represenative told us this was normal practice and so we went on to close the loan on the XX/XX/XXXX. Once the loan was closed and paid in full, we began looking for a new home in our new XXXX of XXXX. Come to find out our credit had been dinged badly due to Freedom Mortgage saying we had missed XXXX payments!!! WHAT?? I immediately got on the phone to find out what was going on. After countless phone calls and talking to representatives and waiting for supervisor call backs ... .still they can not put me through to anyone who has the authority to look into the matter and realize that NO payments were missed and the loan is closed and paid in Full!! This has prevented us from looking for and applying for a new mortgage. This is by far the worst handling of a situation where a client who has always been in good standing and has completely paid their loan in full, is being completely discredited by a mistake from their company!! It completely disrupts peoples lives!!\n", + "5. My USDA mortgage loan is currently serviced by Carrington Mortgage Services ( CMS ). CMS claims that the total amount past due is {$14000.00}. The alleged claim is for 14 payments. CMS alleges that {$3400.00} were transferred in late fees from XXXX. XXXX has advised there were no late fees compiled when the loan was transferred in XX/XX/XXXX. CMS is clearly trying to over collect on the amount due, so they can push the bill to the USDA for payment. I have also applied for a loan modification on numerous occasions and CMS has refused to submit the modification requests for underwriting to the USDA. Since the USDA is the investor and insures the loan, they would be issuing a partial claim to help fund the loan modification. 
Since the USDA is unaware of the partial claim, they were clearly in the dark on the loan modification process. CMS is in clear violation of the False Claims Act ( FCA ), 31 USC 3729 - 3733, since they knowingly withheld information and reported false records to a Federal Government Agency. CMS also has an obligation to process the loan modification with transparency and clarity per the National Mortgage Settlement of XX/XX/XXXX. CMS has refused to cooperate and has set a foreclosure sale date of XX/XX/XXXX. Based on the violations listed herein, I am seeking your assistance on this matter.\n", + "\n", + "comment list 2:\n", + "1. Someone has used my information trying to apply for credit I have notified transunion they have not removed the false information. \n", + "XXXX. XXXX XXXX XX/XX/XXXX. \n", + "XXXX XXXX XXXX XX/XX/XXXX. \n", + "XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX XXXX XXXX XXXXXXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX\n", + "2. Have reported numerous times identity theft. Last identity theft report was closed after phone was hacked. I have had current equifax account for about 2 years. Suddenly equifax will not let me log into my account without contacting a call center in XXXX and getting a password sent to my email from them. They sent me an email saying the issue was fixed. It is not fixed and Im tired of fighting with these people over my own social security number. This is a violation of fcra and my rights to lock me out of my own credit report. The person stealing my ssn is constantly doing this to save their own behind. Im tired of it and nothing is ever done. This has been going on since around XX/XX/2022 2022\n", + "3. Equifax have placed my file in the consumer affairs department and refuses to block the fraudulent items that are on my credit report. 
Every time I call in to request they remove my credit file out of the fraud department, a lady by the name of XXXX XXXX is very rude and tell me it will remain in the consumer affairs department. There are fraudulent addresses and accounts that are on my file the department refuses to block and remove.\n", + "4. I am having issues with a freeze on my file that is holding up my credit from reporting my accounts properly. I have a XXXX XXXX and XXXX XXXX account that is not reporting to my credit file. My name is XXXX XXXX SSN:XXXX DOB : XX/XX/1992 My addition address is XXXXXXXX XXXX XXXX XXXX FL XXXX .\n", + "5. I don't recognize this account. I never applied for it. I was victim of Identity Theft, somebody stole my personal information to open credit cards. Except as otherwise provided in this section, a consumer reporting agency shall block the reporting of any information in the file of a consumer that the consumer identifies as information that resulted from an alleged identity theft, not later than 4 business days after the date of receipt by such agency of ( 1 ) appropriate proof of the identity of the consumer ; ( 2 ) a copy of an identity theft report ; ( 3 ) the identification of such information by the consumer ; and ( 4 ) a statement by the consumer that the information is not information relating to any transaction by the consumer. ( b ) Notification. A consumer reporting agency shall promptly notify the furnisher of information identified by the consumer under subsection ( a ) of this section ( 1 ) that the information may be a result of identity theft ; ( 2 ) that an identity theft report has been filed ; ( 3 ) that a block has been requested under this section ; and ( 4 ) of the effective dates of the block. ( c ) Authority to decline or rescind. ( 1 ) In general. 
A consumer reporting agency may decline to block, or may rescind any block, of information relating to a consumer under this section, if the consumer reporting agency reasonably determines that ( A ) the information was blocked in error or a block was requested by the consumer in error ; ( B ) the information was blocked, or a block was requested by the consumer, on the basis of a material misrepresentation of fact by the consumer relevant to the request to block ; or ( C ) the consumer obtained possession of goods, services, or money as a result of the blocked transaction or transactions. ( 2 ) Notification to consumer. If a block of information is declined or rescinded under this subsection, the affected consumer shall be notified promptly, in the same manner as consumers are notified of the reinsertion of information under section 1681i ( a ) ( 5 ) ( B ) of this title. ( 3 ) Significance of block. For purposes of this subsection, if a consumer reporting agency rescinds a block, the presence of information in the file of a consumer prior to the blocking of such information is not evidence of whether the consumer knew or should have known that the consumer obtained possession of any goods, services, or money as a result of the block. ( d ) Exception for resellers. ( 1 ) No reseller file. This section shall not apply to a consumer reporting agency, if the consumer reporting agency ( A ) is a reseller ; ( B ) is not, at the time of the request of the consumer under subsection ( a ) of this section, otherwise furnishing or reselling a consumer report concerning the information identified by the consumer ; and ( C ) informs the consumer, by any means, that the consumer may report the identity theft to the Bureau to obtain consumer information regarding identity theft. ( 2 ) Reseller with file. 
The sole obligation of the consumer reporting agency under this section, with regard to any request of a consumer under this section, shall be to block the consumer report maintained by the consumer reporting agency from any subsequent use, if ( A ) the consumer, in accordance with the provisions of subsection ( a ) of this section, identifies, to a consumer reporting agency, information in the file of the consumer that resulted from identity theft ; and ( B ) the consumer reporting agency is a reseller of the identified information. ( 3 ) Notice. In carrying out its obligation under paragraph ( 2 ), the reseller shall promptly provide a notice to the consumer of the decision to block the file. Such notice shall contain the name, address, and telephone number of each consumer reporting agency from which the consumer information was obtained for resale. ( e ) Exception for verification companies. The provisions of this section do not apply to a check services company, acting as such, which issues authorizations for the purpose of approving or processing negotiable instruments, electronic fund transfers, or similar methods of payments, except that, beginning 4 business days after receipt of information described in paragraphs ( 1 ) through ( 3 ) of subsection ( a ) of this section, a check services company shall not report to a national consumer reporting agency described in section 1681a ( p ) of this title, any information identified in the subject identity theft report as resulting from identity theft. ( f ) Access to blocked information by law enforcement agencies. No provision of this section shall be construed as requiring a consumer reporting agency to prevent a Federal, State, or local law enforcement agency from accessing blocked information in a consumer file to which the agency could otherwise obtain access under this subchapter.\n", + "\n" + ] + } + ], "source": [ "# Build plain-text prompts to send to PaLM 2. 
Use only 5 complaints from each group.\n", "prompt1 = 'comment list 1:\\n'\n", @@ -584,11 +1071,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 20, "metadata": { "id": "BfHGJLirzSvH" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Please highlight the most obvious difference betweenthe two lists of comments:\n", + "comment list 1:\n", + "1. On or about XXXX XXXX, 2016, I submitted to my servicer a loan modification request. They accepted and when I called back approximately two weeks later, Ocwen told me very rudely that they can not accept my request for workout assistance because the owner of the loan does not participate in modifications. They are not offering me any sort of options to foreclosure. How can the investor not abide by California laws designed to help people who are in distress. Also, each time I call Ocwen they are very rude and I have asked for Spanish speaking agents and they flat out tell me there is none. I believe Ocwen is headquartered XXXX and they simply do not understand Ca laws, nor that they are discriminating against people like me who feel more comfortable to speak in native tongue.\n", + "2. I have a mortgage loan which is interest only and the balance is {$700000.00}. The interest rate is 5.75 and since the real estate market crash of XX/XX/XXXX/XX/XX/XXXX I have been trying to get it refinanced to the much lower rates that mortgages have been at since then. My mortgage lender is Bank of America. Everytime I have asked them to refinance the loan, they say that my debt/income ratio is high and they can not do it. \n", + "\n", + "That is ridicules since for the past 9 years I have been able to make my payments at 5.75 % rate and they tell me that I do n't qualify to make the payments at 3 to 3.5 % rates. \n", + "\n", + "Of course by delaying this matter for years they have benefited nicely by collecting 5.75 % on a good size mortgage.\n", + "3. 
I have signed the refinance paperwork with PrimeChoiceFunding Mortgage on XX/XX/XXXX for a XXXX days lock based on the advertisement flyer that came in the mail in XXXX of XXXX. As of today XX/XX/XXXX XXXX 3 months later ) I have not received any updates on my closing date and have not been able to receive any updates on the status of my file. It is misleading to keep the consumer on hold for such a long period of time with no updates and no end date after offering a XXXX days lock in. It is a desperate situation. Please assist.\n", + "4. Our loan had been transferred to Freedom Mortgage, we were on a biweekly payment plan as we wanted to payoff the loan faster. We ended up moving from XXXX Texas to XXXX and sold the home. During closing a payoff amount was requested and we contacted them to notify that the loan would close on XX/XX/XXXX and the payment for XX/XX/XXXX would be included in the payoff amount. A represenative told us this was normal practice and so we went on to close the loan on the XX/XX/XXXX. Once the loan was closed and paid in full, we began looking for a new home in our new XXXX of XXXX. Come to find out our credit had been dinged badly due to Freedom Mortgage saying we had missed XXXX payments!!! WHAT?? I immediately got on the phone to find out what was going on. After countless phone calls and talking to representatives and waiting for supervisor call backs ... .still they can not put me through to anyone who has the authority to look into the matter and realize that NO payments were missed and the loan is closed and paid in Full!! This has prevented us from looking for and applying for a new mortgage. This is by far the worst handling of a situation where a client who has always been in good standing and has completely paid their loan in full, is being completely discredited by a mistake from their company!! It completely disrupts peoples lives!!\n", + "5. My USDA mortgage loan is currently serviced by Carrington Mortgage Services ( CMS ). 
CMS claims that the total amount past due is {$14000.00}. The alleged claim is for 14 payments. CMS alleges that {$3400.00} were transferred in late fees from XXXX. XXXX has advised there were no late fees compiled when the loan was transferred in XX/XX/XXXX. CMS is clearly trying to over collect on the amount due, so they can push the bill to the USDA for payment. I have also applied for a loan modification on numerous occasions and CMS has refused to submit the modification requests for underwriting to the USDA. Since the USDA is the investor and insures the loan, they would be issuing a partial claim to help fund the loan modification. Since the USDA is unaware of the partial claim, they were clearly in the dark on the loan modification process. CMS is in clear violation of the False Claims Act ( FCA ), 31 USC 3729 - 3733, since they knowingly withheld information and reported false records to a Federal Government Agency. CMS also has an obligation to process the loan modification with transparency and clarity per the National Mortgage Settlement of XX/XX/XXXX. CMS has refused to cooperate and has set a foreclosure sale date of XX/XX/XXXX. Based on the violations listed herein, I am seeking your assistance on this matter.\n", + "comment list 2:\n", + "1. Someone has used my information trying to apply for credit I have notified transunion they have not removed the false information. \n", + "XXXX. XXXX XXXX XX/XX/XXXX. \n", + "XXXX XXXX XXXX XX/XX/XXXX. \n", + "XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX XXXX XXXX XXXXXXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX\n", + "2. Have reported numerous times identity theft. Last identity theft report was closed after phone was hacked. I have had current equifax account for about 2 years. 
Suddenly equifax will not let me log into my account without contacting a call center in XXXX and getting a password sent to my email from them. They sent me an email saying the issue was fixed. It is not fixed and Im tired of fighting with these people over my own social security number. This is a violation of fcra and my rights to lock me out of my own credit report. The person stealing my ssn is constantly doing this to save their own behind. Im tired of it and nothing is ever done. This has been going on since around XX/XX/2022 2022\n", + "3. Equifax have placed my file in the consumer affairs department and refuses to block the fraudulent items that are on my credit report. Every time I call in to request they remove my credit file out of the fraud department, a lady by the name of XXXX XXXX is very rude and tell me it will remain in the consumer affairs department. There are fraudulent addresses and accounts that are on my file the department refuses to block and remove.\n", + "4. I am having issues with a freeze on my file that is holding up my credit from reporting my accounts properly. I have a XXXX XXXX and XXXX XXXX account that is not reporting to my credit file. My name is XXXX XXXX SSN:XXXX DOB : XX/XX/1992 My addition address is XXXXXXXX XXXX XXXX XXXX FL XXXX .\n", + "5. I don't recognize this account. I never applied for it. I was victim of Identity Theft, somebody stole my personal information to open credit cards. 
Except as otherwise provided in this section, a consumer reporting agency shall block the reporting of any information in the file of a consumer that the consumer identifies as information that resulted from an alleged identity theft, not later than 4 business days after the date of receipt by such agency of ( 1 ) appropriate proof of the identity of the consumer ; ( 2 ) a copy of an identity theft report ; ( 3 ) the identification of such information by the consumer ; and ( 4 ) a statement by the consumer that the information is not information relating to any transaction by the consumer. ( b ) Notification. A consumer reporting agency shall promptly notify the furnisher of information identified by the consumer under subsection ( a ) of this section ( 1 ) that the information may be a result of identity theft ; ( 2 ) that an identity theft report has been filed ; ( 3 ) that a block has been requested under this section ; and ( 4 ) of the effective dates of the block. ( c ) Authority to decline or rescind. ( 1 ) In general. A consumer reporting agency may decline to block, or may rescind any block, of information relating to a consumer under this section, if the consumer reporting agency reasonably determines that ( A ) the information was blocked in error or a block was requested by the consumer in error ; ( B ) the information was blocked, or a block was requested by the consumer, on the basis of a material misrepresentation of fact by the consumer relevant to the request to block ; or ( C ) the consumer obtained possession of goods, services, or money as a result of the blocked transaction or transactions. ( 2 ) Notification to consumer. If a block of information is declined or rescinded under this subsection, the affected consumer shall be notified promptly, in the same manner as consumers are notified of the reinsertion of information under section 1681i ( a ) ( 5 ) ( B ) of this title. ( 3 ) Significance of block. 
For purposes of this subsection, if a consumer reporting agency rescinds a block, the presence of information in the file of a consumer prior to the blocking of such information is not evidence of whether the consumer knew or should have known that the consumer obtained possession of any goods, services, or money as a result of the block. ( d ) Exception for resellers. ( 1 ) No reseller file. This section shall not apply to a consumer reporting agency, if the consumer reporting agency ( A ) is a reseller ; ( B ) is not, at the time of the request of the consumer under subsection ( a ) of this section, otherwise furnishing or reselling a consumer report concerning the information identified by the consumer ; and ( C ) informs the consumer, by any means, that the consumer may report the identity theft to the Bureau to obtain consumer information regarding identity theft. ( 2 ) Reseller with file. The sole obligation of the consumer reporting agency under this section, with regard to any request of a consumer under this section, shall be to block the consumer report maintained by the consumer reporting agency from any subsequent use, if ( A ) the consumer, in accordance with the provisions of subsection ( a ) of this section, identifies, to a consumer reporting agency, information in the file of the consumer that resulted from identity theft ; and ( B ) the consumer reporting agency is a reseller of the identified information. ( 3 ) Notice. In carrying out its obligation under paragraph ( 2 ), the reseller shall promptly provide a notice to the consumer of the decision to block the file. Such notice shall contain the name, address, and telephone number of each consumer reporting agency from which the consumer information was obtained for resale. ( e ) Exception for verification companies. 
The provisions of this section do not apply to a check services company, acting as such, which issues authorizations for the purpose of approving or processing negotiable instruments, electronic fund transfers, or similar methods of payments, except that, beginning 4 business days after receipt of information described in paragraphs ( 1 ) through ( 3 ) of subsection ( a ) of this section, a check services company shall not report to a national consumer reporting agency described in section 1681a ( p ) of this title, any information identified in the subject identity theft report as resulting from identity theft. ( f ) Access to blocked information by law enforcement agencies. No provision of this section shall be construed as requiring a consumer reporting agency to prevent a Federal, State, or local law enforcement agency from accessing blocked information in a consumer file to which the agency could otherwise obtain access under this subchapter.\n", + "\n" + ] + } + ], "source": [ "# The plain English request we will make of PaLM 2\n", "prompt = (\n", @@ -608,11 +1123,39 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 21, "metadata": { "id": "mL5P0_3X04dE" }, - "outputs": [], + "outputs": [ + { + "ename": "PermissionDenied", + "evalue": "403 Permission 'resourcemanager.projects.setIamPolicy' denied on resource '//cloudresourcemanager.googleapis.com/projects/bigframes-dev' (or it may not exist). 
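For readers skimming this diff, the two-list prompt whose printed form appears above can be sketched as a standalone helper. This is an illustrative reconstruction under assumed names (`build_prompt`, `comments_1`, `comments_2` are not the notebook's exact identifiers), not the notebook's literal code:

```python
# Illustrative reconstruction of the two-list prompt assembly shown in the
# output above. Function and parameter names are assumptions.
def build_prompt(comments_1, comments_2):
    prompt = "Please highlight the most obvious difference between the two lists of comments:\n"
    for label, comments in (("comment list 1", comments_1), ("comment list 2", comments_2)):
        prompt += f"{label}:\n"
        for i, text in enumerate(comments, start=1):
            prompt += f"{i}. {text}\n"
    return prompt

# Tiny stand-ins for the 5 complaints sampled from each KMeans cluster.
print(build_prompt(["late fees disputed on transferred loan"],
                   ["identity theft report ignored by bureau"]))
```

In the notebook itself, the five sampled complaints per cluster are concatenated in the same numbered-list shape before being passed to the text-generation model.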
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"cloudresourcemanager.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/bigframes-dev\"\n}\nmetadata {\n key: \"permission\"\n value: \"resourcemanager.projects.setIamPolicy\"\n}\n]", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31m_InactiveRpcError\u001b[0m Traceback (most recent call last)", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/grpc_helpers.py:72\u001b[0m, in \u001b[0;36m_wrap_unary_errors..error_remapped_callable\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 71\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m---> 72\u001b[0m \u001b[39mreturn\u001b[39;00m callable_(\u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n\u001b[1;32m 73\u001b[0m \u001b[39mexcept\u001b[39;00m grpc\u001b[39m.\u001b[39mRpcError \u001b[39mas\u001b[39;00m exc:\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/grpc/_channel.py:1030\u001b[0m, in \u001b[0;36m_UnaryUnaryMultiCallable.__call__\u001b[0;34m(self, request, timeout, metadata, credentials, wait_for_ready, compression)\u001b[0m\n\u001b[1;32m 1028\u001b[0m state, call, \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_blocking(request, timeout, metadata, credentials,\n\u001b[1;32m 1029\u001b[0m wait_for_ready, compression)\n\u001b[0;32m-> 1030\u001b[0m \u001b[39mreturn\u001b[39;00m _end_unary_response_blocking(state, call, \u001b[39mFalse\u001b[39;49;00m, \u001b[39mNone\u001b[39;49;00m)\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/grpc/_channel.py:910\u001b[0m, in \u001b[0;36m_end_unary_response_blocking\u001b[0;34m(state, call, with_call, deadline)\u001b[0m\n\u001b[1;32m 909\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m--> 910\u001b[0m \u001b[39mraise\u001b[39;00m 
_InactiveRpcError(state)\n", + "\u001b[0;31m_InactiveRpcError\u001b[0m: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'resourcemanager.projects.setIamPolicy' denied on resource '//cloudresourcemanager.googleapis.com/projects/bigframes-dev' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.251.163.95:443 {grpc_message:\"Permission \\'resourcemanager.projects.setIamPolicy\\' denied on resource \\'//cloudresourcemanager.googleapis.com/projects/bigframes-dev\\' (or it may not exist).\", grpc_status:7, created_time:\"2023-11-08T20:49:43.954445252+00:00\"}\"\n>", + "\nThe above exception was the direct cause of the following exception:\n", + "\u001b[0;31mPermissionDenied\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[21], line 5\u001b[0m\n\u001b[1;32m 3\u001b[0m session \u001b[39m=\u001b[39m bf\u001b[39m.\u001b[39mget_global_session()\n\u001b[1;32m 4\u001b[0m connection \u001b[39m=\u001b[39m \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mPROJECT_ID\u001b[39m}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m{\u001b[39;00mREGION\u001b[39m}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m{\u001b[39;00mCONN_NAME\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m\n\u001b[0;32m----> 5\u001b[0m q_a_model \u001b[39m=\u001b[39m PaLM2TextGenerator(session\u001b[39m=\u001b[39;49msession, connection_name\u001b[39m=\u001b[39;49mconnection)\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/ml/llm.py:72\u001b[0m, in \u001b[0;36mPaLM2TextGenerator.__init__\u001b[0;34m(self, model_name, session, connection_name)\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mconnection_name \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_bq_connection_manager\u001b[39m.\u001b[39mresolve_full_connection_name(\n\u001b[1;32m 66\u001b[0m connection_name,\n\u001b[1;32m 67\u001b[0m 
default_project\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39msession\u001b[39m.\u001b[39m_project,\n\u001b[1;32m 68\u001b[0m default_location\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39msession\u001b[39m.\u001b[39m_location,\n\u001b[1;32m 69\u001b[0m )\n\u001b[1;32m 71\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_bqml_model_factory \u001b[39m=\u001b[39m \u001b[39mglobals\u001b[39m\u001b[39m.\u001b[39mbqml_model_factory()\n\u001b[0;32m---> 72\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_bqml_model: core\u001b[39m.\u001b[39mBqmlModel \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_create_bqml_model()\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/ml/llm.py:85\u001b[0m, in \u001b[0;36mPaLM2TextGenerator._create_bqml_model\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 81\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(connection_name_parts) \u001b[39m!=\u001b[39m \u001b[39m3\u001b[39m:\n\u001b[1;32m 82\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[1;32m 83\u001b[0m \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mconnection_name must be of the format .., got \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mconnection_name\u001b[39m}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 84\u001b[0m )\n\u001b[0;32m---> 85\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_bq_connection_manager\u001b[39m.\u001b[39;49mcreate_bq_connection(\n\u001b[1;32m 86\u001b[0m project_id\u001b[39m=\u001b[39;49mconnection_name_parts[\u001b[39m0\u001b[39;49m],\n\u001b[1;32m 87\u001b[0m location\u001b[39m=\u001b[39;49mconnection_name_parts[\u001b[39m1\u001b[39;49m],\n\u001b[1;32m 88\u001b[0m connection_id\u001b[39m=\u001b[39;49mconnection_name_parts[\u001b[39m2\u001b[39;49m],\n\u001b[1;32m 89\u001b[0m 
iam_role\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39maiplatform.user\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m 90\u001b[0m )\n\u001b[1;32m 91\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mmodel_name \u001b[39m==\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mtext-bison\u001b[39m\u001b[39m\"\u001b[39m:\n\u001b[1;32m 92\u001b[0m options \u001b[39m=\u001b[39m {\n\u001b[1;32m 93\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mremote_service_type\u001b[39m\u001b[39m\"\u001b[39m: _REMOTE_TEXT_GENERATOR_MODEL_CODE,\n\u001b[1;32m 94\u001b[0m }\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/clients.py:100\u001b[0m, in \u001b[0;36mBqConnectionManager.create_bq_connection\u001b[0;34m(self, project_id, location, connection_id, iam_role)\u001b[0m\n\u001b[1;32m 97\u001b[0m service_account_id \u001b[39m=\u001b[39m cast(\u001b[39mstr\u001b[39m, service_account_id)\n\u001b[1;32m 98\u001b[0m \u001b[39m# Ensure IAM role on the BQ connection\u001b[39;00m\n\u001b[1;32m 99\u001b[0m \u001b[39m# https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions#grant_permission_on_function\u001b[39;00m\n\u001b[0;32m--> 100\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_ensure_iam_binding(project_id, service_account_id, iam_role)\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/retry.py:349\u001b[0m, in \u001b[0;36mRetry.__call__..retry_wrapped_func\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 345\u001b[0m target \u001b[39m=\u001b[39m functools\u001b[39m.\u001b[39mpartial(func, \u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 346\u001b[0m sleep_generator \u001b[39m=\u001b[39m exponential_sleep_generator(\n\u001b[1;32m 347\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_initial, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_maximum, 
multiplier\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_multiplier\n\u001b[1;32m 348\u001b[0m )\n\u001b[0;32m--> 349\u001b[0m \u001b[39mreturn\u001b[39;00m retry_target(\n\u001b[1;32m 350\u001b[0m target,\n\u001b[1;32m 351\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_predicate,\n\u001b[1;32m 352\u001b[0m sleep_generator,\n\u001b[1;32m 353\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_timeout,\n\u001b[1;32m 354\u001b[0m on_error\u001b[39m=\u001b[39;49mon_error,\n\u001b[1;32m 355\u001b[0m )\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/retry.py:191\u001b[0m, in \u001b[0;36mretry_target\u001b[0;34m(target, predicate, sleep_generator, timeout, on_error, **kwargs)\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[39mfor\u001b[39;00m sleep \u001b[39min\u001b[39;00m sleep_generator:\n\u001b[1;32m 190\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m--> 191\u001b[0m \u001b[39mreturn\u001b[39;00m target()\n\u001b[1;32m 193\u001b[0m \u001b[39m# pylint: disable=broad-except\u001b[39;00m\n\u001b[1;32m 194\u001b[0m \u001b[39m# This function explicitly must deal with broad exceptions.\u001b[39;00m\n\u001b[1;32m 195\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mException\u001b[39;00m \u001b[39mas\u001b[39;00m exc:\n", + "File \u001b[0;32m~/bq/src/bigframes/bigframes/clients.py:138\u001b[0m, in \u001b[0;36mBqConnectionManager._ensure_iam_binding\u001b[0;34m(self, project_id, service_account_id, iam_role)\u001b[0m\n\u001b[1;32m 136\u001b[0m policy\u001b[39m.\u001b[39mbindings\u001b[39m.\u001b[39mappend(new_binding)\n\u001b[1;32m 137\u001b[0m request \u001b[39m=\u001b[39m iam_policy_pb2\u001b[39m.\u001b[39mSetIamPolicyRequest(resource\u001b[39m=\u001b[39mproject, policy\u001b[39m=\u001b[39mpolicy)\n\u001b[0;32m--> 138\u001b[0m 
\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_cloud_resource_manager_client\u001b[39m.\u001b[39;49mset_iam_policy(request\u001b[39m=\u001b[39;49mrequest)\n\u001b[1;32m 140\u001b[0m \u001b[39m# We would wait for the IAM policy change to take effect\u001b[39;00m\n\u001b[1;32m 141\u001b[0m \u001b[39m# https://cloud.google.com/iam/docs/access-change-propagation\u001b[39;00m\n\u001b[1;32m 142\u001b[0m logger\u001b[39m.\u001b[39minfo(\n\u001b[1;32m 143\u001b[0m \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mWaiting \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_IAM_WAIT_SECONDS\u001b[39m}\u001b[39;00m\u001b[39m seconds for IAM to take effect..\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 144\u001b[0m )\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/resourcemanager_v3/services/projects/client.py:1838\u001b[0m, in \u001b[0;36mProjectsClient.set_iam_policy\u001b[0;34m(self, request, resource, retry, timeout, metadata)\u001b[0m\n\u001b[1;32m 1833\u001b[0m metadata \u001b[39m=\u001b[39m \u001b[39mtuple\u001b[39m(metadata) \u001b[39m+\u001b[39m (\n\u001b[1;32m 1834\u001b[0m gapic_v1\u001b[39m.\u001b[39mrouting_header\u001b[39m.\u001b[39mto_grpc_metadata(((\u001b[39m\"\u001b[39m\u001b[39mresource\u001b[39m\u001b[39m\"\u001b[39m, request\u001b[39m.\u001b[39mresource),)),\n\u001b[1;32m 1835\u001b[0m )\n\u001b[1;32m 1837\u001b[0m \u001b[39m# Send the request.\u001b[39;00m\n\u001b[0;32m-> 1838\u001b[0m response \u001b[39m=\u001b[39m rpc(\n\u001b[1;32m 1839\u001b[0m request,\n\u001b[1;32m 1840\u001b[0m retry\u001b[39m=\u001b[39;49mretry,\n\u001b[1;32m 1841\u001b[0m timeout\u001b[39m=\u001b[39;49mtimeout,\n\u001b[1;32m 1842\u001b[0m metadata\u001b[39m=\u001b[39;49mmetadata,\n\u001b[1;32m 1843\u001b[0m )\n\u001b[1;32m 1845\u001b[0m \u001b[39m# Done; return the response.\u001b[39;00m\n\u001b[1;32m 1846\u001b[0m \u001b[39mreturn\u001b[39;00m response\n", + "File 
\u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py:113\u001b[0m, in \u001b[0;36m_GapicCallable.__call__\u001b[0;34m(self, timeout, retry, *args, **kwargs)\u001b[0m\n\u001b[1;32m 110\u001b[0m metadata\u001b[39m.\u001b[39mextend(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_metadata)\n\u001b[1;32m 111\u001b[0m kwargs[\u001b[39m\"\u001b[39m\u001b[39mmetadata\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m metadata\n\u001b[0;32m--> 113\u001b[0m \u001b[39mreturn\u001b[39;00m wrapped_func(\u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/timeout.py:120\u001b[0m, in \u001b[0;36mTimeToDeadlineTimeout.__call__..func_with_timeout\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 117\u001b[0m \u001b[39m# Avoid setting negative timeout\u001b[39;00m\n\u001b[1;32m 118\u001b[0m kwargs[\u001b[39m\"\u001b[39m\u001b[39mtimeout\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m \u001b[39mmax\u001b[39m(\u001b[39m0\u001b[39m, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_timeout \u001b[39m-\u001b[39m time_since_first_attempt)\n\u001b[0;32m--> 120\u001b[0m \u001b[39mreturn\u001b[39;00m func(\u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n", + "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/grpc_helpers.py:74\u001b[0m, in \u001b[0;36m_wrap_unary_errors..error_remapped_callable\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 72\u001b[0m \u001b[39mreturn\u001b[39;00m callable_(\u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 73\u001b[0m \u001b[39mexcept\u001b[39;00m grpc\u001b[39m.\u001b[39mRpcError \u001b[39mas\u001b[39;00m exc:\n\u001b[0;32m---> 74\u001b[0m \u001b[39mraise\u001b[39;00m exceptions\u001b[39m.\u001b[39mfrom_grpc_error(exc) \u001b[39mfrom\u001b[39;00m 
\u001b[39mexc\u001b[39;00m\n", + "\u001b[0;31mPermissionDenied\u001b[0m: 403 Permission 'resourcemanager.projects.setIamPolicy' denied on resource '//cloudresourcemanager.googleapis.com/projects/bigframes-dev' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"cloudresourcemanager.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/bigframes-dev\"\n}\nmetadata {\n key: \"permission\"\n value: \"resourcemanager.projects.setIamPolicy\"\n}\n]" + ] + } + ], "source": [ "from bigframes.ml.llm import PaLM2TextGenerator\n", "\n", diff --git a/noxfile.py b/noxfile.py index 34b055de44..3dd23ba04f 100644 --- a/noxfile.py +++ b/noxfile.py @@ -609,6 +609,7 @@ def notebook(session): # our test infrastructure. "notebooks/getting_started/getting_started_bq_dataframes.ipynb", "notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb", + "notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb", "notebooks/regression/bq_dataframes_ml_linear_regression.ipynb", "notebooks/generative_ai/bq_dataframes_ml_drug_name_generation.ipynb", "notebooks/vertex_sdk/sdk2_bigframes_pytorch.ipynb", From ccd26ce0f15ec21b479c93d39aef659774867016 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 22:59:09 +0000 Subject: [PATCH 24/26] clear all outputs --- .../bq_dataframes_llm_kmeans.ipynb | 610 +----------------- 1 file changed, 33 insertions(+), 577 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 3d7adad44f..3c0ab676cd 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -143,17 +143,9 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - 
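The `PermissionDenied` traceback above fails inside `BqConnectionManager._ensure_iam_binding`, after the connection-name format check in `bigframes/ml/llm.py` has already passed. That format check can be sketched in isolation as follows; this is a simplified illustration, and the function name and return shape are assumptions rather than the library's actual API:

```python
# Simplified sketch of the "<project>.<location>.<connection>" check visible in
# the traceback above. Name and return type are assumptions.
def split_connection_name(connection_name: str):
    parts = connection_name.split(".")
    if len(parts) != 3:
        raise ValueError(
            "connection_name must be of the format "
            f"<project>.<location>.<connection>, got {connection_name}."
        )
    return parts  # [project_id, location, connection_id]

print(split_connection_name("bigframes-dev.us.bigframes-ml"))
# → ['bigframes-dev', 'us', 'bigframes-ml']
```

The actual 403 occurs one step later, when the library calls `setIamPolicy` to grant `roles/aiplatform.user` to the connection's service account; running the notebook therefore requires either project-level IAM permission or the bindings granted in advance, which the notebook's `gcloud projects add-iam-policy-binding` cell does explicitly.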
"output_type": "stream", - "text": [ - "Updated property [core/project].\n" - ] - } - ], + "outputs": [], "source": [ "# set your project ID below\n", "PROJECT_ID = \"\" # @param {type:\"string\"}\n", @@ -174,7 +166,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -213,7 +205,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -232,7 +224,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -260,17 +252,9 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "serviceAccount:bqcx-1084210331973-vl8v@gcp-sa-bigquery-condel.iam.gserviceaccount.com\n" - ] - } - ], + "outputs": [], "source": [ "from google.cloud import bigquery_connection_v1 as bq_connection\n", "\n", @@ -320,36 +304,9 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[1;31mERROR:\u001b[0m (gcloud.projects.add-iam-policy-binding) User [henryjsolberg@google.com] does not have permission to access projects instance [bigframes-dev:setIamPolicy] (or it may not exist): Policy update access denied.\n", - "- '@type': type.googleapis.com/google.rpc.DebugInfo\n", - " detail: |-\n", - " [ORIGINAL ERROR] generic::permission_denied: Policy update access denied.\n", - " com.google.apps.framework.request.StatusException: generic::PERMISSION_DENIED: Policy update access denied. 
[google.rpc.error_details_ext] { code: 7 message: \"Policy update access denied.\" }\n", - "\u001b[1;31mERROR:\u001b[0m (gcloud.projects.add-iam-policy-binding) User [henryjsolberg@google.com] does not have permission to access projects instance [bigframes-dev:setIamPolicy] (or it may not exist): Policy update access denied.\n", - "- '@type': type.googleapis.com/google.rpc.DebugInfo\n", - " detail: |-\n", - " [ORIGINAL ERROR] generic::permission_denied: Policy update access denied.\n", - " com.google.apps.framework.request.StatusException: generic::PERMISSION_DENIED: Policy update access denied. [google.rpc.error_details_ext] { code: 7 message: \"Policy update access denied.\" }\n", - "\u001b[1;31mERROR:\u001b[0m (gcloud.projects.add-iam-policy-binding) User [henryjsolberg@google.com] does not have permission to access projects instance [bigframes-dev:setIamPolicy] (or it may not exist): Policy update access denied.\n", - "- '@type': type.googleapis.com/google.rpc.DebugInfo\n", - " detail: |-\n", - " [ORIGINAL ERROR] generic::permission_denied: Policy update access denied.\n", - " com.google.apps.framework.request.StatusException: generic::PERMISSION_DENIED: Policy update access denied. [google.rpc.error_details_ext] { code: 7 message: \"Policy update access denied.\" }\n", - "\u001b[1;31mERROR:\u001b[0m (gcloud.projects.add-iam-policy-binding) User [henryjsolberg@google.com] does not have permission to access projects instance [bigframes-dev:setIamPolicy] (or it may not exist): Policy update access denied.\n", - "- '@type': type.googleapis.com/google.rpc.DebugInfo\n", - " detail: |-\n", - " [ORIGINAL ERROR] generic::permission_denied: Policy update access denied.\n", - " com.google.apps.framework.request.StatusException: generic::PERMISSION_DENIED: Policy update access denied. 
[google.rpc.error_details_ext] { code: 7 message: \"Policy update access denied.\" }\n" - ] - } - ], + "outputs": [], "source": [ "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/bigquery.connectionUser'\n", "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/aiplatform.user'\n", @@ -384,7 +341,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": { "id": "R7STCS8xB5d2" }, @@ -407,137 +364,22 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": { "id": "zDSwoBo1CU3G" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job d6e63245-f4af-4a62-a5a8-121c8c553270 is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job ae376738-e474-4855-94de-07cdacc5b321 is DONE. 2.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "input_df = bf.read_gbq(\"bigquery-public-data.cfpb_complaints.complaint_database\")" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": { "id": "tYDoaKgJChiq" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job e1ea942c-456d-462a-ad85-7a522123f84b is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 992afe58-15b8-4941-9c54-e917bc552ee6 is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
consumer_complaint_narrative
0In XXXX, Citimortgage coerced a voluntary judg...
4I have 2 credit cards from Citi bank as well a...
5This is in regards to the Government taking ac...
8I write to dispute {$32.00} in late charges ( ...
9I decided to close my Citibank checking accoun...
\n", - "

5 rows × 1 columns

\n", - "
[5 rows x 1 columns in total]" - ], - "text/plain": [ - " consumer_complaint_narrative\n", - "0 In XXXX, Citimortgage coerced a voluntary judg...\n", - "4 I have 2 credit cards from Citi bank as well a...\n", - "5 This is in regards to the Government taking ac...\n", - "8 I write to dispute {$32.00} in late charges ( ...\n", - "9 I decided to close my Citibank checking accoun...\n", - "\n", - "[5 rows x 1 columns]" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "issues_df = input_df[[\"consumer_complaint_narrative\"]].dropna()\n", "issues_df.head(n=5) # View the first five complaints" @@ -545,7 +387,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": { "id": "OltYSUEcsSOW" }, @@ -567,24 +409,11 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": { "id": "li38q8FzDDMu" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job 1105c42a-ef82-41f9-a7af-2c6c432782b3 is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "from bigframes.ml.llm import PaLM2TextEmbeddingGenerator\n", "\n", @@ -593,137 +422,11 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": { "id": "cOuSOQ5FDewD" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job 282305dc-0934-4e7f-8f2c-5d0dc4b27113 is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 778c4dc1-af17-4bbc-b240-6041e1ab56f4 is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 36de51e5-db56-4e13-bc73-deb684ea890e is DONE. 40.0 kB processed. 
Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 1f737da5-ab6d-4849-8b43-edfdd5a97ab1 is DONE. 40.0 kB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 3342d18e-1190-41cd-9a19-ae1e352f686d is DONE. 30.8 MB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
text_embedding
109[-0.010294735431671143, -0.017596377059817314,...
181[-0.004606005270034075, -0.0029765090439468622...
690[-0.023824475705623627, -0.03503825515508652, ...
1068[-0.005357897840440273, 0.024292852729558945, ...
1613[0.023095030337572098, -0.016921309754252434, ...
\n", - "

5 rows × 1 columns

\n", - "
[5 rows x 1 columns in total]" - ], - "text/plain": [ - " text_embedding\n", - "109 [-0.010294735431671143, -0.017596377059817314,...\n", - "181 [-0.004606005270034075, -0.0029765090439468622...\n", - "690 [-0.023824475705623627, -0.03503825515508652, ...\n", - "1068 [-0.005357897840440273, 0.024292852729558945, ...\n", - "1613 [0.023095030337572098, -0.016921309754252434, ...\n", - "\n", - "[5 rows x 1 columns]" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "# Will take ~3 minutes to compute the embeddings\n", "predicted_embeddings = model.predict(downsampled_issues_df)\n", @@ -733,7 +436,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": { "id": "4H_etYfsEOFP" }, @@ -763,7 +466,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": { "id": "AhNTnEC5FRz2" }, @@ -784,149 +487,11 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": { "id": "6poSxh-fGJF7" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job 3bcd701d-638d-428f-9fef-4283f246a4b8 is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 01dd30db-5768-4965-8fbf-28aab2eea0a0 is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job a3f70a16-c2c5-4d60-9a62-66700c9c5135 is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 32d2bc5f-4532-4a94-af03-3def73268827 is DONE. 40.0 kB processed. 
Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 70192818-2fe2-46e2-bf9a-3c765609fed1 is DONE. 40.0 kB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 37af51f2-87df-430d-a456-75ab3681cf01 is DONE. 80.0 kB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
CENTROID_ID
1091
1813
6909
106810
16136
\n", - "

5 rows × 1 columns

\n", - "
[5 rows x 1 columns in total]" - ], - "text/plain": [ - " CENTROID_ID\n", - "109 1\n", - "181 3\n", - "690 9\n", - "1068 10\n", - "1613 6\n", - "\n", - "[5 rows x 1 columns]" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "# Use KMeans clustering to calculate our groups. Will take ~3 minutes.\n", "cluster_model.fit(combined_df[[\"text_embedding\"]])\n", @@ -938,7 +503,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -974,36 +539,11 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": { "id": "2E7wXM_jGqo6" }, - "outputs": [ - { - "data": { - "text/html": [ - "Query job 5ba8559f-4730-422f-93d4-0743418573bb is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job ed452d2f-3d9d-4eaa-95b5-4227275d01bd is DONE. 1.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "# Using bigframes, with syntax identical to pandas,\n", "# filter out the first and second groups\n", @@ -1020,39 +560,11 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": { "id": "ZNDiueI9IP5e" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "comment list 1:\n", - "1. On or about XXXX XXXX, 2016, I submitted to my servicer a loan modification request. They accepted and when I called back approximately two weeks later, Ocwen told me very rudely that they can not accept my request for workout assistance because the owner of the loan does not participate in modifications. They are not offering me any sort of options to foreclosure. How can the investor not abide by California laws designed to help people who are in distress. 
Also, each time I call Ocwen they are very rude and I have asked for Spanish speaking agents and they flat out tell me there is none. I believe Ocwen is headquartered XXXX and they simply do not understand Ca laws, nor that they are discriminating against people like me who feel more comfortable to speak in native tongue.\n", - "2. I have a mortgage loan which is interest only and the balance is {$700000.00}. The interest rate is 5.75 and since the real estate market crash of XX/XX/XXXX/XX/XX/XXXX I have been trying to get it refinanced to the much lower rates that mortgages have been at since then. My mortgage lender is Bank of America. Everytime I have asked them to refinance the loan, they say that my debt/income ratio is high and they can not do it. \n", - "\n", - "That is ridicules since for the past 9 years I have been able to make my payments at 5.75 % rate and they tell me that I do n't qualify to make the payments at 3 to 3.5 % rates. \n", - "\n", - "Of course by delaying this matter for years they have benefited nicely by collecting 5.75 % on a good size mortgage.\n", - "3. I have signed the refinance paperwork with PrimeChoiceFunding Mortgage on XX/XX/XXXX for a XXXX days lock based on the advertisement flyer that came in the mail in XXXX of XXXX. As of today XX/XX/XXXX XXXX 3 months later ) I have not received any updates on my closing date and have not been able to receive any updates on the status of my file. It is misleading to keep the consumer on hold for such a long period of time with no updates and no end date after offering a XXXX days lock in. It is a desperate situation. Please assist.\n", - "4. Our loan had been transferred to Freedom Mortgage, we were on a biweekly payment plan as we wanted to payoff the loan faster. We ended up moving from XXXX Texas to XXXX and sold the home. 
During closing a payoff amount was requested and we contacted them to notify that the loan would close on XX/XX/XXXX and the payment for XX/XX/XXXX would be included in the payoff amount. A represenative told us this was normal practice and so we went on to close the loan on the XX/XX/XXXX. Once the loan was closed and paid in full, we began looking for a new home in our new XXXX of XXXX. Come to find out our credit had been dinged badly due to Freedom Mortgage saying we had missed XXXX payments!!! WHAT?? I immediately got on the phone to find out what was going on. After countless phone calls and talking to representatives and waiting for supervisor call backs ... .still they can not put me through to anyone who has the authority to look into the matter and realize that NO payments were missed and the loan is closed and paid in Full!! This has prevented us from looking for and applying for a new mortgage. This is by far the worst handling of a situation where a client who has always been in good standing and has completely paid their loan in full, is being completely discredited by a mistake from their company!! It completely disrupts peoples lives!!\n", - "5. My USDA mortgage loan is currently serviced by Carrington Mortgage Services ( CMS ). CMS claims that the total amount past due is {$14000.00}. The alleged claim is for 14 payments. CMS alleges that {$3400.00} were transferred in late fees from XXXX. XXXX has advised there were no late fees compiled when the loan was transferred in XX/XX/XXXX. CMS is clearly trying to over collect on the amount due, so they can push the bill to the USDA for payment. I have also applied for a loan modification on numerous occasions and CMS has refused to submit the modification requests for underwriting to the USDA. Since the USDA is the investor and insures the loan, they would be issuing a partial claim to help fund the loan modification. 
Since the USDA is unaware of the partial claim, they were clearly in the dark on the loan modification process. CMS is in clear violation of the False Claims Act ( FCA ), 31 USC 3729 - 3733, since they knowingly withheld information and reported false records to a Federal Government Agency. CMS also has an obligation to process the loan modification with transparency and clarity per the National Mortgage Settlement of XX/XX/XXXX. CMS has refused to cooperate and has set a foreclosure sale date of XX/XX/XXXX. Based on the violations listed herein, I am seeking your assistance on this matter.\n", - "\n", - "comment list 2:\n", - "1. Someone has used my information trying to apply for credit I have notified transunion they have not removed the false information. \n", - "XXXX. XXXX XXXX XX/XX/XXXX. \n", - "XXXX XXXX XXXX XX/XX/XXXX. \n", - "XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX XXXX XXXX XXXXXXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX\n", - "2. Have reported numerous times identity theft. Last identity theft report was closed after phone was hacked. I have had current equifax account for about 2 years. Suddenly equifax will not let me log into my account without contacting a call center in XXXX and getting a password sent to my email from them. They sent me an email saying the issue was fixed. It is not fixed and Im tired of fighting with these people over my own social security number. This is a violation of fcra and my rights to lock me out of my own credit report. The person stealing my ssn is constantly doing this to save their own behind. Im tired of it and nothing is ever done. This has been going on since around XX/XX/2022 2022\n", - "3. Equifax have placed my file in the consumer affairs department and refuses to block the fraudulent items that are on my credit report. 
Every time I call in to request they remove my credit file out of the fraud department, a lady by the name of XXXX XXXX is very rude and tell me it will remain in the consumer affairs department. There are fraudulent addresses and accounts that are on my file the department refuses to block and remove.\n", - "4. I am having issues with a freeze on my file that is holding up my credit from reporting my accounts properly. I have a XXXX XXXX and XXXX XXXX account that is not reporting to my credit file. My name is XXXX XXXX SSN:XXXX DOB : XX/XX/1992 My addition address is XXXXXXXX XXXX XXXX XXXX FL XXXX .\n", - "5. I don't recognize this account. I never applied for it. I was victim of Identity Theft, somebody stole my personal information to open credit cards. Except as otherwise provided in this section, a consumer reporting agency shall block the reporting of any information in the file of a consumer that the consumer identifies as information that resulted from an alleged identity theft, not later than 4 business days after the date of receipt by such agency of ( 1 ) appropriate proof of the identity of the consumer ; ( 2 ) a copy of an identity theft report ; ( 3 ) the identification of such information by the consumer ; and ( 4 ) a statement by the consumer that the information is not information relating to any transaction by the consumer. ( b ) Notification. A consumer reporting agency shall promptly notify the furnisher of information identified by the consumer under subsection ( a ) of this section ( 1 ) that the information may be a result of identity theft ; ( 2 ) that an identity theft report has been filed ; ( 3 ) that a block has been requested under this section ; and ( 4 ) of the effective dates of the block. ( c ) Authority to decline or rescind. ( 1 ) In general. 
A consumer reporting agency may decline to block, or may rescind any block, of information relating to a consumer under this section, if the consumer reporting agency reasonably determines that ( A ) the information was blocked in error or a block was requested by the consumer in error ; ( B ) the information was blocked, or a block was requested by the consumer, on the basis of a material misrepresentation of fact by the consumer relevant to the request to block ; or ( C ) the consumer obtained possession of goods, services, or money as a result of the blocked transaction or transactions. ( 2 ) Notification to consumer. If a block of information is declined or rescinded under this subsection, the affected consumer shall be notified promptly, in the same manner as consumers are notified of the reinsertion of information under section 1681i ( a ) ( 5 ) ( B ) of this title. ( 3 ) Significance of block. For purposes of this subsection, if a consumer reporting agency rescinds a block, the presence of information in the file of a consumer prior to the blocking of such information is not evidence of whether the consumer knew or should have known that the consumer obtained possession of any goods, services, or money as a result of the block. ( d ) Exception for resellers. ( 1 ) No reseller file. This section shall not apply to a consumer reporting agency, if the consumer reporting agency ( A ) is a reseller ; ( B ) is not, at the time of the request of the consumer under subsection ( a ) of this section, otherwise furnishing or reselling a consumer report concerning the information identified by the consumer ; and ( C ) informs the consumer, by any means, that the consumer may report the identity theft to the Bureau to obtain consumer information regarding identity theft. ( 2 ) Reseller with file. 
The sole obligation of the consumer reporting agency under this section, with regard to any request of a consumer under this section, shall be to block the consumer report maintained by the consumer reporting agency from any subsequent use, if ( A ) the consumer, in accordance with the provisions of subsection ( a ) of this section, identifies, to a consumer reporting agency, information in the file of the consumer that resulted from identity theft ; and ( B ) the consumer reporting agency is a reseller of the identified information. ( 3 ) Notice. In carrying out its obligation under paragraph ( 2 ), the reseller shall promptly provide a notice to the consumer of the decision to block the file. Such notice shall contain the name, address, and telephone number of each consumer reporting agency from which the consumer information was obtained for resale. ( e ) Exception for verification companies. The provisions of this section do not apply to a check services company, acting as such, which issues authorizations for the purpose of approving or processing negotiable instruments, electronic fund transfers, or similar methods of payments, except that, beginning 4 business days after receipt of information described in paragraphs ( 1 ) through ( 3 ) of subsection ( a ) of this section, a check services company shall not report to a national consumer reporting agency described in section 1681a ( p ) of this title, any information identified in the subject identity theft report as resulting from identity theft. ( f ) Access to blocked information by law enforcement agencies. No provision of this section shall be construed as requiring a consumer reporting agency to prevent a Federal, State, or local law enforcement agency from accessing blocked information in a consumer file to which the agency could otherwise obtain access under this subchapter.\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "# Build plain-text prompts to send to PaLM 2. 
Use only 5 complaints from each group.\n", "prompt1 = 'comment list 1:\\n'\n", @@ -1071,39 +583,11 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": null, "metadata": { "id": "BfHGJLirzSvH" }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Please highlight the most obvious difference betweenthe two lists of comments:\n", - "comment list 1:\n", - "1. On or about XXXX XXXX, 2016, I submitted to my servicer a loan modification request. They accepted and when I called back approximately two weeks later, Ocwen told me very rudely that they can not accept my request for workout assistance because the owner of the loan does not participate in modifications. They are not offering me any sort of options to foreclosure. How can the investor not abide by California laws designed to help people who are in distress. Also, each time I call Ocwen they are very rude and I have asked for Spanish speaking agents and they flat out tell me there is none. I believe Ocwen is headquartered XXXX and they simply do not understand Ca laws, nor that they are discriminating against people like me who feel more comfortable to speak in native tongue.\n", - "2. I have a mortgage loan which is interest only and the balance is {$700000.00}. The interest rate is 5.75 and since the real estate market crash of XX/XX/XXXX/XX/XX/XXXX I have been trying to get it refinanced to the much lower rates that mortgages have been at since then. My mortgage lender is Bank of America. Everytime I have asked them to refinance the loan, they say that my debt/income ratio is high and they can not do it. \n", - "\n", - "That is ridicules since for the past 9 years I have been able to make my payments at 5.75 % rate and they tell me that I do n't qualify to make the payments at 3 to 3.5 % rates. \n", - "\n", - "Of course by delaying this matter for years they have benefited nicely by collecting 5.75 % on a good size mortgage.\n", - "3. 
I have signed the refinance paperwork with PrimeChoiceFunding Mortgage on XX/XX/XXXX for a XXXX days lock based on the advertisement flyer that came in the mail in XXXX of XXXX. As of today XX/XX/XXXX XXXX 3 months later ) I have not received any updates on my closing date and have not been able to receive any updates on the status of my file. It is misleading to keep the consumer on hold for such a long period of time with no updates and no end date after offering a XXXX days lock in. It is a desperate situation. Please assist.\n", - "4. Our loan had been transferred to Freedom Mortgage, we were on a biweekly payment plan as we wanted to payoff the loan faster. We ended up moving from XXXX Texas to XXXX and sold the home. During closing a payoff amount was requested and we contacted them to notify that the loan would close on XX/XX/XXXX and the payment for XX/XX/XXXX would be included in the payoff amount. A represenative told us this was normal practice and so we went on to close the loan on the XX/XX/XXXX. Once the loan was closed and paid in full, we began looking for a new home in our new XXXX of XXXX. Come to find out our credit had been dinged badly due to Freedom Mortgage saying we had missed XXXX payments!!! WHAT?? I immediately got on the phone to find out what was going on. After countless phone calls and talking to representatives and waiting for supervisor call backs ... .still they can not put me through to anyone who has the authority to look into the matter and realize that NO payments were missed and the loan is closed and paid in Full!! This has prevented us from looking for and applying for a new mortgage. This is by far the worst handling of a situation where a client who has always been in good standing and has completely paid their loan in full, is being completely discredited by a mistake from their company!! It completely disrupts peoples lives!!\n", - "5. My USDA mortgage loan is currently serviced by Carrington Mortgage Services ( CMS ). 
CMS claims that the total amount past due is {$14000.00}. The alleged claim is for 14 payments. CMS alleges that {$3400.00} were transferred in late fees from XXXX. XXXX has advised there were no late fees compiled when the loan was transferred in XX/XX/XXXX. CMS is clearly trying to over collect on the amount due, so they can push the bill to the USDA for payment. I have also applied for a loan modification on numerous occasions and CMS has refused to submit the modification requests for underwriting to the USDA. Since the USDA is the investor and insures the loan, they would be issuing a partial claim to help fund the loan modification. Since the USDA is unaware of the partial claim, they were clearly in the dark on the loan modification process. CMS is in clear violation of the False Claims Act ( FCA ), 31 USC 3729 - 3733, since they knowingly withheld information and reported false records to a Federal Government Agency. CMS also has an obligation to process the loan modification with transparency and clarity per the National Mortgage Settlement of XX/XX/XXXX. CMS has refused to cooperate and has set a foreclosure sale date of XX/XX/XXXX. Based on the violations listed herein, I am seeking your assistance on this matter.\n", - "comment list 2:\n", - "1. Someone has used my information trying to apply for credit I have notified transunion they have not removed the false information. \n", - "XXXX. XXXX XXXX XX/XX/XXXX. \n", - "XXXX XXXX XXXX XX/XX/XXXX. \n", - "XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX XXXX XXXX XXXXXXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX XXXX XX/XX/XXXX XXXX. XXXX XXXX XXXX XX/XX/XXXX XXXX\n", - "2. Have reported numerous times identity theft. Last identity theft report was closed after phone was hacked. I have had current equifax account for about 2 years. 
Suddenly equifax will not let me log into my account without contacting a call center in XXXX and getting a password sent to my email from them. They sent me an email saying the issue was fixed. It is not fixed and Im tired of fighting with these people over my own social security number. This is a violation of fcra and my rights to lock me out of my own credit report. The person stealing my ssn is constantly doing this to save their own behind. Im tired of it and nothing is ever done. This has been going on since around XX/XX/2022 2022\n", - "3. Equifax have placed my file in the consumer affairs department and refuses to block the fraudulent items that are on my credit report. Every time I call in to request they remove my credit file out of the fraud department, a lady by the name of XXXX XXXX is very rude and tell me it will remain in the consumer affairs department. There are fraudulent addresses and accounts that are on my file the department refuses to block and remove.\n", - "4. I am having issues with a freeze on my file that is holding up my credit from reporting my accounts properly. I have a XXXX XXXX and XXXX XXXX account that is not reporting to my credit file. My name is XXXX XXXX SSN:XXXX DOB : XX/XX/1992 My addition address is XXXXXXXX XXXX XXXX XXXX FL XXXX .\n", - "5. I don't recognize this account. I never applied for it. I was victim of Identity Theft, somebody stole my personal information to open credit cards. 
Except as otherwise provided in this section, a consumer reporting agency shall block the reporting of any information in the file of a consumer that the consumer identifies as information that resulted from an alleged identity theft, not later than 4 business days after the date of receipt by such agency of ( 1 ) appropriate proof of the identity of the consumer ; ( 2 ) a copy of an identity theft report ; ( 3 ) the identification of such information by the consumer ; and ( 4 ) a statement by the consumer that the information is not information relating to any transaction by the consumer. ( b ) Notification. A consumer reporting agency shall promptly notify the furnisher of information identified by the consumer under subsection ( a ) of this section ( 1 ) that the information may be a result of identity theft ; ( 2 ) that an identity theft report has been filed ; ( 3 ) that a block has been requested under this section ; and ( 4 ) of the effective dates of the block. ( c ) Authority to decline or rescind. ( 1 ) In general. A consumer reporting agency may decline to block, or may rescind any block, of information relating to a consumer under this section, if the consumer reporting agency reasonably determines that ( A ) the information was blocked in error or a block was requested by the consumer in error ; ( B ) the information was blocked, or a block was requested by the consumer, on the basis of a material misrepresentation of fact by the consumer relevant to the request to block ; or ( C ) the consumer obtained possession of goods, services, or money as a result of the blocked transaction or transactions. ( 2 ) Notification to consumer. If a block of information is declined or rescinded under this subsection, the affected consumer shall be notified promptly, in the same manner as consumers are notified of the reinsertion of information under section 1681i ( a ) ( 5 ) ( B ) of this title. ( 3 ) Significance of block. 
For purposes of this subsection, if a consumer reporting agency rescinds a block, the presence of information in the file of a consumer prior to the blocking of such information is not evidence of whether the consumer knew or should have known that the consumer obtained possession of any goods, services, or money as a result of the block. ( d ) Exception for resellers. ( 1 ) No reseller file. This section shall not apply to a consumer reporting agency, if the consumer reporting agency ( A ) is a reseller ; ( B ) is not, at the time of the request of the consumer under subsection ( a ) of this section, otherwise furnishing or reselling a consumer report concerning the information identified by the consumer ; and ( C ) informs the consumer, by any means, that the consumer may report the identity theft to the Bureau to obtain consumer information regarding identity theft. ( 2 ) Reseller with file. The sole obligation of the consumer reporting agency under this section, with regard to any request of a consumer under this section, shall be to block the consumer report maintained by the consumer reporting agency from any subsequent use, if ( A ) the consumer, in accordance with the provisions of subsection ( a ) of this section, identifies, to a consumer reporting agency, information in the file of the consumer that resulted from identity theft ; and ( B ) the consumer reporting agency is a reseller of the identified information. ( 3 ) Notice. In carrying out its obligation under paragraph ( 2 ), the reseller shall promptly provide a notice to the consumer of the decision to block the file. Such notice shall contain the name, address, and telephone number of each consumer reporting agency from which the consumer information was obtained for resale. ( e ) Exception for verification companies. 
The provisions of this section do not apply to a check services company, acting as such, which issues authorizations for the purpose of approving or processing negotiable instruments, electronic fund transfers, or similar methods of payments, except that, beginning 4 business days after receipt of information described in paragraphs ( 1 ) through ( 3 ) of subsection ( a ) of this section, a check services company shall not report to a national consumer reporting agency described in section 1681a ( p ) of this title, any information identified in the subject identity theft report as resulting from identity theft. ( f ) Access to blocked information by law enforcement agencies. No provision of this section shall be construed as requiring a consumer reporting agency to prevent a Federal, State, or local law enforcement agency from accessing blocked information in a consumer file to which the agency could otherwise obtain access under this subchapter.\n", - "\n" - ] - } - ], + "outputs": [], "source": [ "# The plain English request we will make of PaLM 2\n", "prompt = (\n", @@ -1123,39 +607,11 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": null, "metadata": { "id": "mL5P0_3X04dE" }, - "outputs": [ - { - "ename": "PermissionDenied", - "evalue": "403 Permission 'resourcemanager.projects.setIamPolicy' denied on resource '//cloudresourcemanager.googleapis.com/projects/bigframes-dev' (or it may not exist). 
[reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"cloudresourcemanager.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/bigframes-dev\"\n}\nmetadata {\n key: \"permission\"\n value: \"resourcemanager.projects.setIamPolicy\"\n}\n]", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31m_InactiveRpcError\u001b[0m Traceback (most recent call last)", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/grpc_helpers.py:72\u001b[0m, in \u001b[0;36m_wrap_unary_errors..error_remapped_callable\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 71\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m---> 72\u001b[0m \u001b[39mreturn\u001b[39;00m callable_(\u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n\u001b[1;32m 73\u001b[0m \u001b[39mexcept\u001b[39;00m grpc\u001b[39m.\u001b[39mRpcError \u001b[39mas\u001b[39;00m exc:\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/grpc/_channel.py:1030\u001b[0m, in \u001b[0;36m_UnaryUnaryMultiCallable.__call__\u001b[0;34m(self, request, timeout, metadata, credentials, wait_for_ready, compression)\u001b[0m\n\u001b[1;32m 1028\u001b[0m state, call, \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_blocking(request, timeout, metadata, credentials,\n\u001b[1;32m 1029\u001b[0m wait_for_ready, compression)\n\u001b[0;32m-> 1030\u001b[0m \u001b[39mreturn\u001b[39;00m _end_unary_response_blocking(state, call, \u001b[39mFalse\u001b[39;49;00m, \u001b[39mNone\u001b[39;49;00m)\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/grpc/_channel.py:910\u001b[0m, in \u001b[0;36m_end_unary_response_blocking\u001b[0;34m(state, call, with_call, deadline)\u001b[0m\n\u001b[1;32m 909\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m--> 910\u001b[0m \u001b[39mraise\u001b[39;00m 
_InactiveRpcError(state)\n", - "\u001b[0;31m_InactiveRpcError\u001b[0m: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.PERMISSION_DENIED\n\tdetails = \"Permission 'resourcemanager.projects.setIamPolicy' denied on resource '//cloudresourcemanager.googleapis.com/projects/bigframes-dev' (or it may not exist).\"\n\tdebug_error_string = \"UNKNOWN:Error received from peer ipv4:142.251.163.95:443 {grpc_message:\"Permission \\'resourcemanager.projects.setIamPolicy\\' denied on resource \\'//cloudresourcemanager.googleapis.com/projects/bigframes-dev\\' (or it may not exist).\", grpc_status:7, created_time:\"2023-11-08T20:49:43.954445252+00:00\"}\"\n>", - "\nThe above exception was the direct cause of the following exception:\n", - "\u001b[0;31mPermissionDenied\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[21], line 5\u001b[0m\n\u001b[1;32m 3\u001b[0m session \u001b[39m=\u001b[39m bf\u001b[39m.\u001b[39mget_global_session()\n\u001b[1;32m 4\u001b[0m connection \u001b[39m=\u001b[39m \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mPROJECT_ID\u001b[39m}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m{\u001b[39;00mREGION\u001b[39m}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m{\u001b[39;00mCONN_NAME\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m\n\u001b[0;32m----> 5\u001b[0m q_a_model \u001b[39m=\u001b[39m PaLM2TextGenerator(session\u001b[39m=\u001b[39;49msession, connection_name\u001b[39m=\u001b[39;49mconnection)\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/ml/llm.py:72\u001b[0m, in \u001b[0;36mPaLM2TextGenerator.__init__\u001b[0;34m(self, model_name, session, connection_name)\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mconnection_name \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_bq_connection_manager\u001b[39m.\u001b[39mresolve_full_connection_name(\n\u001b[1;32m 66\u001b[0m connection_name,\n\u001b[1;32m 67\u001b[0m 
default_project\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39msession\u001b[39m.\u001b[39m_project,\n\u001b[1;32m 68\u001b[0m default_location\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39msession\u001b[39m.\u001b[39m_location,\n\u001b[1;32m 69\u001b[0m )\n\u001b[1;32m 71\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_bqml_model_factory \u001b[39m=\u001b[39m \u001b[39mglobals\u001b[39m\u001b[39m.\u001b[39mbqml_model_factory()\n\u001b[0;32m---> 72\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_bqml_model: core\u001b[39m.\u001b[39mBqmlModel \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_create_bqml_model()\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/ml/llm.py:85\u001b[0m, in \u001b[0;36mPaLM2TextGenerator._create_bqml_model\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 81\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mlen\u001b[39m(connection_name_parts) \u001b[39m!=\u001b[39m \u001b[39m3\u001b[39m:\n\u001b[1;32m 82\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[1;32m 83\u001b[0m \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mconnection_name must be of the format .., got \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39mconnection_name\u001b[39m}\u001b[39;00m\u001b[39m.\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 84\u001b[0m )\n\u001b[0;32m---> 85\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_bq_connection_manager\u001b[39m.\u001b[39;49mcreate_bq_connection(\n\u001b[1;32m 86\u001b[0m project_id\u001b[39m=\u001b[39;49mconnection_name_parts[\u001b[39m0\u001b[39;49m],\n\u001b[1;32m 87\u001b[0m location\u001b[39m=\u001b[39;49mconnection_name_parts[\u001b[39m1\u001b[39;49m],\n\u001b[1;32m 88\u001b[0m connection_id\u001b[39m=\u001b[39;49mconnection_name_parts[\u001b[39m2\u001b[39;49m],\n\u001b[1;32m 89\u001b[0m 
iam_role\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39maiplatform.user\u001b[39;49m\u001b[39m\"\u001b[39;49m,\n\u001b[1;32m 90\u001b[0m )\n\u001b[1;32m 91\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mmodel_name \u001b[39m==\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mtext-bison\u001b[39m\u001b[39m\"\u001b[39m:\n\u001b[1;32m 92\u001b[0m options \u001b[39m=\u001b[39m {\n\u001b[1;32m 93\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mremote_service_type\u001b[39m\u001b[39m\"\u001b[39m: _REMOTE_TEXT_GENERATOR_MODEL_CODE,\n\u001b[1;32m 94\u001b[0m }\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/clients.py:100\u001b[0m, in \u001b[0;36mBqConnectionManager.create_bq_connection\u001b[0;34m(self, project_id, location, connection_id, iam_role)\u001b[0m\n\u001b[1;32m 97\u001b[0m service_account_id \u001b[39m=\u001b[39m cast(\u001b[39mstr\u001b[39m, service_account_id)\n\u001b[1;32m 98\u001b[0m \u001b[39m# Ensure IAM role on the BQ connection\u001b[39;00m\n\u001b[1;32m 99\u001b[0m \u001b[39m# https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions#grant_permission_on_function\u001b[39;00m\n\u001b[0;32m--> 100\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_ensure_iam_binding(project_id, service_account_id, iam_role)\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/retry.py:349\u001b[0m, in \u001b[0;36mRetry.__call__..retry_wrapped_func\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 345\u001b[0m target \u001b[39m=\u001b[39m functools\u001b[39m.\u001b[39mpartial(func, \u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 346\u001b[0m sleep_generator \u001b[39m=\u001b[39m exponential_sleep_generator(\n\u001b[1;32m 347\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_initial, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_maximum, 
multiplier\u001b[39m=\u001b[39m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_multiplier\n\u001b[1;32m 348\u001b[0m )\n\u001b[0;32m--> 349\u001b[0m \u001b[39mreturn\u001b[39;00m retry_target(\n\u001b[1;32m 350\u001b[0m target,\n\u001b[1;32m 351\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_predicate,\n\u001b[1;32m 352\u001b[0m sleep_generator,\n\u001b[1;32m 353\u001b[0m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_timeout,\n\u001b[1;32m 354\u001b[0m on_error\u001b[39m=\u001b[39;49mon_error,\n\u001b[1;32m 355\u001b[0m )\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/retry.py:191\u001b[0m, in \u001b[0;36mretry_target\u001b[0;34m(target, predicate, sleep_generator, timeout, on_error, **kwargs)\u001b[0m\n\u001b[1;32m 189\u001b[0m \u001b[39mfor\u001b[39;00m sleep \u001b[39min\u001b[39;00m sleep_generator:\n\u001b[1;32m 190\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m--> 191\u001b[0m \u001b[39mreturn\u001b[39;00m target()\n\u001b[1;32m 193\u001b[0m \u001b[39m# pylint: disable=broad-except\u001b[39;00m\n\u001b[1;32m 194\u001b[0m \u001b[39m# This function explicitly must deal with broad exceptions.\u001b[39;00m\n\u001b[1;32m 195\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mException\u001b[39;00m \u001b[39mas\u001b[39;00m exc:\n", - "File \u001b[0;32m~/bq/src/bigframes/bigframes/clients.py:138\u001b[0m, in \u001b[0;36mBqConnectionManager._ensure_iam_binding\u001b[0;34m(self, project_id, service_account_id, iam_role)\u001b[0m\n\u001b[1;32m 136\u001b[0m policy\u001b[39m.\u001b[39mbindings\u001b[39m.\u001b[39mappend(new_binding)\n\u001b[1;32m 137\u001b[0m request \u001b[39m=\u001b[39m iam_policy_pb2\u001b[39m.\u001b[39mSetIamPolicyRequest(resource\u001b[39m=\u001b[39mproject, policy\u001b[39m=\u001b[39mpolicy)\n\u001b[0;32m--> 138\u001b[0m 
\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_cloud_resource_manager_client\u001b[39m.\u001b[39;49mset_iam_policy(request\u001b[39m=\u001b[39;49mrequest)\n\u001b[1;32m 140\u001b[0m \u001b[39m# We would wait for the IAM policy change to take effect\u001b[39;00m\n\u001b[1;32m 141\u001b[0m \u001b[39m# https://cloud.google.com/iam/docs/access-change-propagation\u001b[39;00m\n\u001b[1;32m 142\u001b[0m logger\u001b[39m.\u001b[39minfo(\n\u001b[1;32m 143\u001b[0m \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mWaiting \u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_IAM_WAIT_SECONDS\u001b[39m}\u001b[39;00m\u001b[39m seconds for IAM to take effect..\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 144\u001b[0m )\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/cloud/resourcemanager_v3/services/projects/client.py:1838\u001b[0m, in \u001b[0;36mProjectsClient.set_iam_policy\u001b[0;34m(self, request, resource, retry, timeout, metadata)\u001b[0m\n\u001b[1;32m 1833\u001b[0m metadata \u001b[39m=\u001b[39m \u001b[39mtuple\u001b[39m(metadata) \u001b[39m+\u001b[39m (\n\u001b[1;32m 1834\u001b[0m gapic_v1\u001b[39m.\u001b[39mrouting_header\u001b[39m.\u001b[39mto_grpc_metadata(((\u001b[39m\"\u001b[39m\u001b[39mresource\u001b[39m\u001b[39m\"\u001b[39m, request\u001b[39m.\u001b[39mresource),)),\n\u001b[1;32m 1835\u001b[0m )\n\u001b[1;32m 1837\u001b[0m \u001b[39m# Send the request.\u001b[39;00m\n\u001b[0;32m-> 1838\u001b[0m response \u001b[39m=\u001b[39m rpc(\n\u001b[1;32m 1839\u001b[0m request,\n\u001b[1;32m 1840\u001b[0m retry\u001b[39m=\u001b[39;49mretry,\n\u001b[1;32m 1841\u001b[0m timeout\u001b[39m=\u001b[39;49mtimeout,\n\u001b[1;32m 1842\u001b[0m metadata\u001b[39m=\u001b[39;49mmetadata,\n\u001b[1;32m 1843\u001b[0m )\n\u001b[1;32m 1845\u001b[0m \u001b[39m# Done; return the response.\u001b[39;00m\n\u001b[1;32m 1846\u001b[0m \u001b[39mreturn\u001b[39;00m response\n", - "File 
\u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py:113\u001b[0m, in \u001b[0;36m_GapicCallable.__call__\u001b[0;34m(self, timeout, retry, *args, **kwargs)\u001b[0m\n\u001b[1;32m 110\u001b[0m metadata\u001b[39m.\u001b[39mextend(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_metadata)\n\u001b[1;32m 111\u001b[0m kwargs[\u001b[39m\"\u001b[39m\u001b[39mmetadata\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m metadata\n\u001b[0;32m--> 113\u001b[0m \u001b[39mreturn\u001b[39;00m wrapped_func(\u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/timeout.py:120\u001b[0m, in \u001b[0;36mTimeToDeadlineTimeout.__call__..func_with_timeout\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 117\u001b[0m \u001b[39m# Avoid setting negative timeout\u001b[39;00m\n\u001b[1;32m 118\u001b[0m kwargs[\u001b[39m\"\u001b[39m\u001b[39mtimeout\u001b[39m\u001b[39m\"\u001b[39m] \u001b[39m=\u001b[39m \u001b[39mmax\u001b[39m(\u001b[39m0\u001b[39m, \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_timeout \u001b[39m-\u001b[39m time_since_first_attempt)\n\u001b[0;32m--> 120\u001b[0m \u001b[39mreturn\u001b[39;00m func(\u001b[39m*\u001b[39;49margs, \u001b[39m*\u001b[39;49m\u001b[39m*\u001b[39;49mkwargs)\n", - "File \u001b[0;32m~/bq/src/bigframes/venv/lib/python3.9/site-packages/google/api_core/grpc_helpers.py:74\u001b[0m, in \u001b[0;36m_wrap_unary_errors..error_remapped_callable\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 72\u001b[0m \u001b[39mreturn\u001b[39;00m callable_(\u001b[39m*\u001b[39margs, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 73\u001b[0m \u001b[39mexcept\u001b[39;00m grpc\u001b[39m.\u001b[39mRpcError \u001b[39mas\u001b[39;00m exc:\n\u001b[0;32m---> 74\u001b[0m \u001b[39mraise\u001b[39;00m exceptions\u001b[39m.\u001b[39mfrom_grpc_error(exc) \u001b[39mfrom\u001b[39;00m 
\u001b[39mexc\u001b[39;00m\n", - "\u001b[0;31mPermissionDenied\u001b[0m: 403 Permission 'resourcemanager.projects.setIamPolicy' denied on resource '//cloudresourcemanager.googleapis.com/projects/bigframes-dev' (or it may not exist). [reason: \"IAM_PERMISSION_DENIED\"\ndomain: \"cloudresourcemanager.googleapis.com\"\nmetadata {\n key: \"resource\"\n value: \"projects/bigframes-dev\"\n}\nmetadata {\n key: \"permission\"\n value: \"resourcemanager.projects.setIamPolicy\"\n}\n]" - ] - } - ], + "outputs": [], "source": [ "from bigframes.ml.llm import PaLM2TextGenerator\n", "\n", From 6b4d78aef26d0f37a8216c60c61e11baa8575e68 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 23:03:23 +0000 Subject: [PATCH 25/26] 5000 -> 10000 --- notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 3c0ab676cd..2bf1d753de 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -59,7 +59,7 @@ "\n", "The goal of this notebook is to demonstrate a comment characterization algorithm for an online business. We will accomplish this using [Google's PaLM 2](https://ai.google/discover/palm2/) and [KMeans clustering](https://en.wikipedia.org/wiki/K-means_clustering) in three steps:\n", "\n", - "1. Use PaLM2TextEmbeddingGenerator to [generate text embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) for each of 5000 complaints sent to an online bank. If you're not familiar with what a text embedding is, it's a list of numbers that are like coordinates in an imaginary \"meaning space\" for sentences. (It's like [word embeddings](https://en.wikipedia.org/wiki/Word_embedding), but for more general text.) 
The important point for our purposes is that similar sentences are close to each other in this imaginary space.\n",
+    "1. Use PaLM2TextEmbeddingGenerator to [generate text embeddings](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) for each of 10000 complaints sent to an online bank. If you're not familiar with what a text embedding is, it's a list of numbers that are like coordinates in an imaginary \"meaning space\" for sentences. (It's like [word embeddings](https://en.wikipedia.org/wiki/Word_embedding), but for more general text.) The important point for our purposes is that similar sentences are close to each other in this imaginary space.\n",
     "2. Use KMeans clustering to group together complaints whose text embeddings are near to each other. This will give us sets of similar complaints, but we don't yet know _why_ these complaints are similar.\n",
     "3. Prompt PaLM2TextGenerator in English, asking what the difference is between the groups of complaints that we got. 
Thanks to the power of modern LLMs, the response might give us a very good idea of what these complaints are all about, but remember to [\"understand the limits of your dataset and model.\"](https://ai.google/responsibility/responsible-ai-practices/#:~:text=Understand%20the%20limitations%20of%20your%20dataset%20and%20model)\n", "\n", @@ -393,8 +393,8 @@ }, "outputs": [], "source": [ - "# Choose 5,000 complaints randomly and store them in a column in a DataFrame\n", - "downsampled_issues_df = issues_df.sample(n=5000)" + "# Choose 10,000 complaints randomly and store them in a column in a DataFrame\n", + "downsampled_issues_df = issues_df.sample(n=10000)" ] }, { From 747829ea05695b950ab0987876fd08ad0e3994f4 Mon Sep 17 00:00:00 2001 From: Henry J Solberg Date: Wed, 8 Nov 2023 23:29:10 +0000 Subject: [PATCH 26/26] Add more explanation text --- .../generative_ai/bq_dataframes_llm_kmeans.ipynb | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 2bf1d753de..46c4955288 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -354,12 +354,13 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": { "id": "v6FGschEowht" }, "source": [ - "Data Input" + "Data Input - read the data from a publicly available BigQuery dataset" ] }, { @@ -385,6 +386,14 @@ "issues_df.head(n=5) # View the first five complaints" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Download 10000 complaints to use with PaLM2TextEmbeddingGenerator" + ] + }, { "cell_type": "code", "execution_count": null, @@ -534,7 +543,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Build prompts" + "Build prompts - we will choose just two of our categories and prompt PaLM2TextGenerator to identify their salient characteristics. 
The prompt is natural language in a Python string."
   ]
  },
  {
@@ -602,7 +611,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Get a response from PaLM 2 LLM"
+    "Get a response from PaLM 2 LLM by making a call to Vertex AI using our connection."
   ]
  },
  {