From 9fde1f8f158682f4c4ed9e310fc1fcdaac0ced11 Mon Sep 17 00:00:00 2001 From: Ashley Xu Date: Tue, 14 Nov 2023 20:06:09 +0000 Subject: [PATCH 1/7] Make the llm kmeans notebook professional --- .../bq_dataframes_llm_kmeans.ipynb | 115 ++++-------------- 1 file changed, 26 insertions(+), 89 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 46c4955288..729beab0a3 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -122,10 +122,6 @@ "\n", " * BigQuery API\n", " * BigQuery Connection API\n", - " * Cloud Run API\n", - " * Artifact Registry API\n", - " * Cloud Build API\n", - " * Cloud Resource Manager API\n", " * Vertex AI API\n", "\n", "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)." @@ -232,87 +228,6 @@ "# auth.authenticate_user()" ] }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you want to reset the location of the created DataFrame or Series objects, reset the session by executing `bf.close_session()`. After that, you can reuse `bf.options.bigquery.location` to specify another location." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Connect to Vertex AI\n", - "\n", - "In order to use PaLM2TextGenerator, we will need to set up a [cloud resource connection](https://cloud.google.com/bigquery/docs/create-cloud-resource-connection)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from google.cloud import bigquery_connection_v1 as bq_connection\n", - "\n", - "CONN_NAME = \"bqdf-llm\"\n", - "\n", - "client = bq_connection.ConnectionServiceClient()\n", - "new_conn_parent = f\"projects/{PROJECT_ID}/locations/{REGION}\"\n", - "exists_conn_parent = f\"projects/{PROJECT_ID}/locations/{REGION}/connections/{CONN_NAME}\"\n", - "cloud_resource_properties = bq_connection.CloudResourceProperties({})\n", - "\n", - "try:\n", - " request = client.get_connection(\n", - " request=bq_connection.GetConnectionRequest(name=exists_conn_parent)\n", - " )\n", - " CONN_SERVICE_ACCOUNT = f\"serviceAccount:{request.cloud_resource.service_account_id}\"\n", - "except Exception:\n", - " connection = bq_connection.types.Connection(\n", - " {\"friendly_name\": CONN_NAME, \"cloud_resource\": cloud_resource_properties}\n", - " )\n", - " request = bq_connection.CreateConnectionRequest(\n", - " {\n", - " \"parent\": new_conn_parent,\n", - " \"connection_id\": CONN_NAME,\n", - " \"connection\": connection,\n", - " }\n", - " )\n", - " response = client.create_connection(request)\n", - " CONN_SERVICE_ACCOUNT = (\n", - " f\"serviceAccount:{response.cloud_resource.service_account_id}\"\n", - " )\n", - "print(CONN_SERVICE_ACCOUNT)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set permissions for the service account\n", - "\n", - "The resource connection service account requires certain project-level permissions:\n", - " - `roles/aiplatform.user` and `roles/bigquery.connectionUser`: These roles are required for the connection to create a model definition using the LLM model in Vertex AI ([documentation](https://cloud.google.com/bigquery/docs/generate-text#give_the_service_account_access)).\n", - " - `roles/run.invoker`: This role is required for the connection to have read-only access to Cloud Run services that back 
custom/remote functions ([documentation](https://cloud.google.com/bigquery/docs/remote-functions#grant_permission_on_function)).\n", - "\n", - "Set these permissions by running the following `gcloud` commands:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/bigquery.connectionUser'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/aiplatform.user'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'" - ] - }, { "attachments": {}, "cell_type": "markdown", @@ -336,7 +251,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Project Setup" + "BigQuery DataFrames setup" ] }, { @@ -353,6 +268,14 @@ "bf.options.bigquery.location = REGION" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you want to reset the location of the created DataFrame or Series objects, reset the session by executing `bf.close_session()`. After that, you can reuse `bf.options.bigquery.location` to specify another location." + ] + }, { "attachments": {}, "cell_type": "markdown", @@ -391,7 +314,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Download 10000 complaints to use with PaLM2TextEmbeddingGenerator" + "Downsample DataFrame to 10,000 records for model training." 
] }, { @@ -470,7 +393,7 @@ "id": "OUZ3NNbzo1Tb" }, "source": [ - "## Step 2: KMeans clustering" + "## Step 2: Create k-means model and predict clusters" ] }, { @@ -535,7 +458,7 @@ "id": "21rNsFMHo8hO" }, "source": [ - "## Step 3: Summarize the complaints" + "## Step 3: Use PaLM2 LLM model to summarize complaint clusters" ] }, { @@ -624,7 +547,10 @@ "source": [ "from bigframes.ml.llm import PaLM2TextGenerator\n", "\n", + "# Create a BigQuery Cloud resource connection\n", + "CONN_NAME = \"bqdf-llm\"\n", "session = bf.get_global_session()\n", + "\n", "connection = f\"{PROJECT_ID}.{REGION}.{CONN_NAME}\"\n", "q_a_model = PaLM2TextGenerator(session=session, connection_name=connection)" ] @@ -662,6 +588,17 @@ "source": [ "We now see PaLM2TextGenerator's characterization of the different comment groups. Thanks for using BigQuery DataFrames!" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Summary and next steps\n", + "\n", + "You've used BigQuery DataFrames' integration with LLM models (`bigframes.ml.llm`) to generate code samples, and have tranformed LLM output by creating and using a custom function in BigQuery DataFrames.\n", + "\n", + "Learn more about BigQuery DataFrames in the [documentation](https://cloud.google.com/python/docs/reference/bigframes/latest) and find more sample notebooks in the [GitHub repo](https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks)." 
+ ] } ], "metadata": { From a1b9706a60b92c014c757a334c2a769e0976a46a Mon Sep 17 00:00:00 2001 From: Ashley Xu Date: Tue, 14 Nov 2023 20:06:09 +0000 Subject: [PATCH 2/7] Make the llm kmeans notebook professional --- .../bq_dataframes_llm_kmeans.ipynb | 115 ++++-------------- 1 file changed, 26 insertions(+), 89 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 46c4955288..729beab0a3 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -122,10 +122,6 @@ "\n", " * BigQuery API\n", " * BigQuery Connection API\n", - " * Cloud Run API\n", - " * Artifact Registry API\n", - " * Cloud Build API\n", - " * Cloud Resource Manager API\n", " * Vertex AI API\n", "\n", "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)." @@ -232,87 +228,6 @@ "# auth.authenticate_user()" ] }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you want to reset the location of the created DataFrame or Series objects, reset the session by executing `bf.close_session()`. After that, you can reuse `bf.options.bigquery.location` to specify another location." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Connect to Vertex AI\n", - "\n", - "In order to use PaLM2TextGenerator, we will need to set up a [cloud resource connection](https://cloud.google.com/bigquery/docs/create-cloud-resource-connection)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from google.cloud import bigquery_connection_v1 as bq_connection\n", - "\n", - "CONN_NAME = \"bqdf-llm\"\n", - "\n", - "client = bq_connection.ConnectionServiceClient()\n", - "new_conn_parent = f\"projects/{PROJECT_ID}/locations/{REGION}\"\n", - "exists_conn_parent = f\"projects/{PROJECT_ID}/locations/{REGION}/connections/{CONN_NAME}\"\n", - "cloud_resource_properties = bq_connection.CloudResourceProperties({})\n", - "\n", - "try:\n", - " request = client.get_connection(\n", - " request=bq_connection.GetConnectionRequest(name=exists_conn_parent)\n", - " )\n", - " CONN_SERVICE_ACCOUNT = f\"serviceAccount:{request.cloud_resource.service_account_id}\"\n", - "except Exception:\n", - " connection = bq_connection.types.Connection(\n", - " {\"friendly_name\": CONN_NAME, \"cloud_resource\": cloud_resource_properties}\n", - " )\n", - " request = bq_connection.CreateConnectionRequest(\n", - " {\n", - " \"parent\": new_conn_parent,\n", - " \"connection_id\": CONN_NAME,\n", - " \"connection\": connection,\n", - " }\n", - " )\n", - " response = client.create_connection(request)\n", - " CONN_SERVICE_ACCOUNT = (\n", - " f\"serviceAccount:{response.cloud_resource.service_account_id}\"\n", - " )\n", - "print(CONN_SERVICE_ACCOUNT)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set permissions for the service account\n", - "\n", - "The resource connection service account requires certain project-level permissions:\n", - " - `roles/aiplatform.user` and `roles/bigquery.connectionUser`: These roles are required for the connection to create a model definition using the LLM model in Vertex AI ([documentation](https://cloud.google.com/bigquery/docs/generate-text#give_the_service_account_access)).\n", - " - `roles/run.invoker`: This role is required for the connection to have read-only access to Cloud Run services that back 
custom/remote functions ([documentation](https://cloud.google.com/bigquery/docs/remote-functions#grant_permission_on_function)).\n", - "\n", - "Set these permissions by running the following `gcloud` commands:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/bigquery.connectionUser'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/aiplatform.user'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'" - ] - }, { "attachments": {}, "cell_type": "markdown", @@ -336,7 +251,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Project Setup" + "BigQuery DataFrames setup" ] }, { @@ -353,6 +268,14 @@ "bf.options.bigquery.location = REGION" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you want to reset the location of the created DataFrame or Series objects, reset the session by executing `bf.close_session()`. After that, you can reuse `bf.options.bigquery.location` to specify another location." + ] + }, { "attachments": {}, "cell_type": "markdown", @@ -391,7 +314,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Download 10000 complaints to use with PaLM2TextEmbeddingGenerator" + "Downsample DataFrame to 10,000 records for model training." 
] }, { @@ -470,7 +393,7 @@ "id": "OUZ3NNbzo1Tb" }, "source": [ - "## Step 2: KMeans clustering" + "## Step 2: Create k-means model and predict clusters" ] }, { @@ -535,7 +458,7 @@ "id": "21rNsFMHo8hO" }, "source": [ - "## Step 3: Summarize the complaints" + "## Step 3: Use PaLM2 LLM model to summarize complaint clusters" ] }, { @@ -624,7 +547,10 @@ "source": [ "from bigframes.ml.llm import PaLM2TextGenerator\n", "\n", + "# Create a BigQuery Cloud resource connection\n", + "CONN_NAME = \"bqdf-llm\"\n", "session = bf.get_global_session()\n", + "\n", "connection = f\"{PROJECT_ID}.{REGION}.{CONN_NAME}\"\n", "q_a_model = PaLM2TextGenerator(session=session, connection_name=connection)" ] @@ -662,6 +588,17 @@ "source": [ "We now see PaLM2TextGenerator's characterization of the different comment groups. Thanks for using BigQuery DataFrames!" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Summary and next steps\n", + "\n", + "You've used BigQuery DataFrames' integration with LLM models (`bigframes.ml.llm`) to generate code samples, and have tranformed LLM output by creating and using a custom function in BigQuery DataFrames.\n", + "\n", + "Learn more about BigQuery DataFrames in the [documentation](https://cloud.google.com/python/docs/reference/bigframes/latest) and find more sample notebooks in the [GitHub repo](https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks)." + ] } ], "metadata": { From 8324f133547ec35da5eefc0a8b02fe0f3887d81d Mon Sep 17 00:00:00 2001 From: TrevorBergeron Date: Wed, 15 Nov 2023 18:58:14 -0800 Subject: [PATCH 3/7] fix: correctly handle null values when initializing fingerprint ordering (#210) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Thank you for opening a Pull Request! 
Before submitting your PR, there are a few things you can do to make sure it goes smoothly: - [ ] Make sure to open an issue as a [bug/issue](https://togithub.com/googleapis/python-bigquery-dataframes/issues/new/choose) before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea - [ ] Ensure the tests and linter pass - [ ] Code coverage does not decrease (if any source code was changed) - [ ] Appropriate docs were updated (if necessary) Fixes # 🦕 --- bigframes/session/__init__.py | 9 ++++++--- tests/system/small/test_dataframe.py | 8 ++++++++ 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/bigframes/session/__init__.py b/bigframes/session/__init__.py index 069bd5d260..928123ce74 100644 --- a/bigframes/session/__init__.py +++ b/bigframes/session/__init__.py @@ -1120,8 +1120,9 @@ def _create_total_ordering( ordering_hash_part = guid.generate_guid("bigframes_ordering_") ordering_rand_part = guid.generate_guid("bigframes_ordering_") + # All inputs into hash must be non-null or resulting hash will be null str_values = list( - map(lambda col: _convert_to_string(table[col]), table.columns) + map(lambda col: _convert_to_nonnull_string(table[col]), table.columns) ) full_row_str = ( str_values[0].concat(*str_values[1:]) @@ -1419,7 +1420,7 @@ def _can_cluster_bq(field: bigquery.SchemaField): ) -def _convert_to_string(column: ibis_types.Column) -> ibis_types.StringColumn: +def _convert_to_nonnull_string(column: ibis_types.Column) -> ibis_types.StringValue: col_type = column.type() if ( col_type.is_numeric() @@ -1436,4 +1437,6 @@ def _convert_to_string(column: ibis_types.Column) -> ibis_types.StringColumn: # TO_JSON_STRING works with all data types, but isn't the most efficient # Needed for JSON, STRUCT and ARRAY datatypes result = vendored_ibis_ops.ToJsonString(column).to_expr() # type: ignore - return typing.cast(ibis_types.StringColumn, result) + # Escape backslashes and use backslash as delineator + escaped = 
typing.cast(ibis_types.StringColumn, result.fillna("")).replace("\\", "\\\\") # type: ignore + return typing.cast(ibis_types.StringColumn, ibis.literal("\\")).concat(escaped) diff --git a/tests/system/small/test_dataframe.py b/tests/system/small/test_dataframe.py index e522878229..a0cf25807c 100644 --- a/tests/system/small/test_dataframe.py +++ b/tests/system/small/test_dataframe.py @@ -2703,6 +2703,14 @@ def test_sample(scalars_dfs, frac, n, random_state): assert bf_result.shape[1] == scalars_df.shape[1] +def test_sample_determinism(penguins_df_default_index): + df = penguins_df_default_index.sample(n=100, random_state=12345).head(15) + bf_result = df.to_pandas() + bf_result2 = df.to_pandas() + + pandas.testing.assert_frame_equal(bf_result, bf_result2) + + def test_sample_raises_value_error(scalars_dfs): scalars_df, _ = scalars_dfs with pytest.raises( From 5ab5059f7db5d0f2be735dca76bc8e5163287c4d Mon Sep 17 00:00:00 2001 From: "release-please[bot]" <55107282+release-please[bot]@users.noreply.github.com> Date: Wed, 15 Nov 2023 20:25:53 -0800 Subject: [PATCH 4/7] chore(main): release 0.14.1 (#207) Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com> --- CHANGELOG.md | 12 ++++++++++++ bigframes/version.py | 2 +- 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 1f76b78272..091967513a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,18 @@ [1]: https://pypi.org/project/bigframes/#history +## [0.14.1](https://github.com/googleapis/python-bigquery-dataframes/compare/v0.14.0...v0.14.1) (2023-11-16) + + +### Bug Fixes + +* Correctly handle null values when initializing fingerprint ordering ([#210](https://github.com/googleapis/python-bigquery-dataframes/issues/210)) ([8324f13](https://github.com/googleapis/python-bigquery-dataframes/commit/8324f133547ec35da5eefc0a8b02fe0f3887d81d)) + + +### Documentation + +* Add an example notebook about line graphs 
([#197](https://github.com/googleapis/python-bigquery-dataframes/issues/197)) ([f957b27](https://github.com/googleapis/python-bigquery-dataframes/commit/f957b278b39e0a472a3153e9e1906c2d5f2ac2e5)) + ## [0.14.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v0.13.0...v0.14.0) (2023-11-14) diff --git a/bigframes/version.py b/bigframes/version.py index 5a94f72649..46e57e5b88 100644 --- a/bigframes/version.py +++ b/bigframes/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. -__version__ = "0.14.0" +__version__ = "0.14.1" From d06a4c9a96d6f6485c6afaa12b6a8100bed77051 Mon Sep 17 00:00:00 2001 From: Ashley Xu Date: Tue, 14 Nov 2023 20:06:09 +0000 Subject: [PATCH 5/7] Make the llm kmeans notebook professional --- .../bq_dataframes_llm_kmeans.ipynb | 115 ++++-------------- 1 file changed, 26 insertions(+), 89 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 46c4955288..729beab0a3 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -122,10 +122,6 @@ "\n", " * BigQuery API\n", " * BigQuery Connection API\n", - " * Cloud Run API\n", - " * Artifact Registry API\n", - " * Cloud Build API\n", - " * Cloud Resource Manager API\n", " * Vertex AI API\n", "\n", "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)." @@ -232,87 +228,6 @@ "# auth.authenticate_user()" ] }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you want to reset the location of the created DataFrame or Series objects, reset the session by executing `bf.close_session()`. After that, you can reuse `bf.options.bigquery.location` to specify another location." 
- ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Connect to Vertex AI\n", - "\n", - "In order to use PaLM2TextGenerator, we will need to set up a [cloud resource connection](https://cloud.google.com/bigquery/docs/create-cloud-resource-connection)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from google.cloud import bigquery_connection_v1 as bq_connection\n", - "\n", - "CONN_NAME = \"bqdf-llm\"\n", - "\n", - "client = bq_connection.ConnectionServiceClient()\n", - "new_conn_parent = f\"projects/{PROJECT_ID}/locations/{REGION}\"\n", - "exists_conn_parent = f\"projects/{PROJECT_ID}/locations/{REGION}/connections/{CONN_NAME}\"\n", - "cloud_resource_properties = bq_connection.CloudResourceProperties({})\n", - "\n", - "try:\n", - " request = client.get_connection(\n", - " request=bq_connection.GetConnectionRequest(name=exists_conn_parent)\n", - " )\n", - " CONN_SERVICE_ACCOUNT = f\"serviceAccount:{request.cloud_resource.service_account_id}\"\n", - "except Exception:\n", - " connection = bq_connection.types.Connection(\n", - " {\"friendly_name\": CONN_NAME, \"cloud_resource\": cloud_resource_properties}\n", - " )\n", - " request = bq_connection.CreateConnectionRequest(\n", - " {\n", - " \"parent\": new_conn_parent,\n", - " \"connection_id\": CONN_NAME,\n", - " \"connection\": connection,\n", - " }\n", - " )\n", - " response = client.create_connection(request)\n", - " CONN_SERVICE_ACCOUNT = (\n", - " f\"serviceAccount:{response.cloud_resource.service_account_id}\"\n", - " )\n", - "print(CONN_SERVICE_ACCOUNT)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set permissions for the service account\n", - "\n", - "The resource connection service account requires certain project-level permissions:\n", - " - `roles/aiplatform.user` and `roles/bigquery.connectionUser`: These roles are required for the 
connection to create a model definition using the LLM model in Vertex AI ([documentation](https://cloud.google.com/bigquery/docs/generate-text#give_the_service_account_access)).\n", - " - `roles/run.invoker`: This role is required for the connection to have read-only access to Cloud Run services that back custom/remote functions ([documentation](https://cloud.google.com/bigquery/docs/remote-functions#grant_permission_on_function)).\n", - "\n", - "Set these permissions by running the following `gcloud` commands:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/bigquery.connectionUser'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/aiplatform.user'\n", - "!gcloud projects add-iam-policy-binding {PROJECT_ID} --condition=None --no-user-output-enabled --member={CONN_SERVICE_ACCOUNT} --role='roles/run.invoker'" - ] - }, { "attachments": {}, "cell_type": "markdown", @@ -336,7 +251,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Project Setup" + "BigQuery DataFrames setup" ] }, { @@ -353,6 +268,14 @@ "bf.options.bigquery.location = REGION" ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you want to reset the location of the created DataFrame or Series objects, reset the session by executing `bf.close_session()`. After that, you can reuse `bf.options.bigquery.location` to specify another location." + ] + }, { "attachments": {}, "cell_type": "markdown", @@ -391,7 +314,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Download 10000 complaints to use with PaLM2TextEmbeddingGenerator" + "Downsample DataFrame to 10,000 records for model training." 
] }, { @@ -470,7 +393,7 @@ "id": "OUZ3NNbzo1Tb" }, "source": [ - "## Step 2: KMeans clustering" + "## Step 2: Create k-means model and predict clusters" ] }, { @@ -535,7 +458,7 @@ "id": "21rNsFMHo8hO" }, "source": [ - "## Step 3: Summarize the complaints" + "## Step 3: Use PaLM2 LLM model to summarize complaint clusters" ] }, { @@ -624,7 +547,10 @@ "source": [ "from bigframes.ml.llm import PaLM2TextGenerator\n", "\n", + "# Create a BigQuery Cloud resource connection\n", + "CONN_NAME = \"bqdf-llm\"\n", "session = bf.get_global_session()\n", + "\n", "connection = f\"{PROJECT_ID}.{REGION}.{CONN_NAME}\"\n", "q_a_model = PaLM2TextGenerator(session=session, connection_name=connection)" ] @@ -662,6 +588,17 @@ "source": [ "We now see PaLM2TextGenerator's characterization of the different comment groups. Thanks for using BigQuery DataFrames!" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Summary and next steps\n", + "\n", + "You've used BigQuery DataFrames' integration with LLM models (`bigframes.ml.llm`) to generate code samples, and have tranformed LLM output by creating and using a custom function in BigQuery DataFrames.\n", + "\n", + "Learn more about BigQuery DataFrames in the [documentation](https://cloud.google.com/python/docs/reference/bigframes/latest) and find more sample notebooks in the [GitHub repo](https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks)." 
+ ] } ], "metadata": { From 280ffdf1aec38b0ad440d0ad188543dcffba325e Mon Sep 17 00:00:00 2001 From: Ashley Xu Date: Thu, 16 Nov 2023 01:08:36 +0000 Subject: [PATCH 6/7] address the comments --- .../bq_dataframes_llm_code_generation.ipynb | 2 +- .../bq_dataframes_llm_kmeans.ipynb | 1283 ++++++++++++++++- 2 files changed, 1241 insertions(+), 44 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb b/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb index 0f113b84c6..0a41447a53 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_code_generation.ipynb @@ -34,7 +34,7 @@ "\n", "\n", " \n", diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 729beab0a3..740003d4ee 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -31,7 +31,7 @@ "
\n", - " \n", + " \n", " \"Colab Run in Colab\n", " \n", "
\n", "\n", " \n", @@ -118,7 +118,7 @@ "\n", "2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n", "\n", - "3. [Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,run.googleapis.com,artifactregistry.googleapis.com,cloudbuild.googleapis.com,cloudresourcemanager.googleapis.com) to enable the following APIs:\n", + "3. [Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,aiplatform.googleapis.com) to enable the following APIs:\n", "\n", " * BigQuery API\n", " * BigQuery Connection API\n", @@ -139,9 +139,17 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Updated property [core/project].\n" + ] + } + ], "source": [ "# set your project ID below\n", "PROJECT_ID = \"\" # @param {type:\"string\"}\n", @@ -162,7 +170,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -256,7 +264,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "metadata": { "id": "R7STCS8xB5d2" }, @@ -288,7 +296,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 4, "metadata": { "id": "zDSwoBo1CU3G" }, @@ -299,11 +307,101 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "metadata": { "id": "tYDoaKgJChiq" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 9f096761-e3b5-4d58-a9f7-485ced67afca is DONE. 2.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job ee8fecb1-2e30-407d-9e2e-9e76061da9e7 is DONE. 2.3 GB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "
\n", - " \n", + " \n", " \"Colab Run in Colab\n", " \n", "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

5 rows × 1 columns

\n", + "[5 rows x 1 columns in total]" + ], + "text/plain": [ + " consumer_complaint_narrative\n", + "0 I signed a contract as a condition of employme...\n", + "1 First, I want to disclose that XXXX and XXXX b...\n", + "2 Frequent calls from Focused Receivables Manage...\n", + "3 I recently contacted Enhanced Recovery Company...\n", + "4 This began when I subscribed to XXXX XXXX inte...\n", + "\n", + "[5 rows x 1 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "issues_df = input_df[[\"consumer_complaint_narrative\"]].dropna()\n", "issues_df.head(n=5) # View the first five complaints" @@ -319,7 +417,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "metadata": { "id": "OltYSUEcsSOW" }, @@ -329,6 +427,222 @@ "downsampled_issues_df = issues_df.sample(n=10000)" ] }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "Query job aed21372-ad20-4483-8843-faabc286be3f is DONE. 2.3 GB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job c5138c60-f714-4091-83b0-6900e0897d02 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 521e7788-6722-4bb8-91d6-5deb3a56435a is DONE. 10.7 MB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

25 rows × 1 columns

\n", + "
[10000 rows x 1 columns in total]" + ], + "text/plain": [ + " consumer_complaint_narrative\n", + "2580664 Hello, my name is XXXX XXXX, and I am writing ...\n", + "1806973 This is XXXX XXXX and I am submitting this com...\n", + "2055053 XXXX XXXX XXXX, XXXX. ( address : XXXX XXXX XX...\n", + "2515231 When I reinvestigated my credit report, I real...\n", + "2633049 Checking my credit report XX/XX/2018 with all ...\n", + "3117273 I contacted TransUnion and spoke a credit rep ...\n", + "698814 XXXX XXXX XXXX. makes daily calls to me cell c...\n", + "267826 Can we please reopen Case : XXXX? \n", + "\n", + "Wells Farg...\n", + "54019 My rights under 15 USC 1681 have been violated...\n", + "141050 To whom it may concern : My personal informati...\n", + "2962076 I have had a CashApp account since last year, ...\n", + "2481105 that some of the information was erroneous. Th...\n", + "431562 I have disputed the referenced accounts to the...\n", + "1953029 On, XX/XX/22, I attempted to complete a transa...\n", + "2395979 Subject : XXXX XXXX XXXX compensation, refund,...\n", + "455524 I paid off my mortgage on XX/XX/2019. 
The comp...\n", + "2155924 This kind of account is placed as a charged of...\n", + "1069497 This is one of many issues I have had with Wel...\n", + "3181689 I have disputed this account with MONTEREY FIN...\n", + "274268 Lender is not updating my loan status in the V...\n", + "1671305 XXXX is a peer to peer lending conmpany that u...\n", + "886026 ( DISPUTE CODE - XXXX ) My personal informatio...\n", + "1044431 I filed a complaint against PNC this year and ...\n", + "1938481 I applied for a modification and was approved....\n", + "1987834 Ive been Disputting my XXXX XXXX I opened this...\n", + "...\n", + "\n", + "[10000 rows x 1 columns]" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "downsampled_issues_df._cached()" + ] + }, { "attachments": {}, "cell_type": "markdown", @@ -341,11 +655,24 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": { "id": "li38q8FzDDMu" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 52d2e961-7896-497c-8b03-ab7374737679 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "from bigframes.ml.llm import PaLM2TextEmbeddingGenerator\n", "\n", @@ -354,11 +681,125 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 29, "metadata": { "id": "cOuSOQ5FDewD" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job d093d51a-8eda-442f-80cd-568cb76e00b3 is DONE. 10.6 MB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 6419df65-3e96-41a7-a7b5-3d058e18763a is DONE. 80.0 kB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 917f09ea-c468-4363-a856-b1091e5f775f is DONE. 80.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 5c9679e7-192c-40b5-a14b-edc0fa113eaa is DONE. 61.5 MB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
text_embedding
422  [-0.012013785541057587, 0.003669967409223318, ...
616  [-0.014948881231248379, -0.04672442376613617, ...
833  [-0.01951478235423565, -0.027120858430862427, ...
1370  [-0.03140445053577423, -0.048797041177749634, ...
1430  [-0.02244548313319683, -0.03336532413959503, 0...
\n", + "

5 rows × 1 columns

\n", + "
[5 rows x 1 columns in total]" + ], + "text/plain": [ + " text_embedding\n", + "422 [-0.012013785541057587, 0.003669967409223318, ...\n", + "616 [-0.014948881231248379, -0.04672442376613617, ...\n", + "833 [-0.01951478235423565, -0.027120858430862427, ...\n", + "1370 [-0.03140445053577423, -0.048797041177749634, ...\n", + "1430 [-0.02244548313319683, -0.03336532413959503, 0...\n", + "\n", + "[5 rows x 1 columns]" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Will take ~3 minutes to compute the embeddings\n", "predicted_embeddings = model.predict(downsampled_issues_df)\n", @@ -368,14 +809,263 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 30, "metadata": { "id": "4H_etYfsEOFP" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job ce9cb0f9-4b0d-40a1-81f3-d6e60dd6c684 is DONE. 160.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job aa692a30-5706-46ad-8029-faf2fac66234 is DONE. 72.2 MB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
consumer_complaint_narrative  text_embedding
2580664  Hello, my name is XXXX XXXX, and I am writing ...  [0.0003211698785889894, -0.01816680282354355, ...
1806973  This is XXXX XXXX and I am submitting this com...  [-0.009485247544944286, -0.025846892967820168,...
2055053  XXXX XXXX XXXX, XXXX. ( address : XXXX XXXX XX...  [-0.010950954630970955, -0.0249345600605011, 0...
2515231  When I reinvestigated my credit report, I real...  [-0.009660656563937664, -0.05793113633990288, ...
2633049  Checking my credit report XX/XX/2018 with all ...  [-0.0022159104701131582, -0.03330004960298538,...
3117273  I contacted TransUnion and spoke a credit rep ...  [-0.015955328941345215, -0.006488671060651541,...
698814  XXXX XXXX XXXX. makes daily calls to me cell c...  [0.005397460889071226, -0.01276913657784462, 0...
267826  Can we please reopen Case : XXXX? \n", + "\n", + "Wells Farg...  [0.004065403249114752, -0.0005381882656365633,...
54019  My rights under 15 USC 1681 have been violated...  [0.013823015615344048, -0.02010691538453102, 0...
141050  To whom it may concern : My personal informati...  [0.008104532025754452, -0.01856449618935585, 0...
2962076  I have had a CashApp account since last year, ...  [-0.0003019514260813594, -0.03750108182430267,...
2481105  that some of the information was erroneous. Th...  [-0.014868081547319889, -0.0443895161151886, -...
431562  I have disputed the referenced accounts to the...  [-0.0020524838473647833, -0.04830990731716156,...
1953029  On, XX/XX/22, I attempted to complete a transa...  [-0.01599179394543171, -0.0074900356121361256,...
2395979  Subject : XXXX XXXX XXXX compensation, refund,...  [-0.0035950862802565098, -0.014652969315648079...
455524  I paid off my mortgage on XX/XX/2019. The comp...  [-0.01100730150938034, -0.03495829552412033, 0...
2155924  This kind of account is placed as a charged of...  [-0.028635455295443535, -0.028604287654161453,...
1069497  This is one of many issues I have had with Wel...  [0.008871790021657944, -0.028502725064754486, ...
3181689  I have disputed this account with MONTEREY FIN...  [-0.004721717908978462, -0.03673810139298439, ...
274268  Lender is not updating my loan status in the V...  [-0.009221495129168034, -0.0289347805082798, 0...
1671305  XXXX is a peer to peer lending conmpany that u...  [-0.02911308966577053, -0.01850792020559311, -...
886026  ( DISPUTE CODE - XXXX ) My personal informatio...  [-0.007220877334475517, -0.016615957021713257,...
1044431  I filed a complaint against PNC this year and ...  [0.002848619595170021, -0.035117778927087784, ...
1938481  I applied for a modification and was approved....  [-0.03114932030439377, -0.0421406552195549, 0....
1987834  Ive been Disputting my XXXX XXXX I opened this...  [-0.009406660683453083, -0.020967338234186172,...
\n", + "

25 rows × 2 columns

\n", + "
[10000 rows x 2 columns in total]" + ], + "text/plain": [ + " consumer_complaint_narrative \\\n", + "2580664 Hello, my name is XXXX XXXX, and I am writing ... \n", + "1806973 This is XXXX XXXX and I am submitting this com... \n", + "2055053 XXXX XXXX XXXX, XXXX. ( address : XXXX XXXX XX... \n", + "2515231 When I reinvestigated my credit report, I real... \n", + "2633049 Checking my credit report XX/XX/2018 with all ... \n", + "3117273 I contacted TransUnion and spoke a credit rep ... \n", + "698814 XXXX XXXX XXXX. makes daily calls to me cell c... \n", + "267826 Can we please reopen Case : XXXX? \n", + "\n", + "Wells Farg... \n", + "54019 My rights under 15 USC 1681 have been violated... \n", + "141050 To whom it may concern : My personal informati... \n", + "2962076 I have had a CashApp account since last year, ... \n", + "2481105 that some of the information was erroneous. Th... \n", + "431562 I have disputed the referenced accounts to the... \n", + "1953029 On, XX/XX/22, I attempted to complete a transa... \n", + "2395979 Subject : XXXX XXXX XXXX compensation, refund,... \n", + "455524 I paid off my mortgage on XX/XX/2019. The comp... \n", + "2155924 This kind of account is placed as a charged of... \n", + "1069497 This is one of many issues I have had with Wel... \n", + "3181689 I have disputed this account with MONTEREY FIN... \n", + "274268 Lender is not updating my loan status in the V... \n", + "1671305 XXXX is a peer to peer lending conmpany that u... \n", + "886026 ( DISPUTE CODE - XXXX ) My personal informatio... \n", + "1044431 I filed a complaint against PNC this year and ... \n", + "1938481 I applied for a modification and was approved.... \n", + "1987834 Ive been Disputting my XXXX XXXX I opened this... \n", + "\n", + " text_embedding \n", + "2580664 [0.0003211698785889894, -0.01816680282354355, ... \n", + "1806973 [-0.009485247544944286, -0.025846892967820168,... \n", + "2055053 [-0.010950954630970955, -0.0249345600605011, 0... 
\n", + "2515231 [-0.009660656563937664, -0.05793113633990288, ... \n", + "2633049 [-0.0022159104701131582, -0.03330004960298538,... \n", + "3117273 [-0.015955328941345215, -0.006488671060651541,... \n", + "698814 [0.005397460889071226, -0.01276913657784462, 0... \n", + "267826 [0.004065403249114752, -0.0005381882656365633,... \n", + "54019 [0.013823015615344048, -0.02010691538453102, 0... \n", + "141050 [0.008104532025754452, -0.01856449618935585, 0... \n", + "2962076 [-0.0003019514260813594, -0.03750108182430267,... \n", + "2481105 [-0.014868081547319889, -0.0443895161151886, -... \n", + "431562 [-0.0020524838473647833, -0.04830990731716156,... \n", + "1953029 [-0.01599179394543171, -0.0074900356121361256,... \n", + "2395979 [-0.0035950862802565098, -0.014652969315648079... \n", + "455524 [-0.01100730150938034, -0.03495829552412033, 0... \n", + "2155924 [-0.028635455295443535, -0.028604287654161453,... \n", + "1069497 [0.008871790021657944, -0.028502725064754486, ... \n", + "3181689 [-0.004721717908978462, -0.03673810139298439, ... \n", + "274268 [-0.009221495129168034, -0.0289347805082798, 0... \n", + "1671305 [-0.02911308966577053, -0.01850792020559311, -... \n", + "886026 [-0.007220877334475517, -0.016615957021713257,... \n", + "1044431 [0.002848619595170021, -0.035117778927087784, ... \n", + "1938481 [-0.03114932030439377, -0.0421406552195549, 0.... \n", + "1987834 [-0.009406660683453083, -0.020967338234186172,... 
\n", + "...\n", + "\n", + "[10000 rows x 2 columns]" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Join the complaints with their embeddings in the same DataFrame\n", - "combined_df = downsampled_issues_df.join(predicted_embeddings)" + "combined_df = downsampled_issues_df.join(predicted_embeddings, how=\"left\")\n", + "combined_df" ] }, { @@ -398,7 +1088,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 31, "metadata": { "id": "AhNTnEC5FRz2" }, @@ -419,14 +1109,152 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 32, "metadata": { "id": "6poSxh-fGJF7" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 65eb317d-59f1-4d10-acd1-4b7f3778114c is DONE. 61.7 MB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 156e445e-cc01-4b30-84cc-ac1c98a69b81 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 5befc212-f4a3-4e33-b1b2-01e809acdcbd is DONE. 61.9 MB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job bd271178-8b8d-45dc-ac57-7f0194d0daac is DONE. 80.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job bbfb9cca-622d-4bf5-9fc0-6d9a85287d41 is DONE. 80.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job a5f30b32-9fb0-42b4-b426-d8484f008bdb is DONE. 160.0 kB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CENTROID_ID
422  2
616  3
833  5
1370  7
1430  3
\n", + "

5 rows × 1 columns

\n", + "
[5 rows x 1 columns in total]" + ], + "text/plain": [ + " CENTROID_ID\n", + "422 2\n", + "616 3\n", + "833 5\n", + "1370 7\n", + "1430 3\n", + "\n", + "[5 rows x 1 columns]" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Use KMeans clustering to calculate our groups. Will take ~3 minutes.\n", - "cluster_model.fit(combined_df[[\"text_embedding\"]])\n", + "cluster_model.fit(combined_df[\"text_embedding\"])\n", "clustered_result = cluster_model.predict(combined_df[[\"text_embedding\"]])\n", "# Notice the CENTROID_ID column, which is the ID number of the group that\n", "# each complaint belongs to.\n", @@ -435,12 +1263,123 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 33, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 7a41196e-ea67-44ac-95a7-7dce620d6d21 is DONE. 320.0 kB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 8008b482-1a0d-461f-a215-4676d9d918dc is DONE. 72.4 MB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
consumer_complaint_narrative  text_embedding  CENTROID_ID
2580664  Hello, my name is XXXX XXXX, and I am writing ...  [0.0003211698785889894, -0.01816680282354355, ...  2
1806973  This is XXXX XXXX and I am submitting this com...  [-0.009485247544944286, -0.025846892967820168,...  5
2055053  XXXX XXXX XXXX, XXXX. ( address : XXXX XXXX XX...  [-0.010950954630970955, -0.0249345600605011, 0...  3
2515231  When I reinvestigated my credit report, I real...  [-0.009660656563937664, -0.05793113633990288, ...  5
2633049  Checking my credit report XX/XX/2018 with all ...  [-0.0022159104701131582, -0.03330004960298538,...  3
\n", + "

5 rows × 3 columns

\n", + "
[5 rows x 3 columns in total]" + ], + "text/plain": [ + " consumer_complaint_narrative \\\n", + "2580664 Hello, my name is XXXX XXXX, and I am writing ... \n", + "1806973 This is XXXX XXXX and I am submitting this com... \n", + "2055053 XXXX XXXX XXXX, XXXX. ( address : XXXX XXXX XX... \n", + "2515231 When I reinvestigated my credit report, I real... \n", + "2633049 Checking my credit report XX/XX/2018 with all ... \n", + "\n", + " text_embedding CENTROID_ID \n", + "2580664 [0.0003211698785889894, -0.01816680282354355, ... 2 \n", + "1806973 [-0.009485247544944286, -0.025846892967820168,... 5 \n", + "2055053 [-0.010950954630970955, -0.0249345600605011, 0... 3 \n", + "2515231 [-0.009660656563937664, -0.05793113633990288, ... 5 \n", + "2633049 [-0.0022159104701131582, -0.03330004960298538,... 3 \n", + "\n", + "[5 rows x 3 columns]" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Join the group number to the complaints and their text embeddings\n", - "combined_clustered_result = combined_df.join(clustered_result)" + "combined_clustered_result = combined_df.join(clustered_result)\n", + "\n", + "combined_clustered_result.head(n=5)" ] }, { @@ -471,11 +1410,36 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 34, "metadata": { "id": "2E7wXM_jGqo6" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 50c7c0dd-94a2-494e-a37f-6a838a518f6c is DONE. 11.0 MB processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job d96c847f-c292-4804-bd05-fd643c41c7a5 is DONE. 11.0 MB processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "# Using bigframes, with syntax identical to pandas,\n", "# filter out the first and second groups\n", @@ -492,11 +1456,100 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 36, "metadata": { "id": "ZNDiueI9IP5e" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "comment list 1:\n", + "1. XXXX is a peer to peer lending conmpany that uses borrowers crypto to collateralize loans from investors ( like myself ). I've been investing with them for almost XXXX years and currently have {$240000.00} tied up in lending products with XXXX. \n", + "As of XXXX days ago we received an email saying all business operations have been ceased and no withdrawals or deposits will be allowed. They said they'll update customers within 10 days, but no one can reach anyone at the company to find out any more details as they are not answering calls nor returning emails. It also appears the company has scrubbed its XXXX page and the XXXX pages of top executives. \n", + "\n", + "All collateral and client 's investment funds are supposedly held at or processed through XXXX XXXX XXXX ( registered SEC company ). XXXX XXXX keeps telling us to contact XXXX and won't give us any information, so we have no way to find out what's happening with our funds/collateral or if everything is gone. We have a XXXX channel up where people are gathering evidence, documentation, etc. This is probably the best place to start to get a broad view of what's happening. Details below. 
\n", + "\n", + "XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX CONST LLC ( Business ID : XXXX ) FoXXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX 'Cease of Operations ' email received by all investors XXXX XX/XX/2022 at XXXX : \" Dear XXXX Users, Given the collapses of several cryptocurrencies so far this year and the rapidly deteriorating market conditions that have been prompting heavy withdrawals across all XXXX lending and XXXX exchange platforms recently, we are sad to inform you that we are unable to continue to operate our business as usual. As such, we are limiting our business activities, including pausing user withdrawals as allowed under our Terms of XXXX. \n", + "No deposit or investment request will be processed at this time. \n", + "\n", + "Our team is working diligently towards our objective of maximizing value for all of our Users, and our top priority continues to be to protect your interests. As we explore all options available to us, we will provide updates to you as we go. \n", + "\n", + "We hope to communicate with you within the next XXXX business days on the next steps to address the situation. We appreciate your patience in this trying time. \n", + "\n", + "Sincerely yoursXXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX\n", + "2. 
Submitted XX/XX/XXXX\n", + "Typed XX/XX/XXXX:\n", + "\n", + "XX/XX/XXXX\n", + "XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX, XXXX XXXX\n", + "PH:. XXXX\n", + "PH: XXXX\n", + "EM:\n", + "XXXX\n", + "XXXX\n", + "XXXX XXXX \n", + "XXXX XXXX\n", + "Date of Birth XX/XX/XXXX\n", + "SS#: XXXX\n", + "TO:\n", + "* Consumer Financial Protection Brueau\n", + "* Department of Veteran Affairs, Office of the Inspector General\n", + "My name is XXXX XXXX XXXX, I've received more than one email from Discover Card in my XXXX XXXX, past emails from Discover Card were unautherized deletions.\n", + "From: Discover Card XXXX\n", + "To: You XXXX\n", + "Date: XX/XX/XXXX, XXXX XXXX XXXX From: Discover Card XXXX>\n", + "To Recipient \n", + "Date Mon, XX/XX/XXXX XXXX XXXX\n", + "I dont and havent ever had a Discover Checking, Savings, Business Accounts nor Loans of any kind through any Bank called Discover. The 1st time I was contacted by Discover Card I resided alone from XX/XX/XXXX to XX/XX/XXXXat XXXX XXXX XXXX at XXXX XXXX XXXX XXXX XXXX in XXXX, XXXX years prior to me moving here to XXXX, XXXX in XX/XX/XXXX. When \n", + "\n", + "\n", + "Discover Card had 1st contacted me in XXXX, XXXX it was associated with my XXXX XXXX XXXX website related online Merchants Account. Not once have I ever applied for or had any Website Merchant Accounts here in XXXX; I only applied for online online Merchant Accounts associated with my XXXX related Accounts I purchased while residing in XXXX, XXXX. Some of my website related information was stolen both in XXXX, XXXX and here in XXXX along with my other property that hasn't been returned to me. I don't and haven't ever had any XXXX XXXX related Agreements,Contracts or Credit Cards offered to Veterans associated with ones businesses. Nor have I ever applied for or had a Business License or Business Permit in any City or State inspite of my diverse interest. 
Not once have I ever allowed another be it an Paralegal, Payee, Attorney, Employers, Landlords, Veteran Organizations including Vocational Rehabilitation Programs, XXXX( XXXX XXXX XXXX, XXXX XXXX, Entertainment Companies, Banks, Celebrity Personal Assistant Agencies or Celebs, Shelters, Charities, HUD, Housing Arthority, Department of Veteran Affairs, Military, Law Enforcement or anyone else nor their employess to sign any business related Agreements or Contracts on my behalf; not even my family members or friends. \n", + "None of my XXXX XXXX attempts were associated with my Employers, Department of Veteran Affairs,Vocational Rehabilitation Programs Military, Landlords, HUD( Housing Authority),Friends, Family nor did I ever sign related Agreements or Contracts with them. Not once had I ever provided anyone the passwords to be able to sign into my accounts rather were aware of my accounts or not. Yes, my desktop computer that was stolen along with my other property XX/XX/XXXX was registered with my Online Merchant Account. I had paid for my Merchant related Accounts through my same XXXX XXXX XXXX Account I purchased both of my XXXX XXXX XXXX related accounts through. That was 1st once during the Summer of XX/XX/XXXX and 2nd my related website months later, while I resided in XXXX XXXX and I worked for XXXX. I never offered nor did I ever sign any business Contracts or Agreements with XXXX nor my Landlord or their staff associted with any of my online websites or Merchant Accounts. My XXXX XXXX XXXX Compensation was deposited into both of my XXXX XXXX XXXX Accounts at that time. My account was changed during the Summer of XX/XX/XXXXbecause of theft of my Bank Card. None of my Checking,Savings, past Credit Cards or Business related were shared accounts in which others were allowed to \n", + "use to make purchases. 
I had written checks from my XXXX XXXX XXXX account to pay for my XXXX XXXX XXXX XXXX on the XXXX XXXX here in XXXX in XX/XX/XXXX before it's name changed to XXXX XXXX. Prior to me using my same account open a Checking account in person at XXXX XXXX before it's name was changed to XXXX XXXX. Where my XXXX XXXX XXXX XXXX has been deposited since that time. I had used my XXXX XXXX Checking to pay for my XXXX XXXX XXXX XXXX both before theft of my property XX/XX/XXXX and that was also prior to the theft of my property from my XXXX XXXX XXXX XXXX in XX/XX/XXXX.\n", + "I've stated this many times:\n", + "I paid for my 1st XXXX XXXX XXXX Membership while employed at XXXX using my XXXX XXXX XXXX account XXXX my XXXX XXXX XXXX XXXX was also deposited. That was changed to XXXX because I didn't receive my 1st XXXX XXXX XXXX Card the bank sent to XXXX XXXX residence on XXXX XXXX in XX/XX/XXXX while I was there. In which both my XXXX salary and XXXX XXXX XXXX XXXX were deposited into my account, no money from XXXX XXXX nor anyone else that was at that residence was given to nor were any of my children there. Nor did XXXX or any other person at that residence ever give me my missing Bank Card not even after I moved out and stayed a month at XXXX XXXX XXXX using my replacement card to pay for my Hotel room. Which is the same account I used to pay for XXXX XXXX Membership, XXXX XXXX XXXX, XXXX XXXX Membership fees, and various online Merchant Account activation related fees.\n", + "* XXXX XXXX XXXX.\n", + "XXXX XXXX XXXX XXXX. 
Membership\n", + "\n", + "# XXXX\n", + "* XXXX XXXX Membership\n", + "# XXXX\n", + "* Total Merchant Services XXXX and XXXX.\n", + "* XXXX XXXX XXXX XXXX XXXX\n", + "* XXXX XXXX changed my $XXXX a month fees to my XXXX XXXX XXXX account #XXXX.\n", + "XX/XX/XXXX - XX/XX/XXXX XXXX XXXX, XXXX.\n", + "\n", + "Rep: XXXX XXXX XXXX, Fl \n", + "XXXX\n", + "XXXX Website \n", + "XXXX\n", + "Software and website owner, I performed Internet advertising and marketing, to promote this software and website. I worked and XXXX from my home XXXX XXXX XXXX XXXX XXXX , XXXX. I purchased XXXX XXXX XXXX-Software Electronic Book CD and was given a website to promote the software on the internet. The XXXX was given a copy of my website owner certificate document submitted to me when I purchased the software marketing program as well copies of my other school transcripts in addition to XXXX XXXX XXXX for example. XXXX, represented the first initials of my children's names. I wasn't ever paid and I'm still owed the money. Nor did my marketing program have anything to do with any schools, college nor university programs nor did I ever offer or sign any agreement to include it such. Nor did my XXXX XXXX XXXX have anything to do with any other employers, Department of Family and Children, Military, Veteran Organizations or Food Stamp programs, Section 8 nor Indianapolis Housing Authority for example; only me.\n", + "Thank you,\n", + "XXXX XXXX\n", + "3. ACCORDING TO 15 U.S. CODE 6803-DISCLOSURE OF INSTITUTION PRIVACY POLICY, AND ACCORDING TO U.S. CODE 6802- OBLIGATIONS WITH RESPECT TO DISCLOSURES OF PERSONAL INFORMATION. ( b ) OPT OUT ( 1 ) IN GENERAL A FINANCIAL INSTITUTION MAY NOT DISCLOSE NONPUBLIC PERSONAL INFORMATION TO A NONAFFILIATED THIRD PARTY ( TRANSUNION, XXXX, AND XXXX. 
) UNLESS- ( A ) SUCH FINANCIAL INSTITUTION CLEARLY AND CONSPICUOUSLY DISCLOSES TO THE CONSUMER, IN WRITING OR IN ELECTRONIC FORM OR OTHER FORM PERMITTED BY THE REGULATIONS PRESCRIBED UNDER SECTION 6804 OF THIS TITLE. ALSO ACCORDING TO THE \" XXXX ACT '', FINANCIAL INSTITUTIONS MUST TELL THEIR CUSTOMERS ABOUT THEIR INFORMATION-SHARING PRACTICES AND EXPLAIN TO CUSTOMERS THEIR RIGHT TO \" OPT OUT '' IF THEY DON'T WANT THEIR INFORMATION SHARED WITH CERTAIN THIRD PARTIES. UNDER THE FDCPA, A COLLECTOR MUST PROVIDE YOU WITH INFORMATION ABOUT THE DEBT IN ITS INITIAL COMMUNICATION OR WITHIN FIVE DAYS AFTER THE INITIAL COMMUNICATION. ALSO, THE FDCPA STATES, \" YOU CAN NOT ATTEMPT TO COLLECT AN DEBT WHILE A PERSON ( THE CONSUMER ) SUPRESS VALIDATION. TRANSUNION, XXXX, XXXX, AND THE ACCOUNTS LISTED BELOW HAVE CLEARLY VIOLATED MY RIGHTS : XXXX ACCOUNT # XXXX, XXXX XXXX XXXX ACCOUNT # XXXXXXXX XXXX XXXX XXXX XXXX ACCOUNT # XXXXXXXX XXXX XXXX XXXX ACCOUNT # XXXX, XXXX XXXX XXXX XXXX ACCOUNT # XXXX, AND XXXX ACCOUNT # XXXX. FAILURE TO RESPOND SATISFACTORILY WITH DELETIONS OF ALL THE ABOVE ACCOUNTS WILL RESULT IN LEGAL ACTIONS BEING TAKEN AGAINST, TRANSUNION, XXXX, XXXX, WHICH I'LL BE SEEKING A {$1000.00} PER VIOLATION FOR DEFAMATION OF CHARACTER ( PER SE ) NEGLIGENT ENABLEMENT OF IDENTITY FRAUD. 15 USC 1681 VIOLATIONS FOR WILLFUL NONCOMPLIANCE-616 CIVIL LIABILITY FOR WILLFUL NONCOPLIANCE. THIS IS THE THIRD TIME I'VE SUBMITTED A COMPLAINT, AND THE REPONSE I GET IS \" YOU CAN NOT LOCATE MY CREDIT REPORT! '' THIS IS CLEARLY NEGLIGENCE.\n", + "4. I do not know how this works, but I need it done or somehow corrected. My name is XXXX XXXX, XXXX XXXX XXXX XXXX TN XXXXMy SS XXXX DOB XXXX. I had some issues with my income being affected by the COVID-19PANDEMICSHUTDOWN. I was under the 1 CARESAct, Pub. L. 116-136, section 4021, codified at FCRAsection 623 ( a ) ( 1 ) ( F ) ( i ) ( I ), 15 U.S.C.1681s- 2 ( a ) ( 1 ) ( F ) ( i ) ( I ). 
I am requesting some accommodations so I care to protect the integrity of my credit file. US DEPT OF ED / XXXX # XXXX, # XXXX accounts are reporting on XXXX, XXXX The was 30,60, 90 DAYS LATEsince requested assistance due to the pandemic. I found a few accounts that I have never done any business with these companies and the accounts do not belong on my report : XXXX XXXX # XXXX, XXXX XXXX XXXX XXXX # XXXX. \n", + "\n", + "I have some issues with the misspelling of my name, my correct spelling is XXXX XXXX. Please remove any other variation of my name they are not correct. The following addresses do not belong to me please delete them : XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXXSC, XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX\n", + "5. I want to know if this is even legal?! How can they disclose information without knowing its a correct email?!\n", + "\n", + "comment list 2:\n", + "1. Hello, my name is XXXX XXXX, and I am writing to delete the following information in my file. The items I need deleted are listed in the report. I am a victim of identity theft and did not make the charge. I ask that the items be deleted to correct my credit report. I reported the theft of my identity to the Federal Trade Commission and I also have enclosed copies of the Federal Trade Commissions Identity Theft Affidavit. Please delete the items as soon as possible. The accounts are being reported currently open and the accounts need to be closed. \n", + "XXXX account number XXXX opened on XX/XX/2022 for the amount {$530.00} XXXX XXXX XXXX account number XXXX opened on XX/XX/2022 for the amount of {$140.00} The accounts are being reported currently open and need to be closed immediately. \n", + "Based on, 15 U.S. 
Code 1681c2 a consumer reporting agency shall block the reporting of any information in the file of a consumer that the consumer identifies as information that resulted from an alleged identity theft, not later than 4 business days after the date of receipt. This account should not be furnished on my consumer report. As a consumer I am demanding the deletion of the accounts listed IMMEDIATELY.\n", + "2. To whom it may concern : My personal information was breach in the internet as result accounts had been open in my name, I was advise to fill out an Id theft report to help me deal with this situation, I have listed each one of the accounts that do not belong to me. This is my second request to remove unverified items in my report, but XXXX keep rposting these account with out providing any type of original document as the FCRA provide, you need to provide me with original documents or remove these account immediately.\n", + "3. Ive been Disputting my XXXX XXXX I opened this account and someone got my information and used my card, I contacted XXXX over and over, they removed the negative reporting from my XXXX report but still reporting it negative on my XXXX and Expean this is very unfair to me because Im a victim of identity theft\n", + "4. Today, XX/XX/2021, I received three items in the mail, one envelope containing an unsolicited debit card from Navy Federal credit Union and the other two, with a letter each describing The Important Rights on two accounts should these accounts become delinquent under New York law. \n", + "\n", + "First of all, I never applied for these accounts with Navy Federal, not have I authorized anyone to do so on my behalf. I immediately contacted Navy Federal via phone and was told I was most likely a victim of identity theft and that I should monitor my credit and use a credit monitoring service. I was also asked for my email and mailing information in order to receive a letter from them regarding this issue. 
\n", + "\n", + "My main concern is having someone using my identity to illegally open bank accounts and commit fraud, destroying my credit and finances in the process. This bank is in another state from where I reside. I have not lived in Virginia nor do I intend to do so in the foreseeable future.\n", + "5. My personal information ( including my SSN, Drivers License Info, Addresses, and more ) was stolen from a hacking, and Equifax did n't tell the public about the hack until more than a month after the hacking. During this time, three Equifax executives were caught inside trading. It really shows how Equifax cares about other people!\n", + "\n" + ] + } + ], "source": [ "# Build plain-text prompts to send to PaLM 2. Use only 5 complaints from each group.\n", "prompt1 = 'comment list 1:\\n'\n", @@ -515,11 +1568,100 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 37, "metadata": { "id": "BfHGJLirzSvH" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Please highlight the most obvious difference betweenthe two lists of comments:\n", + "comment list 1:\n", + "1. XXXX is a peer to peer lending conmpany that uses borrowers crypto to collateralize loans from investors ( like myself ). I've been investing with them for almost XXXX years and currently have {$240000.00} tied up in lending products with XXXX. \n", + "As of XXXX days ago we received an email saying all business operations have been ceased and no withdrawals or deposits will be allowed. They said they'll update customers within 10 days, but no one can reach anyone at the company to find out any more details as they are not answering calls nor returning emails. It also appears the company has scrubbed its XXXX page and the XXXX pages of top executives. \n", + "\n", + "All collateral and client 's investment funds are supposedly held at or processed through XXXX XXXX XXXX ( registered SEC company ). 
XXXX XXXX keeps telling us to contact XXXX and won't give us any information, so we have no way to find out what's happening with our funds/collateral or if everything is gone. We have a XXXX channel up where people are gathering evidence, documentation, etc. This is probably the best place to start to get a broad view of what's happening. Details below. \n", + "\n", + "XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX CONST LLC ( Business ID : XXXX ) FoXXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX 'Cease of Operations ' email received by all investors XXXX XX/XX/2022 at XXXX : \" Dear XXXX Users, Given the collapses of several cryptocurrencies so far this year and the rapidly deteriorating market conditions that have been prompting heavy withdrawals across all XXXX lending and XXXX exchange platforms recently, we are sad to inform you that we are unable to continue to operate our business as usual. As such, we are limiting our business activities, including pausing user withdrawals as allowed under our Terms of XXXX. \n", + "No deposit or investment request will be processed at this time. \n", + "\n", + "Our team is working diligently towards our objective of maximizing value for all of our Users, and our top priority continues to be to protect your interests. As we explore all options available to us, we will provide updates to you as we go. 
\n", + "\n", + "We hope to communicate with you within the next XXXX business days on the next steps to address the situation. We appreciate your patience in this trying time. \n", + "\n", + "Sincerely yoursXXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX\n", + "2. Submitted XX/XX/XXXX\n", + "Typed XX/XX/XXXX:\n", + "\n", + "XX/XX/XXXX\n", + "XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX, XXXX XXXX\n", + "PH:. XXXX\n", + "PH: XXXX\n", + "EM:\n", + "XXXX\n", + "XXXX\n", + "XXXX XXXX \n", + "XXXX XXXX\n", + "Date of Birth XX/XX/XXXX\n", + "SS#: XXXX\n", + "TO:\n", + "* Consumer Financial Protection Brueau\n", + "* Department of Veteran Affairs, Office of the Inspector General\n", + "My name is XXXX XXXX XXXX, I've received more than one email from Discover Card in my XXXX XXXX, past emails from Discover Card were unautherized deletions.\n", + "From: Discover Card XXXX\n", + "To: You XXXX\n", + "Date: XX/XX/XXXX, XXXX XXXX XXXX From: Discover Card XXXX>\n", + "To Recipient \n", + "Date Mon, XX/XX/XXXX XXXX XXXX\n", + "I dont and havent ever had a Discover Checking, Savings, Business Accounts nor Loans of any kind through any Bank called Discover. The 1st time I was contacted by Discover Card I resided alone from XX/XX/XXXX to XX/XX/XXXXat XXXX XXXX XXXX at XXXX XXXX XXXX XXXX XXXX in XXXX, XXXX years prior to me moving here to XXXX, XXXX in XX/XX/XXXX. When \n", + "\n", + "\n", + "Discover Card had 1st contacted me in XXXX, XXXX it was associated with my XXXX XXXX XXXX website related online Merchants Account. Not once have I ever applied for or had any Website Merchant Accounts here in XXXX; I only applied for online online Merchant Accounts associated with my XXXX related Accounts I purchased while residing in XXXX, XXXX. 
Some of my website related information was stolen both in XXXX, XXXX and here in XXXX along with my other property that hasn't been returned to me. I don't and haven't ever had any XXXX XXXX related Agreements,Contracts or Credit Cards offered to Veterans associated with ones businesses. Nor have I ever applied for or had a Business License or Business Permit in any City or State inspite of my diverse interest. Not once have I ever allowed another be it an Paralegal, Payee, Attorney, Employers, Landlords, Veteran Organizations including Vocational Rehabilitation Programs, XXXX( XXXX XXXX XXXX, XXXX XXXX, Entertainment Companies, Banks, Celebrity Personal Assistant Agencies or Celebs, Shelters, Charities, HUD, Housing Arthority, Department of Veteran Affairs, Military, Law Enforcement or anyone else nor their employess to sign any business related Agreements or Contracts on my behalf; not even my family members or friends. \n", + "None of my XXXX XXXX attempts were associated with my Employers, Department of Veteran Affairs,Vocational Rehabilitation Programs Military, Landlords, HUD( Housing Authority),Friends, Family nor did I ever sign related Agreements or Contracts with them. Not once had I ever provided anyone the passwords to be able to sign into my accounts rather were aware of my accounts or not. Yes, my desktop computer that was stolen along with my other property XX/XX/XXXX was registered with my Online Merchant Account. I had paid for my Merchant related Accounts through my same XXXX XXXX XXXX Account I purchased both of my XXXX XXXX XXXX related accounts through. That was 1st once during the Summer of XX/XX/XXXX and 2nd my related website months later, while I resided in XXXX XXXX and I worked for XXXX. I never offered nor did I ever sign any business Contracts or Agreements with XXXX nor my Landlord or their staff associted with any of my online websites or Merchant Accounts. 
My XXXX XXXX XXXX Compensation was deposited into both of my XXXX XXXX XXXX Accounts at that time. My account was changed during the Summer of XX/XX/XXXXbecause of theft of my Bank Card. None of my Checking,Savings, past Credit Cards or Business related were shared accounts in which others were allowed to \n", + "use to make purchases. I had written checks from my XXXX XXXX XXXX account to pay for my XXXX XXXX XXXX XXXX on the XXXX XXXX here in XXXX in XX/XX/XXXX before it's name changed to XXXX XXXX. Prior to me using my same account open a Checking account in person at XXXX XXXX before it's name was changed to XXXX XXXX. Where my XXXX XXXX XXXX XXXX has been deposited since that time. I had used my XXXX XXXX Checking to pay for my XXXX XXXX XXXX XXXX both before theft of my property XX/XX/XXXX and that was also prior to the theft of my property from my XXXX XXXX XXXX XXXX in XX/XX/XXXX.\n", + "I've stated this many times:\n", + "I paid for my 1st XXXX XXXX XXXX Membership while employed at XXXX using my XXXX XXXX XXXX account XXXX my XXXX XXXX XXXX XXXX was also deposited. That was changed to XXXX because I didn't receive my 1st XXXX XXXX XXXX Card the bank sent to XXXX XXXX residence on XXXX XXXX in XX/XX/XXXX while I was there. In which both my XXXX salary and XXXX XXXX XXXX XXXX were deposited into my account, no money from XXXX XXXX nor anyone else that was at that residence was given to nor were any of my children there. Nor did XXXX or any other person at that residence ever give me my missing Bank Card not even after I moved out and stayed a month at XXXX XXXX XXXX using my replacement card to pay for my Hotel room. Which is the same account I used to pay for XXXX XXXX Membership, XXXX XXXX XXXX, XXXX XXXX Membership fees, and various online Merchant Account activation related fees.\n", + "* XXXX XXXX XXXX.\n", + "XXXX XXXX XXXX XXXX. 
Membership\n", + "\n", + "# XXXX\n", + "* XXXX XXXX Membership\n", + "# XXXX\n", + "* Total Merchant Services XXXX and XXXX.\n", + "* XXXX XXXX XXXX XXXX XXXX\n", + "* XXXX XXXX changed my $XXXX a month fees to my XXXX XXXX XXXX account #XXXX.\n", + "XX/XX/XXXX - XX/XX/XXXX XXXX XXXX, XXXX.\n", + "\n", + "Rep: XXXX XXXX XXXX, Fl \n", + "XXXX\n", + "XXXX Website \n", + "XXXX\n", + "Software and website owner, I performed Internet advertising and marketing, to promote this software and website. I worked and XXXX from my home XXXX XXXX XXXX XXXX XXXX , XXXX. I purchased XXXX XXXX XXXX-Software Electronic Book CD and was given a website to promote the software on the internet. The XXXX was given a copy of my website owner certificate document submitted to me when I purchased the software marketing program as well copies of my other school transcripts in addition to XXXX XXXX XXXX for example. XXXX, represented the first initials of my children's names. I wasn't ever paid and I'm still owed the money. Nor did my marketing program have anything to do with any schools, college nor university programs nor did I ever offer or sign any agreement to include it such. Nor did my XXXX XXXX XXXX have anything to do with any other employers, Department of Family and Children, Military, Veteran Organizations or Food Stamp programs, Section 8 nor Indianapolis Housing Authority for example; only me.\n", + "Thank you,\n", + "XXXX XXXX\n", + "3. ACCORDING TO 15 U.S. CODE 6803-DISCLOSURE OF INSTITUTION PRIVACY POLICY, AND ACCORDING TO U.S. CODE 6802- OBLIGATIONS WITH RESPECT TO DISCLOSURES OF PERSONAL INFORMATION. ( b ) OPT OUT ( 1 ) IN GENERAL A FINANCIAL INSTITUTION MAY NOT DISCLOSE NONPUBLIC PERSONAL INFORMATION TO A NONAFFILIATED THIRD PARTY ( TRANSUNION, XXXX, AND XXXX. 
) UNLESS- ( A ) SUCH FINANCIAL INSTITUTION CLEARLY AND CONSPICUOUSLY DISCLOSES TO THE CONSUMER, IN WRITING OR IN ELECTRONIC FORM OR OTHER FORM PERMITTED BY THE REGULATIONS PRESCRIBED UNDER SECTION 6804 OF THIS TITLE. ALSO ACCORDING TO THE \" XXXX ACT '', FINANCIAL INSTITUTIONS MUST TELL THEIR CUSTOMERS ABOUT THEIR INFORMATION-SHARING PRACTICES AND EXPLAIN TO CUSTOMERS THEIR RIGHT TO \" OPT OUT '' IF THEY DON'T WANT THEIR INFORMATION SHARED WITH CERTAIN THIRD PARTIES. UNDER THE FDCPA, A COLLECTOR MUST PROVIDE YOU WITH INFORMATION ABOUT THE DEBT IN ITS INITIAL COMMUNICATION OR WITHIN FIVE DAYS AFTER THE INITIAL COMMUNICATION. ALSO, THE FDCPA STATES, \" YOU CAN NOT ATTEMPT TO COLLECT AN DEBT WHILE A PERSON ( THE CONSUMER ) SUPRESS VALIDATION. TRANSUNION, XXXX, XXXX, AND THE ACCOUNTS LISTED BELOW HAVE CLEARLY VIOLATED MY RIGHTS : XXXX ACCOUNT # XXXX, XXXX XXXX XXXX ACCOUNT # XXXXXXXX XXXX XXXX XXXX XXXX ACCOUNT # XXXXXXXX XXXX XXXX XXXX ACCOUNT # XXXX, XXXX XXXX XXXX XXXX ACCOUNT # XXXX, AND XXXX ACCOUNT # XXXX. FAILURE TO RESPOND SATISFACTORILY WITH DELETIONS OF ALL THE ABOVE ACCOUNTS WILL RESULT IN LEGAL ACTIONS BEING TAKEN AGAINST, TRANSUNION, XXXX, XXXX, WHICH I'LL BE SEEKING A {$1000.00} PER VIOLATION FOR DEFAMATION OF CHARACTER ( PER SE ) NEGLIGENT ENABLEMENT OF IDENTITY FRAUD. 15 USC 1681 VIOLATIONS FOR WILLFUL NONCOMPLIANCE-616 CIVIL LIABILITY FOR WILLFUL NONCOPLIANCE. THIS IS THE THIRD TIME I'VE SUBMITTED A COMPLAINT, AND THE REPONSE I GET IS \" YOU CAN NOT LOCATE MY CREDIT REPORT! '' THIS IS CLEARLY NEGLIGENCE.\n", + "4. I do not know how this works, but I need it done or somehow corrected. My name is XXXX XXXX, XXXX XXXX XXXX XXXX TN XXXXMy SS XXXX DOB XXXX. I had some issues with my income being affected by the COVID-19PANDEMICSHUTDOWN. I was under the 1 CARESAct, Pub. L. 116-136, section 4021, codified at FCRAsection 623 ( a ) ( 1 ) ( F ) ( i ) ( I ), 15 U.S.C.1681s- 2 ( a ) ( 1 ) ( F ) ( i ) ( I ). 
I am requesting some accommodations so I care to protect the integrity of my credit file. US DEPT OF ED / XXXX # XXXX, # XXXX accounts are reporting on XXXX, XXXX The was 30,60, 90 DAYS LATEsince requested assistance due to the pandemic. I found a few accounts that I have never done any business with these companies and the accounts do not belong on my report : XXXX XXXX # XXXX, XXXX XXXX XXXX XXXX # XXXX. \n", + "\n", + "I have some issues with the misspelling of my name, my correct spelling is XXXX XXXX. Please remove any other variation of my name they are not correct. The following addresses do not belong to me please delete them : XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXXSC, XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX\n", + "5. I want to know if this is even legal?! How can they disclose information without knowing its a correct email?!\n", + "comment list 2:\n", + "1. Hello, my name is XXXX XXXX, and I am writing to delete the following information in my file. The items I need deleted are listed in the report. I am a victim of identity theft and did not make the charge. I ask that the items be deleted to correct my credit report. I reported the theft of my identity to the Federal Trade Commission and I also have enclosed copies of the Federal Trade Commissions Identity Theft Affidavit. Please delete the items as soon as possible. The accounts are being reported currently open and the accounts need to be closed. \n", + "XXXX account number XXXX opened on XX/XX/2022 for the amount {$530.00} XXXX XXXX XXXX account number XXXX opened on XX/XX/2022 for the amount of {$140.00} The accounts are being reported currently open and need to be closed immediately. \n", + "Based on, 15 U.S. 
Code 1681c2 a consumer reporting agency shall block the reporting of any information in the file of a consumer that the consumer identifies as information that resulted from an alleged identity theft, not later than 4 business days after the date of receipt. This account should not be furnished on my consumer report. As a consumer I am demanding the deletion of the accounts listed IMMEDIATELY.\n", + "2. To whom it may concern : My personal information was breach in the internet as result accounts had been open in my name, I was advise to fill out an Id theft report to help me deal with this situation, I have listed each one of the accounts that do not belong to me. This is my second request to remove unverified items in my report, but XXXX keep rposting these account with out providing any type of original document as the FCRA provide, you need to provide me with original documents or remove these account immediately.\n", + "3. Ive been Disputting my XXXX XXXX I opened this account and someone got my information and used my card, I contacted XXXX over and over, they removed the negative reporting from my XXXX report but still reporting it negative on my XXXX and Expean this is very unfair to me because Im a victim of identity theft\n", + "4. Today, XX/XX/2021, I received three items in the mail, one envelope containing an unsolicited debit card from Navy Federal credit Union and the other two, with a letter each describing The Important Rights on two accounts should these accounts become delinquent under New York law. \n", + "\n", + "First of all, I never applied for these accounts with Navy Federal, not have I authorized anyone to do so on my behalf. I immediately contacted Navy Federal via phone and was told I was most likely a victim of identity theft and that I should monitor my credit and use a credit monitoring service. I was also asked for my email and mailing information in order to receive a letter from them regarding this issue. 
\n", + "\n", + "My main concern is having someone using my identity to illegally open bank accounts and commit fraud, destroying my credit and finances in the process. This bank is in another state from where I reside. I have not lived in Virginia nor do I intend to do so in the foreseeable future.\n", + "5. My personal information ( including my SSN, Drivers License Info, Addresses, and more ) was stolen from a hacking, and Equifax did n't tell the public about the hack until more than a month after the hacking. During this time, three Equifax executives were caught inside trading. It really shows how Equifax cares about other people!\n", + "\n" + ] + } + ], "source": [ "# The plain English request we will make of PaLM 2\n", "prompt = (\n", @@ -539,25 +1681,33 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 38, "metadata": { "id": "mL5P0_3X04dE" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 66e3af22-91cb-400a-92c3-69e7cd12ee01 is DONE. 0 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "from bigframes.ml.llm import PaLM2TextGenerator\n", "\n", - "# Create a BigQuery Cloud resource connection\n", - "CONN_NAME = \"bqdf-llm\"\n", - "session = bf.get_global_session()\n", - "\n", - "connection = f\"{PROJECT_ID}.{REGION}.{CONN_NAME}\"\n", - "q_a_model = PaLM2TextGenerator(session=session, connection_name=connection)" + "q_a_model = PaLM2TextGenerator()" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 39, "metadata": { "id": "ICWHsqAW1FNk" }, @@ -569,11 +1719,58 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 40, "metadata": { "id": "gB7e1LXU1pst" }, - "outputs": [], + "outputs": [ + { + "data": { + "text/html": [ + "Query job 653add17-29be-408c-8882-064217f8556e is DONE. 0 Bytes processed. 
Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job 8fd16954-853a-45fd-80bc-65b1242429e2 is DONE. 8 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Query job d9929bcb-26ce-4844-b68e-f4a980b90ede is DONE. 171 Bytes processed. Open Job" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "' The first comment list is about people complaining about companies or services, while the second comment list is about people reporting identity theft or fraud.'" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Send the request for PaLM 2 to generate a response to our prompt\n", "major_difference = q_a_model.predict(df)\n", @@ -595,7 +1792,7 @@ "source": [ "# Summary and next steps\n", "\n", - "You've used BigQuery DataFrames' integration with LLM models (`bigframes.ml.llm`) to generate code samples, and have tranformed LLM output by creating and using a custom function in BigQuery DataFrames.\n", + "You've used the ML and LLM capabilities of BigQuery DataFrames to help analyze and understand a large dataset of unstructured feedback.\n", "\n", "Learn more about BigQuery DataFrames in the [documentation](https://cloud.google.com/python/docs/reference/bigframes/latest) and find more sample notebooks in the [GitHub repo](https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks)." 
] @@ -619,7 +1816,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.16" + "version": "3.10.13" } }, "nbformat": 4, From b6d5a908777b544b986af26c47ee4c020eb6c125 Mon Sep 17 00:00:00 2001 From: Ashley Xu Date: Thu, 16 Nov 2023 05:08:11 +0000 Subject: [PATCH 7/7] fix: fix the comment --- .../bq_dataframes_llm_kmeans.ipynb | 216 ------------------ 1 file changed, 216 deletions(-) diff --git a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb index 740003d4ee..ac9cafa585 100644 --- a/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb +++ b/notebooks/generative_ai/bq_dataframes_llm_kmeans.ipynb @@ -427,222 +427,6 @@ "downsampled_issues_df = issues_df.sample(n=10000)" ] }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "Query job aed21372-ad20-4483-8843-faabc286be3f is DONE. 2.3 GB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job c5138c60-f714-4091-83b0-6900e0897d02 is DONE. 0 Bytes processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "Query job 521e7788-6722-4bb8-91d6-5deb3a56435a is DONE. 10.7 MB processed. Open Job" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
consumer_complaint_narrative
2580664Hello, my name is XXXX XXXX, and I am writing ...
1806973This is XXXX XXXX and I am submitting this com...
2055053XXXX XXXX XXXX, XXXX. ( address : XXXX XXXX XX...
2515231When I reinvestigated my credit report, I real...
2633049Checking my credit report XX/XX/2018 with all ...
3117273I contacted TransUnion and spoke a credit rep ...
698814XXXX XXXX XXXX. makes daily calls to me cell c...
267826Can we please reopen Case : XXXX? \n", - "\n", - "Wells Farg...
54019My rights under 15 USC 1681 have been violated...
141050To whom it may concern : My personal informati...
2962076I have had a CashApp account since last year, ...
2481105that some of the information was erroneous. Th...
431562I have disputed the referenced accounts to the...
1953029On, XX/XX/22, I attempted to complete a transa...
2395979Subject : XXXX XXXX XXXX compensation, refund,...
455524I paid off my mortgage on XX/XX/2019. The comp...
2155924This kind of account is placed as a charged of...
1069497This is one of many issues I have had with Wel...
3181689I have disputed this account with MONTEREY FIN...
274268Lender is not updating my loan status in the V...
1671305XXXX is a peer to peer lending conmpany that u...
886026( DISPUTE CODE - XXXX ) My personal informatio...
1044431I filed a complaint against PNC this year and ...
1938481I applied for a modification and was approved....
1987834Ive been Disputting my XXXX XXXX I opened this...
\n", - "

25 rows × 1 columns

\n", - "
[10000 rows x 1 columns in total]" - ], - "text/plain": [ - " consumer_complaint_narrative\n", - "2580664 Hello, my name is XXXX XXXX, and I am writing ...\n", - "1806973 This is XXXX XXXX and I am submitting this com...\n", - "2055053 XXXX XXXX XXXX, XXXX. ( address : XXXX XXXX XX...\n", - "2515231 When I reinvestigated my credit report, I real...\n", - "2633049 Checking my credit report XX/XX/2018 with all ...\n", - "3117273 I contacted TransUnion and spoke a credit rep ...\n", - "698814 XXXX XXXX XXXX. makes daily calls to me cell c...\n", - "267826 Can we please reopen Case : XXXX? \n", - "\n", - "Wells Farg...\n", - "54019 My rights under 15 USC 1681 have been violated...\n", - "141050 To whom it may concern : My personal informati...\n", - "2962076 I have had a CashApp account since last year, ...\n", - "2481105 that some of the information was erroneous. Th...\n", - "431562 I have disputed the referenced accounts to the...\n", - "1953029 On, XX/XX/22, I attempted to complete a transa...\n", - "2395979 Subject : XXXX XXXX XXXX compensation, refund,...\n", - "455524 I paid off my mortgage on XX/XX/2019. The comp...\n", - "2155924 This kind of account is placed as a charged of...\n", - "1069497 This is one of many issues I have had with Wel...\n", - "3181689 I have disputed this account with MONTEREY FIN...\n", - "274268 Lender is not updating my loan status in the V...\n", - "1671305 XXXX is a peer to peer lending conmpany that u...\n", - "886026 ( DISPUTE CODE - XXXX ) My personal informatio...\n", - "1044431 I filed a complaint against PNC this year and ...\n", - "1938481 I applied for a modification and was approved....\n", - "1987834 Ive been Disputting my XXXX XXXX I opened this...\n", - "...\n", - "\n", - "[10000 rows x 1 columns]" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "downsampled_issues_df._cached()" - ] - }, { "attachments": {}, "cell_type": "markdown",