feat(dlp): Add code samples to de-identify files in Cloud Storage #10511


Avani-Thakker-Crest
Contributor

@Avani-Thakker-Crest Avani-Thakker-Crest commented Aug 11, 2023

Description

Implements the dlp_deidentify_cloud_storage code sample. Its test uses mocking rather than calling the live API. JSON equivalent: https://cloud.google.com/dlp/docs/deidentify-storage#code-example
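
For readers without the diff open, the following is a minimal sketch of the idea, not the merged sample. It assumes the job is an inspect job with a de-identify action, with field names mirroring the linked JSON example in snake_case. The function names, parameters, and the `TEXT_FILE` file type are illustrative assumptions, and the client is passed in so that a mock can stand in for `google.cloud.dlp_v2.DlpServiceClient` in a test.

```python
from unittest import mock


def build_deidentify_job(input_url, output_url, info_types, deid_template):
    """Build an inspect job whose action de-identifies files in Cloud Storage.

    All parameter names are hypothetical; the dict shape follows the JSON
    example linked in the description and is not validated against the API.
    """
    return {
        # Where to read the files from, e.g. "gs://my-input-bucket/*".
        "storage_config": {
            "cloud_storage_options": {"file_set": {"url": input_url}}
        },
        # Which kinds of sensitive data to look for.
        "inspect_config": {
            "info_types": [{"name": name} for name in info_types]
        },
        # What to do with findings: write de-identified copies to output_url.
        "actions": [
            {
                "deidentify": {
                    "cloud_storage_output": output_url,
                    "transformation_config": {
                        "deidentify_template": deid_template
                    },
                    "file_types_to_transform": ["TEXT_FILE"],
                }
            }
        ],
    }


def deidentify_cloud_storage(client, project, **job_kwargs):
    """Submit the job through the given client and return the job name."""
    parent = f"projects/{project}/locations/global"
    response = client.create_dlp_job(
        request={
            "parent": parent,
            "inspect_job": build_deidentify_job(**job_kwargs),
        }
    )
    return response.name


def test_deidentify_cloud_storage():
    """Mock-based test: no network call is made."""
    client = mock.MagicMock()
    client.create_dlp_job.return_value.name = "projects/my-project/dlpJobs/i-1"

    job_name = deidentify_cloud_storage(
        client,
        "my-project",
        input_url="gs://my-input-bucket/*",
        output_url="gs://my-output-bucket/",
        info_types=["EMAIL_ADDRESS"],
        deid_template="projects/my-project/deidentifyTemplates/my-template",
    )

    assert job_name == "projects/my-project/dlpJobs/i-1"
    client.create_dlp_job.assert_called_once()
```

Passing the client in as an argument is a design choice made for this sketch: it lets the test replace the client with a `MagicMock` without patching module-level imports.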

Fixes #

Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.

Checklist

@Avani-Thakker-Crest Avani-Thakker-Crest requested review from a team as code owners August 11, 2023 12:25
@snippet-bot

snippet-bot bot commented Aug 11, 2023

Here is the summary of changes.

You are about to add 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label bot added the samples and api: dlp labels on Aug 11, 2023
@Avani-Thakker added the kokoro:force-run label (forces Kokoro to re-run the tests) on Aug 11, 2023
@kokoro-team removed the kokoro:force-run label on Aug 11, 2023
@m-strzelczyk
Contributor

Hey @Avani-Thakker-Crest, this file is growing quite large. Would it be possible to split it into multiple files, so that, for example, each region tag gets its own file?

It would also allow you to remove some # noqa: comments, as you wouldn't repeat imports etc.

@Avani-Thakker-Crest
Contributor Author

Avani-Thakker-Crest commented Aug 29, 2023

Thank you for your comment, @m-strzelczyk. We can definitely split it into multiple files. Please let me know how I should split it:

  1. Split the deid-related samples into 3-4 logical files. I have already split the deid samples into two files: a) deid_table.py, containing all samples that de-identify table data, and b) deid.py, containing the remaining samples. We could split these into two more files.
  2. Split the samples so that each file corresponds to one sample. To stay uniform, this would mean restructuring all the DLP samples, including inspect.py, risk.py, and so on.
  3. For now, create a separate file for the deidentify_cloud_storage sample, plus a corresponding test file.

Please let me know which option you prefer. Thank you!

@Avani-Thakker added the kokoro:force-run label on Sep 8, 2023
@kokoro-team removed the kokoro:force-run label on Sep 8, 2023
@Avani-Thakker-Crest
Contributor Author

Hi @m-strzelczyk, for now I have created new files for this particular sample. For the remaining samples, I have created backlog issue #10601.
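
To illustrate the "one region tag per file" layout discussed above, here is a rough sketch. The file content in `SAMPLE_FILE` is a hypothetical skeleton with the function body elided, not the merged code, and the helper names are made up for this example. The tag check is only an illustration of the convention and is not how snippet-bot works.

```python
import re

# Hypothetical skeleton of a standalone sample file. Because the file holds a
# single sample, its imports appear once and need no "# noqa" markers.
SAMPLE_FILE = '''\
# [START dlp_deidentify_cloud_storage]
import google.cloud.dlp


def deidentify_cloud_storage(project: str) -> None:
    ...


# [END dlp_deidentify_cloud_storage]
'''

START_TAG = re.compile(r"^\s*# \[START (\w+)\]\s*$", re.MULTILINE)
END_TAG = re.compile(r"^\s*# \[END (\w+)\]\s*$", re.MULTILINE)


def region_tags(source):
    """Return the names of all region tags opened in the source text."""
    return START_TAG.findall(source)


def is_single_sample_file(source):
    """True if the source opens exactly one region tag and closes the same one."""
    starts = START_TAG.findall(source)
    ends = END_TAG.findall(source)
    return len(starts) == 1 and starts == ends


print(region_tags(SAMPLE_FILE))            # ['dlp_deidentify_cloud_storage']
print(is_single_sample_file(SAMPLE_FILE))  # True
```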

@Avani-Thakker added the kokoro:force-run label on Sep 11, 2023
@kokoro-team removed the kokoro:force-run label on Sep 11, 2023
@Avani-Thakker added the kokoro:force-run label on Sep 11, 2023
@kokoro-team removed the kokoro:force-run label on Sep 11, 2023
Contributor

@m-strzelczyk m-strzelczyk left a comment

We're getting close :)

One issue with the logic inside the sample, probably easy to fix.

And one issue with increasing the depth of testing.

dlp/snippets/deid_cloud_storage.py (resolved)
dlp/snippets/deid_cloud_storage.py (resolved)
dlp/snippets/deid_cloud_storage_test.py (outdated, resolved)
@msampathkumar msampathkumar self-requested a review September 14, 2023 12:57
@msampathkumar changed the title from "[DLP] Implemented dlp_deidentify_cloud_storage" to "feat(DLP): Add code samples to de-identify files in Cloud Storage" on Sep 15, 2023
@msampathkumar changed the title from "feat(DLP): Add code samples to de-identify files in Cloud Storage" to "feat(dlp): Add code samples to de-identify files in Cloud Storage" on Sep 15, 2023
@Avani-Thakker added the kokoro:force-run label on Sep 18, 2023
@kokoro-team removed the kokoro:force-run label on Sep 18, 2023
@Avani-Thakker Avani-Thakker merged commit 699db3d into GoogleCloudPlatform:main Sep 20, 2023
Labels
api: dlp Issues related to the Sensitive Data Protection API. samples Issues that are directly related to samples.
7 participants