Feature store lakeformation #5599

Draft
BassemHalim wants to merge 18 commits into aws:master from BassemHalim:feature-store-lakeformation
@BassemHalim BassemHalim commented Mar 4, 2026

Description

This PR adds Lake Formation integration to SageMaker Feature Store, enabling customers to govern access to their offline store data through AWS Lake Formation instead of relying solely on IAM policies.

This simplifies the manual process described in this blog post: https://aws.amazon.com/blogs/machine-learning/control-access-to-amazon-sagemaker-feature-store-offline-using-aws-lake-formation/

New Features

class LakeFormationConfig(Base):
    """Configuration for Lake Formation governance on Feature Group offline stores."""

    enabled: bool = False
    use_service_linked_role: bool = True
    registration_role_arn: Optional[str] = None
    show_s3_policy: bool = False
    disable_hybrid_access_mode: bool = True

FeatureGroup.create() - added a new lake_formation_config parameter

  • Enables Lake Formation governance at Feature Group creation time
  • Automatically waits for Feature Group to reach "Created" status before configuring Lake Formation

FeatureGroup.enable_lake_formation() method

  • Enables Lake Formation on existing Feature Groups
  • Three-phase setup:
    1. Registers S3 location with Lake Formation
    2. Grants permissions to the offline store role on the Glue table
    3. Revokes IAMAllowedPrincipal permissions from the Glue table
  • Fail-fast behavior with clear error reporting at each phase
  • Optional show_s3_policy parameter prints recommended S3 deny policy
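The three-phase, fail-fast flow above can be sketched roughly as follows. This is an illustrative outline, not the PR's implementation: the function name `enable_lake_formation_sketch`, its parameters, and the choice of `Permissions=["ALL"]` are assumptions. The client is expected to behave like a boto3 `lakeformation` client, whose `register_resource`, `grant_permissions`, and `revoke_permissions` calls are used here.

```python
from typing import Any, Dict, List


def enable_lake_formation_sketch(
    lf_client: Any,
    s3_arn: str,
    role_arn: str,
    database: str,
    table: str,
) -> List[str]:
    """Run the three phases in order and stop at the first failure.

    Hypothetical helper: names and permission sets are assumptions.
    """
    table_resource: Dict[str, Any] = {
        "Table": {"DatabaseName": database, "Name": table}
    }
    phases = [
        # Phase 1: register the offline store S3 location.
        ("register", lambda: lf_client.register_resource(
            ResourceArn=s3_arn, UseServiceLinkedRole=True)),
        # Phase 2: grant the offline store role access to the Glue table.
        ("grant", lambda: lf_client.grant_permissions(
            Principal={"DataLakePrincipalIdentifier": role_arn},
            Resource=table_resource,
            Permissions=["ALL"])),  # assumed permission set
        # Phase 3: revoke the IAMAllowedPrincipal grant on the same table.
        ("revoke", lambda: lf_client.revoke_permissions(
            Principal={"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
            Resource=table_resource,
            Permissions=["ALL"])),  # assumed permission set
    ]
    completed: List[str] = []
    for name, call in phases:
        try:
            call()
        except Exception as exc:
            # Fail fast and report which phase failed and what already ran.
            raise RuntimeError(
                f"Lake Formation setup failed in phase '{name}' "
                f"(completed: {completed})"
            ) from exc
        completed.append(name)
    return completed
```

Passing the client in as an argument keeps the sketch testable with a stub; with real credentials it could be called with `boto3.client("lakeformation")`.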

Usage

Enable at creation:

from sagemaker.mlops.feature_store import FeatureGroup, LakeFormationConfig

lf_config = LakeFormationConfig()
lf_config.enabled = True

fg = FeatureGroup.create(
    feature_group_name="my-feature-group",
    # ... other params ...
    lake_formation_config=lf_config,
)

Enable on existing Feature Group:

fg = FeatureGroup.get("my-feature-group")
fg.enable_lake_formation(show_s3_policy=True)

Testing

  • Unit tests: comprehensive coverage of all new methods and validation logic
  • Integration tests: end-to-end tests for both creation workflows and negative scenarios

Notes

  • S3 deny policy is provided as a recommendation (not applied automatically) to avoid breaking existing workflows

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

adishaa and others added 10 commits January 16, 2026 07:00
- Add LakeFormationConfig class to configure Lake Formation governance on offline stores
- Implement FeatureGroup subclass with Lake Formation integration capabilities
- Add helper methods for S3 URI/ARN conversion and Lake Formation role management
- Add S3 deny policy generation for Lake Formation access control
- Implement Lake Formation resource registration and S3 bucket policy setup
- Add integration tests for Lake Formation feature store workflows
- Add unit tests for Lake Formation configuration and policy generation
- Update feature_store module exports to include FeatureGroup and LakeFormationConfig
- Update API documentation to include Feature Store section in sagemaker_mlops.rst
- Enable fine-grained access control for feature store offline stores using AWS Lake Formation
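As a rough illustration of the S3 URI/ARN conversion helper mentioned in the commit list, a minimal sketch might look like this. The function name `s3_uri_to_arn` and its `partition` parameter are hypothetical; the PR's actual helper may differ.

```python
def s3_uri_to_arn(s3_uri: str, partition: str = "aws") -> str:
    """Convert s3://bucket/key to arn:<partition>:s3:::bucket/key.

    Hypothetical helper, shown only to illustrate the conversion.
    """
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    # Split only on the first slash so the key keeps its own slashes.
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"S3 URI has no bucket: {s3_uri!r}")
    resource = f"{bucket}/{key}" if key else bucket
    return f"arn:{partition}:s3:::{resource}"
```

Taking the partition as a parameter avoids hardcoding `aws`, which matters in partitions such as `aws-cn` or `aws-us-gov`.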
Contributor

@nargokul left a comment

Please fix integ tests

disable_hybrid_access_mode: bool = True


class FeatureGroup(CoreFeatureGroup):
Contributor

Users would find it confusing to choose between sagemaker.core.FeatureGroup and sagemaker.mlops.FeatureGroup.

Could this be a utility, or be called something else?

Contributor Author

The v3 sagemaker.mlops package already re-exports FeatureGroup from sagemaker-core (ref).

This class is meant as an extension: it adds an enable_lake_formation method and updates the create method so that Lake Formation can be enabled during creation.

Contributor

Re-exporting is fine.
The class is a sagemaker-core resource class that performs basic CRUD operations, exactly mimicking boto.
If we introduce this change, the behavior will differ between core and mlops.

Why does this class need to be called FeatureGroup?

Contributor Author

The idea is that it should be a drop-in replacement for the one in sagemaker-core, just with extra functionality, so that the user can enable Lake Formation during feature group creation instead of creating and then enabling. I thought this might be better UX. What do you suggest: renaming the class, or moving this to a utility function? I'm open to both, but I'm not sure what to rename it to.

Contributor

I think we can go with something like GovernedFeatureGroup or ManagedFeatureGroup.

We can discuss internally.

Comment on lines +130 to +134
return (
    f"arn:{partition}:iam::{account_id}:role/aws-service-role/"
    f"lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"
)

Contributor

Is this the string that gets created by default?

Can this be obtained from an AWS library instead of hardcoding it?

If the role name changes, this code would break.

Contributor Author

Yes, Lake Formation creates it by default (ref).
I didn't find any functionality in botocore to get that role instead of hardcoding it.

The role name is given in the Lake Formation public docs, so I think it's unlikely to change.
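One way to limit the hardcoding to the documented role path is to derive the partition and account ID from an ARN that is already available, such as the Feature Group ARN. The sketch below assumes that approach; the function name `lake_formation_slr_arn` and the constant `SLR_PATH` are hypothetical, and the PR may obtain `partition` and `account_id` differently.

```python
# Role path taken from the snippet under review; everything else is illustrative.
SLR_PATH = (
    "aws-service-role/lakeformation.amazonaws.com/"
    "AWSServiceRoleForLakeFormationDataAccess"
)


def lake_formation_slr_arn(feature_group_arn: str) -> str:
    """Build the service-linked role ARN from a Feature Group ARN.

    Hypothetical helper: reuses the partition and account ID of the
    given ARN instead of hardcoding them.
    """
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = feature_group_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not a valid ARN: {feature_group_arn!r}")
    partition, account_id = parts[1], parts[4]
    return f"arn:{partition}:iam::{account_id}:role/{SLR_PATH}"
```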

Replace 10 bare print() calls with a single logger.info() call for the
S3 deny policy output in enable_lake_formation(). This makes the policy
display consistent with the rest of the LF workflow which uses logger.

Update 12 tests to mock the logger instead of builtins.print.

---
X-AI-Prompt: replace print with logger.info for s3 bucket policy display in enable_lake_formation
X-AI-Tool: kiro-cli
Rename the mlops FeatureGroup class to FeatureGroupManager to
distinguish it from the core FeatureGroup base class. Update all
references in unit and integration lake formation tests. Fix missing
comma in __init__.py __all__ list.
---
X-AI-Prompt: rename FeatureGroup to FeatureGroupManager and update lakeformation tests
X-AI-Tool: kiro-cli
… to composition

Replace FeatureGroup inheritance with composition pattern. The manager
now delegates to FeatureGroup via classmethods (create_feature_group,
describe_feature_group) and takes a FeatureGroup instance in
enable_lake_formation instead of operating on self.

Key changes:
- FeatureGroupManager no longer extends FeatureGroup
- Forward session/region through enable_lake_formation and create
- Add telemetry decorators to all public methods
- Add hypothesis to test dependencies
- Add dedicated test_feature_group_manager.py unit tests
- Consolidate test_lakeformation.py (remove migrated tests)
- Update integration tests for new API surface
- Reorganize example notebooks into v3-feature-store-examples/
- Bump VERSION to 1.5.1.dev0
---
X-AI-Prompt: read last commit and update commit message to reflect full scope of changes
X-AI-Tool: kiro-cli
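The move from inheritance to composition described in the commit above can be pictured with stand-in classes. This is a simplified sketch under stated assumptions: the `FeatureGroup` dataclass here is a placeholder for the sagemaker-core resource, and the `lake_formation_enabled` flag and the `enable` argument are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FeatureGroup:
    """Stand-in for the sagemaker-core resource class (not the real one)."""

    feature_group_name: str
    lake_formation_enabled: bool = False  # illustrative flag only

    @classmethod
    def create(cls, feature_group_name: str) -> "FeatureGroup":
        return cls(feature_group_name)


class FeatureGroupManager:
    """Delegates to FeatureGroup rather than extending it."""

    @classmethod
    def create_feature_group(
        cls, feature_group_name: str, enable: bool = False
    ) -> FeatureGroup:
        # Creation is delegated to the core class.
        feature_group = FeatureGroup.create(feature_group_name)
        if enable:
            cls.enable_lake_formation(feature_group)
        return feature_group

    @staticmethod
    def enable_lake_formation(feature_group: FeatureGroup) -> None:
        # Operates on the instance passed in, not on self.
        feature_group.lake_formation_enabled = True  # placeholder for the real setup
```

With this shape the core class keeps its plain CRUD behavior, and the extra governance logic lives in a separately named class.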
…est coverage

- Use isinstance() for Unassigned checks instead of == Unassigned()
- Add class-level type annotation for _lf_client_cache
- Replace fragile docstring inheritance with proper docstring
- Fix create() to return FeatureGroupManager instead of FeatureGroup
  by calling cls.get() after super().create()
- Update create() return type annotation to Optional[FeatureGroupManager]
- Add feature_group_arn validation before S3 policy generation
- Fix integ test logger name (feature_group -> feature_group_manager)
- Rename test_lakeformation.py to test_feature_group_manager.py
- Add unit tests for: return type verification, Iceberg table format
  S3 path handling, missing ARN validation, happy-path return values,
  session/region pass-through, and region inference from session
---
X-AI-Prompt: Review FeatureGroupManager class, fix identified issues, increase test coverage
X-AI-Tool: kiro-cli
- Add Phase 4 to enable_lake_formation() that automatically applies
  S3 deny bucket policy for Lake Formation governance
- Remove show_s3_policy and disable_hybrid_access_mode parameters
  in favor of always-on behavior
- Refactor _generate_s3_deny_policy to _generate_s3_deny_statements
  returning a list for easier policy merging
- Add _get_s3_client with caching pattern matching _get_lake_formation_client
- Add _apply_bucket_policy with idempotent Sid-based deduplication
- Improve _revoke_iam_allowed_principal to check permissions via
  list_permissions before attempting revocation
- Remove LakeFormationConfig.show_s3_policy and disable_hybrid_access_mode
- Add e2e integration test for put_record + Athena query flow
- Update unit tests for new behavior
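The idempotent, Sid-based deduplication mentioned for `_apply_bucket_policy` could work along these lines. This is a sketch of the merging step only, with a hypothetical function name `merge_statements_by_sid`; it does not call S3, and the PR's actual logic may differ.

```python
import copy
from typing import Any, Dict, List, Optional


def merge_statements_by_sid(
    policy: Optional[Dict[str, Any]],
    new_statements: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of the policy with new statements appended.

    Statements whose Sid is already present are skipped, so applying
    the same statements twice yields the same policy.
    """
    merged = copy.deepcopy(policy) if policy else {
        "Version": "2012-10-17",
        "Statement": [],
    }
    statements = merged.get("Statement", [])
    # A policy may hold a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]
    existing_sids = {s["Sid"] for s in statements if "Sid" in s}
    for statement in new_statements:
        sid = statement.get("Sid")
        if sid is not None and sid in existing_sids:
            continue  # already applied on an earlier run
        statements.append(copy.deepcopy(statement))
        if sid is not None:
            existing_sids.add(sid)
    merged["Statement"] = statements
    return merged
```

Returning a new dictionary rather than mutating the input makes it straightforward to compare the merged policy with the current one and skip the update when nothing changed.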