ROX-33041: audit log events for internal token generation#19105
ROX-33041: audit log events for internal token generation#19105stehessel wants to merge 2 commits intomasterstackrox/stackrox:masterfrom ROX-33041/audit-log-events-for-internal-tokensstackrox/stackrox:ROX-33041/audit-log-events-for-internal-tokensCopy head branch name to clipboard
Conversation
|
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
auditableServiceEndpointsset hard-codes a specific full method name; consider adding a short comment or helper near the token service definition so future changes to the RPC name/signature are less likely to silently break auditing for this path. - The service-derived
UserInfofor audited service calls uses a synthetic username/friendly name format; if these audit records are consumed by existing tooling or search filters, it may be worth aligning the format with any established convention for service identities elsewhere in the system. - The increase of the token cache TTL and token TTL in
transport.gois quite substantial (3 → 55/60 minutes); it would be helpful to briefly justify this change in a code comment near the constants to make the trade-off between token reuse and security clearer for future readers.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `auditableServiceEndpoints` set hard-codes a specific full method name; consider adding a short comment or helper near the token service definition so future changes to the RPC name/signature are less likely to silently break auditing for this path.
- The service-derived `UserInfo` for audited service calls uses a synthetic username/friendly name format; if these audit records are consumed by existing tooling or search filters, it may be worth aligning the format with any established convention for service identities elsewhere in the system.
- The increase of the token cache TTL and token TTL in `transport.go` is quite substantial (3 → 55/60 minutes); it would be helpful to briefly justify this change in a code comment near the constants to make the trade-off between token reuse and security clearer for future readers.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Images are ready for the commit at 834beb6. To use with deploy scripts, first |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #19105 +/- ##
========================================
Coverage 49.51% 49.52%
========================================
Files 2670 2672 +2
Lines 201549 201680 +131
========================================
+ Hits 99803 99888 +85
- Misses 94293 94334 +41
- Partials 7453 7458 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
31e4f6e to
cb273ac
Compare
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The significantly increased
tokenCacheTTL/tokenTTLvalues intransport.gochange the revocation/latency trade-off quite a bit; consider making these durations configurable (or at least cluster-tunable) rather than hard-coded constants so operators can adjust them based on their security posture and traffic patterns. - The
auditableServiceEndpointsset is a central allowlist for auditable service-to-service RPCs; it might be worth adding a brief comment or helper around its usage to make it more discoverable when new security-sensitive internal RPCs are added, so those do not get missed unintentionally.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The significantly increased `tokenCacheTTL`/`tokenTTL` values in `transport.go` change the revocation/latency trade-off quite a bit; consider making these durations configurable (or at least cluster-tunable) rather than hard-coded constants so operators can adjust them based on their security posture and traffic patterns.
- The `auditableServiceEndpoints` set is a central allowlist for auditable service-to-service RPCs; it might be worth adding a brief comment or helper around its usage to make it more discoverable when new security-sensitive internal RPCs are added, so those do not get missed unintentionally.
## Individual Comments
### Comment 1
<location> `sensor/common/centralproxy/transport.go:34-38` </location>
<code_context>
+ // These tokens carry narrowly scoped permissions and are not directly exposed
+ // to users. The longer cache window trades minimal revocation latency for
+ // significantly lower load on Central in clusters with many proxied requests.
+ tokenCacheTTL = 55 * time.Minute
// tokenTTL is the requested token validity duration.
- // Slightly longer than cache TTL to ensure tokens remain valid during cache lifetime.
- tokenTTL = tokenCacheTTL + 1*time.Minute
+ // Slightly longer than cache TTL to ensure tokens remain valid during cache lifetime
+ // and to avoid race conditions where a token expires between being served from cache
+ // and being used in a request.
+ tokenTTL = 60 * time.Minute
// tokenRequestTimeout is the maximum time allowed for a token request RPC.
</code_context>
<issue_to_address>
**suggestion:** Tie tokenTTL directly to tokenCacheTTL to avoid future drift.
Since `tokenTTL` is intended to be slightly longer than `tokenCacheTTL`, it’s safer to define it relative to `tokenCacheTTL` (e.g., `tokenTTL = tokenCacheTTL + 5*time.Minute`). That way, if `tokenCacheTTL` is changed later, the relationship and skew remain correct without requiring a separate manual update.
```suggestion
// tokenTTL is the requested token validity duration.
// Slightly longer than cache TTL to ensure tokens remain valid during cache lifetime
// and to avoid race conditions where a token expires between being served from cache
// and being used in a request. Keep this defined relative to tokenCacheTTL to avoid
// drift if tokenCacheTTL is changed.
tokenTTL = tokenCacheTTL + 5*time.Minute
```
</issue_to_address>
### Comment 2
<location> `central/audit/audit_test.go:134-143` </location>
<code_context>
+func (suite *AuditLogTestSuite) TestServiceRequestsForInternalTokenEndpointAreAudited() {
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting additional audit fields (endpoint and interaction) for the internal token endpoint
The test already checks that the message is non-nil, uses the service identity, and has method API. To better validate auditing for this sensitive endpoint, please also assert that:
- `msg.GetEndpoint()` (or equivalent) equals `tokenServiceV1.TokenService_GenerateTokenForPermissionsAndScope_FullMethodName`, and
- `msg.GetInteraction()` is set to the expected value (likely `UPDATE`, given `defaultGRPCMethod`).
These extra checks will help catch regressions in `newAuditMessage` and ensure the audit entry has the expected shape for downstream consumers.
Suggested implementation:
```golang
suite.Equal(storage.Audit_IdentityKind_SERVICE, msg.GetIdentity().GetKind())
suite.Equal(storage.Audit_Method_API, msg.GetMethod())
suite.Equal(tokenServiceV1.TokenService_GenerateTokenForPermissionsAndScope_FullMethodName, msg.GetEndpoint())
suite.Equal(storage.Audit_Interaction_UPDATE, msg.GetInteraction())
```
I assumed the existing assertions on `msg` use `storage.Audit_IdentityKind_SERVICE` and `storage.Audit_Method_API` exactly as shown above. If the actual lines differ (e.g., different enum names or ordering), adjust the `SEARCH` block to match the real code around the `msg` method assertion.
Also, I assumed the interaction enum is `storage.Audit_Interaction_UPDATE`. If the enum name in your repo differs (e.g., `storage.Audit_UPDATE` or similar), update that constant in the `REPLACE` block accordingly.
</issue_to_address>
### Comment 3
<location> `central/audit/audit_test.go:160-169` </location>
<code_context>
+ suite.Equal(v1.Audit_API, msg.GetMethod(), "Service requests should have method=API")
+}
+
+func (suite *AuditLogTestSuite) TestServiceRequestsForOtherEndpointsAreNotAudited() {
+ // Create mock service identity.
+ serviceIdentity := &storage.ServiceIdentity{
+ Id: "test-sensor-12345",
+ Type: storage.ServiceType_SENSOR_SERVICE,
+ }
+ suite.identityMock.EXPECT().Service().Return(serviceIdentity).AnyTimes()
+
+ ctxWithServiceIdentity := authn.ContextWithIdentity(context.Background(), suite.identityMock, suite.T())
+ ctxWithAuth := interceptor.ContextWithAuthStatus(ctxWithServiceIdentity, nil)
+
+ a := &audit{notifications: suite.notifierMock}
+
+ // Test non-token endpoint - should NOT be audited.
+ msg := a.newAuditMessage(ctxWithAuth, "test-request",
+ "/v1.SomeOtherService/SomeMethod",
+ interceptor.AuthStatus{Error: nil}, nil)
+
+ suite.Nil(msg, "Service request for non-token endpoint should not be audited")
+}
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add a negative test that combines service identity with the internal-token method but with an auth error
To fully exercise the new behavior, please also add a case where a service identity calls the token endpoint but auth fails (e.g., `AuthStatus{Error: someError}`). In that test, assert that an audit message is still produced and includes the error details, so we verify that error audits for the token endpoint aren’t inadvertently suppressed.
Suggested implementation:
```golang
func (suite *AuditLogTestSuite) TestServiceRequestsForOtherEndpointsAreNotAudited() {
// Create mock service identity.
serviceIdentity := &storage.ServiceIdentity{
Id: "test-sensor-12345",
Type: storage.ServiceType_SENSOR_SERVICE,
}
suite.identityMock.EXPECT().Service().Return(serviceIdentity).AnyTimes()
ctxWithServiceIdentity := authn.ContextWithIdentity(context.Background(), suite.identityMock, suite.T())
ctxWithAuth := interceptor.ContextWithAuthStatus(ctxWithServiceIdentity, nil)
a := &audit{notifications: suite.notifierMock}
// Test non-token endpoint - should NOT be audited.
msg := a.newAuditMessage(ctxWithAuth, "test-request",
"/v1.SomeOtherService/SomeMethod",
interceptor.AuthStatus{Error: nil}, nil)
suite.Nil(msg, "Service request for non-token endpoint should not be audited")
}
func (suite *AuditLogTestSuite) TestServiceRequestsForInternalTokenEndpointWithAuthErrorAreAudited() {
// Create mock service identity.
serviceIdentity := &storage.ServiceIdentity{
Id: "test-sensor-12345",
Type: storage.ServiceType_SENSOR_SERVICE,
}
suite.identityMock.EXPECT().Service().Return(serviceIdentity).AnyTimes()
ctxWithServiceIdentity := authn.ContextWithIdentity(context.Background(), suite.identityMock, suite.T())
// Simulate an auth failure for the internal token endpoint.
authErr := errors.New("test auth error")
ctxWithAuth := interceptor.ContextWithAuthStatus(ctxWithServiceIdentity, nil)
a := &audit{notifications: suite.notifierMock}
msg := a.newAuditMessage(ctxWithAuth, "test-request",
tokenServiceV1.TokenService_GenerateTokenForPermissionsAndScope_FullMethodName,
interceptor.AuthStatus{Error: authErr}, nil)
suite.NotNil(msg, "Service request for internal token endpoint with auth error should still be audited")
suite.NotNil(msg.GetUser(), "Audit message should contain user info from service identity")
suite.Equal("service:SENSOR_SERVICE:test-sensor-12345", msg.GetUser().GetUsername())
suite.Equal("Service: SENSOR_SERVICE (ID: test-sensor-12345)", msg.GetUser().GetFriendlyName())
suite.Equal(v1.Audit_API, msg.GetMethod(), "Service requests should have method=API")
suite.Contains(msg.GetError(), authErr.Error(), "Audit message should contain auth error details")
}
```
1. Ensure that the audit log protobuf/message type used by `newAuditMessage` has an `Error` field with accessor `GetError() string`. If the error details are stored differently (e.g., in a nested status object or differently named field), adjust the last assertion accordingly.
2. If `errors` is not yet imported in this file, keep the existing `github.com/pkg/errors` import that appears in your snippet or add a standard `errors` import (`"errors"`) and switch `errors.New` to the appropriate package consistently with the rest of the file.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
cb273ac to
f669757
Compare
f669757 to
8554d14
Compare
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
UnaryServerInterceptorfunction signature now has a trailing comma before the closing parenthesis (handler grpc.UnaryHandler,)); this looks unintentional and should be cleaned up for consistency with the rest of the codebase. - The construction of the service-backed
UserInfo(theservice:<type>:<id>username and friendly name format) is hard-coded innewAuditMessage; consider extracting this into a small helper so the representation of service identities in audit logs stays consistent if it is needed elsewhere or the format changes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `UnaryServerInterceptor` function signature now has a trailing comma before the closing parenthesis (`handler grpc.UnaryHandler,)`); this looks unintentional and should be cleaned up for consistency with the rest of the codebase.
- The construction of the service-backed `UserInfo` (the `service:<type>:<id>` username and friendly name format) is hard-coded in `newAuditMessage`; consider extracting this into a small helper so the representation of service identities in audit logs stays consistent if it is needed elsewhere or the format changes.
## Individual Comments
### Comment 1
<location> `central/audit/audit.go:158-159` </location>
<code_context>
func (a *audit) UnaryServerInterceptor() func(ctx context.Context, req interface{},
info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
- handler grpc.UnaryHandler) (interface{}, error) {
+ handler grpc.UnaryHandler,) (interface{}, error) {
resp, err := handler(ctx, req)
go a.SendAuditMessage(ctx, req, info.FullMethod, interceptor.GetAuthErrorFromContext(ctx), err)
</code_context>
<issue_to_address>
**issue (bug_risk):** Fix the extra trailing comma in the UnaryServerInterceptor handler parameter list.
The `handler grpc.UnaryHandler,)` parameter list has a trailing comma before the closing parenthesis, which is invalid Go syntax and will prevent compilation. Remove the comma (or format the parameters as a multiline list where a trailing comma is allowed).
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
aa7d6cc to
834beb6
Compare
|
@stehessel: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Description
Create audit log events when internal tokens are generated by Sensor. This is to let security conscious users monitor the creation and proliferation of tokens.
I also increased the internal token lifetime to 1 hour. This is both to reduce the audit log events and the load on Central. Since the internal tokens have minimal permissions and are not exposed to users, keeping them alive for a bit longer should be acceptable.
User-facing documentation
Testing and quality
Automated testing
How I validated my change
Example audit log event: