Commit 32c99a6

feat(api): further updates for evals API

1 parent: e9a89ab

15 files changed: +107 −73 lines changed

.stats.yml (+2 −2)

@@ -1,4 +1,4 @@
 configured_endpoints: 101
-openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/openai%2Fopenai-5fa16b9a02985ae06e41be14946a9c325dc672fb014b3c19abca65880c6990e6.yml
-openapi_spec_hash: da3e669f65130043b1170048c0727890
+openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/openai%2Fopenai-262e171d0a8150ea1192474d16ba3afdf9a054b399f1a49a9c9b697a3073c136.yml
+openapi_spec_hash: 33e00a48df8f94c94f46290c489f132b
 config_hash: d8d5fda350f6db77c784f35429741a2e

src/openai/resources/evals/evals.py (+16 −6)

@@ -74,15 +74,20 @@ def create(
    ) -> EvalCreateResponse:
        """
        Create the structure of an evaluation that can be used to test a model's
-        performance. An evaluation is a set of testing criteria and a datasource. After
+        performance. An evaluation is a set of testing criteria and the config for a
+        data source, which dictates the schema of the data used in the evaluation. After
        creating an evaluation, you can run it on different models and model parameters.
        We support several types of graders and datasources. For more information, see
        the [Evals guide](https://platform.openai.com/docs/guides/evals).

        Args:
-          data_source_config: The configuration for the data source used for the evaluation runs.
+          data_source_config: The configuration for the data source used for the evaluation runs. Dictates the
+              schema of the data used in the evaluation.

-          testing_criteria: A list of graders for all eval runs in this group.
+          testing_criteria: A list of graders for all eval runs in this group. Graders can reference
+              variables in the data source using double curly braces notation, like
+              `{{item.variable_name}}`. To reference the model's output, use the `sample`
+              namespace (ie, `{{sample.output_text}}`).

          metadata: Set of 16 key-value pairs that can be attached to an object. This can be useful
              for storing additional information about the object in a structured format, and
@@ -333,15 +338,20 @@ async def create(
    ) -> EvalCreateResponse:
        """
        Create the structure of an evaluation that can be used to test a model's
-        performance. An evaluation is a set of testing criteria and a datasource. After
+        performance. An evaluation is a set of testing criteria and the config for a
+        data source, which dictates the schema of the data used in the evaluation. After
        creating an evaluation, you can run it on different models and model parameters.
        We support several types of graders and datasources. For more information, see
        the [Evals guide](https://platform.openai.com/docs/guides/evals).

        Args:
-          data_source_config: The configuration for the data source used for the evaluation runs.
+          data_source_config: The configuration for the data source used for the evaluation runs. Dictates the
+              schema of the data used in the evaluation.

-          testing_criteria: A list of graders for all eval runs in this group.
+          testing_criteria: A list of graders for all eval runs in this group. Graders can reference
+              variables in the data source using double curly braces notation, like
+              `{{item.variable_name}}`. To reference the model's output, use the `sample`
+              namespace (ie, `{{sample.output_text}}`).

          metadata: Set of 16 key-value pairs that can be attached to an object. This can be useful
              for storing additional information about the object in a structured format, and
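The updated docstring describes how graders reference data-source variables with double curly braces. A minimal sketch of the kind of payload `evals.create` accepts, illustrating those references — the eval name, item schema fields, and grader values here are hypothetical, not taken from this commit:

```python
# Sketch of an eval-creation payload. The data_source_config dictates the
# schema of each `item`; graders reference those fields via {{item.*}} and
# the model's output via {{sample.*}}. All concrete values are hypothetical.
eval_params = {
    "name": "sentiment-label-check",
    "data_source_config": {
        "type": "custom",
        "item_schema": {
            "type": "object",
            "properties": {
                "text": {"type": "string"},
                "label": {"type": "string"},
            },
            "required": ["text", "label"],
        },
        "include_sample_schema": True,
    },
    "testing_criteria": [
        {
            # A string_check grader comparing the model's output against a
            # field from the data source.
            "type": "string_check",
            "name": "label-match",
            "input": "{{sample.output_text}}",
            "reference": "{{item.label}}",
            "operation": "eq",
        }
    ],
}
```

The `{{item.label}}` reference is only valid because `label` is declared in the `item_schema` above; the config and the graders are meant to agree on that schema.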

src/openai/resources/evals/runs/runs.py (+8 −6)

@@ -72,9 +72,10 @@ def create(
        extra_body: Body | None = None,
        timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
    ) -> RunCreateResponse:
-        """Create a new evaluation run.
-
-        This is the endpoint that will kick off grading.
+        """
+        Kicks off a new run for a given evaluation, specifying the data source, and what
+        model configuration to use to test. The datasource will be validated against the
+        schema specified in the config of the evaluation.

        Args:
          data_source: Details about the run's data source.
@@ -321,9 +322,10 @@ async def create(
        extra_body: Body | None = None,
        timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
    ) -> RunCreateResponse:
-        """Create a new evaluation run.
-
-        This is the endpoint that will kick off grading.
+        """
+        Kicks off a new run for a given evaluation, specifying the data source, and what
+        model configuration to use to test. The datasource will be validated against the
+        schema specified in the config of the evaluation.

        Args:
          data_source: Details about the run's data source.
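Per the rewritten docstring, a run supplies the data source that the evaluation's configured schema will validate. A sketch of such a payload for `runs.create`, using the JSONL data source defined elsewhere in this commit — the run name and file ID are hypothetical:

```python
# Sketch of a run-creation payload. The eval this run is attached to
# validates the data source against its configured item schema.
# The run name and file ID below are hypothetical.
run_params = {
    "name": "baseline-run",
    "data_source": {
        "type": "jsonl",
        # The source determines what populates the `item` namespace.
        "source": {"type": "file_id", "id": "file-abc123"},
    },
}
```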

src/openai/types/beta/realtime/transcription_session_updated_event.py (+1 −1)

@@ -16,7 +16,7 @@ class TranscriptionSessionUpdatedEvent(BaseModel):
    """A new Realtime transcription session configuration.

    When a session is created on the server via REST API, the session object also
-    contains an ephemeral key. Default TTL for keys is one minute. This property is
+    contains an ephemeral key. Default TTL for keys is 10 minutes. This property is
    not present when a session is updated via the WebSocket API.
    """

src/openai/types/eval_create_params.py (+13 −5)

@@ -33,10 +33,18 @@

 class EvalCreateParams(TypedDict, total=False):
    data_source_config: Required[DataSourceConfig]
-    """The configuration for the data source used for the evaluation runs."""
+    """The configuration for the data source used for the evaluation runs.
+
+    Dictates the schema of the data used in the evaluation.
+    """

    testing_criteria: Required[Iterable[TestingCriterion]]
-    """A list of graders for all eval runs in this group."""
+    """A list of graders for all eval runs in this group.
+
+    Graders can reference variables in the data source using double curly braces
+    notation, like `{{item.variable_name}}`. To reference the model's output, use
+    the `sample` namespace (ie, `{{sample.output_text}}`).
+    """

    metadata: Optional[Metadata]
    """Set of 16 key-value pairs that can be attached to an object.
@@ -75,8 +83,8 @@ class DataSourceConfigLogs(TypedDict, total=False):

 class DataSourceConfigStoredCompletions(TypedDict, total=False):
-    type: Required[Literal["stored-completions"]]
-    """The type of data source. Always `stored-completions`."""
+    type: Required[Literal["stored_completions"]]
+    """The type of data source. Always `stored_completions`."""

    metadata: Dict[str, object]
    """Metadata filters for the stored completions data source."""
@@ -129,7 +137,7 @@ class TestingCriterionLabelModel(TypedDict, total=False):
    input: Required[Iterable[TestingCriterionLabelModelInput]]
    """A list of chat messages forming the prompt or context.

-    May include variable references to the "item" namespace, ie {{item.name}}.
+    May include variable references to the `item` namespace, ie {{item.name}}.
    """

    labels: Required[List[str]]

src/openai/types/eval_stored_completions_data_source_config.py (+2 −2)

@@ -18,8 +18,8 @@ class EvalStoredCompletionsDataSourceConfig(BaseModel):
    [here](https://json-schema.org/).
    """

-    type: Literal["stored-completions"]
-    """The type of data source. Always `stored-completions`."""
+    type: Literal["stored_completions"]
+    """The type of data source. Always `stored_completions`."""

    metadata: Optional[Metadata] = None
    """Set of 16 key-value pairs that can be attached to an object.
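This commit renames the stored-completions discriminator from a hyphenated to an underscored spelling. A small sketch of a config using the corrected literal — the metadata filter value is hypothetical:

```python
# The `type` literal changed from "stored-completions" to "stored_completions"
# in this commit; payloads using the old hyphenated spelling no longer match
# the Literal type. The metadata filter below is hypothetical.
data_source_config = {
    "type": "stored_completions",
    "metadata": {"usecase": "chatbot"},
}
```

Callers pinned to the old `"stored-completions"` string would fail type checks against the updated `Literal`, so this is effectively a breaking spelling change for that field.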

src/openai/types/evals/create_eval_completions_run_data_source.py (+9 −3)

@@ -117,7 +117,7 @@ class InputMessagesTemplate(BaseModel):
    template: List[InputMessagesTemplateTemplate]
    """A list of chat messages forming the prompt or context.

-    May include variable references to the "item" namespace, ie {{item.name}}.
+    May include variable references to the `item` namespace, ie {{item.name}}.
    """

    type: Literal["template"]
@@ -126,7 +126,7 @@ class InputMessagesTemplate(BaseModel):

 class InputMessagesItemReference(BaseModel):
    item_reference: str
-    """A reference to a variable in the "item" namespace. Ie, "item.name" """
+    """A reference to a variable in the `item` namespace. Ie, "item.input_trajectory" """

    type: Literal["item_reference"]
    """The type of input messages. Always `item_reference`."""
@@ -153,12 +153,18 @@ class SamplingParams(BaseModel):

 class CreateEvalCompletionsRunDataSource(BaseModel):
    source: Source
-    """A StoredCompletionsRunDataSource configuration describing a set of filters"""
+    """Determines what populates the `item` namespace in this run's data source."""

    type: Literal["completions"]
    """The type of run data source. Always `completions`."""

    input_messages: Optional[InputMessages] = None
+    """Used when sampling from a model.
+
+    Dictates the structure of the messages passed into the model. Can either be a
+    reference to a prebuilt trajectory (ie, `item.input_trajectory`), or a template
+    with variable references to the `item` namespace.
+    """

    model: Optional[str] = None
    """The name of the model to use for generating completions (e.g. "o3-mini")."""
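The new `input_messages` docstring names two alternative shapes: a message template or an item reference. A sketch of both, matching the `InputMessagesTemplate` and `InputMessagesItemReference` types above — the message fields and template text are illustrative assumptions, not copied from this commit:

```python
# 1) A template whose messages reference the `item` namespace.
#    The role/content shapes and the template text are hypothetical sketches.
template_input = {
    "type": "template",
    "template": [
        {"role": "developer", "content": "Classify the sentiment."},
        {"role": "user", "content": "{{item.text}}"},
    ],
}

# 2) A reference to a prebuilt trajectory stored on each item, per the
#    updated InputMessagesItemReference docstring.
item_reference_input = {
    "type": "item_reference",
    "item_reference": "item.input_trajectory",
}
```

Either shape can be supplied as `input_messages`; the `type` discriminator tells the API which variant it is.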

src/openai/types/evals/create_eval_completions_run_data_source_param.py (+9 −3)

@@ -113,7 +113,7 @@ class InputMessagesTemplate(TypedDict, total=False):
    template: Required[Iterable[InputMessagesTemplateTemplate]]
    """A list of chat messages forming the prompt or context.

-    May include variable references to the "item" namespace, ie {{item.name}}.
+    May include variable references to the `item` namespace, ie {{item.name}}.
    """

    type: Required[Literal["template"]]
@@ -122,7 +122,7 @@ class InputMessagesTemplate(TypedDict, total=False):

 class InputMessagesItemReference(TypedDict, total=False):
    item_reference: Required[str]
-    """A reference to a variable in the "item" namespace. Ie, "item.name" """
+    """A reference to a variable in the `item` namespace. Ie, "item.input_trajectory" """

    type: Required[Literal["item_reference"]]
    """The type of input messages. Always `item_reference`."""
@@ -147,12 +147,18 @@ class SamplingParams(TypedDict, total=False):

 class CreateEvalCompletionsRunDataSourceParam(TypedDict, total=False):
    source: Required[Source]
-    """A StoredCompletionsRunDataSource configuration describing a set of filters"""
+    """Determines what populates the `item` namespace in this run's data source."""

    type: Required[Literal["completions"]]
    """The type of run data source. Always `completions`."""

    input_messages: InputMessages
+    """Used when sampling from a model.
+
+    Dictates the structure of the messages passed into the model. Can either be a
+    reference to a prebuilt trajectory (ie, `item.input_trajectory`), or a template
+    with variable references to the `item` namespace.
+    """

    model: str
    """The name of the model to use for generating completions (e.g. "o3-mini")."""

src/openai/types/evals/create_eval_jsonl_run_data_source.py (+1 −0)

@@ -36,6 +36,7 @@ class SourceFileID(BaseModel):

 class CreateEvalJSONLRunDataSource(BaseModel):
    source: Source
+    """Determines what populates the `item` namespace in the data source."""

    type: Literal["jsonl"]
    """The type of data source. Always `jsonl`."""

src/openai/types/evals/create_eval_jsonl_run_data_source_param.py (+1 −0)

@@ -41,6 +41,7 @@ class SourceFileID(TypedDict, total=False):

 class CreateEvalJSONLRunDataSourceParam(TypedDict, total=False):
    source: Required[Source]
+    """Determines what populates the `item` namespace in the data source."""

    type: Required[Literal["jsonl"]]
    """The type of data source. Always `jsonl`."""

src/openai/types/evals/run_cancel_response.py (+9 −9)

@@ -76,12 +76,6 @@ class DataSourceResponsesSourceResponses(BaseModel):
    This is a query parameter used to select responses.
    """

-    has_tool_calls: Optional[bool] = None
-    """Whether the response has tool calls.
-
-    This is a query parameter used to select responses.
-    """
-
    instructions_search: Optional[str] = None
    """Optional string to search the 'instructions' field.

@@ -170,7 +164,7 @@ class DataSourceResponsesInputMessagesTemplate(BaseModel):
    template: List[DataSourceResponsesInputMessagesTemplateTemplate]
    """A list of chat messages forming the prompt or context.

-    May include variable references to the "item" namespace, ie {{item.name}}.
+    May include variable references to the `item` namespace, ie {{item.name}}.
    """

    type: Literal["template"]
@@ -179,7 +173,7 @@ class DataSourceResponsesInputMessagesTemplate(BaseModel):

 class DataSourceResponsesInputMessagesItemReference(BaseModel):
    item_reference: str
-    """A reference to a variable in the "item" namespace. Ie, "item.name" """
+    """A reference to a variable in the `item` namespace. Ie, "item.name" """

    type: Literal["item_reference"]
    """The type of input messages. Always `item_reference`."""
@@ -207,12 +201,18 @@ class DataSourceResponsesSamplingParams(BaseModel):

 class DataSourceResponses(BaseModel):
    source: DataSourceResponsesSource
-    """A EvalResponsesSource object describing a run data source configuration."""
+    """Determines what populates the `item` namespace in this run's data source."""

    type: Literal["responses"]
    """The type of run data source. Always `responses`."""

    input_messages: Optional[DataSourceResponsesInputMessages] = None
+    """Used when sampling from a model.
+
+    Dictates the structure of the messages passed into the model. Can either be a
+    reference to a prebuilt trajectory (ie, `item.input_trajectory`), or a template
+    with variable references to the `item` namespace.
+    """

    model: Optional[str] = None
    """The name of the model to use for generating completions (e.g. "o3-mini")."""

src/openai/types/evals/run_create_params.py (+9 −9)

@@ -88,12 +88,6 @@ class DataSourceCreateEvalResponsesRunDataSourceSourceResponses(TypedDict, total
    This is a query parameter used to select responses.
    """

-    has_tool_calls: Optional[bool]
-    """Whether the response has tool calls.
-
-    This is a query parameter used to select responses.
-    """
-
    instructions_search: Optional[str]
    """Optional string to search the 'instructions' field.

@@ -187,7 +181,7 @@ class DataSourceCreateEvalResponsesRunDataSourceInputMessagesTemplate(TypedDict,
    template: Required[Iterable[DataSourceCreateEvalResponsesRunDataSourceInputMessagesTemplateTemplate]]
    """A list of chat messages forming the prompt or context.

-    May include variable references to the "item" namespace, ie {{item.name}}.
+    May include variable references to the `item` namespace, ie {{item.name}}.
    """

    type: Required[Literal["template"]]
@@ -196,7 +190,7 @@ class DataSourceCreateEvalResponsesRunDataSourceInputMessagesTemplate(TypedDict,

 class DataSourceCreateEvalResponsesRunDataSourceInputMessagesItemReference(TypedDict, total=False):
    item_reference: Required[str]
-    """A reference to a variable in the "item" namespace. Ie, "item.name" """
+    """A reference to a variable in the `item` namespace. Ie, "item.name" """

    type: Required[Literal["item_reference"]]
    """The type of input messages. Always `item_reference`."""
@@ -224,12 +218,18 @@ class DataSourceCreateEvalResponsesRunDataSourceSamplingParams(TypedDict, total=

 class DataSourceCreateEvalResponsesRunDataSource(TypedDict, total=False):
    source: Required[DataSourceCreateEvalResponsesRunDataSourceSource]
-    """A EvalResponsesSource object describing a run data source configuration."""
+    """Determines what populates the `item` namespace in this run's data source."""

    type: Required[Literal["responses"]]
    """The type of run data source. Always `responses`."""

    input_messages: DataSourceCreateEvalResponsesRunDataSourceInputMessages
+    """Used when sampling from a model.
+
+    Dictates the structure of the messages passed into the model. Can either be a
+    reference to a prebuilt trajectory (ie, `item.input_trajectory`), or a template
+    with variable references to the `item` namespace.
+    """

    model: str
    """The name of the model to use for generating completions (e.g. "o3-mini")."""

src/openai/types/evals/run_create_response.py (+9 −9)

@@ -76,12 +76,6 @@ class DataSourceResponsesSourceResponses(BaseModel):
    This is a query parameter used to select responses.
    """

-    has_tool_calls: Optional[bool] = None
-    """Whether the response has tool calls.
-
-    This is a query parameter used to select responses.
-    """
-
    instructions_search: Optional[str] = None
    """Optional string to search the 'instructions' field.

@@ -170,7 +164,7 @@ class DataSourceResponsesInputMessagesTemplate(BaseModel):
    template: List[DataSourceResponsesInputMessagesTemplateTemplate]
    """A list of chat messages forming the prompt or context.

-    May include variable references to the "item" namespace, ie {{item.name}}.
+    May include variable references to the `item` namespace, ie {{item.name}}.
    """

    type: Literal["template"]
@@ -179,7 +173,7 @@ class DataSourceResponsesInputMessagesTemplate(BaseModel):

 class DataSourceResponsesInputMessagesItemReference(BaseModel):
    item_reference: str
-    """A reference to a variable in the "item" namespace. Ie, "item.name" """
+    """A reference to a variable in the `item` namespace. Ie, "item.name" """

    type: Literal["item_reference"]
    """The type of input messages. Always `item_reference`."""
@@ -207,12 +201,18 @@ class DataSourceResponsesSamplingParams(BaseModel):

 class DataSourceResponses(BaseModel):
    source: DataSourceResponsesSource
-    """A EvalResponsesSource object describing a run data source configuration."""
+    """Determines what populates the `item` namespace in this run's data source."""

    type: Literal["responses"]
    """The type of run data source. Always `responses`."""

    input_messages: Optional[DataSourceResponsesInputMessages] = None
+    """Used when sampling from a model.
+
+    Dictates the structure of the messages passed into the model. Can either be a
+    reference to a prebuilt trajectory (ie, `item.input_trajectory`), or a template
+    with variable references to the `item` namespace.
+    """

    model: Optional[str] = None
    """The name of the model to use for generating completions (e.g. "o3-mini")."""
