Python SDK (v3)
If you are self-hosting Langfuse, the Python SDK v3 requires Langfuse platform version >= 3.63.0 for traces to be correctly processed.
Our OpenTelemetry-based Python SDK (v3) is the latest generation of the SDK, designed for an improved developer experience and greater ease of use. Built on the robust OpenTelemetry Python SDK, it offers a more intuitive API for comprehensive tracing of your LLM application.
The v3 SDK introduces several key benefits:
- Improved Developer Experience: A more intuitive API means less code to write for tracing your application, simplifying the integration process.
- Unified Context Sharing: Seamlessly hook into the tracing context of the current span to update it or create child spans. This is particularly beneficial for integrating with other instrumented libraries.
- Broad Third-Party Integrations: Any library instrumented with OpenTelemetry will work out-of-the-box with the Langfuse SDK. Spans from these libraries are automatically captured and correctly nested within your Langfuse traces.
There are three main ways of instrumenting your application with the new Langfuse SDK. All of them are fully interoperable with each other.
The @observe decorator is the simplest way to instrument your application. It is a function decorator that can be applied to any function.
It sets the current span in the context for automatic nesting of child spans and automatically ends it when the function returns. It also automatically captures the function name, arguments, and return value.
from langfuse import observe, get_client
@observe
def my_function():
return "Hello, world!" # Input/output and timings are automatically captured
my_function()
# Flush events in short-lived applications
langfuse = get_client()
langfuse.flush()
Setup
Installation
To install the v3 SDK, run:
pip install langfuse
Initialize Client
Begin by initializing the Langfuse client. You must provide your Langfuse public and secret keys. These can be passed as constructor arguments or set as environment variables (recommended).
If you are self-hosting Langfuse or using a data region other than the default (EU, https://cloud.langfuse.com), make sure to configure the host argument or the LANGFUSE_HOST environment variable (recommended).
LANGFUSE_PUBLIC_KEY="pk-lf-..."
LANGFUSE_SECRET_KEY="sk-lf-..."
LANGFUSE_HOST="https://cloud.langfuse.com" # US region: https://us.cloud.langfuse.com
Verify connection with langfuse.auth_check()
You can also verify your connection to the Langfuse server using langfuse.auth_check(). We do not recommend using this in production, as it adds latency to your application.
from langfuse import get_client
langfuse = get_client()
# Verify connection, do not use in production as this is a synchronous call
if langfuse.auth_check():
print("Langfuse client is authenticated and ready!")
else:
print("Authentication failed. Please check your credentials and host.")
Key configuration options:
Constructor Argument | Environment Variable | Description | Default value |
---|---|---|---|
public_key | LANGFUSE_PUBLIC_KEY | Your Langfuse project’s public API key. Required. | |
secret_key | LANGFUSE_SECRET_KEY | Your Langfuse project’s secret API key. Required. | |
host | LANGFUSE_HOST | The API host for your Langfuse instance. | "https://cloud.langfuse.com" |
timeout | - | Timeout in seconds for API requests. | 30 |
httpx_client | - | Custom httpx.Client for making non-tracing HTTP requests. | |
debug | LANGFUSE_DEBUG | Enables debug mode for more verbose logging. Set to True or "True" . | False |
tracing_enabled | LANGFUSE_TRACING_ENABLED | Enables or disables the Langfuse client. If False , all observability calls become no-ops. | True |
flush_at | LANGFUSE_FLUSH_AT | Number of spans to batch before sending to the API. | 512 |
flush_interval | LANGFUSE_FLUSH_INTERVAL | Time in seconds between batch flushes. | 5 |
environment | LANGFUSE_TRACING_ENVIRONMENT | Environment name for tracing (e.g., “development”, “staging”, “production”). Must be lowercase alphanumeric with hyphens/underscores. | "default" |
release | LANGFUSE_RELEASE | Release version/hash of your application. Used for grouping analytics. | |
media_upload_thread_count | LANGFUSE_MEDIA_UPLOAD_THREAD_COUNT | Number of background threads for handling media uploads. | 1 |
sample_rate | LANGFUSE_SAMPLE_RATE | Sampling rate for traces (float between 0.0 and 1.0). 1.0 means 100% of traces are sampled. | 1.0 |
mask | - | A function (data: Any) -> Any to mask sensitive data in traces before sending to the API. | |
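For reference, several of these options can be combined at initialization. A minimal sketch; all values below are illustrative placeholders:
from langfuse import Langfuse

langfuse = Langfuse(
    public_key="pk-lf-...",             # or LANGFUSE_PUBLIC_KEY
    secret_key="sk-lf-...",             # or LANGFUSE_SECRET_KEY
    host="https://cloud.langfuse.com",  # or LANGFUSE_HOST
    environment="production",           # lowercase alphanumeric with hyphens/underscores
    release="v1.2.3",                   # groups analytics by release
    sample_rate=1.0,                    # 1.0 = send 100% of traces
    flush_at=512,                       # batch size before sending
    flush_interval=5,                   # seconds between batch flushes
    timeout=30,                         # API request timeout in seconds
)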
Accessing the Client Globally
The Langfuse client is a singleton that can be accessed anywhere in your application using the get_client function.
Optionally, you can initialize the client via Langfuse() in order to pass in configuration options (see above). Otherwise, it is created automatically when you call get_client() based on environment variables.
from langfuse import get_client
# Optionally, initialize the client with configuration options
# langfuse = Langfuse(public_key="pk-lf-...", secret_key="sk-lf-...")
# Get the default client
client = get_client()
Basic Tracing
Langfuse provides flexible ways to create and manage traces and their constituent observations (spans and generations).
@observe Decorator
The @observe() decorator provides a convenient way to automatically trace function executions, including capturing their inputs, outputs, execution time, and any errors. It supports both synchronous and asynchronous functions.
from langfuse import observe
@observe()
def my_data_processing_function(data, parameter):
# ... processing logic ...
return {"processed_data": data, "status": "ok"}
@observe(name="llm-call", as_type="generation")
async def my_async_llm_call(prompt_text):
# ... async LLM call ...
return "LLM response"
Parameters:
- name: Optional[str]: Custom name for the created span/generation. Defaults to the function name.
- as_type: Optional[Literal["generation"]]: If set to "generation", a Langfuse generation object is created, suitable for LLM calls. Otherwise, a regular span is created.
- capture_input: bool: Whether to capture function arguments as input. Defaults to the env var LANGFUSE_OBSERVE_DECORATOR_IO_CAPTURE_ENABLED, or True if not set.
- capture_output: bool: Whether to capture the function return value as output. Defaults to the env var LANGFUSE_OBSERVE_DECORATOR_IO_CAPTURE_ENABLED, or True if not set.
- transform_to_string: Optional[Callable[[Iterable], str]]: For functions that return generators (sync or async), this callable can be provided to transform the collected chunks into a single string for the output field. If not provided and all chunks are strings, they are concatenated; otherwise, the list of chunks is stored. See the sketch below.
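As an illustration of transform_to_string, here is a sketch of tracing a generator function whose streamed chunks are joined into a single output string. The generator below is a stand-in for a real streaming LLM call:
from langfuse import observe

@observe(as_type="generation", transform_to_string=lambda chunks: "".join(chunks))
def stream_answer(prompt_text):
    # Stand-in for a streaming LLM response
    for chunk in ["The sky ", "is blue ", "due to Rayleigh scattering."]:
        yield chunk

for chunk in stream_answer("Why is the sky blue?"):
    print(chunk, end="")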
Trace Context and Special Keyword Arguments:
The @observe decorator automatically propagates the OTEL trace context. If a decorated function is called from within an active Langfuse span (or another OTEL span), the new observation is nested correctly.
You can also pass special keyword arguments to a decorated function to control its tracing behavior:
- langfuse_trace_id: str: Explicitly set the trace ID for this function call. Must be a valid W3C Trace Context trace ID (32-char hex). If you have a trace ID from an external system, you can use Langfuse.create_trace_id(seed=external_trace_id) to generate a valid deterministic ID.
- langfuse_parent_observation_id: str: Explicitly set the parent observation ID. Must be a valid W3C Trace Context span ID (16-char hex).
@observe()
def my_function(a, b):
return a + b
# Call with a specific trace context
my_function(1, 2, langfuse_trace_id="1234567890abcdef1234567890abcdef")
The observe decorator captures the args, kwargs, and return value of decorated functions by default. This may cause performance issues in your application if you pass large or deeply nested objects. To avoid this, disable function IO capture for a specific function by passing capture_input / capture_output with value False, or globally by setting the environment variable LANGFUSE_OBSERVE_DECORATOR_IO_CAPTURE_ENABLED=False.
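For example, to keep timing and error capture for a function while skipping its large payloads, a sketch:
from langfuse import observe

@observe(capture_input=False, capture_output=False)
def process_large_document(document: dict):
    # The span, its timings, and errors are still recorded,
    # but the potentially large document is not serialized as input/output.
    return {"status": "ok"}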
Context Managers
You can create spans or generations anywhere in your application. If you need more control than the @observe decorator provides, the primary way to do this is via context managers (with statements), which ensure that observations are properly started and ended.
- langfuse.start_as_current_span(): Creates a new span and sets it as the currently active observation in the OTEL context for its duration. Any new observations created within this block will be its children.
- langfuse.start_as_current_generation(): Similar to the above, but creates a specialized "generation" observation for LLM calls.
from langfuse import get_client
langfuse = get_client()
with langfuse.start_as_current_span(
name="user-request-pipeline",
input={"user_query": "Tell me a joke about OpenTelemetry"},
) as root_span:
# This span is now active in the context.
# Add trace attributes
root_span.update_trace(
user_id="user_123",
session_id="session_abc",
tags=["experimental", "comedy"]
)
# Create a nested generation
with langfuse.start_as_current_generation(
name="joke-generation",
model="gpt-4o",
input=[{"role": "user", "content": "Tell me a joke about OpenTelemetry"}],
model_parameters={"temperature": 0.7}
) as generation:
# Simulate an LLM call
joke_response = "Why did the OpenTelemetry collector break up with the span? Because it needed more space... for its attributes!"
token_usage = {"input_tokens": 10, "output_tokens": 25}
generation.update(
output=joke_response,
usage_details=token_usage
)
# Generation ends automatically here
root_span.update(output={"final_joke": joke_response})
# Root span ends automatically here
Manual Observations
For scenarios where you need to create an observation (a span or generation) without altering the currently active OpenTelemetry context, you can use langfuse.start_span()
or langfuse.start_generation()
.
from langfuse import get_client
langfuse = get_client()
span = langfuse.start_span(name="my-span")
span.end() # Important: Manually end the span
If you use langfuse.start_span() or langfuse.start_generation(), you are responsible for calling .end() on the returned observation object. Failure to do so will result in incomplete or missing observations in Langfuse. Their start_as_current_... counterparts used with a with statement handle this automatically.
Key Characteristics:
- No Context Shift: Unlike their start_as_current_... counterparts, these methods do not set the new observation as the active one in the OpenTelemetry context. The previously active span (if any) remains the current context for subsequent operations in the main execution flow.
- Parenting: The observation created by start_span() or start_generation() will still be a child of the span that was active in the context at the moment of its creation.
- Manual Lifecycle: These observations are not managed by a with block and therefore must be explicitly ended by calling their .end() method.
- Nesting Children: Subsequent observations created using the global langfuse.start_as_current_span() (or similar global methods) will not be children of these "manual" observations; instead, they will be parented by the original active span. To create children directly under a "manual" observation, use methods on that specific observation object (e.g., manual_span.start_as_current_span(...)), as shown in the sketch below.
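A brief sketch of creating a child directly under a "manual" observation, as referenced above:
from langfuse import get_client

langfuse = get_client()

manual_span = langfuse.start_span(name="background-job")

# Children of the manual span are created via methods on that object,
# not via the global langfuse.start_as_current_span().
with manual_span.start_as_current_span(name="job-step") as step:
    step.update(output="step done")

manual_span.end()  # manual observations must be ended explicitly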
When to Use:
This approach is useful when you need to:
- Record work that is self-contained or happens in parallel to the main execution flow but should still be part of the same overall trace (e.g., a background task initiated by a request).
- Manage the observation’s lifecycle explicitly, perhaps because its start and end are determined by non-contiguous events.
- Obtain an observation object reference before it’s tied to a specific context block.
Example with more complex nesting:
from langfuse import get_client
langfuse = get_client()
# This outer span establishes an active context.
with langfuse.start_as_current_span(name="main-operation") as main_operation_span:
# 'main_operation_span' is the current active context.
# 1. Create a "manual" span using langfuse.start_span().
# - It becomes a child of 'main_operation_span'.
# - Crucially, 'main_operation_span' REMAINS the active context.
# - 'manual_side_task' does NOT become the active context.
manual_side_task = langfuse.start_span(name="manual-side-task")
manual_side_task.update(input="Data for side task")
# 2. Start another operation that DOES become the active context.
# This will be a child of 'main_operation_span', NOT 'manual_side_task',
# because 'manual_side_task' did not alter the active context.
with langfuse.start_as_current_span(name="core-step-within-main") as core_step_span:
# 'core_step_span' is now the active context.
# 'manual_side_task' is still open but not active in the global context.
core_step_span.update(input="Data for core step")
# ... perform core step logic ...
core_step_span.update(output="Core step finished")
# 'core_step_span' ends. 'main_operation_span' is the active context again.
# 3. Complete and end the manual side task.
# This could happen at any point after its creation, even after 'core_step_span'.
manual_side_task.update(output="Side task completed")
manual_side_task.end() # Manual end is crucial for 'manual_side_task'
main_operation_span.update(output="Main operation finished")
# 'main_operation_span' ends automatically here.
# Expected trace structure in Langfuse:
# - main-operation
# |- manual-side-task
# |- core-step-within-main
# (Note: 'core-step-within-main' is a sibling to 'manual-side-task', both children of 'main-operation')
Nesting Observations
The function call hierarchy is automatically captured by the @observe decorator and reflected in the trace.
from langfuse import observe
@observe
def my_data_processing_function(data, parameter):
# ... processing logic ...
return {"processed_data": data, "status": "ok"}
@observe
def main_function(data, parameter):
return my_data_processing_function(data, parameter)
Updating Observations
You can update observations with new information as your code executes.
- For spans/generations created via context managers or assigned to variables: use the .update() method on the object.
- To update the currently active observation in the context (without needing a direct reference to it): use langfuse.update_current_span() or langfuse.update_current_generation().
LangfuseSpan.update() / LangfuseGeneration.update() parameters:
Parameter | Type | Description | Applies To |
---|---|---|---|
input | Optional[Any] | Input data for the operation. | Both |
output | Optional[Any] | Output data from the operation. | Both |
metadata | Optional[Any] | Additional metadata (JSON-serializable). | Both |
version | Optional[str] | Version identifier for the code/component. | Both |
level | Optional[SpanLevel] | Severity: "DEBUG" , "DEFAULT" , "WARNING" , "ERROR" . | Both |
status_message | Optional[str] | A message describing the status, especially for errors. | Both |
completion_start_time | Optional[datetime] | Timestamp when the LLM started generating the completion (streaming). | Generation |
model | Optional[str] | Name/identifier of the AI model used. | Generation |
model_parameters | Optional[Dict[str, MapValue]] | Parameters used for the model call (e.g., temperature). | Generation |
usage_details | Optional[Dict[str, int]] | Token usage (e.g., {"input_tokens": 10, "output_tokens": 20} ). | Generation |
cost_details | Optional[Dict[str, float]] | Cost information (e.g., {"total_cost": 0.0023} ). | Generation |
prompt | Optional[PromptClient] | Associated PromptClient object from Langfuse prompt management. | Generation |
from langfuse import get_client
langfuse = get_client()
with langfuse.start_as_current_generation(name="llm-call", model="gpt-3.5-turbo") as gen:
gen.update(input={"prompt": "Why is the sky blue?"})
# ... make LLM call ...
response_text = "Rayleigh scattering..."
gen.update(
output=response_text,
usage_details={"input_tokens": 5, "output_tokens": 50},
metadata={"confidence": 0.9}
)
# Alternatively, update the current observation in context:
with langfuse.start_as_current_span(name="data-processing"):
# ... some processing ...
langfuse.update_current_span(metadata={"step1_complete": True})
# ... more processing ...
langfuse.update_current_span(output={"result": "final_data"})
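The level and status_message fields are useful for flagging failures. A sketch of recording an error on the active observation:
from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_span(name="risky-operation") as span:
    try:
        result = 1 / 0  # placeholder for the real operation
        span.update(output=result)
    except ZeroDivisionError as e:
        # Mark this observation as errored; it will show with level ERROR in Langfuse.
        span.update(level="ERROR", status_message=str(e))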
Setting Trace Attributes
Trace-level attributes apply to the entire trace, not just a single observation. You can set or update these using:
- The .update_trace() method on any LangfuseSpan or LangfuseGeneration object within that trace.
- langfuse.update_current_trace() to update the trace associated with the currently active observation.
Trace attribute parameters:
Parameter | Type | Description |
---|---|---|
name | Optional[str] | Name for the trace. |
user_id | Optional[str] | ID of the user associated with this trace. |
session_id | Optional[str] | Session identifier for grouping related traces. |
version | Optional[str] | Version of your application/service for this trace. |
input | Optional[Any] | Overall input for the entire trace. |
output | Optional[Any] | Overall output for the entire trace. |
metadata | Optional[Any] | Additional metadata for the trace. |
tags | Optional[List[str]] | List of tags to categorize the trace. |
public | Optional[bool] | Whether the trace should be publicly accessible (if configured). |
Example: Setting Multiple Trace Attributes
from langfuse import get_client
langfuse = get_client()
with langfuse.start_as_current_span(name="initial-operation") as span:
# Set trace attributes early
span.update_trace(
user_id="user_xyz",
session_id="session_789",
tags=["beta-feature", "llm-chain"]
)
# ...
# Later, from another span in the same trace:
with span.start_as_current_generation(name="final-generation") as gen:
# ...
langfuse.update_current_trace(output={"final_status": "success"}, public=True)
Trace Input/Output Behavior
In v3, trace input and output are automatically set from the root observation (first span/generation) by default. This differs from v2 where integrations could set trace-level inputs/outputs directly.
Default Behavior
from langfuse import get_client
langfuse = get_client()
with langfuse.start_as_current_span(
name="user-request",
input={"query": "What is the capital of France?"} # This becomes the trace input
) as root_span:
with langfuse.start_as_current_generation(
name="llm-call",
model="gpt-4o",
input={"messages": [{"role": "user", "content": "What is the capital of France?"}]}
) as gen:
response = "Paris is the capital of France."
gen.update(output=response)
# LLM generation input/output are separate from trace input/output
root_span.update(output={"answer": "Paris"}) # This becomes the trace output
Override Default Behavior
If you need different trace inputs/outputs than the root observation, explicitly set them:
from langfuse import get_client
langfuse = get_client()
with langfuse.start_as_current_span(name="complex-pipeline") as root_span:
# Root span has its own input/output
root_span.update(input="Step 1 data", output="Step 1 result")
# But trace should have different input/output (e.g., for LLM-as-a-judge)
root_span.update_trace(
input={"original_query": "User's actual question"},
output={"final_answer": "Complete response", "confidence": 0.95}
)
# Now trace input/output are independent of root span input/output
Critical for LLM-as-a-Judge Features
LLM-as-a-judge and evaluation features typically rely on trace-level inputs and outputs. Make sure to set these appropriately:
from langfuse import observe, get_client
langfuse = get_client()
@observe()
def process_user_query(user_question: str):
# LLM processing...
answer = call_llm(user_question)
# Explicitly set trace input/output for evaluation features
langfuse.update_current_trace(
input={"question": user_question},
output={"answer": answer}
)
return answer
Trace and Observation IDs
Langfuse uses W3C Trace Context compliant IDs:
- Trace IDs: 32-character lowercase hexadecimal string (16 bytes).
- Observation IDs (Span IDs): 16-character lowercase hexadecimal string (8 bytes).
You can retrieve these IDs:
- langfuse.get_current_trace_id(): Gets the trace ID of the currently active observation.
- langfuse.get_current_observation_id(): Gets the ID of the currently active observation.
- span_obj.trace_id and span_obj.id: Access IDs directly from a LangfuseSpan or LangfuseGeneration object.
For scenarios where you need to generate IDs outside of an active trace (e.g., to link scores to traces/observations that will be created later, or to correlate with external systems), use:
- Langfuse.create_trace_id(seed: Optional[str] = None) (static method): Generates a new trace ID. If a seed is provided, the ID is deterministic; use the same seed to get the same ID. This is useful for correlating external IDs with Langfuse traces.
from langfuse import get_client, Langfuse
langfuse = get_client()
# Get current IDs
with langfuse.start_as_current_span(name="my-op") as current_op:
trace_id = langfuse.get_current_trace_id()
observation_id = langfuse.get_current_observation_id()
print(f"Current Trace ID: {trace_id}, Current Observation ID: {observation_id}")
print(f"From object: Trace ID: {current_op.trace_id}, Observation ID: {current_op.id}")
# Generate IDs deterministically
external_request_id = "req_12345"
deterministic_trace_id = Langfuse.create_trace_id(seed=external_request_id)
print(f"Deterministic Trace ID for {external_request_id}: {deterministic_trace_id}")
Linking to Existing Traces (Trace Context)
If you have a trace_id (and optionally a parent_span_id) from an external source (e.g., another service, a batch job), you can link new observations to it using the trace_context parameter. Note that OpenTelemetry offers native cross-service context propagation, so this is not necessarily required for calls between services that are instrumented with OTEL.
from langfuse import get_client
langfuse = get_client()
existing_trace_id = "abcdef1234567890abcdef1234567890" # From an upstream service
existing_parent_span_id = "fedcba0987654321" # Optional parent span in that trace
with langfuse.start_as_current_span(
name="process-downstream-task",
trace_context={
"trace_id": existing_trace_id,
"parent_span_id": existing_parent_span_id # If None, this becomes a root span in the existing trace
}
) as span:
# This span is now part of the trace `existing_trace_id`
# and a child of `existing_parent_span_id` if provided.
print(f"This span's trace_id: {span.trace_id}") # Will be existing_trace_id
pass
Client Management
flush()
Manually triggers the sending of all buffered observations (spans, generations, scores, media metadata) to the Langfuse API. This is useful in short-lived scripts or before exiting an application to ensure all data is persisted.
from langfuse import get_client
langfuse = get_client()
# ... create traces and observations ...
langfuse.flush() # Ensures all pending data is sent
The flush() method blocks until the queued data is processed by the respective background threads.
shutdown()
Gracefully shuts down the Langfuse client. This includes:
- Flushing all buffered data (similar to flush()).
- Waiting for background threads (for data ingestion and media uploads) to finish their current tasks and terminate.
It's crucial to call shutdown() before your application exits to prevent data loss and ensure clean resource release. The SDK automatically registers an atexit hook to call shutdown() on normal program termination, but manual invocation is recommended in scenarios like:
- Long-running daemons or services when they receive a shutdown signal.
- Applications where atexit might not reliably trigger (e.g., certain serverless environments or forceful terminations).
from langfuse import get_client
langfuse = get_client()
# ... application logic ...
# Before exiting:
langfuse.shutdown()
Integrations
OpenAI Integration
Langfuse offers a drop-in replacement for the OpenAI Python SDK to automatically trace all your OpenAI API calls. Simply change your import statement:
- import openai
+ from langfuse.openai import openai
# Your existing OpenAI code continues to work as is
# For example:
# client = openai.OpenAI()
# completion = client.chat.completions.create(...)
What’s automatically captured:
- Requests & Responses: All prompts/completions, including support for streaming, async operations, and function/tool calls.
- Timings: Latencies for API calls.
- Errors: API errors are captured with their details.
- Model Usage: Token counts (input, output, total).
- Cost: Estimated cost in USD (based on model and token usage).
- Media: Input audio and output audio from speech-to-text and text-to-speech endpoints.
The integration is fully interoperable with @observe and manual tracing methods (start_as_current_span, etc.). If an OpenAI call is made within an active Langfuse span, the OpenAI generation will be correctly nested under it.
Passing Langfuse arguments to OpenAI calls:
You can pass Langfuse-specific arguments directly to OpenAI client methods. These will be used to enrich the trace data.
from langfuse import get_client
from langfuse.openai import openai
langfuse = get_client()
client = openai.OpenAI()
with langfuse.start_as_current_span(name="qna-bot-openai") as span:
langfuse.update_current_trace(tags=["qna-bot-openai"])
# This will be traced as a Langfuse generation
response = client.chat.completions.create(
name="qna-bot-openai", # Custom name for this generation in Langfuse
metadata={"user_tier": "premium", "request_source": "web_api"}, # will be added to the Langfuse generation
model="gpt-4o",
messages=[{"role": "user", "content": "What is OpenTelemetry?"}],
)
Supported Langfuse arguments: name, metadata, langfuse_prompt
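The langfuse_prompt argument links the resulting generation to a prompt from Langfuse prompt management. A sketch, assuming a text prompt named "qa-prompt" exists in your Langfuse project:
from langfuse import get_client
from langfuse.openai import openai

langfuse = get_client()
client = openai.OpenAI()

prompt = langfuse.get_prompt("qa-prompt")  # assumes this prompt exists in Langfuse

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt.compile(question="What is OpenTelemetry?")}],
    langfuse_prompt=prompt,  # links this generation to the managed prompt
)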
Langchain Integration
Langfuse provides a callback handler for Langchain to trace its operations.
Setup:
Initialize the CallbackHandler and add it to your Langchain calls, either globally or per-call.
from langfuse import get_client
from langfuse.langchain import CallbackHandler
from langchain_openai import ChatOpenAI # Example LLM
from langchain_core.prompts import ChatPromptTemplate
langfuse = get_client()
# Initialize the Langfuse handler
langfuse_handler = CallbackHandler()
# Example: Using it with an LLM call
llm = ChatOpenAI(model_name="gpt-4o")
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
chain = prompt | llm
with langfuse.start_as_current_span(name="joke-chain") as span:
langfuse.update_current_trace(tags=["joke-chain"])
response = chain.invoke({"topic": "cats"}, config={"callbacks": [langfuse_handler]})
print(response)
What’s captured:
The callback handler maps various Langchain events to Langfuse observations:
- Chains (on_chain_start, on_chain_end, on_chain_error): Traced as spans.
- LLMs (on_llm_start, on_llm_end, on_llm_error, on_chat_model_start): Traced as generations, capturing model name, prompts, responses, and usage if available from the LLM provider.
- Tools (on_tool_start, on_tool_end, on_tool_error): Traced as spans, capturing tool input and output.
- Retrievers (on_retriever_start, on_retriever_end, on_retriever_error): Traced as spans, capturing the query and retrieved documents.
- Agents (on_agent_action, on_agent_finish): Agent actions and final finishes are captured within their parent chain/agent span.
Langfuse attempts to parse model names, usage, and other relevant details from the information provided by Langchain. The metadata argument in Langchain calls can be used to pass additional information to Langfuse, including langfuse_prompt to link with managed prompts.
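A sketch of passing such metadata through the Langchain config, reusing the chain and handler from the setup example above; the langfuse_prompt key assumes a prompt named "joke-prompt" exists in Langfuse prompt management:
from langfuse import get_client

langfuse = get_client()
langchain_prompt = langfuse.get_prompt("joke-prompt")  # assumed to exist in Langfuse

response = chain.invoke(
    {"topic": "cats"},
    config={
        "callbacks": [langfuse_handler],
        "metadata": {
            "langfuse_prompt": langchain_prompt,  # link the chain's generation to the managed prompt
            "request_source": "web_api",          # arbitrary extra metadata
        },
    },
)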
Third-party integrations
The Langfuse SDK seamlessly integrates with any third-party library that uses OpenTelemetry instrumentation. When these libraries emit spans, they are automatically captured and properly nested within your trace hierarchy. This enables unified tracing across your entire application stack without requiring any additional configuration.
For example, if you’re using OpenTelemetry-instrumented databases, HTTP clients, or other services alongside your LLM operations, all these spans will be correctly organized within your traces in Langfuse.
You can use any third-party, OTEL-based instrumentation library for Anthropic to automatically trace all your Anthropic API calls in Langfuse.
In this example, we are using the opentelemetry-instrumentation-anthropic
library.
from anthropic import Anthropic
from opentelemetry.instrumentation.anthropic import AnthropicInstrumentor
from langfuse import get_client
# This will automatically emit OTEL-spans for all Anthropic API calls
AnthropicInstrumentor().instrument()
langfuse = get_client()
anthropic_client = Anthropic()
with langfuse.start_as_current_span(name="myspan"):
# This will be traced as a Langfuse generation nested under the current span
message = anthropic_client.messages.create(
model="claude-3-7-sonnet-20250219",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello, Claude"}],
)
print(message.content)
# Flush events to Langfuse in short-lived applications
langfuse.flush()
Scoring traces and observations
- span_or_generation_obj.score(): Scores the specific observation object.
- span_or_generation_obj.score_trace(): Scores the entire trace to which the object belongs.
from langfuse import get_client
langfuse = get_client()
with langfuse.start_as_current_generation(name="summary_generation") as gen:
# ... LLM call ...
gen.update(output="summary text...")
# Score this specific generation
gen.score(name="conciseness", value=0.8, data_type="NUMERIC")
# Score the overall trace
gen.score_trace(name="user_feedback_rating", value="positive", data_type="CATEGORICAL")
Score Parameters:
Parameter | Type | Description |
---|---|---|
name | str | Name of the score (e.g., “relevance”, “accuracy”). Required. |
value | Union[float, str] | Score value. Float for NUMERIC /BOOLEAN , string for CATEGORICAL . Required. |
trace_id | str | ID of the trace to associate with (for create_score ). Required. |
observation_id | Optional[str] | ID of the specific observation to score (for create_score ). |
score_id | Optional[str] | Custom ID for the score (auto-generated if None). |
data_type | Optional[ScoreDataType] | "NUMERIC" , "BOOLEAN" , or "CATEGORICAL" . Inferred if not provided based on value type and score config on server. |
comment | Optional[str] | Optional comment or explanation for the score. |
config_id | Optional[str] | Optional ID of a pre-defined score configuration in Langfuse. |
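When only IDs are available rather than live span objects (e.g., user feedback that arrives after a trace has finished), scores can also be attached by ID via langfuse.create_score, which the trace_id / observation_id parameters above refer to. A sketch:
from langfuse import get_client

langfuse = get_client()

# IDs captured earlier, e.g. via langfuse.get_current_trace_id() / get_current_observation_id()
trace_id = "abcdef1234567890abcdef1234567890"
observation_id = "fedcba0987654321"

langfuse.create_score(
    name="user_feedback_rating",
    value="positive",
    data_type="CATEGORICAL",
    trace_id=trace_id,
    observation_id=observation_id,  # omit to score the trace itself
    comment="Thumbs up from the end user",
)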
See Scoring for more details.
Datasets
Langfuse Datasets are essential for evaluating and testing your LLM applications by allowing you to manage collections of inputs and their expected outputs.
Interacting with Datasets
- Fetching: Retrieve a dataset and its items using langfuse.get_dataset(name: str). This returns a DatasetClient instance, which contains a list of DatasetItemClient objects (accessible via dataset.items). Each DatasetItemClient holds the input, expected_output, and metadata for an individual data point.
- Creating: You can programmatically create new datasets with langfuse.create_dataset(...) and add items to them using langfuse.create_dataset_item(...).
from langfuse import get_client
langfuse = get_client()
# Fetch an existing dataset
dataset = langfuse.get_dataset(name="my-eval-dataset")
for item in dataset.items:
print(f"Input: {item.input}, Expected: {item.expected_output}")
# Briefly: Creating a dataset and an item
new_dataset = langfuse.create_dataset(name="new-summarization-tasks")
langfuse.create_dataset_item(
dataset_name="new-summarization-tasks",
input={"text": "Long article..."},
expected_output={"summary": "Short summary."}
)
Linking Traces to Dataset Items for Runs
The most powerful way to use datasets is by linking your application's executions (traces) to specific dataset items when performing an evaluation run. See our datasets documentation for more details. The DatasetItemClient.run() method provides a context manager to streamline this process.
How item.run() works:
When you use with item.run(run_name="your_eval_run_name") as root_span: the following happens:
- Trace Creation: A new Langfuse trace is initiated specifically for processing this dataset item within the context of the named run.
- Trace Naming & Metadata:
  - The trace is automatically named (e.g., "Dataset run: your_eval_run_name").
  - Essential metadata is added to this trace, including dataset_item_id (the ID of item), run_name, and dataset_id.
- DatasetRunItem Linking: The SDK makes an API call to Langfuse to create a DatasetRunItem. This backend object formally links:
  - The dataset_item_id
  - The trace_id of the newly created trace
  - The provided run_name
  - Any run_metadata or run_description you pass to item.run().
  This linkage is what populates the "Runs" tab for your dataset in the Langfuse UI, allowing you to see all traces associated with a particular evaluation run.
- Contextual Span: The context manager yields root_span, which is a LangfuseSpan object representing the root span of this new trace.
- Automatic Nesting: Any Langfuse observations (spans or generations) created inside the with block automatically become children of root_span and are thus part of the trace linked to this dataset item and run.
Example:
from langfuse import get_client
langfuse = get_client()
dataset_name = "qna-eval"
current_run_name = "qna_model_v3_run_05_20" # Identifies this specific evaluation run
# Assume 'my_qna_app' is your instrumented application function
def my_qna_app(question: str, context: str, item_id: str, run_name: str):
with langfuse.start_as_current_generation(
name="qna-llm-call",
input={"question": question, "context": context},
metadata={"item_id": item_id, "run": run_name}, # Example metadata for the generation
model="gpt-4o"
) as generation:
# Simulate LLM call
answer = f"Answer to '{question}' using context." # Replace with actual LLM call
generation.update(output={"answer": answer})
# Update the trace with the input and output
generation.update_trace(
input={"question": question, "context": context},
output={"answer": answer},
)
return answer
dataset = langfuse.get_dataset(name=dataset_name) # Fetch your pre-populated dataset
for item in dataset.items:
print(f"Running evaluation for item: {item.id} (Input: {item.input})")
# Use the item.run() context manager
with item.run(
run_name=current_run_name,
run_metadata={"model_provider": "OpenAI", "temperature_setting": 0.7},
run_description="Evaluation run for Q&A model v3 on May 20th"
) as root_span: # root_span is the root span of the new trace for this item and run.
# All subsequent langfuse operations within this block are part of this trace.
# Call your application logic
generated_answer = my_qna_app(
question=item.input["question"],
context=item.input["context"],
item_id=item.id,
run_name=current_run_name
)
print(f" Item {item.id} processed. Trace ID: {root_span.trace_id}")
# Optionally, score the result against the expected output
if item.expected_output and generated_answer == item.expected_output.get("answer"):
root_span.score_trace(name="exact_match", value=1.0)
else:
root_span.score_trace(name="exact_match", value=0.0)
print(f"\nFinished processing dataset '{dataset_name}' for run '{current_run_name}'.")
By using item.run()
, you ensure each dataset item’s processing is neatly encapsulated in its own trace, and these traces are aggregated under the specified run_name
in the Langfuse UI. This allows for systematic review of results, comparison across runs, and deep dives into individual processing traces.
Advanced Configuration
Masking Sensitive Data
If your trace data (inputs, outputs, metadata) might contain sensitive information (PII, secrets), you can provide a mask
function during client initialization. This function will be applied to all relevant data before it’s sent to Langfuse.
The mask function should accept data as a keyword argument and return the masked data. The returned data must be JSON-serializable.
from langfuse import Langfuse
from typing import Any
import re

def pii_masker(data: Any, **kwargs) -> Any:
    # Example: simple email masking. Implement your more robust logic here.
    if isinstance(data, str):
        return re.sub(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+", "[EMAIL_REDACTED]", data)
    elif isinstance(data, dict):
        return {k: pii_masker(data=v) for k, v in data.items()}
    elif isinstance(data, list):
        return [pii_masker(data=item) for item in data]
    return data
langfuse = Langfuse(mask=pii_masker)
# Now, any input/output/metadata will be passed through pii_masker
with langfuse.start_as_current_span(name="user-query", input={"email": "[email protected]", "query": "..."}) as span:
# The 'email' field in the input will be masked.
pass
Logging
The Langfuse SDK uses Python's standard logging module. The main logger is named "langfuse".
To enable detailed debug logging, you can either:
- Set the debug=True parameter when initializing the Langfuse client.
- Set the LANGFUSE_DEBUG="True" environment variable.
- Configure the "langfuse" logger manually:
import logging
langfuse_logger = logging.getLogger("langfuse")
langfuse_logger.setLevel(logging.DEBUG)
The default log level for the langfuse logger is logging.WARNING.
Sampling
You can configure the SDK to sample traces by setting the sample_rate parameter during client initialization (or via the LANGFUSE_SAMPLE_RATE environment variable). This value should be a float between 0.0 (sample 0% of traces) and 1.0 (sample 100% of traces).
If a trace is not sampled, none of its observations (spans, generations) or associated scores will be sent to Langfuse.
from langfuse import Langfuse
# Sample approximately 20% of traces
langfuse_sampled = Langfuse(sample_rate=0.2)
OTEL and Langfuse
The Langfuse v3 SDK is built upon OpenTelemetry (OTEL), a standard for observability. Understanding the relationship between OTEL and Langfuse is not required to use the SDK, but a basic grasp of the concepts is helpful. OTEL-related concepts are abstracted away, and you can use the SDK without being deeply familiar with them.
- OTEL Trace: An OTEL trace represents the entire lifecycle of a request or transaction as it moves through your application and its services. A trace is typically a sequence of operations, like an LLM generating a response followed by a parsing step. The root (first) span created in a sequence defines the OTEL trace. OTEL traces do not have their own start and end time; they are defined by their root span.
- OTEL Span: A span represents a single unit of work or operation within a trace. Spans have a start and end time, a name, and can have attributes (key-value pairs of metadata). Spans can be nested to create a hierarchy, showing parent-child relationships between operations.
- Langfuse Trace: A Langfuse trace collects observations and holds trace attributes such as session_id and user_id, as well as overall input and output. It shares the same ID as the OTEL trace, and its attributes are set via specific OTEL span attributes that are automatically propagated to the Langfuse trace.
- Langfuse Observation: In Langfuse terminology, an "observation" is a Langfuse-specific representation of an OTEL span. It can be a generic span (Langfuse-span), a specialized "generation" (Langfuse-generation), or a point-in-time event (Langfuse-event).
- Langfuse Span: A Langfuse-span is a generic OTEL span in Langfuse, designed for non-LLM operations.
- Langfuse Generation: A Langfuse-generation is a specialized type of OTEL span in Langfuse, designed specifically for Large Language Model (LLM) calls. It includes additional fields like model, model_parameters, usage_details (tokens), and cost_details.
- Langfuse Event: A Langfuse-event tracks a point-in-time action.
- Context Propagation: OpenTelemetry automatically handles the propagation of the current trace and span context. This means when you call another function (whether it’s also traced by Langfuse, an OTEL-instrumented library, or a manually created span), the new span will automatically become a child of the currently active span, forming a correct trace hierarchy.
The Langfuse SDK provides wrappers around OTEL spans (LangfuseSpan, LangfuseGeneration) that offer convenient methods for interacting with Langfuse-specific features like scoring and media handling, while still being native OTEL spans under the hood. You can also use these wrapper objects to add Langfuse trace attributes.
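Because Langfuse spans are native OTEL spans, spans created directly with the OpenTelemetry API join the same trace. A small sketch:
from langfuse import get_client
from opentelemetry import trace

langfuse = get_client()
tracer = trace.get_tracer(__name__)

with langfuse.start_as_current_span(name="langfuse-parent"):
    # This span is created with the plain OpenTelemetry API, yet it nests
    # under the Langfuse span above because both share the same OTEL context.
    with tracer.start_as_current_span("plain-otel-child"):
        pass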
Upgrade from v2
The v3 SDK introduces significant improvements and changes compared to v2. It is not fully backward compatible. This comprehensive guide will help you migrate based on your current integration.
Core changes compared to the v2 SDK:
- OpenTelemetry Foundation: v3 is built on OpenTelemetry standards.
- Trace Input/Output: Now derived from the root observation by default.
- Trace Attributes (user_id, session_id, etc.): Must be set via enclosing spans, not directly on integrations (OpenAI call, Langchain invocation).
- Context Management: Automatic OTEL context propagation.
Migration Path by Integration Type
@observe
Decorator Users
v2 Pattern:
from langfuse.decorators import langfuse_context, observe
@observe()
def my_function():
# This was the trace
langfuse_context.update_current_trace(user_id="user_123")
return "result"
v3 Migration:
from langfuse import observe, get_client # new import
@observe()
def my_function():
# This is now the root span, not the trace
langfuse = get_client()
# Update trace explicitly
langfuse.update_current_trace(user_id="user_123")
return "result"
OpenAI Integration
v2 Pattern:
from langfuse.openai import openai
response = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
# Trace attributes directly on the call
user_id="user_123",
session_id="session_456",
tags=["chat"],
metadata={"source": "app"}
)
v3 Migration:
If you do not set additional trace attributes, no changes are needed.
If you set additional trace attributes, you need to wrap the OpenAI call with an enclosing span and set the trace attributes on the enclosing span:
from langfuse import get_client
from langfuse.openai import openai
langfuse = get_client()
# CRITICAL: Wrap OpenAI calls with enclosing span for trace attributes
with langfuse.start_as_current_span(name="chat-request") as span:
# Set trace attributes on the enclosing span
span.update_trace(
user_id="user_123",
session_id="session_456",
tags=["chat"],
# Explicit trace input/output for LLM-as-a-judge features
input={"query": "Hello"},
)
response = openai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
# Only generation-specific metadata allowed here
metadata={"source": "app"}
)
# Set trace output explicitly
span.update_trace(output={"response": response.choices[0].message.content})
LangChain Integration
v2 Pattern:
from langfuse.callback import CallbackHandler
handler = CallbackHandler(
user_id="user_123",
session_id="session_456",
tags=["langchain"]
)
response = chain.invoke({"input": "Hello"}, config={"callbacks": [handler]})
v3 Migration:
from langfuse import get_client
from langfuse.langchain import CallbackHandler
langfuse = get_client()
# CRITICAL: Use enclosing span for trace attributes
with langfuse.start_as_current_span(name="langchain-request") as span:
span.update_trace(
user_id="user_123",
session_id="session_456",
tags=["langchain"],
input={"query": "Hello"} # Explicit trace input
)
handler = CallbackHandler() # No trace attributes in constructor
response = chain.invoke({"input": "Hello"}, config={"callbacks": [handler]})
# Set trace output explicitly
span.update_trace(output={"response": response})
LlamaIndex Integration Users
v2 Pattern:
from langfuse.llama_index import LlamaIndexCallbackHandler
handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([handler])
response = index.as_query_engine().query("Hello")
v3 Migration:
from langfuse import get_client
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
# Use third-party OTEL instrumentation
LlamaIndexInstrumentor().instrument()
langfuse = get_client()
with langfuse.start_as_current_span(name="llamaindex-query") as span:
span.update_trace(
user_id="user_123",
input={"query": "Hello"}
)
response = index.as_query_engine().query("Hello")
span.update_trace(output={"response": str(response)})
Low-Level SDK Users
v2 Pattern:
from langfuse import Langfuse
langfuse = Langfuse()
trace = langfuse.trace(
name="my-trace",
user_id="user_123",
input={"query": "Hello"}
)
generation = trace.generation(
name="llm-call",
model="gpt-4o"
)
generation.end(output="Response")
v3 Migration:
In v3, spans / generations created without a context manager must be ended by calling .end() on the returned object.
from langfuse import get_client
langfuse = get_client()
# Use context managers instead of manual objects
with langfuse.start_as_current_span(
name="my-trace",
input={"query": "Hello"} # Becomes trace input automatically
) as root_span:
# Set trace attributes
root_span.update_trace(user_id="user_123")
with langfuse.start_as_current_generation(
name="llm-call",
model="gpt-4o"
) as generation:
generation.update(output="Response")
# If needed, override trace output
root_span.update_trace(output={"response": "Response"})
Key Migration Checklist
- Update Imports:
  - Use from langfuse import get_client to access the global client instance configured via environment variables
  - Use from langfuse import Langfuse to create a new client instance configured via constructor parameters
  - Use from langfuse import observe to import the observe decorator
  - Update integration imports: from langfuse.langchain import CallbackHandler
- Trace Attributes Pattern:
  - Move all user_id, session_id, tags from integration calls to enclosing spans
  - Use span.update_trace() or langfuse.update_current_trace()
- Trace Input/Output:
  - Critical for LLM-as-a-judge: explicitly set trace input/output
  - Don't rely on automatic derivation from the root observation if you need specific values
- Context Managers:
  - Replace manual langfuse.trace(), trace.span() with context managers if you want to use them
  - Use with langfuse.start_as_current_span() instead
- LlamaIndex Migration:
  - Replace the Langfuse callback with third-party OTEL instrumentation
  - Install: pip install openinference-instrumentation-llama-index
- ID Management:
  - No Custom Observation IDs: v3 uses the W3C Trace Context standard - you cannot set custom observation IDs
  - Trace ID Format: Must be a 32-character lowercase hexadecimal string (16 bytes)
  - External ID Correlation: Use Langfuse.create_trace_id(seed=external_id) to generate deterministic trace IDs from external systems
from langfuse import Langfuse, observe

# v3: Generate a deterministic trace ID from an external system
external_request_id = "req_12345"
trace_id = Langfuse.create_trace_id(seed=external_request_id)

@observe()
def my_function():
    # This trace will have the deterministic ID
    pass

my_function(langfuse_trace_id=trace_id)  # passed as a special keyword argument
- Initialization:
  - Replace constructor parameters: enabled → tracing_enabled, threads → media_upload_thread_count
Detailed Change Summary
- Core Change: OpenTelemetry Foundation
  - Built on OpenTelemetry standards for better ecosystem compatibility
- Trace Input/Output Behavior
  - v2: Integrations could set trace input/output directly
  - v3: Trace input/output derived from the root observation by default
  - Migration: Explicitly set via span.update_trace(input=..., output=...)
- Trace Attributes Location
  - v2: Could be set directly on integration calls
  - v3: Must be set on enclosing spans
  - Migration: Wrap integration calls with langfuse.start_as_current_span()
- Creating Observations
  - v2: langfuse.trace(), langfuse.span(), langfuse.generation()
  - v3: langfuse.start_as_current_span(), langfuse.start_as_current_generation()
  - Migration: Use context managers; ensure .end() is called or use with statements
- IDs and Context
  - v3: W3C Trace Context format, automatic context propagation
  - Migration: Use langfuse.get_current_trace_id() instead of get_trace_id()
- Event Size Limitations
  - v2: Events were limited to 1MB in size
  - v3: No size limits enforced on the SDK side for events
Future support for v2
We will continue to support the v2 SDK for the foreseeable future with critical bug fixes and security patches. We will not be adding any new features to the v2 SDK.
Troubleshooting
- Authentication Issues:
  - Ensure LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST (if not using the default cloud) are correctly set, either as environment variables or in the Langfuse() constructor.
  - Use langfuse.auth_check() after initialization to verify credentials. Do not use this in production, as this method waits for a response from the server.
- No Traces Appearing:
  - Check that tracing_enabled is True (default).
  - Verify sample_rate is not 0.0.
  - Ensure langfuse.shutdown() is called or the program exits cleanly to allow atexit hooks to flush data. Manually call langfuse.flush() to force data sending.
  - Enable debug logging (debug=True or LANGFUSE_DEBUG="True") to see SDK activity and potential errors during exporting.
- Incorrect Nesting or Missing Spans:
  - Ensure you are using context managers (with langfuse.start_as_current_span(...)) for proper context propagation.
  - If manually creating spans (langfuse.start_span()), ensure they are correctly ended with .end().
  - In async code, ensure context is not lost across await boundaries if not using Langfuse's async-compatible methods.
- Langchain/OpenAI Integration Not Working:
  - Confirm the respective integration (e.g., from langfuse.openai import openai or the Langchain CallbackHandler) is correctly set up before the calls to the LLM libraries are made.
  - Check for version compatibility issues between Langfuse, Langchain, and OpenAI SDKs.
- Media Not Appearing:
  - Ensure LangfuseMedia objects are correctly initialized and passed in input, output, or metadata.
  - Check debug logs for any media upload errors. Media uploads happen in background threads.
If you encounter persistent issues, please:
- Enable debug logging to gather more information.
- Check the Langfuse status page (if applicable for cloud users).
- Raise an issue on our GitHub repository with details about your setup, SDK version, code snippets, and debug logs.