Internal
Creates a new ExperimentManager instance. The configuration object supplies the Langfuse client instance used for API communication.

Internal
Gets the global logger instance for experiment-related logging. Returns the global logger instance.
Executes an experiment by running a task on each data item and evaluating the results.
This method orchestrates the complete experiment lifecycle: it executes the task on each data item (subject to maxConcurrency), applies any item-level evaluators to each output, applies any run-level evaluators across all item results, and records the run in Langfuse, linking it to a dataset run when the data consists of Langfuse dataset items.
The experiment configuration object, with the following properties:
data
Array of data items to process. Can be either custom ExperimentItem[] or DatasetItem[] from Langfuse. Each item should contain input data and, optionally, an expected output.
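For illustration, a minimal sketch of custom items using the field names that appear in the examples below; the per-item metadata field is an assumption:

```typescript
// A minimal sketch of custom experiment items. The input/expectedOutput field
// names mirror the examples below; the metadata field is an assumption for
// carrying extra per-item context.
const data = [
  { input: "Hello world", expectedOutput: "Hola mundo" },
  { input: "Good morning", expectedOutput: "Buenos días", metadata: { register: "informal" } },
];
```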
description?: string
Optional description explaining the experiment's purpose. Provide context about what you're testing, the methodology, or the goals; this helps with experiment tracking and result interpretation.
evaluators?: Evaluator<Input, ExpectedOutput, Metadata>[]
Optional array of evaluator functions to assess each item's output. Each evaluator receives the input, the output, and the expected output (if available) and returns evaluation results. Multiple evaluators enable comprehensive assessment.
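As a sketch, an evaluator can also be defined as a standalone, reusable function; the { name, value } result shape mirrors the examples below, and the exact-match logic is illustrative:

```typescript
// A hedged sketch of a reusable item-level evaluator: it receives the item's
// input, the task output, and the expected output (when available), and
// returns a result shaped like the examples below ({ name, value }).
const exactMatch = async ({ input, output, expectedOutput }) => ({
  name: "exact_match",
  value: expectedOutput !== undefined && output === expectedOutput ? 1 : 0,
});

// Usage: evaluators: [exactMatch, ...otherEvaluators]
```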
maxConcurrency?: number
Maximum number of concurrent task executions (default: Infinity). Controls parallelism to manage resource usage and API rate limits; set lower values for expensive operations or rate-limited services.
metadata?: Record<string, any>
Optional metadata to attach to the experiment run. Store additional context such as model versions, hyperparameters, or any other information relevant for analysis and comparison.
name
Human-readable name for the experiment. This name appears in the Langfuse UI and in experiment results; choose a descriptive name that identifies the experiment's purpose.
runEvaluators?: RunEvaluator<Input, ExpectedOutput, Metadata>[]
Optional array of run-level evaluators to assess the entire experiment. These evaluators receive all item results and can perform aggregate analysis such as calculating averages, detecting patterns, or running statistical analyses.
runName?: string
Optional exact name for the experiment run. If provided, it is used as the exact dataset run name when the data contains Langfuse dataset items. If omitted, it defaults to the experiment name with an ISO timestamp appended.
task
The task function to execute on each data item. This function receives input data and produces the output that will be evaluated; it should encapsulate the model or system being tested.
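A sketch of a task defined as a named function, assuming the destructured { input } argument shown in the examples below; callModel is a hypothetical stand-in for the system under test:

```typescript
// A hedged sketch of a task function: it receives the item's input and
// returns the output that the evaluators will assess. callModel is a
// hypothetical helper standing in for the model or system being tested.
const translationTask = async ({ input }) => {
  const output = await callModel(`Translate to Spanish: ${input}`);
  return output;
};
```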
Returns a promise that resolves to the experiment results, including each processed item's output and evaluations as well as any run-level evaluation results.
Example: Basic experiment with item-level evaluators

```typescript
const result = await langfuse.experiment.run({
  name: "Translation Quality Test",
  data: [
    { input: "Hello world", expectedOutput: "Hola mundo" },
    { input: "Good morning", expectedOutput: "Buenos días" }
  ],
  task: async ({ input }) => translateText(input, 'es'),
  evaluators: [
    async ({ output, expectedOutput }) => ({
      name: "bleu_score",
      value: calculateBleuScore(output, expectedOutput)
    })
  ]
});
```
Example: Limiting concurrency and adding run-level evaluators

```typescript
const result = await langfuse.experiment.run({
  name: "Large Scale Evaluation",
  data: largeBatchOfItems,
  task: expensiveModelCall,
  maxConcurrency: 5, // Process max 5 items simultaneously
  evaluators: [myEvaluator],
  runEvaluators: [
    async ({ itemResults }) => ({
      name: "average_score",
      value: itemResults.reduce((acc, r) => acc + r.evaluations[0].value, 0) / itemResults.length
    })
  ]
});
```
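The optional description, metadata, and runName fields documented above can be combined in the same call; the values below are purely illustrative, and items, myTask, and myEvaluator are assumed to be defined elsewhere:

```typescript
// Illustrative use of the optional description, metadata, and runName fields.
const result = await langfuse.experiment.run({
  name: "Prompt Comparison",
  description: "Compares prompt v2 against the baseline prompt on the QA set",
  metadata: { promptVersion: "v2", temperature: 0.2 },
  runName: "prompt-comparison-v2", // used as the exact dataset run name for dataset items
  data: items,
  task: myTask,
  evaluators: [myEvaluator],
});
```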
Manages the execution and evaluation of experiments on datasets.
The ExperimentManager provides a comprehensive framework for running experiments that test models or tasks against datasets, with support for automatic evaluation and scoring.
Example: Basic experiment usage
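A minimal sketch of basic usage, following the run() configuration documented above; summarize is a hypothetical stand-in for the system under test:

```typescript
// A minimal sketch; summarize() is a hypothetical helper.
const result = await langfuse.experiment.run({
  name: "Summarization Smoke Test",
  data: [{ input: "Long article text...", expectedOutput: "Short summary" }],
  task: async ({ input }) => summarize(input),
  evaluators: [
    async ({ output, expectedOutput }) => ({
      name: "exact_match",
      value: output === expectedOutput ? 1 : 0,
    }),
  ],
});
```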
Example: Using with Langfuse datasets
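A hedged sketch of running against a Langfuse dataset: passing DatasetItem[] as data is what links results to a dataset run, as described above, while the dataset fetch call and its return shape are assumptions that may differ by SDK version:

```typescript
// The dataset accessor (langfuse.dataset.get) and its .items shape are
// assumptions; check your SDK version for the exact fetch call.
const dataset = await langfuse.dataset.get("production-qa-pairs");

const result = await langfuse.experiment.run({
  name: "QA Regression",
  runName: "qa-regression-v2",   // optional exact dataset run name
  data: dataset.items,           // DatasetItem[] from Langfuse
  task: async ({ input }) => answerQuestion(input), // answerQuestion is hypothetical
  evaluators: [myEvaluator],
});
```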