description (optional)
Optional description explaining the experiment's purpose.
Provide context about what you're testing, methodology, or goals. This helps with experiment tracking and result interpretation.
evaluators (optional)
Optional array of evaluator functions to assess each item's output.
Each evaluator receives the input, output, and expected output (if available) and returns evaluation results. Multiple evaluators enable comprehensive assessment.
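To make the evaluator contract concrete, here is a minimal sketch of an item-level evaluator. The parameter and result shapes (`input`/`output`/`expectedOutput`, `name`/`value`/`comment`) are assumptions for illustration, not the SDK's exact types.

```typescript
// Assumed result shape: a named numeric score with an optional comment.
type EvaluatorResult = { name: string; value: number; comment?: string };

// A simple exact-match evaluator: scores 1 when the output equals the
// expected output, 0 otherwise (or when no expected output is available).
function exactMatch(params: {
  input: unknown;
  output: unknown;
  expectedOutput?: unknown;
}): EvaluatorResult {
  const matched =
    params.expectedOutput !== undefined &&
    JSON.stringify(params.output) === JSON.stringify(params.expectedOutput);
  return {
    name: "exact_match",
    value: matched ? 1 : 0,
    comment: matched
      ? "output equals expected output"
      : "output differs or no expected output provided",
  };
}
```

Passing several such functions lets each item be scored along multiple independent dimensions.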
maxConcurrency (optional)
Maximum number of concurrent task executions (default: Infinity).
Controls parallelism to manage resource usage and API rate limits. Set lower values for expensive operations or rate-limited services.
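The effect of a concurrency cap can be sketched with a small worker-pool runner. This is an illustration of the bounded-parallelism pattern, not the SDK's internal implementation.

```typescript
// Run `task` over `items` with at most `maxConcurrency` tasks in flight.
async function runWithConcurrency<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  maxConcurrency: number,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker repeatedly claims the next item until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  const workers = Array.from(
    { length: Math.min(maxConcurrency, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

With `maxConcurrency: 2`, only two tasks ever run at once, which is the behavior you want for expensive model calls or rate-limited APIs.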
metadata (optional)
Optional metadata to attach to the experiment run.
Store additional context such as model versions, hyperparameters, or any other information relevant for analysis and comparison.
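An illustrative metadata object might look like the following; the keys and values here are examples only, not a required schema.

```typescript
// Example metadata: free-form key/value context for later analysis.
const metadata = {
  model: "gpt-4o-mini",   // which model version was under test
  temperature: 0.2,        // sampling hyperparameter used by the task
  promptVersion: "v3",     // identifier for the prompt variant
};
```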
name
Human-readable name for the experiment.
This name will appear in the Langfuse UI and experiment results. Choose a descriptive name that identifies the experiment's purpose.
runEvaluators (optional)
Optional array of run-level evaluators to assess the entire experiment.
These evaluators receive all item results and can perform aggregate analysis, such as calculating averages, detecting patterns, or running statistical tests.
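A run-level evaluator operates on the full set of item results rather than a single item. The shapes below (`ItemResult` with an `evaluations` array) are assumed for illustration; the sketch averages an item-level score across the run.

```typescript
// Assumed shapes for item-level results collected across the run.
type Evaluation = { name: string; value: number };
type ItemResult = { evaluations: Evaluation[] };

// Aggregate evaluator: average of all "exact_match" scores in the run.
function averageExactMatch(itemResults: ItemResult[]): Evaluation {
  const scores = itemResults.flatMap((r) =>
    r.evaluations.filter((e) => e.name === "exact_match").map((e) => e.value),
  );
  const avg = scores.length
    ? scores.reduce((a, b) => a + b, 0) / scores.length
    : 0;
  return { name: "avg_exact_match", value: avg };
}
```

The same pattern extends to other aggregates: pass rates, score variance, or per-category breakdowns.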
runName (optional)
Optional exact name for the experiment run.
If provided, this is used as the exact dataset run name when the data contains Langfuse dataset items. If not provided, it defaults to the experiment name with an ISO timestamp appended.
task
The task function to execute on each data item.
This function receives input data and produces the output that will be evaluated. It should encapsulate the model or system being tested.
data
Array of data items to process.
Can be either custom ExperimentItem[] or DatasetItem[] from Langfuse. Each item should contain input data and, optionally, an expected output.
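Putting `data` and `task` together, a minimal sketch of the shapes involved might look like this. The local `ExperimentItem` type and `myTask` function are illustrative stand-ins, not the SDK's exact definitions.

```typescript
// Illustrative item shape: an input plus an optional expected output.
type ExperimentItem = { input: string; expectedOutput?: string };

// Custom data items to run the experiment over.
const data: ExperimentItem[] = [
  { input: "hello", expectedOutput: "HELLO" },
  { input: "world", expectedOutput: "WORLD" },
];

// The task encapsulates the system under test: it receives one item and
// returns the output to be evaluated. Uppercasing stands in for a model call.
async function myTask(item: ExperimentItem): Promise<string> {
  return item.input.toUpperCase();
}
```

Each item's `expectedOutput` is what item-level evaluators compare the task's output against.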