Model Evaluator
The Model Evaluator is a specialized testing environment that allows you to benchmark up to four different AI models against a single prompt simultaneously.
This helps teams identify which model provides the most accurate, well-structured, and cost-efficient responses for a given task before the prompt is deployed to production.
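As a rough illustration, the sketch below shows what a side-by-side run could look like. The run_prompt helper, model names, and prompt template are hypothetical stand-ins, not the platform's actual API:

```python
# Minimal sketch: benchmark up to four models against one prompt.
# run_prompt and the model names are hypothetical placeholders;
# replace them with your platform's actual client and model IDs.

PROMPT = "Summarize recent findings on: {research_query}"
MODELS = ["model-a", "model-b", "model-c", "model-d"]  # up to four

def run_prompt(model: str, prompt: str) -> str:
    """Stand-in for the evaluator's API call."""
    return f"[{model}] would answer: {prompt}"

# One prompt, four models, compared side by side.
rendered = PROMPT.format(research_query="quantum error correction")
for model in MODELS:
    print(run_prompt(model, rendered))
```

The four outputs can then be compared for accuracy, structure, and cost before choosing a default.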
Learn More
To learn how to use this feature in detail, see the full guide below.
Quick Tips for Evaluation
Variable Testing
Always test prompts using diverse inputs for variables such as research_query.
This ensures that the selected model performs consistently across different topics and query styles.
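For instance, a simple loop over several research_query values, again using a hypothetical run_prompt stand-in rather than the platform's real API, might look like this:

```python
# Sketch of variable testing: exercise each model with diverse
# research_query inputs covering different topics and query styles.
# run_prompt is a hypothetical placeholder for your client call.

PROMPT = "Summarize recent findings on: {research_query}"
MODELS = ["model-a", "model-b"]

TEST_QUERIES = [
    "quantum error correction",        # technical topic
    "history of the printing press",   # humanities topic
    "cheap flights to Lisbon?",        # informal query style
]

def run_prompt(model: str, prompt: str) -> str:
    """Stand-in for the evaluator's API call."""
    return f"[{model}] would answer: {prompt}"

for query in TEST_QUERIES:
    for model in MODELS:
        print(run_prompt(model, PROMPT.format(research_query=query)))
```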
Default Model Selection
Once you identify the best-performing model, click Update Prompt Model.
This saves the selected model as the permanent default for that specific prompt.
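Conceptually, the prompt's stored configuration then carries the chosen model, roughly along the lines of this hypothetical record; the field names are illustrative only, not the platform's actual schema:

```python
# Hypothetical illustration of what "Update Prompt Model" persists:
# the prompt record now carries the chosen model as its default.
prompt_config = {
    "prompt_id": "research-summary",   # illustrative identifier
    "template": "Summarize recent findings on: {research_query}",
    "default_model": "model-b",        # set by Update Prompt Model
}
```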