Model Evaluator
The Model Evaluator is a specialized testing environment that allows you to benchmark up to four different AI models against a single prompt simultaneously.
This helps teams identify which model provides the most accurate, well-structured, and cost-efficient responses for a given task before the prompt is deployed to production.
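As a rough illustration, the sketch below shows what a side-by-side run could look like. The run_prompt helper, model names, and prompt template are hypothetical stand-ins, not the platform's actual API:

```python
# Minimal sketch: benchmark up to four models against one prompt.
# run_prompt and the model names are hypothetical placeholders;
# replace them with your platform's actual client and model IDs.

PROMPT = "Summarize recent findings on: {research_query}"
MODELS = ["model-a", "model-b", "model-c", "model-d"]  # up to four

def run_prompt(model: str, prompt: str) -> str:
    """Stand-in for the evaluator's API call."""
    return f"[{model}] would answer: {prompt}"

# One prompt, four models, compared side by side.
rendered = PROMPT.format(research_query="quantum error correction")
for model in MODELS:
    print(run_prompt(model, rendered))
```

The four outputs can then be compared for accuracy, structure, and cost before choosing a default.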
Learn More
To learn how to use this feature in detail, see the full guide below.
Quick Tips for Evaluation
Variable Testing
Always test prompts using diverse inputs for variables such as research_query.
This ensures that the selected model performs consistently across different topics and query styles.
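For instance, a simple loop over several research_query values, again using a hypothetical run_prompt stand-in rather than the platform's real API, might look like this:

```python
# Sketch of variable testing: exercise each model with diverse
# research_query inputs covering different topics and query styles.
# run_prompt is a hypothetical placeholder for your client call.

PROMPT = "Summarize recent findings on: {research_query}"
MODELS = ["model-a", "model-b"]

TEST_QUERIES = [
    "quantum error correction",        # technical topic
    "history of the printing press",   # humanities topic
    "cheap flights to Lisbon?",        # informal query style
]

def run_prompt(model: str, prompt: str) -> str:
    """Stand-in for the evaluator's API call."""
    return f"[{model}] would answer: {prompt}"

for query in TEST_QUERIES:
    for model in MODELS:
        print(run_prompt(model, PROMPT.format(research_query=query)))
```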
Default Model Selection
Once you identify the best-performing model, click Update Prompt Model.
This saves the selected model as the permanent default for that specific prompt.
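Conceptually, the prompt's stored configuration then carries the chosen model, roughly along the lines of this hypothetical record; the field names are illustrative only, not the platform's actual schema:

```python
# Hypothetical illustration of what "Update Prompt Model" persists:
# the prompt record now carries the chosen model as its default.
prompt_config = {
    "prompt_id": "research-summary",   # illustrative identifier
    "template": "Summarize recent findings on: {research_query}",
    "default_model": "model-b",        # set by Update Prompt Model
}
```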