Model Evaluator

The Model Evaluator in Kompass helps teams determine which AI model performs best for a specific prompt.

Different AI models can produce different responses for the same prompt. Some models may:

- give more accurate insights
- generate better-structured outputs
- respond faster or be more cost-efficient

The Model Evaluator lets users test the same prompt across multiple AI models, compare the responses, and select the best-performing model for production use.

Navigation: "Prompts" -> "Model Evaluator"

Step 1: Prompt Selection

Users can search for and select an existing prompt from the Kompass prompt library.

Step 2: Load the Prompt with Variables

Select the prompt from the list.

The prompt may include variables such as:

research_query

This variable accepts a research question or topic as input.

Example input:

"Impact of AI on retail supply chains"

The system uses this input when generating responses.
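The variable substitution described above can be sketched roughly as follows. This is an illustration only: the template text, the `$research_query` placeholder syntax, and the `render_prompt` helper are assumptions made for the example, not the format Kompass uses internally.

```python
from string import Template

# Hypothetical prompt text; the real Kompass template syntax may differ.
PROMPT_TEMPLATE = Template(
    "You are a research analyst. Summarize the key findings on: $research_query"
)


def render_prompt(template, variables):
    """Fill every variable in the template; raises KeyError if one is missing."""
    return template.substitute(variables)


rendered = render_prompt(
    PROMPT_TEMPLATE,
    {"research_query": "Impact of AI on retail supply chains"},
)
print(rendered)
```

Using `substitute` rather than `safe_substitute` means a missing variable fails loudly instead of sending a half-filled prompt to a model.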

Step 3: Add Models for Comparison

Up to 4 models can be tested simultaneously to compare performance.

Step 4: Generate Responses

The system sends the prompt and the variable input (e.g., the research query) to each selected model.

Each model generates its own response.
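Conceptually, this step fans one rendered prompt out to every selected model. The sketch below shows the idea under stated assumptions: `generate_responses`, the `MAX_MODELS` constant, and the stub models are hypothetical names for illustration, and a real setup would call each provider's API instead of a local function.

```python
MAX_MODELS = 4  # the evaluator compares up to 4 models at once


def generate_responses(prompt, models):
    """Send the same prompt to every selected model and collect the outputs.

    `models` maps a model name to a function that takes prompt text and
    returns response text (a stand-in for a real provider call).
    """
    if not 1 <= len(models) <= MAX_MODELS:
        raise ValueError(f"select between 1 and {MAX_MODELS} models")
    return {name: call(prompt) for name, call in models.items()}


# Stub models for illustration only.
stub_models = {
    "model-a": lambda p: f"[model-a] {p}",
    "model-b": lambda p: f"[model-b] {p.upper()}",
}

responses = generate_responses("Impact of AI on retail supply chains", stub_models)
for name, text in responses.items():
    print(name, "->", text)
```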

Step 5: Review and Edit the Prompt

Use the dropdown editor to preview the prompt.

From this interface you can:

- review prompt instructions
- edit the prompt structure
- modify how the variable is used
- improve output formatting

Step 6: Evaluate Model Responses

The Model Evaluator panel displays:

- Prompt name
- Prompt variables
- Selected models
- Generated responses

Users should evaluate responses based on:

- relevance to the research query
- depth of insight
- structured output
- clarity
- factual accuracy
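One way to make this review repeatable is to rate each response on the five criteria and average the ratings. The sketch below assumes a 1 to 5 rating scale and equal weighting. Both are illustrative choices, and the sample ratings are made-up placeholders, not Kompass output.

```python
from statistics import mean

# The five criteria listed above, as short keys.
CRITERIA = ("relevance", "depth", "structure", "clarity", "accuracy")


def overall_score(ratings):
    """Average a reviewer's 1-5 ratings across all five criteria."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return mean(ratings[c] for c in CRITERIA)


# Placeholder ratings for two hypothetical models.
ratings_by_model = {
    "model-a": {"relevance": 5, "depth": 4, "structure": 4, "clarity": 5, "accuracy": 4},
    "model-b": {"relevance": 4, "depth": 3, "structure": 5, "clarity": 4, "accuracy": 3},
}

scores = {name: overall_score(r) for name, r in ratings_by_model.items()}
print(scores)
```

If some criteria matter more for a given prompt (for example, factual accuracy for research summaries), a weighted average is a natural variation.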

Step 7: Select the Best Performing Model

After reviewing the responses, determine which model produces the best output for the given prompt and variable.

The best model should provide:

- accurate information
- structured responses
- consistent results
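Consistency is easier to judge when the same prompt is run several times per model. As a rough sketch, the hypothetical `pick_best` helper below prefers the model with the highest mean score and breaks ties in favor of the smaller spread. The per-run scores are made-up placeholders, and this selection rule is an assumption for illustration, not a rule Kompass applies.

```python
from statistics import mean, pstdev


def pick_best(run_scores):
    """Return the model with the highest mean score; ties go to the lower spread."""
    return max(
        run_scores,
        key=lambda name: (mean(run_scores[name]), -pstdev(run_scores[name])),
    )


# Placeholder scores from three runs of the same prompt per model.
run_scores = {
    "model-a": [4.4, 4.2, 4.6],
    "model-b": [4.8, 3.0, 4.0],
}

best = pick_best(run_scores)
print(best)
```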

Step 8: Update the Prompt Model

Click "Update Prompt Model".

This saves the selected model as the default model used for that prompt.

Future executions of the prompt will automatically use this model.
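The effect of this step can be pictured as updating a stored default on the prompt record. The sketch below is purely conceptual: `PromptRecord`, `update_prompt_model`, and `model_for_run` are hypothetical names, and Kompass's actual storage is not described here.

```python
from dataclasses import dataclass


@dataclass
class PromptRecord:
    """Hypothetical stored prompt with its default model."""
    name: str
    template: str
    default_model: str


def update_prompt_model(record, selected_model):
    """Save the chosen model as the prompt's default."""
    record.default_model = selected_model
    return record


def model_for_run(record):
    """Later executions read the saved default."""
    return record.default_model


record = PromptRecord(
    name="research-summary",  # placeholder prompt name
    template="Summarize the key findings on: $research_query",
    default_model="model-b",
)
update_prompt_model(record, "model-a")
print(model_for_run(record))
```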