SQL Disambiguation

Description: Evaluates an LLM's ability to disambiguate user requests for generating SQL queries, given the business rules and database schema. A question can be answered using the schema alone, can be answered using a combination of the schema and the business rules, or requires additional information before it can be answered.

Number of Samples: 245

Language: Portuguese

Provider: iFood

Evaluation Method: Weighted accuracy score measuring how accurately the LLM classifies user requests into one of three categories: schema-based, schema-and-rule-based, or additional-information-required. Ground truth was curated using domain expert labels.

Data Collection Period: April 2024 - May 2024
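The evaluation method names a weighted accuracy score over three categories but does not say how the weights are chosen. The following is a minimal sketch in Python, assuming each sample is weighted by the inverse frequency of its true class, so that every category contributes equally regardless of how many samples it has. The label strings and the `weighted_accuracy` function are hypothetical names chosen for illustration, not part of the benchmark.

```python
from collections import Counter

# Hypothetical label names for the three categories described above.
CATEGORIES = ("schema", "schema_and_rules", "needs_more_info")


def weighted_accuracy(y_true, y_pred, weights=None):
    """Return the weighted fraction of correctly classified requests.

    weights maps each true label to a per-sample weight. When omitted,
    inverse class frequency is used (an assumption, not the benchmark's
    documented scheme), which makes the score equal to the mean
    per-class recall.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    if weights is None:
        counts = Counter(y_true)
        weights = {label: 1.0 / n for label, n in counts.items()}

    # Total weight of all samples, and of the correctly predicted ones.
    total = sum(weights[t] for t in y_true)
    correct = sum(weights[t] for t, p in zip(y_true, y_pred) if t == p)
    return correct / total
```

Passing an explicit `weights` dictionary with the same value for every label reduces the score to plain accuracy, which makes it easy to compare the two on an imbalanced label set.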

Last updated: September 10, 2024

Columns: #, Model, Provider, Size, Accuracy Score
No results.

