Function Calling
Description: Evaluates an LLM's ability to accurately use defined functions to perform specific tasks, such as web searches, code execution, and planning multiple function calls. The input is a conversation history and a list of available tools.
Number of Samples: 788
Language: English
Provider: Toqan
Evaluation Method: Multi-class classification accuracy using human-labeled data, and auto-evaluation with GPT-4 Turbo against ground truth.
Data Collection Period: January 2024 - May 2024
Function: Function types the models were tested on.
Inference Method: Approach to querying the model for function use.
Last updated: August 30, 2024
# | Model | Provider | Size | Inference | Function Accuracy | Argument Correctness |
---|---|---|---|---|---|---|
No results. |
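
To make the two metric columns concrete, here is a minimal Python sketch of how scores like Function Accuracy and Argument Correctness could be computed from labeled samples. It is not the benchmark's scoring code. The `Sample` structure, the function names `web_search`, `run_code`, and `no_function`, and the choice to score arguments only on samples where the function was chosen correctly are all assumptions made for illustration. Exact dictionary equality stands in for the GPT-4 Turbo judgment described under Evaluation Method.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Sample:
    """One evaluated sample (hypothetical structure, not the benchmark's schema)."""
    expected_function: str
    expected_arguments: dict[str, Any]
    predicted_function: str
    predicted_arguments: dict[str, Any]


def function_accuracy(samples: list[Sample]) -> float:
    """Fraction of samples where the predicted function matches the label."""
    if not samples:
        return 0.0
    hits = sum(s.predicted_function == s.expected_function for s in samples)
    return hits / len(samples)


def argument_correctness(samples: list[Sample]) -> float:
    """Among samples with the right function, fraction whose arguments match.

    Assumption: exact dict equality stands in for an LLM-based judge.
    """
    right_function = [
        s for s in samples if s.predicted_function == s.expected_function
    ]
    if not right_function:
        return 0.0
    hits = sum(
        s.predicted_arguments == s.expected_arguments for s in right_function
    )
    return hits / len(right_function)


# Illustrative data only; these are not benchmark samples.
samples = [
    Sample("web_search", {"query": "weather in Paris"},
           "web_search", {"query": "weather in Paris"}),
    Sample("run_code", {"code": "print(1)"},
           "run_code", {"code": "print(2)"}),
    Sample("web_search", {"query": "news"}, "no_function", {}),
    Sample("no_function", {}, "no_function", {}),
]

print(function_accuracy(samples))     # 3 of 4 function choices match
print(argument_correctness(samples))  # 2 of those 3 also match on arguments
```

Treating "no function needed" as its own class is one way to frame function selection as multi-class classification. An exact-match check is stricter than a model-based judge, which could accept semantically equivalent arguments such as differently worded search queries.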