StackUnseen
Description: Evaluates an LLM's ability to answer recent Stack Overflow questions, highlighting its effectiveness with new and emerging content.
Number of Samples: 194
Language: English
Provider: Stack Overflow
Evaluation Method: Auto-evaluation with GPT4-Turbo over ground-truth.
Data Collection Period: March 2024 - May 2024
Question Type
The type of coding question asked.
Complexity
The complexity level of the questions.
Programming Language
The programming language used in the question.
Last updated: November 14, 2024
# | Model | Provider | Size | Acceptance |
---|---|---|---|---|
No results. |