ProLLM Benchmarks

StackUnseen

Description: Evaluates an LLM's ability to answer recent Stack Overflow questions, highlighting its effectiveness with new and emerging content.
Number of Samples: 194
Language: English
Provider: Stack Overflow
Evaluation Method: Auto-evaluation with GPT4-Turbo over ground-truth.
Data Collection Period: March 2024 - May 2024

Question Type

The type of coding question asked.

Complexity

The complexity level of the questions.

Programming Language

The programming language used in the question.

Last updated: November 14, 2024

#	Model	Provider	Size	Acceptance
No results.
