Summarization

Description: Evaluates an LLM's ability to accurately summarize long texts from diverse sources such as YouTube video transcripts, websites, PDFs, and direct text inputs. It also assesses the model's capacity to follow detailed user instructions to extract specific data insights. The dataset consists of 41 unique entries in English, which have been translated into Afrikaans, Brazilian Portuguese, and Polish using machine translation.
Number of Samples: 164
Language: English, Afrikaans, Brazilian Portuguese, Polish
Provider: Toqan
Evaluation Method: Auto-evaluation with GPT-4 Turbo over ground-truth summaries.
Data Collection Period: February 2022 - October 2023
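The evaluation method is LLM-as-judge scoring against ground-truth summaries. Below is a minimal sketch of what one judging call could look like, assuming the judge is reached through the OpenAI chat completions API; the prompt wording, the 1-to-5 scale, the JSON output format, and the judge_summary helper are illustrative assumptions, not Toqan's actual pipeline.

```python
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical judging prompt; the benchmark's real prompt and scale are not published here.
JUDGE_PROMPT = """You are grading a model-generated summary against a ground-truth summary.
Score each criterion from 1 to 5 and reply with JSON only, e.g.:
{{"adherence_to_instructions": 0, "accuracy_of_content": 0, "quality_of_writing": 0}}

Instructions given to the model:
{instructions}

Ground-truth summary:
{reference}

Model summary:
{candidate}
"""


def judge_summary(instructions: str, reference: str, candidate: str) -> dict:
    """Ask GPT-4 Turbo to score one candidate summary against its ground truth."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                instructions=instructions, reference=reference, candidate=candidate
            ),
        }],
    )
    return json.loads(response.choices[0].message.content)
```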

Language: Language of the source document.
Complexity: The complexity level of the summary requests.

Last updated: November 12, 2024

Leaderboard columns: Rank (#), Model, Provider, Size, Chunk Size, Adherence to Instructions, Accuracy of Content, Quality of Writing.
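The three score columns are per-sample judge ratings aggregated per model. Here is a sketch of that roll-up, assuming each judged sample records the model name alongside its three scores; the field names and the aggregate_scores helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores(per_sample_scores: list[dict]) -> dict[str, dict[str, float]]:
    """Roll per-sample judge scores up into one leaderboard row per model.

    Each input row is assumed to look like:
    {"model": "some-model", "adherence_to_instructions": 4,
     "accuracy_of_content": 5, "quality_of_writing": 4}
    """
    criteria = ("adherence_to_instructions", "accuracy_of_content", "quality_of_writing")
    by_model: dict[str, list[dict]] = defaultdict(list)
    for row in per_sample_scores:
        by_model[row["model"]].append(row)
    return {
        model: {c: round(mean(r[c] for r in rows), 2) for c in criteria}
        for model, rows in by_model.items()
    }
```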

Have a unique use case you’d like to test?

We want to evaluate how LLMs perform on your specific, real-world task. You might discover that a small, open-source model delivers the performance you need at a better cost than proprietary models. We can also add custom filters to give you deeper insight into LLM capabilities. Each time a new model is released, we'll provide you with updated performance results.

Leaderboard: an open-source model beating GPT-4 Turbo on our interactive leaderboard.


Please briefly describe your use case and motivation. We'll get back to you with details on how we can add your benchmark.