MMLU | GPTtopic

MMLU (Massive Multitask Language Understanding) is a benchmark for evaluating the language understanding of large models. It is one of the best-known language understanding benchmarks for large models and was released by researchers at UC Berkeley in September 2020. The test covers 57 tasks, including elementary mathematics, US history, computer science, and law. The tasks are in English and span a wide range of subjects, so the benchmark evaluates both the breadth of a model's basic knowledge and its understanding ability.

Relevant Navigation

LLMEval3

LLMEval is an evaluation benchmark for large models launched by the NLP Laboratory of Fudan University.

HELM

HELM (Holistic Evaluation of Language Models) is an evaluation system for large models developed by Stanford University.

C-Eval

C-Eval is a multi-level, multidisciplinary Chinese evaluation suite for large language models.

MMBench

MMBench is a multimodal benchmark test developed by researchers from Shanghai Artificial Intelligence Laboratory, Nanyang Technological University,

SuperCLUE

SuperCLUE is a comprehensive evaluation benchmark for general-purpose Chinese large models. It evaluates models along three dimensions: basic ability, professional ability, and Chinese-specific ability.

OpenCompass

OpenCompass is an open evaluation system for large models, officially launched by Shanghai Artificial Intelligence Laboratory (Shanghai AI Laboratory) in August 2023.
