AI Model Evaluation

HELM

HELM (Holistic Evaluation of Language Models) is a large-scale language model evaluation framework developed by Stanford University. It is organized around three components: scenarios, adaptation, and metrics. Each evaluation run specifies a scenario (the task and data to evaluate on), an adaptation method (how the model is prompted for that scenario), and one or more metrics. HELM mainly covers English and measures 7 metrics: accuracy, uncertainty/calibration, robustness, fairness, bias, toxicity, and inference efficiency. Tasks include question answering, information retrieval, summarization, and text classification, among others.
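To make the scenario / adaptation / metrics split concrete, here is a minimal Python sketch of how one evaluation run combines the three pieces. All names (Instance, Scenario, Adaptation, exact_match, run_evaluation, toy_model) are hypothetical and are not HELM's actual classes or API. The "model" is a stand-in function, and exact-match accuracy is just one example of a metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; these are NOT HELM's classes.

@dataclass
class Instance:
    """One scenario example: an input and its reference answer."""
    input: str
    reference: str

@dataclass
class Scenario:
    """What to evaluate: a named task plus its instances."""
    name: str
    instances: List[Instance]

@dataclass
class Adaptation:
    """How to turn an instance into a prompt for the model."""
    template: str  # must contain an {input} placeholder

    def build_prompt(self, instance: Instance) -> str:
        return self.template.format(input=instance.input)

# A metric maps (predictions, references) to a single score.
Metric = Callable[[List[str], List[str]], float]

def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to the reference after stripping whitespace."""
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references) if references else 0.0

def run_evaluation(
    scenario: Scenario,
    adaptation: Adaptation,
    metrics: Dict[str, Metric],
    model: Callable[[str], str],
) -> Dict[str, float]:
    """Run one (scenario, adaptation, metrics) combination against a model."""
    prompts = [adaptation.build_prompt(inst) for inst in scenario.instances]
    predictions = [model(p) for p in prompts]
    references = [inst.reference for inst in scenario.instances]
    return {name: metric(predictions, references) for name, metric in metrics.items()}

# Usage: a toy two-question scenario and a stand-in model.
qa = Scenario("toy_qa", [Instance("2 + 2 = ?", "4"), Instance("Capital of France?", "Paris")])
adapt = Adaptation("Question: {input}\nAnswer:")

def toy_model(prompt: str) -> str:
    # Hypothetical stand-in model: always answers "4".
    return "4"

print(run_evaluation(qa, adapt, {"exact_match": exact_match}, toy_model))
# {'exact_match': 0.5}
```

Keeping the three parts separate means the same scenario can be re-run with a different prompt template, or scored with additional metrics, without touching the other pieces.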