FlagEval
FlagEval, jointly developed by the Zhiyuan Research Institute and multiple university teams, is a large-scale model evaluation platform built on a three-dimensional "ability–task–indicator" framework, designed to deliver comprehensive and fine-grained evaluation results. The platform covers more than 600 evaluation dimensions, spanning over 30 abilities, 5 tasks, and 4 categories of indicators. The task dimension alone comprises 22 subjective and objective evaluation datasets and 84,433 questions.
Relevant Navigation
Chatbot Arena is a benchmark platform for large language models (LLMs) that runs anonymous, randomized head-to-head battles via crowdsourcing. The project is led by LMSYS Org, a research organization co-founded by the University of California, Berkeley, the University of California, San Diego, and Carnegie Mellon University.