MMBench

MMBench is a multimodal benchmark developed by researchers from Shanghai Artificial Intelligence Laboratory, Nanyang Technological University, The Chinese University of Hong Kong, National University of Singapore, and Zhejiang University. It defines a comprehensive evaluation pipeline that spans abilities from perception to cognition, covering 20 fine-grained abilities with roughly 3,000 multiple-choice questions collected from the internet and from established benchmark datasets. Departing from conventional rule-based answer extraction, MMBench repeatedly shuffles the answer options to verify that the model's output stays consistent, and uses ChatGPT to match free-form replies to the intended option.
Characteristics and advantages of MMBench
Evaluation dimensions subdivided step by step, from perception to reasoning. Roughly 3,000 multiple-choice questions cover 20 fine-grained dimensions, including object detection, text recognition, action recognition, image understanding, and relational reasoning.
A more robust evaluation method. The same single-choice question is asked repeatedly with the options shuffled, and the model passes only if all of its answers point to the same underlying option. Compared with the traditional single-pass evaluation, top-1 accuracy drops by 10% to 20% on average. Minimizing the impact of noise factors on the results makes the evaluation more reproducible.
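The consistency check described above can be sketched as follows. This is a minimal illustration, not MMBench's actual implementation: it assumes the model is a callable that returns an option letter, rotates the options once per round, and maps each answer back to the underlying option text.

```python
import string

def circular_eval(question, options, ask_model):
    """Ask the same multiple-choice question len(options) times,
    rotating the options each round. The model passes only if every
    round's answer maps back to the same underlying option."""
    chosen = set()
    for shift in range(len(options)):
        rotated = options[shift:] + options[:shift]
        letter = ask_model(question, rotated)       # e.g. returns "A"
        idx = string.ascii_uppercase.index(letter)  # letter -> position
        chosen.add(rotated[idx])                    # map back to option text
    return len(chosen) == 1

# Toy stand-in for a model: always picks the option containing "Paris"
def toy_model(question, options):
    for i, opt in enumerate(options):
        if "Paris" in opt:
            return string.ascii_uppercase[i]
    return "A"

opts = ["Paris", "London", "Berlin", "Madrid"]
print(circular_eval("Capital of France?", opts, toy_model))  # True
```

A model that answers by position rather than by content (for instance, one that always outputs "A") selects a different option in each rotated round and fails the check, which is exactly the noise this protocol is designed to catch.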
A more reliable way to extract the model's output. ChatGPT is used to match the model's reply against the options, so even when the model does not answer in the instructed format, the reply can still be matched to the most reasonable option.
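One way to picture this extraction step is a rule-based pass with an LLM fallback. The sketch below is an assumption about the general shape, not MMBench's code: `llm_match` is a hypothetical stub standing in for the ChatGPT matching call, and the rules only illustrate the easy cases that never need it.

```python
import re

def extract_choice(answer, options):
    """Map a free-form model reply to an option letter.
    Try cheap rule-based matching first; only unresolved replies
    would be sent to an LLM matcher (stubbed out here)."""
    letters = [chr(ord("A") + i) for i in range(len(options))]
    # Rule 1: a standalone option letter such as "B" or "(B)"
    m = re.search(r"\b([A-D])\b", answer)  # assumes at most 4 options
    if m and m.group(1) in letters:
        return m.group(1)
    # Rule 2: the option text appears verbatim in the reply
    for letter, text in zip(letters, options):
        if text.lower() in answer.lower():
            return letter
    # Fallback: delegate to an LLM matcher (hypothetical placeholder
    # for the ChatGPT call described in the text)
    return llm_match(answer, options)

def llm_match(answer, options):
    return None  # stub: a real system would query ChatGPT here

opts = ["cat", "dog", "bird", "fish"]
print(extract_choice("The answer is (B).", opts))   # B
print(extract_choice("I think it's a dog.", opts))  # B
```

The point of the fallback is robustness: rule matching alone discards replies like "I think it's a dog", whereas a semantic matcher can still credit the model with the correct option.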

