MMBench

MMBench is a multimodal benchmark developed by researchers from Shanghai Artificial Intelligence Laboratory, Nanyang Technological University, The Chinese University of Hong Kong, National University of Singapore, and Zhejiang University. It defines a comprehensive evaluation pipeline that spans abilities from perception to cognition, covering 20 fine-grained abilities with roughly 3,000 multiple-choice questions collected from the internet and from established benchmark datasets. Departing from conventional rule-based answer extraction, MMBench repeatedly shuffles the answer options to verify that the model's output stays consistent, and uses ChatGPT to match free-form replies to the intended option.
Characteristics and advantages of MMBench
Evaluation dimensions subdivided step by step, from perception to reasoning. Roughly 3,000 multiple-choice questions cover 20 fine-grained dimensions, including object detection, text recognition, action recognition, image understanding, and relational reasoning.
A more robust evaluation method. The same single-choice question is asked repeatedly with the options shuffled, and the model passes only if all of its answers point to the same underlying option. Compared with the traditional single-pass evaluation, top-1 accuracy drops by 10% to 20% on average. Minimizing the impact of noise factors on the results makes the evaluation more reproducible.
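The consistency check described above can be sketched as follows. This is a minimal illustration, not MMBench's actual implementation: it assumes the model is a callable that returns an option letter, rotates the options once per round, and maps each answer back to the underlying option text.

```python
import string

def circular_eval(question, options, ask_model):
    """Ask the same multiple-choice question len(options) times,
    rotating the options each round. The model passes only if every
    round's answer maps back to the same underlying option."""
    chosen = set()
    for shift in range(len(options)):
        rotated = options[shift:] + options[:shift]
        letter = ask_model(question, rotated)       # e.g. returns "A"
        idx = string.ascii_uppercase.index(letter)  # letter -> position
        chosen.add(rotated[idx])                    # map back to option text
    return len(chosen) == 1

# Toy stand-in for a model: always picks the option containing "Paris"
def toy_model(question, options):
    for i, opt in enumerate(options):
        if "Paris" in opt:
            return string.ascii_uppercase[i]
    return "A"

opts = ["Paris", "London", "Berlin", "Madrid"]
print(circular_eval("Capital of France?", opts, toy_model))  # True
```

A model that answers by position rather than by content (for instance, one that always outputs "A") selects a different option in each rotated round and fails the check, which is exactly the noise this protocol is designed to catch.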
A more reliable way to extract the model's output. ChatGPT is used to match the model's reply against the options, so even when the model does not answer in the instructed format, the reply can still be matched to the most reasonable option.
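One way to picture this extraction step is a rule-based pass with an LLM fallback. The sketch below is an assumption about the general shape, not MMBench's code: `llm_match` is a hypothetical stub standing in for the ChatGPT matching call, and the rules only illustrate the easy cases that never need it.

```python
import re

def extract_choice(answer, options):
    """Map a free-form model reply to an option letter.
    Try cheap rule-based matching first; only unresolved replies
    would be sent to an LLM matcher (stubbed out here)."""
    letters = [chr(ord("A") + i) for i in range(len(options))]
    # Rule 1: a standalone option letter such as "B" or "(B)"
    m = re.search(r"\b([A-D])\b", answer)  # assumes at most 4 options
    if m and m.group(1) in letters:
        return m.group(1)
    # Rule 2: the option text appears verbatim in the reply
    for letter, text in zip(letters, options):
        if text.lower() in answer.lower():
            return letter
    # Fallback: delegate to an LLM matcher (hypothetical placeholder
    # for the ChatGPT call described in the text)
    return llm_match(answer, options)

def llm_match(answer, options):
    return None  # stub: a real system would query ChatGPT here

opts = ["cat", "dog", "bird", "fish"]
print(extract_choice("The answer is (B).", opts))   # B
print(extract_choice("I think it's a dog.", opts))  # B
```

The point of the fallback is robustness: rule matching alone discards replies like "I think it's a dog", whereas a semantic matcher can still credit the model with the correct option.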

