AI Model Evaluation
PubMedQA
Tags: AI Model Evaluation, AI Test
PubMedQA is a biomedical research question answering dataset that includes 1K expert-annotated, 61.2K unlabeled, and 211.3K artificially generated QA instances. The ranking currently includes medical test scores for 18 models.
Relevant Navigation
Chatbot Arena
Chatbot Arena is a benchmark platform for large language models (LLMs) that runs anonymous, randomized battles between models and ranks them using crowdsourced votes. The project is led by LMSYS Org, a research organization co-founded by the University of California, Berkeley, the University of California, San Diego, and Carnegie Mellon University.