Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-34B (CoT) | Text Score | 25.8 | #14 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-34B (CoT) | Video Score | 22.2 | #18 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-34B (CoT) | Group Score | 5.2 | #19 |
| Temporal Relation Extraction | Vinoground | Phi-3.5-Vision | Text Score | 24 | #16 |
| Temporal Relation Extraction | Vinoground | Phi-3.5-Vision | Video Score | 22.4 | #17 |
| Temporal Relation Extraction | Vinoground | Phi-3.5-Vision | Group Score | 6.2 | #17 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-7B | Text Score | 21.8 | #19 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-7B | Video Score | 25.6 | #13 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-7B | Group Score | 6.2 | #17 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-7B (CoT) | Text Score | 21.8 | #19 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-7B (CoT) | Video Score | 26.2 | #11 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-7B (CoT) | Group Score | 6.8 | #14 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-34B | Text Score | 23 | #18 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-34B | Video Score | 21.2 | #20 |
| Temporal Relation Extraction | Vinoground | LLaVA-NeXT-Video-34B | Group Score | 3.8 | #20 |
| Temporal Relation Extraction | Vinoground | Claude 3.5 Sonnet | Text Score | 32.8 | #10 |
| Temporal Relation Extraction | Vinoground | Claude 3.5 Sonnet | Video Score | 28.8 | #7 |
| Temporal Relation Extraction | Vinoground | Claude 3.5 Sonnet | Group Score | 10.6 | #9 |
