12-18 Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge
11-21 CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios