I am a research intern and incoming MSc student in Computer Vision at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), advised by Prof. Yutong Xie. Before that, I graduated from the University of Science and Technology of China (USTC) with a B.E. in Computer Science.
My research interests include computer vision, multimodal language models, and multimodal agents, especially in the context of AI for healthcare. I’m actively looking for collaboration and PhD opportunities; please feel free to drop me an email if you are interested.
") does not match the recommended repository name for your site ("").
", so that your site can be accessed directly at "http://".
However, if the current repository name is intended, you can ignore this message by removing "{% include widgets/debug_repo_name.html %}" in index.html.
",
which does not match the baseurl ("") configured in _config.yml.
baseurl in _config.yml to "".

Anonymous Authors
Submitted to MICCAI 2026
MemSurg is a memory-augmented framework for long-horizon surgical video understanding that leverages guided prompting and chain-of-thought reasoning. By constructing an external surgical memory graph from segmentation and motion cues, it retrieves task-relevant evidence across time and composes more coherent prompts for downstream inference. This design improves consistency on instrument, action, and workflow understanding, outperforming GPT-4o and by 22.42%.