Zhiran (Minerva) Zhang
Graduate Student
About Me

I am a current research intern and incoming MSc Computer Vision student at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), advised by Prof. Yutong Xie. Before that, I graduated from the University of Science and Technology of China (USTC) with a B.E. in Computer Science.

My research interests include computer vision, multimodal language models, and multimodal agents, especially in the context of AI for healthcare. I’m actively looking for collaboration and PhD opportunities; please feel free to email me if you are interested.

Education
  • Mohamed bin Zayed University of Artificial Intelligence
    M.Sc. in Computer Vision
    Aug. 2026 - present
  • University of Science and Technology of China
    B.E. in Computer Science
    Sep. 2021 - Jul. 2025
Honors & Awards
  • Outstanding, National Innovation Program
    2025
  • Bronze Academic Scholarship
    2022
  • Class of Honors Scholarship
    2021, 2022
  • Multiple Provincial and University Swimming Records
Experience
  • Mohamed bin Zayed University of Artificial Intelligence
    Research Intern
    Nov. 2025 - present
  • Stony Brook University
    Research Intern
    Dec. 2024 - Jun. 2025
  • iFLYTEK
    Software Engineer Intern
    Jul. 2024 - Oct. 2024
News
2026
Submitted my first conference paper to MICCAI 2026.
Jan 26
2025
Proud to receive my bachelor's degree from the University of Science and Technology of China!
Jun 22
Our Chinese National University Innovation Program project, "Research on Deepfake Video Generation Technology Based on Deep Learning", received the Outstanding award (the only one in our thesis defense group)!
May 21
Selected Publications
MemSurg: Memory-Augmented Long-Horizon Reasoning for Surgical Video Understanding

Anonymous Authors

Submitted to MICCAI 2026

MemSurg is a memory-augmented framework for long-horizon surgical video understanding that leverages guided prompting and chain-of-thought reasoning. By constructing an external surgical memory graph from segmentation and motion cues, it retrieves task-relevant evidence across time and composes more coherent prompts for downstream inference. This design improves consistency on instrument, action, and workflow understanding, outperforming GPT-4o by 22.42%.
