Homepage - Minerva's Homepage

Zhiran (Minerva) Zhang

Graduate Student

About Me

I am currently a research intern and an incoming M.Sc. student in Computer Vision at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), advised by Prof. Yutong Xie. Before that, I graduated from the University of Science and Technology of China (USTC) with a B.E. in Computer Science.

My research interests include computer vision, multimodal language models, and multimodal agents, especially in the context of AI for healthcare. I’m actively looking for collaboration and PhD opportunities. Please feel free to drop me an email if you are interested.

Education

Mohamed bin Zayed University of Artificial Intelligence

M.Sc. in Computer Vision

Aug. 2026 - present
University of Science and Technology of China

B.E. in Computer Science

Sep. 2021 - Jul. 2025

Honors & Awards

Outstanding, National Innovation Program

2025
Bronze Academic Scholarship

2022
Class of Honors Scholarship

2021, 2022
Multiple Provincial and University Swimming Records

Experience

Mohamed bin Zayed University of Artificial Intelligence

Research Intern

Nov. 2025 - present
Stony Brook University

Research Intern

Dec. 2024 - Jun. 2025
iFLYTEK

Software Engineer Intern

Jul. 2024 - Oct. 2024

News

2026

Submitted my first conference paper to MICCAI 2026.

Jan 26

2025

Proud to receive my bachelor's degree from the University of Science and Technology of China!

Jun 22

Our Chinese National University Innovation Program project, "Research on Deepfake Video Generation Technology Based on Deep Learning," received the Outstanding award (the only one in our thesis defense group)!

May 21

Selected Publications

MemSurg: Memory-Augmented Long-Horizon Reasoning for Surgical Video Understanding

Anonymous Authors

Submitted to MICCAI 2026

MemSurg is a memory-augmented framework for long-horizon surgical video understanding that leverages guided prompting and chain-of-thought reasoning. By constructing an external surgical memory graph from segmentation and motion cues, it retrieves task-relevant evidence across time and composes more coherent prompts for downstream inference. This design improves consistency on instrument, action, and workflow understanding, outperforming GPT-4o and by 22.42%.
