I am Shida Wang (王世炟), an M.S. student at the School of Artificial Intelligence and Data Science, University of Science and Technology of China (USTC), advised by Prof. Linli Xu. My research spans Multimodal Large Language Models, Video Understanding, Token Compression, and LLM Security, with a focus on building efficient and trustworthy vision-language systems. Outside research, I enjoy basketball and R&B music.

News

Jun 2026
Started as a dots · Ace Top Intern at Xiaohongshu (RED), Shanghai.
Mar 2026
Released SCORE (Dynamic Token Compression for video) on arXiv.
Feb 2026
Two papers (TableMix, DiG) accepted to CVPR 2026.
Oct 2025
Joined Tencent Youtu Lab as a Research Intern.
Aug 2025
Released FPEdit (robust LLM fingerprinting) on arXiv.

Research Interests

Multimodal LLMs · Video Understanding · Token Compression · Efficient Inference · LLM Security

Publications

SCORE
arXiv 2026
Shida Wang*, Yongxiang Hua*, Zhou Tao, Haoyu Cao, Linli Xu (* equal contribution)
An RL-learned policy compresses video tokens, giving a 16× prefill speedup while preserving ~99.5% of the original performance.
Multimodal Large Language Models have demonstrated remarkable capabilities in video understanding, yet face prohibitive computational costs and performance degradation from "context rot" due to massive visual token redundancy. Existing compression strategies typically rely on heuristics or fixed transformations that are often decoupled from the downstream task objectives. We propose SCORE (Surprise-augmented token COmpression via REinforcement learning), a unified framework that learns an adaptive token compression policy. SCORE introduces a lightweight policy network conditioned on a surprise-augmented state representation that incorporates inter-frame residuals to capture temporal dynamics and motion saliency, optimized via a group-wise reinforcement learning scheme with a split-advantage estimator and a static→real two-stage curriculum. SCORE achieves a 16× prefill speedup while preserving 99.5% of original performance at a 10% retention ratio.
TableMix
CVPR 2026
Chaohu Liu, Shida Wang, Yubo Wang, Linli Xu
A data-centric augmentation strategy that boosts multimodal table reasoning in MLLMs.
DiG
CVPR 2026
Zhou Tao*, Shida Wang*, Yongxiang Hua, Haoyu Cao, Linli Xu (* equal contribution)
MLLMs learn fine-grained perception by localizing all differences between similar image pairs.
Multimodal Large Language Models (MLLMs) have achieved impressive performance across vision-language tasks, yet their fine-grained visual perception and precise spatial reasoning remain limited. We introduce DiG (Differential Grounding), a proxy task framework where MLLMs learn fine-grained perception by identifying and localizing all differences between similar image pairs, without prior knowledge of the number of differences. To support scalable training, we develop an automated 3D-rendering-based data generation pipeline and employ curriculum learning for stable optimization. DiG significantly improves performance on RefCOCO, RefCOCO+, RefCOCOg, and general multimodal perception benchmarks.
FPEdit
arXiv 2025
Shida Wang, Chaohu Liu, Yubo Wang, Linli Xu
Knowledge editing injects robust natural-language fingerprints into LLMs in under 2 minutes.
We introduce FPEdit, a framework that leverages knowledge editing to inject semantically coherent natural language fingerprints through sparse, targeted modifications to model weights. Our Promote-Suppress Value Vector Optimization achieves 95–100% fingerprint retention under full-parameter fine-tuning and parameter-efficient adaptation, remaining robust under quantization, pruning, and stochastic decoding. FPEdit embeds 10 fingerprint pairs into LLaMA2-7B in under 2 minutes using less than 30 GB GPU memory.

Internships

June 2026 – Present
dots · Ace Top Intern
Xiaohongshu (RED) · Shanghai, China
Oct. 2025 – May 2026
Research Intern
Tencent Youtu Lab · Hefei, China

Education

Sept. 2024 – Present
M.S. in Artificial Intelligence
USTC · School of Artificial Intelligence and Data Science, Hefei
Sept. 2021 – July 2024
B.S. in Artificial Intelligence
USTC · School of Artificial Intelligence and Data Science, Hefei
Sept. 2020 – July 2021
B.S. in Management (transferred)
USTC · School of Management, Hefei

Honors & Awards

2023 China Petroleum Scholarship · Top 1
2023 Soong Ching Ling Future Scholarship
2022 Third Prize, Contemporary Undergraduate Mathematical Contest in Modeling, Anhui Province
2019 First Prize, National High School Mathematics Competition, Jilin Province
2019 Second Prize, National High School Physics Competition, Jilin Province
2019 Second Prize, China Chemistry Olympiad (Preliminary Round), Jilin Province