✨ About me

Hi, everyone! I am currently a second-year PhD student (10.2024-) in the NLP group, School of Informatics, King’s College London, where I am fortunate to be supervised by Dr. Lin Gui and Prof. Yulan He. I completed my MSc in AI at the University of Edinburgh and my BEng in EEE jointly at the University of Edinburgh and North China Electric Power University (NCEPU). I was fortunate to be supervised by Prof. Frank Keller for my MSc and Dr. Jiabin Jia for my BEng.

In addition to research, I interned for four months as a full-stack engineer specialising in voice cloning algorithms at 01.AI. I am currently a Qingyun intern at Tencent Yuanbao working on agent memory. Feel free to reach out for a chat!

🔍 Research Summary

My research interests lie at the intersection of Natural Language Processing and multi-modal understanding, with a focus on aligning retrieval, reasoning, and model-internal representations for reliable and efficient language models. Broadly, my goals are: 1) advancing retrieval-augmented and embedding-based frameworks that robustly connect retrieved evidence with LLMs, improving answer faithfulness and controllability in long-context generation; and 2) developing robust cross-modality alignment and comprehension abilities in real-world, interactive communication environments. My long-term vision is to build principled AI systems that unify retrieval, reasoning, and multi-modal understanding into trustworthy, efficient, and human-aligned communicative agents. Specifically, my latest research focuses on:

Agent Memory and Retrieval for LLM Generation: My main research focuses on how retrieval should support language generation, spanning classic retrieval-augmented generation (RAG) and more recent agent memory settings. In early work, I studied open-domain QA and retrieval-aware generation, including efficient question–answer representations (EEE-QA, COLING 2024), embedding-level retrieval control for open-domain QA (EmbQA, ACL 2025 Main), and reader-aware retrieval summarisation (Spectrum Projection Score (SPS), AAAI 2026 Oral). More recently, I have extended this research line to agent memory, formalising it as a structured retrieval problem beyond standard RAG (xMemory, arXiv 2026.02). Across these settings, I study how evidence can be selected, organised, and aligned with LLM representations to improve generation quality, efficiency, and faithfulness.
Efficient Embedding-Level Control in Latent Space: I also study how language model behaviour can be made more efficient, controllable, and robust beyond explicit token-level prompting. My work in this area explores latent spaces and continuous mechanisms, especially methods that compress reasoning (CODI, EMNLP 2025 Main), guide internal representations, or improve robustness through representation-level control (OSCR-Attack, ACL 2026 Findings).
Multimodal Understanding and Generation: Another strand of my work focuses on multimodal understanding and generation, including visual question generation (Causal and temporal inference in video QA, ACL ALVR 2024), visual reasoning, visual localisation, decoding enhancement, and human motion generation (Human motion video generation, TPAMI 2025).

🔥 News

2026.04: 🧑‍💻 Started an internship at Tencent Yuanbao as a Qingyun intern working on agent memory! 🎉
2026.04: Our paper OSCR-Attack: One-Shot Character Level Attacks through Self-Optimizing Continuous Relaxation has been accepted by ACL 2026 Findings! 🎉
2025.11: Our paper Spectrum Projection Score: Aligning Retrieved Summaries with Reader Models in Retrieval-Augmented Generation has been accepted by AAAI 2026 Oral🌟! 🎉
2025.08: Our paper CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation has been accepted by EMNLP 2025 Main! 🎉
2025.07: Our paper Human motion video generation: A survey has been accepted by TPAMI 2025! 🎉
2025.05: Our paper Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering has been accepted by ACL 2025 Main! 🎉
2024.10: I started my PhD 📚 journey in the NLP group at King's College London!
2024.06: Our paper Causal and Temporal Inference in Visual Question Generation by Utilizing Pre-trained Models has been accepted by ACL ALVR 2024! 🎉
2024.02: Our paper EEE-QA: Exploring Effective and Efficient Question-Answer Representations has been accepted by COLING 2024! 🎉
2023.12: Our paper EEE-QA: Exploring Effective and Efficient Question-Answer Representations has been accepted by AAAI 2024 Deployable AI! 🎉

🚀 I am always open to new collaborations and engaging discussions. Feel free to reach out if you are interested in working together or just want to chat!

😆 Mentees

LLM Safety
- Lingyi Kong (LLM Security and Attack)
Multi-modal Alignment
- Zipeng Zhu (Image Editing)

💬 Invited Talks

02/2026. Queen Mary University of London, NLP Group

📚 Text Generation & Retrieval/RAG

arXiv 02.2026

Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation

Zhanghao Hu, Qinglin Zhu, Hanqi Yan, Yulan He, Lin Gui

[Project] [Code]

Standard RAG top-k is misaligned with agent memory, yielding redundant retrieval and brittle pruning in correlated dialogue streams.
Introduced decoupling→aggregation: semantic-component indexing with guided hierarchy building, plus structure-driven top-down retrieval with uncertainty-gated expansion.
Improved QA quality and token efficiency on LoCoMo/PerLTQA across multiple LLM backbones, validating component-level retrieval over top-k+pruning.

AAAI 2026 Oral

Beyond Perplexity: Let the Reader Select Retrieval Summaries via Spectrum Projection Score

Zhanghao Hu, Qinglin Zhu, Siya Qi, Yulan He, Hanqi Yan, Lin Gui

[Project] [Code]

Proposes SPS, a supervision-free metric to assess semantic alignment between retrieved summaries and LLM representations.
Introduces xCompress, an inference-time controller that ranks and compresses retrievals to improve generation and clarify retrieval–generation interaction.

ACL 2025 Main

Beyond Prompting: An Efficient Embedding Framework for Open-Domain Question Answering

Zhanghao Hu, Hanqi Yan, Qinglin Zhu, Zhenyi Shen, Yulan He, Lin Gui

[Project] [Code]

Reorders retrieved passages to highlight those most likely to contain correct answers, refining query representations via lightweight linear layers under an unsupervised contrastive learning objective.
Introduces an exploratory embedding that broadens the model’s latent semantic space to diversify candidate generation, and employs an entropy-based selection mechanism to choose the most confident answer automatically.

COLING 2024

EEE-QA: Exploring Effective and Efficient Question-Answer Representations

Zhanghao Hu, Yijun Yang, Junjie Xu*, Yifu Qiu, Pinzhen Chen

[Project]

This work challenges the existing question-answer encoding convention and explores finer representations. We experiment with different PLMs, both with and without the integration of knowledge graphs. Results show that the proposed techniques improve memory efficiency with little sacrifice in performance.

🤔 Latent & Efficient Reasoning

EMNLP 2025 Main

CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation

Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He

[Project]

CODI (Continuous Chain-of-Thought via Self-Distillation) is a novel framework that distils CoT into a continuous space, where a shared model acts as both teacher and student, jointly learning explicit and implicit CoT while aligning their hidden activation on the token generating the final answer.

Multi-modal interpretability & application

TPAMI 2025

Human motion video generation: A survey

Haiwei Xue, Xiangyang Luo, Zhanghao Hu, Xin Zhang, Xunzhi Xiang, Yuqin Dai, Jianzhuang Liu, Zhensong Zhang, Minglei Li, Jian Yang, Fei Ma, Zhiyong Wu, Changpeng Yang, Zonghong Dai, Fei Richard Yu

[Project]

This paper provides an in-depth survey of human motion video generation, covering over ten sub-tasks and detailing the five key phases of the generation process: input, motion planning, motion video generation, refinement, and output.

ACL ALVR 2024

Causal and Temporal Inference in Visual Question Generation by Utilizing Pre-trained Models

Zhanghao Hu, Frank Keller

[Project]

Our study introduces a framework that leverages vision-text matching pre-trained models to guide language models in recognizing event-entity relationships within videos and generating inferential questions.

Professional Service

Volunteer:
- AAAI 2026
Reviewer:
- NLP: EMNLP 2025, ACL 2026
- AI/ML: AAAI 2026, ICML 2026

🎖 Honors and Awards

2023.01 Shortlisted by IBM for Best Project in the Machine Learning Practical course, ranked 5/103
2022.06 Outstanding Graduate Award, NCEPU
2022.05 £3000 Scholarship, University of Edinburgh, For Excellent 2+2 International Students
2021.05 £2500 Scholarship, University of Edinburgh, For Excellent 2+2 International Students
2020.10 Third Prize Academic Scholarship, NCEPU, Awarded to top 10% of students
2019.10 Second Prize Academic Scholarship, NCEPU, Awarded to top 5% of students

📖 Education

2022.09 – 2023.11, MSc in Artificial Intelligence, University of Edinburgh, Distinction, ranked top ~10%
2020.09 – 2022.05, Bachelor in Electronics and Electrical Engineering, University of Edinburgh, First-Class Honours, ranked top ~10%
2018.09 – 2020.06, Bachelor in Electrical Engineering and Its Automation, North China Electric Power University (NCEPU), ranked top ~15%

💻 Internships

2024.04 - 2024.08, Research Intern at 01.AI.

Zhanghao Hu
