
Upcoming Event

Feature Representations for Vision and Language: Towards Deeper Video Understanding

This talk covers three research topics centered on deeper video understanding through Transformer-based feature representations. The research proposes improved systems for the video question answering and humor prediction tasks and validates them on video-related data. For video QA, BERT is used to represent visual and subtitle semantics, improving accuracy on the TVQA and PororoQA datasets. A comparative study of Transformer models then links their performance differences to their pre-training methods. For humor prediction, a novel multimodal method that combines pose, face, and subtitle features over a sliding window outperforms previous approaches on a new comedy dataset. The work highlights the importance of selecting optimal features and models for deeper video analysis.
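
As a rough illustration of the sliding-window multimodal idea mentioned above, the sketch below concatenates per-frame pose, face, and subtitle features over a window and scores each window with a small classifier. This is not the speaker's implementation: the window size, feature dimensions, and all names are illustrative assumptions, and the pose, face, and subtitle features (e.g., BERT subtitle embeddings) are assumed to be precomputed.

```python
import torch
import torch.nn as nn

class SlidingWindowHumorClassifier(nn.Module):
    """Toy sketch: fuse pose, face, and subtitle features over a sliding
    window and emit one humor logit per window. All dimensions are
    illustrative assumptions, not the settings from the talk."""

    def __init__(self, pose_dim=34, face_dim=128, text_dim=768,
                 window=16, hidden=256):
        super().__init__()
        self.window = window
        fused_dim = pose_dim + face_dim + text_dim
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim * window, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # logit: humorous vs. not
        )

    def forward(self, pose, face, text):
        # pose: (T, pose_dim), face: (T, face_dim), text: (T, text_dim)
        fused = torch.cat([pose, face, text], dim=-1)   # (T, fused_dim)
        windows = fused.unfold(0, self.window, 1)       # (T-w+1, fused_dim, w)
        windows = windows.transpose(1, 2).flatten(1)    # (T-w+1, w*fused_dim)
        return self.classifier(windows).squeeze(-1)     # one logit per window

# Example with 100 frames of random features:
model = SlidingWindowHumorClassifier()
logits = model(torch.randn(100, 34), torch.randn(100, 128), torch.randn(100, 768))
print(logits.shape)  # torch.Size([85]) -- one score per 16-frame window
```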


Presenter

Prof. Zekun YANG, Tokyo University of Science


Date

March 12, 2026 (Thursday)


Time

11:00 am (HK Time)


Venue

CPD-3.01, Run Run Shaw Tower, Centennial Campus


Presenter's Biography

Zekun YANG is an Assistant Professor at Tokyo University of Science. He graduated from the University of Osaka in 2021 and has previously worked at Donghua University (China) and Nagoya University (Japan). His research areas include Machine Learning, Intelligent Systems and Applications, and Multimedia Information Processing.

