
Yang Hao

Ph.D. Student
  • hyang@ir.hit.edu.cn
  • https://github.com/yhit98

About Me

I am a Ph.D. student at the Research Center for Social Computing and Information Retrieval (SCIR), Harbin Institute of Technology (HIT, China). My advisors are Prof. Yanyan Zhao and Prof. Bing Qin. My research interests include multimodal fine-grained sentiment analysis, multimodal large language models, and multimodal data generation.

Publications

2024
arXiv

From Unimodal to Multimodal: A Framework for Generating High-Quality Multimodal Emotional Chit-chat Dialogue

Hao Yang, Yanyan Zhao, Yang Wu, Shilong Wang, Tian Zheng, Hongbo Zhang, Bing Qin
  • Large language models (LLMs) have demonstrated formidable capabilities in both task-oriented and chit-chat dialogues. However, when extended to large vision-language models (LVLMs), we find that LVLMs excel at objectively describing image content, while their ability to conduct subjective multimodal emotional chit-chat (MEC) dialogue is insufficient, a shortfall we attribute to the scarcity of high-quality MEC data. Collecting and annotating high-quality MEC dialogue data would help, but the cost of doing so makes acquiring large-scale data challenging. To address this gap, we introduce U2MEC, an adversarial LLM-based data augmentation framework for generating MEC dialogue data from unimodal data.
2024
arXiv

Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey

Hao Yang, Yanyan Zhao, Yang Wu, Shilong Wang, Tian Zheng, Hongbo Zhang, Bing Qin
  • This survey presents a comprehensive review of recent research on text-centric multimodal sentiment analysis tasks, examines the potential of LLMs for text-centric multimodal sentiment analysis by outlining their approaches, advantages, and limitations, summarizes the application scenarios of LLM-based multimodal sentiment analysis technology, and explores the challenges and potential research directions for multimodal sentiment analysis in the future.
2024
arXiv

An Early Evaluation of GPT-4V(ision)

Yang Wu, Shilong Wang, Hao Yang, Tian Zheng, Yanyan Zhao, Bing Qin
  • In this paper, we evaluate different abilities of GPT-4V including visual understanding, language understanding, visual puzzle solving, and understanding of other modalities such as depth, thermal, video, and audio.
2023
arXiv

UNIMO-3: Multi-granularity Interaction for Vision-Language Representation Learning

Hao Yang, Can Gao, Hao Liu, Xinyan Xiao, Yanyan Zhao, Bing Qin
  • We propose UNIMO-3, a model that simultaneously learns multimodal in-layer and cross-layer interactions. UNIMO-3 establishes effective connections between different layers in a cross-modal encoder and adaptively captures the interaction between the two modalities at different levels.
2022
EMNLP

Face-Sensitive Image-to-Emotional-Text Cross-modal Translation for Multimodal Aspect-based Sentiment Analysis

Hao Yang, Yanyan Zhao, Bing Qin
  • Proposed a cross-modal translation approach that explicitly utilizes facial expressions as visual emotional cues in open-domain images. Introduced a fine-grained cross-modal alignment method based on CLIP, achieving alignment and matching between textual sentiment targets and facial expressions in images. The method achieved state-of-the-art results on the Twitter-15 and Twitter-17 datasets.
2022
ESI

MACSA: A Multimodal Aspect-Category Sentiment Analysis Dataset with Multimodal Fine-grained Aligned Annotations

Hao Yang, Yanyan Zhao, Yang Wu, Bing Qin
  • Constructed MACSA, the first Chinese sentiment analysis dataset with fine-grained cross-modal alignment annotations for both text and images. Proposed a cross-modal fine-grained alignment annotation scheme based on aspect categories, mitigating the issues of weakly supervised text-image alignment and missing sentiment targets in text. On the MACSA dataset, introduced a cross-modal alignment fusion network based on multimodal heterogeneous graphs, achieving state-of-the-art results.
2022
Journal of Computer Research and Development

Survey: Multimodal Sentiment Analysis

Yanyan Zhao, Hao Yang, Yang Wu, Zhenyu Zhang, Bing Qin
  • Reviewed and summarized relevant research in multimodal sentiment analysis and proposed a research framework for fine-grained multimodal sentiment analysis.
2021
ACL Findings

Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors

Yang Wu, Yanyan Zhao, Hao Yang, Song Chen, Bing Qin, Xiaohuan Cao, Wenting Zhao
  • We propose the sentiment word aware multimodal refinement model (SWRM), which dynamically refines erroneous sentiment words by leveraging multimodal sentiment clues. We conduct extensive experiments on real-world datasets including MOSI-Speechbrain, MOSI-IBM, and MOSI-iFlytek, and the results demonstrate the effectiveness of our model, which surpasses the current state-of-the-art models on all three datasets. Furthermore, our approach can easily be adapted to other multimodal feature fusion models.

Education

  • Ph.D. in Computer Science
    Harbin Institute of Technology
    2020 - Present
  • Visiting Student
    Nanyang Technological University
    2024 - Present
  • B.Sc. in Software Engineering
    Harbin Institute of Technology
    2016 - 2020

Internship

Baidu

2022.08 - 2023.05
TPG, prospective research on vision-and-language pre-trained models

Awards

  • The 2018 ACM-ICPC Asia Beijing Regional Contest
    Silver Medal, 14th
    2018
  • The 2019 ACM-ICPC Shandong Province Contest
    Gold Medal, 18th
    2019

Skills

  • Python
  • C++
  • PyTorch
  • TensorFlow