Adaptive Gradient Harmonization: Mitigating Modality Dominance in Unified Representation Learning

Jan 23, 2025

Lakshya

Abstract

In unified representation learning, distinct modalities often compete for optimization bandwidth, leading to a phenomenon known as ‘modality laziness’, or modality dominance. This paper introduces Adaptive Gradient Harmonization, a novel approach to mitigating this issue. We design the Modality Fairness Controller (MFC) to dynamically balance learning rates across the vision and audio modalities based on their real-time training dynamics. Furthermore, we propose the Overfitting-to-Generalization Ratio (OGR) as a new metric for evaluating multimodal model health. Our experiments on the CIFAR-10 and CREMA-D benchmarks demonstrate a 2.3% accuracy boost and a 40% reduction in modality imbalance.

Publication

Under Review

Overview

This research addresses the challenge of modality dominance in multimodal neural networks. When trained on diverse data types (e.g., audio and vision), models often prioritize the modality that is “easier” to learn and under-optimize the other.

Key Contributions

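Modality Fairness Controller (MFC): A dynamic mechanism that adjusts per-modality learning rates in real time, based on each modality's training dynamics.
Overfitting-to-Generalization Ratio (OGR): A new metric for evaluating the health of unified multimodal models.
Benchmark Results: A 2.3% accuracy boost and a 40% reduction in modality imbalance on CIFAR-10 and CREMA-D.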
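To make the idea behind the MFC concrete, here is a minimal sketch of one way per-modality learning rates could be rebalanced. It is not the paper's implementation: the function name `harmonized_learning_rates`, the per-modality progress `scores`, the `alpha` strength parameter, and the tanh damping rule are all assumptions chosen for illustration.

```python
import math


def harmonized_learning_rates(base_lr, scores, alpha=1.0):
    """Return a learning rate per modality, slowing down modalities that lead.

    base_lr: shared base learning rate.
    scores:  dict of modality name -> positive progress score (hypothetical,
             e.g. mean confidence on the correct class). Needs >= 2 entries.
    alpha:   hypothetical knob controlling how strongly a leader is damped.
    """
    rates = {}
    for name, score in scores.items():
        others = [s for n, s in scores.items() if n != name]
        mean_other = sum(others) / len(others)
        ratio = score / mean_other
        # ratio > 1 means this modality is ahead of the rest, so damp it;
        # modalities at or behind the average keep the full base rate.
        damping = 1.0 - math.tanh(alpha * max(0.0, ratio - 1.0))
        rates[name] = base_lr * damping
    return rates


# Example: vision is learning faster than audio, so only vision is slowed.
rates = harmonized_learning_rates(0.01, {"vision": 0.8, "audio": 0.4})
```

In a training loop, a controller like this would be re-evaluated every few steps and the resulting rates written into the optimizer's per-modality parameter groups, so the lagging modality keeps its full step size while the dominant one is held back.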

Last updated on Jan 23, 2025

Multimodal Learning, Deep Learning, Optimization

Authors

Lakshya (he/him)

AI Researcher & Systems Engineer

M.Tech AI & ML student specializing in Multimodal Representation Learning and Advanced Systems Engineering.
