Description
AffectFusion is a real-time multimodal emotion recognition system that combines facial expressions, speech signals, and behavioral patterns to detect emotional states and engagement levels. The system uses a webcam and microphone to provide live analysis through an intuitive visual overlay.
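A minimal sketch of what such a live capture-and-overlay loop can look like, assuming OpenCV for webcam access; the `analyze`-style call and the overlay text are placeholders, not the project's actual API:

```python
import cv2

cap = cv2.VideoCapture(0)                        # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (640, 360))        # working resolution used by the system
    label, conf = "Neutral", 0.0                 # would come from the fusion step (see below)
    cv2.putText(frame, f"{label} ({conf:.2f})", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("AffectFusion", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):        # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```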
The framework integrates three specialized deep learning models: a ResNet-101 trained on the DFEW dataset for facial valence detection (Positive/Negative), a MediaPipe FaceMesh-based engagement detector trained on the DAiSEE dataset (classifying Engagement, Boredom, Confusion, and Frustration), and a 1D CNN trained on RAVDESS for speech emotion recognition.
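A hypothetical sketch of how the three models could be invoked per frame; the model keys, `predict` interface, and label sets shown for each dataset are assumptions for illustration, not AffectFusion's actual code:

```python
import numpy as np

VALENCE_LABELS = ["Negative", "Positive"]                                   # ResNet-101 / DFEW
ENGAGEMENT_LABELS = ["Engagement", "Boredom", "Confusion", "Frustration"]   # FaceMesh / DAiSEE
SPEECH_LABELS = ["neutral", "calm", "happy", "sad",
                 "angry", "fearful", "disgust", "surprised"]                # 1D CNN / RAVDESS

def analyze(face_crop: np.ndarray, landmarks: np.ndarray,
            audio_window: np.ndarray, models: dict) -> dict:
    """Run all three modality models and return their class probabilities."""
    return {
        "valence": models["resnet101_dfew"].predict(face_crop),       # shape (2,)
        "engagement": models["daisee_facemesh"].predict(landmarks),   # shape (4,)
        "speech": models["ravdess_cnn1d"].predict(audio_window),      # shape (8,)
    }
```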
A fusion algorithm combines the facial and audio predictions, resolving disagreements through confidence-weighted voting and uncertainty detection. The system processes video at 640×360 resolution with rolling buffers for temporal analysis, providing real-time feedback on emotional state, confidence levels, and engagement status. This multimodal approach yields more robust emotion recognition than single-modality systems.
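A minimal sketch of confidence-weighted fusion with uncertainty detection and rolling buffers. It assumes both modalities have already been mapped to a shared coarse label space; the buffer length, threshold, and tie-breaking rule are illustrative assumptions rather than the project's exact fusion logic:

```python
from collections import deque
import numpy as np

LABELS = ["Negative", "Neutral", "Positive"]   # assumed shared coarse label space
face_buffer: deque = deque(maxlen=30)          # rolling buffers for temporal analysis
audio_buffer: deque = deque(maxlen=30)

def fuse(face_probs: np.ndarray, audio_probs: np.ndarray,
         uncertainty_threshold: float = 0.55) -> tuple[str, float]:
    """Fuse facial and speech probabilities (both over LABELS) into one label."""
    face_buffer.append(face_probs)
    audio_buffer.append(audio_probs)

    # Smooth each modality over its rolling buffer.
    face_avg = np.mean(face_buffer, axis=0)
    audio_avg = np.mean(audio_buffer, axis=0)

    face_idx, face_conf = int(face_avg.argmax()), float(face_avg.max())
    audio_idx, audio_conf = int(audio_avg.argmax()), float(audio_avg.max())

    # Uncertainty detection: neither modality is confident enough to commit.
    if max(face_conf, audio_conf) < uncertainty_threshold:
        return "Uncertain", max(face_conf, audio_conf)

    # Agreement: report the shared label with the averaged confidence.
    if face_idx == audio_idx:
        return LABELS[face_idx], (face_conf + audio_conf) / 2

    # Disagreement: confidence-weighted vote, the more confident modality wins.
    winner_idx = face_idx if face_conf >= audio_conf else audio_idx
    return LABELS[winner_idx], max(face_conf, audio_conf)
```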
🔗 Links & References
- GitHub: AffectFusion