Description
AffectFusion is a real-time multimodal emotion recognition system that combines facial expressions, speech signals, and behavioral patterns to detect emotional states and engagement levels. The system uses a webcam and microphone to provide live analysis through an intuitive visual overlay.
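A minimal sketch of what such a live capture-and-overlay loop can look like, assuming OpenCV for webcam access; the `analyze`-style call and the overlay text are placeholders, not the project's actual API:

```python
import cv2

cap = cv2.VideoCapture(0)                        # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (640, 360))        # working resolution used by the system
    label, conf = "Neutral", 0.0                 # would come from the fusion step (see below)
    cv2.putText(frame, f"{label} ({conf:.2f})", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("AffectFusion", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):        # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```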
The framework integrates three specialized deep learning models: a ResNet-101 trained on the DFEW dataset for facial valence detection (Positive/Negative), a MediaPipe FaceMesh-based engagement detector trained on the DAiSEE dataset (classifying Engagement, Boredom, Confusion, and Frustration), and a 1D CNN trained on RAVDESS for speech emotion recognition.
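A hypothetical sketch of how the three models could be invoked per frame; the model keys, `predict` interface, and label sets shown for each dataset are assumptions for illustration, not AffectFusion's actual code:

```python
import numpy as np

VALENCE_LABELS = ["Negative", "Positive"]                                   # ResNet-101 / DFEW
ENGAGEMENT_LABELS = ["Engagement", "Boredom", "Confusion", "Frustration"]   # FaceMesh / DAiSEE
SPEECH_LABELS = ["neutral", "calm", "happy", "sad",
                 "angry", "fearful", "disgust", "surprised"]                # 1D CNN / RAVDESS

def analyze(face_crop: np.ndarray, landmarks: np.ndarray,
            audio_window: np.ndarray, models: dict) -> dict:
    """Run all three modality models and return their class probabilities."""
    return {
        "valence": models["resnet101_dfew"].predict(face_crop),       # shape (2,)
        "engagement": models["daisee_facemesh"].predict(landmarks),   # shape (4,)
        "speech": models["ravdess_cnn1d"].predict(audio_window),      # shape (8,)
    }
```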
A fusion algorithm combines the facial and audio predictions, resolving disagreements through confidence-weighted voting and uncertainty detection. The system processes video at 640×360 resolution with rolling buffers for temporal analysis, providing real-time feedback on emotional state, confidence levels, and engagement status. This multimodal approach yields more robust emotion recognition than single-modality systems.
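A minimal sketch of confidence-weighted fusion with uncertainty detection and rolling buffers. It assumes both modalities have already been mapped to a shared coarse label space; the buffer length, threshold, and tie-breaking rule are illustrative assumptions rather than the project's exact fusion logic:

```python
from collections import deque
import numpy as np

LABELS = ["Negative", "Neutral", "Positive"]   # assumed shared coarse label space
face_buffer: deque = deque(maxlen=30)          # rolling buffers for temporal analysis
audio_buffer: deque = deque(maxlen=30)

def fuse(face_probs: np.ndarray, audio_probs: np.ndarray,
         uncertainty_threshold: float = 0.55) -> tuple[str, float]:
    """Fuse facial and speech probabilities (both over LABELS) into one label."""
    face_buffer.append(face_probs)
    audio_buffer.append(audio_probs)

    # Smooth each modality over its rolling buffer.
    face_avg = np.mean(face_buffer, axis=0)
    audio_avg = np.mean(audio_buffer, axis=0)

    face_idx, face_conf = int(face_avg.argmax()), float(face_avg.max())
    audio_idx, audio_conf = int(audio_avg.argmax()), float(audio_avg.max())

    # Uncertainty detection: neither modality is confident enough to commit.
    if max(face_conf, audio_conf) < uncertainty_threshold:
        return "Uncertain", max(face_conf, audio_conf)

    # Agreement: report the shared label with the averaged confidence.
    if face_idx == audio_idx:
        return LABELS[face_idx], (face_conf + audio_conf) / 2

    # Disagreement: confidence-weighted vote, the more confident modality wins.
    winner_idx = face_idx if face_conf >= audio_conf else audio_idx
    return LABELS[winner_idx], max(face_conf, audio_conf)
```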
🔗 Links & References
- GitHub: AffectFusion