Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations
Two metrics are proposed to evaluate automatic emotion recognition (AER) performance under automatic segmentation, based on time-weighted emotion and speaker classification errors.
Tags: Paper and LLMs
Pricing Type
- Pricing Type: Free
GitHub Link
The GitHub link is https://github.com/w-wu/steer
Introduction
The repository “W-Wu/sTEER” contains the code accompanying the paper “Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations”. The paper introduces a jointly trained system that combines emotion recognition, speech recognition, and speaker diarisation, and proposes two evaluation metrics: the Time-weighted Emotion Error Rate (TEER) and the speaker-attributed Time-weighted Emotion Error Rate (sTEER). The repository provides instructions and tools for data preparation, training, testing, and evaluation using Python, PyTorch, and SpeechBrain, and includes references for citing the work. Note that results may differ slightly across runs because of the behaviour of PyTorch’s CTC loss function, whose CUDA backward pass is non-deterministic.
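As context for that caveat, here is a minimal, self-contained sketch of PyTorch’s CTC loss. It is not taken from the repository, and all tensor shapes are illustrative; it only shows the operation whose non-deterministic CUDA backward pass can cause small run-to-run differences.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: T frames, N batch items, C output classes, S target length.
T, N, C, S = 50, 4, 30, 10

ctc = nn.CTCLoss(blank=0)

# Random log-probabilities standing in for acoustic model outputs, shape (T, N, C).
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)
# Random target label sequences; labels 1..C-1, since index 0 is the blank.
targets = torch.randint(1, C, (N, S), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
# On GPU, the backward pass of CTC loss is not bitwise deterministic,
# so gradients (and therefore trained weights) can vary slightly between runs.
loss.backward()
```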
Content
Two metrics are proposed to evaluate emotion classification performance with automatic segmentation:

- Time-weighted Emotion Error Rate (TEER)
- speaker-attributed Time-weighted Emotion Error Rate (sTEER)
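The exact scoring is defined in the paper and repository. Purely as an illustration of the time-weighting idea, the hypothetical sketch below weights emotion classification errors by segment duration; the function name and segment format are invented for this example.

```python
# Hypothetical sketch of a time-weighted emotion error. This is NOT the
# official TEER/sTEER implementation, only an illustration of weighting
# classification errors by how long each segment lasts.

def time_weighted_emotion_error(segments):
    """segments: list of (duration_sec, ref_emotion, hyp_emotion) tuples."""
    total = sum(dur for dur, _, _ in segments)
    wrong = sum(dur for dur, ref, hyp in segments if ref != hyp)
    return wrong / total if total > 0 else 0.0

segs = [
    (2.5, "happy", "happy"),
    (1.0, "angry", "neutral"),   # 1.0 s of mis-classified speech
    (3.0, "neutral", "neutral"),
]
print(time_weighted_emotion_error(segs))  # 1.0 / 6.5 ≈ 0.154
```

By analogy with speaker-attributed WER, sTEER additionally scores whether each emotion label is attributed to the correct speaker, so a segment with the right emotion but the wrong speaker would still count against the system.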
