Music Emotion Recognition
To what degree can computers recognise emotions in music?
Cameron Anderson and I recently took stock of how well computers can recognise emotions in music (Eerola & Anderson, 2026). This meta-analysis covered all relevant studies from the last 10 years (34 in all, containing 290 models) to see how well computational models can predict feelings such as excitement, relaxation, happiness, or sadness from music.
In this area, called Music Emotion Recognition (MER), scholars typically use machine learning to train computers to detect emotional patterns in music. The most common targets of these models are two main dimensions of affect, called arousal (energy level) and valence (positive versus negative feeling). Models tend to be very good at detecting how energetic a song is, for example distinguishing a high-energy dance track from a calm lullaby; across all studies in the meta-analysis, arousal is generally predicted to a high degree (r = .81). Detecting whether music feels emotionally positive or negative is much more difficult, although models are improving at this task (r = .67). When models simply classify songs into categories like “happy” or “sad”, they are correct about 87% of the time in the overall results of the meta-analysis.
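
To make the typical modelling setup concrete, here is a minimal, hypothetical sketch of an arousal regression pipeline in Python. The synthetic feature matrix, the ridge regressor, and the dataset size are illustrative assumptions rather than the setup of any particular study in the review; real models first extract features (tempo, dynamics, spectral descriptors) or learned embeddings from audio.

```python
# Hypothetical MER regression sketch: predict continuous arousal
# ratings from audio features. Features and ratings are simulated;
# real studies would compute them from audio and listener annotations.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_tracks, n_features = 200, 10                      # made-up dataset size
X = rng.normal(size=(n_tracks, n_features))         # stand-in audio features
true_weights = rng.normal(size=n_features)
arousal = X @ true_weights + rng.normal(scale=0.5, size=n_tracks)  # simulated ratings

X_train, X_test, y_train, y_test = train_test_split(X, arousal, random_state=0)
model = Ridge(alpha=1.0).fit(X_train, y_train)

# Pearson's r between predicted and observed ratings is the effect
# size that the meta-analysis summarises across studies.
r, _ = pearsonr(model.predict(X_test), y_test)
print(f"arousal prediction r = {r:.2f}")
```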

We also found, surprisingly, that simpler models based on domain-expert knowledge performed just as well as state-of-the-art deep learning models. This suggests that complex models are not always needed to capture the emotional qualities of music, or alternatively, that we do not yet have datasets large enough to train sufficiently fine-grained deep learning models.
How can this research be improved?
Despite encouraging progress, we encountered quite a few shortcomings in how the studies were designed and reported. In light of this diversity and occasional lack of standards, we offer the following recommendations for future studies:
- Greater musical diversity: The majority of models are trained mainly on Western pop music. Including more genres and cultures would help computers learn a broader range of musical emotions.
- Clearer evaluation standards: Researchers should use consistent ways of measuring performance, making it easier to compare different models. Currently there are many metrics in use, and comparison is difficult. We recommend the Matthews correlation coefficient (MCC) for classification studies over common metrics like F1 or precision, as it is more robust for binary and multi-class evaluations. For regression tasks, R squared is recommended as the most transparent, scale-invariant measure for comparing models (a short sketch of both metrics follows this list).
- Open science practices: Sharing data and code would allow other researchers to replicate results and improve existing models. While this was sometimes done (xx%) in the studies included in this analysis, this level of transparency is not sufficiently high in a field that is fundamentally about data and transparent models.
- Richer emotion descriptions: Instead of simple labels like “happy” or “sad”, or simple quadrants in the affective circumplex, future models should use more nuanced descriptions of emotional experience in music.
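
To illustrate the recommended evaluation metrics, the sketch below computes MCC and R squared with scikit-learn on made-up toy predictions; the emotion labels and valence values are purely illustrative, not data from the meta-analysis.

```python
# Toy illustration of the two recommended metrics.
from sklearn.metrics import matthews_corrcoef, r2_score

# Classification: the Matthews correlation coefficient is more robust
# to class imbalance than accuracy or F1 and extends to multi-class tasks.
y_true = ["happy", "sad", "sad", "tender", "happy", "sad"]
y_pred = ["happy", "sad", "happy", "tender", "happy", "sad"]
print("MCC:", round(matthews_corrcoef(y_true, y_pred), 2))

# Regression: R squared gives the share of variance in the annotated
# ratings (here, hypothetical valence values) explained by the model.
valence_true = [0.2, -0.4, 0.7, 0.1, -0.8]
valence_pred = [0.3, -0.5, 0.6, 0.0, -0.6]
print("R^2:", round(r2_score(valence_true, valence_pred), 2))
```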
As music is increasingly used in health applications and well-being work, the reliability of these models becomes more than just a technical curiosity. By following these recommendations, which include prioritizing standardized reporting, feature validation, and dataset diversity, the next decade of MER research can move from mere recognition to a deeper, more human-like understanding of the music that moves us.
The study was preregistered, and we share the data, analysis scripts, and manuscript at https://tuomaseerola.github.io/metaMER/.
References
Eerola, T. & Anderson, C. (2026). A Meta-Analysis of Music Emotion Recognition Studies. ACM Computing Surveys. https://doi.org/10.1145/3796518
