Datasets

The three most frequently used datasets are MediaEval (Soleymani et al., 2013), DEAM (Aljanaki et al., 2017), and AMG1608 (Chen et al., 2015). These datasets represent Western pop music, are moderate in size (744 to 1,802 music excerpts), and have been manually annotated by a relatively large number of participants (experts, students, or crowdsourced workers). Two of the most popular datasets offer a large number of features (260 to 6,669) extracted with OpenSMILE (Eyben et al., 2010). Looking at the datasets more broadly, the diversity in their sizes and features is notable. Only two feature extraction tools are used across multiple datasets: OpenSMILE (Eyben et al., 2010) and MIR Toolbox (Lartillot & Toiviainen, 2007). Despite this diversity, there does not appear to be a direct link between model success rates and the features themselves; at the least, separating the contribution of the features from the variation created by dataset size, annotation accuracy, and genre is not possible.
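Since OpenSMILE feature sets figure so prominently in these datasets, a brief illustration may help. The sketch below uses the official `opensmile` Python wrapper to extract ComParE-style functionals from a single excerpt; the file name is a placeholder, and the wrapper ships the 2016 revision of the ComParE set, which yields the same 6,373 functionals as the ComParE 2013 baseline listed for PMEmo below.

```python
# Minimal sketch: extracting ComParE-style functionals with the
# official opensmile Python wrapper (pip install opensmile).
# "excerpt.wav" is a placeholder file name.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,    # 6,373 functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)

features = smile.process_file("excerpt.wav")  # one-row pandas DataFrame
print(features.shape)  # (1, 6373)
```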

| Dataset | Stim. Type | Stim. Dur. (s) | Stim. N | Feature N | Ppt. N | Feature Source | In studies |
|---|---|---|---|---|---|---|---|
| MediaEval | Western pop | 45 | 744 | 6669 | 10/track | OpenSMILE | Bai et al. (2016); Bai et al. (2017); Yang (2021); Chin et al. (2018); Coutinho & Schuller (2017); Markov & Matsui (2014); Medina et al. (2020); Wang, Wang, et al. (2022); Xie et al. (2020) |
| DEAM | Pop | 45 | 1802 | 260 | 5–10/track | OpenSMILE | Sorussa et al. (2020); Orjesek et al. (2022); Panwar et al. (2019); M. Zhang et al. (2023) |
| AMG1608 | Pop | 30 | 1608 | 72 | 643 | MIR Toolbox, YAAFE | Chen et al. (2017); X. Hu & Yang (2017); Wang, Wei, et al. (2022) |
| EMOPIA | Piano solo (pop music) | 30–40 | 387 | 24 | 1 annot./track | MIDI Toolbox | Bhuvana Kumar & Kathiravan (2023) |
| NTUMIR | Famous pop songs | 25 | 60 | 46 | 40 annot./track | MIR Toolbox, Sound Description Toolbox, MA Toolbox | Chin et al. (2018) |
| Soundtracks | Obscure film soundtracks | 15 | 110 | NA | 116 | NA | Wang, Wang, et al. (2022) |
| PSIC3839 | Chinese popular | 180 | 3839 | NA | 87 | Librosa | Xu et al. (2021) |
| CH818 | Chinese pop | 30 | 818 | 15 | 3 | MIR Toolbox, PsySound, Chroma Toolbox, Tempogram Toolbox | X. Hu & Yang (2017) |
| Zhang et al. (2015) | Chinese pop | 30 | 171 | 84 | 10 | MA Toolbox, MIR Toolbox, Coversongs | J. Zhang et al. (2016) |
| PMEmo | Pop songs | Variable | 794 | 6373 | 457 | ComParE 2013 baseline feature set | M. Zhang et al. (2023) |
| NJU-V1 | Limited detail | Variable | 777 | Not reported | NA (tags) | NA | Agarwal & Om (2021) |
| ISMIR-2012 | Popular music | 30 or 60 | 2904 | 54 | NA (tags) | MIR Toolbox | Agarwal & Om (2021) |
| MIREX2009 | Popular | Full | 297 | 3 | NA | Paulus & Klapuri (2009) | Yeh et al. (2014) |
| Million Song Dataset | Pop | Full | 1,000,000 | 55 | None | EchoNest | Cao & Park (2023) |
| Free Music Archive | Various | Variable | >100,000 | NA | NA | NA | Koh et al. (2023) |
| Jamendo | Various | Variable | 10,000 | 24 | NA | Metadata | Xiao Hu et al. (2022) |
| Chinese Classical Music Dataset | Chinese classical | ~30 | 500 | 557 | 20 | Essentia, MIR Toolbox | Wang, Wang, et al. (2022) |

Notes: † Used in Álvarez et al. (2023)
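When comparing models across these corpora, a common first step is joining each dataset's valence/arousal annotations to its feature matrix on the excerpt identifier. A minimal sketch follows, with hypothetical file names and column labels, since each corpus ships its annotations in its own layout.

```python
# Hypothetical join of per-excerpt annotations and features;
# actual file names and column labels differ per dataset.
import pandas as pd

annotations = pd.read_csv("annotations.csv")  # e.g. columns: song_id, valence, arousal
features = pd.read_csv("features.csv")        # e.g. song_id plus one column per feature

merged = annotations.merge(features, on="song_id", how="inner")
print(merged.shape)
```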


References

Agarwal, G., & Om, H. (2021). An efficient supervised framework for music mood recognition using autoencoder-based optimised support vector regression model. IET Signal Processing, 15(2), 98–121. https://doi.org/10.1049/sil2.12015
Aljanaki, A., Yang, Y.-H., & Soleymani, M. (2017). Developing a benchmark for emotional analysis of music. PLOS ONE, 12(3), e0173392.
Álvarez, P., Quirós, J. G. de, & Baldassarri, S. (2023). RIADA: A machine-learning based infrastructure for recognising the emotions of Spotify songs. International Journal of Interactive Multimedia and Artificial Intelligence, 8(2), 168–181. https://doi.org/10.9781/ijimai.2022.04.002
Bai, J., Feng, L., Peng, J., Shi, J., Luo, K., Li, Z., Liao, L., & Wang, Y. (2016). Dimensional music emotion recognition by machine learning. International Journal of Cognitive Informatics and Natural Intelligence, 10(4), 74–89. https://doi.org/10.4018/IJCINI.2016100104
Bai, J., Luo, K., Peng, J., Shi, J., Wu, Y., Feng, L., Li, J., & Wang, Y. (2017). Music emotions recognition by machine learning with cognitive classification methodologies. International Journal of Cognitive Informatics and Natural Intelligence, 11(4), 80–92. https://doi.org/10.4018/IJCINI.2017100105
Bhuvana Kumar, V., & Kathiravan, M. (2023). Emotion recognition from MIDI musical file using enhanced residual gated recurrent unit architecture. Frontiers in Computer Science, 5. https://doi.org/10.3389/fcomp.2023.1305413
Cao, Y., & Park, J. (2023). The analysis of music emotion and visualization fusing long short-term memory networks under the internet of things. IEEE Access, 11, 141192–141204. https://doi.org/10.1109/ACCESS.2023.3341926
Chen, Y.-A., Wang, J.-C., Yang, Y.-H., & Chen, H. H. (2017). Component tying for mixture model adaptation in personalization of music emotion recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(7), 1409–1420. https://doi.org/10.1109/TASLP.2017.2693565
Chen, Y.-A., Yang, Y.-H., Wang, J.-C., & Chen, H. (2015). The AMG1608 dataset for music emotion recognition. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 693–697.
Chin, Y.-H., Wang, J.-C., Wang, J.-C., & Yang, Y.-H. (2018). Predicting the probability density function of music emotion using emotion space mapping. IEEE Transactions on Affective Computing, 9(4), 541–549. https://doi.org/10.1109/TAFFC.2016.2628794
Coutinho, E., & Schuller, B. (2017). Shared acoustic codes underlie emotional communication in music and speech: Evidence from deep transfer learning. PLOS ONE, 12(6). https://doi.org/10.1371/journal.pone.0179289
Eyben, F., Wöllmer, M., & Schuller, B. (2010). openSMILE: The Munich versatile and fast open-source audio feature extractor. Proceedings of the 18th ACM International Conference on Multimedia, 1459–1462.
Hu, Xiao, Li, F., & Liu, R. (2022). Detecting music-induced emotion based on acoustic analysis and physiological sensing: A multimodal approach. Applied Sciences, 12(18). https://doi.org/10.3390/app12189354
Hu, X., & Yang, Y.-H. (2017). Cross-dataset and cross-cultural music mood prediction: A case on Western and Chinese pop songs. IEEE Transactions on Affective Computing, 8(2), 228–240. https://doi.org/10.1109/TAFFC.2016.2523503
Koh, E. Y., Cheuk, K. W., Heung, K. Y., Agres, K. R., & Herremans, D. (2023). MERP: A music dataset with emotion ratings and raters’ profile information. Sensors, 23(1). https://doi.org/10.3390/s23010382
Lartillot, O., & Toiviainen, P. (2007). A Matlab toolbox for musical feature extraction from audio. International Conference on Digital Audio Effects, 237–244.
Markov, K., & Matsui, T. (2014). Music genre and emotion recognition using Gaussian processes. IEEE Access, 2, 688–697. https://doi.org/10.1109/ACCESS.2014.2333095
Medina, Y. O., Beltran, J. R., & Baldassarri, S. (2020). Emotional classification of music using neural networks with the MediaEval dataset. Personal and Ubiquitous Computing. https://doi.org/10.1007/s00779-020-01393-4
Orjesek, R., Jarina, R., & Chmulik, M. (2022). End-to-end music emotion variation detection using iteratively reconstructed deep features. Multimedia Tools and Applications, 81(4), 5017–5031. https://doi.org/10.1007/s11042-021-11584-7
Panwar, S., Rad, P., Choo, K.-K. R., & Roopaei, M. (2019). Are you emotional or depressed? Learning about your emotional state from your music using machine learning. Journal of Supercomputing, 75(6), 2986–3009. https://doi.org/10.1007/s11227-018-2499-y
Soleymani, M., Caro, M. N., Schmidt, E. M., Sha, C.-Y., & Yang, Y.-H. (2013). 1000 songs for emotional analysis of music. Proceedings of the 2nd ACM International Workshop on Crowdsourcing for Multimedia, 1–6. https://doi.org/10.1145/2506364.2506365
Sorussa, K., Choksuriwong, A., & Karnjanadecha, M. (2020). Emotion classification system for digital music with a cascaded technique. ECTI Transactions on Computer and Information Technology, 14(1), 53–66. https://doi.org/10.37936/ecti-cit.2020141.205317
Wang, X., Wang, L., & Xie, L. (2022). Comparison and analysis of acoustic features of Western and Chinese classical music emotion recognition based on V-A model. Applied Sciences, 12(12). https://doi.org/10.3390/app12125787
Wang, X., Wei, Y., & Yang, D. (2022). Cross-cultural analysis of the correlation between musical elements and emotion. Cognitive Computation and Systems, 4(2), 116–129. https://doi.org/10.1049/ccs2.12032
Xie, B., Kim, J. C., & Park, C. H. (2020). Musical emotion recognition with spectral feature extraction based on a sinusoidal model with model-based and deep-learning approaches. Applied Sciences, 10(3). https://doi.org/10.3390/app10030902
Xu, L., Sun, Z., Wen, X., Huang, Z., Chao, C., & Xu, L. (2021). Using machine learning analysis to interpret the relationship between music emotion and lyric features. PeerJ Computer Science, 7. https://doi.org/10.7717/peerj-cs.785
Yang, J. (2021). A novel music emotion recognition model using neural network technology. Frontiers in Psychology, 12. https://doi.org/10.3389/fpsyg.2021.760060
Yeh, C.-H., Tseng, W.-Y., Chen, C.-Y., Lin, Y.-D., Tsai, Y.-R., Bi, H.-I., Lin, Y.-C., & Lin, H.-Y. (2014). Popular music representation: Chorus detection & emotion recognition. Multimedia Tools and Applications, 73(3), 2103–2128. https://doi.org/10.1007/s11042-013-1687-2
Zhang, J., Huang, X., Yang, L., & Nie, L. (2016). Bridge the semantic gap between pop music acoustic feature and emotion: Build an interpretable model. Neurocomputing, 208, 333–341. https://doi.org/10.1016/j.neucom.2016.01.099
Zhang, M., Zhu, Y., Zhang, W., Zhu, Y., & Feng, T. (2023). Modularized composite attention network for continuous music emotion recognition. Multimedia Tools and Applications, 82(5), 7319–7341. https://doi.org/10.1007/s11042-022-13577-6