Meta-analysis Pre-registration: Music Emotion Recognition
Section: Metadata
Title
title
Music emotion recognition: Meta-analysis of regression and classification success of emotion ratings from audio
Contributors
authors
Eerola, T., Anderson, C. J.
Subjects
target_discipline
music cognition, music information retrieval, music psychology
Tasks and roles
tasks_and_roles
equal contribution
Section: Review methods
Type of review
type_of_review
Meta-analysis
Review stages
review_stages
Search, Screening, Extraction, Synthesis
Current review stage
current_stage
Screening
Start date
start_date
2024-05-15
End date
end_date
2024-06-30
Background
background
The aim is to establish how well current models predict the emotions expressed by music from audio. We will focus on the last 10 years of research, especially studies that predict valence and arousal ratings from music audio. No such analysis exists, and predicting the emotional content of music raises challenges related to the specificity of the music, the type of emotions, and the features used; these would benefit from a systematic analysis.
Primary research question(s)
primary_research_question
To what degree can arousal and valence ratings of emotions expressed by music be predicted from audio? How are the prediction rates related to genres of music, the type of models used, the type of features, modelling design and cross-validation utilised, and the model complexity and parsimony?
Secondary research question(s)
secondary_research_question
What prediction rate is achieved when classifying quadrants in the affective circumplex?
Expectations / hypotheses
expectations_hypotheses
Prediction of arousal ratings is generally high and robust and, in terms of the model outcome metric (correlation), achieves at least r = 0.77 (R squared of 0.60). Prediction of valence ratings from audio is more challenging and more context dependent, and will generally achieve a lower prediction rate of r = 0.63 (R squared of 0.40).
Dependent variable(s) / outcome(s) / main variables
dvs_outcomes_main_vars
Regression model performance will be converted to Pearson correlation coefficients and classification model performance will be converted to Matthews correlation coefficient (MCC) when possible.
Music genre, prediction type (linear or classification), feature type (based on prior work by Panda et al., 2020), model complexity (high, medium, low), model validation (exists or not)
Additional variable(s) / covariate(s)
additional_variables
Unspecified
Software
software
R and a GitHub repository
Funding
funding
Mitacs Globalink Research Award (Mitacs & British High Commission - Ottawa, Canada)
Conflicts of interest
cois
There are no identified conflicts of interest.
Overlapping authorships
overlapping_authorships
Not applicable
Section: Search strategy
Databases
databases
Web of Science, Scopus, and OpenAlex
Interfaces
interfaces
Web of Science, Scopus, and OpenAlex
Grey literature
grey_literature
Not included
Inclusion and exclusion criteria
inclusions_exclusion_criteria
Sample, Phenomenon of Interest, Design, Evaluation, Research type
Query strings
query_strings
Scopus: TITLE-ABS-KEY ( valence OR arousal OR classi OR categor OR algorithm AND music AND emotion AND recognition ) AND PUBYEAR > 2013 AND PUBYEAR < 2025 AND ( LIMIT-TO ( DOCTYPE , "ar" ) )
Web of Science: (DT=(Article) AND PY=(2014-2025)) AND ALL=(music emotion recognition valence arousal)
OpenAlex: https://openalex.org/works?page=1&filter=default.search%3Amusic%20emotion%20recognition%20valence%20arousal,type%3Atypes%2Farticle,publication_year%3A2014-2024,keywords.id%3Akeywords%2Femotion-recognition,keywords.id%3Akeywords%2Faffective-computing,language%3Alanguages%2Fen,open_access.any_repository_has_fulltext%3Atrue
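The OpenAlex query is a comma-separated list of percent-encoded filters. As a minimal sketch (the function name `build_openalex_url` is hypothetical and not part of any OpenAlex tooling), the web-interface URL can be rebuilt from its decoded filter components, which makes the individual criteria easier to read and audit:

```python
from urllib.parse import quote

# Decoded filter components, taken from the OpenAlex query string above.
FILTERS = [
    "default.search:music emotion recognition valence arousal",
    "type:types/article",
    "publication_year:2014-2024",
    "keywords.id:keywords/emotion-recognition",
    "keywords.id:keywords/affective-computing",
    "language:languages/en",
    "open_access.any_repository_has_fulltext:true",
]


def build_openalex_url(filters, page=1):
    """Percent-encode each filter and join the results with commas."""
    encoded = ",".join(quote(f, safe="") for f in filters)
    return f"https://openalex.org/works?page={page}&filter={encoded}"


url = build_openalex_url(FILTERS)
print(url)
```

This only reproduces the browser URL used for the search; it does not call the OpenAlex API.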
Search validation procedure
search_validation_procedure
Manual checking, separate keywords searches
Other search strategies
other_search_strategies
Not applied
Procedures to contact authors
procedure_for_contacting_authors
Unspecified
Results of contacting authors
results_of_contacting_authors
Not carried out
Search expiration and repetition
search_expiration_and_repetition
Searches were carried out during the active search period in late May and early June 2024, and no repetition is planned.
Search strategy justification
search_strategy_justification
The three major databases should be able to yield a robust picture of the topic.
Miscellaneous search strategy details
misc_search_strategy_details
No alternative searches were articulated or envisaged.
Section: Screening
Screening stages
screening_stages
We completed screening using custom fields inserted into the BibTeX file and managed with citation managers (JabRef and BibDesk). To filter relevant studies, we followed a three-stage screening procedure.
In stage 1, we screened the titles of the 553 studies for relevance, removing irrelevant studies and recording exclusion criteria (see Used exclusion criteria). CA assigned 63 studies to High Priority based on the relevance of their titles, 338 studies to Low Priority based on irrelevant titles, and 152 studies to Medium Priority for additional screening.
In stage 2, CA assessed the 152 Medium Priority studies for relevance by screening abstracts. The status of 95 studies changed to Low Priority, whereas the status of 30 studies changed to High Priority; 27 studies remained in the Medium Priority category. TE and CA evaluated the remaining 27 studies, moving 15 to the High Priority category and 12 to the Medium Priority category. For studies moved to Low Priority, brief BibTeX comments summarised the rationale for exclusion.
In stage 3, TE and CA independently screened Priority 1 studies for relevance, recording an include, exclude, or unsure decision in a user-comment BibTeX field.
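The stage 1 and stage 2 counts reported above can be checked for internal consistency with a few lines of arithmetic. The following is a minimal sketch; the dictionary keys and the helper `check_partition` are illustrative names, not fields from the authors' BibTeX files:

```python
# Counts as reported in the screening description above.
TOTAL = 553
stage1 = {"high": 63, "low": 338, "medium": 152}
stage2 = {"to_low": 95, "to_high": 30, "still_medium": 27}


def check_partition(parts, expected_total):
    """Return True when the category counts add up to the expected total."""
    return sum(parts.values()) == expected_total


stage1_ok = check_partition(stage1, TOTAL)             # 63 + 338 + 152
stage2_ok = check_partition(stage2, stage1["medium"])  # 95 + 30 + 27
remaining_after_stage2 = TOTAL - stage1["low"] - stage2["to_low"]

print(stage1_ok, stage2_ok, remaining_after_stage2)
```

Both partitions sum correctly, and the number of studies remaining after the abstract screening (553 minus the 338 and 95 Low Priority assignments) is 120.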
Screened fields / masking
screened_fields_masking
We left authors, titles, publication years, and journal names unmasked.
Used exclusion criteria
used_exclusion_criteria
We excluded studies according to the following exclusion criteria: soundscapes/vocalisations, non-music audio, video clips, physiological markers, dance, video/movie, physiological/EEG/ECG/MEG/GSR/brain imaging/heart rate/neuroscience/brain studies, sensor data, multimodal, autism, ageing, review/systematic review/overview/survey, face emotion recognition, mental health, music therapy, schizophrenia, memory/emotion factors as IVs, recommender systems, or systems that identify the location of emotional excerpts. We included results from some studies meeting exclusion criteria (e.g., multimodal studies involving physiological measurements) if they reported separately on acoustic-only models.
Screener instructions
screener_instructions
As described above.
Screening reliability
screening_reliability
In passes 1 and 2, we included a quality control check after each pass to discuss the identified categories. In the third pass, we double-coded decisions, resolving discrepancies through discussion.
Screening reconciliation procedure
screening_reconciliation_procedure
We reconcile discrepancies through discussion, resolving “unsure” votes first, followed by discrepancies in include/exclude decisions between authors. Results of this updating procedure are available in the Pass 3 comparison document.
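The ordering described here (unsure votes first, then include/exclude conflicts) can be expressed as a short routine. This is a sketch under assumptions: decisions are stored as the strings "include", "exclude", or "unsure", and the study keys and the function name `discussion_queue` are hypothetical.

```python
def discussion_queue(votes_te, votes_ca):
    """Order studies for discussion: any 'unsure' vote first, then conflicts."""
    unsure, conflicts = [], []
    for key in sorted(votes_te):
        a, b = votes_te[key], votes_ca[key]
        if "unsure" in (a, b):
            unsure.append(key)      # at least one screener was unsure
        elif a != b:
            conflicts.append(key)   # include vs. exclude disagreement
    return unsure + conflicts


# Hypothetical decisions from the two screeners.
te = {"s1": "include", "s2": "unsure", "s3": "exclude", "s4": "include"}
ca = {"s1": "include", "s2": "include", "s3": "include", "s4": "unsure"}
queue = discussion_queue(te, ca)
print(queue)
```

Studies on which both screeners agree (here `s1`) never enter the queue.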
Sampling and sample size
sampling_and_sample_size
We identified and retained 553 articles from Scopus, Web of Science, and OpenAlex based on the search strategy outlined above. See the table at the end, which details the cumulative exclusions.
Screening procedure justification
screening_procedure_justification
To offer a broad summary of music emotion recognition tasks, we attempted to include all studies involving prediction with acoustic features. We performed screening unblinded and determined inclusion/exclusion criteria based on studies’ relevance to the task explored.
Data management and sharing
screening_data_management_and_sharing
Sources will be shared as one or more BibTeX libraries, including reviewer notes.
The data extraction will be completed in stages. In the first stage, CA will complete a pass of the collection using our initial entities-to-extract document. The challenges encountered will then be discussed and the entities revised.
CA will perform extractions; TE will verify extractions for quality assurance.
Extraction reconciliation procedure
extraction_reconciliation_procedure
Discussion and joint decision for studies where extraction proves to be challenging and issues of interpretation arise.
Extraction procedure justification
extraction_procedure_justification
These are documented in the extraction details.
Data management and sharing
extraction_data_management_and_sharing
We retain the study information in shared BibTeX files. Extraction data will be stored in ASCII data files (.bibtex), and the parser for reading the data from the .bibtex files into R for the analysis will be made available (as Quarto/Markdown/R files). All of these are managed, structured, shared, and documented in a GitHub repository according to FAIR principles.
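To illustrate the idea of reading custom extraction fields out of BibTeX entries, here is a minimal Python sketch. It is not the authors' parser (which is planned in R), it handles only flat `field = {value}` pairs without nested braces, and the field names `decision` and `r2_arousal` are hypothetical.

```python
import re

# One entry: "@type{key, ... \n}" with the closing brace on its own line.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
# One field: name = {value}, where the value contains no nested braces.
FIELD_RE = re.compile(r"([\w-]+)\s*=\s*\{([^{}]*)\}")


def parse_bibtex(text):
    """Return one dict per entry with its type, citation key, and fields."""
    records = []
    for entry_type, key, body in ENTRY_RE.findall(text):
        fields = {n.lower(): v.strip() for n, v in FIELD_RE.findall(body)}
        records.append({"type": entry_type.lower(), "key": key, **fields})
    return records


# Hypothetical entry with two custom extraction fields.
SAMPLE = """@article{example2020,
  title = {A hypothetical study},
  year = {2020},
  decision = {include},
  r2_arousal = {0.60}
}
"""
records = parse_bibtex(SAMPLE)
print(records[0]["key"], records[0]["decision"], records[0]["r2_arousal"])
```

A production parser would need to cope with nested braces, quoted values, and comments, which this sketch deliberately ignores.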
Miscellaneous extraction details
misc_extraction_details
NA
Section: Synthesis and Quality Assessment
Planned data transformations
planned_data_transformations
For regression studies, we convert all metrics to Pearson correlation coefficients. For classification studies, we convert the reported outcomes (precision, accuracy, specificity, and F1 scores) to the Matthews correlation coefficient (MCC). Alternatively, we use Cohen’s kappa for multi-class problems.
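Two of these conversions can be sketched briefly. The example assumes that a reported R squared equals the squared correlation between predicted and observed ratings (which does not hold for every definition of R squared), and that the binary confusion-matrix counts are available or can be reconstructed from the reported metrics. The function names are illustrative.

```python
import math


def r_from_r_squared(r2):
    """Pearson r implied by R squared (non-negative root)."""
    return math.sqrt(r2)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(round(r_from_r_squared(0.60), 2))  # 0.77
print(round(r_from_r_squared(0.40), 2))  # 0.63
print(mcc(tp=40, fp=10, fn=10, tn=40))   # 0.6
```

The first two values correspond to the R squared thresholds of 0.60 and 0.40 stated in the hypotheses.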
Missing data
missing_data
If no main outcome variables are available, we exclude the study.
Data validation
data_validation
None planned beyond the staged approach already documented in the extraction process.
Quality assessment
quality_assessment
Not all bias assessment tools developed for clinical studies are relevant for our purposes; we adapt the overall approach advocated in [Higgins et al. (2011)](https://doi.org/10.1136/bmj.d5928).
Synthesis plan
synthesis_plan
We analyse regression and classification studies separately. Depending on whether enough studies are available to form suitable sub-groupings based on techniques, materials, or music collections/genres, we may further synthesise the results across these sub-groupings.
Criteria for conclusions / inference criteria
criteria_for_conclusions
NA
Synthesist masking
synthesis_masking
NA
Synthesis reliability
synthesis_reliability
NA
Synthesis reconciliation procedure
synthesis_reconciliation_procedure
NA
Publication bias analyses
publication_bias
We use Egger’s test to assess publication bias and potentially correct the effect size bias by selecting the 10% most precise effect sizes, as recommended by Van Aert, Wicherts, & Van Assen (2019).
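Both steps can be sketched in a few lines. This is an illustration, not the planned R analysis: it assumes each study contributes one effect size (for example a Fisher z-transformed correlation) with a known standard error, implements Egger's regression as an ordinary least squares fit of the standardised effect on precision, and omits the p-value. The function names are hypothetical.

```python
import math


def egger_intercept(effects, ses):
    """Regress effect/se on 1/se by OLS; return (intercept, t statistic)."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1.0 / s for s in ses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(rss / (n - 2) * (1.0 / n + mx ** 2 / sxx))
    # The t statistic is undefined when the fit is perfect (zero residuals).
    t = intercept / se_int if se_int > 0 else math.nan
    return intercept, t


def top_precise(effects, ses, fraction=0.10):
    """Keep the most precise share of effects (smallest standard errors)."""
    k = max(1, math.ceil(len(ses) * fraction))
    ranked = sorted(zip(ses, effects))
    return [e for _, e in ranked[:k]]


# Hypothetical data: effects grow with the standard error (small-study effect).
ses = [0.1, 0.2, 0.25, 0.5]
effects = [0.5 + s for s in ses]
b0, t = egger_intercept(effects, ses)
print(round(b0, 6), top_precise(effects, ses))
```

An intercept far from zero indicates funnel-plot asymmetry; in practice the t statistic would be compared against a t distribution with n - 2 degrees of freedom.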
Sensitivity analyses / robustness checks
sensitivity_analysis
Within regression and classification tasks, we will carry out sensitivity analyses using sub-groups of studies based on the type of models and the type of journal in which the studies were published.
Synthesis procedure justification
synthesis_procedure_justification
We share our justification of the synthesis and the subsetting carried out in the manuscript, but we have not formulated these in advance, except for synthesising classification and regression approaches separately and creating subsets within these approaches according to the techniques and datasets utilised.
Synthesis data management and sharing
synthesis_data_management_and_sharing
We share the data, procedures, definitions, and analysis scripts, together with the outcomes, as R code in Quarto notes on GitHub.
Miscellaneous synthesis details
misc_synthesis_details
Unspecified
| Stage | Cumulative Exclusions | Remaining Studies |
|---|---|---|
| Database Search | NA | 553 |
| Pass 1 | 338 | 215 |
| Pass 2 | 433 | 120 |
| Pass 2 (Discussion) | 457 | 96 |
| Pass 3 (Discussion) | 507 | 46 |
References
Higgins, J. P. T., Altman, D. G., Gøtzsche, P. C., Jüni, P., Moher, D., Oxman, A. D., Savović, J., Schulz, K. F., Weeks, L., & Sterne, J. A. C. (2011). The Cochrane Collaboration tool for assessing risk of bias in randomised trials. BMJ, 343. https://www.bmj.com/content/343/bmj.d5928
Panda, R., Malheiro, R., & Paiva, R. P. (2020). Audio features for music emotion recognition: a survey. IEEE Transactions on Affective Computing, 14(1), 68-88. https://doi.org/10.1109/TAFFC.2020.3032373
Shamseer, L., Moher, D., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., Shekelle, P., & Stewart, L. A. (2015). Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ, 349. https://www.bmj.com/content/349/bmj.g7647