preregistration – metaMER

This preregistration was created with the preregr package (https://preregr.opens.science/), which implements the guidance for systematic review and meta-analysis protocols published in the BMJ (PRISMA-P; Shamseer et al., 2015).

Meta-analysis Pre-registration: Music Emotion Recognition

Section: Metadata

Title

title

Music emotion recognition: Meta-analysis of regression and classification success of emotion ratings from audio

Contributors

authors

Eerola, T., Anderson, C. J.

Subjects

target_discipline

music cognition, music information retrieval, music psychology

Tasks and roles

tasks_and_roles

equal contribution

Section: Review methods

Type of review

type_of_review

Meta-analysis

Review stages

review_stages

Search, Screening, Extraction, Synthesis

Current review stage

current_stage

Screening

Start date

start_date

2024-05-15

End date

end_date

2024-06-30

Background

background

The aim is to establish how successfully current models predict the emotions expressed by music from audio. We will focus on research from the last 10 years, especially studies that have predicted valence and arousal ratings from music audio. No such analysis exists, and predicting the emotional content of music poses interesting challenges that would benefit from systematic analysis; these relate to the specificity of the music, the types of emotions modelled, and the features used.

Primary research question(s)

primary_research_question

To what degree can arousal and valence ratings of emotions expressed by music be predicted from audio? How are prediction rates related to music genre, model type, feature type, modelling design and cross-validation scheme, and model complexity and parsimony?

Secondary research question(s)

secondary_research_question

What prediction rates are achieved when classifying music into the quadrants of the affective circumplex?

Expectations / hypotheses

expectations_hypotheses

We expect prediction of arousal ratings to be generally high and robust, achieving at least r = 0.77 (R² = 0.60) on the model outcome metric (correlation). We expect prediction of valence ratings from audio to be more challenging and more context-dependent, with generally lower prediction rates of r = 0.63 (R² = 0.40).
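As a quick check of the arithmetic behind these thresholds, the Python sketch below converts R² to r via r = √R². It assumes R² is the squared Pearson correlation between predicted and observed ratings, which holds for in-sample least-squares fits but not necessarily for cross-validated R² (which can even be negative); the function name `r_from_r2` is illustrative only.

```python
import math


def r_from_r2(r_squared: float) -> float:
    """Convert a coefficient of determination to a correlation magnitude.

    Assumes R^2 is the squared Pearson correlation between predicted and
    observed ratings; a negative out-of-sample R^2 has no real square root.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie in [0, 1] for this conversion")
    return math.sqrt(r_squared)


# Hypothesised thresholds from this preregistration.
for label, r2 in [("arousal", 0.60), ("valence", 0.40)]:
    print(f"{label}: R^2 = {r2:.2f} -> r = {r_from_r2(r2):.2f}")
```

Under that assumption, R² = 0.60 and R² = 0.40 correspond to r ≈ 0.77 and r ≈ 0.63, matching the stated thresholds.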

Dependent variable(s) / outcome(s) / main variables

dvs_outcomes_main_vars

Regression model performance will be converted to Pearson correlation coefficients (r), and classification model performance to the Matthews correlation coefficient (MCC), where possible.
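For binary outcomes (e.g., high versus low arousal), MCC equals the Pearson (phi) correlation between the true and predicted labels, which is what makes it a natural classification counterpart to r. A minimal Python sketch computing it from confusion-matrix counts; the example counts are hypothetical, and returning 0.0 when a marginal total is zero is a common convention rather than part of this protocol.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is undefined when any marginal total is zero; report 0.0.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical high- vs low-arousal classification of 100 excerpts.
print(round(mcc(tp=40, fp=5, fn=10, tn=45), 3))
```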

Independent variable(s) / intervention(s) / treatment(s)

ivs_intervention_treatment

Music genre, prediction type (regression or classification), feature type (based on prior work by Panda et al., 2020), model complexity (high, medium, low), model validation (present or absent)

Additional variable(s) / covariate(s)

additional_variables

Unspecified

Software

software

R and a GitHub repository

Funding

funding

Mitacs Globalink Research Award (Mitacs & British High Commission - Ottawa, Canada)

Conflicts of interest

cois

No conflicts of interest have been identified.

Overlapping authorships

overlapping_authorships

Not applicable

Section: Search strategy

Databases

databases

Web of Science, Scopus, and OpenAlex

Interfaces

interfaces

Web of Science, Scopus, and OpenAlex

Grey literature

grey_literature

Not included

Inclusion and exclusion criteria

inclusions_exclusion_criteria

Sample, Phenomenon of Interest, Design, Evaluation, Research type (SPIDER)

Query strings

query_strings

Scopus:

TITLE-ABS-KEY ( valence OR arousal OR classi OR categor OR algorithm AND music AND emotion AND recognition ) AND PUBYEAR > 2013 AND PUBYEAR < 2025 AND ( LIMIT-TO ( DOCTYPE , "ar" ) )

Web of Science:

(DT=(Article) AND PY=(2014-2025)) AND ALL=(music emotion recognition valence arousal)

OpenAlex:

https://openalex.org/works?page=1&filter=default.search%3Amusic%20emotion%20recognition%20valence%20arousal,type%3Atypes%2Farticle,publication_year%3A2014-2024,keywords.id%3Akeywords%2Femotion-recognition,keywords.id%3Akeywords%2Faffective-computing,language%3Alanguages%2Fen,open_access.any_repository_has_fulltext%3Atrue
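The OpenAlex URL packs several filters into one percent-encoded `filter` parameter, in which comma-separated filters are combined with AND. The Python sketch below, using only the standard `urllib.parse` module, decodes it into readable field/value pairs. It is a reading aid, not part of the registered search, and `decode_filters` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

# The OpenAlex query URL recorded in this preregistration.
URL = (
    "https://openalex.org/works?page=1&filter=default.search%3A"
    "music%20emotion%20recognition%20valence%20arousal,"
    "type%3Atypes%2Farticle,publication_year%3A2014-2024,"
    "keywords.id%3Akeywords%2Femotion-recognition,"
    "keywords.id%3Akeywords%2Faffective-computing,language%3Alanguages%2Fen,"
    "open_access.any_repository_has_fulltext%3Atrue"
)


def decode_filters(url: str) -> list[tuple[str, str]]:
    """Split an OpenAlex 'filter' parameter into (field, value) pairs."""
    query = parse_qs(urlsplit(url).query)  # percent-decodes %3A, %2F, %20
    filters = query["filter"][0].split(",")
    return [tuple(f.split(":", 1)) for f in filters]


for field, value in decode_filters(URL):
    print(f"{field:45s} {value}")
```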

Search validation procedure

search_validation_procedure

Manual checking and separate keyword searches

Other search strategies

other_search_strategies

Not applied

Procedures to contact authors

procedure_for_contacting_authors

Unspecified

Results of contacting authors

results_of_contacting_authors

Not carried out

Search expiration and repetition

search_expiration_and_repetition

Searches were conducted during the active search period (late May to early June 2024); no repetition is planned.

Search strategy justification

search_strategy_justification

The three major databases should yield a robust picture of the topic.

Miscellaneous search strategy details

misc_search_strategy_details

No alternative searches were articulated or envisaged.

Section: Screening

Screening stages

screening_stages

We completed screening using custom fields inserted into the BibTeX file and managed with citation managers (JabRef and BibDesk). To filter relevant studies, we followed a three-stage screening procedure.
In stage 1, we screened the titles of the 553 studies for relevance, removing irrelevant studies and recording the exclusion criteria applied (see Used exclusion criteria). Based on title relevance, CA assigned 63 studies to High Priority, 338 studies to Low Priority, and 152 studies to Medium Priority for additional screening.
In stage 2, CA screened the abstracts of the 152 Medium Priority studies for relevance. Of these, 95 were moved to Low Priority, 30 were moved to High Priority, and 27 remained at Medium Priority. TE and CA then jointly evaluated these 27 studies, moving 15 to High Priority and keeping 12 in the Medium Priority category. For studies moved to Low Priority, brief BibTeX comments summarized the rationale for exclusion.
In stage 3, TE and CA independently screened the High Priority (Priority 1) studies for relevance, recording an include, exclude, or unsure decision in a user-comment BibTeX field.

Screened fields / masking

screened_fields_masking

We left authors, titles, publication years, and journal names unmasked.

Used exclusion criteria

used_exclusion_criteria

We excluded studies according to the following exclusion criteria: soundscapes/vocalisations, non-music audio, video clips, physiological markers, dance, video/movie, physiological/EEG/ECG/MEG/GSR/brain imaging/heart rate/neuroscience/brain studies, sensor data, multimodal, autism, ageing, review/systematic review/overview/survey, face emotion recognition, mental health, music therapy, schizophrenia, memory/emotion factors as IVs, recommender systems, or systems that identify the location of emotional excerpts. We included results from some studies meeting exclusion criteria (e.g., multimodal studies involving physiological measurements) if they reported separately on acoustic-only models.

Screener instructions

screener_instructions

As described above.

Screening reliability

screening_reliability

In passes 1 and 2, we held a quality-control check after each pass to discuss the identified categories. In pass 3, we double-coded decisions and resolved discrepancies through discussion.

Screening reconciliation procedure

screening_reconciliation_procedure

We reconciled discrepancies through discussion, resolving “unsure” votes first, followed by include/exclude disagreements between authors. Results of this updating procedure are available in the Pass 3 comparison document.
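The reconciliation order can be sketched in Python as follows. The `Decision` record, its fields, and the citation keys are hypothetical stand-ins for the votes held in the BibTeX user-comment fields; the sketch only illustrates the ordering (unsure votes first, then include/exclude disagreements, with agreements needing no discussion).

```python
from dataclasses import dataclass


@dataclass
class Decision:
    key: str  # BibTeX citation key (hypothetical examples below)
    te: str   # TE's vote: "include", "exclude" or "unsure"
    ca: str   # CA's vote: "include", "exclude" or "unsure"


def reconciliation_queue(decisions: list[Decision]) -> list[Decision]:
    """Order double-coded decisions for discussion.

    Entries where either screener voted "unsure" come first, followed by
    entries where the include/exclude votes disagree. Agreements are omitted.
    """
    unsure = [d for d in decisions if "unsure" in (d.te, d.ca)]
    conflicts = [
        d for d in decisions if "unsure" not in (d.te, d.ca) and d.te != d.ca
    ]
    return unsure + conflicts


votes = [
    Decision("smith2019", "include", "include"),
    Decision("lee2021", "include", "exclude"),
    Decision("wang2020", "unsure", "include"),
    Decision("kim2018", "exclude", "exclude"),
]
for d in reconciliation_queue(votes):
    print(d.key, d.te, d.ca)
```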

Sampling and sample size

sampling_and_sample_size

We identified and retained 553 articles from Scopus, Web of Science, and OpenAlex using the search strategy outlined above. The table at the end of this document details the cumulative exclusions.

Screening procedure justification

screening_procedure_justification

To offer a broad summary of music emotion recognition tasks, we attempted to include all studies involving prediction from acoustic features. We performed screening unblinded and based inclusion/exclusion decisions on each study's relevance to the task explored.

Data management and sharing

screening_data_management_and_sharing

Sources will be shared as BibTeX libraries including reviewer notes.

Miscellaneous screening details

misc_screening_details

Unspecified

Section: Extraction

Entities to extract

entities_to_extract

These are listed and defined in extraction details.

Extraction stages

extraction_stages

Data extraction will be completed in stages. In the first stage, CA will complete a pass over the collection using our initial entities-to-extract document. Challenges arising from this pass will then be discussed and the entities revised accordingly.

Extractor instructions

extractor_instructions

See extraction details.

Extractor blinding

extractor_blinding

Blinding was not used.

Extraction reliability

extraction_reliability

CA will perform extractions; TE will verify extractions for quality assurance.

Extraction reconciliation procedure

extraction_reconciliation_procedure

Where extraction proves challenging or issues of interpretation arise, we will resolve them through discussion and joint decision.

Extraction procedure justification

extraction_procedure_justification

These are documented in the extraction details.

Data management and sharing

extraction_data_management_and_sharing

We retain study information in shared BibTeX files. Extraction data will be stored in ASCII data files (.bibtex), and the parser that reads the data from the .bibtex files into R for analysis will be made available (as Quarto/Markdown/R files). All of these are managed, structured, shared, and documented in a GitHub repository according to the FAIR principles.

Miscellaneous extraction details

misc_extraction_details

Section: Synthesis and Quality Assessment

Planned data transformations

planned_data_transformations

For regression studies, we convert all metrics to Pearson correlation coefficients. For classification studies, we convert the reported outcomes (precision, accuracy, specificity, F1 scores) to the Matthews correlation coefficient (MCC). For multi-class problems, we alternatively use Cohen's kappa.
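One possible route for these conversions, sketched in Python under stated assumptions: when a binary study reports recall (sensitivity) and specificity together with the class balance (prevalence), the confusion matrix can be recovered up to scale, and MCC is scale-invariant. Other combinations of reported metrics (e.g., accuracy with precision) require different algebra. For multi-class tasks such as circumplex quadrants, Cohen's kappa is computed directly from a k × k confusion matrix. Function names and example matrices are hypothetical.

```python
import math


def mcc_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """MCC from recall (sensitivity), specificity and positive-class prevalence.

    MCC is invariant to scaling the confusion matrix, so cell proportions
    suffice: TP = p*sens, FN = p*(1-sens), TN = (1-p)*spec, FP = (1-p)*(1-spec).
    """
    p = prevalence
    tp, fn = p * sensitivity, p * (1 - sensitivity)
    tn, fp = (1 - p) * specificity, (1 - p) * (1 - specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def cohens_kappa(matrix: list[list[int]]) -> float:
    """Cohen's kappa from a k x k confusion matrix (rows: true, cols: predicted)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical binary study: 80% recall, 90% specificity, balanced classes.
print(round(mcc_from_rates(0.8, 0.9, 0.5), 3))
# Hypothetical 4-class circumplex-quadrant confusion matrix.
quadrants = [[30, 5, 3, 2], [4, 28, 2, 6], [3, 2, 25, 10], [2, 6, 9, 23]]
print(round(cohens_kappa(quadrants), 3))
```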

Missing data

missing_data

If no main outcome variables are available, we exclude the study.

Data validation

data_validation

None planned beyond the staged approach already documented for the extraction process.

Quality assessment

quality_assessment

Because not all bias assessment tools for clinical studies are relevant to our purposes, we adapt the overall approach advocated by Higgins et al. (2011; https://doi.org/10.1136/bmj.d5928).

Synthesis plan

synthesis_plan

We analyse regression and classification studies separately. Depending on whether enough studies form suitable subgroups (by technique, materials, or music collection/genre), we may further synthesise results across these subgroups.
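The protocol does not fix a pooling model. As one common choice, assumed here rather than prescribed by the protocol, correlations can be pooled on Fisher's z scale with a DerSimonian-Laird random-effects model, taking the number of music excerpts in each study as n (also an assumption). A minimal Python sketch; all input values are hypothetical.

```python
import math


def pool_correlations(rs: list[float], ns: list[int]) -> tuple[float, float]:
    """DerSimonian-Laird random-effects pooled correlation via Fisher's z.

    Returns (pooled r, tau^2 on the z scale). Each n must exceed 3.
    """
    zs = [math.atanh(r) for r in rs]       # Fisher z-transform
    vs = [1.0 / (n - 3) for n in ns]       # sampling variance of each z
    w = [1.0 / v for v in vs]              # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    tau2 = 0.0
    if len(zs) > 1:
        q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))  # Cochran's Q
        c = sum(w) - sum(wi**2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(zs) - 1)) / c)  # DL between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]  # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    return math.tanh(z_re), tau2           # back-transform to r


# Hypothetical arousal correlations and numbers of excerpts per study.
r_pooled, tau2 = pool_correlations([0.72, 0.81, 0.65, 0.78], [100, 250, 60, 400])
print(f"pooled r = {r_pooled:.3f}, tau^2 = {tau2:.4f}")
```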

Criteria for conclusions / inference criteria

criteria_for_conclusions

Synthesist masking

synthesis_masking

Synthesis reliability

synthesis_reliability

Synthesis reconciliation procedure

synthesis_reconciliation_procedure

Publication bias analyses

publication_bias

We use Egger's test to assess publication bias and may correct for effect-size bias by retaining the 10% most precise effect sizes, as recommended by Van Aert, Wicherts, and Van Assen (2019).
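A minimal Python sketch of both steps, under assumptions not fixed by the protocol: effect sizes and standard errors are on a common scale (e.g., Fisher's z with SE = 1/√(n − 3)); Egger's test is run as an ordinary least-squares regression of effect/SE on 1/SE, with the intercept as the asymmetry indicator; and the correction is read as the inverse-variance weighted mean of the 10% most precise estimates. P-values are omitted to keep the sketch dependency-free; the t statistic would be compared with a t distribution on k − 2 degrees of freedom. Example values are hypothetical.

```python
import math


def egger_test(effects: list[float], ses: list[float]) -> tuple[float, float]:
    """Egger's regression test for funnel-plot asymmetry.

    Regresses the standardised effect (effect / SE) on precision (1 / SE) by
    ordinary least squares and returns (intercept, t statistic of intercept).
    An intercept far from zero suggests small-study effects.
    """
    if len(effects) != len(ses) or len(effects) < 3:
        raise ValueError("need at least three effects with matching SEs")
    y = [e / s for e, s in zip(effects, ses)]  # standardised effects
    x = [1.0 / s for s in ses]                 # precisions
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r**2 for r in residuals) / (n - 2)  # residual variance
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar**2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, t_stat


def top10_estimate(effects: list[float], ses: list[float]) -> float:
    """Inverse-variance weighted mean of the 10% most precise effects (min. one)."""
    k = max(1, math.ceil(len(effects) / 10))
    most_precise = sorted(zip(ses, effects))[:k]  # smallest SE first
    weights = [1.0 / s**2 for s, _ in most_precise]
    return sum(w * e for w, (_, e) in zip(weights, most_precise)) / sum(weights)


# Hypothetical Fisher-z effects and standard errors from ten studies.
z = [0.95, 1.10, 0.88, 1.20, 0.70, 1.35, 1.05, 0.92, 1.40, 0.80]
se = [0.05, 0.12, 0.06, 0.15, 0.04, 0.20, 0.10, 0.08, 0.25, 0.07]
b0, t = egger_test(z, se)
print(f"Egger intercept = {b0:.2f} (t = {t:.2f}); top-10% = {top10_estimate(z, se):.3f}")
```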

Sensitivity analyses / robustness checks

sensitivity_analysis

Within regression and classification tasks, we will carry out sensitivity analyses using subgroups of studies based on model type and on the type of journal in which the studies were published.
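Subgroup estimates of this kind can be sketched in Python as below. For brevity the sketch uses fixed-effect inverse-variance pooling on Fisher's z; this choice, the dictionary layout, and the group labels are assumptions for illustration, not the protocol's specification.

```python
import math
from collections import defaultdict


def subgroup_estimates(studies: list[dict]) -> dict[str, tuple[float, int]]:
    """Fixed-effect inverse-variance pooled r per subgroup, on Fisher's z.

    Each study dict needs 'r', 'n' (> 3) and 'group'.
    Returns {group: (pooled r, number of studies)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # weighted z sum, weight sum, count
    for s in studies:
        w = s["n"] - 3  # 1 / Var(z), with Var(z) = 1 / (n - 3)
        acc = sums[s["group"]]
        acc[0] += w * math.atanh(s["r"])
        acc[1] += w
        acc[2] += 1
    return {g: (math.tanh(zw / w), k) for g, (zw, w, k) in sums.items()}


# Hypothetical studies grouped by model type.
studies = [
    {"r": 0.70, "n": 120, "group": "linear"},
    {"r": 0.66, "n": 80, "group": "linear"},
    {"r": 0.79, "n": 300, "group": "neural network"},
]
for group, (r, k) in subgroup_estimates(studies).items():
    print(f"{group}: pooled r = {r:.3f} from {k} studies")
```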

Synthesis procedure justification

synthesis_procedure_justification

We will report our justification for the synthesis and any subsetting in the manuscript. We have not formulated these in advance, except for synthesising classification and regression approaches separately and creating subsets within these approaches according to the techniques and datasets used.

Synthesis data management and sharing

synthesis_data_management_and_sharing

We share the data, procedures, definitions, and analysis scripts, together with their outputs, as R code in Quarto documents on GitHub.

Miscellaneous synthesis details

misc_synthesis_details

Unspecified

Stage	Cumulative exclusions	Remaining studies
Database Search	NA	553
Pass 1	338	215
Pass 2	433	120
Pass 2 (Discussion)	457	96
Pass 3 (Discussion)	507	46
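The table's internal consistency can be checked with a short Python sketch: in every row, cumulative exclusions plus remaining studies should equal the 553 retrieved records, and differencing the cumulative column gives the number of studies excluded at each pass. The NA entry for the database search is treated as 0 here, and the function name is illustrative.

```python
TOTAL_RECORDS = 553

# (stage, cumulative exclusions, remaining studies) as reported in the table;
# the "NA" for the database search is treated as 0 exclusions.
STAGES = [
    ("Database Search", 0, 553),
    ("Pass 1", 338, 215),
    ("Pass 2", 433, 120),
    ("Pass 2 (Discussion)", 457, 96),
    ("Pass 3 (Discussion)", 507, 46),
]


def per_stage_exclusions(rows, total=TOTAL_RECORDS):
    """Check that each row sums to the total; return studies excluded per stage."""
    result = []
    previous = 0
    for stage, cumulative, remaining in rows:
        if cumulative + remaining != total:
            raise ValueError(f"{stage}: {cumulative} + {remaining} != {total}")
        result.append((stage, cumulative - previous))
        previous = cumulative
    return result


for stage, excluded in per_stage_exclusions(STAGES):
    print(f"{stage:<20} {excluded:>4}")
```

This yields 338, 95, 24, and 50 exclusions for Pass 1, Pass 2, Pass 2 (Discussion), and Pass 3 (Discussion), respectively.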

References

Higgins, J. P. T., Altman, D. G., Gøtzsche, P. C., Jüni, P., Moher, D., Oxman, A. D., Savović, J., Schulz, K. F., Weeks, L., & Sterne, J. A. C. (2011). The Cochrane Collaboration tool for assessing risk of bias in randomised trials. BMJ, 343. https://www.bmj.com/content/343/bmj.d5928
Panda, R., Malheiro, R., & Paiva, R. P. (2020). Audio features for music emotion recognition: a survey. IEEE Transactions on Affective Computing, 14(1), 68-88. https://doi.org/10.1109/TAFFC.2020.3032373
Shamseer, L., Moher, D., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., Shekelle, P., & Stewart, L. A. (2015). Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ, 349. https://www.bmj.com/content/349/bmj.g7647