Data pre-processing
Data and scripts
Libraries
library(tidyverse)
library(stringr)
library(ggplot2)
The manually annotated data, data/WEIRD coding final corr names.csv, was cleaned by the process captured in the weird data cleaning.R script.
Here we load the resulting manually annotated and cleaned data, data/WEIRD_06_05_2024.csv.
d <- read.csv('data/WEIRD_06_05_2024.csv', header = TRUE)
knitr::kable(d[1:5, 1:5])
X | PaperTitle | Year | Journal | FirstAuthorName_cleaned |
---|---|---|---|---|
1 | Corroborating external observation by cognitve data in the description and modelling of traditional music | 2010 | Musicae Scientiae | Arom, Simha |
2 | Looking into the eyes of a conductor performing Lerdahl’s “Time after Time” | 2010 | Musicae Scientiae | Bigand, Emmanuel |
3 | Auditive Analysis of the Quartetto per Archi in due tempi (1955) by Bruno Maderna | 2010 | Musicae Scientiae | Adesi, Anna Rita |
4 | The Difficulty of discerning between composed and improvised nusic | 2010 | Musicae Scientiae | Lehmann, Andreas C. |
5 | Musical Parameters and Children’s Images of motion | 2010 | Musicae Scientiae | Eitan, Zohar |
Preprocess
During preprocessing, we add elements to the data (WEIRD and non-WEIRD country codes; paper, study, and sample IDs), expand studies into their samples, and carry out various other cleaning operations.
This preprocessing creates four data frames (d, D, df, and DF) used in subsequent analyses.
source('scripts/preprocess.R')
Preprocessing.....
Number of coded studies: 1622
Unique first authors: 1021
1. Studies with samples (human studies)
2. Studies with music (music studies)
3. Create paper_ids
...Unique studies: 1360
4. Add study ids
5. Expand into samples....Loop across papers with multiple studies to get multiple samples (;)
6. Clean variables
7. Combine expanded and original data
8. Convert online studies based on SamplePrimaryCountryofOrigin
6. Handle online origin
9. Create UIDs
10. Gender balance indicators
11. WEIRD indicator for countries
Using WEIRD_country_index: Krys
11. Calculate weighted mean and sd age
12. Check counts of the expanded data frame
...Original: 1622 obs
...Expanded: 1743 obs
...Within expanded:1622
...Number of papers (study1 and sample0): 1360
Final dataframes:
D = human studies:
1532
DF = human studies with samples:
1653
Unique samples in DF: 1589
DF = human studies with distinct samples:
1589
15. Clean up
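Step 5 of the log expands studies into samples by splitting semicolon-separated entries. The sketch below illustrates the general idea on a toy data frame; the values are invented, and the way sample_id is numbered here is an assumption rather than the behaviour of scripts/preprocess.R.

```r
# Toy data: a hypothetical stand-in for the coded studies (not the real data)
studies <- data.frame(
  study_id = c(1, 2, 3),
  SamplePrimaryCountryofOrigin = c("UK", "UK; Mali; Uruguay", "Finland; Japan"),
  stringsAsFactors = FALSE
)

# Split each entry on semicolons and trim the surrounding whitespace
parts <- strsplit(studies$SamplePrimaryCountryofOrigin, ";", fixed = TRUE)
parts <- lapply(parts, trimws)

# Repeat each study row once per sample, then number the samples within a study
n_samples <- lengths(parts)
expanded <- studies[rep(seq_len(nrow(studies)), times = n_samples), , drop = FALSE]
expanded$SamplePrimaryCountryofOrigin <- unlist(parts)
expanded$sample_id <- sequence(n_samples)  # assumed numbering scheme
rownames(expanded) <- NULL

print(expanded)
```

Three studies become six rows here, one per listed country, which mirrors how the expanded data frame ends up with more observations than the original.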
Description of data frames
d is the original data with 1622 observations. Each row refers to a study (not article).
D is a version of the original data containing empirical/human studies with 1532 observations. Each row refers to a study (not article).
df contains samples within studies with 1743 observations.
DF contains samples with redundant samples removed. It has 1589 observations.
Within the data, there are 1360 articles.
We have created an identifier for articles (paper_id), studies (study_id), and samples (sample_id), as well as a unique identifier that combines these (paper_id_study_id).
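As a minimal sketch of how such identifiers can be combined and used to drop redundant samples, consider the toy example below. The underscore-separated key format and the deduplication rule are assumptions made for illustration; the actual construction lives in scripts/preprocess.R.

```r
# Toy samples table with hypothetical values
samples <- data.frame(
  paper_id   = c(1, 1, 1, 2),
  study_id   = c(1, 1, 2, 1),
  sample_id  = c(1, 1, 1, 1),
  SampleSize = c(40, 40, 25, 60)
)

# Combine paper and study ids into one key, e.g. "1_2" for paper 1, study 2
# (the separator is an assumption)
samples$paper_id_study_id <- paste(samples$paper_id, samples$study_id, sep = "_")

# Drop rows that repeat the same paper, study and sample combination
key <- samples[, c("paper_id", "study_id", "sample_id")]
distinct_samples <- samples[!duplicated(key), ]

nrow(samples)           # rows before removing redundant samples
nrow(distinct_samples)  # rows after
```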
Description of variables
CountryDataCollected_WEOG is based on the WEIRD country index derived from Krys et al. (2014).
FirstAuthorCountry_WEOG is based on the WEIRD country index derived from Krys et al. (2014).
gender_balance is the proportion of females in the samples… (to be continued)
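One plausible way to compute a proportion such as gender_balance from the coded participant counts is sketched below. The formula (females divided by females plus males) and the handling of missing or zero counts are assumptions; the preprocessing script may define the indicator differently.

```r
# Hypothetical participant counts (not the real data)
samples <- data.frame(
  FemaleParticipantsNumber = c(30, 12, NA, 0),
  MaleParticipantsNumber   = c(10, 12, 20, 0)
)

total <- samples$FemaleParticipantsNumber + samples$MaleParticipantsNumber

# Proportion of females; NA when a count is missing or the total is zero
samples$gender_balance <- ifelse(
  total > 0,
  samples$FemaleParticipantsNumber / total,
  NA_real_
)

print(samples)
```

Returning NA rather than 0 for samples without usable counts keeps them out of later averages instead of biasing those averages downwards.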
PaperTitle: enter the exact title of the article
Year: year published
Journal: journal published in
FirstAuthorName: enter the first author’s name in this format: Jakubowski, Kelly
FirstAuthorCountry: enter the country of the institution where the first author works
CountryDataCollected: enter the country(s) where the data was collected. If more than one country (e.g., for a cross-cultural study), please enter ALL countries where data was collected, separated by semi-colons, for example: UK; Mali; Uruguay OR online = for online studies with no origin country data collected
SamplePrimaryCountryofOrigin: enter the country where the majority of the sample came from (for instance, if 54% of the participants were from the UK, enter UK). If sample came equally from multiple countries (e.g. 1/3 of participants came from each of 3 countries), OR if the researchers explicitly recruited from different countries for the sake of making cross-national comparisons, then enter all countries separated by semi-colons, for example: UK; Mali; Uruguay
Ethnicity: if described, enter the ethnicity of the majority of the sample (e.g. if more than 50% were white, enter white). If split equally across ethnicities, you can enter more than one ethnicity, separated by semi-colons. Categories to be used here: Asian, black, white, Hispanic, other
SampleSize: enter total sample size as a number
SampleAgeMean: enter mean age of the sample
SampleAgeSD: enter the standard deviation of the age of the sample
SampleOtherDescription: if other description of sample such as ‘university students’ or ‘undergraduate students’ is provided, please add this here. Please try and retain consistent category labels across studies
SampleMusicianshipDescription: if the sample is described by the authors as only consisting of musicians, write ‘musicians’; if described as only non-musicians, write ‘non-musicians’; if sample included both, include both labels separated by a semi-colon (musicians; non-musicians)
SamplingMethodDescription: indicate category from the following list: volunteers, course credit, paid, other (if more than one of these categories is relevant, enter both, separated by a semi-colon such as: volunteers; paid)
FemaleParticipantsNumber: total number of female participants in the study
MaleParticipantsNumber: total number of male participants in the study
MeanYearsEducation: mean years of education of the sample
MusicPrimaryGenre: please enter the main/primary genre of music studied (if any) such as classical, pop, jazz, etc. If a study sampled equally across multiple genres, please enter these separated by semi-colons, for instance: classical; rock; punk. artificial = for artificial stimuli (beeps, clicks, sine waves, single sounds)
MusicOriginCountry: if a study used standard excerpts from styles such as classical, pop, rock, or jazz, please write Western, as it is likely such music came from US/UK/Western Europe primarily. For other, more international styles, please specify the country from which the music comes (e.g., China, India). If a study utilised music from multiple origin countries, please enter these separated by semi-colons, for instance: Western; Mali; Uruguay
MusicSource: indicate category from the following list: precomposed, semi-precomposed, experimenter-created, other (note: ‘precomposed’ refers to music that was not composed specifically for the study, e.g. a Beethoven symphony or a commercially released pop song; ‘semi-precomposed’ refers to precomposed music that has been manipulated by the experimenter; for instance, a Beethoven symphony was converted to a single-line MIDI melody for the experiment)
Comments: you may use this column to optionally enter any problems/issues that you would like a second coder to check over; if none, then please just leave blank!
CoderName: enter your name
Keywords: enter all keywords as listed at the start of the article, separated by semi-colons, for instance: functional harmony; extended tonality; harmonic substitutions; music perception; musical syntax
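The SampleSize, SampleAgeMean and SampleAgeSD fields make it possible to summarise age across several samples, as in the "weighted mean and sd age" step of the preprocessing log. The sketch below shows one standard way to do this: a size-weighted mean and a pooled SD that combines within-sample and between-sample variation. The numbers are invented, and whether scripts/preprocess.R uses exactly this formula is an assumption.

```r
# Hypothetical subsamples (values are illustrative only)
n <- c(20, 30)      # SampleSize
m <- c(22.0, 27.0)  # SampleAgeMean
s <- c(3.0, 4.0)    # SampleAgeSD

N <- sum(n)

# Size-weighted mean age across the subsamples
weighted_mean <- sum(n * m) / N

# Pooled SD: within-sample plus between-sample sums of squares
ss_within  <- sum((n - 1) * s^2)
ss_between <- sum(n * (m - weighted_mean)^2)
pooled_sd  <- sqrt((ss_within + ss_between) / (N - 1))

weighted_mean  # 25
pooled_sd
```

Including the between-sample term matters when subsample means differ: simply averaging the SDs would understate the spread of ages in the combined sample.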
Save (if necessary)
needs_saving <- FALSE
# Set needs_saving to TRUE to write the four data frames to disk
if (needs_saving) {
  save(d, df, D, DF, file = 'data/WEIRD_data.Rdata')
}
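For later analyses, a file written with save() can be read back with load(). The sketch below checks such a round trip with small placeholder data frames and a temporary file, so it does not touch data/WEIRD_data.Rdata; the placeholder contents are hypothetical.

```r
# Placeholder objects standing in for the four data frames (hypothetical contents)
d  <- data.frame(x = 1:3)
D  <- data.frame(x = 1:2)
df <- data.frame(x = 1:5)
DF <- data.frame(x = 1:4)

# Save to a temporary file rather than the project data folder
path <- tempfile(fileext = ".Rdata")
save(d, df, D, DF, file = path)

# Load into a fresh environment so objects in the workspace are not overwritten
restored <- new.env()
loaded_names <- load(path, envir = restored)

loaded_names                  # names of the objects that were restored
identical(restored$DF, DF)    # checks that the round trip preserved the data
```

Loading into a separate environment is a design choice for safety: load() silently overwrites objects of the same name when it restores into the global environment.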