Data for Dataset Composition DVs
Following on from the previous blog on data preparation, I will be using several main datasets, each serving a different purpose for the DVs.
To visualize the composition of fictional characters and works on the Open-Source Psychometrics site (as of 2020), a character_index dataframe was created.
setwd("~/COMM2501 Portfolio - z5218332")
load("~/COMM2501 Portfolio - z5218332/files/character_index.Rda")
head(character_index)
## character_code fictional_work character_name gender
## 1 A/01 Alien Dallas Male
## 2 A/02 Alien Ellen Ripley Female
## 3 A/03 Alien Lambert Female
## 4 A/04 Alien Ash Male
## 5 A/05 Alien the Alien Male
## 6 A/06 Alien Parker Male
summary(character_index)
## character_code fictional_work character_name gender
## Length:800 Length:800 Length:800 Length:800
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
The dataset contains 800 characters, each with the character code assigned by Open-Source Psychometrics, the fictional work it belongs to, the character name and the gender. Visualizations built on this dataset can therefore show what proportion of the Open-Source Psychometrics data each series makes up. It would also be very interesting to visualize the number of characters of each gender within a given series.
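As a minimal sketch of how such counts could be derived, the snippet below tabulates characters per work and per gender in base R. The `demo_index` dataframe is a hypothetical stand-in that only mirrors the columns of character_index (the "Demo Series" rows are invented for illustration); with the real data, character_index would be used in its place.

```r
# Hypothetical stand-in with the same columns as character_index.
# The "Alien" rows follow the head() output above; "Demo Series" is invented.
demo_index <- data.frame(
  character_code = c("A/01", "A/02", "A/03", "X/01", "X/02"),
  fictional_work = c("Alien", "Alien", "Alien", "Demo Series", "Demo Series"),
  character_name = c("Dallas", "Ellen Ripley", "Lambert",
                     "Character A", "Character B"),
  gender         = c("Male", "Female", "Female", "Male", "Female"),
  stringsAsFactors = FALSE
)

# Number of characters each work contributes, and its share of the dataset
work_counts <- table(demo_index$fictional_work)
work_share  <- prop.table(work_counts)

# Characters of each gender within each work (rows: work, columns: gender)
gender_by_work <- table(demo_index$fictional_work, demo_index$gender)

print(work_counts)
print(round(work_share, 2))
print(gender_by_work)
```

The two tables correspond to the two kinds of DVs described above: `work_share` for how much of the data a series makes up, and `gender_by_work` for the gender split within a series.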
However, audiences should keep in mind that the DVs created from this dataset mainly represent the number of characters and series included in the Open-Source Psychometrics data, and do not necessarily reflect the actual number of characters, or the gender split, in each series. It should also be noted that longer-running and more popular series are more likely to have more characters in the dataset. Nevertheless, these DVs should generally give a good indication of the number of prominent characters in a series and the gender proportions among them.
Data for Personality Spectrum DVs
To illustrate the distribution of personality spectrums across characters and genders, the full dataset is used. It can be found in the data_full R dataframe file.
load("~/COMM2501 Portfolio - z5218332/files/data_full.Rda")
head(data_full)
## character_code fictional_work character_name gender spectrum
## 1 A/04 Alien Ash Male BAP4
## 2 A/04 Alien Ash Male BAP5
## 3 A/04 Alien Ash Male BAP8
## 4 A/04 Alien Ash Male BAP12
## 5 A/04 Alien Ash Male BAP15
## 6 A/04 Alien Ash Male BAP20
## spectrum_positive spectrum_negative mean sd
## 1 masculine feminine -16.9 22.3
## 2 charming awkward 23.1 25.2
## 3 strict lenient -32.8 20.9
## 4 artistic scientific 43.0 12.1
## 5 orderly chaotic -28.2 27.1
## 6 spiritual skeptical 34.5 22.1
summary(data_full)
## character_code fictional_work character_name gender
## Length:28800 Length:28800 Length:28800 Length:28800
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## spectrum spectrum_positive spectrum_negative mean
## Length:28800 Length:28800 Length:28800 Min. :-49.4000
## Class :character Class :character Class :character 1st Qu.:-17.8000
## Mode :character Mode :character Mode :character Median : 1.0000
## Mean : 0.8211
## 3rd Qu.: 19.3000
## Max. : 48.1000
## sd
## Min. : 0.80
## 1st Qu.:19.90
## Median :24.20
## Mean :23.48
## 3rd Qu.:27.50
## Max. :42.70
To show the distribution of the personality spectrums, the main requirements are the mean values, and possibly the standard deviations, of each spectrum. To add depth to the visualization, it would also be meaningful for users to be able to compare the personality distributions of male and female characters.
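One way such a gender comparison could be prepared is sketched below, assuming a simple average of the character-level mean scores per spectrum and gender. The `demo_full` dataframe and its values are hypothetical and only mirror the relevant columns of data_full.

```r
# Hypothetical stand-in with the relevant columns of data_full
demo_full <- data.frame(
  character_code = rep(c("X/01", "X/02", "X/03", "X/04"), each = 2),
  gender         = rep(c("Male", "Male", "Female", "Female"), each = 2),
  spectrum       = rep(c("BAP4", "BAP5"), times = 4),
  mean           = c(-20, 10, -10, 30, 15, 20, 25, 40),
  stringsAsFactors = FALSE
)

# Average the character-level mean scores within each spectrum and gender.
# base::mean is spelled out because the score column is also called "mean".
gender_summary <- aggregate(mean ~ spectrum + gender,
                            data = demo_full, FUN = base::mean)
print(gender_summary)
```

Each row of `gender_summary` holds the average score of one gender on one spectrum, which is the quantity a side-by-side male versus female comparison would plot.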
It would be very useful for users to know whether the majority of characters tend to fall within a certain range of a personality spectrum. As there are many dimensions to these personality spectrums, DVs for this category will mainly be built in Tableau, whose customizable, filterable and interactive features let users explore the dataset based on their own curiosity.
Data for Character Personalities DVs
Lastly, correlation values between different characters are needed to indicate whether two characters have similar or opposite personalities. These correlation matrices were calculated using the mean spectrum scores of the characters, as illustrated in the previous blog, and can be obtained from the char_cor data files. Matrices were calculated for the full dataset, for each of the 10 selected series, and for the 10 series combined.
Below is an example of the correlation matrix for the Avatar: The Last Airbender series.
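The actual calculation is covered in the previous blog; as a rough reminder of the idea, the sketch below assumes a plain Pearson correlation via `cor()` on a spectrum-by-character matrix of mean scores. The `demo_scores` data, character codes and spectrum labels are all hypothetical.

```r
# Hypothetical long-format scores: 3 characters rated on 4 spectrums
demo_scores <- data.frame(
  character_code = rep(c("X/01", "X/02", "X/03"), each = 4),
  spectrum       = rep(c("S1", "S2", "S3", "S4"), times = 3),
  mean           = c( 10,  20,  30,  40,   # X/01
                      20,  40,  60,  80,   # X/02: same profile shape as X/01
                     -10, -20, -30, -40),  # X/03: mirror image of X/01
  stringsAsFactors = FALSE
)

# Reshape to a spectrum-by-character matrix (one mean score per cell)
score_matrix <- tapply(demo_scores$mean,
                       list(demo_scores$spectrum, demo_scores$character_code),
                       FUN = function(x) x[1])

# cor() correlates columns, so the result is character-by-character
demo_cor <- cor(score_matrix)
print(round(demo_cor, 2))
```

In this toy data X/01 and X/02 share the same profile shape and correlate at 1, while X/03 mirrors X/01 and correlates at -1, which is how similar and opposite personalities show up in the matrices.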
load("~/COMM2501 Portfolio - z5218332/files/char_cor_ala.Rda")
char_cor_ala
## ALA/01 ALA/02 ALA/03 ALA/04 ALA/05 ALA/06
## ALA/01 1.0000000 -0.11353446 0.60562139 0.12145248 0.48090792 -0.60267446
## ALA/02 -0.1135345 1.00000000 0.05948133 0.62837136 0.08113064 0.49277943
## ALA/03 0.6056214 0.05948133 1.00000000 0.09800885 0.15670493 -0.34491950
## ALA/04 0.1214525 0.62837136 0.09800885 1.00000000 0.50904654 0.38930006
## ALA/05 0.4809079 0.08113064 0.15670493 0.50904654 1.00000000 -0.09552781
## ALA/06 -0.6026745 0.49277943 -0.34491950 0.38930006 -0.09552781 1.00000000
## ALA/07 0.7893246 -0.06831766 0.74306159 0.14140753 0.24760274 -0.51295471
## ALA/08 -0.7396712 0.45282289 -0.45976223 0.18887431 -0.31221333 0.85084865
## ALA/09 0.7314900 -0.13359811 0.34704383 0.06248613 0.42267790 -0.21967323
## ALA/10 0.5935482 0.03293412 0.71430933 0.19896495 0.25794318 -0.47748013
## ALA/07 ALA/08 ALA/09 ALA/10
## ALA/01 0.78932460 -0.7396712 0.73148998 0.59354820
## ALA/02 -0.06831766 0.4528229 -0.13359811 0.03293412
## ALA/03 0.74306159 -0.4597622 0.34704383 0.71430933
## ALA/04 0.14140753 0.1888743 0.06248613 0.19896495
## ALA/05 0.24760274 -0.3122133 0.42267790 0.25794318
## ALA/06 -0.51295471 0.8508487 -0.21967323 -0.47748013
## ALA/07 1.00000000 -0.5635277 0.46594610 0.80657537
## ALA/08 -0.56352771 1.0000000 -0.50076635 -0.39609912
## ALA/09 0.46594610 -0.5007664 1.00000000 0.13281086
## ALA/10 0.80657537 -0.3960991 0.13281086 1.00000000
These correlation matrices were calculated using the 36 selected personality spectrums. The correlation values would be more accurate if all the spectrums available in the raw dataset were used. However, the remaining spectrums are left out for ease of computation and to stay consistent with the main data used for this portfolio. I have checked the character personality correlation values against fictional characters I am familiar with. For example, cheerful, light-hearted characters such as Aang and Ty Lee have a very high correlation value, while Aang and characters with a serious, evil personality such as Azula or Ozai have a strongly negative correlation value.
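A quick check of this kind can be sketched by ranking one character's correlations from most similar to most opposite. The values below are copied from the ALA/01 row of the char_cor_ala output above; which character each code stands for is not assumed here.

```r
# Row for ALA/01 copied from the char_cor_ala output (self-correlation dropped)
ala01 <- c("ALA/02" = -0.1135345, "ALA/03" =  0.6056214, "ALA/04" =  0.1214525,
           "ALA/05" =  0.4809079, "ALA/06" = -0.6026745, "ALA/07" =  0.7893246,
           "ALA/08" = -0.7396712, "ALA/09" =  0.7314900, "ALA/10" =  0.5935482)

# Rank the other characters from most similar to most opposite
ranked <- sort(ala01, decreasing = TRUE)

most_similar  <- names(ranked)[1]
most_opposite <- names(ranked)[length(ranked)]
print(ranked)
```

With the matrix loaded, the same ranking could be taken directly from it, for example with `sort(char_cor_ala["ALA/01", colnames(char_cor_ala) != "ALA/01"], decreasing = TRUE)`.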
In these character correlation matrices, row and column names are character codes, so for the creation of DVs a separate dataframe is needed to map in the character names for better user comprehension. Overall, these correlation values reflect the similarities and differences in character personalities very well. It should be noted, however, that they were calculated from a pool of ratings by people with different opinions and interpretations of the characters. The full dataset includes the standard deviations of people's ratings of the characters' personalities, yet these are not taken into account in the correlation calculation.
Hence, while certain characters have a clear-cut personality, there are others whose personas are more ambiguous. Users should therefore interpret these correlation values not as a reflection of the characters as a whole, but as a general indicator of how similar or different their personalities are.
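A minimal sketch of that relabelling step is shown below, assuming a lookup table with character_code and character_name columns as in character_index. The `demo_names` table, the `demo_cor` matrix and all their values are hypothetical.

```r
# Hypothetical lookup table (stand-in for character_index)
demo_names <- data.frame(
  character_code = c("X/01", "X/02", "X/03"),
  character_name = c("Character A", "Character B", "Character C"),
  stringsAsFactors = FALSE
)

# Hypothetical correlation matrix labelled with character codes
demo_cor <- matrix(c( 1.0,  0.6, -0.7,
                      0.6,  1.0, -0.2,
                     -0.7, -0.2,  1.0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(demo_names$character_code,
                                   demo_names$character_code))

# Look up each code's name; match() keeps the matrix's own ordering
name_lookup <- demo_names$character_name[match(rownames(demo_cor),
                                               demo_names$character_code)]
named_cor <- demo_cor
dimnames(named_cor) <- list(name_lookup, name_lookup)

# Long format (one row per character pair), often easier for plotting tools
cor_long <- as.data.frame(as.table(named_cor), stringsAsFactors = FALSE)
names(cor_long) <- c("character_1", "character_2", "correlation")
print(cor_long)
```

Keeping the relabelled copy separate from the original matrix leaves the character codes available for joining back to other datasets.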