Data Analysis

Data for Dataset Composition DVs

Following the data preparation covered in the previous blog post, I will work with several main datasets, each serving a different group of DVs.

To visualize how fictional characters and works are represented on the Open-Source Psychometrics site (as of 2020), a character_index dataframe was created.

setwd("~/COMM2501 Portfolio - z5218332")
load("~/COMM2501 Portfolio - z5218332/files/character_index.Rda")
head(character_index)

##   character_code fictional_work character_name gender
## 1           A/01          Alien         Dallas   Male
## 2           A/02          Alien   Ellen Ripley Female
## 3           A/03          Alien        Lambert Female
## 4           A/04          Alien            Ash   Male
## 5           A/05          Alien      the Alien   Male
## 6           A/06          Alien         Parker   Male

summary(character_index)

##  character_code     fictional_work     character_name        gender         
##  Length:800         Length:800         Length:800         Length:800        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character

This output shows that the dataset contains 800 characters, each with the character code assigned by Open-Source Psychometrics, the fictional work it belongs to, its name and its gender. Visualizations built on this dataset can therefore show how large a share of the Open-Source Psychometrics data each series makes up. It would also be interesting to visualize how many characters of each gender make up a given series.
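The counts and proportions described above can be sketched in base R. This is a minimal sketch rather than the code used for the final DVs: `toy_index` is a hypothetical stand-in built from the six rows printed above, so that the example runs without the .Rda file.

```r
# Toy stand-in for character_index, built from the six rows printed above
# (the real dataframe has 800 rows).
toy_index <- data.frame(
  character_code = c("A/01", "A/02", "A/03", "A/04", "A/05", "A/06"),
  fictional_work = rep("Alien", 6),
  character_name = c("Dallas", "Ellen Ripley", "Lambert",
                     "Ash", "the Alien", "Parker"),
  gender         = c("Male", "Female", "Female", "Male", "Male", "Male"),
  stringsAsFactors = FALSE
)

# Number of characters per fictional work
work_counts <- table(toy_index$fictional_work)

# Share of the dataset contributed by each work
work_share <- prop.table(work_counts)

# Gender counts within each work, and the row-wise proportions
gender_counts <- table(toy_index$fictional_work, toy_index$gender)
gender_share  <- prop.table(gender_counts, margin = 1)

print(work_counts)
print(gender_counts)
print(round(gender_share, 2))
```

Running the same calls on the real character_index should give one row per fictional work instead of the single Alien row.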

However, audiences should keep in mind that DVs created from this dataset mainly represent the number of characters and series included in the Open-Source Psychometrics data, and do not necessarily reflect the actual number of characters, or the gender split, in each series. Series that run longer and are more popular are also more likely to have more characters in the dataset. Nevertheless, these DVs should generally give a good indication of the number of prominent characters in a series and the gender proportions among them.

Data for Personality Spectrum DVs

To illustrate how personality spectrum scores are distributed across characters and genders, the full dataset is used. It is stored in the data_full R dataframe file.

load("~/COMM2501 Portfolio - z5218332/files/data_full.Rda")
head(data_full)

##   character_code fictional_work character_name gender spectrum
## 1           A/04          Alien            Ash   Male     BAP4
## 2           A/04          Alien            Ash   Male     BAP5
## 3           A/04          Alien            Ash   Male     BAP8
## 4           A/04          Alien            Ash   Male    BAP12
## 5           A/04          Alien            Ash   Male    BAP15
## 6           A/04          Alien            Ash   Male    BAP20
##   spectrum_positive spectrum_negative  mean   sd
## 1         masculine          feminine -16.9 22.3
## 2          charming           awkward  23.1 25.2
## 3            strict           lenient -32.8 20.9
## 4          artistic        scientific  43.0 12.1
## 5           orderly           chaotic -28.2 27.1
## 6         spiritual         skeptical  34.5 22.1

summary(data_full)

##  character_code     fictional_work     character_name        gender         
##  Length:28800       Length:28800       Length:28800       Length:28800      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    spectrum         spectrum_positive  spectrum_negative       mean         
##  Length:28800       Length:28800       Length:28800       Min.   :-49.4000  
##  Class :character   Class :character   Class :character   1st Qu.:-17.8000  
##  Mode  :character   Mode  :character   Mode  :character   Median :  1.0000  
##                                                           Mean   :  0.8211  
##                                                           3rd Qu.: 19.3000  
##                                                           Max.   : 48.1000  
##        sd       
##  Min.   : 0.80  
##  1st Qu.:19.90  
##  Median :24.20  
##  Mean   :23.48  
##  3rd Qu.:27.50  
##  Max.   :42.70

Showing the distribution of the personality spectrums mainly requires their mean values and possibly their standard deviations. To add depth to the visualization, it would also be meaningful for users to be able to compare the personality distributions of male and female characters.
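As a rough illustration of the male/female comparison, the sketch below averages the character-level mean scores per spectrum and gender. `toy_full` is a hypothetical stand-in for data_full: its character codes and score values are invented for illustration and are not taken from the real dataset.

```r
# Toy stand-in for data_full. Codes and values are invented for illustration.
toy_full <- data.frame(
  character_code = rep(c("X/01", "X/02", "X/03", "X/04"), each = 2),
  gender         = rep(c("Male", "Female", "Male", "Female"), each = 2),
  spectrum       = rep(c("BAP4", "BAP5"), times = 4),
  mean           = c(-10, 20, 30, 10, -20, 0, 10, 30),
  stringsAsFactors = FALSE
)

# Average of the character-level mean scores, per spectrum and gender
by_gender <- aggregate(mean ~ spectrum + gender, data = toy_full, FUN = mean)

# Summary statistics of the scores on one spectrum, split by gender
bap4 <- subset(toy_full, spectrum == "BAP4")
bap4_summary <- tapply(bap4$mean, bap4$gender, summary)

print(by_gender)
print(bap4_summary)
```

A table like `by_gender` gives a quick numeric check on whatever the interactive DV shows for the same spectrum.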

It would be very useful for users to know whether the majority of characters tend to fall within a certain range of a personality spectrum. As the personality spectrums have many dimensions, DVs for this category will mainly be built in Tableau, whose customizable, filterable and interactive features let users explore the dataset according to their own curiosity.

Data for Character Personalities DVs

Lastly, correlation values between characters are needed to indicate whether two characters have similar or opposite personalities. These correlation matrices were calculated from the characters' mean spectrum scores, as illustrated in the previous blog, and can be obtained from the char_cor data files. Matrices were calculated for the full dataset, for each of the 10 selected series, and for the 10 series combined.

Below is an example: the character correlation matrix for the Avatar: The Last Airbender series.

load("~/COMM2501 Portfolio - z5218332/files/char_cor_ala.Rda")
char_cor_ala

##            ALA/01      ALA/02      ALA/03     ALA/04      ALA/05      ALA/06
## ALA/01  1.0000000 -0.11353446  0.60562139 0.12145248  0.48090792 -0.60267446
## ALA/02 -0.1135345  1.00000000  0.05948133 0.62837136  0.08113064  0.49277943
## ALA/03  0.6056214  0.05948133  1.00000000 0.09800885  0.15670493 -0.34491950
## ALA/04  0.1214525  0.62837136  0.09800885 1.00000000  0.50904654  0.38930006
## ALA/05  0.4809079  0.08113064  0.15670493 0.50904654  1.00000000 -0.09552781
## ALA/06 -0.6026745  0.49277943 -0.34491950 0.38930006 -0.09552781  1.00000000
## ALA/07  0.7893246 -0.06831766  0.74306159 0.14140753  0.24760274 -0.51295471
## ALA/08 -0.7396712  0.45282289 -0.45976223 0.18887431 -0.31221333  0.85084865
## ALA/09  0.7314900 -0.13359811  0.34704383 0.06248613  0.42267790 -0.21967323
## ALA/10  0.5935482  0.03293412  0.71430933 0.19896495  0.25794318 -0.47748013
##             ALA/07     ALA/08      ALA/09      ALA/10
## ALA/01  0.78932460 -0.7396712  0.73148998  0.59354820
## ALA/02 -0.06831766  0.4528229 -0.13359811  0.03293412
## ALA/03  0.74306159 -0.4597622  0.34704383  0.71430933
## ALA/04  0.14140753  0.1888743  0.06248613  0.19896495
## ALA/05  0.24760274 -0.3122133  0.42267790  0.25794318
## ALA/06 -0.51295471  0.8508487 -0.21967323 -0.47748013
## ALA/07  1.00000000 -0.5635277  0.46594610  0.80657537
## ALA/08 -0.56352771  1.0000000 -0.50076635 -0.39609912
## ALA/09  0.46594610 -0.5007664  1.00000000  0.13281086
## ALA/10  0.80657537 -0.3960991  0.13281086  1.00000000

These correlation matrices were calculated using the 36 selected personality spectrums. The correlation values would be more accurate if all the spectrums available in the raw dataset were used. However, the remaining spectrums are left out for ease of computation and to stay consistent with the main data used for this portfolio. I have sanity-checked the correlation values against fictional characters I am familiar with. For example, cheerful, lighthearted characters such as Aang and Ty Lee have a very high correlation value, while Aang and characters with a serious, evil personality such as Azula or Ozai have a strongly negative correlation value.
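The idea behind such a matrix can be sketched as follows. The exact procedure is in the previous blog and is not reproduced here, so this sketch assumes a Pearson correlation between characters' profiles of mean scores. `toy_long` and its codes and values are invented, chosen so that the expected result is easy to see.

```r
# Toy long-format data: 3 hypothetical characters rated on 4 spectrums.
# All codes and values are invented for illustration.
toy_long <- data.frame(
  character_code = rep(c("X/01", "X/02", "X/03"), each = 4),
  spectrum       = rep(c("S1", "S2", "S3", "S4"), times = 3),
  mean           = c( 10, -20,  30, -40,   # X/01
                      20, -40,  60, -80,   # X/02: same profile as X/01, scaled
                     -10,  20, -30,  40),  # X/03: mirror image of X/01
  stringsAsFactors = FALSE
)

# Reshape to a spectrum-by-character matrix of mean scores
# (each cell holds exactly one value, so mean() simply returns it)
toy_mean_matrix <- tapply(
  toy_long$mean,
  list(spectrum = toy_long$spectrum, character_code = toy_long$character_code),
  mean
)

# Pearson correlation between columns, i.e. between characters
toy_char_cor <- cor(toy_mean_matrix)

print(round(toy_char_cor, 2))
```

In this toy case X/01 and X/02 correlate at 1 because their profiles have the same shape, while X/01 and X/03 correlate at -1 because one profile mirrors the other. This is the pattern the Aang and Ty Lee versus Aang and Azula check looks for in the real data.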

In these character correlation matrices, the row and column names are character codes, so creating the DVs requires a separate dataframe that maps codes to character names for better user comprehension. Overall, these correlation values reflect the similarities and differences in character personalities very well. It should be noted, however, that the values were calculated from a pool of ratings by people with different opinions and interpretations of the characters. The full dataset includes the standard deviations of people’s ratings of the characters’ personalities, yet these are not taken into account in the correlation calculation.
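One possible way to swap the codes for names is sketched below. The lookup rows are the first three Alien characters from character_index shown earlier. The correlation values in `toy_cor` are made up, and the long-format step is only a suggestion for tools that prefer one row per character pair.

```r
# Lookup table: three of the Alien rows printed from character_index earlier
lookup <- data.frame(
  character_code = c("A/01", "A/02", "A/03"),
  character_name = c("Dallas", "Ellen Ripley", "Lambert"),
  stringsAsFactors = FALSE
)

# Made-up symmetric correlation matrix labelled by character code
toy_cor <- matrix(
  c( 1.0, 0.2, -0.3,
     0.2, 1.0,  0.5,
    -0.3, 0.5,  1.0),
  nrow = 3,
  dimnames = list(lookup$character_code, lookup$character_code)
)

# Swap codes for names, matching by code rather than relying on row order
named_cor <- toy_cor
rownames(named_cor) <- lookup$character_name[match(rownames(toy_cor), lookup$character_code)]
colnames(named_cor) <- lookup$character_name[match(colnames(toy_cor), lookup$character_code)]

# Optional: long format, one row per pair of characters
cor_long <- as.data.frame(as.table(named_cor), stringsAsFactors = FALSE)
names(cor_long) <- c("character_a", "character_b", "correlation")

print(named_cor)
print(head(cor_long))
```

Matching on the code keeps the labels correct even if the lookup dataframe is sorted differently from the matrix.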

Hence, some characters have a clearly defined persona, while the personas of others are ambiguous. Users should therefore treat these correlation values not as a reflection of the characters as a whole, but as a general indicator of how similar or different the characters' personalities are.