Data Analysis in Social Science 3
Alexey Bessudnov, expanded and edited by Max Shilling (BSc Sociology student 2018)
This website accompanied the module as offered in 2018. While in 2019 most things remained similar, I have introduced some changes. For the 2019 version of the module, please use the Github repository here – https://github.com/dataanalysis3/datan3_2019 . It contains the updated module outline and all other teaching materials.
This is a website for the Data Analysis in Social Science 3 module at the University of Exeter (SOC2094/3094 POL2094/3094) as offered in Term 2 in 2018.
The website will be updated with new R Markdown scripts as the course progresses. All the scripts are provided on the ‘as is’ basis.
The idea for this module is to teach you how to work with complex longitudinal data sets using data from the Understanding Society, a longitudinal household survey conducted in the UK. You will learn how to read complex data into R, manipulate and summarise data using dplyr, merge and restructure data frames, visualise data using ggplot2, create statistical reports with R Markdown and (possibly) interactive applications with Shiny. For more details see the module outline.
The module is organised according to the “flipped classroom” pedagogy. This means that you read new material and do exercises before the class, and in class we work together on the exercises and discuss how what you learned at home can be applied to the Understanding Society data. The readings and exercises mostly come from the R for Data Science book by G.Grolemund and H.Wickham.
The pre-requisites for this module are SOC/POL1041 and SOC/POL2077.
All the scripts and other materials are available in the Github repository for the module: https://github.com/abessudnov/dataanalysis3
1.1 Things to do before the next class
Before coming to class 2 please do the following:
Register an account with the UK Data Service, create a data usage (or join the existing data usage that I have created) and download the Understanding Society data in the tab delimited format (SN6614). Read the User Manual and familiarise yourself with the structure of the data.
Install Git and register an account on Github (if you do not have it already). Then you can either create a new repository for this module or fork my repository (see the link above). Create a project in R Studio for this repository.
To learn how Git and Github work take this free online course: https://www.datacamp.com/courses/introduction-to-git-for-data-science/
- In the root folder for your project create a folder “data” and put there the Understanding Society data as you downloaded them. The next subfolder must be “UKDA-6614-tab”. Never change anything in this folder. Also create an empty “myData” folder in the root folder.
You do not want to track these two folders on Github. To avoid this, include the following two lines in your .gitignore file:
- Read ch.11 (Data Import) from R for Data Science and do the exercises - http://r4ds.had.co.nz/data-import.html