Wrangling and Munging Data with SQL and R

  • 08 March 2013
  • statistics.com
  • Organiser: statistics.com

Course tutor: Kathryn Vasilaky

Please note this course runs until 5th April 2013.

Aim of Course:

Many large datasets are stored in relational database management systems (RDBMSs), which organize data as a set of related tables. Most common statistical analyses, however, need the data to be in a single table.

SQL (Structured Query Language) is often used to pull data from the various tables in a database and assemble it in a format suitable for statistical analysis or review. SQL can also perform basic calculations, but it is not designed for heavy-duty statistical programming.

The purpose of this course is to teach you how to extract data from a relational database using SQL, and merge it into a single file in R, so that you can perform statistical operations.

Here is an example you might face at work. You would like to see the effect of interest rates on the number of cars sold in each state. Interest rates on car loans are stored in one table, and sales for each dealership in each state are stored in another. You will need to aggregate sales by dealership and state, and merge in interest rates by state. Finally, you will want to run a linear regression, or compute a correlation, between total sales and interest rates.

This course will help you solve problems like the one above. You will learn how to think like a relational database, so that you can manipulate matrices and vectors of data using SQL queries. You will then learn how to bring data from your database into R and organize it as a flat file. From there you can do anything from basic statistical calculations (e.g. averages, tabulations, linear regressions, tests of two means) to machine learning algorithms on your data. By the end of the four weeks, you should be able to visualize how a dataset needs to be manipulated in order to perform a desired calculation or answer a particular question.


Address:

This website is provided by John Wiley & Sons Limited, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ (Company No: 00641132, VAT No: 376766987)

Published features on StatisticsViews.com are checked for statistical accuracy by a panel from the European Network for Business and Industrial Statistics (ENBIS) to whom Wiley and StatisticsViews.com express their gratitude. This panel are: Ron Kenett, David Steinberg, Shirley Coleman, Irena Ograjenšek, Fabrizio Ruggeri, Rainer Göb, Philippe Castagliola, Xavier Tort-Martorell, Bart De Ketelaere, Antonio Pievatolo, Martina Vandebroek, Lance Mitchell, Gilbert Saporta, Helmut Waldl and Stelios Psarakis.