Data Science: Foundations using R

Instructors: Roger D. Peng, PhD, Brian Caffo, PhD, Jeff Leek, PhD

Beginner Level • 4 months at 10 hours a week • Flexible Schedule

What You'll Learn

  • Use R to clean, analyze, and visualize data.
  • Learn how to ask the right questions, obtain data, and perform reproducible research.
  • Use GitHub to manage data science projects.

Skills You'll Gain

Data Integration
R Programming
Statistical Analysis
Statistical Programming
Data Sharing
Data Visualization
knitr
ggplot2
Version Control
Exploratory Data Analysis
Big Data
Statistical Reporting

Shareable Certificate

Earn a shareable certificate to add to your LinkedIn profile

Outcomes

  • Learn in-demand skills from university and industry experts
  • Master a subject or tool with hands-on projects
  • Develop a deep understanding of key concepts
  • Earn a career certificate from Johns Hopkins University

5-course series

In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools used in the program, such as version control, Markdown, Git, GitHub, R, and RStudio.

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.

Before you can work with data, you have to get some. This course covers the basic ways that data can be obtained: from the web, from APIs, from databases, and from colleagues in various formats. It also covers the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course also covers the components of a complete data set, including raw data, processing instructions, codebooks, and processed data. Together, these topics provide the basics needed for collecting, cleaning, and sharing data.

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.

This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code used to conduct the analysis are available. This course will focus on literate statistical analysis tools, which let you publish a data analysis in a single document that others can easily execute to obtain the same results.

Learner Testimonials

Felipe M. • Learner since 2018

To be able to take courses at my own pace and rhythm has been an amazing experience. I can learn whenever it fits my schedule and mood.

Jennifer J. • Learner since 2020

I directly applied the concepts and skills I learned from my courses to an exciting new project at work.

Larry W. • Learner since 2021

When I need courses on topics that my university doesn't offer, Coursera is one of the best places to go.

Chaitanya A.

Learning isn't just about being better at your job: it's so much more than that. Coursera allows me to learn without limits.