Introduction to Data Science using R (RE)

Training course goals:

R is an open-source programming language that is used extensively in statistics, visualization, and data mining.

The goal of this introductory course is to teach participants how to use R for data science, with a particular focus on newer R packages (e.g., tidyverse) that improve performance and ease of use compared to base R. Course participants will be guided through a systematic introduction to R and its packages for data science. Numerous exercises help deepen the knowledge conveyed during the course.

Target group:

This course is aimed at all users who would like to use R for data science and who have little to no experience in R. It is also suitable for users whose R experience dates back several years and who would like to catch up on the packages, features, and improvements introduced within the last seven years.

Course contents:

The course starts by guiding the user through the installation of R and RStudio (an advanced R IDE) and demonstrating their usage. This is followed by a systematic introduction to R syntax and to several packages from the tidyverse group that facilitate data management, data manipulation, and visualization. Simple statistical procedures and their implementation in R will also be introduced. The course concludes with an outlook on the use of R for creating dashboards (shiny) and reports (markdown).

Specifically, the following topics will be covered:

  • Installation of R and RStudio (an IDE for R)
  • Introduction to the features and usage of RStudio
  • Basics of R syntax (base R)
  • Solving problems by using the help function and internet search
  • Introduction to the Data Science workflow using the package tidyverse:
    • Data management and data cleaning using readr, tibble, and tidyr
    • Data manipulation using dplyr
    • Graphical data exploration using ggplot2 (e.g., histograms, line plots, scatter plots, box plots)
    • Customization of plots and creation of multi-plots using ggplot2
  • Simple statistical methods (linear regression, Student's t-tests, ANOVA)
  • Outlook: Dashboarding using shiny and reporting using markdown

The entire course is taught interactively so that participants can try out the various methods and procedures right away. In addition, every section is followed by a set of exercises. The course is fully documented; this makes note-taking during the course unnecessary and lets participants focus entirely on practicing the methods in R.

Requirements:

No R experience is required to participate in this course. However, previous experience in using R or other programming languages as well as basic statistical knowledge is beneficial.

Supplementary courses:

Further R courses are currently being developed and will cover the following topics: "Advanced Visualization in R", "Dashboarding and Reporting in R", "Introduction to Statistics Using R", and "Advanced Statistics Using R".

Furthermore, we offer a series of methodological courses that cover software-independent topics, such as “Efficient Data Management” (EDM), “Big Data Analytics” (BDA), “Decision Trees” (TRM), “Methods of Statistical Data Mining” (MDM), and “Data Mining in Practice” (PDM).

Duration: 2 days
Time: 9:30 - 17:00
Price: EUR 1,040 (plus VAT) per participant

 
