Course

Practical data science

Statistics for big data & business intelligence

Because data are now available on a large scale, the use of statistical methods has broadened considerably and data science has grown in importance, not only in the laboratory and in industry but also in marketing and business intelligence.

For such applications, this course offers essential insights into the statistical concepts and skills needed to apply data analysis techniques responsibly. You will learn to work in a statistically sound way and to interpret datasets and models correctly.

The course starts with a review of basic principles from statistics and probability theory. This provides a solid foundation for the data analysis methods from two important fields, data mining (big data) and time series analysis, which are discussed next. You will also gain experience in working effectively with the statistical software R.

A well-prepared start with data science

The course consists of three topics:

  1. Applied probability and statistics revisited
    intended as a refresher on the principles and techniques used in applied probability and statistics to analyze and model data.
  2. Data mining in a nutshell
    provides an introduction to and overview of commonly used methods in predictive analytics for generating predictions and classifications from suitable models and for recognizing patterns in large data sets.
  3. Time series analysis in a nutshell
    deals with methods for modeling time-dependent data and for forecasting future values based on these models.


This course also provides a strong foundation for our specialist follow-up courses in statistics, such as Data mining & business analytics and Time series analysis and forecasting.

Intended for

Academics and higher professionals who want to use modern applied statistical techniques in their work, master the relevant skills and get acquainted with the latest statistical freeware. The course is also suitable for lecturers at universities or colleges of higher education who want to stay informed about current methods for data analysis and data science.

You should have at least a secondary-school level of mathematics. Basic knowledge of statistics is desirable.

In consultation with the participants, this course can be taught in Dutch or English. The practical exercises use R.

The techniques demonstrated with R can also be performed in Python and yield comparable results. Participants receive sample data files so that they can reproduce the results using their preferred data analysis software. R can also be called from Python and Python from R, so the strengths of both can be combined. Would you like to use Python or explore more of what it can do? See our course Python for engineers.


  • Information
    Trainer: Dr. J.J.M. Rijpkema (Technische Universiteit Eindhoven, TU/e)
    Course dates: March 1, 8, 15 and 22, 2021
    Location: Eindhoven
    Price: € 2,295.00 excl. VAT
    Timetable information: classroom (the course will be held online if necessary due to COVID-19)
    Language: the program can be taught in English on request.
  • Program

    I. Statistics and applied probability theory, including an introduction to the software program R

    • Introduction to and overview of the course and of the software used (R)
    • Exploratory data analysis
    • Introduction to probability and probability distributions
    • Exercises on exploratory data analysis and probability
    • Statistical testing and estimation in a nutshell
    • Selection, validation and use of probability distributions in practice
    • Exercises on probability distributions and the principles of testing & estimation

    II. Data mining in a nutshell

    • Prediction modeling based on linear regression methods
    • Selection, validation and use of regression models in practice
    • Exercises on regression models
    • Classification modeling based on logistic regression methods
    • Selection, validation and use of logistic regression models in practice
    • Exercises on logistic regression models
    • Alternative methods for predictive and classification modeling
    • Selection, validation and use of predictive and classification models in practice
    • Cluster analysis
    • Exercises on prediction and classification models and cluster analysis

    III. Time series analysis in a nutshell

    • Introduction, characterization and exploratory analysis of time series data
    • Time series models based on exponential smoothing
    • Selection, validation and use of exponential smoothing models in practice
    • Exercises on exponential smoothing models
    • Box-Jenkins models for time series data
    • Selection, validation and use of Box-Jenkins models in practice
    • Exercises on Box-Jenkins models
  • Reviews
    This course is rated 8.4
    “Good start to use R. You really get tools to analyze data.”
    participant from Royal HaskoningDHV
    “Excellent course, meets all expectations.”
    participant from NXP Semiconductors BV Nijmegen
    “Very informative and applicable.”
    participant from Albemarle Catalysts BV
    “Very informative: good starting point for using R.”
    participant from Albemarle Catalysts BV
    “Very interesting. Despite the level of difficulty a good start to apply in my daily practice.”
    participant from Abbott