The course introduces methods and models for extracting relevant information from large amounts of data, with particular attention to statistical learning in both predictive and non-predictive settings (supervised and unsupervised learning). To build the skills needed to analyze and model real data, the lectures will be supplemented by R exercises in the computer lab.
Program:
Introduction to data mining and statistical learning.
Data visualization techniques.
Regression and classification: multiple linear regression, logistic regression, discriminant analysis and K-nearest neighbors.
Non-linear methods (flexible regression): polynomial regression, regression splines, smoothing splines, generalized additive models.
Unsupervised learning: association rules, principal component analysis, clustering methods.