SOCIAL DATA SCIENCE

Università degli Studi
"G. d'Annunzio"

Chieti - Pescara

Single discipline educational activity

Course Sheet

Course code:

RSPC24

Academic Year of enrolment:

2022

Year of delivery:

2023/2024

Degree:

Course of 2nd Cycle Degree in SOCIAL RESEARCH, SECURITY POLICY AND CRIMINAL STUDIES

Disciplinary Sector:

Social Statistics

Language:

Italian

Location:

CHIETI

Credits:

Programme Year:

Professor and Collaborators:

FONTANELLA Lara

Cycle:

First semester

Hours of classroom activity:

Individual study hours:

102

Prerequisites:

Basic knowledge of statistics

Objectives

The course aims to give students the tools to extract relevant information from large amounts of data, with particular attention to statistical learning in both predictive (supervised) and unsupervised settings. The course also introduces students to text mining, a recent field of research whose development is closely linked to the growing volume of online text data and to advances in statistical methodologies and algorithms for information retrieval and automatic classification.
Analyses will be carried out in the statistical programming language R.

LEARNING OUTCOMES
1. Understanding of the nature of multivariate and textual data and of the statistical techniques used to analyse them.
2. Understanding of, and ability to explain, the fundamentals of algorithms for extracting information from multivariate and textual databases.
3. Ability to apply the principles of statistical reasoning in the preparation and interpretation of company reports.
4. Ability to use the R software for statistical analysis.

Making judgements
- Learn the logical and statistical concepts needed to work independently in searching for, selecting and processing data relevant to data mining and text mining.

Communication skills
- Learn the terminology and statistical techniques needed to communicate and correctly discuss the results of analyses of company data relevant to data mining and text mining.

Contents

The following topics are core parts of the teaching programme and serve the objectives above: Introduction to computing in R; Introduction to statistical learning; Data visualization; Regression and classification; Unsupervised learning (principal component analysis, clustering); Introduction to text data mining; Text preparation; Text analytics; Visualization of textual data; Web scraping.

Extended Syllabus

1. Introduction to R
2. Introduction to data mining and statistical learning
3. Data visualization techniques
4. Review of probability
5. The multivariate Normal distribution
6. Supervised Learning Models (Regression, Classification)
7. Unsupervised Learning Models (Clustering, PCA)
8. Introduction to Text Mining
9. Preparation of texts (Standardization or preprocessing, tokenization, Stopwords, Stemming, "Bag of words" model)
10. Visualization of textual data
11. Statistical analysis of textual data
12. Automatic classification of texts
13. Topic models
14. Web scraping

Recommended Bibliography

Coursebooks:

Vardanega, Agnese. 2011–2021. «R per l’analisi dei dati. Una wiki per l’analisi dei dati con R». https://www.agnesevardanega.eu/wiki/r/start

Vardanega, Agnese. 2022. «Strumenti per l’analisi testuale e il text mining con R». https://www.agnesevardanega.eu/books/analisi-testuale-2021/index.html

Further materials (e.g. slides) can be downloaded from https://fad.unich.it

Direct link: https://fad.unich.it/course/view.php?id=1342

English Textbooks:
James, Witten, Hastie, Tibshirani (2013) An Introduction to Statistical Learning: with Applications in R, Springer

Julia Silge, David Robinson, Text Mining with R: A Tidy Approach, O'Reilly Media (31 July 2017)

Methods of Provision

Conventional

Teaching Methods

Lectures and practical exercises using the R software. Attendance is not compulsory but is strongly recommended.

Evaluation methods

Verification of learning:

The exam consists of the presentation of a project designed and developed during the course, followed by an oral discussion of the same topics.

Non-attending students can find instructions for carrying out the project on the FAD website and are invited to contact the professor for any clarification.

Contacts/More Information

E-mail: lara.fontanella@unich.it
Office hours are held after lectures.
Appointments can be arranged by e-mail.
