Data Science & Privacy Education

Applied Statistics and Data Science Group (ASDa)

The Applied Statistics and Data Science Group (ASDa) in the UBC Department of Statistics provides assistance in the statistical formulation of research questions, the design of experiments and sampling plans for surveys, the choice and explanation of statistical methodology, statistical computing and graphics, statistical analysis, the interpretation of findings, and more. ASDa also plays an active role in continuing education on and off the UBC campus, giving seminars and workshops on statistical concepts and methodology to various departments and research groups and at teaching hospitals.

Previously recorded topics from ASDa and the Graduate Pathways for Success Program of the Faculty of Graduate Studies (UBC CWL access):

Previously recorded topics from ASDa and Vancouver Coastal Health, with a focus on medical studies:

Data Science: A First Introduction Textbook

Data Science: A First Introduction is an open source textbook originally written for UBC's DSCI 100 - Introduction to Data Science, a course that introduces undergraduate students to data science.

This book is structured so that learners spend the first four chapters learning how to use the R programming language and Jupyter notebooks to load, wrangle/clean, and visualize data, while answering descriptive and exploratory data analysis questions. The remaining chapters illustrate how to solve four common problems in data science, which are useful for answering predictive and inferential data analysis questions:

  1. Predicting a class/category for a new observation/measurement

  2. Predicting a value for a new observation/measurement

  3. Finding previously unknown/unlabelled subgroups in the data

  4. Estimating an average or a proportion from a representative sample and using that estimate to generalize to the broader population
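As a rough illustration of the fourth problem, the sketch below estimates a proportion from a sample and attaches an approximate 95% interval. It is written in Python rather than the R used by the textbook, it uses a simple normal approximation rather than any method taken from the book, and the function name `estimate_proportion` and the sample values are made up for this example.

```python
import math


def estimate_proportion(sample, z=1.96):
    """Estimate a population proportion from a list of 0/1 observations.

    Returns the sample proportion and an approximate confidence interval
    based on the normal approximation (z=1.96 corresponds to about 95%).
    """
    n = len(sample)
    p_hat = sum(sample) / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat, (p_hat - margin, p_hat + margin)


# Hypothetical sample: 40 "yes" answers out of 100 respondents.
sample = [1] * 40 + [0] * 60
p_hat, (low, high) = estimate_proportion(sample)
print(f"estimate: {p_hat:.2f}, approximate 95% interval: ({low:.3f}, {high:.3f})")
```

The interval is only meaningful if the sample is representative of the population, which is the condition stated in the list above.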

Open Lecture Series

Harvard COMPSCI 229r: Algorithms for Big Data (25 Lectures)

Stanford CS221: Artificial Intelligence: Principles and Techniques (19 Lectures)

Stanford CS224N: Natural Language Processing with Deep Learning (22 Lectures)

Stanford CS229: Machine Learning (20 Lectures) - Coursera Course

Stanford Lecture Collection: Convolutional Neural Networks for Visual Recognition (16 Lectures)

MIT 6.0002: Introduction to Computational Thinking and Data Science (15 Lectures)

MIT 6.S191: Introduction to Deep Learning (43 Lectures)

Manchester: Introduction to R for Health Data Science

Google Machine Learning Crash Course (25 Lessons)

Steve Brunton Intro to Data Science (Short Videos)

AMBOSS Medical Statistics (Short Videos)

Kaggle Open Courses on Data Science

Kaggle is an online community of data scientists and machine learning experts that allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Its free and open courses on data science pare down complex topics to their key practical components so that learners gain usable skills.

The courses include, but are not limited to:

  • Python

  • Intro to Machine Learning

  • Data Visualization

  • SQL

  • Deep Learning

  • AI Ethics

  • Geospatial Analysis

Coding Resources for Beginners

UBC Library Research Data Management: Anonymize and De-Identify

Some kinds of data are sensitive and cannot be shared for legal or ethical reasons. This can include:

  • Personal identifiers

  • Sensitive ecological data

  • Sacred or protected cultural practices

De-identification means removing identifying data from a dataset. Once a dataset has been de-identified, the dataset can be shared without disclosing identifying information.

Removing identifiers is important to protect the confidentiality of research participants. However, some risk of re-identification always remains, and changing technology introduces new ways to re-identify data. Managing that risk is an important part of sharing research data.
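To make the idea of residual risk concrete, here is a minimal Python sketch, not a method taken from the UBC Library guidance. It removes direct identifiers from a few records and then counts how many records share the same combination of indirect attributes. The field names (`name`, `email`, `age_band`, `postal_prefix`) and the records themselves are hypothetical.

```python
from collections import Counter

# Hypothetical field names, chosen only for this illustration.
DIRECT_IDENTIFIERS = {"name", "email"}
QUASI_IDENTIFIERS = ("age_band", "postal_prefix")


def drop_direct_identifiers(records):
    """Return copies of the records without the direct-identifier fields."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def smallest_group_size(records):
    """Size of the smallest group sharing the same quasi-identifier values.

    A result of 1 means at least one record is unique on those fields and
    could still be singled out after direct identifiers are removed.
    """
    groups = Counter(
        tuple(record[field] for field in QUASI_IDENTIFIERS) for record in records
    )
    return min(groups.values())


# Made-up records for demonstration purposes.
records = [
    {"name": "A", "email": "a@example.org", "age_band": "30-39", "postal_prefix": "V6T", "score": 4},
    {"name": "B", "email": "b@example.org", "age_band": "30-39", "postal_prefix": "V6T", "score": 7},
    {"name": "C", "email": "c@example.org", "age_band": "60-69", "postal_prefix": "V5K", "score": 5},
]

deidentified = drop_direct_identifiers(records)
k = smallest_group_size(deidentified)
print(f"smallest group size on quasi-identifiers: {k}")
```

In this toy data the third record is the only one in its age band and postal prefix, so dropping names and emails alone does not fully protect that participant. Coarsening or suppressing such fields is one common response, with the trade-off that the shared data becomes less detailed.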

UBC Library introduces several ways of approaching de-identification, each with its own benefits and drawbacks.