The Applied Statistics and Data Science Group (ASDa) in the UBC Department of Statistics provides assistance in the statistical formulation of research questions, the design of experiments and sampling plans for surveys, the choice and explanation of statistical methodology, statistical computing and graphics, statistical analysis, the interpretation of findings, and more. ASDa also plays an active role in continuing education on and off the UBC campus, giving seminars and workshops on statistical concepts and methodology for various departments and research groups and at teaching hospitals.
Previously recorded topics from ASDa and the Graduate Pathways for Success Program of the Faculty of Graduate Studies (UBC CWL access):
Previously recorded topics from ASDa and Vancouver Coastal Health, with focus on medical studies:
Data Science: A First Introduction is an open source textbook originally written for UBC's DSCI 100 - Introduction to Data Science course, which aims to introduce undergraduate students to data science.
This book is structured so that learners spend the first four chapters learning how to use the R programming language and Jupyter notebooks to load, wrangle/clean, and visualize data, while answering descriptive and exploratory data analysis questions. The remaining chapters illustrate how to solve four common problems in data science, which are useful for answering predictive and inferential data analysis questions:
Predicting a class/category for a new observation/measurement
Predicting a value for a new observation/measurement
Finding previously unknown/unlabelled subgroups in the data
Estimating an average or a proportion from a representative sample and using that estimate to generalize to the broader population
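As a rough illustration of the fourth problem, the sketch below estimates a population mean from a sample and attaches a bootstrap percentile interval to it. It is written in Python rather than the R used in the textbook, and the function name `bootstrap_mean_interval`, its parameters, and the sample values are all hypothetical choices made for this example, not material from the book.

```python
import random
import statistics


def bootstrap_mean_interval(sample, n_resamples=2000, level=0.95, seed=0):
    """Return (point_estimate, lower, upper) for the population mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    point_estimate = statistics.fmean(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Percentile interval: cut (1 - level) / 2 off each tail.
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return point_estimate, lower, upper


# Hypothetical sample: hours of sleep reported by 12 surveyed students.
hours = [6.5, 7.0, 8.0, 5.5, 7.5, 6.0, 7.0, 8.5, 6.5, 7.0, 9.0, 6.0]
estimate, low, high = bootstrap_mean_interval(hours)
print(f"mean = {estimate:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

The interval only says something about the broader population if the sample is representative of it, which is the condition stated in the list above.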
Harvard COMPSCI 229r: Algorithms for Big Data (25 Lectures)
Stanford CS221: Artificial Intelligence: Principles and Techniques (19 Lectures)
Stanford CS224N: Natural Language Processing with Deep Learning (22 Lectures)
Stanford CS229: Machine Learning (20 Lectures) - Coursera Course
Stanford Lecture Collection: Convolutional Neural Networks for Visual Recognition (16 Lectures)
MIT 6.0002: Introduction to Computational Thinking and Data Science (15 Lectures)
MIT 6.S191: Introduction to Deep Learning (43 Lectures)
Steve Brunton Intro to Data Science (Short Videos)
AMBOSS Medical Statistics (Short Videos)
Kaggle is an online community of data scientists and machine learning experts that allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.
Its free and open courses on data science pare down complex topics to their key practical components for learners to gain usable skills.
The courses include, but are not limited to:
Intro to Machine Learning
Data Science for Healthcare: Python Fundamentals
Some kinds of data are sensitive, and cannot be shared for legal or ethical reasons. This can include:
Sensitive ecological data
Sacred or protected cultural practices
De-identification means removing identifying data from a dataset. Once a dataset has been de-identified, the dataset can be shared without disclosing identifying information.
Removing identifiers is important to protect the confidentiality of research participants. But there is always a risk of re-identifying data, and changing technology introduces new ways to re-identify data. Managing that risk is an important part of sharing research data.
UBC Library introduces several ways of approaching de-identification, each with its own benefits and drawbacks.
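To make the idea concrete, here is a minimal Python sketch of one possible approach: drop direct identifiers, coarsen quasi-identifiers, and then check the size of the smallest group of records that share the same quasi-identifier values (the k in k-anonymity). It is not UBC Library's method. The records, field names, and helper functions are hypothetical, and a real dataset would need a proper risk assessment rather than this check alone.

```python
from collections import Counter

# Hypothetical records; every field name and value is made up for illustration.
records = [
    {"name": "A. Lee", "email": "alee@example.org", "age": 34,
     "postal_code": "V6T 1Z4", "diagnosis": "asthma"},
    {"name": "B. Singh", "email": "bsingh@example.org", "age": 38,
     "postal_code": "V6T 2A1", "diagnosis": "diabetes"},
    {"name": "C. Wong", "email": "cwong@example.org", "age": 41,
     "postal_code": "V5K 0A1", "diagnosis": "asthma"},
    {"name": "D. Roy", "email": "droy@example.org", "age": 47,
     "postal_code": "V5K 3B2", "diagnosis": "migraine"},
]

# Assumption: these are the only fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def deidentify(record):
    """Drop direct identifiers and coarsen quasi-identifiers."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Generalize age to a 10-year band, e.g. 34 becomes "30-39".
    decade = (cleaned.pop("age") // 10) * 10
    cleaned["age_band"] = f"{decade}-{decade + 9}"
    # Keep only the first three characters of the postal code.
    cleaned["postal_prefix"] = cleaned.pop("postal_code")[:3]
    return cleaned


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())


shared = [deidentify(r) for r in records]
k = smallest_group_size(shared, ["age_band", "postal_prefix"])
print(f"smallest group size k = {k}")
```

A small k means some individuals are nearly unique on the remaining fields and are easier to re-identify. Coarsening the fields further raises k but makes the data less detailed, which is the kind of trade-off between benefits and drawbacks mentioned above.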
The Privacy and Information Security Management team is proud to offer ongoing workshops to the UBC community. Each session focuses on a specific privacy and/or security topic and is presented by subject matter experts on that topic.
Topics are chosen based upon community feedback, as well as current news items as they relate to UBC and UBC staff, faculty and researchers.