Module Information | Website for Kuan Rong Chan Lab

Timetable for FY2023 (3pm - 6pm)

Module assessment

Class participation and Attendance: 20%

Mid-term Exam: 40%

Team-based project presentation: 40%

Module details

Week 1 An Introduction to Data Science

Learning objectives

At the end of the lesson:

Students are expected to understand why data science is important at present and in the future.
Students should appreciate the changing computational biology landscape
Students should know the importance of reproducibility in data science and how to improve data reproducibility
Students should understand the best practices for file naming
Students should be able to identify different types of data and strategise the best methods to use for interpreting different data types.

Week 1 Data Analysis With Spreadsheets and GraphPad Prism

Learning objectives

At the end of the lesson:

Students should be able to understand the strengths and limitations of Microsoft Excel
Students should know how to use Excel functions, including but not limited to formulas and the sort and filter functions
Students should know the most common errors that arise in Excel analysis
Students should know how to plot publication-quality figures and panels with GraphPad Prism

Week 2 GitHub and Introduction to Python

Learning objectives

At the end of the lesson:

Students should know how to navigate GitHub and understand some of the useful features of a GitHub repository
Students should understand simple Markdown syntax to create a README.md file that helps other users understand their repository
Students should be able to use GitHub to store data files and make their data file sharing private/public
Students should be able to appreciate why Python is becoming a popular programming language, and understand the useful features of Python
Students should know why Python is preferred over Excel for omics data analysis
Students should be able to use Jupyter Notebook or JupyterLab to run simple Python code

Week 2 Pandas and Exploratory Data Analysis

Learning objectives

At the end of the lesson:

Students should be able to understand why the Pandas library is critical for data analysis
Students are expected to be able to read CSV or Excel files in Python using Pandas functions
Students should be able to manipulate and check the dataframes loaded into Python
Students should be able to understand the benefits of performing exploratory data analysis on their datasets
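
As an illustration of the kind of exercise these objectives point to, here is a minimal sketch of loading a small table with Pandas and running a few routine checks. The sample names, gene names and values are made up for illustration, and the data are read from an in-memory string so that the snippet runs without any external file; with real data one would pass a file path to `pd.read_csv` or `pd.read_excel` instead.

```python
import io

import pandas as pd

# Hypothetical example data; with a real file this would be
# pd.read_csv("my_data.csv") or pd.read_excel("my_data.xlsx").
csv_text = """sample,gene,expression
S1,IFNG,5.2
S2,IFNG,7.9
S3,TNF,
S4,TNF,3.1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Routine checks on a freshly loaded dataframe
print(df.shape)         # number of rows and columns
print(df.dtypes)        # data type of each column
print(df.head())        # first few rows
print(df.isna().sum())  # count of missing values per column
print(df.describe())    # summary statistics for numeric columns
```

Checks like these are a common first step in exploratory data analysis, since they reveal missing values and unexpected column types before any statistics are computed.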

Week 3 Applied Biostatistics

Learning objectives

At the end of the lesson, students are expected to:

Appreciate the origins of errors in experiments and use graphs to appropriately depict errors from experiments
Use Python code to calculate descriptive statistics
Understand the workflows for performing inferential statistics
Define type I and type II errors, and know how best to control for these two types of errors
Learn how to use the standardised mean difference to account for large sample sizes in analysis
Understand the different methods to perform inferential statistics and be able to identify the most suitable methods to use for analysis
Define data normality, skewness and kurtosis, and know how to use Python code to measure these parameters
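
Several of the quantities named in these objectives can be computed in a few lines. The sketch below uses only the Python standard library and made-up numbers; the course itself may rely on packages such as Pandas or SciPy instead. The function names are illustrative, the skewness and kurtosis are the population (moment-based) versions, and the standardised mean difference is computed as Cohen's d with a pooled standard deviation, which is one common definition among several.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the third standardised moment."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((x - mean) ** 3 for x in values) / (n * sd ** 3)


def excess_kurtosis(values):
    """Population excess kurtosis: the fourth standardised moment minus 3."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((x - mean) ** 4 for x in values) / (n * sd ** 4) - 3


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd


# Hypothetical measurements for two groups
control = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
treated = [4.0, 6.0, 6.0, 6.0, 7.0, 7.0, 9.0, 11.0]

# Descriptive statistics
print("mean:", statistics.fmean(control))
print("median:", statistics.median(control))
print("sample SD:", statistics.stdev(control))

# Shape of the distribution
print("skewness:", skewness(control))
print("excess kurtosis:", excess_kurtosis(control))

# Effect size between the two groups
print("Cohen's d:", cohens_d(treated, control))
```

Unlike a p-value, the standardised mean difference does not shrink simply because the sample size grows, which is why it is useful alongside significance tests in large datasets.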

Week 3 Data Processing, Scaling and Normalisation

Learning objectives

At the end of the lesson, students are expected to:

Understand the importance of data preprocessing and the workflows associated with data preprocessing
Know how to use Python code to filter data, manage missing data and handle duplicate terms
Appreciate the importance of data normalisation and the different methods that can be used for data normalisation
Use Python code to perform data normalisation and use data visualisation tools to visualise normalised data
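
The preprocessing steps listed above can be sketched as follows. This is a minimal standard-library illustration on made-up records, not the course's own workflow: dropping incomplete records is only one way to manage missing data, and min-max scaling and z-score standardisation are two common normalisation methods among several.

```python
import statistics

# Hypothetical measurements: (sample ID, value); None marks a missing value
records = [("S1", 10.0), ("S2", None), ("S3", 20.0), ("S1", 10.0), ("S4", 30.0)]

# 1. Handle duplicates: keep the first occurrence of each record
unique_records = list(dict.fromkeys(records))

# 2. Manage missing data: here, simply drop records with no value
complete = [(sample_id, value) for sample_id, value in unique_records if value is not None]
values = [value for _, value in complete]

# 3a. Min-max scaling: rescale values to the range [0, 1]
lowest, highest = min(values), max(values)
min_max = [(v - lowest) / (highest - lowest) for v in values]

# 3b. Z-score standardisation: centre on the mean, scale by the sample SD
mean = statistics.fmean(values)
sd = statistics.stdev(values)
z_scores = [(v - mean) / sd for v in values]

print(complete)
print(min_max)
print(z_scores)
```

Min-max scaling preserves the shape of the distribution within fixed bounds, while z-scores express each value in units of standard deviations from the mean; which is preferable depends on the downstream analysis.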

Week 4 Chart Anatomy and Data Visualisation

Learning objectives

At the end of the lesson, students are expected to:

Be familiar with chart anatomy and understand why it is necessary for data presentation
Know how to use different colours to best annotate their graphs
Appreciate the use of data transformation to better represent skewed data
Learn how to best use bar charts, histograms, density plots, dot plots, box plots, violin plots, pie charts and heatmaps for data visualisation
Know how to use scatterplots, pair plots and correlation matrices to show association between 2 or more variables
Understand how to present data in graphs without misrepresenting the data

Week 4 Features and Outlier Detection Approaches

Learning objectives

At the end of the lesson, students are expected to:

Understand the different methods of correlation
Know the characteristics of different kinds of outliers and how to manage outliers in datasets
Use Python code to manage outliers
Appreciate the importance of batch effects and how they influence measurements
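
One widely used approach to flagging outliers is the interquartile range (IQR) rule, sketched below with the Python standard library. The readings are made up, the function name is illustrative, and the multiplier of 1.5 is a conventional default rather than a value taken from the course; the module may cover other detection methods as well.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical readings with one suspiciously large value
readings = [10, 12, 11, 13, 12, 11, 10, 50]

flagged = iqr_outliers(readings)
kept = [v for v in readings if v not in flagged]

print("flagged:", flagged)
print("kept:", kept)
```

Flagging a value is only the first step: whether to remove, transform or keep it depends on whether it reflects a measurement error or genuine biological variation.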

Week 5 Pathway Enrichment Analysis

Learning objectives

At the end of the lesson, students are expected to:

Understand the bioinformatics workflows involved in omics data analysis
Appreciate the importance of volcano plots in omics data visualisation
Execute the web tools used for pathway analysis, including Enrichr, GSEA, gProfiler and REVIGO
Understand how to identify functions of genes that are differentially regulated in enriched pathways
Have an overview of the different high-throughput tools for molecular profiling

Week 5 Systems Biology Workflows

Learning objectives

At the end of the lesson, students are expected to:

Have a deep understanding of the Python code used to facilitate omics data analysis
Understand how to interpret data generated from omics data analysis

Week 6 Meta-analysis and Databases

Learning objectives

At the end of the lesson, students are expected to:

Appreciate why depositing datasets in data repositories is useful, and the critical information required for depositing datasets
Understand the need for meta-analysis and how to improve consistencies between different datasets
Know the common visualisation tools (e.g. forest plots and AUC-ROC curves) for meta-analysis
Understand how to make a database from meta-analysis for other users to query against

Week 7 Streamlit for Webtool Development

Learning objectives

At the end of the lesson, students are expected to:

Understand the concepts of front-end and back-end development, and the programming languages used for each
Understand how the open-source Python packages Voila and Streamlit can be used for data dashboarding
Know how to build web tools using Streamlit

Week 8 Machine Learning

Learning objectives

At the end of the lesson, students are expected to:

Understand what machine learning is about, the concepts involved and how it can help facilitate decision making
Understand the scenarios where machine learning can fail
Understand the difference between supervised and unsupervised learning, and the machine learning algorithms that can be used
Appreciate that Python packages can help in machine learning

Week 8 Co-expression Networks and Time-course Studies

Learning objectives

At the end of the lesson, students are expected to:

Appreciate the need to develop new models and statistical tools for big data analysis
Understand the fundamental concepts of WGCNA, EDGE and pseudotime for big data analysis

Week 9 Virus Sequencing Analysis

Learning objectives

To be confirmed

Week 10 Making Publication Quality Figures

Learning objectives

At the end of the lesson, students are expected to:

Appreciate the use of Adobe Illustrator to generate publication quality figures
Learn how to use Adobe Illustrator to edit figures and create standardised figures
Understand how to present tables in publications