**Timetable for FY2022 (Meeting Room 6A, 9am - 12pm)**

**Module assessment**

Attendance: 5%

Class participation: 10%

Group work on Excel: 5%

Mid-term Exam: 40%

Team-based project presentation: 40%

**Module details**

__Week 1 An Introduction to Data Science__

Learning objectives

At the end of the lesson:

Students are expected to understand why data science is important at present and in the future.

Students should appreciate the changing computational biology landscape

Students should know the importance of reproducibility in data science and how to improve data reproducibility

Students should understand the best practices for file naming

Students should be able to identify different types of data and strategise the best methods to use for interpreting different data types.

__Week 1 Data Analysis With Spreadsheets and GraphPad Prism__

Learning objectives

At the end of the lesson:

Students should be able to understand the strengths and limitations of Microsoft Excel

Students should know how to use Excel functions, including but not limited to formulas and the sort and filter functions

Students should know the most common errors inherent in Excel analysis

Students should know how to plot publication-quality figures and panels with GraphPad Prism

__Week 2 GitHub and Introduction to Python__

Learning objectives

At the end of the lesson:

Students should know how to navigate GitHub and understand some of the useful features of a GitHub repository

Students should understand basic Markdown syntax and use it to write a README.md file that helps other users understand their repository

Students should be able to use GitHub to store data files and control whether the shared files are private or public

Students should appreciate why Python is becoming a popular programming language and understand its useful features

Students should know why Python is preferred over Excel for omics data analysis

Students should be able to use Jupyter Notebook or JupyterLab to run simple Python code

__Week 2 Pandas and Exploratory Data Analysis__

Learning objectives

At the end of the lesson:

Students should be able to understand why the Pandas library is critical for data analysis

Students are expected to be able to read CSV or Excel files in Python using Pandas functions

Students should be able to manipulate and check the dataframes loaded into Python

Students should be able to understand the benefits of performing exploratory data analysis on their datasets

__Week 3 Applied Biostatistics__

Learning objectives

At the end of the lesson, students are expected to:

Appreciate the origins of errors in experiments and use graphs to appropriately depict errors from experiments

Use Python code to calculate descriptive statistics

Understand the workflows for performing inferential statistics

Define type I and type II errors, and know how best to control for these two types of error

Learn how to use the standardised mean difference to account for large sample sizes in analysis

Understand the different methods to perform inferential statistics and be able to identify the most suitable methods to use for analysis

Define data normality, skewness and kurtosis, and know how to use Python code to measure these parameters

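
Illustrative sketch (not taken from the course material): the objectives on descriptive statistics, skewness, kurtosis and the standardised mean difference could be practised with code along the following lines. It assumes only the Python standard library; the function names `skewness`, `excess_kurtosis` and `cohens_d` are hypothetical, the moment-based (population) definitions are one of several conventions, and Cohen's d is used here as one common form of the standardised mean difference.

```python
import math
import statistics


def central_moment(values, order):
    """Return the central moment of the given order (population form)."""
    mean = statistics.fmean(values)
    return statistics.fmean([(x - mean) ** order for x in values])


def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis: fourth moment over variance ** 2, minus 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


def cohens_d(group_a, group_b):
    """Standardised mean difference (Cohen's d) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / math.sqrt(pooled_var)


# Hypothetical measurements, for illustration only.
sample = [1, 2, 3, 4, 5]
print("mean:", statistics.fmean(sample))
print("median:", statistics.median(sample))
print("sample SD:", statistics.stdev(sample))
print("skewness:", skewness(sample))
print("excess kurtosis:", excess_kurtosis(sample))
```

A symmetric sample such as the one above has a skewness of zero. Library routines may apply different bias corrections, so their results can differ slightly from these population formulas.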

__Week 3 Data Processing, Scaling and Normalisation__

Learning objectives

At the end of the lesson, students are expected to:

Understand the importance of data preprocessing and the workflows associated with it

Know how to use Python code to filter data, manage missing data and handle duplicate terms

Appreciate the importance of data normalisation and the different methods that can be used for data normalisation

Use Python code to perform data normalisation and use data visualisation tools to visualise normalised data
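
Illustrative sketch (not taken from the course material): the preprocessing steps named above (removing missing values, dropping duplicates, then normalising) can be outlined with the standard library alone. The records, the names `clean`, `min_max` and `z_score`, and the choice of min-max scaling and z-score standardisation are assumptions made for illustration. In a Pandas workflow the cleaning steps would more likely use `DataFrame.dropna` and `DataFrame.drop_duplicates`.

```python
import statistics

# Hypothetical records: (sample_id, expression_value); None marks a missing value.
raw = [("s1", 2.0), ("s2", None), ("s3", 4.0), ("s1", 2.0), ("s4", 6.0)]


def clean(records):
    """Drop records with a missing value, then drop exact duplicates (order kept)."""
    seen = set()
    result = []
    for record in records:
        if record[1] is None or record in seen:
            continue
        seen.add(record)
        result.append(record)
    return result


def min_max(values):
    """Rescale values to the range [0, 1]; assumes at least two distinct values."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]


def z_score(values):
    """Centre on the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


cleaned = clean(raw)
values = [value for _, value in cleaned]
scaled = min_max(values)
standardised = z_score(values)
print(cleaned)
print(scaled)
print(standardised)
```

Min-max scaling preserves the shape of the distribution but is sensitive to extreme values, whereas z-scores express each value in units of standard deviation from the mean.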

__Week 4 Chart Anatomy and Data Visualisation__

Learning objectives

At the end of the lesson, students are expected to:

Be familiar with chart anatomy and understand why each chart element is necessary for data presentation

Know how to use different colours to best annotate their graphs

Appreciate the use of data transformation to better represent skewed data

Learn how to best use bar charts, histograms, density plots, dot plots, box plots, violin plots, pie charts and heatmaps for data visualisation

Know how to use scatterplots, pair plots and correlation matrices to show association between two or more variables

Understand how to present data in graphs without misleading the reader

__Week 4 Features and Outlier Detection Approaches__

Learning objectives

At the end of the lesson, students are expected to:

Understand the different methods of correlation

Know the characteristics of different kinds of outliers and how to manage outliers in datasets

Use Python code to manage outliers

Appreciate the importance of batch effects and how they influence measurements
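
Illustrative sketch (not taken from the course material): one widely used way to flag outliers with Python code is the interquartile range (IQR) rule, shown below with the standard library. The function name `iqr_outliers`, the sample readings and the conventional multiplier of 1.5 are assumptions. The lesson may cover other approaches.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical readings with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
print(outliers)
```

Flagging a value is only the first step. Whether to remove, cap or keep it depends on whether it reflects a measurement error or genuine biological variation.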

__Week 5 Pathway Enrichment Analysis__

Learning objectives

At the end of the lesson, students are expected to:

Understand the bioinformatics workflows involved in omics data analysis

Appreciate the importance of volcano plots in omics data visualisation

Use the web tools for pathway analysis, including Enrichr, GSEA, g:Profiler and REVIGO

Understand how to identify functions of genes that are differentially regulated in enriched pathways

Have an overview of the different high-throughput tools for molecular profiling

__Week 5 Systems Biology Workflows__

Learning objectives

At the end of the lesson, students are expected to:

Have a deep understanding of the Python code used to facilitate omics data analysis

Understand how to interpret data generated from omics data analysis

__Week 6 Meta-analysis and Databases__

Learning objectives

At the end of the lesson, students are expected to:

Appreciate why depositing datasets in data repositories is useful, and know the critical information required for depositing datasets

Understand the need for meta-analysis and how to improve consistency between different datasets

Know the common visualisation tools (e.g. forest plots and AUC-ROC curves) for meta-analysis

Understand how to build a database from a meta-analysis that other users can query

__Week 7 Streamlit for Webtool Development__

Learning objectives

At the end of the lesson, students are expected to:

Understand the concepts of front-end and back-end development, and the programming languages used for each

Understand how the open-source Python packages Voila and Streamlit can be used for data dashboarding

Know how to build web tools using Streamlit

__Week 8 Machine Learning__

Learning objectives

At the end of the lesson, students are expected to:

Understand what machine learning is about, the concepts involved and how it can help facilitate decision making

Understand the scenarios where machine learning can fail

Understand the difference between supervised and unsupervised learning, and the machine learning algorithms that can be used

Appreciate that Python packages can help in machine learning

__Week 8 Co-expression Networks and Time-course Studies__

Learning objectives

At the end of the lesson, students are expected to:

Appreciate the need to develop new models and statistical tools for big data analysis

Understand the fundamental concepts of WGCNA, EDGE and pseudotime for big data analysis

__Week 9 Virus Sequencing Analysis__

Learning objectives

To be confirmed

__Week 10 Making Publication-Quality Figures__

Learning objectives

At the end of the lesson, students are expected to:

Appreciate the use of Adobe Illustrator to generate publication-quality figures

Learn how to use Adobe Illustrator to edit figures and create standardised figures

Understand how to present tables in publications