BMEG3105-Fall-2024

BMEG3105: Data analytics for personalized genomics and precision medicine — Fall 2024

[Pre-course survey, Piazza, Scribing preference, Logistics, Course schedule and materials, Assignments, Presentation schedule]

Course description

As societies develop economically, people pay increasing attention to health. As a result, genomics and healthcare, especially personalized genomics and precision medicine, have accumulated a tremendous amount of data that is waiting to be analyzed. This course equips students with the ability to analyze such data, which benefits both their personal development and society. We will cover high-throughput experimental methods, standard data processing pipelines, sequence alignment and mapping, foundational concepts of data analytics, data exploration and visualization, clustering and classification, dimension reduction, and their applications in personalized genomics and precision medicine. For personalized genomics, we will also cover the integration of heterogeneous sequencing and non-sequencing data, single-cell data analysis, multi-omics analysis methods, and cancer genomics. For precision medicine, we will cover protein-RNA interactions, biological graph analysis, and a gentle introduction to biomedical imaging and electronic health records.

Teaching team

Lecturer:

TA:

Time and location

Wednesday: 9:30am-11:15am, SC L4
Friday: 9:30am-10:15am, MMW 703
Friday: 10:30am-11:15am, MMW 703 (Tutorial)

Format

In-person. Slides will be available the day before the lecture.

Logistics

Communications

Blackboard is the main platform for managing the course, and grading will be done through Blackboard. We will use Piazza (BMEG3105) for discussion. You can ask questions and join discussions on Piazza, anonymously if you prefer. For personal matters, please send a private post to the instructor and the TA. You are also very welcome to email the teaching team.

Grading

Bonus (up to 6%):

Open-book quiz and exam policy

All exams and quizzes are open-book. You may bring any paper-based materials. However, phones, computers, and other communication tools are not allowed, and neither is discussion.

AI tools use policy

You may use AI tools, including ChatGPT, to polish your project report. However, you must submit both your own version and the version polished with AI tools, and you must state clearly how you used AI tools and in which parts of the report. We will grade whichever version you ask us to grade, but if you do not hand in your own version, we will not consider the submission complete.

Programming

Python (the TA will hold a recitation class to introduce it, mainly for the non-graded homework and your project) or any other language you are familiar with. For Python, we suggest using Colab.

The programming assessments include a non-graded programming assignment (5%) and the implementation in the project (5%). The bonus is sufficient to cover all the programming credit in the project if you really do not want to try hands-on experiments at all. We do encourage you to try.

Scribing

Please sign up with your scribing preference. We should have at least one student for each lecture, and we may adjust the assignment if necessary. Note that your notes and scribing will be posted online for others' reference. You can choose whether to remove your name. Deadline for signing up for scribing: 11:59 pm on 15th Sep. After that, the Google sheet will be closed. Students assigned to the first two lectures have one additional week to submit the scribing.

Projects

We will have individual projects. You can propose your project to us and seek our help, or we will predefine some projects for you to choose from. Some potential project topics:

Both a midterm report (1 page) and a final report should be submitted.

Late days

Each student has 6 late days for turning in assignments, which can be used on A1, A2, A3, PA1, and the project midterm report. They cannot be used on the project final report or the scribing note. A maximum of 2 late days can be used for each assignment. Grades will be deducted by 25% for each additional late day.
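The late-day rules above can be worked through with a short Python sketch. This is an illustration, not an official grading script. The function names (`apply_late_policy`, `settle`) are hypothetical. It assumes that an "additional late day" is any day late that is not covered by a free late day, whether because the 2-day cap for that assignment or the 6-day budget has been reached. It also assumes that the 25% deduction is taken off the full grade per day, capped at 100%. The policy text does not spell out either point, so please confirm with the teaching team.

```python
# Hypothetical sketch of the late-day arithmetic; assumptions are marked below.
ELIGIBLE = {"A1", "A2", "A3", "PA1", "midterm report"}  # items late days apply to
TOTAL_LATE_DAYS = 6       # budget per student
MAX_PER_ASSIGNMENT = 2    # free late days usable on a single assignment
PENALTY_PER_DAY = 0.25    # assumed: fraction of the grade lost per uncovered day


def apply_late_policy(days_late, budget_left):
    """Return (free_days_used, penalty_fraction) for one eligible assignment."""
    free = min(days_late, MAX_PER_ASSIGNMENT, budget_left)
    extra = days_late - free                      # days not covered by free late days
    penalty = min(1.0, PENALTY_PER_DAY * extra)   # assumed cap: lose at most 100%
    return free, penalty


def settle(submissions):
    """Process (name, days_late) pairs in deadline order.

    Returns ({name: penalty_fraction}, late_days_remaining).
    """
    budget = TOTAL_LATE_DAYS
    penalties = {}
    for name, days_late in submissions:
        if name not in ELIGIBLE:
            # The final report and scribing note cannot use late days; how late
            # submissions of those items are handled is not stated in the policy.
            raise ValueError(f"late days cannot be used on {name!r}")
        free, penalty = apply_late_policy(days_late, budget)
        budget -= free
        penalties[name] = penalty
    return penalties, budget


# A1 handed in 3 days late, A2 handed in 1 day late.
print(settle([("A1", 3), ("A2", 1)]))
# → ({'A1': 0.25, 'A2': 0.0}, 3)
```

In this example, A1 uses the maximum of 2 free late days and loses 25% for the third day. A2 uses 1 free late day with no deduction, leaving 3 late days in the budget.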

Post-lecture survey

Deadline for each survey: 11:59 pm on the day before the next lecture. We set this deadline so that we have time to answer the questions you raise in the survey. Once you have finished a survey, please enter a “1” in the Google sheet: Survey results. Usually, we will trust the 1s you fill in on the Google sheet, but we will check in detail if the number of survey forms we received and the number of 1s on the Google sheet are inconsistent.

Course schedule and materials

Lecture | Date | Location | Topic | Slides | Notes | Reading | Important dates (all due at 11:59 pm)
1 | Sep 4 (Wed) | SC L4 | Introduction | Lec-1 | note1, note2, note3, note4, note5, note6 | Course outline |
2 | Sep 6 (Fri) | MMW 703 | Data & Python | Lec-2 | | sample code | PA0 posted
3 | Sep 11 (Wed) | SC L4 | Data & Python & Sequence | Lec-2,3 | note1, note2, note3, note4 | sample code |
4 | Sep 13 (Fri) | MMW 703 | DP | Lec-4 | note1, note2, note3 | Python Tut-1, Chapter 2&3 | A1 posted
5 | Sep 20 (Fri) | MMW 703 | Assembly & Mapping | Lec-5 | note1, note2, note3, note4, note5 | Sample code, RNA-seq analysis, intro to python tutorial, anaconda starter guide, conda cheatsheet | PA0 due
6 | Sep 25 (Wed) | SC L4 | Data exploration | Lec-6 | note1, note2, note3 | Python for DA, Sample code, Sample code-2 |
7 | Sep 27 (Fri) | MMW 703 | Clustering | Lec-7 | note1, note2, note3, note4, note5 | Data mining book, Sample code, Sample code-2, intro to pandas&numpy tutorial | A1 due
8 | Oct 2 (Wed) | SC L4 | Classification | Lec-8 | note1, note2, note3, note4, note5 | Data mining book, Correlation |
9 | Oct 4 (Fri) | MMW 703 | Classification & Perf evaluation | Lec-9 | note1, note2, note3, note4 | Data mining book, Python Tut-2 | A2 posted
10 | Oct 9 (Wed) | SC L4 | Perf evaluation | Lec-10 | note1, note2, note3, note4, note5 | Data mining book |
11 | Oct 16 (Wed) | SC L4 | Dim reduction | Lec-11 | note1, note2, note3, note4 | PML book |
12 | Oct 18 (Fri) | MMW 703 | Midterm review | Lec-12 | | | Quiz, A2 due
13 | Oct 23 (Wed) | SC L4 | Midterm | | | | 8:30am-11:15am, Midterm exam
 | | | Module 2 start | | | |
14 | Oct 25 (Fri) | MMW 703 | Multi-omics overview | Lec-14 | note1, note2, note3 | D2L book, Intro to cancer, Cancer genomics |
15 | Oct 30 (Wed) | SC L4 | Cancer genomics overview | Lec-15 | note1, note2, note3, note4 | Intro to cancer, Cancer genomics, Cancer genomics, GATK, GWAS, Epigenetics, ENCODE | PA1 posted
16 | Nov 1 (Fri) | MMW 703 | Genomics data analysis | Lec-16 | note1, note2 | GATK, GWAS, Epigenetics, ENCODE, Tutorial-3 |
17 | Nov 6 (Wed) | SC L4 | Single cell genomics | Lec-17 | | Tut-4, Current best practice, Tutorial-1, Tutorial-2, Tutorial-3, Clustering challenges, PyTorch Tutorial |
18 | Nov 8 (Fri) | MMW 703 | Data visualization | Lec-18 | note1, note2 | Sequence motif, PCA-tSNE-UMAP, D2L book, Tutorial-4, Tutorial-5 | Project M-report (Proposal) due
 | | | Module 3 start | | | |
19 | Nov 13 (Wed) | SC L4 | Deep learning | Lec-19 | note1 | Pytorch examples | A3 posted
20 | Nov 15 (Fri) | MMW 703 | CNN | Lec-20 | | Pytorch examples, EHRs processing, Tut-6 | PA1 due
21 | Nov 20 (Wed) | SC L4 | EHRs & Text | Lec-21 | | EHRs processing |
22 | Nov 22 (Fri) | MMW 703 | Project Presentation | | | | A3 due
23 | Nov 27 (Wed) | SC L4 | Course review | Lec-23 | | | Quiz
24 | Nov 29 (Fri) | MMW 703 | Project Presentation | | | | Project report due on 2 Dec

Assignments