Data Science Ph.D. student at Worcester Polytechnic Institute.

First, you need to understand the methods we are going to use to solve cool problems this summer. This project falls under the umbrella of **Machine Learning**, as we are trying to predict the labels of time series data. We are investigating a family of **Deep Learning** models called **Recurrent Neural Networks** (RNNs).

The topics you need to have a strong grasp of are:

- Basic Machine Learning
- Deep Feedforward Neural Networks
- Recurrent Neural Networks

- Deep Learning Textbook
  - Part I - Chapter 5
  - Part II - Chapter 6, Chapter 7, Chapter 10
  - More in depth:
    - Chapter 8, particularly 8.2.5
    - My interests: 10.7, 10.9, 10.11

- The Unreasonable Effectiveness of RNNs
- Recurrent Neural Networks Tutorial
- Stanford cs231n: This course has very clear descriptions of complex deep learning topics. I recommend *Neural Networks* parts 1, 2, and 3.
- Understanding LSTMs: The LSTM is the key update to the classic RNN that made the success of RNNs possible. If you get through here and feel that you understand it well enough to explain the intuition of an RNN with LSTM memory cells, then you are doing fantastic!

- Andrew Ng’s Coursera Course: If you are unfamiliar with deep learning, please watch these videos in sequence. There is no need to pour time into the homework, but the assignments can be valuable resources as well. Videos are also available on YouTube. Please watch the RNN videos; they are short.
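
As a quick self-check on the RNN readings, it may help to write the basic recurrence out by hand. The following is a minimal NumPy sketch of a vanilla RNN forward pass (not an LSTM, and not the code we will use in the project). The function name `rnn_forward`, the weight names, and the dimensions are all made up for illustration.

```python
import numpy as np


def rnn_forward(x, W_xh, W_hh, b_h):
    """Run a vanilla RNN over one sequence.

    x has shape (T, input_dim); the result has shape (T, hidden_dim).
    """
    T = x.shape[0]
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim)  # initial hidden state
    states = np.zeros((T, hidden_dim))
    for t in range(T):
        # The same weights are reused at every time step; only h changes.
        h = np.tanh(x[t] @ W_xh + h @ W_hh + b_h)
        states[t] = h
    return states


# Illustrative sizes and random weights (assumptions, not project values).
rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4
x = rng.normal(size=(T, input_dim))
W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

states = rnn_forward(x, W_xh, W_hh, b_h)
print(states.shape)
```

The hidden state `h` is the only thing carried from one time step to the next, which is why long-range information is hard for this basic version to retain and why LSTM memory cells were introduced.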

All of our programming will be in Python. If you are unfamiliar with Python, **LEARNING PYTHON IS YOUR FIRST TASK**. Try this tutorial.
Our preprocessing will be done using NumPy and deep learning algorithms will be implemented using PyTorch.

- Programming will be in Python.
- Deep Learning algorithms will be implemented in PyTorch.
- Group work will be done in GitHub.
- Group messages will be through Slack.
- We will be using Electronic Health Records (EHR) from the MIMIC III database.

Here’s how to get started:

- Getting started with PyTorch (assuming you have a vague idea of what deep learning is and have some experience scripting in Python).
- I recommend beginning with the 60 Minute Blitz to get familiar with what it means to design deep learning algorithms in PyTorch.
- Next, you need to follow along with a couple of tutorials where you implement a full deep learning pipeline (data loading, processing, defining a model, training the model, evaluating the model). For this, I recommend Classifying Names with a Character-Level RNN and Translation with a Sequence to Sequence Network and Attention.
- After these tutorials, I will provide the code for my current research and we will discuss next steps.
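
To give a rough picture of what a full pipeline looks like before you start the tutorials, here is a minimal sketch covering data loading, processing, defining a model, training, and evaluation. It runs on synthetic data generated with NumPy, not on MIMIC III, and it is not my research code. The class name `LSTMClassifier`, the labeling rule, and every hyperparameter are assumptions chosen only to keep the example small.

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
rng = np.random.default_rng(0)

# 1. Data loading: synthetic stand-in for labeled time series.
#    Each series gets a random offset; the label is 1 when its mean is positive.
n_series, n_steps, n_features = 200, 20, 1
X = rng.normal(size=(n_series, n_steps, n_features)).astype(np.float32)
X += rng.choice([-0.5, 0.5], size=(n_series, 1, 1)).astype(np.float32)
y = (X.mean(axis=(1, 2)) > 0).astype(np.int64)

# 2. Processing with NumPy: standardize using training-set statistics only.
n_train = 160
mean, std = X[:n_train].mean(), X[:n_train].std()
X = (X - mean) / std

train_ds = TensorDataset(torch.from_numpy(X[:n_train]), torch.from_numpy(y[:n_train]))
test_ds = TensorDataset(torch.from_numpy(X[n_train:]), torch.from_numpy(y[n_train:]))
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=32)


# 3. Model: an LSTM followed by a linear layer on the final hidden state.
class LSTMClassifier(nn.Module):
    def __init__(self, n_features, hidden_size, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)  # h_n: (num_layers, batch, hidden_size)
        return self.out(h_n[-1])    # class logits, shape (batch, n_classes)


model = LSTMClassifier(n_features, hidden_size=16, n_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# 4. Training loop.
for epoch in range(5):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()

# 5. Evaluation on the held-out series.
model.eval()
correct = 0
with torch.no_grad():
    for xb, yb in test_loader:
        correct += (model(xb).argmax(dim=1) == yb).sum().item()
accuracy = correct / len(test_ds)
print(f"test accuracy: {accuracy:.2f}")
```

The tutorials above follow the same five stages, just with real text data and more careful models.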

- We will be storing our work in this GitHub repository.
- If you have not used GitHub much, please complete this tutorial.
- I will create a folder for each of us where you may store code and data.
- You should have an understanding of the expected GitHub workflow.

- The MIMIC III database is a rather complex relational database containing clinical records of over 45,000 patients. We will be extracting a subset of the data and detecting different adverse events contained within.
- I recommend reading through the data pages to get a sense of how the database is laid out (e.g., what information is contained? How would we extract one person’s heart rate? Or uncover whether or not they died in the hospital?).
- As it is a relational database, extracting the data is easiest via SQL, a relational query language. It is very intuitive to use but takes a bit of exploration, so please look up a SQL tutorial.
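
To make the two example questions above concrete, here is a small sketch that runs SQL from Python against a toy in-memory SQLite database. The table and column names (`chartevents`, `admissions`, `subject_id`, `itemid`, `charttime`, `valuenum`, `hospital_expire_flag`) follow my understanding of the MIMIC III schema, but the rows are invented and the heart-rate `itemid` is an assumption. Please confirm both against the data pages before relying on them. The real database is also not SQLite, so syntax details may differ.

```python
import sqlite3

# Toy in-memory database with two simplified tables and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE admissions (
        subject_id INTEGER, hadm_id INTEGER, hospital_expire_flag INTEGER
    );
    CREATE TABLE chartevents (
        subject_id INTEGER, hadm_id INTEGER, itemid INTEGER,
        charttime TEXT, valuenum REAL
    );
    INSERT INTO admissions VALUES (1, 100, 0), (2, 200, 1);
    INSERT INTO chartevents VALUES
        (1, 100, 211, '2101-01-01 08:00:00', 82.0),
        (1, 100, 211, '2101-01-01 09:00:00', 90.0),
        (1, 100, 618, '2101-01-01 08:00:00', 18.0),
        (2, 200, 211, '2101-02-01 10:00:00', 110.0);
    """
)

# Assumed identifier for heart rate; look up the real one(s) in the item dictionary.
HEART_RATE_ITEMID = 211

# One person's heart rate over time.
heart_rate = conn.execute(
    "SELECT charttime, valuenum FROM chartevents "
    "WHERE subject_id = ? AND itemid = ? ORDER BY charttime",
    (1, HEART_RATE_ITEMID),
).fetchall()

# Whether each patient died during any of their hospital admissions.
died = conn.execute(
    "SELECT subject_id, MAX(hospital_expire_flag) FROM admissions "
    "GROUP BY subject_id ORDER BY subject_id"
).fetchall()

print(heart_rate)
print(died)
conn.close()
```

Note that measurements and outcomes live in different tables, linked by patient and admission identifiers. That is the main thing to get used to when exploring a relational database.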