Gaining background in Machine Learning and Recurrent Neural Networks

First, you need to understand the methods we will use to solve a cool research problem this summer. The project falls under Machine Learning: we are trying to predict labels for time series data. Specifically, we are investigating a class of Deep Learning models called Recurrent Neural Networks (RNNs).

The topics you need to have a strong grasp on are:

  • Basic Machine Learning
  • Deep Feedforward Neural Networks
  • Recurrent Neural Networks
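To show what makes a network "recurrent", here is a minimal NumPy sketch of a vanilla (Elman) RNN forward pass that labels one time series. The same weights are reused at every time step, a hidden state carries information forward via h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), and the final hidden state is mapped to class probabilities. The weights are random and the input is toy data, so this only illustrates the mechanics and is not a trained model. In the project itself, PyTorch modules such as torch.nn.RNN or torch.nn.LSTM would handle the recurrence and training for us.

```python
import numpy as np

def rnn_classify(x, W_xh, W_hh, b_h, W_hy, b_y):
    """Run a vanilla RNN over one sequence and return class probabilities.

    x: array of shape (T, n_features), one row per time step.
    """
    h = np.zeros(W_hh.shape[0])  # initial hidden state h_0 = 0
    for x_t in x:
        # Same weights at every step; h summarizes everything seen so far.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    logits = W_hy @ h + b_y               # classify from the final hidden state
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes, T = 3, 8, 2, 5
params = (
    rng.normal(scale=0.1, size=(n_hidden, n_features)),  # W_xh
    rng.normal(scale=0.1, size=(n_hidden, n_hidden)),    # W_hh
    np.zeros(n_hidden),                                  # b_h
    rng.normal(scale=0.1, size=(n_classes, n_hidden)),   # W_hy
    np.zeros(n_classes),                                 # b_y
)
x = rng.normal(size=(T, n_features))  # toy time series, not real patient data
probs = rnn_classify(x, *params)
print(probs, probs.sum())
```

Because the loop runs once per row of x, the same function accepts sequences of any length. This is why RNNs suit time series where patients have different numbers of measurements.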

Reading List


  • Andrew Ng’s Coursera Course: If you are unfamiliar with deep learning, please watch these videos in order. You don't need to pour time into the homework assignments, but they can be valuable resources as well. The videos are also available on YouTube. Please watch the RNN videos; they are short.


All of our programming will be in Python. If you are unfamiliar with Python, LEARNING PYTHON IS YOUR FIRST TASK. Try this tutorial. Preprocessing will be done with NumPy, and deep learning algorithms will be implemented in PyTorch.

  • Programming will be in Python.
  • Deep Learning algorithms will be implemented in PyTorch.
  • Group work will be done on GitHub.
  • Group communication will happen over Slack.
  • We will be using Electronic Health Records (EHR) from the MIMIC III database.
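As a rough illustration of the kind of NumPy preprocessing involved, the sketch below handles one common issue with EHR time series. The exact preprocessing steps for our project are not specified here, so treat this as an assumption. Patients have different numbers of measurements, so their sequences are zero-padded to a common length with a boolean mask marking the real time steps. Each feature is then z-score normalized using only the observed values. The function name `pad_and_normalize` and the toy readings are hypothetical.

```python
import numpy as np

def pad_and_normalize(sequences):
    """Stack variable-length sequences into one zero-padded array plus a mask.

    sequences: list of arrays, each of shape (T_i, n_features).
    Returns (batch, mask) with shapes (N, T_max, n_features) and (N, T_max).
    """
    n_features = sequences[0].shape[1]
    t_max = max(len(s) for s in sequences)
    batch = np.zeros((len(sequences), t_max, n_features))
    mask = np.zeros((len(sequences), t_max), dtype=bool)
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
        mask[i, :len(s)] = True

    # z-score each feature using only real (unpadded) time steps
    observed = batch[mask]              # shape (total_steps, n_features)
    mean = observed.mean(axis=0)
    std = observed.std(axis=0)
    std[std == 0] = 1.0                 # avoid dividing by zero for constant features
    batch = np.where(mask[..., None], (batch - mean) / std, 0.0)
    return batch, mask

# Toy data (made up): two "patients" with 3 and 2 readings of (heart rate, temperature)
seqs = [np.array([[80.0, 37.0], [85.0, 37.2], [90.0, 37.5]]),
        np.array([[70.0, 36.8], [75.0, 36.9]])]
batch, mask = pad_and_normalize(seqs)
print(batch.shape)  # (2, 3, 2)
print(mask)
```

Computing the statistics only over masked-in values keeps the padding zeros from dragging the mean toward zero. The mask can later tell a model which time steps to ignore.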

Here’s how to get started:

  • Getting started with PyTorch (assuming you have a vague idea of what deep learning is and have some experience scripting in Python).
  • We will store our work in a GitHub repository that I will invite you to.
    • If you have not used GitHub much, please complete this tutorial.
    • I will create a folder for each of us where you may store code and data.
    • You should have an understanding of the expected GitHub workflow.
  • The MIMIC III database is a rather complex relational database containing clinical records for over 45,000 patients. We will extract a subset of the data and detect various adverse events recorded in it.
    • I recommend reading through the data pages to get a sense of how the database is laid out (e.g., what information does it contain? How would we extract one person’s heart rate, or find out whether they died in the hospital?).
    • Because it is a relational database, the data is easiest to extract with SQL, a relational query language. SQL is intuitive but takes a bit of exploration, so please look up a SQL tutorial. Don’t start on this until we have discussed it, though: you likely won’t need to do this at all, and I will just send you a dataset.
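Purely as a preview of what such queries look like, here is a sketch using Python's built-in sqlite3 module on a tiny in-memory toy database. The table and column names (CHARTEVENTS, ADMISSIONS, SUBJECT_ID, HADM_ID, ITEMID, CHARTTIME, VALUENUM, HOSPITAL_EXPIRE_FLAG) mirror MIMIC III, but every row is made up. The heart-rate ITEMID of 999 is hypothetical; the real item IDs are listed in the D_ITEMS table. It answers the two example questions: one person's heart rate over time, and whether each patient died in the hospital.

```python
import sqlite3

# In-memory toy database; names mirror MIMIC III, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER, hospital_expire_flag INTEGER);
CREATE TABLE chartevents (subject_id INTEGER, hadm_id INTEGER, itemid INTEGER,
                          charttime TEXT, valuenum REAL);
INSERT INTO admissions VALUES (1, 100, 0), (2, 200, 1);
INSERT INTO chartevents VALUES
  (1, 100, 999, '2101-01-01 08:00', 82.0),
  (1, 100, 999, '2101-01-01 09:00', 88.0),
  (2, 200, 999, '2101-03-05 12:00', 110.0);
""")

HEART_RATE_ITEMID = 999  # hypothetical; look up the real ITEMID(s) in D_ITEMS

# One patient's heart rate over time
rows = conn.execute(
    "SELECT charttime, valuenum FROM chartevents "
    "WHERE subject_id = ? AND itemid = ? ORDER BY charttime",
    (1, HEART_RATE_ITEMID),
).fetchall()
print(rows)  # [('2101-01-01 08:00', 82.0), ('2101-01-01 09:00', 88.0)]

# Join vitals to the in-hospital mortality flag (1 = died in hospital)
outcome = conn.execute(
    "SELECT c.subject_id, COUNT(*) AS n_readings, a.hospital_expire_flag "
    "FROM chartevents c JOIN admissions a ON c.hadm_id = a.hadm_id "
    "WHERE c.itemid = ? GROUP BY c.subject_id, a.hospital_expire_flag "
    "ORDER BY c.subject_id",
    (HEART_RATE_ITEMID,),
).fetchall()
print(outcome)  # [(1, 2, 0), (2, 1, 1)]
```

The join on the admission ID is what makes the relational layout useful: measurements and outcomes live in separate tables and are connected only when you query them.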

Some other useful tips:

  • Copying files to or from a remote server (e.g., Turing)? Use rsync.
  • Searching your terminal command history to rerun a command? Use Fuzzy Finder (fzf).