NRSG 741, Big Data Analytics for Healthcare


BIOS 500 and 501


3 credit hours


This course will describe the concepts underlying big data analytics and their application in healthcare. The theoretical underpinnings of these concepts will be presented along with healthcare applications, including knowledge discovery, precision medicine/nursing, and the development of targeted interventions to improve health outcomes. Commonly used methods in big data analytics will be reviewed, and the challenges related to gathering, analyzing, visualizing, and interpreting big data will be discussed. Hands-on computer laboratory experience applying these techniques to an identified area will be included.


At the end of this course the student will be able to successfully:

  1. Demonstrate knowledge of the principles undergirding the tools of big data analysis in health-related research.

  2. Identify the potential of, and challenges to, incorporating big data analytics to improve the development and testing of precision medicine / nursing interventions.

  3. Understand the principles of reproducible research and implement an appropriate workflow for data analysis and manuscript / report generation.

  4. Effectively critique published health-related studies conducted using big data theoretical frameworks and research techniques.

  5. Analyze ethical issues related to the use of big data analytics in health-related research.

  6. Demonstrate knowledge of how data are gathered, stored, managed, and analyzed for big data analytics.


This course uses a variety of teaching methods, including readings, case presentations, lectures, practical skills application, discussion sessions, simulation, and projects.


Assignment                                                                                      Points
==============================================================================================  =========
Homework (8 total, 5% each)                                                                     40 points
Final Big Data Project (presentation and report)                                                60 points (breakdown below)
==============================================================================================  =========
Milestone 1: 1-2 page document describing your idea                                             5 points
Milestone 2: approximately 10-page document showing where you are going (initial DRAFT report)  10 points
Milestone 3: FINAL
- Paper                                                                                         20 points
- Oral Presentation to class                                                                    15 points
- Slide Deck                                                                                    10 points

Class participation is expected.

TEXTBOOKS (Required):

The books listed above are available online through the Emory Safari Books Online site. Use the following URL:

Other useful texts:

Useful websites:


Successful completion of the course objectives will be determined by performance with the following course assignments:

  1. Homework - Homework assignments will apply techniques learned in class to different data sets.

  2. One comprehensive final project – a practicum/exemplar presentation of the student’s skills and knowledge for obtaining, analyzing and presenting results from Big Data sources to address an issue of interest in healthcare.


During this course, class materials, exercises, and assignments will use GitHub repositories. Everyone will need an account created at

RStudio Cloud will also be utilized for some assignments and exercises. Everyone will need an account created at


The exercises and assignments in this course require use of the R programming language. Two FREE open source software packages must be installed: install the R programming language first and then install the RStudio interface:

  * R programming language, download from
  * RStudio IDE (integrated development environment), download RStudio Desktop from
  * Git, needed to link R projects to GitHub,

Calendar Schedule for Lectures & Homework

Session  Date   Topic                                                                                Assignment DUE
=======  =====  ===================================================================================  ==============
1        01/15  Introduction to the Course; What is data science?; R and RStudio; Rmarkdown reports
2        01/22  Reproducible Research Principles and Pipelines
3        01/29  Data Wrangling with dplyr [Steve Pittard Guest Lecturer]                             HW 1
4        02/05  Exploratory Data Analysis [Lisa Elon Guest Lecturer]                                 Milestone 1
5        02/12  Text/web scraping and working with text (regular expressions)                        HW 2
6        02/19  Microbiome analysis – the DADA2 / Phyloseq pipeline                                  HW 3
7        02/26  Microbiome analysis – the DADA2 / Phyloseq pipeline (continued)
8        03/04  More on Data (and code) Wrangling                                                    HW 4
10       03/18  Metabolomics analysis [Rebecca Mitchell Guest Lecturer]                              HW 5
11       03/25  Models – Linear and Logistic; Prediction                                             Milestone 2
12       04/01  Machine learning – supervised methods [Joyce Ho Guest Lecturer]                      HW 6
13       04/08  Machine learning – unsupervised methods                                              HW 7
14       04/15  Network science                                                                      HW 8
15       04/22  To the cloud and beyond - AWS [Roy Simpson Guest Lecturer]
16       04/29  Project presentations                                                                Milestone 3: [1] Paper; [2] Oral Presentation; and [3] Slides

Copyright © Melinda Higgins, Ph.D. All contents under (CC) BY-NC-SA license unless otherwise noted.
