Vicki Hertzberg
February 6, 2019
Recall the Basic Paradigm of Data Science
Question => Data Acquisition => EDA => Model => Communicate
Today we are going to focus on Exploratory Data Analysis, or EDA.
After you have acquired your data, but before you jump into modeling, you need to see what you have. This is the EDA step.
What EDA IS:
Use
in order to
What EDA is NOT:
It is NOT making fancy visualizations.
It is NOT making aesthetically pleasing visualizations.
EDA is about creating figures so that somebody can look at them and understand, within seconds, what is going on in your dataset.
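As a rough illustration of a figure that is quick rather than pretty, here is a minimal sketch of a text histogram using only the Python standard library. The function name `text_histogram`, the bin width, and the `ages` values are all hypothetical, made up for this example and not taken from any dataset in this course.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one text line per bin: the bin's lower edge and a bar of '#' marks."""
    # Assign each value to the lower edge of the bin it falls in.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    edge, last_edge = min(counts), max(counts)
    # Walk every bin between the smallest and largest, so empty bins show as gaps.
    while edge <= last_edge:
        lines.append(f"{edge:>5} | {'#' * counts.get(edge, 0)}")
        edge += bin_width
    return lines


# Hypothetical ages, invented purely for illustration.
ages = [23, 25, 31, 34, 35, 36, 38, 41, 42, 44, 47, 52, 58, 61, 95]
histogram_lines = text_histogram(ages, 10)
print("\n".join(histogram_lines))
```

Even a crude picture like this can show the general shape of a variable and flag a value that sits far from the rest, which is the kind of thing worth checking before modeling.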
In practice, the process is iterative rather than linear:
EDA => clean => model => clean => EDA => model, etc.
What to look for:
Types of simple visualizations
The Flaws of Averages
Moral of Anscombe's Quartet:
PLOT YOUR DATA!!!
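To see why, here is a minimal sketch that computes summary statistics for the four Anscombe datasets using only the Python standard library. The data values are the published quartet from Anscombe (1973); the helper `pearson_r` and the variable names are choices made for this illustration.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Datasets I, II, and III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# For each dataset: mean of x, mean of y, sample variance of x, correlation.
summary = {
    name: (mean(xs), mean(ys), variance(xs), pearson_r(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, (mx, my, vx, r) in summary.items():
    print(f"{name:>3}: mean x = {mx:.2f}, mean y = {my:.2f}, var x = {vx:.2f}, r = {r:.3f}")
```

All four datasets come out with a mean of x of 9, a mean of y of about 7.5, and a correlation of about 0.82, yet their scatterplots look nothing alike. The summary numbers alone cannot tell them apart; only a plot can.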
Suggested must haves for EDA:
Remember:
What questions do you have?