The explosion in data being produced is something you should know about, but knowing about it doesn’t mean you have the tools to analyze it.
So what does a data analyst do when the volume of data in their work explodes?
In a new study published in the Journal of Applied Data Analysis, Harvard University researcher Shaul Alperstein explains what it’s like to do this analysis.
In this video, Alperstein’s senior research scientist Daniel Hamer provides a primer on the basics of data analysis.
“What’s important is to know what’s really happening,” he says.
“What are the patterns, the dynamics, the relationships between these things?”
What he’s looking for are patterns that can be used to understand the data.
For instance, the types of relationships you might find between people, events, and locations.
“If you can identify those relationships, you can then make inferences about what the underlying structure of the data is,” he explains.
“There are all kinds of different types of information that you can use to look at data.”
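As a rough illustration of the kind of relationship-finding described above, the sketch below counts how often pairs of people appear at the same event. The event records, names, and the co_occurrence helper are hypothetical and invented for this example; they are not taken from the study, and this is only one simple way to look for such patterns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical event records (invented for illustration, not study data):
# each event id maps to the set of people recorded at that event.
events = {
    "e1": {"ana", "ben", "cy"},
    "e2": {"ana", "ben"},
    "e3": {"ben", "cy"},
}


def co_occurrence(event_map):
    """Count how many events each pair of people shares."""
    pairs = Counter()
    for people in event_map.values():
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(people), 2):
            pairs[(a, b)] += 1
    return pairs


pair_counts = co_occurrence(events)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pairs that share many events are candidates for a closer look; the same counting approach could be applied to person-location or event-location pairs.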
Alperstein’s study used data from the New York City Department of Health and Mental Hygiene.
He analyzed more than 100 million records of patients and their contacts, medical visits, and the like.
For each of the datasets, he extracted a variety of information about the patient, including their location, the kinds of medications they took, and other factors.
Alperstein’s analysis included a total of 5,719,939 records from the City Health Department.
For the purposes of this study, he used data in the City’s patient registry, which allows doctors to see how many patients are treated for a given condition.
But, as Alperstein points out, this data does not capture the type of treatment or how often it occurs.
Instead, he looked at the number of times a particular person is referred to a doctor for a specific diagnosis.
“You know, I’m not a big fan of this metric, because it’s a very noisy metric,” he tells Science of Us.
“But we have a lot of different metrics that are related to health outcomes and so on.
And that’s kind of where this metric comes from.”
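The referral metric described above, the number of times a particular person is referred to a doctor for a specific diagnosis, can be sketched in a few lines. The records, field layout, and the count_referrals helper below are assumptions made for illustration; the article does not describe how the registry data is actually structured or processed.

```python
from collections import Counter

# Hypothetical referral records (invented for illustration, not study data):
# each record is a (patient_id, diagnosis) pair.
referrals = [
    ("p1", "asthma"),
    ("p1", "asthma"),
    ("p2", "asthma"),
    ("p1", "diabetes"),
    ("p2", "asthma"),
    ("p2", "asthma"),
]


def count_referrals(records):
    """Count referrals per (patient, diagnosis) pair."""
    return Counter(records)


counts = count_referrals(referrals)
for (patient, diagnosis), n in sorted(counts.items()):
    print(patient, diagnosis, n)
```

A raw count like this says nothing about the type of treatment, which is one reason such a metric can be noisy.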
In addition to analyzing the relationships among the data points, Alperstein’s research also looked at what the data showed about the people who were involved in the outbreak.
“We looked at all of these people that were involved, and we saw that there was a lot more variation in these relationships than there was in the rest of the people that we didn’t have data on,” he points out.
“For instance, one of the patients that we did not have data for was the father of the woman who was in a coma.”
Alperstein also found a surprising amount of variability in the number and severity of the symptoms patients were reporting to health-care providers.
For example, there were many cases of the condition known as “syndrome of agitation” where patients complained of being unable to eat, feeling nauseous, and having other symptoms that did not make sense.
“The vast majority of these symptoms were attributed to other symptoms, but the ones that were more serious were those related to the disease,” he notes.
“And that was quite interesting because they tended to have much greater impact in people with more serious diseases.”
What does this all mean?
For Alperstein, this is a good time to ask: what do you really need to know about your data before you can start analyzing it?
“This is the first time that we have ever really looked at this kind of data and how we can use it,” he concludes.
“In my experience, there is not much that you don’t know before you even start.”
The best way to learn more about data analysis is to start with this new study.
In fact, this research could help you understand your own data.
“I think it’s good that we can start to use this kind [of data] in a systematic way,” Alperstein says.
For starters, you can begin exploring the data even while you are still learning the underlying principles of data science.
If you are interested in data science, it is definitely worth checking out the official online course on data science offered by Harvard.
“Data science is the study of how to make sense of data,” Alperstein says, “and we are trying to understand data.”
And with that, we will wrap up this video with some tips on how to become a better data analyst.
If your passion for data analysis lies in science, you might want to consider joining an organization like the Data Science Lab at Harvard.
And if you are just starting out, here are some tips for becoming an advanced data analyst: If you like science, don’t worry about the definition of science.
Alperstein’s research is focused on how people’s beliefs affect how they interpret data, so it