So What Does a Data Scientist Actually Do?
Domo Technologies reports that every year humanity produces 1,200 exabytes of data. To get a sense of how much data that really is, it would take 80.53 billion 16GB phones to store it all.
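The phone comparison is easy to check with a few lines of arithmetic. The sketch below is my own back-of-the-envelope check, not Domo's calculation: it assumes the 80.53 billion figure uses binary units (1 exabyte = 2^60 bytes, 1 gigabyte = 2^30 bytes) with the result expressed in ordinary billions, and the function name `phones_needed` is made up for illustration.

```python
# Back-of-the-envelope check of the "80.53 billion 16GB phones" comparison.
# Assumption: the figure uses binary units (1 EB = 2**60 bytes, 1 GB = 2**30 bytes).

def phones_needed(exabytes, phone_gb, bytes_per_eb, bytes_per_gb):
    """Number of phones of size `phone_gb` needed to hold `exabytes` of data."""
    return (exabytes * bytes_per_eb) / (phone_gb * bytes_per_gb)

# Binary interpretation: matches the figure quoted in the text.
binary_phones = phones_needed(1200, 16, 2**60, 2**30)

# Decimal (SI) interpretation, shown for comparison.
decimal_phones = phones_needed(1200, 16, 10**18, 10**9)

print(f"binary units:  {binary_phones / 1e9:.2f} billion phones")   # 80.53
print(f"decimal units: {decimal_phones / 1e9:.2f} billion phones")  # 75.00
```

Under that assumption the numbers line up with the quoted figure. With strictly decimal units the answer would be a round 75 billion, which is still an enormous pile of phones.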
At the same time, a McKinsey report estimates that by 2018 there will be a shortage of 140,000 to 190,000 people in the job market with deep analytical skills.
This sounds like a tremendous opportunity for data science. However, not everyone is clear on when big data really is BIG DATA, or on the difference between a data analyst and a data scientist.
I wanted to try to answer some of these questions, so I interviewed our very own data scientist, Dr. Zoe Katsimitsoulia, to find out.
ME: What is Your Education and Experience?
ZOE: I started off with an undergrad at McMaster (Hamilton) in biochemistry, with a specialization in molecular biology & genetic engineering, but also ended up taking a few courses in computer science. I was really enjoying programming, but not being in a lab pipetting at 3AM. Then the new field of bioinformatics came on the scene, and I wanted to explore that opportunity.
So in researching options, I ended up completing a Master's at the University of Liverpool, which eventually led to a PhD at Oxford in computational biology. My thesis was “Macromolecular studies for bionanotechnology”, which mostly had to do with structural and dynamic modelling of protein interactions.
I then continued my work with a post-doc at Columbia University (New York) in computational biology.
After all that education and research, I came back to Toronto, and was at two different startups (AppHero and Fuse Powered) working as a Data Scientist before I landed at Nudge.
ME: Wow – lots of education and research experience to get where you are today.
ZOE: Yes, I really believe that one of the potential challenges in the field is the number of schools coming out with two-year “Data Science” programs, for something that takes considerably more training.
The programs I’ve looked at have been quite variable in their definition of data science, as reflected in their curricula, and in such a short time frame they can only provide a superficial treatment of the field at best. This could lead to problems in the future as the demand for this role in business increases and the workforce has only entry-level skills.
ME: So What is Data Science?
ZOE: So first of all, I hate the term data science, because obviously there is no science without data. It is redundant and too general to have any value for a practitioner.
ME: (evil stare)
ZOE: Okay, but if I get over my issues with the term, data science currently is really a continuum being defined by businesses trying to optimize and drive improvements.
Data science can come into play when we have complex systems generating lots of data we wish to take advantage of. That means more than just analyzing the data. It means building models using state-of-the-art algorithms to explain or predict behaviour. These models need to be testable, and this is where the scientific process comes in. So the major difference between data science and data analysis is the math and the mentality.
ME: So How is That Done?
ZOE: So let’s look at an example at Nudge. I may have a hypothesis that by using profile information I can build a program that recommends content based on explicit and implicit interests, and that it does better than a random selection. There may be multiple variables to consider, and the data or lack of data may create challenges. However, you can model this system of people, profiles, interests and content using probabilistic algorithms, clustering techniques, and collaborative or content-based filtering approaches, to name a few options.
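To make Zoe's example a little more concrete, here is a minimal sketch of one of the options she names, content-based filtering, tested against a random-selection baseline. It is not Nudge's actual system. The users, interest weights, content items, and "liked" feedback are all hypothetical toy data, and the function names (`cosine`, `recommend`, `hit_rate`, `random_baseline`) are invented for this illustration.

```python
import math
import random

# Hypothetical toy data: interest weights per user profile, tags per content item.
USER_INTERESTS = {
    "ana": {"running": 1.0, "nutrition": 0.5},
    "ben": {"sleep": 1.0, "meditation": 0.8},
}
CONTENT_TAGS = {
    "5k-training-plan": {"running": 1.0},
    "meal-prep-basics": {"nutrition": 1.0},
    "better-sleep-habits": {"sleep": 1.0},
    "breathing-exercises": {"meditation": 1.0, "sleep": 0.3},
    "desk-stretches": {"mobility": 1.0},
    "hydration-myths": {"nutrition": 0.7, "hydration": 1.0},
}
# Hypothetical held-out feedback: items each user actually engaged with.
LIKED = {
    "ana": {"5k-training-plan", "meal-prep-basics"},
    "ben": {"better-sleep-habits", "breathing-exercises"},
}

def cosine(a, b):
    """Cosine similarity between two sparse tag-weight vectors."""
    dot = sum(weight * b.get(tag, 0.0) for tag, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(user, k=2):
    """Content-based filtering: rank items by similarity to the user's profile."""
    profile = USER_INTERESTS[user]
    ranked = sorted(
        CONTENT_TAGS,
        key=lambda item: cosine(profile, CONTENT_TAGS[item]),
        reverse=True,
    )
    return ranked[:k]

def hit_rate(picks_by_user):
    """Fraction of recommended items that the user actually liked."""
    hits = sum(len(set(picks) & LIKED[user]) for user, picks in picks_by_user.items())
    total = sum(len(picks) for picks in picks_by_user.values())
    return hits / total

def random_baseline(k=2, trials=1000, seed=0):
    """Average hit rate when k items are picked at random for each user."""
    rng = random.Random(seed)
    items = list(CONTENT_TAGS)
    rates = [
        hit_rate({user: rng.sample(items, k) for user in USER_INTERESTS})
        for _ in range(trials)
    ]
    return sum(rates) / trials

model_rate = hit_rate({user: recommend(user) for user in USER_INTERESTS})
baseline_rate = random_baseline()
print(f"content-based hit rate: {model_rate:.2f}")
print(f"random-selection hit rate: {baseline_rate:.2f}")
```

The toy data is built so that the profiles line up neatly with what each user liked, so the numbers here say nothing about real-world performance. What matters is the shape of the test: state the hypothesis ("profile-based recommendations beat random selection"), pick a measure, and compare the model against the baseline on feedback it has not seen.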
ME: So I can see the difference between that and, for example, analysing the performance of a marketing campaign.
ME: So Then What Types of Businesses Need Data Scientists?
ZOE: Well, really every business can take advantage of it. There is, however, this big expectation on the data scientist to transform a business, but really it is much more of a collaboration between the data scientist and the business leaders.
ME: It makes sense that this, like any other business discipline, requires collaboration to take advantage of the insight. But I would hope any great business would do that.
ME: So Where Do You Find Data Scientists?
ZOE: A lot of scientists have jumped ship and left academia like I did. The reason (for me) is that in business you get much more immediate feedback than in research, and that can be very rewarding. Also, the money is better :).
ME: What Do You Think the Future Will Be for Data Science?
ZOE: I think there is a lot of great work being done in fundamental research in academia, but business has also driven a lot of the progress and innovation in the field. Companies like Google, Yahoo, Facebook and LinkedIn have huge amounts of data they can apply science to, to help both themselves and their users.
Ultimately, I believe the discipline needs to be better defined, with more distinction between different types of data scientists. This will really help generate more specific roles, with the needed skills that add real value to businesses.
ME: Thanks for your time, Zoe. I have always been big on data, but now I understand a little more about what data scientists really do with BIG DATA.