Enter the term “data scientist” into Dice.com’s and Monster.com’s search engines, and you’ll get eight and seven results, respectively. Clearly, data scientists (the brainy, analytical folks charged with using statistical modeling tools to draw insights from huge quantities of data) aren’t in demand today the way software engineers are, but experts in the field of data mining and data science believe that’s going to change.
Brian Hopkins, a principal analyst with Forrester Research who specializes in enterprise architecture with a focus on emerging information management technologies, anticipates demand for data scientists to grow as companies increasingly seek a competitive advantage from the massive amounts of data they collect and as they realize they’re not getting the wisdom they need from existing data mining and business intelligence tools alone.
“Companies are always looking for ways to know more than their competitors,” says Hopkins. “There’s this notion that if they buy some predictive analytics tools, the tools will give them insight. [In fact,] you need this specialized class of data scientist to create and run [statistical] models against data and present the results in ways people can act on.”
The limitations of existing business intelligence software, combined with the maturation of parallel computing and sophisticated data modeling tools, have paved the way for the data scientist’s emergence, according to Steve Hillion, vice president of analytics with data storage company EMC. Like Hopkins, Hillion believes that companies are beginning to realize they can’t rely on software alone to make sense of their terabytes of data. They need individuals with highly specialized skills, and those individuals require special technology. Hillion says the hardware and software data scientists use to perform their analysis are now robust, scalable and cheap enough that a variety of companies in different industries can use them.
Hillion spoke to CIO.com about the burgeoning data scientist role, the industries where they’re most in demand, the skills they need, where they sit in organizations, and what the role means for business intelligence software.
CIO: What is a data scientist?
Steve Hillion, Vice President of Analytics, EMC: Somebody who is charged with creating insights from large quantities of data using computer-based analytical and modeling techniques. The job is about applying the methods of data mining, statistics and modeling to large quantities of data, typically to generate business insights or identify underlying trends or patterns.
What kinds of businesses need data scientists?
You’ll typically find a data scientist or data analyst in large organizations that gather a great deal of consumer data, whether that’s a Web company, an advertising network, a cell phone company or a retail company that tracks sales or marketing data.
Is this a new role? Haven’t companies had people in charge of analyzing data before?
If you look at retail companies, consumer goods companies like Coca-Cola or Procter & Gamble, they’ve long had researchers, statisticians who have [for example] applied econometrics techniques to understand pricing and customer data.
What’s changed, with the advent of the Internet and Web traffic, is that more data is being generated, and these [analytical] techniques are more readily available. The role used to be restricted to data intensive industries. Now more organizations want to apply these techniques to all the data they’re generating. They’re saying, ‘I want to get beyond reporting what happened the previous quarter, month or year. I want to dig deeper into the data to identify patterns and answer the why and how and what will questions, rather than [look at] retrospective statistics.’
Does the growing need for data scientists indicate a failure of business intelligence (BI) and data warehousing tools? Weren’t those technologies supposed to answer those questions?
It’s not so much a failure as it is a recognition that standard business intelligence is somewhat limited in scope. Typically, business intelligence presents an aggregate summation of historical data to determine what happened in the past, in this [particular] region, during this [particular] time frame, on this set of products or groups of people. You cannot run your business without that information. You need to understand trends, basic facts about how you’re performing and what your customers are doing. Business intelligence has succeeded in the sense that it’s made people hungrier for more information. The more they know about how their business is doing, the more they want to know why and how they can improve it.
There have been some limitations exposed by standard data warehouses. They’ve tended to become very strictly controlled and slow moving. They’re big. They’re cumbersome. They’re often used to provide regulatory and compliance reporting, end of year reporting, where the numbers really have to match up. It’s appropriate that those things need to be strictly controlled. But the limitation is that they don’t encourage the open questioning and exploration of data that data scientists want to do. If you want to ask predictive or interpretive questions, you have to be able to work with much more flexible sources of data.
Do you think the rise of the data scientist is a sign of the economic times? Is this a role that could only emerge when companies are beginning to focus on growth and they’re looking for insights that can help them grow their businesses?
Over the last two years, organizations have had to think very carefully about optimizing their businesses. When you’re working with reduced headcounts and budgets, when business is highly competitive, you need to understand not just how you grow but also how you optimize the resources you have, and that’s a complex and difficult task. Businesses have had to get really smart about how to streamline, and that requires understanding operations and business processes at a deep level. Those are just the sort of tasks, just the sort of requirements that are likely to lead to deeper data analysis.
Another way the role is a sign of the times: Technology has reached a point where it can support this. Ten or 15 years ago, a lot of analysis was done with basic tools, client-only, memory-based analytical tools. Sometimes it was Excel. Sometimes it was statistical tools that were powerful in terms of the techniques they offered, but they couldn’t scale. Now there are much more advanced statistical tools that scale. You need clusters of computers that can support parallel operations. You need sophisticated modeling tools. You need rapid data manipulation. You need all of these things to come together to support what data scientists want to do. Those things started reaching maturity in the last decade.
What skills do IT professionals need to be good data scientists?
You need skills in terms of statistics and modeling and mathematics. That’s key. Physicists often make great data scientists because they’re used to taking real world situations and applying mathematics to come up with real world models. People from the field of bioinformatics are also well suited to be data scientists.
You also need to have a deep understanding of the business or domain in which you’re working. Data scientists start their work by offering hypotheses. You can only offer up a hypothesis if you’ve lived in a domain for some time.
The ability to listen is important. Much as a software engineer needs to understand customer requirements, a data scientist has to listen to the questions that business users, executives and management are asking.
Another aspect of being a good data scientist centers on visualization and explanation of data. The average executive has a limited amount of bandwidth to absorb information. Data scientists need to be able to tease out that one insight that will resonate the most with the business, will have the greatest impact on the business, and is the thing that the business can actually act on. A certain amount of artistry is needed to present the data scientist’s results, to make them visual and appealing, and to have an impact.
Do data scientists work in corporate IT departments or in another business unit?
I’ve seen both. There are cases where data scientists are part of the IT division. But my sense is that typically they are going to reside within the functional units, whether the marketing department or finance department. Sometimes they’re in a research department that sits between IT and the business. That corroborates my point that data scientists have to understand the business at a deep level.