Data scientists help organizations shift from relying on instinct and experience to using data for new, transformative insights. Yet “data scientist” was not recognized as a distinct profession until about a decade ago. Even so, for the past two years the recruiting site Glassdoor has named data scientist the highest-ranked job in the U.S., based on salary, job satisfaction and the number of job openings. As elsewhere, Kenya’s data science community is growing rapidly.
Data scientists are in great demand because of the sheer volume of data that organizations now handle, driven in part by the explosion of data streams enabled by the cloud. About 20 percent of this is structured data of the kind businesses have historically collected. The other 80 percent is unstructured data, such as emails, social media posts, images and videos, which can be much harder to collect, manage and analyze.
Recent survey data also points to cloud growth in several areas, which means data scientists will need to grapple with new workloads from AI, analytics and IoT devices. Access to data in the cloud is critical for today’s data scientists, who need a centralized platform that is accessible across all teams, and especially to data science teams. At the same time, with stronger information privacy and data protection laws coming into effect in Kenya and other parts of the continent, businesses operating in Africa must reconsider where, and with which providers, they host their enterprise data and applications in the cloud.
As digital transformation drives more companies and industries around the world to the cloud, the need to capture and manage both new and legacy data keeps growing. Given easy access to this data, data scientists already have the skills to analyze its growing volume with cloud technology and turn information into insights that can transform businesses and industries. The problem is that there are simply not enough data scientists to meet current demand, let alone future demand.
According to the “Worldwide Semiannual Big Data and Analytics Spending Guide” from International Data Corp. (IDC), global revenues for big data and business analytics will grow from $130 billion in 2016 to more than $203 billion in 2020. Nearly half of that total, about $95 billion, will come from the U.S., according to IDC. The second-largest region will be Western Europe, followed by Asia/Pacific (excluding Japan) and Latin America. The two regions with the fastest growth over the five-year period will be Latin America and the Middle East and Africa.
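As a rough check on the IDC figures quoted above, the implied compound annual growth rate (CAGR) and the U.S. share of the 2020 total can be worked out directly. This treats the quoted forecasts as exact values in billions of dollars and counts four compounding years from 2016 to 2020; both are simplifying assumptions, since the forecast says “more than” $203 billion.

```latex
\mathrm{CAGR} = \left(\frac{203}{130}\right)^{1/4} - 1 \approx 1.118 - 1 = 0.118 \quad (\text{about } 11.8\% \text{ per year})

\text{U.S. share of 2020 revenue} = \frac{95}{203} \approx 0.47
```

On these figures, the U.S. accounts for just under half of the 2020 total.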
Most organizations hire data scientists to develop algorithms and build machine learning models, typically the part of the job they enjoy most. Yet data scientists spend about 80 percent of their time finding, cleansing and organizing data, leaving only 20 percent for actual analysis.
For these reasons, organizations need to adopt new cloud services and technologies that give data scientists the tools to rapidly find and organize growing volumes of data. That frees more of their time for the work where their skills are most valuable: analyzing the growing number of data sets generated by everything from sensors and devices to users.
This can include tools that automate and simplify data discovery, curation and governance, as well as intelligent search capabilities that help data scientists find the data they need. Metadata such as tags, comments and quality metrics can help them decide more quickly whether a data set will be useful. Integrated data governance gives data scientists confidence that the models and results they produce from those data sets are used responsibly by others in the organization.
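To make the role of metadata more concrete, here is a minimal Python sketch of tag-and-quality search over a hypothetical in-memory data catalog. The `DatasetEntry` class, the `search` function, the `completeness` score and the sample data set names are all illustrative assumptions, not the API of any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One entry in a hypothetical data catalog: a data set's name plus its metadata."""
    name: str
    tags: set[str] = field(default_factory=set)
    comments: list[str] = field(default_factory=list)
    # Hypothetical quality metric: fraction of non-missing values, from 0.0 to 1.0.
    completeness: float = 0.0


def search(catalog, required_tags, min_completeness=0.0):
    """Return entries that carry every required tag and meet the quality bar,
    highest completeness first."""
    required = set(required_tags)
    matches = [
        entry for entry in catalog
        if required <= entry.tags and entry.completeness >= min_completeness
    ]
    return sorted(matches, key=lambda e: e.completeness, reverse=True)


# Hypothetical catalog contents, for illustration only.
catalog = [
    DatasetEntry("mobile_payments_2019", {"payments", "kenya"}, ["Deduplicated"], 0.97),
    DatasetEntry("mobile_payments_raw", {"payments", "kenya"}, ["Contains duplicates"], 0.71),
    DatasetEntry("support_emails", {"text", "customers"}, [], 0.88),
]

if __name__ == "__main__":
    for entry in search(catalog, {"payments"}, min_completeness=0.8):
        print(entry.name, entry.completeness, entry.comments)
```

Here the tags narrow the search, and the quality threshold and comments help decide, before any data is downloaded, that the raw payments table is probably not worth starting from.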
The goal is to give data scientists the time to build and train multiple models simultaneously, rather than being limited to one model at a time. This approach spreads the risk of analytics projects and encourages the kind of experimentation that can yield breakthroughs, instead of concentrating resources on a single approach that could prove a dead end.
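A minimal sketch of running several experiments at once instead of one at a time: each hypothetical candidate model below assumes a different shape for the relationship between x and y, all candidates are fitted concurrently with Python’s standard `concurrent.futures` module, and the one with the lowest validation error is kept. The candidate names, the one-variable least-squares fit and the toy data are assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fit_and_score(name, transform, train, valid):
    """Fit y = a * transform(x) + b by ordinary least squares on `train`,
    then return (name, mean squared error on `valid`)."""
    fx = [transform(x) for x, _ in train]
    ys = [y for _, y in train]
    mean_f = sum(fx) / len(fx)
    mean_y = sum(ys) / len(ys)
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(fx, ys))
    var = sum((f - mean_f) ** 2 for f in fx)
    a = cov / var
    b = mean_y - a * mean_f
    mse = sum((a * transform(x) + b - y) ** 2 for x, y in valid) / len(valid)
    return name, mse


# Hypothetical candidate models: each assumes a different shape for the data.
CANDIDATES = {
    "linear": lambda x: x,
    "quadratic": lambda x: x ** 2,
    "logarithmic": lambda x: math.log1p(x),
}


def run_experiments(train, valid, candidates=CANDIDATES):
    """Train every candidate concurrently and return (best_name, scores)."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(fit_and_score, name, fn, train, valid)
            for name, fn in candidates.items()
        ]
        scores = dict(f.result() for f in futures)
    best = min(scores, key=scores.get)
    return best, scores


# Toy data generated from y = 2x^2 + 1, split into training and validation halves.
data = [(x, 2 * x ** 2 + 1) for x in range(10)]
train, valid = data[::2], data[1::2]

if __name__ == "__main__":
    best, scores = run_experiments(train, valid)
    print(best, scores)
```

Threads keep the sketch self-contained, but in CPython they do not speed up CPU-bound pure-Python work because of the global interpreter lock; real workloads would more likely use a process pool or a cluster of cloud workers with the same submit-then-compare pattern.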
The cloud is the foundation of such a strategy. It lets data scientists easily save, access and extend models, so existing assets can serve as templates for new projects. This reuse supports “transfer learning,” a practice that preserves the knowledge gained while solving one problem and applies it to a different but related problem, so data scientists need not start from scratch every time.
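Transfer learning can take many forms. One of the simplest is to reuse a trained model’s parameters as the starting point for a related task and then fine-tune them. The sketch below assumes a toy linear model fitted by batch gradient descent, with hypothetical source and target tasks; in practice the reused parameters are more often the layers of a large pretrained neural network.

```python
def mse(params, data):
    """Mean squared error of the line y = w * x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(data, params=(0.0, 0.0), lr=0.5, steps=100):
    """Minimize the MSE by batch gradient descent, starting from `params`."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# Hypothetical source task: plenty of data from the relationship y = 3x + 2.
source = [(i / 10, 3 * (i / 10) + 2) for i in range(11)]
# Hypothetical related target task: same slope, shifted intercept, only three points.
target = [(x, 3 * x + 2.5) for x in (0.0, 0.5, 1.0)]

# Train the source model thoroughly once; this is the reusable "asset".
source_model = gradient_descent(source, steps=2000)

# Same small fine-tuning budget for both runs; only the starting point differs.
transferred = gradient_descent(target, params=source_model, steps=5)
from_scratch = gradient_descent(target, params=(0.0, 0.0), steps=5)

if __name__ == "__main__":
    print("transferred: ", mse(transferred, target))
    print("from scratch:", mse(from_scratch, target))
```

With the same five fine-tuning steps, the run that starts from the source model ends with a lower target error than the run that starts from zero, because it begins close to the target solution.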
Disruptive technologies such as these give data scientists the tools to reclaim much of the time they currently spend discovering and cleansing data. They can then devote that time to innovative work that gives their organizations a competitive advantage and helps transform businesses and industries.
(Mann is Chief Operating Officer – IBM East Africa)