If you’re one of the many executives exploring how artificial intelligence, machine learning, or predictive analytics might help your company gain a competitive edge in 2021, keep reading. This post covers one common pitfall faced by legacy companies looking to take advantage of their mountains of historical data.
The Pitfalls of Hiring for Machine Learning Too Early
Many executives are (rightfully) sold on machine learning, yet may not realize the true state of their data before they decide to start a machine learning project.
The sad reality is that in order to use machine learning, artificial intelligence, and predictive analytics at a wide scale in your organization, a lot of work will need to be done in the Data Engineering area before you address which advanced initiatives you want to tackle, who needs to be involved, and how it will get done.
At Aptude, we define Data Engineering as the capabilities involved in making data ready to be used by data analysts and data scientists.
As mentioned in another post, data engineering involves:
- Strategizing and creating “data dictionaries” which can become a reference for what data means, how it’s related to other data, its usage, and its format
- “Cleaning” data so it can be standardized across data sources and trusted when it’s used in visualizations and algorithms
- Transforming data lakes to data warehouses
- Importing data from unstructured sources and transforming the data into a structured, standardized format
- Creating “data pipelines” that consolidate data from multiple sources and make it available for data analytics and visualization
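To make the list above more concrete, here is a minimal Python sketch of a tiny “data pipeline” that cleans records from two sources and consolidates them into one standardized format. The source systems (a CRM and an ERP), the field names, and the sample values are all hypothetical and invented for illustration; a real pipeline would read from your actual systems and would likely use dedicated tooling.

```python
from datetime import datetime

# Hypothetical records from two systems that describe the same kind of
# sale in different shapes (field names and values are invented).
CRM_ROWS = [
    {"Customer": " Acme Corp ", "Amount": "1,200.50", "Date": "03/15/2021"},
    {"Customer": "globex", "Amount": "300", "Date": "04/01/2021"},
]
ERP_ROWS = [
    {"cust_name": "ACME CORP", "total_usd": 99.0, "sold_on": "2021-03-20"},
]


def clean_crm(row):
    """Standardize one CRM row into the shared schema."""
    return {
        "customer": row["Customer"].strip().lower(),
        "amount_usd": float(row["Amount"].replace(",", "")),
        "sale_date": datetime.strptime(row["Date"], "%m/%d/%Y").date().isoformat(),
        "source": "crm",
    }


def clean_erp(row):
    """Standardize one ERP row into the shared schema."""
    return {
        "customer": row["cust_name"].strip().lower(),
        "amount_usd": float(row["total_usd"]),
        "sale_date": datetime.strptime(row["sold_on"], "%Y-%m-%d").date().isoformat(),
        "source": "erp",
    }


def run_pipeline(crm_rows, erp_rows):
    """Consolidate both sources into one list sorted by sale date."""
    combined = [clean_crm(r) for r in crm_rows] + [clean_erp(r) for r in erp_rows]
    return sorted(combined, key=lambda r: r["sale_date"])


SALES = run_pipeline(CRM_ROWS, ERP_ROWS)
for sale in SALES:
    print(sale)
```

After the cleaning step, “ Acme Corp ” from one system and “ACME CORP” from the other share the same customer name, amounts are plain numbers, and dates use a single format. That consistency is what lets analysts and data scientists trust the combined data.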
The pitfall of hiring too early is that your data scientist will be disappointed at best and underutilized at worst, or will end up doing a job better suited to a Data Engineer or Data Architect.
Sign 1 : You’ve Addressed Your Data Siloes
The first sign that you are on track to be ready for a machine learning project is that you’ve addressed your many data siloes and data lakes. You’ve identified who owns which data, who’s responsible for its upkeep, and how data pipelines could be built so the data can be analyzed and utilized.
Sign 2 : Your Data is in Easy-to-Consume Data Warehouses or Data Marts
The second sign you’re ready is that you’ve taken a further step of structuring your data and putting various data sources together into easy-to-consume data marts. Your team can probably pull basic reports and build visualizations with some ease, even if it’s not perfect.
Sign 3 : Teams Across Organizations Share Data to Some Extent
The third sign you’re on the path to advanced data projects is that your teams share data. This culture of sharing data – rather than hoarding it – is a natural consequence of effective data management, warehousing, and leadership.
Sign 4 : You Have Large Amounts of Clean Data
A fourth sign you could be ready for ML in your organization is that you have large amounts of data… preferably, clean data. This means your data conforms to your data dictionaries and can be trusted to be used in visualizations, analysis, and ultimately machine learning applications. A large amount of data is critical for machine learning to work.
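One way to gauge how “clean” a dataset is would be to check each record against your data dictionary and measure the share that conforms. The sketch below assumes a very simple dictionary format (column name, expected type, and whether the column is required). That format, the column names, and the helper functions `find_violations` and `clean_share` are hypothetical and only meant to illustrate the idea.

```python
# Hypothetical data dictionary: column -> expected type and whether required.
DATA_DICTIONARY = {
    "customer": {"type": str, "required": True},
    "amount_usd": {"type": float, "required": True},
    "region": {"type": str, "required": False},
}


def find_violations(row, dictionary=DATA_DICTIONARY):
    """Return a list of human-readable problems for one row."""
    problems = []
    for column, rule in dictionary.items():
        value = row.get(column)
        if value is None:
            if rule["required"]:
                problems.append(f"{column}: missing required value")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(
                f"{column}: expected {rule['type'].__name__}, "
                f"got {type(value).__name__}"
            )
    # Flag columns that the data dictionary does not describe at all.
    for column in row:
        if column not in dictionary:
            problems.append(f"{column}: not in data dictionary")
    return problems


def clean_share(rows):
    """Fraction of rows with no violations (0.0 for an empty list)."""
    if not rows:
        return 0.0
    clean = sum(1 for r in rows if not find_violations(r))
    return clean / len(rows)
```

A low clean share on a table you plan to use for machine learning is a hint that cleanup work should come before modeling.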
Sign 5 : You Have Clear Use Cases for Machine Learning
Finally, a fifth sign you’re ready for machine learning is that you have a clear use case for your first (or next) machine learning project. This use case could be a problem you want to solve or a question (or questions) you want to be able to answer in ways you can’t right now.
Ideally, you know what you want to do, just not how you’re going to do it.
The Benefits of a Machine Learning Micro Project
If you have all these items in place, then great. You are well on your way to building a robust, data-driven organization that has a competitive edge. But if you don’t have these in place, then there’s still hope.
The best place to start if you can’t address all your data is to start small with a micro project. A micro project allows you to focus on a smaller set of data you control, clean it up, and then create a smaller machine learning-based solution based on that data.
So maybe your solution only involves sales, for example. Or marketing data. Or historical transportation logs. Whatever it is, the ask is small, confined to a single question, and has clear ROI.
That way, you can invest in data projects now and see ROI sooner, rather than waiting years to get all your data ducks in a row.
How Aptude Can Help
If you’d like some expert help figuring out where to start and what you need in terms of data, manpower, tools, and budget, we can help. Many of our projects involve data-related initiatives, especially since we now have a Python Center of Excellence in Mexico City, Mexico. Getting our help is as easy as contacting us via email, form, or phone.
We can also try out machine learning on a small micro project, where we look for immediate ROI with a smaller question to answer or problem to solve. Or, we can help with some of the preliminary data cleanup steps to get you ready to turn data into insights. Contact us to start the conversation.