In the previous post (The Smart Framework Part 1 of 2), we detailed how the SMART framework helps clarify the steps involved in a data analytics consulting practice. However, it would be a mistake to think that these steps are independent, or even sequential, with clearly defined boundaries. Instead, they are foundational pillars that the consulting team needs to master in order to switch seamlessly between them as projects progress. Upon those pillars, the work is typically divided into three phases:
- Phase I: Discovering Needs from Data-Driven Questions. The keyword here is ‘data-driven’. We are called to leverage the data that customers provide to us, and to go beyond the intuition that employees may have developed over long years of working in their company. To quote Sherlock Holmes: “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” Our job is to understand the company through the lens of its data, identify bottlenecks, surface opportunities for improvement, and implement them. This applies to business operations (finance, supply chain, work processes, etc.) as well as to more specific activities such as medical screening, drug discovery, recommendation engines, shipment optimization, and more.
- Phase II: Providing Data-Driven Answers. This phase is the heart of our data analytics approach, where we implement various machine learning algorithms adapted to the problem at hand. A non-exhaustive list includes supervised and unsupervised methods such as neural networks, tree-based methods (CART, random forests, etc.), regressions (linear, logistic, generalized), clustering algorithms (k-means, k-modes, etc.), Bayesian approaches, natural language processing, sentiment analysis, and more. Without going into further detail here, it is important to note that before applying these algorithms we need to have completed data ETL (extract, transform, load), data consistency checks, and data cleansing. Not glamorous, but essential.
- Phase III: Mapping Decisions to Operations. Finally, following various stages of implementation, we tie our algorithmic results to the company’s problem and articulate how they provide a path toward improvement. This can be either a direct result of the analysis (for example, a production forecast would impact investment decisions, or a patient screening methodology would suggest updating test schedules) or a by-product (for example, a parameter may be found to be predictive of an outcome, leading the company to devote additional resources to capturing this parameter more frequently and with higher accuracy). This phase is prone to generating friction, and it is therefore essential to present mathematical results that can be understood and accepted by the company’s leadership team. Doing this right requires credentials and experience: if we want algorithmic results to be translated into action within the company, it is essential to deliver intuitive recommendations. In other words, interpretability is often more important than accuracy: black-box results are often met with resistance by executives, especially if they have to take the results up the chain and explain them to their superiors. A typical example is neural networks (NN) vs decision trees (DT): NN results are often more accurate than DT results, but they offer little intuition as to how the results were obtained. In contrast, despite often being less accurate, DTs are easy to understand by looking at the tree and following the most important nodes. If the tree is not too complex (and it should not be), the reasoning becomes very intuitive, is easy to follow and absorb, and ends up being much more readily accepted by company leaders. This is not to say that NNs are not good methods, just that they are not universally good methods.
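To make the interpretability point of Phase III concrete, here is a minimal sketch of a shallow decision tree whose logic can be read out as plain if/else rules. Everything in it is an assumption made for illustration: the customer-churn data is invented, the feature names `usage_hours` and `support_tickets` are hypothetical, and the tiny Gini-based tree builder stands in for whatever library a real project would use. It is not the method used in any particular engagement.

```python
from collections import Counter

# Hypothetical toy data: (usage_hours, support_tickets, outcome).
ROWS = [
    (5, 4, "churn"), (8, 5, "churn"), (12, 3, "churn"), (30, 1, "stay"),
    (45, 0, "stay"), (25, 4, "churn"), (40, 2, "stay"), (6, 0, "churn"),
    (35, 5, "churn"), (50, 1, "stay"), (20, 2, "stay"), (15, 1, "stay"),
]
FEATURES = ["usage_hours", "support_tickets"]  # hypothetical feature names


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows):
    """Return (weighted_gini, feature_index, threshold), or None if no split exists."""
    best = None
    for f in range(len(FEATURES)):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbours
            left = [r[-1] for r in rows if r[f] <= t]
            right = [r[-1] for r in rows if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(rows, depth=0, max_depth=2):
    """Grow a tree; a leaf is a label string, a node is (feature, threshold, left, right)."""
    labels = [r[-1] for r in rows]
    split = None if gini(labels) == 0.0 or depth == max_depth else best_split(rows)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    _, f, t = split
    return (f, t,
            build([r for r in rows if r[f] <= t], depth + 1, max_depth),
            build([r for r in rows if r[f] > t], depth + 1, max_depth))


def describe(node, indent=""):
    """Render the tree as human-readable if/else rules, one string per line."""
    if isinstance(node, str):
        return [f"{indent}-> {node}"]
    f, t, left, right = node
    return ([f"{indent}if {FEATURES[f]} <= {t}:"] + describe(left, indent + "    ")
            + [f"{indent}else:"] + describe(right, indent + "    "))


def predict(node, row):
    """Follow the rules from the root down to a leaf for a single row."""
    while not isinstance(node, str):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


tree = build(ROWS)
print("\n".join(describe(tree)))
```

The `max_depth=2` cap is the design choice that matters here: it deliberately keeps the tree small enough that the printed rules can be walked through in a leadership meeting, even if a deeper tree or another model would fit the data more closely.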
As with the five pillars of the SMART network, these three phases are not sequential (even if they are labelled I, II, and III), and too often people in the field treat them as independent. Instead, the power of data analytics comes from constantly iterating between them, going back and forth between the business problem, the people involved, and the modeling work. Data analysts (which many of us are) tend to frown when we emphasize that mathematical modeling is only a small part of the entire consulting chain. The key is to move seamlessly between the different parts of the SMART network, make each component enhance the next step, iterate, experiment, and communicate around the models, their implications, and their limitations. Doing this successfully leads to smarter organizations, which is the final cap on the SMART network of Figure 1 and, incidentally, makes the whole framework look like MIT’s famous dome.