We stand at a point today where every industry is immersing itself in
data analytics, which promises to draw intelligent insights from the
past and predict future growth. It is no longer just data management
but a discipline in which information is critically mined to decipher
how systems, customers, machines, weather, etc. behaved in the past and
what behavioral pattern they indicate for the time ahead. It fascinates
me how an organization’s data can be systematically broken down into
diverse data sets that pinpoint counter-intuitive trends and
associations and validate crucial business suppositions.
Eric Ries, in his book “The Lean Startup”, writes about how businesses
face “Pivot or Persevere” decisions at any juncture. He very aptly
describes having the right metrics and the right hypothesis as the
first step, and analyzing the data correctly to reach a “Pivot or
Persevere” conclusion as the next step. I strongly believe that today
we can leverage statistical modeling, otherwise known as data mining or
machine learning, to support or reject our business hypotheses,
alongside the well-established historical data analysis techniques.
R and Python are among the most common languages that give us the
ability to cleanse and process data and devise apposite models for data
mining. But there are several other tools, like MicroStrategy, SAS, AWS
analytics platforms, Google Cloud Platform, Microsoft Azure, etc., which
provide exceptional capabilities to implement data mining.
In spite of the overwhelming choices that we have, all these platforms aim to achieve the same underlying data mining functionality. The structure and architecture may differ, but as we go through their extensive documentation, we find that they all provide the same features: gathering the data, cleaning it, preparing or transforming it, building statistical models on top of it and reporting the applicable statistical parameters for validating our hypotheses.
Indeed, we are well equipped to make thorough analyses today, as there
are ample platforms to facilitate every kind of analysis we can think
of when making “Pivot or Persevere” decisions!
In order to dig further into this pool of platforms and data mining, it
is crucial not just to have a propensity to learn and use the tool but
also to have an analytical mind and an understanding of statistics and
business workflow. This is the bedrock for determining how much one
should trust the statistical analysis when making important decisions
that sometimes put billions of dollars at stake. Two terms that I
consider of prime importance from the data mining and trustworthy
business decision-making perspectives are:
Hypothesis
Model accuracy
Understanding these well, I believe, ensures awareness of the amount
of confidence or risk involved while making critical business
decisions. Below is a small justification as to why I feel they are
essential.
Hypothesis: It is the question that we are trying to answer or the
statement that we are trying to validate. Without deciding on the
hypothesis, we would be maneuvering without purpose, and chances are we
would come across a plethora of information but would not know how to
use it. It could be as simple as ‘Type 2 Apparel has declining sales
every year’. This can be validated by analyzing historical trends.
Another example could be ‘Type 2 Apparel belongs to a group that
contributes minimally to revenue each year’. This could be verified
using the statistical method of clustering. It is essential to have an
apt hypothesis, as it will decide whether apparel of type 2 should
really be dropped from manufacturing going forward. It is also needed
to channel the data mining and analysis process.
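To make the first hypothesis concrete, here is a minimal sketch of how a historical trend check might look in Python. The yearly figures, the variable names and the helper `trend_slope` are all made up for illustration; a real analysis would pull the numbers from the organization's own sales data and would likely also consider noise and seasonality.

```python
def trend_slope(years, sales):
    """Ordinary least-squares slope of sales against year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(sales) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, sales))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


# Hypothetical yearly sales for Type 2 Apparel (illustrative numbers only).
years = [2019, 2020, 2021, 2022, 2023]
type2_sales = [120, 110, 95, 90, 70]

# Average change in sales per year; a negative value indicates a downward trend.
slope = trend_slope(years, type2_sales)

# The stricter reading of the hypothesis: sales fell in every single year.
declining_every_year = all(
    later < earlier for earlier, later in zip(type2_sales, type2_sales[1:])
)

print(f"Average change per year: {slope:.1f}")
print(f"Declined every year: {declining_every_year}")
```

The two checks answer slightly different questions: the slope summarizes the overall direction, while the year-over-year comparison tests the literal wording ‘every year’. Which one matters depends on how the hypothesis was phrased in the first place.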
Model accuracy: There are several subtle caveats to the statistical
numbers reported in data mining. Most numbers come with a probability
of being correct, and the reported accuracy is specific to the data
used for building the model. A model that gave 90% accuracy on the
training data set may therefore perform poorly on real data.
Cross-validation techniques should definitely be used before reporting
the model’s accuracy to the business. Beyond that, clearly
communicating that the accuracy holds only with a certain probability
of success (and failure) gives business decision-makers the
risk-assessment facts they need as they take key decisions.
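The gap between training accuracy and cross-validated accuracy can be illustrated with a small sketch. Everything here is an assumption made for the example: the data is synthetic, the model is a deliberately simple 1-nearest-neighbour classifier written by hand, and the function names are hypothetical. In practice one would more likely use an established library routine for cross-validation.

```python
import random


def predict_1nn(train, point):
    """Return the label of the training example closest to `point`."""
    nearest = min(
        train,
        key=lambda ex: (ex[0][0] - point[0]) ** 2 + (ex[0][1] - point[1]) ** 2,
    )
    return nearest[1]


def accuracy(train, test):
    """Fraction of `test` examples labelled correctly by 1-NN built on `train`."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


def k_fold_accuracy(data, k=5):
    """Average accuracy over k folds, each fold held out from training once."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(accuracy(train, held_out))
    return sum(scores) / k


# Synthetic two-class data with heavily overlapping classes (illustrative only).
rng = random.Random(0)
data = []
for _ in range(200):
    label = rng.randint(0, 1)
    features = (rng.gauss(label, 1.0), rng.gauss(label, 1.0))
    data.append((features, label))

# Scoring on the same data used to build the model: each point is its own
# nearest neighbour, so this number is perfect and says little about new data.
train_acc = accuracy(data, data)

# Scoring on held-out folds gives a more honest estimate for unseen data.
cv_acc = k_fold_accuracy(data, k=5)

print(f"Training accuracy:        {train_acc:.2f}")
print(f"Cross-validated accuracy: {cv_acc:.2f}")
```

The cross-validated figure is the one worth reporting to the business, ideally together with how much it varies from fold to fold.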
Thus, with appropriate knowledge of statistics, today’s “Pivot or
Persevere” decisions on a hypothesis can be supported by historical
data reporting and data mining methodologies.