“Data is the new oil”… one of the quotes you see often lately. Basically there is nothing wrong with this. Unfortunately, I often see a heavy focus on gathering, generating and storing large amounts of data. And because we are so fond of sticking labels on things, we have a label for this too. We call this ‘data hoarding’.
There might be a good reason for this data hoarding. Because everyone is ‘into’ Big Data, you have to be there too, don’t you? The lack of a clear definition doesn’t help here at all. Just by hoarding huge amounts of data you can actually claim to be into Big Data. If that’s all you need, you can rest assured and stop reading here.
I use a different definition for Big Data. I don’t use 3, 4, 5 or even more V’s (which apparently need to be used to define Big Data or Analytics). My starting point for discussing Big Data is its purpose. And there is just one purpose (challenge me please). This purpose is to generate insights. Insights are the cornerstones of decision making (other cornerstones might be blindness, intuition, emotions, and so on).
Everyone knows that the best decision making is based on facts. And this is where data can play its role. But just throwing huge amounts of data at a decision maker probably doesn’t do the trick. This data needs to be prepared, sliced, diced, analyzed and so on before it can do its magic. This is where data analytics comes into play.
So should we just throw large amounts of data at the data analytics person (the data scientist)? He or she will probably like it, but whether or not this leads to results remains the question. It then depends all too much on serendipity, and that just doesn’t pass your doorstep frequently enough. So this process of data analytics (or data science) needs to be guided. I must admit that allowing data scientists to just have their way with data is also good. Innovation is sparked by creativity and creativity is sparked by degrees of freedom (in my opinion at least).
I just learned from a Coursera course on Data Science that the question of what to analyze is even more important than the data itself. This question is the guiding mechanism for data science (call it structured data science).
So there is a holy matrimony between four parties in obtaining valuable insights:
1. The decision maker and his question;
2. The Data Scientist and his data skills;
3. The Data and its uncaptured potential; and
4. The Subject Matter Expert and his knowledge of the meaning and use of the data.
And then there is this thing called IoT (Internet of Things). I don’t want to go into much depth on IoT, but basically it does two things:
1. It generates events which trigger a predefined action (mostly of another device)
2. It generates data, more data and even more data.
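These two things can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Reading` and `Hub` classes, the thermostat, the threshold of 25 degrees and the "fan on" action are all hypothetical and not tied to any real IoT platform.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Reading:
    """One event from a (hypothetical) device."""
    device_id: str
    value: float


@dataclass
class Hub:
    """Toy hub: stores every reading and fires a predefined action on a rule."""
    threshold: float
    on_trigger: Callable[[Reading], None]
    log: list = field(default_factory=list)

    def receive(self, reading: Reading) -> None:
        # 2. Every event adds to the pile of data, whether or not anyone uses it.
        self.log.append(reading)
        # 1. Some events trigger a predefined action (here: on another device).
        if reading.value > self.threshold:
            self.on_trigger(reading)


actions = []
hub = Hub(
    threshold=25.0,
    on_trigger=lambda r: actions.append(f"fan on for {r.device_id}"),
)

for value in (21.5, 26.0, 24.9, 30.2):
    hub.receive(Reading("thermostat-1", value))

print(len(hub.log), "readings stored,", len(actions), "actions triggered")
```

Note how the log grows with every single event, while only some events lead to an action. The stored readings are exactly the data that still needs a question before it becomes an insight.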
Data generated from IoT is often described in terms of providing the factual foundation for decision making. This can only be true if this data is processed by proper data science and guided or structured by a proper demand for insights.
The demand for insights can come from several angles, each accompanied by its own form of data analysis:
1. Searching for relationships within a population that are as yet unknown. This is the basic level of data analysis and is called Exploratory Analysis.
2. Trying to say something about a larger population by looking at a smaller sample group. This is called Inferential Analysis.
3. Trying to predict outcomes for objects based on the analysis of the data for other objects. This is called Predictive Analysis.
4. Investigating the effect of changing one variable on another variable. This is called Causal Analysis.
5. Investigating how exactly changes in variables lead to changes in other variables for individual objects. This is called Mechanistic Analysis.
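To make the first three forms a bit more concrete, here is a minimal Python sketch using only the standard library. Everything in it is an assumption for illustration: the temperature and energy readings are synthetic, and the helper names (`pearson`, `fit_line`, `mean_interval`) are hypothetical, not taken from the course. Causal and Mechanistic Analysis are left out, because they need controlled experiments or a model of the underlying process rather than just observed data.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the synthetic data is reproducible

# Synthetic "population": 200 hypothetical temperature/energy readings.
temps = [rng.uniform(10, 30) for _ in range(200)]
energy = [2.0 * t + 5.0 + rng.gauss(0, 1.5) for t in temps]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx


def mean_interval(sample, z=1.96):
    """Approximate 95% interval for the population mean (normal approximation)."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return m - z * se, m + z * se


# Exploratory: is there a relationship in the data at all?
r = pearson(temps, energy)

# Inferential: say something about all readings from a sample of 40.
sample = rng.sample(energy, 40)
low, high = mean_interval(sample)

# Predictive: learn from 150 readings, predict the remaining 50.
slope, intercept = fit_line(temps[:150], energy[:150])
predicted = [slope * t + intercept for t in temps[150:]]
errors = [abs(p - a) for p, a in zip(predicted, energy[150:])]
mean_abs_error = statistics.fmean(errors)

print(f"correlation: {r:.2f}")
print(f"estimated mean energy use: between {low:.1f} and {high:.1f}")
print(f"fitted line: energy = {slope:.2f} * temp + {intercept:.2f}")
print(f"mean absolute prediction error: {mean_abs_error:.2f}")
```

The same data set answers three different questions here, and each question needs a different technique. That is the point: the question drives the analysis, not the data.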
The more precise the question, the deeper the analysis goes. The deeper the analysis goes, the broader and deeper the capabilities of the data scientist have to go.
In conclusion, I have to say that undertaking a path of Big Data requires more than just hoarding data. It requires the deliberate and careful build-up of analytical capabilities, the definition of the business questions that guide the development of insights, and the involvement of subject matter experts who are capable of putting everything into perspective.
Without these, Big Data is nothing!