OK folks, let me put this in perspective. At a time when everyone is talking about the Cognitive Era, and businesses everywhere are embracing it to solve business challenges or gain new insights, it is almost self-evident that data is the raw asset with which each of these journeys will start. And nobody is debating that data will be the fuel from which these applications drive insights.
However, what I think is less obvious is that “one of the problems to be solved is data itself.”
Now, why do I say so?
Let us first look at what defines a problem as a cognitive problem. (As in, something that would not have been realistically solvable with the non-cognitive means at hand.) One obvious set of candidates is of course the Millennium Prize Problems, for obvious rewards, but I digress.
Coming back to the point.
In my view, for a problem to qualify as a cognitive problem, it needs to have three fundamental dimensions:
- At least the hypothesis of resolution is based on data. (Basically, a problem with no data, like a new theoretical physics problem, is not cognitive. That is where the human brain has a huge advantage called the power of imagination. And evidently, we are losing it fast.)
- The scale of resolution, and the required churning through data, is large enough to make it almost unrealistic for an individual EXPERT human being to solve.
- The nature of the problem is volatile. As in, the problem needs to be resolved within a specific time frame; if it is not, it may become irrelevant, or the resolution may have diminishing returns. So speed and accuracy of resolution are of utmost importance.
Having said this, let us now go back to what the cognitive problem in data is. The all-encompassing problem statement would be: “How do we support the growing capacity and performance needs of the underlying data storage, within flat or even shrinking IT budgets, when organic technology cost reductions are not enough?”
This problem statement can then be broken down into multiple smaller problems, but you will agree that any such sub-problem needs a deep understanding of the current data layout. Some of the questions that beg for answers are:
- What is the production capacity?
- What is the performance need for each application (what is the gap between projected and consumed)?
- What capacity is orphaned?
- How much gets backed up? If there is a gap between production and backup, which production applications do not consider backup a strategy ;)?
- How many copies exist of each production instance?
- What are the file / object workloads? Average file sizes? Access patterns?
- What are the hotspots?
- How much data lies in tape/ backup pools?
- What is the growth trend for everything that is already deployed?
- What is coming? What are key IT projects that the business aspires to drive? What is the impact on capacity and performance?
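Several of the questions above are straightforward to answer once you have a per-application inventory extract. As a minimal sketch, assuming a hypothetical inventory with provisioned, consumed, and backed-up capacity per application (all names and figures below are invented for illustration):

```python
# Hypothetical per-application storage inventory, in terabytes.
inventory = [
    {"app": "erp",      "provisioned_tb": 120, "consumed_tb": 95, "backed_up_tb": 95},
    {"app": "web-logs", "provisioned_tb": 80,  "consumed_tb": 30, "backed_up_tb": 0},
    {"app": "crm",      "provisioned_tb": 60,  "consumed_tb": 58, "backed_up_tb": 40},
]

# Total production capacity actually consumed.
production_tb = sum(r["consumed_tb"] for r in inventory)

# Orphaned capacity: provisioned but never consumed.
orphaned_tb = sum(r["provisioned_tb"] - r["consumed_tb"] for r in inventory)

# Production applications whose consumed data is not fully covered by backup.
backup_gaps = [r["app"] for r in inventory if r["backed_up_tb"] < r["consumed_tb"]]

print(production_tb)   # 183
print(orphaned_tb)     # 77
print(backup_gaps)     # ['web-logs', 'crm']
```

The hard part in practice is not this arithmetic but assembling the inventory itself, consistently, across arrays, file systems, and backup catalogs, and keeping it current as the estate grows.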