The Visual8

If you can see it, you can say it.

The Future of Unstructured Data

People often ask “what is the future of unstructured data?”

This is a natural question, given my role as Director of Field CTOs for Unstructured Data Solutions at Dell.

The immediate answer used to be “it’s growing, be ready.”

However, that response is out of date. Our attention should turn from “what?” to “now what?” Now what do we do with that increasing amount of data?

Yes, the total sphere of data is growing because every human, machine, and application leaves a digital trail. And we dutifully store it. But are we missing the bigger picture?

From my perspective, there are three tines to the fork of the future of unstructured data. Three factors tied to the same trigger point: AI has gone mainstream, and we are addicted to it.

On the roads of this great country we see signs that say “slow down when workers are present.” Yet in this new world, it’s going to be “speed up when AI workers are present.” Every recommendation engine, every predictive maintenance algorithm, and every digital assistant feeds on data.

And not the kind that fits neatly into rows and columns. It’s data without that structure.

As we make AI more effective, its appetite for unstructured data will only increase. The institutions that figure out how to make this work for them will be the fast fish that eat the big fish.

But what happens if the company’s repository of intellectual property remains in analog form? Like film archives at major motion picture companies? Or the case histories from court proceedings? Or handwritten letters from a historical figure?

Sure, we “digitize” those elements into .mpg, .pdf, and .jpg files. That is progress. But then we only have data that describes the file, not its contents. We need to go further: to capture the essence of those artifacts and to put that data to work.

I predict the rise of “Context Capture” companies.

Their sole purpose would be to safely and confidentially describe, summarize, and index that content. This would be done page by page, frame by frame, and scene by scene.
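A Context Capture pipeline might look something like the sketch below. The `summarize` and `extract_tags` helpers are hypothetical stand-ins for real ML models; here they are toy heuristics just to show the page-by-page shape of the work.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for"}

def summarize(text: str) -> str:
    """Toy stand-in for an ML summarizer: return the first sentence."""
    return text.split(".")[0].strip() + "."

def extract_tags(text: str, n: int = 3) -> list[str]:
    """Toy stand-in for an ML tagger: most frequent non-stopwords."""
    words = [w.strip(".,").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(n)]

def capture_context(pages: list[str]) -> list[dict]:
    """Describe, summarize, and index content page by page."""
    return [
        {"page": i, "summary": summarize(p), "tags": extract_tags(p)}
        for i, p in enumerate(pages, start=1)
    ]

pages = [
    "The archive holds original film reels. Each reel is fragile and unique.",
    "Court records describe the case history. The case history spans decades.",
]
for record in capture_context(pages):
    print(record)
```

The point is the output: each page (or frame, or scene) becomes a small, searchable record, which is what turns a digitized file into consumable intellectual property.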

The Gig Economy has taught us many lessons. One important lesson is that new business models emerge where excess supply on one side meets ready demand on the other. AirBnB matched spare beds (excess supply) with travelers who needed a place to sleep for a conference (ready demand).

Once we have made our intellectual property consumable, it’s time to find ready buyers. It is reasonable to expect that when this capacity finds a marketplace, an economy driven by insights from data will emerge.

Data brokers exist today. Insight brokers are what’s next.

Our industry must give up on defining unstructured versus structured data. It is better to characterize data as activated and non-activated.

This shift moves us from storing data based on access patterns to storing data based on value to the business. This implies that metadata is automatically written with an understanding of what is important. We could do this with people close to the business.

But humans are not reliable or fast enough to do this on their own. Machine learning algorithms will fill in where humans fall short. This means we should expect stored data to be deeply described with AI-driven metadata tags, and those tags to be deeply integrated into a mesh of data catalogs.
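A minimal sketch of that idea, under stated assumptions: `auto_tag` below is a hypothetical stand-in for an ML tagger (here, a simple vocabulary lookup), and the catalog keeps an inverted index from tag to object key so data can be found by meaning rather than by path.

```python
from collections import defaultdict

def auto_tag(content: str) -> list[str]:
    """Hypothetical ML tagger; a fixed vocabulary lookup for illustration."""
    vocabulary = {"invoice", "contract", "telemetry"}
    return sorted({w.strip(".,").lower() for w in content.split()} & vocabulary)

class Catalog:
    """One node in a mesh of catalogs: an inverted index of tags to keys."""
    def __init__(self):
        self.by_tag = defaultdict(set)

    def ingest(self, key: str, content: str) -> list[str]:
        """Tag an object at write time and index it by those tags."""
        tags = auto_tag(content)
        for tag in tags:
            self.by_tag[tag].add(key)
        return tags

    def find(self, tag: str) -> set[str]:
        return self.by_tag.get(tag, set())

catalog = Catalog()
catalog.ingest("s3://bucket/doc1.pdf", "Signed contract for telemetry services.")
catalog.ingest("s3://bucket/doc2.pdf", "Quarterly invoice attached.")
print(catalog.find("contract"))
```

In this framing, an object with tags in the catalog is “activated” data; an object nobody has described is not.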

This will only accelerate the prominence of object storage.

Object storage is architecturally ready for this new world. It has a global namespace, rich metadata, and handles a massive number of small objects. With software-driven controls (not software-defined), object storage stands prepared.

All-flash systems will narrow the performance gap with the storage platforms of the past. The story improves once the design engineers of object storage treat their product more like a race car than a school bus.

Culture eats strategy for breakfast.

Once data becomes the product and not the byproduct, we change our relationship to it. We change the way we interact with it.

Remember, we were secretly trained on how to search on Google. We learned to find what we wanted and only what we wanted.

We are seeing this unspoken training, again. Educational videos teach us how to write better prompts for Large Language Models (LLMs). We are changing. It’s up to us to be intentional about it.

Neil deGrasse Tyson wanted all of us to be data literate in the pursuit of being scientifically literate. At least, that’s what I remember from a radio interview shortly after his book, Starry Messenger, came out.

Data literacy seeks to understand what the data means, and to do so with critical thinking.

I think we are going to go further than being data literate. We will learn to be data writers and not just data readers.

As with any authorship, there is a responsibility for clarity, engagement, and economy. The writing has to resonate, keep us interested, and take the shortest path possible.

This is a culture shift to data publishing, not just data consumption. We can, and should, be intentional about it.

In that way, we quickly realize that data curation is the silent, necessary step in the process.

Understanding our audience matters. Personalizing matters. Emotion matters.

Emotion matters because amazement glues such insights into memory. Good writers know this. Great writers do this while we are not paying attention.

There is going to be change ahead with unstructured data.

My friend Joe Steiner thinks we need to shift the conversation to “unified data” from “unstructured data.” I like that.

Because we move from describing what it is to what benefit we get from it.