I’ve had the good fortune of sharing the possibilities, power, and my personal vision of the Semantic Web with a number of audiences over the past couple of months. This has also given me a great deal of time to think more deeply about how we can use the massive amount of unstructured data that now exists on the web. There’s a lot of valuable data out there: information that companies can ingest and use in machine learning, and data that should be openly shared and made available for both humans and machines to access and distill.
While brainstorming new ideas for my next go at an interesting presentation, I concocted a very simple “strategic formula” that I believe all businesses and organizations could leverage when it comes to the Semantic Web, Linked Open Data and, well, just data in general. It looks a little something like this:
So what do these spheres mean? Anyone who sells something or provides a service that people use should be looking for as much exposure as they can get on the web. The external data sphere represents human- and machine-readable data that you’d want everyone to access. One of the primary vehicles gaining popularity on the web is RDFa, a way of richly annotating HTML to deliver data to machines while retaining the rich visual web that human users have become accustomed to. Other markup techniques, like Microdata, do a similar job, allowing us to enrich HTML with semantic vocabularies like GoodRelations to create virtual representations of real-world physical objects. Search engines like Yahoo! have been taking advantage of rich data markup for years, and Google has built RDFa, Microdata and Microformats support into its Rich Snippets initiative. The great thing about “front-end” semantic markup techniques is that, with a little additional knowledge and tooling, countless HTML developers can create a very rich web of data simply by adding data annotations to their HTML, essentially turning the entire web into an open, queryable database or API from which we can extract knowledge.
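To make the idea of “data annotations” a bit more concrete, here is a minimal Python sketch that pulls facts out of RDFa Lite markup using only the standard library’s html.parser. The RDFaLiteExtractor class, the sample product and its description are hypothetical; the markup uses the GoodRelations vocabulary IRI with its Offering class and name and description properties. It only understands the vocab, typeof and property attributes, so treat it as an illustration of what a machine can read from annotated HTML, not a conforming RDFa processor.

```python
from html.parser import HTMLParser


class RDFaLiteExtractor(HTMLParser):
    """Collect (type, property, value) facts from a small subset of RDFa Lite.

    Hypothetical sketch: handles vocab, typeof and property attributes, taking
    a value from a content attribute or from the element's text. It ignores
    prefix, resource, nested items and vocab scoping on purpose.
    """

    def __init__(self):
        super().__init__()
        self.vocab = ""           # most recent vocab seen (no scoping)
        self.current_type = None  # expanded IRI of the most recent typeof
        self.facts = []           # (type IRI, property IRI, value) tuples
        self._pending = None      # property waiting for its element's text
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = {k: v for k, v in attrs if v}  # drop valueless attributes
        if "vocab" in a:
            self.vocab = a["vocab"]
        if "typeof" in a:
            self.current_type = self.vocab + a["typeof"]
        if "property" in a:
            prop = self.vocab + a["property"]
            if "content" in a:
                # Machine-only value: not shown to human readers.
                self.facts.append((self.current_type, prop, a["content"]))
            else:
                # Human-visible value: read it from the element's text.
                self._pending, self._text = prop, []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes property elements hold plain text, so the next end tag
        # closes the element that declared the property.
        if self._pending is not None:
            value = "".join(self._text).strip()
            self.facts.append((self.current_type, self._pending, value))
            self._pending = None


# Hypothetical product page annotated with GoodRelations terms.
PAGE = """
<div vocab="http://purl.org/goodrelations/v1#" typeof="Offering">
  <span property="name">Cordless Drill</span>
  <meta property="description" content="18V drill with two batteries">
</div>
"""

extractor = RDFaLiteExtractor()
extractor.feed(PAGE)
extractor.close()
for fact in extractor.facts:
    print(fact)
```

A real crawler would rely on a full RDFa or Microdata parser; the point of the sketch is only that the same HTML a browser renders for people also carries typed, vocabulary-qualified facts for machines.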
On the other side of the spectrum, most businesses have proprietary or sensitive data that they would not want to expose but could still use internally for business benefit. This is where semantic data formats that aren’t meant for human readers, like RDF/XML, would be useful. Companies could build internal apps that query the large amounts of data they possess but typically don’t use. What if I could mash up internal data like product margins and inventory levels with store trend data and the “sentiment of the web,” then start asking it questions? I can see benefits that touch every aspect of the business, from highly contextual consumer- and associate-facing product recommendation engines to merchandising tools that automatically detect trends and adjust product levels across the enterprise, even down to the region or individual store, with limited human involvement.
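To illustrate what “mashing up” internal and external data and asking it questions might look like, the sketch below stores both as subject, predicate, object triples and answers a SPARQL-style basic graph pattern with a plain-Python join. Everything in it is an assumption for illustration: the ex: identifiers, the webSentiment scores, the inventory and margin figures, the thresholds, and the match and query helpers. In practice this data would more likely live in an RDF triple store and be queried with SPARQL; the toy version only shows the joining idea.

```python
# Hypothetical toy data: the "ex:" identifiers and every number are made up.
INTERNAL = [  # proprietary facts that never leave the company
    ("ex:drill", "ex:margin", 0.32),
    ("ex:drill", "ex:inventory", 4),
    ("ex:saw", "ex:margin", 0.18),
    ("ex:saw", "ex:inventory", 40),
    ("ex:sander", "ex:margin", 0.40),
    ("ex:sander", "ex:inventory", 3),
]
EXTERNAL = [  # e.g. "sentiment of the web" scores gathered elsewhere
    ("ex:drill", "ex:webSentiment", 0.8),
    ("ex:saw", "ex:webSentiment", 0.6),
    ("ex:sander", "ex:webSentiment", 0.2),
]


def is_var(term):
    """Pattern terms starting with '?' are variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, bindings):
    """Return bindings extended so that pattern matches triple, or None."""
    extended = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def query(graph, patterns):
    """Return every set of bindings that satisfies all patterns at once."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Mash up internal and external facts, then ask: which products are
# well liked on the web but running low in stock, and what do they earn?
graph = INTERNAL + EXTERNAL
rows = query(graph, [
    ("?product", "ex:inventory", "?stock"),
    ("?product", "ex:webSentiment", "?sentiment"),
    ("?product", "ex:margin", "?margin"),
])
restock = [(r["?product"], r["?margin"]) for r in rows
           if r["?sentiment"] > 0.5 and r["?stock"] < 10]
print(restock)  # [('ex:drill', 0.32)]
```

Because every fact shares the same triple shape, bringing in another source means appending more triples rather than redesigning a schema, which is much of the appeal of linked data for this kind of internal mashup.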
Combining these external and internal data structures will produce insights, a resource every company needs simply to survive in today’s extremely competitive landscape. Data-driven insights are device-, platform- and trend-agnostic, meaning they can easily be deployed to any new app, operating system or device. With the online space rapidly transforming into a “splinternet” of device types and methods for consuming and producing data, a solid base of semantically structured and linked data will be key to the next generation of successful enterprises.