Vanilla Ice was also rollin' -- in the 80's
Well folks, it’s deployment season again here at Best Buy, which means lots of changes, late night code deployments, and business project managers running around with their hair on fire. It also means it’s time to sneak some good structured data type stuff on to the site without actually having to explain in detail what’s so gosh darn important about delivering rich data in our site’s HTML output. It’s with great pleasure that I announce the official launch of schema.org reviews markup on all pages of bestbuy.com, gradually rollin’ out over the next couple of days to a browser near you.
So why schema.org? I was fortunate enough to participate in the schema.org workshop in Mountain View last month where a bunch of really smart people were talking structured data on the web. If you know anything about the history between some of the individuals attending, you’d figure we would have several opposing viewpoints and many arguments would ensue. To my surprise, this was not the case — we had a great day of very constructive talks. And with this warm and fuzzy spirit of goodwill, I figured it was time to put the rubber to the road and release a new standard for all to test.
If you’re still wondering why schema.org, please take a gander at these thoughts:
- Yep, it’s Microdata, but it’s about schema, not syntax. I’ve been doing my homework, and I believe the product reviews vocabulary created by the Google-Rich-Snippets-now-schema.org group is a solid and well thought out vocabulary. Additionally, the consensus from the workshop was to support multiple syntaxes, so I’m not terribly worried about being lambasted for trying a new syntax.
- I still love RDFa. One of the greatest things about RDFa is its out-of-the-box support for multiple types/vocabularies, which was also a desired requirement coming out of the schema.org workshop. I was also moved by the excellent presentation by Ben Adida, where he talked RDFa and the new RDFa 1.1 Lite, which looks very, very promising. Plans are already in the works to port a segment of the reviews to RDFa 1.1 Lite, with a little help from my friends.
- I’m continuing to push for changes in the schema, most notably support for multiple types.
- As one of the first large deployments of schema.org, it could serve as an example.
Suggestions? Comments? Want to see the code change so you have something to point your parser at? Let me know, let’s create something wonderful for the web.
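For the parser-minded, here is a rough sketch of what pulling schema.org review properties out of Microdata could look like, using only the Python standard library. The HTML snippet is invented for illustration and is not the actual bestbuy.com markup; the class and function names are hypothetical too. The property names (`itemReviewed`, `author`, `reviewRating`, `ratingValue`, `bestRating`, `reviewBody`) come from the schema.org Review and Rating types.

```python
from html.parser import HTMLParser

# Hypothetical review snippet using schema.org Microdata. The product name,
# author, and rating values are invented; this is NOT real bestbuy.com output.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Review">
  <span itemprop="itemReviewed">Example 46-inch TV</span>
  <span itemprop="author">ExampleReviewer</span>
  <div itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
    <meta itemprop="ratingValue" content="4">
    <meta itemprop="bestRating" content="5">
  </div>
  <p itemprop="reviewBody">Great picture for the price.</p>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect itemprop name/value pairs into one flat dict.

    Simplification: item nesting is ignored, and a property's text is
    assumed to contain no child elements.
    """

    def __init__(self):
        super().__init__()
        self.types = []       # itemtype URLs, in document order
        self.props = {}       # itemprop name -> literal value
        self._current = None  # itemprop whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.types.append(attrs["itemtype"])
        name = attrs.get("itemprop")
        if name is None or "itemscope" in attrs:
            return  # not a property, or a nested item rather than a literal
        if "content" in attrs:
            self.props[name] = attrs["content"]  # e.g. <meta content="4">
        else:
            self._current = name
            self.props[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


def extract_review(html):
    """Return (itemtypes, properties) found in the given HTML string."""
    collector = ItempropCollector()
    collector.feed(html)
    return collector.types, collector.props


types, props = extract_review(SAMPLE_HTML)
print(types)
print(props)
```

A real Microdata parser would keep the nesting (the rating is its own item inside the review) and resolve `itemref` and URL-valued properties; flattening everything into one dict is just the quickest way to see what the markup exposes.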
Finally, if you’re curious, check out this Sony TV example.
Well folks, we’re at it again. The month-by-month journey to semantify the hallowed templates of bestbuy.com continued Monday into Tuesday night. One of April’s goals: to enhance machine understanding of Best Buy’s considerable product offerings while retaining human searchability and readability. After a long wait, we have deployed code to the search templates to establish a human-readable and machine-parseable front-end API.
Many moons ago (even before all this RDFa goodness), we established a URI scheme we call “shop URLs”. Basically it’s an easy way to pass a search term in a URI and get a visual list of up to 50 products our search appliance considers relevant. However, when you have a catalog of 400K+ products, simple visual results may not be the best or most efficient way to sort through the cruft and get at what you’re looking for. Enter stage left our friendly machine helpers: Search Engines, Parsers and Aggregators. This deployment activity is focused on feeding you! We’ve deployed step one of a solution to the product visibility and discovery issue by unleashing the result data in RDFa (with GoodRelations, Dublin Core, FOAF, Google Ratings vocabs) for maximum machine parseability.
After all this grandeur and hype, I’m hoping you’re still interested in how it works. You may point your eyes and parsers here:
* Please note, due to marketing and business considerations, some of the more popular terms may redirect you to a dataless “category page”. To get an RDFa-enabled result, simply append a * to your search term, e.g., http://www.bestbuy.com/shop/ipods* (how dare those marketing people stand in the way of good data!)
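If you are building these URLs in code, the pattern could be sketched like this in Python. The base URL and the trailing `*` trick come straight from the example above; the function name `shop_url`, the `force_results` flag, and the percent-encoding of the term are my own assumptions, since nothing here documents how multi-word terms should be encoded.

```python
from urllib.parse import quote

# Base taken from the example shop URL in this post.
SHOP_BASE = "http://www.bestbuy.com/shop/"


def shop_url(term, force_results=True):
    """Build a shop URL for a search term (hypothetical helper).

    With force_results=True a literal '*' is appended, which per the note
    above avoids the redirect to a dataless category page.
    """
    # Drop any '*' the caller already added so it is never doubled or
    # percent-encoded by quote().
    cleaned = term.strip().rstrip("*")
    # ASSUMPTION: terms are percent-encoded; the real rule is not documented here.
    url = SHOP_BASE + quote(cleaned)
    if force_results:
        url += "*"
    return url


print(shop_url("ipods"))
print(shop_url("ipods", force_results=False))
```

The star is appended after encoding on purpose: `quote()` would otherwise turn it into `%2A`, and whether the site treats that the same as a literal `*` is unknown.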
Let’s dive deeper with a quick example. So I’m a bit eclectic and looking for a thermometer online. I would like to see results for “thermometers” from bestbuy.com, plus pass the data to my machine friend, an application I am building to help me make the right product choice.
First I access my human-friendly representation using a “shop URL” directly in the browser:
Which results in a human-readable web page:
human-readable shop url
Looks like I have 15 product offers that match and are available via bestbuy.com or in store. Excellent.
I’m going to take that same URI and pass it on to my machine helper who just wants the data, no fluff. Let’s say we’re working with RDF/XML…on the surface, the 15 product offers may appear like this:
rdf extract from shop url
Expanding an individual offer yields the following data-rich result:
expanded data extract of shop url
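To give a feel for what the machine helper might do with that kind of extract, here is a small Python sketch that lists offers from an RDF/XML document. The sample data is made up for illustration (it is not the actual extract pictured above), and the function name `list_offers` is hypothetical. The GoodRelations terms used (`gr:Offering`, `gr:name`, `gr:hasPriceSpecification`, `gr:UnitPriceSpecification`, `gr:hasCurrency`, `gr:hasCurrencyValue`) are real vocabulary terms, but which of them appear in the live data is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "gr": "http://purl.org/goodrelations/v1#",
}

# Hypothetical RDF/XML with invented offers; NOT real bestbuy.com output.
SAMPLE_RDF = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:gr="http://purl.org/goodrelations/v1#">
  <gr:Offering rdf:about="http://example.com/offer/1">
    <gr:name>Example Digital Thermometer</gr:name>
    <gr:hasPriceSpecification>
      <gr:UnitPriceSpecification>
        <gr:hasCurrency>USD</gr:hasCurrency>
        <gr:hasCurrencyValue>19.99</gr:hasCurrencyValue>
      </gr:UnitPriceSpecification>
    </gr:hasPriceSpecification>
  </gr:Offering>
  <gr:Offering rdf:about="http://example.com/offer/2">
    <gr:name>Example Infrared Thermometer</gr:name>
    <gr:hasPriceSpecification>
      <gr:UnitPriceSpecification>
        <gr:hasCurrency>USD</gr:hasCurrency>
        <gr:hasCurrencyValue>34.99</gr:hasCurrencyValue>
      </gr:UnitPriceSpecification>
    </gr:hasPriceSpecification>
  </gr:Offering>
</rdf:RDF>
"""


def list_offers(rdf_xml):
    """Return one dict per top-level gr:Offering element in the RDF/XML."""
    root = ET.fromstring(rdf_xml)
    offers = []
    for offering in root.findall("gr:Offering", NS):
        offers.append({
            # rdf:about holds the URI identifying the offer
            "uri": offering.get("{%s}about" % NS["rdf"]),
            "name": offering.findtext("gr:name", default="", namespaces=NS),
            # Prices stay as strings; conversion is left to the caller
            "price": offering.findtext(".//gr:hasCurrencyValue", namespaces=NS),
            "currency": offering.findtext(".//gr:hasCurrency", namespaces=NS),
        })
    return offers


for offer in list_offers(SAMPLE_RDF):
    print(offer["name"], offer["price"], offer["currency"])
```

Treating RDF/XML as plain XML only works when the serialization happens to look like this; the same graph can be written many different ways. For anything beyond a quick look, a proper RDF parser such as rdflib is the safer choice.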
So endeth the second phase of semantification. Make sure to leave your API keys at home; this search data is all open! Tune in for more later this week, when I will be discussing another one of April’s goals: expanding RDFa markup to Best Buy’s product detail pages.
I’ve had the good fortune of sharing the possibilities, power, and my personal vision of the semantic web with a number of audiences in the past couple of months. This has also given me a great deal of time to think deeper about how we can utilize the massive amount of unstructured data that exists now on the web. There’s a lot of beneficial data out there, information companies can ingest and use in machine learning, and data that should be openly shared externally and made available for both humans and machines to access and distill.
While brainstorming new ideas for my next go at an interesting presentation, I concocted a very simple “strategic formula” that I believe all businesses and organizations could leverage when it comes to the Semantic Web, Linked Open Data and, well, just data in general. It looks a little something like this:
So what do these spheres mean? Anyone who sells something or provides a service that people use should be looking for as much exposure as they can get on the web. The external data sphere represents human- and machine-readable data that you’d want everyone to access. One of the primary vehicles gaining popularity on the web is RDFa, a way of utilizing richly annotated HTML to deliver data to machines while retaining the rich visual web human users have become accustomed to. There are also markup techniques like Microdata that do a similar job, allowing us to enrich HTML utilizing semantic vocabularies like GoodRelations to create virtual representations of real world physical objects. Search engines like Yahoo! have been taking advantage of rich data markup techniques for years, and Google has built RDFa, Microdata and Microformats support into their Rich Snippets initiative. The great thing about “front-end” semantic markup techniques is that, with a little additional knowledge and tooling, they allow countless HTML devs to create a very rich web of data simply by adding data annotations to their HTML, essentially making the entire web an open and queryable database or API for us to extract knowledge from.
On the other side of the spectrum, most businesses have proprietary or sensitive data that they would not want to expose, but could still utilize internally for business benefits. This is where non-human-readable semantic data technologies like RDF/XML would be useful. Companies could build internal apps that query the large amount of data that they possess but typically don’t utilize. What if I could mash up internal data like product margins and inventory levels, along with store trend data and the “sentiment of the web”, and start asking it questions? I can see benefits that touch every aspect of the business, from extremely contextual consumer and associate-facing product recommendation engines to merchandising tools that automatically determine trends and adjust product levels across the enterprise, even down to the region or individual store level, with limited human involvement.
Combining these external and internal data structures will result in insights, a resource all companies need simply to survive in the current extremely competitive landscape. Data-driven insights are device, platform and trend agnostic, meaning they can easily be utilized and deployed to any new app, operating system or device. With the online space rapidly transforming into a “splinternet” of device types and methods for consuming and producing data, a solid base of semantically structured and linked data will be key to the next generation of successful enterprises.