Archive for the ‘RDF’ Category
April 26th, 2011 | Permalink | 3 Comments
Well folks, we’re at it again. The month-by-month journey to semantify the hallowed templates of bestbuy.com continued Monday into Tuesday night. One of April’s goals: to enhance machine understanding of Best Buy’s considerable product offerings while retaining human searchability and readability. After a long wait, we have deployed code to the search templates to establish a human-readable and machine-parseable front-end API.
Many moons ago (even before all this RDFa goodness), we established a URI scheme we call “shop URLs”. Basically it’s an easy way to pass a search term in a URI and get a visual list of up to 50 products our search appliance considers relevant. However, when you have a catalog of 400K+ products, simple visual results may not be the best or most efficient way to sort through the cruft and get at what you’re looking for. Enter stage left our friendly machine helpers: Search Engines, Parsers and Aggregators, this deployment is focused on feeding you! We’ve deployed step one of a solution to the product visibility and discovery issue by unleashing the result data in RDFa (with the GoodRelations, Dublin Core, FOAF and Google Ratings vocabs) for maximum machine parseability.
After all this grandeur and hype, I’m hoping you’re still interested in how it works. You may point your eyes and parsers here:
* Please note: due to marketing and business considerations, some of the more popular terms may redirect you to a dataless “category page”. To get an RDFa-enabled result, simply append a * to your search term, e.g., http://www.bestbuy.com/shop/ipods* (how dare those marketing people stand in the way of good data!)
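For the parser-minded, here is a minimal Python sketch of how a client might build a shop URL from a search term, including the trailing * trick. The base path and the * come from the example URL above; the function name `shop_url`, the `force_results` flag, and the percent-encoding of multi-word terms are my own assumptions for illustration, not documented behavior of the site.

```python
from urllib.parse import quote

# Base path taken from the example URL in this post.
SHOP_BASE = "http://www.bestbuy.com/shop/"


def shop_url(term, force_results=True):
    """Build a shop URL for a search term (hypothetical helper).

    force_results appends the trailing "*" that, per the note above,
    avoids the redirect to a dataless category page.
    Assumption: spaces and other special characters are percent-encoded.
    """
    encoded = quote(term.strip(), safe="")
    suffix = "*" if force_results else ""
    return f"{SHOP_BASE}{encoded}{suffix}"


if __name__ == "__main__":
    print(shop_url("ipods"))
    print(shop_url("thermometers", force_results=False))
```

The * is appended after encoding so it stays a literal character in the path rather than being escaped.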
Let’s dive deeper with a quick example. So I’m a bit eclectic and looking for a thermometer online. I would like to see the results for “thermometers” from bestbuy.com, and also pass the data to my machine friend, an application I am building to help me make the right product choice.
First I access my human-friendly representation using a “shop URL” directly in the browser:
Which results in a human-readable web page:
[Image: human-readable shop URL results page]
Looks like I have 15 product offers that match and are available via bestbuy.com or in store. Excellent.
I’m going to take that same URI and pass it on to my machine helper who just wants the data, no fluff. Let’s say we’re working with RDF/XML…on the surface, the 15 product offers may appear like this:
[Image: RDF extract from the shop URL]
Expanding an individual offer yields the following data-rich result:
[Image: expanded data extract of a single offer from the shop URL]
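To give a feel for what the machine helper is doing, here is a rough Python sketch that pulls offers out of RDFa-annotated HTML using only the standard library. The sample markup is hypothetical (it is not copied from bestbuy.com), and the extractor is deliberately naive: it matches attribute strings literally and is not a conforming RDFa processor. A real application would use a proper RDFa distiller.

```python
from html.parser import HTMLParser

# Hypothetical RDFa snippet for illustration; not the real bestbuy.com markup.
SAMPLE_HTML = """
<div xmlns:gr="http://purl.org/goodrelations/v1#">
  <div typeof="gr:Offering" about="#offer-1">
    <span property="gr:name">Digital Thermometer</span>
    <span property="gr:hasCurrencyValue" content="12.99">$12.99</span>
  </div>
  <div typeof="gr:Offering" about="#offer-2">
    <span property="gr:name">Infrared Thermometer</span>
    <span property="gr:hasCurrencyValue" content="39.99">$39.99</span>
  </div>
</div>
"""


class OfferExtractor(HTMLParser):
    """Collect property values found inside each typeof="gr:Offering" element.

    Naive on purpose: no prefix resolution, no nested offers, and it assumes
    every element inside an offer has a closing tag (no void elements).
    """

    def __init__(self):
        super().__init__()
        self.offers = []
        self._current = None  # dict for the offer being filled, if any
        self._depth = 0       # open-element depth inside the current offer
        self._prop = None     # property name waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is None:
            if attrs.get("typeof") == "gr:Offering":
                self._current = {"about": attrs.get("about")}
                self._depth = 1
            return
        self._depth += 1
        prop = attrs.get("property")
        if prop:
            if "content" in attrs:
                # An explicit content attribute wins over the visible text.
                self._current[prop] = attrs["content"]
            else:
                self._prop = prop

    def handle_data(self, data):
        if self._current is not None and self._prop and data.strip():
            self._current[self._prop] = data.strip()
            self._prop = None

    def handle_endtag(self, tag):
        if self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # The offer's own element just closed.
            self.offers.append(self._current)
            self._current = None
            self._prop = None


def extract_offers(html):
    parser = OfferExtractor()
    parser.feed(html)
    return parser.offers


if __name__ == "__main__":
    for offer in extract_offers(SAMPLE_HTML):
        print(offer)
```

The same page serves both audiences: the browser renders the visible text, while the parser reads the `typeof`, `property` and `content` attributes.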
So endeth the second phase of semantification. Make sure to leave your API keys at home, this search data is all open! Tune in for more later this week, when I will be discussing another one of April’s goals: expanding RDFa markup to Best Buy’s product detail pages.
December 9th, 2010 | Permalink | 1 Comment
I’ve had the good fortune of sharing the possibilities, power, and my personal vision of the semantic web with a number of audiences in the past couple of months. This has also given me a great deal of time to think deeper about how we can utilize the massive amount of unstructured data that exists now on the web. There’s a lot of beneficial data out there, information companies can ingest and use in machine learning, and data that should be openly shared externally and made available for both humans and machines to access and distill.
While brainstorming new ideas for my next go at an interesting presentation, I concocted a very simple “strategic formula” that I believe all businesses and organizations could leverage when it comes to the Semantic Web, Linked Open Data and, well, just data in general. It looks a little something like this:
So what do these spheres mean? Anyone who sells something or provides a service that people use should be looking for as much exposure as they can get on the web. The external data sphere represents human- and machine-readable data that you’d want everyone to access. One of the primary vehicles gaining popularity on the web is RDFa, a way of utilizing richly annotated HTML to deliver data to machines while retaining the rich visual web human users have become accustomed to. There are also markup techniques like Microdata that do a similar job, allowing us to enrich HTML utilizing semantic vocabularies like GoodRelations to create virtual representations of real-world physical objects. Search engines like Yahoo! have been taking advantage of rich data markup techniques for years, and Google has built RDFa, Microdata and Microformats support into their Rich Snippets initiative. The great thing about “front-end” semantic markup techniques is that, with a little additional knowledge and tooling, they allow countless HTML devs to create a very rich web of data simply by adding data annotations to their HTML, essentially making the entire web an open and queryable database or API for us to extract knowledge from.
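As a small illustration of “adding data annotations to HTML”, here is a Python sketch that renders one product as an RDFa-annotated fragment using GoodRelations terms. The helper name `render_offer`, the element layout and the sample product are hypothetical; the `gr:` classes and properties are from the GoodRelations vocabulary, but this is only one plausible way to arrange them, not a template any particular site uses.

```python
from html import escape


def render_offer(name, price, currency="USD"):
    """Render a product offer as RDFa-annotated HTML (hypothetical layout).

    Humans see the name and price; machines read the typeof, rel,
    property and content attributes.
    """
    name_html = escape(name)
    price_text = escape(f"{price:.2f}", quote=True)
    currency_text = escape(currency, quote=True)
    return (
        '<div xmlns:gr="http://purl.org/goodrelations/v1#" typeof="gr:Offering">\n'
        f'  <h3 property="gr:name">{name_html}</h3>\n'
        '  <div rel="gr:hasPriceSpecification">\n'
        '    <span typeof="gr:UnitPriceSpecification">\n'
        f'      <span property="gr:hasCurrency" content="{currency_text}"></span>\n'
        f'      <span property="gr:hasCurrencyValue" content="{price_text}">'
        f'{price_text} {currency_text}</span>\n'
        '    </span>\n'
        '  </div>\n'
        '</div>'
    )


if __name__ == "__main__":
    print(render_offer("Meat & Candy Thermometer", 12.99))
```

Nothing about the visual output has to change: the annotations ride along on elements the template was already emitting.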
On the other side of the spectrum, most businesses have proprietary or sensitive data that they would not want to expose, but could still utilize internally for business benefits. This is where non-human-readable semantic data technologies like RDF/XML would be useful. Companies could build internal apps that query the large amounts of data that they possess, but typically don’t utilize. What if I could mash up internal data like product margins and inventory levels with store trend data and the “sentiment of the web” and start asking it questions? I can see benefits that touch every aspect of the business, from extremely contextual consumer and associate-facing product recommendation engines to merchandising tools that automatically determine trends and adjust product levels across the enterprise, even down to the region or individual store level, with limited human involvement.
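To make “asking it questions” a bit more concrete, here is a toy Python sketch that stores mashed-up facts as subject/predicate/object triples and answers one question over them. Everything here is invented for illustration: the `ex:` predicates, the SKUs, the numbers and the thresholds are all hypothetical, and a real system would sit on an RDF store with a proper query language rather than a Python list.

```python
# Hypothetical internal facts as (subject, predicate, object) triples.
TRIPLES = [
    ("sku:1001", "ex:margin", 0.32),
    ("sku:1001", "ex:inventory", 4),
    ("sku:1001", "ex:webSentiment", "positive"),
    ("sku:1002", "ex:margin", 0.08),
    ("sku:1002", "ex:inventory", 250),
    ("sku:1002", "ex:webSentiment", "positive"),
    ("sku:1003", "ex:margin", 0.41),
    ("sku:1003", "ex:inventory", 2),
    ("sku:1003", "ex:webSentiment", "negative"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def value(triples, subject, predicate):
    """Return the first object for (subject, predicate), or None if absent."""
    hits = match(triples, s=subject, p=predicate)
    return hits[0][2] if hits else None


def restock_candidates(triples, min_margin=0.25, max_inventory=10):
    """Which well-liked, high-margin products are running low on stock?"""
    liked = [s for s, _, _ in match(triples, p="ex:webSentiment", o="positive")]
    result = []
    for sku in liked:
        margin = value(triples, sku, "ex:margin")
        stock = value(triples, sku, "ex:inventory")
        if margin is None or stock is None:
            continue  # skip products with incomplete data
        if margin >= min_margin and stock <= max_inventory:
            result.append(sku)
    return result


if __name__ == "__main__":
    print(restock_candidates(TRIPLES))
```

The point of the triple shape is that margin, inventory and sentiment can come from entirely different systems and still be joined on the shared product identifier.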
Combining these external and internal data structures will result in insights, a resource all companies need simply to survive in the current extremely competitive landscape. Data-driven insights are device, platform and trend agnostic, meaning they can easily be utilized and deployed to any new app, operating system or device. With the online space rapidly transforming into a “splinternet” of device types and methods for consuming and producing data, a solid base of semantically structured and linked data will be key to the next generation of successful enterprises.
December 29th, 2009 | Permalink | 5 Comments
There has been a flurry of chatter around the potential impact of RDFa on SEO after my brief presentation at SES Chicago 2009. In subsequent conversations with SEOs at the SES conference and folks from around the industry, I was surprised at how many people practicing SEO weren’t involving their web developers in their solutions, but rather focusing mostly on content, linking and social strategies. While these strategies are key to any SEO effort, I was struck that our panel discussion and presentation were the only ones involving code and coding techniques. This raises an interesting question: are many SEOs missing a core element of success, namely well-structured, semantically rich core web sites?
One can look at the current state of HTML on many web sites as an indicator of where people are focusing their efforts. The research performed to create the hProduct Microformat draft spec gives some good insight as to the condition of front-end HTML code. For years we have been building web sites mostly for visual, presentational (human-readable) purposes, and this is clear in many pages of source code analyzed for the hProduct spec. Luckily, search engines have done an incredible job of parsing out the junk and extracting the contextual and important data from billions of web pages. Machines have become vital to helping us learn, but up to this point there has been an imbalance in human-readable vs. machine-readable front-end code. Now there are emerging techniques and technologies that web developers can easily use to correct this by coding their pages to give them meaning to humans AND machines.
By combining rich front-end user and data experiences utilizing RDFa, Microformats, or the emerging Microdata spec, we build direct pathways to rich datasets, which enable machines (mostly search engines, but also next-gen parsers, browser plugins, etc.) to easily access important data, apply their algorithms to make sense of it all, and index it in the ways they see fit. My personal theory is that by providing more direct access to data through front-end semantic code, machines will spend fewer CPU cycles parsing presentational code. These extra resources could then be re-allocated to better natural language processing, extending search into the “deep web”, or other efforts to make the web and its users smarter.
Of course this has implications for the SEO/SEM world. It forces SEO professionals to engage their web developers or become slightly more code savvy themselves. It shifts more emphasis onto developing strong, data-driven semantic web sites that balance the visual needs of humans and the data needs of machines, rather than focusing on seemingly artificial techniques that increase “link juice” or utilize “secret sauce”. Using traditional SEO content strategies in combination with building strong data-rich web sites can lead to a more intelligent and useful web, which is ultimately good for businesses, users and consumers.