RDFa - Resource Description Framework in attributes

October 9th, 2008 by Derek Rayner

This is the first in a series of articles looking at specific technologies that are highly applicable in a government context.  RDFa should be of interest to anyone wanting to make information available on the web in a smart, usable way.

What’s the problem?

The web is currently like a subsistence farm.  Pretty much everything is planted, harvested and sifted by hand.  OK, there’s the odd steam engine but we’re by and large still pre-industrial in the way we use the web.  We live mostly in a world of pages hand crafted to please the individual eye and the content is relatively static on the page.

RDFa is designed to enhance the now largely manual process of browsing the web with automation.  From a publisher’s point of view it promises to make vast amounts of information consumable in new ways that could add significant value.  From the user perspective it could enable many new and intriguing services as yet unthought of.

For government the game is getting more value out of public information and data by making it more available and more usable without significantly increasing costs - maybe even lowering them.
There are some interesting use cases describing how RDFa can be used here: http://www.w3.org/TR/xhtml-rdfa-scenarios/

Most of the examples revolve around making a page more directly useful to a viewer by handling event or identity data in a smart way.  Even more powerful are examples that use an external vocabulary to reference scientific nomenclature unambiguously (proteins, for example), so the reader can go directly to a source definition via a pop-up.  In addition, with RDFa-aware tools, a web clipboard function becomes possible where the semantic markup is portable and can be copied anywhere.

What is RDFa?

RDFa provides a set of HTML attributes to augment human-readable data with machine-readable hints.  It specifies a fixed set of attributes and parsing rules, but the attributes may contain properties from any RDF vocabulary.  The Wikipedia entry for RDFa gives the history and a short overview: http://en.wikipedia.org/wiki/RDFa

Two key aspects of RDF that make the above possible are the use of URIs (Uniform Resource Identifiers) and the concept of the triple.  A triple (subject, predicate, object) will typically consist of URIs for subject and predicate, with either a literal value or another URI for the object.  To see this at work the following example from the W3C RDFa primer is useful: http://www.w3.org/TR/xhtml-rdfa-primer/

<div xmlns:dc="http://purl.org/dc/elements/1.1/">
   <div about="/posts/trouble_with_bob">
      <h2 property="dc:title">The trouble with Bob</h2>
      <h3 property="dc:creator">Alice</h3>
      …
   </div>
</div>

In the above code fragment a namespace is declared for the Dublin Core vocabulary (see http://dublincore.org).  The about attribute defines the subject - in this case a blog post, identified by its URL.  The property attribute defines a predicate - here title and creator from the Dublin Core vocabulary.  Finally, the text content of each element supplies the literal value (the object) for that property.

I think the main point to get here is that if a browser can interpret the generic RDFa syntax, it can process any type of data with reference to any type of vocabulary.
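To make the triple idea concrete, here is a minimal sketch in Python that pulls the triples out of the fragment above.  It is a toy, not a conforming RDFa processor: it only understands xmlns: prefixes, about and property with plain text content, and the names TripleSketch and extract_triples are my own for illustration.  The point is that nothing in the code knows about Dublin Core; the vocabulary arrives with the markup.

```python
from html.parser import HTMLParser

# The fragment from the example above, with the elided content left out.
FRAGMENT = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/">
   <div about="/posts/trouble_with_bob">
      <h2 property="dc:title">The trouble with Bob</h2>
      <h3 property="dc:creator">Alice</h3>
   </div>
</div>
"""


class TripleSketch(HTMLParser):
    """Toy extractor: handles only xmlns:*, about and property + text."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # prefix -> namespace URI
        self.subjects = []   # one entry per open element (None if no 'about')
        self.pending = None  # predicate waiting for its literal text
        self.triples = []

    def current_subject(self):
        # The nearest enclosing 'about' wins; a real processor would also
        # resolve relative URIs against the page address.
        for subject in reversed(self.subjects):
            if subject is not None:
                return subject
        return ""  # no 'about' in scope: the triple is about the page itself

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        self.subjects.append(attrs.get("about"))
        if "property" in attrs:
            prefix, _, local = attrs["property"].partition(":")
            namespace = self.prefixes.get(prefix)
            # Unknown prefixes are kept as written rather than guessed at.
            self.pending = namespace + local if namespace else attrs["property"]

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            self.triples.append((self.current_subject(), self.pending, text))
            self.pending = None

    def handle_endtag(self, tag):
        if self.subjects:
            self.subjects.pop()
        self.pending = None


def extract_triples(markup):
    parser = TripleSketch()
    parser.feed(markup)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(FRAGMENT):
        print(triple)
```

Run against the fragment, this yields two triples, both with /posts/trouble_with_bob as the subject: one with the Dublin Core title URI and the literal "The trouble with Bob", and one with the creator URI and the literal "Alice".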

So is it a silver bullet?

Personally, I don’t believe in silver bullets so I’d say no.  However, the concept has a degree of intellectual elegance which often characterises a winning idea.  Timing also plays a part and the best technology doesn’t always win (Betamax and VHS anyone?).  Whatever the winning technology turns out to be I think some of the key points of RDFa will need to be present:
a. Must be scalable to the web - with RDFa you have one syntax to interpret, but many vocabularies to choose from
b. Must be portable - machine copyable without losing meaning
c. Must be independent of the page concept - most of the web we know now revolves around pages, but with IPv6 the address space is so vast it will be possible to address data segments that are not marked up within a page.  

This last point is an interesting one - we already separate structure from presentation on a web page using style sheets - so the next logical step is to make structured data available without the presentation - let the user decide what that should be.  Obviously this is a worst nightmare for branding experts and page designers and it won’t suit all types of content.  But we’re already seeing the start of this with feed technology and readers.  Whatever we might like to think, most users don’t care about flash pages and clever design, they just want stuff to work and deliver the goods.  We’ve barely begun on this one.

For some thoughtful current discussion on semantic markup and HTML look here: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-August/subject.html#start (topic: Ghosts from the past and the semantic web)

Microformats

Microformats try to solve the same markup problems addressed by RDFa but are a more pragmatic and ad hoc approach to tagging HTML.  Some, such as hCalendar, hCard and geo, are gaining popularity and support.  While this is great, the approach is ultimately unsustainable across the vast amounts of data available on the web.  Each format embeds specific syntax and vocabulary into the HTML which must be recognised and supported by browsers and other processors.  The process for agreement on each format is currently outside that of the usual web standards bodies.
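As a rough illustration of what "must be recognised" means in practice, here is a minimal Python sketch of a reader for a few hCard fields.  The class names vcard, fn, org and tel come from hCard; the contact details, HCardSketch and extract_hcard are made up for the example, and a real hCard parser handles far more than this.  Note that the field names are hard-coded: supporting another microformat would mean writing another list like this one.

```python
from html.parser import HTMLParser

# Hypothetical contact details marked up with hCard class names.
HCARD = """
<div class="vcard">
   <span class="fn">Alice Example</span>
   <span class="org">Example Agency</span>
   <span class="tel">+64 4 555 0100</span>
</div>
"""

# The reader has to know these names in advance; that knowledge is the format.
HCARD_FIELDS = {"fn", "org", "tel"}


class HCardSketch(HTMLParser):
    """Toy reader: collects a few hCard fields once a vcard root is seen."""

    def __init__(self):
        super().__init__()
        self.in_card = False  # simplification: never reset after the first card
        self.field = None     # hCard field waiting for its text
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "vcard" in classes:
            self.in_card = True
        known = [name for name in classes if name in HCARD_FIELDS]
        self.field = known[0] if (self.in_card and known) else None

    def handle_data(self, data):
        text = data.strip()
        if self.field and text:
            self.card[self.field] = text
            self.field = None

    def handle_endtag(self, tag):
        self.field = None


def extract_hcard(markup):
    parser = HCardSketch()
    parser.feed(markup)
    parser.close()
    return parser.card


if __name__ == "__main__":
    print(extract_hcard(HCARD))
```

The design trade-off is visible in HCARD_FIELDS: the markup is simple to write, but every consumer needs format-specific knowledge built in.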

For a broader explanation and comparison of RDFa and Microformats try this article.
http://www.samaxes.com/2008/08/29/the-semantic-web-and-rdfa/
For more information on microformats try here.
http://microformats.org/

In my view microformats are going to be around for a while and there is plenty of value in the address/calendar formats that can be taken advantage of very easily right now.  Why wouldn’t you code important event dates and contact information on your web site in these formats with popular browsers beginning to support them?

Why does it matter now?

Governments hold vast amounts of information and are beginning to recognise that cheaper and more effective ways to disseminate it are needed.  Big static web sites and custom information transfer formats are expensive and difficult to maintain.  So we have efforts in the United Kingdom to reduce the number of government web sites, but at the same time make information available more freely.  Arguments around the economic benefits that could accrue when entrepreneurs are able to freely mash up data in new ways are beginning to be heard.  The basic idea is that instead of charging for data access, government gets a dividend back in taxes from the generated business.

Then you have the more idealistic general-good argument that, unless there is a good legal or personal privacy reason to keep something confidential, most government information should be freely available to the taxpayers who have financed its collection and maintenance.  In the past this has often been prohibitively expensive, but for the most part that objection no longer holds true.

With wider uptake of broadband and the advance of many web technologies a convergence seems to be happening now between innovation, government policy, awakening demand and ability to supply.

So we’re heading towards a situation where many agencies will be encouraged to make more and more data available on web sites.  What’s missing is the mark-up to make it mashable.  Here’s an interesting video showing a future concept browser using data objects in some novel ways.  The video is a bit cheesy with the live person bits done in a kind of freeze frame, but you get the idea.  http://www.vimeo.com/1450211

If government doesn’t get into this game, what happens?  It could be that some of the data collection done exclusively by government in the past suddenly gets some competition.  The danger is the confusion about authoritative sources that could ensue, not to mention the wasted effort in duplicate collection and maintenance.  Alternatively, we might just stifle the development of new businesses and services and deprive the economy of a much needed boost.

What next?

Now is a great time to start exploring the capacity of your CMS to handle semantic markup.  Microformats are a good start and I don’t see time spent applying some of this coding as wasted even if RDFa eventually becomes the accepted standard.  However, things are moving fast so keep an eye on the technology.  Some organisations, such as the BBC, are moving towards a mixed environment using microformats and RDFa where appropriate and this seems a likely scenario in the foreseeable future.  http://www.bbc.co.uk/blogs/radiolabs/2008/06/microformats_and_rdfa_and_rdf.shtml

It’s also a great time to start articulating some of the ideas behind the technology to your business users.  (Try this video as an introduction: http://www.youtube.com/watch?v=OGg8A2zfWKg)  There will be pressure on all government agencies to deliver more information in new ways more cheaply - time to get ahead of the game.

Related technology: Feeds  - see http://research.elabs.govt.nz/feeds-for-thought/     

