Meta-Data-in-Ogg

Background

This is a demonstration of the meta-data in Ogg concept spelled out here.

A quick note: Ogg Vorbis is a widely supported digital music format. The samples on this site use a modified Vorbis format, so don't be surprised if you can't listen to them. They will not play correctly in many players yet, mostly because people don't bother with error checks in their code. XMMS, Xine and MPlayer will manage; Helix and RealPlayer for Linux may well do so soon (their problem has nothing to do with error checking).

What is meta data?

To borrow from the Dublin Core website, "The simplest definition of metadata is 'structured data about data.'" To make this a bit more tangible, open up a book. Most books will have, within the first few pages, information on the edition and publication dates, the publisher, the author, some legal details and other things like typesetting information. It is quite common to have an introduction on the back of a paperback or on the inside of the dust jacket of a hardback. This is all meta data. When it comes to music, meta data can include things like the composer, the performers, sound engineers, recording location, dates and notes about details of a live performance.

Meta data is anything you can say about data that is not the data itself. A subset of this is used in libraries, and recorded by systems like Dublin Core, for purposes of categorisation. You can look up works composed by Mussorgsky, books written by H. G. Wells, recordings featuring Jimmy Page. This is "library-card" meta data, the kind that lets you say, "I'm looking for some early 20th century crime writing," and find it. Web pages can include meta data and (theoretically) be indexed by it (in practice the system is abused by spammers).

So what does a second grip do anyway?

But if you look at the sleeve for an album, you will find information that won't fit onto a library card. Who plays what instruments on what tracks? Someone is thanked for sandwiches: do they make it on as a "contributor" (to use the Dublin Core terminology), or are they a footnote? Movies are worse; they each seem to have their own terminology. Some libraries might record that Tito Gobbi sings the part of Scarpia in a particular recording of Tosca, but not many, and where was it recorded anyway?

This is the difference between the most familiar type of meta data and what is sometimes called "kitchen sink" meta data. If we are going to put meta data into a file it is not playing the role of a library card; it is playing the role of an album sleeve, a foreword to a book or the credits of a film. Ideally it is not the role of the librarian or the enthusiast to create that record, but the content creator; it is a way for them to communicate information they think you should have. In practice Ogg files are often created from other sources (e.g. transferring music onto a computer), so to create a complete record the archivist might want to preserve the meta data.

Why not use CMML?

CMML is not something I should try to describe in a few sentences. It stands for Continuous Media Markup Language and is designed both to annotate media streams and clips, and to enable authoring media streams by describing the combination of those annotated streams. I should add that I think putting fuller meta data in Ogg complements CMML quite nicely; the information is always there and associated with the stream if you want it. People have asked, "Why not use CMML for meta data?" So here is my opinion (and if CMML works for you, you can safely ignore it):

CMML meta tags

The first interpretation of the question is "Why not use the CMML markup language?" (The redundant abbreviation will make more sense once you get to the second interpretation.) My answer would be that the CMML meta tags, like the closely related meta tags in HTML, are for limited library-card type meta data: categorisation and indexing. You could push kitchen sink meta data into them if you tried, but what you would end up with would be very poorly structured (remember, "Structured data about data."). Basically you end up with a stack of name="value" pairs.
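To make the loss of structure a bit more concrete, here is a minimal Python sketch. The field names (performer, role) and the nested layout are assumptions made up for illustration; they are not taken from CMML or HTML, and "Singer B" is a placeholder.

```python
# Hypothetical kitchen-sink data pushed into flat name="value" pairs.
# Nothing ties a "role" to a "performer" except the order of the list.
flat_pairs = [
    ("performer", "Tito Gobbi"),
    ("role", "Scarpia"),
    ("performer", "Singer B"),  # placeholder name
    ("role", "Tosca"),
]

# The same information with the relationship made explicit (assumed layout).
structured = [
    {"performer": "Tito Gobbi", "role": "Scarpia"},
    {"performer": "Singer B", "role": "Tosca"},
]


def roles_from_flat(pairs):
    """Collect every value stored under the name "role"."""
    return [value for name, value in pairs if name == "role"]


def role_of(records, performer):
    """Return the role recorded for one performer, or None if absent."""
    for record in records:
        if record["performer"] == performer:
            return record["role"]
    return None


print(roles_from_flat(flat_pairs))        # every role, but not who sings which
print(role_of(structured, "Tito Gobbi"))  # the role tied to one performer
```

The flat form can only answer "which roles are mentioned?"; answering "who sings what?" relies on the pairs happening to stay in order.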

CMML serialisation

The second interpretation is "Why not use the CMML serialisation?" The serialisation specifies a way to put the CMML (which is text) into a media stream (which is binary data). The serialisation simply involves putting the following into the stream in order:

A binary header identifying a CMML stream and giving some useful information, mainly relating to working out what time an annotation corresponds to.
The start of the CMML document.
The annotations themselves.
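The ordering above can be sketched in Python as a generator of packets. This is only an illustration of the sequence: the header bytes and the markup strings are placeholders invented here, not the real CMML header layout or schema, and serialise_cmml is a hypothetical name.

```python
from typing import Iterable, Iterator


def serialise_cmml(header: bytes,
                   document_start: str,
                   annotations: Iterable[str]) -> Iterator[bytes]:
    """Yield packets in the order: binary header, document start, annotations."""
    yield header                           # binary identification header
    yield document_start.encode("utf-8")   # start of the text document
    for annotation in annotations:         # then each annotation in turn
        yield annotation.encode("utf-8")


# Placeholder values; none of these are checked against the CMML specification.
packets = list(serialise_cmml(
    b"PLACEHOLDER-HEADER",
    "<document-start/>",
    ["<annotation n='1'/>", "<annotation n='2'/>"],
))

for packet in packets:
    print(packet)
```

The point is only that the text is carried as a sequence of binary packets, with the timing information in the first one.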

There is no particular reason not to do this. My current view is that meta data describes the file's relation to things outside it, while CMML describes its internal structure. In this picture the meta data does not need to be spread through the file, so it doesn't need the support CMML does. This is still an open question though.

General XML

The attempt to put RDF/XML meta data in Ogg files also opens the way to putting other types of XML document in. The SVG (Scalable Vector Graphics) format has been suggested. XML itself says nothing about what a particular document means, and it's entirely possible people will come up with imaginative uses for putting XML into Ogg files if they can do so. Allowing such additions to 'tag along' as plain XML documents may reduce the barriers to this.

My thoughts on demuxing of such documents are that you use an XML parsing codec to recognise their document type and decide where to put them. The demuxer doesn't need to know they're being passed on elsewhere, and doesn't really need to worry about it unless, like CMML, they need to have time within the stream, in which case the CMML serialisation is certainly appropriate.

Example general XML application: SVG fonts for subtitles. Include the fonts for subtitles as an XML document. The XML codec recognises the SVG document type and collects the document. A subtitle stream specifies that it wants to use the SVG as the source for its fonts, so the user agent gets the document from the XML codec and uses it to generate the fonts. Note that, as far as the user agent is concerned, this is just a source of fonts in a particular format and that the meaning of the SVG requires the request from another stream (otherwise it could have been promotional material like an album cover).
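A rough Python sketch of the "XML codec" idea follows. It assumes the document type is recognised from the namespace of the root element, which is one possible choice and not something settled above; XmlCollector and its methods are hypothetical names, and only the standard library's xml.etree.ElementTree is used.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def root_namespace(document: str) -> str:
    """Return the namespace URI of the root element, or "" if it has none."""
    tag = ET.fromstring(document).tag  # e.g. "{http://www.w3.org/2000/svg}svg"
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


class XmlCollector:
    """Hypothetical XML codec: files documents by type until something asks."""

    def __init__(self):
        self.documents = {}

    def accept(self, document: str) -> str:
        """Store a document under its root namespace and return that namespace."""
        namespace = root_namespace(document)
        self.documents.setdefault(namespace, []).append(document)
        return namespace

    def request(self, namespace: str) -> list:
        """Hand back the documents of one type, e.g. to a subtitle stream."""
        return self.documents.get(namespace, [])


collector = XmlCollector()
collector.accept('<svg xmlns="http://www.w3.org/2000/svg"><defs/></svg>')
fonts = collector.request(SVG_NS)  # the subtitle stream asks for its font source
print(len(fonts))
```

The collector attaches no meaning to the SVG; it only becomes a font source because another stream requests it as one.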

Installing the kitchen sink

There is a problem with "kitchen-sink" meta data: it is hard to structure. Relationships involving multiple people are a good example. In RDF you have a subject (Spot), predicate (owns) and object (ball). To connect multiple contributors to a resource you ideally need multiple predicates. Unfortunately it is the predicates that need to be defined by the various standards and understood by processing. This means that predicates are often few and simple: Dublin Core has "creator" and "contributor" with a few role qualifiers. The Library of Congress MARC Relators are more extensive, but also have to cover a wide range of roles.
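As a small illustration of the subject/predicate/object shape, here is a Python sketch using plain tuples. It is not an RDF library; the Triple type, the objects_of helper and the names "this-recording", "Composer A" and "Sandwich Maker" are placeholders invented for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Placeholder statements; only "creator" and "contributor" are available
# as predicates, mirroring the small Dublin Core vocabulary.
triples = [
    Triple("Spot", "owns", "ball"),
    Triple("this-recording", "creator", "Composer A"),
    Triple("this-recording", "contributor", "Tito Gobbi"),
    Triple("this-recording", "contributor", "Sandwich Maker"),
]


def objects_of(statements, subject, predicate):
    """Return every object linked to a subject by the given predicate."""
    return [t.obj for t in statements
            if t.subject == subject and t.predicate == predicate]


contributors = objects_of(triples, "this-recording", "contributor")
print(contributors)
```

With a single "contributor" predicate, the singer and the sandwich maker come back in the same list; telling them apart needs either more predicates or some extra qualifier.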

The best approach is probably then to maintain as much structure as possible, and accept that the more refined the meta data is, the less likely it is that it can be structured in a machine-readable way. In line with its probable use (providing detailed information about its resource to someone interested in it, vs. providing sufficient information to decide whether you're interested in it), that means it should be human readable or parsable into human readable form.

Content copyright Ian Malone 2005, except for the logos on the front page and where otherwise noted. Stylesheets and code cribbed from various sources on the web, see links section for a few.