Standards

Lawrence Lessig by Joi Ito, CC-BY

The Open Video Alliance will be hosting an online chat with Lawrence Lessig tomorrow at 6pm eastern time / 3pm pacific time (see more time zones here.)

There are a lot of events in person as well. I will be at the event near San Francisco.

The event will also be broadcast live with open video, thanks to Fluendo. The best client for you to watch it in will be either Firefox or VLC. More instructions on clients can be found on the openvideo wiki.

There was a recent post on LWN suggesting that three specific Nokia patents may cover Theora. A deeper analysis indicates that’s just not true.

Two of the patents, 6,950,469 and 7,263,125, are post-VP3 and therefore not relevant because the patent filing dates are after the invention and introduction of VP3 (the basis for Theora.) Thus, Theora predates these patents and could not infringe.

The other patent, 6,504,873, requires the affirmative step of defining a “linear equation” between two reference pixels. Theora does not use such an equation; thus it doesn’t satisfy the limitations of the claims and does not infringe.

For background on the free software angle on this story please check out Robert O’Callahan’s post on this topic. Also check out Mike Shaver’s shorter background post. This post differs from theirs in that I want to talk about network effects, why codecs should be considered a fundamental web technology and what the long-term effects of the choices at this inflection point might look like.

Recently Youtube announced that you could test out an HTML5-enabled version of their site. They said that they were doing this partially based on people’s “number one request” that Youtube do more with HTML5. (They left out the other half of that #1 request – that the implementation be based on open codecs, but more on that later.) Not to be outdone, Vimeo rushed to announce a beta version of their player based on their site that claims HTML5 support as well.

To be clear, this is great news. This is just the latest in a long string of changes for video on the web. We started with a raw “player” delivered by Real Media. Then on to media embedded directly in pages via Windows Media + Quicktime. More recently video on the web has been a platform play by Flash. And finally to a place where media becomes a first class citizen on the web without a single source provider. These moves by Google and Vimeo (and before either of them, DailyMotion) show that things are changing for the better, and faster than I think anyone could have imagined.

The players from Google and Vimeo do present a pretty serious problem, though. Each of these players requires a proprietary H.264 codec to view video. These codecs aren’t compatible with the royalty-free web standards that the rest of the web is built on. The fact that they are being so unabashedly hyped along with the new darling of the web – HTML5 – means that most people don’t understand that something very dangerous is taking place behind the scenes.

If you think that this isn’t an issue that’s worth worrying about you need to read the rest of this post. In particular the history of GIF shows us what happens when patented technologies are used on the web and what happens when network effects over-run the natural drive to royalty-free technologies at scale. MP3 pricing gives us a glimpse into the strategy around H.264 licensing and what the landscape might look like 5 years from now, assuming H.264 were baked into the web platform as a requirement. I’ll also talk about other options that might be coming in the near future that most people don’t know about.

The Web Exploded on Royalty-Free

The web has always been based on the assumption of Royalty Free. In fact, participation in a working group at the W3C requires that any parties disclose and make available any essential claims on the technology covered by that working group.

But that’s just a technicality. The truth is in the tests: you can still build a web browser, spider, client, web server, image editor, a JS library, a CSS library, an HTML editor, a web publishing system, commerce system – anything that is based on fundamental web technologies – without asking anyone for permission. This is a fundamental reason why the web has spread everywhere. Because everyone had the chance to add to the mix.

It’s worth saying twice. Anyone can create technology or services on the web and they don’t have to ask anyone for permission to do it. This is why we’ve had billions of dollars of investment and a fundamental shift in the way that western society acts and communicates – all in the course of a very short period of time. The web grew up on Royalty-Free.

Learning from GIF

The web in 1999 was a lot smaller than it is today, so a lot of people don’t remember what happened back when Unisys decided to start to enforce their GIF-related patents. GIF was already widely used on the web as a fundamental web technology. Much like the codecs we’re talking about today it wasn’t in any particular spec but thanks to network effects it was in use basically everywhere.

Unisys was asking some web site owners $5,000-$7,500 to be able to use GIFs on their sites. Note that these patents expired about five years ago, so this isn’t an issue today, but it’s still instructive. It’s scary to think of a world where you would have to fork up $5000 just to be able to use images on a web site. Think about all of the opportunity, the weblogs, the search engines (even Google!) and all the other simple ideas that became major services that would never have been started because of a huge tax being put on being able to use a fundamental web technology. It makes the web as a democratic technology distinctly un-democratic.

We’re looking at the same situation with H.264, except at a far larger scale.

So let’s talk about what makes a fundamental web technology, and how they should be licensed. First, the licensing. I think that Apple said it best:

After careful consideration of the draft patent policy, Apple believes that it is essential to continued interoperability and development of the Web that fundamental W3C standards be available on a royalty-free basis. In line with the W3C’s mission to “lead the Web to its full potential,” Apple supports a W3C patent policy with an immutable commitment to royalty-free licensing for fundamental Web standards. Apple offers this statement in support of its position.

(The post then goes on to talk about an opt-out mechanism for participating members.)

This leads to the obvious question: is the codec a fundamental web technology? The HTML5 working group argued and punted on the issue. Given that the standard is mum on the issue it falls to the actors in the market to determine what’s required to support HTML5. Given the state of online video today this basically boils down to one actor: Google.

Google has a near-monopoly in online video thanks to the ubiquity of Youtube. This means that they are the effective arbiter of codec choices for HTML5 video. If you want Youtube to work, you have to support whatever they are using. (Right now that’s Flash or a native Youtube app for mobile devices, but it’s clearly changing.) Let’s set up a strawman and say that it’s going to be H.264. (I’ll discuss later why I don’t think that this will be a requirement, but let’s say that it is.)

Their choice for H.264 had an immediate effect. It’s a signal to the market that it’s OK to start using H.264 as the main codec for HTML5 video. This is proven out by Vimeo’s HTML5 beta player launch. Vimeo is a secondary player, but was perfectly happy doing what Google did. The effects of that move have spread quickly and you can see it in people’s reactions: John Gruber getting angry at Mozilla as a result of Google’s actions, people on twitter claiming that we don’t support HTML5 at all because of Google’s use of a proprietary codec (Not true! Firefox actually leads in HTML5 support in a huge number of areas.) and Gizmodo’s choice comment: “Luckily, YouTube accounts for a hefty chunk of said architecture, their catalog is rendered in HTML5-friendly h.264 format.” This is what network effects look like in real time.

So if you think that Google has settled on H.264 as the only codec they will support (unlikely) it would appear that they have set us up to have another GIF-like situation. Note that I think that this will not actually be the case, as I discuss later, but it’s worth thinking about as a framework. So instead let’s talk about what that situation would look like, with MP3 as the model for the H.264 licensing strategy.

Rising Costs and Unpredictable Licensing

So what can we learn about H.264’s licensing strategy as it pertains to pricing? Much like GIF we already have a model to look at that is already near the end of its cycle: MP3. Network effects and government-sponsored monopolies make a very powerful combination. But getting the most out of them requires a very specific strategy, one we saw with MP3 and we’re seeing again with H.264.

History is instructive. We know that MP3 was licensed quite liberally early in its lifespan. Before 2002, “no license fee is expected for desktop software mp3 decoders/players that are distributed free-of-charge via the Internet for personal use of end-users”. They changed that after the network effects had already taken their toll. Not only were decoders free for free software, but bulk flat rate licenses were available to large distributors. That’s how widely distributed software picked up the ability to play back MP3-encoded files.

But as the cycle continued and MP3 became a requirement for playback the pricing changed to where we are today. So let’s talk about what it looks like 8 years later.

If you look at the publicly published rates for a couple of the MP3 licensors (and there are more than just two) someone who wanted to use it would be looking at a royalty rate of about $1/downloaded unit. So if you were doing, say, two million downloads a day you would be looking at about $2,000,000 per day just to have permission from those companies to include an MP3 decoder. Could you negotiate a lower rate? Probably. But that gives you a sense of the scale if you’re a small provider in a world where getting started on the web is hard and you don’t have much negotiating power.
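To make that arithmetic concrete, here is a minimal sketch in Python. The function name and the 365-day projection are my own illustration, not anything taken from a licensor's rate card. The only inputs are the rough figures from the paragraph above, and it assumes a flat per-unit rate with no caps or volume discounts.

```python
def daily_royalty_usd(downloads_per_day: int, rate_per_unit_usd: float) -> float:
    """Return the daily royalty bill for a flat per-unit rate (hypothetical helper)."""
    return downloads_per_day * rate_per_unit_usd


# Figures from the text: roughly $1 per downloaded unit, two million downloads a day.
per_day = daily_royalty_usd(2_000_000, 1.00)

# Assumption for illustration only: the same rate and volume hold for a full year.
per_year = per_day * 365

print(f"per day:  ${per_day:,.0f}")
print(f"per year: ${per_year:,.0f}")
```

Even if a negotiated rate were a small fraction of the published one, the bill still scales linearly with every download.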

People casually say that we should support licensed codecs like MP3, but they haven’t done the research. We have.

Much like MP3, H.264 is currently liberally licensed and also has a license that changes from year to year, depending on market conditions. This means that something that’s free today might not be free tomorrow. Like sending an H.264 file over the Internet.

In fact there are already royalties charged to some people using H.264 for streaming, according to Jan Ozer. Some quotes from that article:

Whenever I speak at industry groups about H.264, and detail the upcoming royalty obligation, some attendees are invariably surprised that using H.264 will generate royalties. Here’s what you need to know about H.264 and royalties, in an excerpt from an article that I wrote for StreamingMedia.com [ed: full article here.]

When I spoke with Harkness, he stated that the patent group hadn’t yet decided the license provisions for internet broadcast, or even if there would be a license, though he conceded that it would make little sense for the patent group to forego this revenue. The only thing certain is that the royalty provisions must be announced by January 2010 for royalties that would be payable the following year.

OK. This paragraph hits all of the big points:

  • Right now there aren’t any fees for “internet broadcast.”
  • But there might be in the future
  • The license changes from year to year.

Remember, this is still very early in H.264’s history so the licensing is very friendly, just like it used to be for MP3. The companies who own the IP in these large patent pools aren’t in this for the fun of it – this is what they do. They patent and they enforce and then enjoy the royalties. If they are in a position to charge more, they will. We can expect that if we allow H.264 to become a fundamental web technology that we’ll see license requirements get more onerous and more expensive over time, with little recourse.

Selective Enforcement

One reason why a lot of this isn’t known is because patents can be selectively enforced. And because it’s still early in H.264’s lifespan it’s extremely advantageous to lightly enforce the patents in the patent pool. MP3 and GIF both prove that if you allow liberal licensing early in a technology’s lifespan, network effects create much more value down the road when you can change licenses to capture value created by delivering images and data in those formats. Basically wait for everyone to start using it and then make everyone pay down the road. (Three words: unpredictable business costs.)

The other problem is that the Internet, because of its global nature, hides many of these costs. Everyone – and I mean everyone – uses tools from parts of the world where there are no software patents to transcode and edit videos. (One of the world’s largest free software downloads after Firefox? VLC. 111M last time I checked.) This grey area for tools means that these heavily patented formats gain much of the same advantage as free formats – lots of free tools and tons of ad-hoc support from free software people – but with the ability to still enforce and monetize in parts of the world where patents are enforced. It’s actually a brilliant strategy, even though the outcome is that the true costs of patents are hidden from the view of most people.

So What Now?

Remember that my setup for this was that Google’s choice was going to be H.264-only and that their decision would have network effects on the web, setting up another GIF-like situation for the web.

But I, like many others, have reason to believe that H.264 will not be Google’s final choice. There’s good reason to believe this: they are purchasing On2. On2 has technologies that are supposed to be better than H.264. If Google owns the rights to those technologies they are very likely to use them on their properties to promote them and are also likely to license them in a web-friendly (i.e. royalty-free) fashion. Google actually has a decent history of doing this. In particular you can get a sense of this from their post on The meaning of open:

If there are existing standards for handling user data, then we should adhere to them. If a standard doesn’t exist, we should work to create an open one that benefits the entire web, even if a closed standard appears to be better for us (remember — it’s not!). In the meantime we need to do whatever we can to make leaving Google as easy as possible. Google is not the Hotel California — you can check out any time you like and you CAN, in fact, leave!

In this case they were talking about user data, not video formats. But it’s the same set of principles at work. It’s also very hard to imagine Google licensing proprietary codecs as a revenue stream. It just doesn’t align with how they have worked in the past.

So it’s very likely that Google will be using a codec that’s superior to H.264 in terms of bandwidth usage and will also have web-friendly licensing attached to it. I know that at Mozilla we would support that and would very likely incorporate that technology into our browser, much like we did with Theora and Vorbis.

In Summary

So that’s the case for supporting free formats and also describes why we should be avoiding H.264 as a fundamental web standard. We don’t want to set ourselves up with another GIF situation and set up licensing like MP3 where we’ll be dealing with increased costs and restrictions over time. Google is likely to support something other than H.264 on Youtube and we’re likely to end up with something that’s better on a royalty-free basis as a result. And as I mention below, Theora and Vorbis are still excellent alternatives even if they for some reason don’t do as we expect.

Mozilla and Firefox continue to stand with the web on this topic. We don’t think that fundamental web technologies should be encumbered with patents and our actions and messages reflect that. We hope that you will stand with us on this.

A Note About Theora and Vorbis

Many of you might notice that I haven’t talked much about Theora or Vorbis. In fact some of you might read this post as me throwing them under the bus. That couldn’t be further from the truth. What I’ve really been talking about is one part of a larger ecosystem. What the web is really asking for is a codec that is implemented everywhere, that competes well on quality and doesn’t come with GIF-like surprises. Theora and Vorbis fit every part of this bill. You can actually use them on all of the desktop browsers, either via native support or via a Java plugin that actually works pretty well.

On the quality side what we’ve been able to do at Mozilla, with the help of the rest of the Xiph community, is to show that even though Theora is based on older, royalty-free technology, most people can’t really tell the difference between a video encoded with a decent Theora encoder and a video encoded with H.264.

But given the situation with submarine patents it would actually be a good idea for us to have more than one royalty-free codec available for browser vendors, site owners and content publishers. That way if one of them turns out to have issues, you just turn that one off and continue to use the other one. That’s why I think that if Google did offer a new codec that it would make a wonderful addition to the list of codecs we could use on the web. And if they want to use it on Youtube and other Google sites, that’s great. But it’s good to have other options in the wings.

So this means that Theora and Vorbis aren’t going anywhere. There are other reasons to continue to support (and promote!) Theora and Vorbis as well:

  • There’s a growing corpus of Theora content on sites like the Internet Archive, Wordpress and Dailymotion, not to mention all the private sites that are out there starting to use it.
  • Vorbis is far better quality than MP3 for the same bandwidth and I would expect that Google would use it as the audio codec of choice to match a free video codec.
  • Vorbis is actually supported in a large number of hardware devices, often quietly. My phone supports it, for example.
  • Theora with Ogg as a container actually is a fantastic live streaming format for HTTP. This is often overlooked. While Apple has had to add a bunch of code and description files trying to get live streaming to work with their proprietary H.264 codec and MPEG containers, we’ve been doing live streaming over HTTP out of the box ever since Theora and Ogg were part of the browser without any changes to standards. This is largely a function of history. Vorbis and Ogg were originally built as a radio streaming format. It’s possible to jump into the middle of a stream and start decoding. (As a side note it will be interesting to see if Google ends up trying to build their own container format. Ogg is simple, and it works.)
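The "jump into the middle of a stream" property comes from Ogg's page structure: every page begins with the four-byte capture pattern OggS, so a client can scan forward to the next page boundary and pick up from there. Here is a rough sketch in Python of that resync step. The helper names (find_next_page, make_page) are hypothetical, the demo pages are hand-built with a zeroed checksum, and a real player would also verify the page CRC and still needs the codec's header packets before it can decode anything.

```python
import struct

CAPTURE_PATTERN = b"OggS"    # every Ogg page starts with these four bytes
HEADER_FORMAT = "<4sBBqIIIB"  # pattern, version, flags, granule, serial, sequence, crc, segment count
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 27 bytes, followed by the segment table


def find_next_page(buf: bytes, start: int = 0):
    """Return (offset, serial, sequence, payload_len) for the first plausible page, or None.

    Sketch only: it checks the capture pattern and version byte but skips CRC verification.
    """
    pos = buf.find(CAPTURE_PATTERN, start)
    while pos != -1:
        header = buf[pos:pos + HEADER_SIZE]
        if len(header) == HEADER_SIZE:
            _magic, version, _flags, _granule, serial, sequence, _crc, nsegs = struct.unpack(
                HEADER_FORMAT, header
            )
            table = buf[pos + HEADER_SIZE:pos + HEADER_SIZE + nsegs]
            if version == 0 and len(table) == nsegs:
                # Payload length is the sum of the lacing values in the segment table.
                return pos, serial, sequence, sum(table)
        pos = buf.find(CAPTURE_PATTERN, pos + 1)
    return None


def make_page(serial: int, sequence: int, payload: bytes) -> bytes:
    """Build a single-segment demo page (payload under 255 bytes, CRC left as zero)."""
    header = struct.pack(HEADER_FORMAT, CAPTURE_PATTERN, 0, 0, 0, serial, sequence, 0, 1)
    return header + bytes([len(payload)]) + payload


stream = make_page(1234, 6, b"first") + make_page(1234, 7, b"second")
mid_stream = stream[10:]  # pretend we joined partway through the first page
found = find_next_page(mid_stream)
print(found)
```

The truncated first page is skipped because its capture pattern was cut off, and the scan lands on the start of the second page.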

So I wouldn’t expect these formats to go anywhere. Instead I would expect to see them implemented everywhere either as backups or to support existing content and streaming.

One Cloud? by liberato

This really wonderful post by Anil Dash echoes a lot of what I’ve been talking about in the context of the larger web. I had a discussion with Ben Galbraith recently about this topic during a Mozilla lunch. He and I took (intentionally) different positions on topics to see what kind of discussion we could stimulate around how developers see the web platform.

Ben has some concern that the web platform isn’t as coherent as those that you find from the other big players – the iPhone platform, Silverlight, Java or any of the other giant siloed stacks. (Actually Ben was more interested in the capabilities of those platforms vs. the web, but I’ll talk about that later.) I’m basically of the opinion that the web that we have, as messy as it seems, actually produces pretty good results. The incrementalism and experimentation that we’ve seen from web browser vendors results in what I call “developer-friendly incompatibility.” Those changes are eventually codified into standards and taken mainstream because they degrade well and we can learn as we go. (Kind of like life!)

But it does raise an interesting question – what capabilities do we need to have for the web that are found in these stacks? And can they be applied in an incremental fashion? We’re starting to see that with video being promoted as a first class citizen with Flash as a trailing edge fallback. We’re starting to see the web pick up 3D capabilities with participation from Google, Apple and Mozilla. And we have the pretty wonderful library model that has produced jQuery, jQuery UI, mootools, YUI, dojo and many others – all of which come from pushing complexity to the edges of the web community.

But is it enough? Discuss. What’s missing, and what’s interesting? I would particularly love to hear from Java and Silverlight developers. What do you really love about those platforms? Is source-as-delivery and incrementalism enough?

There’s a press release / post up about a BoF that’s going to happen tomorrow, July 30th, at the IETF meeting in Stockholm, Sweden. The Xiph folks, along with some people from Skype, are proposing that the IETF form a working group around audio codecs in use on the Internet (with a capital ‘I’.) You can also attend this meeting online to voice your position. To learn how, have a look at the post.

What’s interesting about this meeting is not that it’s happening. What’s interesting is that there’s a lot of resistance to this idea. It’s rumored, for example, that Ericsson, who has a vested interest in heavily-patented audio methods in use for VoIP, has sent anywhere from 40-100 people to vote against such a working group. (Is this true? I don’t know. But if it is it’s an interesting signal about the types of business interests such a thing might displace.)

Note that this is completely separate from the HTML5 working group and has no relation to those actions. And the IETF is a very different organization from either the whatwg or the W3C. But it’s interesting to see similar discussions taking place in another, similar, organization.

Google has announced the availability of a plugin that implements 3D technology and makes it available over the web. You can read about the announcement in the Google Code Blog and in an excellent article by Ryan Paul in Ars Technica.

Ryan points out that there are significant differences between what Google has built here and what we’ve built. I thought it might be worth it to expand on that a bit since it isn’t explained in depth in the Ars article.

Google’s 3D work is a plugin.  So much like how Flash or Silverlight works you get a rectangle in the browser to draw into.  They provide a high level scene graph API which uses the COLLADA format for loading objects underneath.  It’s a very large chunk of code.  If you take a look at the API and click around at the packages and classes you can see that there’s a lot there.  Their use case is games and game-like things – virtual worlds.  So it’s a great piece of work, but it’s also at a very high level.

Mozilla’s current proposal to Khronos is a very simple API that’s a wrapper around OpenGL ES 2.0.  It’s currently available as an extension to Firefox 3.5 and is likely to be rolled into a version of Firefox after 3.5.  The proposal is very focused on 3D.  For example, we didn’t try to include video or audio because those are being covered by other web standards and we’re interested in making sure they are well integrated instead of trying to wrap those into a 3D spec.  We’ve bound it to the canvas element so you can use it in much the same way you use the current canvas 2D context.  Things like asset loading (via COLLADA or other systems) are things we haven’t dealt with because those can be handled entirely outside of the 3D api and layered on top of it.  (Later in this post you’ll understand why this is important.)  But the important thing is that it’s something that you can easily mix with the rest of the open web.  Open Video and Audio, CSS, HTML, Canvas 2D, Canvas 3D, etc – you should be able to mix them all together and that’s our goal.

So these two 3D things from Mozilla and Google are pretty different.  Not really competitive, either, because they have such different goals.  The Google software is a very high-level 3D graphics API and what we’re proposing is more akin to the low-level graphics API that those high-level systems are built on.
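To show what I mean by a high-level API sitting on top of a low-level one, here is a toy sketch. It is written in Python purely as a stand-in, and none of the names (LowLevelContext, draw_mesh, Node) come from Mozilla's proposal or from Google's plugin. They are made up for illustration. The low-level layer only knows how to draw one mesh at one position, and the scene graph is ordinary library code that walks a tree and turns it into those simple calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


class LowLevelContext:
    """Stand-in for a small drawing API: it can draw a named mesh at a position, nothing more."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Vec3]] = []

    def draw_mesh(self, mesh: str, position: Vec3) -> None:
        # A real context would talk to the graphics hardware; this one just records the call.
        self.calls.append((mesh, position))


@dataclass
class Node:
    """Scene graph node: an optional mesh, an offset relative to its parent, and child nodes."""

    offset: Vec3 = (0.0, 0.0, 0.0)
    mesh: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def render(self, ctx: LowLevelContext, parent_position: Vec3 = (0.0, 0.0, 0.0)) -> None:
        # Accumulate offsets down the tree, then hand the result to the low-level layer.
        x, y, z = (p + o for p, o in zip(parent_position, self.offset))
        position = (x, y, z)
        if self.mesh is not None:
            ctx.draw_mesh(self.mesh, position)
        for child in self.children:
            child.render(ctx, position)


scene = Node(offset=(10.0, 0.0, 0.0), children=[
    Node(mesh="car_body"),
    Node(offset=(1.0, -0.5, 0.0), mesh="wheel"),
])
ctx = LowLevelContext()
scene.render(ctx)
print(ctx.calls)
```

Everything in Node lives outside the low-level context, which is the point: the scene graph can be rewritten or replaced without the drawing layer changing at all.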

Given the title of the Google blog post (“Towards an open web standard for 3D graphics”) it’s important to point out these differences since they affect how the standards process might look, and what the output might be.  We’ve been through this a few times with different standards and it’s easy to point out what the key success factors are to build a successful standard.  Here’s a quick iteration on those principles in my mind:

1. It’s important to keep the scope as small as possible.

The smaller the scope of the standard, the easier it is to understand the interaction of the various parts, what your goals are and what it takes to build an interoperable implementation.  It’s also the easiest thing you can do to remain as future-proof as possible.  It’s easier to add new APIs later if your scope is very very small.

2. Clear rules for interaction with the rest of content.

How does it work with the rest of the HTML spec?  CSS?  Video?  Images?  How can you copy content in and out?  Can you use them as textures?  These are just some of the questions that you have to raise as a way to describe how something like this might work with content.  Once again, this is gated on #1 above – if the functionality is simple then the interactions can generally be pretty simple as well.

3. Allow the scope to change slowly over time.

Technology – especially on the web – does not exist in a vacuum outside of time.  Standards do change over time and understanding how people use technology in the real world is the best possible way to understand how something should change and improve.  Understanding that standards are an iterative process is important.  Note that in #1 above – controlling scope – I mention that it’s important to keep things future-proofed via small and simple APIs.  This is why – because you know that you will need to improve that API once you understand how people are using it in the field.

4. Allow most of the innovation to happen next to and on top of your API.

Last point – your standard should allow as much iteration and work to happen on top of your API as possible.  This allows you to learn as much as possible about how people are using your software and gives them huge amounts of freedom to experiment and teach you about what you need to improve in the next iteration.  If people are stretching your APIs and finding gaps in performance, you can add convenience APIs to make things faster – as long as they are simple APIs.  We saw this in the real world with the JS libraries (dojo, jQuery) – we’ve been optimizing our engines and APIs over time to assist them as they have pushed our browsers to the limits.  But we would not have known had we tried to implement everything that the libraries could have possibly done at the browser level.

OK, so those are the things that we think make for a successful standards process.  I’ll point out one particular example of a dichotomy that I believe illustrates these rules so that people understand what I’m talking about: Canvas vs. SVG + SMIL.

Canvas is a very simple API (more info), much like what we’ve proposed to Khronos for 3D support.  It’s well-scoped, well understood and integrates very well with other web technologies.  And it’s been getting a huge amount of traction on the web.  People are writing all kinds of really neat technology on top of it, including useful re-usable libraries for visualization.  Have a look through Google’s own promotional site for Chrome – a huge number of them use canvas.  It has traction.  And we’ve gone through a couple of iterations – we’ve added support for text and a couple of other odds and ends once we understood what people were trying to do with it.

Now compare this to SVG and SMIL.  Each of those specs is a multi-hundred page document with very large APIs and descriptions of how to translate their retained-mode graphics into something that’s usable on the web.  (SVG 1.1 is a 719 page PDF.  SVG 1.2 Tiny is 449 pages.  The spec for SMIL is a 2.7MB HTML file.)  We’ve seen some implementation of SVG and SMIL in browsers, but it’s been slow in coming and hasn’t seen full interoperability testing nor any real pick up on the web.  The model for these specs was wrong, and I think it shows.

So I’ve spent some time talking about the context for standardization and what makes standards successful.  How does this relate to our stuff or Google’s stuff?  Well, quite a bit actually.  If we want something that browser vendors can easily implement, we need to understand that context and what we’re trying to standardize.  Much of the work that Google did happened before browsers got as fast as they have, so there’s a good reason why they felt that they needed to implement so much of the code as native code and deliver it as a plugin.  Their API is a good example of what a scenegraph API would look like on top of Canvas 3D.  JS engines have gotten a lot faster since they started their plug-in and we think that it’s time that we start using them.  Hence a low-level API that we can build on.

There’s a lot of great stuff going on with 3D on the web.  We’ll be working with Google (and others!) via the Khronos group to try and standardize on a low-level API that browsers can support.  It’s going to be a really fun year and I’m happy that we’re working to drive the web forward.