Google has announced the availability of a plugin that implements 3D technology and makes it available over the web. You can read about the announcement in the Google Code Blog and in an excellent article by Ryan Paul in Ars Technica.
Ryan points out that there are significant differences between what Google has built here and what we’ve built. I thought it might be worth it to expand on that a bit since it isn’t explained in depth in the Ars article.
Google’s 3D work is a plugin. So, much like Flash or Silverlight, you get a rectangle in the browser to draw into. They provide a high-level scene graph API which uses the COLLADA format for loading objects underneath. It’s a very large chunk of code. If you take a look at the API and click around the packages and classes you can see that there’s a lot there. Their use case is games and game-like things – virtual worlds. So it’s a great piece of work, but it’s also at a very high level.
Mozilla’s current proposal to Khronos is a very simple API that’s a wrapper around OpenGL ES 2.0. It’s currently available as an extension to Firefox 3.5 and is likely to be rolled into a version of Firefox after 3.5. The proposal is very focused on 3D. For example, we didn’t try to include video or audio because those are being covered by other web standards and we’re interested in making sure they are well integrated instead of trying to wrap those into a 3D spec. We’ve bound it to the canvas element so you can use it in much the same way you use the current canvas 2D context. Things like asset loading (via COLLADA or other systems) are things we haven’t dealt with because those can be handled entirely outside of the 3D API and layered on top of it. (Later in this post you’ll understand why this is important.) But the important thing is that it’s something that you can easily mix with the rest of the open web. Open Video and Audio, CSS, HTML, Canvas 2D, Canvas 3D, etc – you should be able to mix them all together and that’s our goal.
So these two 3D things from Mozilla and Google are pretty different. Not really competitive, either, because they have such different goals. The Google software is a very high-level 3D graphics API and what we’re proposing is more akin to the low-level graphics API that those high-level systems are built on.
Given the title of the Google blog post (“Towards an open web standard for 3D graphics”) it’s important to point out these differences since they affect how the standards process might look, and what the output might be. We’ve been through this a few times with different standards and it’s easy to point out what the key success factors are to build a successful standard. Here’s a quick rundown of those principles as I see them:
1. It’s important to keep the scope as small as possible.
The smaller the scope of the standard, the easier it is to understand the interaction of the various parts, what your goals are and what it takes to build an interoperable implementation. It’s also the easiest thing you can do to remain as future-proof as possible. It’s easier to add new APIs later if your scope is very very small.
2. Clear rules for interaction with the rest of content.
How does it work with the rest of the HTML spec? CSS? Video? Images? How can you copy content in and out? Can you use them as textures? These are just some of the questions that you have to raise as a way to describe how something like this might work with content. Once again, this is gated on #1 above – if the functionality is simple then the interactions can generally be pretty simple as well.
3. Allow the scope to change slowly over time.
Technology – especially on the web – does not exist in a vacuum outside of time. Standards change over time, and watching how people use technology in the real world is the best way to understand how something should change and improve. It’s important to understand that standards are an iterative process. Note that in #1 above – controlling scope – I mention that it’s important to keep things future-proofed via small and simple APIs. This is why – because you know that you will need to improve that API once you understand how people are using it in the field.
4. Allow most of the innovation to happen next to and on top of your API.
Last point – your standard should allow as much iteration and work to happen on top of your API as possible. This allows you to learn as much as possible about how people are using your software and gives them huge amounts of freedom to experiment and teach you about what you need to improve in the next iteration. If people are stretching your APIs and finding gaps in performance, you can add convenience APIs to make things faster – as long as they are simple APIs. We saw this in the real world with the JS libraries (dojo, jQuery) – we’ve been optimizing our engines and APIs over time to assist them as they have pushed our browsers to the limits. But we would not have known what to optimize had we tried to implement everything that the libraries could possibly have done at the browser level.
OK, so those are the things that we think make for a successful standards process. I’ll point out one particular example of a dichotomy that I believe illustrates these rules so that people understand what I’m talking about: Canvas vs. SVG + SMIL.
Canvas is a very simple API, much like what we’ve proposed to Khronos for 3D support. It’s well-scoped, well understood and integrates very well with other web technologies. And it’s been getting a huge amount of traction on the web. People are writing all kinds of really neat technology on top of it, including useful re-usable libraries for visualization. Have a look through Google’s own promotional site for Chrome – a huge number of the examples there use canvas. It has traction. And we’ve gone through a couple of iterations – we’ve added support for text and a couple of other odds and ends once we understood what people were trying to do with it.
Now compare this to SVG and SMIL. Each of those specs is a multi-hundred-page document with a very large API and descriptions of how to translate its retained-mode graphics into something that’s usable on the web. (SVG 1.1 is a 719 page PDF. SVG 1.2 Tiny is 449 pages. The spec for SMIL is a 2.7MB HTML file.) We’ve seen some implementation of SVG and SMIL in browsers, but it’s been slow in coming and hasn’t seen full interoperability testing or any real pickup on the web. The model for these specs was wrong, and I think it shows.
So I’ve spent some time talking about the context for standardization and what makes standards successful. How does this relate to our stuff or Google’s stuff? Well, quite a bit actually. If we want something that browser vendors can easily implement, we need to understand that context and what we’re trying to standardize. Much of the work that Google did happened before browsers got as fast as they have, so there’s a good reason why they felt that they needed to implement so much of the code as native code and deliver it as a plugin. Their API is a good example of what a scenegraph API would look like on top of Canvas 3D. JS engines have gotten a lot faster since they started their plug-in and we think that it’s time that we start using them. Hence a low-level API that we can build on.
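To make the layering idea concrete, here is a minimal sketch of a scene graph written entirely in script on top of a small draw-call interface. Everything in it is hypothetical: `ImmediateContext`, `RecordingContext`, `SceneNode` and `renderScene` are names made up for illustration, not the Canvas 3D proposal or the O3D API, and the "context" only records commands instead of talking to a GPU.

```typescript
// Hypothetical low-level interface: the caller issues draw commands each
// frame and the API itself remembers nothing about the scene.
interface ImmediateContext {
  clear(): void;
  drawMesh(meshName: string, x: number, y: number, z: number): void;
}

// Stand-in for a real context: it just records the commands it receives.
class RecordingContext implements ImmediateContext {
  commands: string[] = [];
  clear(): void {
    this.commands.push("clear");
  }
  drawMesh(meshName: string, x: number, y: number, z: number): void {
    this.commands.push(`draw ${meshName} @ ${x},${y},${z}`);
  }
}

// Retained layer, written as ordinary library code: a tree of nodes whose
// positions are relative to their parent.
class SceneNode {
  meshName: string | null;
  x: number;
  y: number;
  z: number;
  children: SceneNode[] = [];
  constructor(meshName: string | null, x: number, y: number, z: number) {
    this.meshName = meshName;
    this.x = x;
    this.y = y;
    this.z = z;
  }
  add(child: SceneNode): SceneNode {
    this.children.push(child);
    return child;
  }
}

// Walk the tree, accumulate offsets, and emit low-level draw calls.
function drawNode(ctx: ImmediateContext, node: SceneNode,
                  px: number, py: number, pz: number): void {
  const x = px + node.x;
  const y = py + node.y;
  const z = pz + node.z;
  if (node.meshName !== null) {
    ctx.drawMesh(node.meshName, x, y, z);
  }
  for (let i = 0; i < node.children.length; i++) {
    drawNode(ctx, node.children[i], x, y, z);
  }
}

function renderScene(ctx: ImmediateContext, root: SceneNode): void {
  ctx.clear();
  drawNode(ctx, root, 0, 0, 0);
}

// Usage: a car with one wheel attached to it.
const recorder = new RecordingContext();
const sceneRoot = new SceneNode(null, 0, 0, 0);
const car = sceneRoot.add(new SceneNode("car", 10, 0, 0));
car.add(new SceneNode("wheel", 1, -1, 0));
renderScene(recorder, sceneRoot);
// recorder.commands is now:
// ["clear", "draw car @ 10,0,0", "draw wheel @ 11,-1,0"]
```

The point of the sketch is only that the tree, the traversal and the transform accumulation live outside the low-level interface, so a different library could replace them without the interface changing.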
There’s a lot of great stuff going on with 3D on the web. We’ll be working with Google (and others!) via the Khronos group to try and standardize on a low-level API that browsers can support. It’s going to be a really fun year and I’m happy that we’re working to drive the web forward.
-
Is there a mailing list where the discussion at the Khronos group is taking place?
-
SVG is a big spec indeed. That and some ‘political’ things are why it took so long to be implemented and used around the web. Things are changing significantly.
Wikipedia has SVG all over the place. Google Maps and Google Docs use it.
Just to name some, more via my (SVG links) website (needs a big update).
Maybe you should take a look at SVG Open, it’s hosted by Google this year …
-
Disclaimer: I work on the Chrome team at Google, but I don’t have any special knowledge of O3D.
The distinctions between “one is a plugin, the other will be in Firefox” seem a bit artificial to me. My understanding is that O3D was released as a plugin for the same reason Mozilla’s proposals were released as a Firefox extension: they allow you to play with it today. The hope of both groups is to make this simply part of the browser.
I also don’t know that “immediate mode versus retained mode” is very parallel to “the simplicity of canvas versus the complexity of SVG”. Perhaps that wasn’t what you were trying to imply; maybe you were simply saying “SVG is an example of not focusing on what the market needs now”. But you seem to be assuming that an immediate mode API is the only thing you need if you have a fast JS engine. Is that necessarily true? Perhaps the reason O3D’s API looks as it does is not because “without fast JS you need this” as much as “maybe some app developers just don’t want to write immediate-mode code in their JS”.
I guess I find “working…to try and standardize on a low-level API that browsers can support” to be fine as far as it goes but a bit exclusive of other APIs which might not be worse or less necessary. I certainly agree that any proposal which ends up as horribly bloated as SVG would be unfortunate. But it’s not clear to me that’s true with O3D.
-
(1) Plugin vs in browser — totally agree on the plugin prison comments. I think (hope?) everyone agrees plugin prisons aren’t desirable. It wasn’t clear to me whether you thought O3D expected to spend eternity in one, but I’m pretty sure they don’t.
(2) Retained and immediate — I am too clueless about rendering to provide a good technical answer up or down to your concerns. I definitely agree with the thrusts of “giant complex specs suck” and “don’t make something more complex than it needs to be”. I don’t _think_ retained mode necessarily implies the violation of one of those; hopefully some O3D person will be able to clarify how they view the surface area of the proposal and what tradeoffs they’re trying to make.
I don’t have a clue what the Chromium-Khronos interaction will look like; I have my head down on various bugs right now :).
-
I don’t know anything about the two 3D specs, but I’ll comment on SVG vs. canvas. Canvas is also a ‘rectangle’ in the browser. The fact that it has a different tag name, and that the browser supports the API rather than a plugin, is immaterial. SVG creates DOM objects which I can manipulate, rather than just blank out and start again.
In that sense, I’d much rather a spec like SVG’s, complex or no. On the occasions we’ve used canvas over SVG, it’s mostly been related to better workarounds for IE being available for canvas.
-
Hi Chris, I just wanted to chime in here to clarify and explain a few things on our end. Full disclosure: I work on O3D.
The biggest source of confusion here is that retained-mode APIs are usually considered high-level, but what we’ve actually built with O3D is a pretty low-level retained-mode API. We keep track of where objects are in a scene so we can do math-heavy operations like z-sorting and culling in native code, but that’s a chore we felt most developers wouldn’t complain about not getting to do. Most higher level functions are left to JavaScript. For example, file loading, which you mention as being baked into our Plugin/API, is actually performed in JavaScript (we actually just posted about this on our blog), using a technique similar to what C3DL provides on top of Canvas3D. We totally agree that there needs to be room for others to innovate on top of, around, next to and everywhere else with respect to APIs!
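For readers who haven’t met the terms, here is a rough sketch of what z-sorting and culling mean in practice. It is not O3D’s implementation, which this thread doesn’t describe; the types and function names are hypothetical, and the cull is a simple distance test standing in for a real view-frustum test.

```typescript
// Hypothetical scene object: a named bounding sphere.
interface SceneObject {
  name: string;
  x: number;
  y: number;
  z: number;
  radius: number;
}

// Hypothetical viewpoint: a position plus a maximum draw distance.
interface Viewpoint {
  x: number;
  y: number;
  z: number;
  farDistance: number;
}

function distanceTo(view: Viewpoint, obj: SceneObject): number {
  const dx = obj.x - view.x;
  const dy = obj.y - view.y;
  const dz = obj.z - view.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// Culling: drop objects whose bounding sphere lies entirely beyond the
// far distance, so they are never submitted for drawing.
function cullByDistance(objects: SceneObject[], view: Viewpoint): SceneObject[] {
  return objects.filter((o) => distanceTo(view, o) - o.radius <= view.farDistance);
}

// z-sorting: order the survivors back to front, the order you would want
// when blending transparent surfaces. Returns a new array.
function sortBackToFront(objects: SceneObject[], view: Viewpoint): SceneObject[] {
  return objects.slice().sort((a, b) => distanceTo(view, b) - distanceTo(view, a));
}

const viewpoint: Viewpoint = { x: 0, y: 0, z: 0, farDistance: 100 };
const sceneObjects: SceneObject[] = [
  { name: "near", x: 0, y: 0, z: 5, radius: 1 },
  { name: "distant", x: 0, y: 0, z: 500, radius: 1 },
  { name: "mid", x: 0, y: 0, z: 50, radius: 1 },
];

const drawOrder = sortBackToFront(cullByDistance(sceneObjects, viewpoint), viewpoint)
  .map((o) => o.name);
// drawOrder is ["mid", "near"]; "distant" was culled.
```

Both steps run over every object on every frame, which is why where this work happens (script or native code) becomes a performance question as scenes grow.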
That being said, there are a few higher level pieces that are accomplished in native code. For example, O3D has an animation and skinning system that is currently built in. We agree that to keep a standards process focussed, APIs should be as minimal as possible while remaining useful, and so we would likely keep things like that out of any first attempt at a standard and, as you say, let it evolve over time. But the usefulness question brings up an important and, we think, unresolved point. We’d love to build the animation and skinning system in JS, but we just couldn’t get a JS-based animation system fast enough, even on our retained-mode API. JavaScript is getting faster all the time and we love that, but until someone builds some apps it’ll be hard to know what’s fast enough.
Standardizing a GL-like immediate mode API for JS makes total sense. It’s a well defined problem, lots of people know GL, and we think it will be useful. But some of the demos we wrote _already_ don’t run well without a modern JS implementation, and moving to immediate mode won’t help that (but we’d love to be proven wrong). That’s why we think it makes sense to explore both an immediate and a retained mode 3D, and make sure they work well together.
And that’s exactly what we’d like to do. Re: your plugin comments, we don’t like our prison either. When we catch our breath, we’re going to start looking at what it will take to get O3D and Canvas3D up and running in Chromium.
Henry
-
First of all, thank you for a very thought-out and well-versed opinion on the current state of 3D in browsers. Your observations on SVG and Canvas are right on the money too (disclaimer: I work on dojox.gfx and work with SVG/Canvas/VML/Silverlight on a regular basis).
But I want to clarify the retained mode vs. the immediate mode. I think that the problem is not stated properly. It is more about scene graphs and where to keep them: on the JS side, or on the browser side (e.g., in C/C++), and the corresponding flexibility vs. performance trade-offs.
1) Different tasks (regenerating a picture, processing events, animating objects, and so on) may require different scene graphs. They can be traversed differently (e.g., in different order), organized differently (e.g., using a spatial index vs. a linked list), or processed differently (e.g., all invisible shapes/vertices are removed ahead of time).
2) A markup language (examples: SVG, HTML) != a universal scene graph. It can be used to derive it (to some degree), but it is not the same. The same consideration goes for storage formats.
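As an illustration of point 1, here is a small sketch of the same set of shapes organized two ways: a flat list that is scanned linearly, and a uniform grid used as a simple spatial index. The names (`Shape`, `queryList`, `GridIndex`) and the 10-unit cell size are assumptions for the example only; real scene graphs would use richer structures.

```typescript
interface Shape {
  id: string;
  x: number;
  y: number;
}

// Organization A: a flat list. Every query inspects every shape.
function queryList(shapes: Shape[], minX: number, minY: number,
                   maxX: number, maxY: number): string[] {
  return shapes
    .filter((s) => s.x >= minX && s.x <= maxX && s.y >= minY && s.y <= maxY)
    .map((s) => s.id)
    .sort();
}

// Organization B: a uniform grid. A query only visits the cells that
// overlap the query rectangle.
class GridIndex {
  cellSize: number;
  cells: { [key: string]: Shape[] } = {};

  constructor(cellSize: number) {
    this.cellSize = cellSize;
  }

  insert(shape: Shape): void {
    const key = Math.floor(shape.x / this.cellSize) + ":" +
                Math.floor(shape.y / this.cellSize);
    let bucket = this.cells[key];
    if (!bucket) {
      bucket = [];
      this.cells[key] = bucket;
    }
    bucket.push(shape);
  }

  query(minX: number, minY: number, maxX: number, maxY: number): string[] {
    const ids: string[] = [];
    const firstCx = Math.floor(minX / this.cellSize);
    const lastCx = Math.floor(maxX / this.cellSize);
    const firstCy = Math.floor(minY / this.cellSize);
    const lastCy = Math.floor(maxY / this.cellSize);
    for (let cx = firstCx; cx <= lastCx; cx++) {
      for (let cy = firstCy; cy <= lastCy; cy++) {
        const bucket = this.cells[cx + ":" + cy];
        if (!bucket) continue;
        for (let i = 0; i < bucket.length; i++) {
          const s = bucket[i];
          if (s.x >= minX && s.x <= maxX && s.y >= minY && s.y <= maxY) {
            ids.push(s.id);
          }
        }
      }
    }
    return ids.sort();
  }
}

const shapes: Shape[] = [
  { id: "a", x: 1, y: 1 },
  { id: "b", x: 12, y: 3 },
  { id: "c", x: 25, y: 25 },
  { id: "d", x: 9, y: 9 },
];

const grid = new GridIndex(10);
for (let i = 0; i < shapes.length; i++) {
  grid.insert(shapes[i]);
}

const fromList = queryList(shapes, 0, 0, 15, 10);
const fromGrid = grid.query(0, 0, 15, 10);
// Both are ["a", "b", "d"]: same answer, different organization and cost.
```

The list is the natural structure for repainting everything in order, while the grid suits hit-testing an event at a point; neither is "the" scene graph, which is the distinction being drawn above.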
Being a pragmatist I think that we do need scene graphs that are close-to-the-metal outside of JS, but their construction, and their behavior should be simple and well defined. It should not be a black box. If this problem is solved for Canvas/Canvas3D there is no stopping us (client-side developers) from exploding the market of rich/desktop applications.
-
Not sure what this means: “A markup language (examples: SVG, HTML) != a universal scene graph.”
Markup languages can represent anything and everything. If there is some other representation capable of representing a ‘universal scene graph’, then it can certainly be represented in a markup language. Maybe SVG and HTML aren’t it, but…

