Two Fridays ago, right in the middle of a brutal two weeks of travel, I stopped in San Francisco and participated in a brainstorming session around embedding Mozilla into other applications. The raw notes are available in the image above, and are a bit blurry. I will try to give some context for them and a quick overview of what we talked about in the meeting and what we think the next steps and priorities should be.
There were a huge number of people at the meeting. Behdad, Stuart, Vlad, Mark Finkle, Brad Lassey, Dave Camp, Doug Turner, Christian Schaller, Wim Taymans were all there, some in person, some on the phone, but everyone was able to contribute. It was a great meeting.
Use Cases
There are some use cases listed in the wiki but we spent most of our time talking about one of the use cases – embedding Mozilla into another application. We didn’t worry too much about building XUL apps or other extension systems. Mostly we’re worried about how to improve our story and developer experience around embedding and allow a vibrant community to develop around embedding.
Goals
From the image you can see that we had a few goals listed. These include:
Create a consistent story
One of our problems is that we don’t really have a consistent story around how to embed. Or at least, we have a story that’s hard to tell. Sometimes you use libxul, sometimes you use the win32 embedding widget, sometimes you use the gtk embedding widget, sometimes you have to reach down into internal interfaces to change things and sometimes you don’t. Having a single story around how to make use of the embedding APIs on your platform and in your environment is one of our goals.
Someone who comes along and wants to do this should be able to find a single location for documentation and examples and also what builds they should be using and what they have to do to ship the libraries along with their product.
Build a community of users and owners
This is probably the most critical goal of this effort. Right now there’s a huge amount of latent demand for Mozilla to have decent embedding APIs and support for this use case. The trick is enabling this community to coalesce around a particular effort and giving them the tools to be successful. This requires leadership in the Mozilla project and a single direction. We also need to create a place for embedders to work. I’m talking about code here – their own set of APIs and interfaces. There’s already dev.embedding and dev.platforms.mobile and other newsgroups where those people hang out but giving a sense of ownership to this crowd and letting them drive requirements and schedules is a very important factor for success.
Note that I call out users and owners separately. There is a much larger number of people trying to use Mozilla in their apps than those who would be willing to invest and drive development. But they are both very important and we need to make sure that we’re addressing both of their needs with owners/developers connecting with users.
No nsI* (An anti-goal?)
The use of nsISupports-based interfaces from embedding code has been problematic for a lot of reasons. First, it’s kind of a pain in the butt to use when concrete classes would, in many cases, do a fine job. They have proven to be pretty fragile as well: API and ABI changes are the most obvious problems, and the refcnt rules are hard for casual users to understand. And last, but not least, in the future we’re likely to be moving the nsISupports system from a refcnt system to garbage collection APIs. This will result in a huge amount of pain for people already using those interfaces and will require a lot of re-education of people who are using our current interfaces. We would like to isolate users from those changes if it’s at all possible.
So building something new on top of those interfaces, isolating users from changes and making things a lot easier to understand feels like the right goal. We suspect that we won’t be able to get away from it entirely but we should be able to get away from most of it. Certainly much more than what people are having to do today to be successful.
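To make the idea a bit more concrete, here is a minimal sketch of what "a concrete class on top of a refcounted interface" could look like. Everything in it is made up for illustration: InternalDocument stands in for some internal refcounted object and Document is a hypothetical class an embedder would hold by value. None of these names exist in Mozilla. The point is only that the AddRef/Release bookkeeping lives in one place, so the internals could switch to something else later without embedders noticing.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an internal, manually refcounted object.
// This is NOT a real Mozilla interface.
class InternalDocument {
public:
    explicit InternalDocument(std::string url) : url_(std::move(url)) { ++LiveCount(); }
    void AddRef() { ++refs_; }
    void Release() {
        assert(refs_ > 0);
        if (--refs_ == 0) delete this;
    }
    const std::string& Url() const { return url_; }
    // Number of InternalDocument objects currently alive (illustration only).
    static int& LiveCount() { static int count = 0; return count; }
private:
    ~InternalDocument() { --LiveCount(); }  // only Release() may destroy the object
    int refs_ = 0;
    std::string url_;
};

// Hypothetical concrete class an embedder would use instead.
// It owns exactly one reference and hides the refcount rules completely.
class Document {
public:
    explicit Document(const std::string& url) : impl_(new InternalDocument(url)) {
        impl_->AddRef();
    }
    Document(const Document& other) : impl_(other.impl_) { impl_->AddRef(); }
    Document& operator=(Document other) {  // copy-and-swap
        std::swap(impl_, other.impl_);
        return *this;
    }
    ~Document() { impl_->Release(); }

    // Returns a value rather than an error code plus out-parameter.
    std::string url() const { return impl_->Url(); }
private:
    InternalDocument* impl_;
};
```

An embedder would just write Document doc("http://example.org/"); and copy or drop it like any other value, never touching a refcount.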
Predictable
We want a set of APIs that are predictable and useful to a huge number of people. And with regular, consumable releases and a roadmap as well. This is an obvious process-based goal that will come more with time and understanding more of our user base for the embedding APIs than a specific technical milestone to hit. So this is just something that we need to be aware of during the process more than an early goal. Something to aim for once we’re to the point where we have something that’s supportable.
Well documented
Someone should be able to pick up one of our regular releases and have a clear roadmap and technical documentation on how to use it. This means pretty easy to use APIs with clear boundaries for the interfaces and people who are interested in writing sample code and tests that show how the code should work.
Covers most use cases
See the notes above about nsISupports. We should be able to hit most use cases with this embedding API and not have to require most people to reach out of the API for common functionality. (Is 95% a good rule? 80%?) It’s hard to tell here what the metric will be for success, but we’ll know more once we understand what people want to do with the API and how useful they find the code that gets written.
A Stable API
One of the main complaints that we’ve heard from people that are trying to embed Mozilla is that our useful interfaces still change from time to time. (While also complaining at the same time that we don’t release often enough, which I find personally amusing.) We think that by creating a stable embedding API that’s based around what people need, as opposed to how Mozilla works internally, that we can create that stability that people require.
However, based on experience everyone agreed that it’s going to take us a while to get to a stable API. It’s nearly impossible to get the right API out of the gate. You just never get it right the first time. So we will have some iteration during early development and will start locking things down once we have a better sense of what people need and what we’ll have to change internally to support our users’ specific use cases. A stable API is a goal, but it’s a longer-term goal. The more that people help us understand their needs and contribute code out of the gate, the faster we will get there.
What would it look like?
There’s a quick sketch from the whiteboard that I put into a graphic.
A quick description of the various boxes:
- The pinkish “Embedding API” is new code. It’s not just hooking up to existing methods. This serves two purposes both technical and social. It isolates embedders from API changes and lets them focus on what matters: making Mozilla easy to embed into their app. And from the social side it gives embedders a place to work and code to own. This is something that’s important enough to call out.
- We want to keep as much of the embedding API as cross-platform as possible. That is, if you’re writing an app so that it works on a few platforms, the amount of code that you need to write for each platform should be as small as possible.
- The pink “Platform APIs” section represents the minimum API required to get your platform up and running. This is roughly equivalent to the current gtk mozilla embedding widget. It makes it possible, but not necessarily convenient, to embed Mozilla into your app.
- The yellowish API boxes represent convenience APIs for each platform. For GNOME this would represent easy to use widgets and things like a friendly gobject DOM access API on top of the underlying Embedding API. For win32 this would be an ActiveX widget. On OSX this might represent a full framework.
- Application code has the option of working with the platform convenience code or can just reach down into the embedding API directly.
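The layering in the boxes above can be sketched in a few lines of C++. This is only a toy under stated assumptions: PlatformSurface, EmbeddedView, RecordingSurface and SimpleHtmlWidget are hypothetical names, and "rendering" is faked by prefixing a string. What it shows is the shape: a small per-platform interface, a cross-platform embedding class written against it, and an optional convenience wrapper that application code may use or bypass.

```cpp
#include <string>
#include <vector>

// "Platform APIs" box: the minimum a platform has to provide (hypothetical).
class PlatformSurface {
public:
    virtual ~PlatformSurface() = default;
    virtual void Present(const std::string& rendered) = 0;
};

// "Embedding API" box: cross-platform code written only against PlatformSurface.
class EmbeddedView {
public:
    explicit EmbeddedView(PlatformSurface& surface) : surface_(surface) {}
    void LoadHtml(const std::string& html) {
        // Stand-in for real layout and painting.
        surface_.Present("[rendered] " + html);
    }
private:
    PlatformSurface& surface_;
};

// A toy platform back end that records what it was asked to present.
class RecordingSurface : public PlatformSurface {
public:
    void Present(const std::string& rendered) override { frames.push_back(rendered); }
    std::vector<std::string> frames;
};

// Convenience layer (the yellowish boxes): bundles a surface and a view
// so simple applications never touch the lower layers.
class SimpleHtmlWidget {
public:
    SimpleHtmlWidget() : view_(surface_) {}
    SimpleHtmlWidget(const SimpleHtmlWidget&) = delete;
    SimpleHtmlWidget& operator=(const SimpleHtmlWidget&) = delete;

    void ShowText(const std::string& text) { view_.LoadHtml("<p>" + text + "</p>"); }
    const std::vector<std::string>& frames() const { return surface_.frames; }
private:
    RecordingSurface surface_;  // declared first so it exists before view_
    EmbeddedView view_;
};
```

Application code can call SimpleHtmlWidget::ShowText and be done, or construct an EmbeddedView over its own PlatformSurface when it needs more control.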
Areas of Work
We identified a huge number of areas of work to start on. These are roughly prioritized.
Part One
- HTML Viewer – Rendering a chunk of HTML. We already do this quite well, but it needs some nice interfaces around it.
- Setting up proper libs + headers – Need to have a decent headers + libs setup. We might be able to build on the current SDK work that’s in XULRunner for this but it’s important to make things consumable.
- Profiles – This is the big one that’s going to take a lot of work to fix. Our current profile interfaces are basically “here’s a directory, do what you want!” This has actually been great in one sense because we don’t need to deal with structured storage. But we would love to be able to run without a profile at all – that’s one of the main use cases around widgets and dashboard. And fixing that is going to be a decent amount of slogging through the codebase.
Part Two
- Download a Page – Downloading a static page. This might be called “easy access to our Networking APIs” and is something that people have wanted forever and ever. Don’t need profile access for this. Should respect PAC and have a dead simple interface.
- Capture a Page (static) – This is basically capturing a page and rendering it to an image or surface. Pretty important for transitions for 3D environments that want to embed and render. But also only a first step.
- HTML to PDF – Point a URL at a simple interface and it spits out a PDF. Once again, this is something that people have wanted for a long time and is probably pretty easy to do given our cairo support.
- Extend JS Context – Lots of people want the easy ability to extend the JS context that’s used in content to enable all kinds of things. There’s a lot of work to be done here, but it shouldn’t be too hard. Just a set of convenience methods. Things that might use this include embedding apps that want to interact with the web in a different way or Dashboard-style widgets. (JS-ctypes, anyone?)
- Call into JS – There’s been a lot of interest in calling into JS that’s included in a page that’s been loaded into a document. We should make this pretty easy.
Part Three
- HTML Editing – Basically making it easy to build an easy to use HTML editor for inline small stuff as well as pages. We already have code that does some of this, it’s just a question of making things cheap and easy. See Profiles in Part One above.
- Capture a Page (live) – This is a continuation of the capture a page listed above except that it should support dynamic updates. This is what Skyfire and others do today, but we should have nice interfaces to make it easy.
- Web Runtime / Dashboard Widgets – Every mobile vendor under the sun is building little local apps using bundles of web apps with extended JS attributes for access to things like GPS, Camera, Address Book info, etc. We’re actually in a fantastic position to support this as we already do something similar with XUL Add-ons and have a much more rich language for this than everyone else (XUL for the win again.) We just need to come up with a way to match that up with extending JS (see above) and a packaging system. Once again, borrowing heavily from Dashboard widgets or the work of others is important.
- UI Layout with Native widgets – This is something that WebKit does today and is pretty nice. (We already do some of this with XUL but it’s a little bit different.) Basically pulling in native widgets into the layout and using web-like semantics to lay out your application. Not even sure what’s required here and needs a lot more scoping. But it’s clear that people are using web rendering engines for rendering entire apps now, and not just in the XUL sense.
- Parse a document without display – I like to call this “use jQuery to parse HTML on your server.” Basically we should be able to allow people to download documents, parse them and have a full DOM set up without actually having it connected to a drawing surface. Could also be the basis for doing dynamic updating and page capture as listed above.
- DOM Interfaces (IDispatch, NPAPI) – This is an important item. Basically as an embedded app, how do you get access to the internal DOM of a page and manipulate it? Right now you can do this with our current interfaces but it’s one of the places where things break very often. We need to pick something stable based on the DOM and expose it in a stable, supported fashion.
- Networking: SSL Certs – Getting access to the certs database for an existing profile and also hooking up so that you can backend our current cert code with your own database if required. (Some people are doing the latter today.) Also having the right UI prompts in place so you can replace them with your own and not just accept things blindly as some have done.
- Networking: Cookies – Getting easy access to the cookie database or being able to replace our cookie backend with your own for storage. Also part of the first order profile work.
- Networking: Cache – Running without a cache (actually easy to do today by setting a pref!)
- Load Interception – If a document loads another sub-document (i.e. a script or an image or another page) you should be able to catch that event and load it yourself from the right resource. Once again, we think that WebKit does this today and I’ve personally seen requests for it quite a few times.
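As one example of how small some of these interfaces could be, here is a rough sketch of load interception. It assumes a hypothetical ResourceLoader with a single optional callback. LoadResult, LoadInterceptor and the "network:" string returned by DefaultFetch are all invented for illustration and stand in for whatever the real networking layer would do.

```cpp
#include <functional>
#include <string>
#include <utility>

// Result of asking the embedder about a load (hypothetical type).
struct LoadResult {
    bool handled;      // true if the embedder supplied the content itself
    std::string body;  // content to use when handled is true
};

using LoadInterceptor = std::function<LoadResult(const std::string& url)>;

// Hypothetical loader: asks the interceptor first, then falls back to the default path.
class ResourceLoader {
public:
    void SetInterceptor(LoadInterceptor interceptor) { interceptor_ = std::move(interceptor); }

    std::string Load(const std::string& url) const {
        if (interceptor_) {
            LoadResult result = interceptor_(url);
            if (result.handled) return result.body;
        }
        return DefaultFetch(url);
    }
private:
    // Stand-in for the real networking stack.
    static std::string DefaultFetch(const std::string& url) { return "network:" + url; }

    LoadInterceptor interceptor_;
};
```

An embedder that ships its own resources could register a callback that claims, say, every URL starting with "app://" and lets everything else fall through to the default path.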
Part Four
- Full Networking Replacement – Kind of a crazy wishlist item. Dropping in your own networking stack. No idea if we’ll get here.
- Browser – And last but not least given all the tools above you should be able to use our embedding APIs above to implement your own browser without reaching too far into the internal interfaces.
Copy an API or build a new one?
I don’t think that there was anyone in the room that had an attachment to building a completely new API. Far from it, in fact. If something hits the goals listed at the top of this post and is able to leverage the work that someone else has done, that’s great. For example, copying the WebKit API or the MSHTML/WebBrowser Control is certainly on the table. Probably copying in the sense of looking at the types of calls and functionality, not the specific style or variable names or anything. Some of those interfaces are probably pretty scary looking.
Note that trying to be a drop-in replacement for WebKit or the MSHTML/WebBrowser Control is not on the table. Therein lies madness. You end up chasing compatibility instead of just trying to make something that works really really well. But we can learn what works well from them and what doesn’t and hopefully apply that to our new embedding interfaces.
C vs. C++
This was kind of a side-discussion at the end and isn’t new to anyone. A few things to note here that came out of it, though. First, C++ has gotten a lot better over the years in terms of compiler sanity and stability. We’ve stayed away from it in the past in some places because when we started 10 years ago the compilers were pretty bad. (i.e. our C++ doesn’t look a lot like C++ – not a lot of stdc++ or exceptions or anything else particularly fancy.) Also, it seems like a lot of the ABI issues with C++ have gotten better as well. Both because of new test suites on Linux/UNIX but also because things have just gotten better with time. So a lot of the reasons not to use C++ have largely been mitigated with time.
C is a nice lowest common denominator. It’s easy to bind to other languages and it’s easy to build stable interfaces in it. But it’s also pretty dissimilar to the way that everyone else is embedding if we’re interested in leveraging other APIs and other experiences. Also, if there are people who want C APIs it’s pretty easy to build them on top of what we’re looking to do.
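For what it’s worth, the usual way to layer a C API on top of a C++ one is an opaque handle plus a handful of extern "C" functions. The sketch below assumes a made-up C++ class called View and a made-up embed_view_* prefix. Neither is a real Mozilla name. Exceptions are caught at the boundary so nothing C++-specific leaks out to C callers or language bindings.

```cpp
#include <string>

// Hypothetical C++ embedding class; the name and methods are invented.
class View {
public:
    void LoadUri(const std::string& uri) { uri_ = uri; }
    const std::string& CurrentUri() const { return uri_; }
private:
    std::string uri_;
};

extern "C" {

// Opaque handle: C callers only ever see a pointer to it.
typedef struct embed_view embed_view;

embed_view* embed_view_new(void);
void embed_view_free(embed_view* view);
int embed_view_load_uri(embed_view* view, const char* uri);          // 0 on success, -1 on error
const char* embed_view_current_uri(const embed_view* view);         // valid until the next load or free

}  // extern "C"

// The handle simply wraps the C++ object.
struct embed_view {
    View impl;
};

embed_view* embed_view_new(void) {
    try {
        return new embed_view();
    } catch (...) {
        return nullptr;  // never let an exception cross the C boundary
    }
}

void embed_view_free(embed_view* view) { delete view; }

int embed_view_load_uri(embed_view* view, const char* uri) {
    if (!view || !uri) return -1;
    try {
        view->impl.LoadUri(uri);
        return 0;
    } catch (...) {
        return -1;
    }
}

const char* embed_view_current_uri(const embed_view* view) {
    return view ? view->impl.CurrentUri().c_str() : "";
}
```

The declarations inside the extern "C" block are what would go into a plain C header. The rest stays in a C++ source file.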
Note that this isn’t decided yet, but I bet I can guess which way it’s going to go.
Next Steps
Our next step is pretty clear: We need to start writing up code and interfaces and start assigning ownership. Some of that will come from Mozilla but there’s a huge amount of external work being done on these items by a vast number of parties. Giving them a place to work and a place to post patches is a great first start. I’ll make a separate post about that once we have things up and running. But really I think we should stop talking and start coding and see where we can go.
To that end, we’ll be having another Mozilla Embedding Meetup in Mountain View on May 8th and 9th (Thursday and Friday.) It’s part of a Mozilla “work week.” Mark Finkle, Pelle Johnsen, Dave Camp, Vlad, Stuart and a bunch of other people will be around and will be hacking on some of this stuff. If you’re in the area and you’re interested come on by. It’s an open meeting. We’ll try and put together some more structured meetings with phone dial-ins, too. (I’ll do a separate post about this as well to give it more visibility.)
You can also join us on #embedding on irc.mozilla.org. It’s a quiet channel right now, but it’s starting to pick up steam. Once we start writing code it will probably start seeing more traffic. Lurk there if you want, we don’t mind.
More good stuff to come!


Wow… that was like reading the design notes from Spyglass in 1996 or so. They had gone through the wringer with Microsoft and Mozilla and were trying to figure out where to go. It was pretty much the same ‘layout’ and such but a lot of the items were way ahead of the hardware/infrastructure at that time (having to craft their own widgets etc and build a TCP/IP stack because well the gasoline pump wouldn’t have it.)
Oh if you go for the network stack and such… make sure you have an intelligent agent to keep track of each of the sub-agents. In some ways you end up with a miniature OS to make sure that your various ‘stacks’ are operating correctly etc.
In the light of extending the JS context. You might also want to add having the ability to add new XML languages (like XFORMS). In other words adding XTF support to the Embedding API. Right now you’ll have to use the XPCOM Component Manager for that.
C vs. C++: You’re talking about embedding here, so in some cases that might mean going backwards in terms of compilers/tools wrt the standard today. Also when you need to integrate from a completely different language, binding problems can be huge. So it would be good if those who for some reason really need it could avoid using C++, like having a C interface for platform code and C++ for embedding.
I have tried encapsulating some of the core DOM interfaces into STL/stdc++ like ones. It seems to work pretty well. I have iterators for collections and lists, and everywhere I use std::string. Also I have tried to mimic the javascript naming and interface as much as possible. So I return values instead of just the error result and so on.
It is not complete but it is good enough for my purpose, and perhaps the ideas could be used in this context. I also have had some wild thoughts about whether it would be possible to extend the XPIDL “compiler” to also generate those STL like interfaces automatically.
Martin –
Did you want to post that to the .embedding newsgroup?
I’m not quite sure you’ll get this, but I’m getting desperate.
After much research I’m still at a loss for the best way to embed a web browser in a GTK application. I have a Gtk::Layout (GtkLayout in C) in which I would like to place a web browser with no end user navigation outside of following links within the page displayed (i.e. no toolbars, status bars, file menu, etc…). The web pages specifically are written using the ExtJS framework with PHP controllers.
I tried using Gecko, but I keep running into dead ends saying “Use XUL for all new embedding!” and then I find no way of embedding XUL in my GtkLayout. So I tried your gtkmozembed widget, which is incompatible with Gtk+ 3, and I’d rather not rewrite your code for a simple web browser (not even a full browser, just a window!). Is there any chance you still maintain any of this, or know anyone who does, or know how to embed XUL into a gtk widget, or have ANY suggestions? If so I would greatly appreciate it.
I’ve posted a stackoverflow question with more details if you’re interested…
http://stackoverflow.com/questions/7771942/webkitgtk-wont-cant-find-post-information