the other side of twitter failures

Update: I just added twitter.com to /etc/hosts and pointed it at a site that doesn’t have a webserver. Works for now until twitter comes back.

Having some lunch and I thought it might be worth a small post while my burrito cools.

I just had to disable polling on whoisi because twitter is down. Again. Whoisi’s polling system, in case you were wondering, is basically as dumb as a wooden post right now. I’m not trying to pretend that I thought it would work forever, or that it was very good. But it works well as long as the internet is pretty healthy and the failures are spread evenly among sites on the web.

Here’s how the poll system works right now: every site in the database gets refreshed every n minutes, where n is a random number between 1 and 30. That’s it. And it does that for everything. No backoff, no per-site limits, etc. It’s easy to plug that kind of thing into the code, but it’s yet another thing on the “not yet done” list. Designed to be smart, but without the brains behind it.
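To make that concrete, here is a minimal sketch of the scheduling rule in Python. It is not whoisi's actual code: the function names are made up, and it assumes n is a whole number of minutes. The second function shows roughly what a backoff could look like; the doubling rule and the one-day cap are assumptions for illustration only.

```python
import random

MIN_MINUTES = 1
MAX_MINUTES = 30


def next_refresh_delay(rng=random):
    """Minutes until the next refresh: a random whole number from 1 to 30.

    This is the rule described above, applied to every site the same way.
    """
    return rng.randint(MIN_MINUTES, MAX_MINUTES)


def next_refresh_delay_with_backoff(consecutive_failures, rng=random,
                                    cap_minutes=24 * 60):
    """Hypothetical backoff variant (not something whoisi does today).

    Doubles the random delay for each consecutive failure, capped at
    cap_minutes so a dead site is still retried eventually.
    """
    base = rng.randint(MIN_MINUTES, MAX_MINUTES)
    return min(base * 2 ** consecutive_failures, cap_minutes)
```

With something like the second function, a site that keeps timing out would get polled less and less often instead of taking up a slot every few minutes.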

You also have to understand how jobs are run. Jobs come from two sources: the “master service” (which I’ll describe in a later post) and the web site. But they all run through the same job queue. So when you try to add a new person, the site goes out and makes a little preview of that person’s site. That job has to compete with the site refreshes that are also underway. The limit on the number of jobs that can run at the same time is also dumb. Right now it’s 50 at once. Not 3/sec or 50 waiting for I/O, just 50 in progress.
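A rough way to picture the shared queue, sketched in Python. The thread pool here is an assumption used to model "N jobs in progress"; it is not how whoisi is actually built, and every name below is hypothetical. The demo shrinks the numbers so the effect is visible quickly: slow refresh jobs fill the slots and a quick preview job has to wait behind them.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_PROGRESS = 50  # the fixed cap described above


def run_jobs(jobs, max_in_progress=MAX_IN_PROGRESS):
    """Run callables through one shared pool; results in submission order."""
    with ThreadPoolExecutor(max_workers=max_in_progress) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [future.result() for future in futures]


def preview_wait(max_in_progress, slow_jobs=4, slow_seconds=0.2):
    """Seconds a quick preview job waits behind slow refresh jobs."""
    start = time.monotonic()

    def slow_refresh():
        # Stands in for a refresh that hangs until it times out.
        time.sleep(slow_seconds)
        return "refresh"

    def preview():
        # Report how long this job sat in the queue before starting.
        return time.monotonic() - start

    results = run_jobs([slow_refresh] * slow_jobs + [preview],
                       max_in_progress)
    return results[-1]
```

Calling `preview_wait(2)` makes the preview wait for two rounds of slow jobs, while `preview_wait(50)` lets it start almost right away. Swap in a few hundred hanging refreshes against 50 slots and you get the same picture at full scale.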

So when you’re polling a few hundred twitter accounts and every one of them fails by timing out, the queue gets backed up. Given how many people are adding accounts right now, I decided it was more important for the site to stay responsive than for things to refresh instantly. It’s a tough choice, but that’s how it is until twitter recovers from whatever its latest pain is.

I wish that twitter would fail by giving an immediate 500 or even a connection refused. The slow death of waiting for a response is basically the worst possible thing that can happen. Fail faster. Please.
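On the client side, the usual defense is a short timeout on every request. A minimal sketch using Python's standard library follows; `fetch_feed` and its 5 second default are made up for illustration and say nothing about what whoisi really uses.

```python
import http.client
import urllib.error
import urllib.request


def fetch_feed(url, timeout_seconds=5.0):
    """Return (status, body), or (None, None) if the request failed.

    The timeout applies to each blocking socket operation (connecting,
    waiting for data), not to the request as a whole, so a server that
    trickles bytes can still hold the job longer than timeout_seconds.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The server answered with an error status such as 500.
        # That is the fast, clear failure the paragraph above asks for.
        return err.code, None
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, or a mangled response.
        return None, None
```

An immediate 500 or a refused connection comes back in milliseconds. A hung server costs the full timeout every time, which is why slow failures are so much worse for a queue with a fixed number of slots.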

Not that I should throw stones for even a second, given how dumb my code is. But it’s a lesson in what happens when a (dare I say important?) service dies.

  1. Sérgio Veiga

    Hey Chris, congratulations on the great job you did with whoisi.com!

    We are also getting into the diso world, and one of the first ideas we wanted to do is exactly what whoisi is :), great you did it first.

    One of the things we had thought of for our service, which we believe would fit perfectly into whoisi, is an open avatar API: yes, like gravatar, but wiki-style, and keyed not on emails but on names and nicknames, the power of aliases that you implemented already :)

    We believe this would be a very cool service and since it would be completely open, will definitely be one more small step to create a real open data hub.

    If you are interested please contact us, so we can start immediately working on it :)

    P.S – You need to implement an api ASAP :)

  2. James

    Twitter also returns 200 pages for errors instead of 500, which means I get crap instead of nothing, which confuses my RSS code no end.