Making Twitter Better

I think that twitter is a fantastic service and has a bright future. However, like a lot of new things, the question of whether it will flourish or perish is really all down to how the growth is managed, planned and executed.

I should point out that I don’t know the people at twitter at all and this is very much an outsider’s opinion. I have been running a business for about nine years, and while it is nowhere near as successful as twitter, I’ve definitely learned some hard lessons. I am not complaining; I am, however, voicing some ideas on how things could be made better.

My experience also includes working very closely with megabus.com, which grew from a fledgling website 6 years ago to what it is today, serving over 100,000 visitors every day.

My gut instinct about Twitter is that the guys and gals are working hard to deliver one really good service really well. However, it is of a size now where service delivery should be happening in the background with little or no effort.

When megabus.com first launched and over the first couple of years, we spent a lot of time managing the hardware, software and processes till we got it right. It went through a dramatic re-architecture in 2005 and since then, the management time has dropped dramatically.

To take twitter to the next level so that it can be bigger than facebook, in my opinion, requires twitter to do a lot of things:

Reliability & Performance

I don’t know the architecture / infrastructure of twitter but, having used it fairly heavily over the last few days, I have noticed intermittent outages. This has to be solved, not just in the short term but in the medium and long term. Twitter has to be a service that just works. All websites suffer glitches and outages, but the mean time to failure needs to be a lot higher, and it should be cheap and cost effective to scale.

TwitApplications

There are a lot of services and applications that link into twitter. I consistently use tweetburner and tweetdeck, and have looked at / considered a range of other services / applications. While the wiki page can point someone in the right direction, this needs to be integrated better into twitter itself.

Facebook really took off and removed bebo and myspace as competitors, in my opinion, the day it introduced facebook applications.

It should be a different process from facebook, as facebook applications are of a different breed and aimed at a different target market. Twitter simply needs to make it easier for applications to integrate, to solve two problems:

  1. Easy launchpad to add them in and use them
  2. Remove the need to provide the twitter username/password in other websites. I currently have to do this with tweetburner to post directly which makes me very uncomfortable.

Accessibility

I am not talking about making it easier for people with disabilities to access the site. I am talking about people who are not technically savvy or, more importantly, twitter savvy.

I joined twitter a while back and just felt a bit lost. There was no guidance as to what a tweet was, what it meant to be a follower or what it meant for people to follow you.

It took an article in a magazine explaining it to make it easier for me to understand and re-boot my twitter life.

The Help & Support pages are good and useful, but they should not be necessary if help and support are present throughout the site. Facebook does this well and makes it easy to learn and do new things. It does not need to be idiot proof but it does need to have just enough information for a newbie to get started.

There are numerous blogs, articles and websites that cover this information but that means that someone has to spend enough effort getting out there and finding out.

This can be difficult when you don’t know what you are searching for as well.

Functional Integrations

There are several integrations that would be useful. There are websites that do some of these things but it would be useful to have them integrated within the site. Examples include:

  • Easy way to see the last tweet of all the people you are following / your followers
  • Popularity of the people you are following / your followers
  • Group people, so that you can follow people who blog about different things but read them together

Conclusion

From my perspective, this is of course a starting point, the tip of the iceberg. Twitter is involved in a lot of new things but without the soft aspect, I think it is making its life harder than it has to be to reach the masses.

Making Twitter Faster

From my perspective, Twitter has a really interesting technical problem to solve: how to store and retrieve a large amount of data really quickly.

I am making some assumptions based on how I see twitter working. I have little information about how it is architected, apart from some posts that suggest it is running Ruby on Rails with MySQL.

Twitter is in the rare category where a very large amount of data is being added. There should be no updates (except to user information, but there should be relatively few of those). There is no need for transactionality. If I guess right, the workload should be mostly a large number of inserts and selects.

While a relational database is probably the only viable choice for the time being, I think that twitter can scale and perform better if all the extra bits of a relational database system were removed.

I love challenges like this. Technical ones are easier 😉

If I didn’t have a lifetime job, I would prototype this in a bit more depth. Garry pointed me in the direction of Hadoop. Having had a quick look at it, it can take care of the infrastructure, clustering and massive horizontal scaling requirements.

Now for the data layer on top: how to store and retrieve the data. HBase is probably a good option but doing it manually should be fairly straightforward too.

From my limited understanding of twitter, there are two key pieces of functionality, the timelines and search.

The timelines can be solved by storing each tweet as a file within a directory structure. My tweets would go into

/w/o/r/d/s/o/n/s/a/n/d/<tweet-filename>

The filename would be <username>-<timestamp>
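As a rough illustration of the layout above, here is a minimal Python sketch. The function names (user_dir, store_tweet, read_timeline) and the idea of a configurable root folder are my own assumptions for the example, not anything twitter actually does. It splits the username into one directory per character and writes each tweet as a file named <username>-<timestamp>.

```python
import os


def user_dir(root, username):
    """Return the one-folder-per-character directory for a user, e.g. root/w/o/r/d/s/..."""
    return os.path.join(root, *username.lower())


def store_tweet(root, username, timestamp, text):
    """Write one tweet as a file named <username>-<timestamp> and return its path."""
    directory = user_dir(root, username)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{username}-{timestamp}")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


def read_timeline(root, username):
    """Return (timestamp, text) pairs for one user, newest first."""
    directory = user_dir(root, username)
    if not os.path.isdir(directory):
        return []
    prefix = f"{username}-"
    tweets = []
    for name in os.listdir(directory):
        full = os.path.join(directory, name)
        stamp = name[len(prefix):]
        # Skip sub-directories that belong to longer usernames sharing this prefix.
        if os.path.isfile(full) and name.startswith(prefix) and stamp.isdigit():
            with open(full, encoding="utf-8") as handle:
                tweets.append((int(stamp), handle.read()))
    return sorted(tweets, reverse=True)
```

Calling store_tweet(root, "wordsonsand", 1236158897, "hello") would create the file under root/w/o/r/d/s/o/n/s/a/n/d/. Since tweets are only ever added, the sketch never needs to update a file in place.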

For the public timeline, you just have a similar folder structure, but keyed on the timestamp. For example, a tweet with the timestamp 1236158897 would go into the following structure as a symlink:

/1/2/3/6/1/5/8/8/9/7/<username>
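The public timeline could be sketched along the same lines. Again, the names here (publish, tweets_in_second) are hypothetical, and I am assuming a filesystem that supports symlinks (Linux; on Windows os.symlink may need extra privileges). Each digit of the timestamp becomes a folder, and the entry inside is a symlink named after the user that points back at the original tweet file.

```python
import os


def public_link_path(root, username, timestamp):
    """Return root/1/2/3/.../<username> for the given timestamp."""
    return os.path.join(root, *str(timestamp), username)


def publish(root, tweet_path, username, timestamp):
    """Symlink an existing tweet file into the public timeline and return the link path."""
    link = public_link_path(root, username, timestamp)
    os.makedirs(os.path.dirname(link), exist_ok=True)
    # Raises FileExistsError if the same user already has a tweet in this second.
    os.symlink(os.path.abspath(tweet_path), link)
    return link


def tweets_in_second(root, timestamp):
    """Return (username, text) pairs for every tweet published in one second."""
    directory = os.path.join(root, *str(timestamp))
    if not os.path.isdir(directory):
        return []
    found = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if os.path.islink(full):
            with open(full, encoding="utf-8") as handle:
                found.append((name, handle.read()))
    return found
```

One limitation of naming the link after the user alone is that two tweets from the same user in the same second would collide, so a real version would need a finer timestamp or a different link name.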

For search, pick up each word in the tweet and pop the tweet as a symlink into a folder for that word. You could have a folder per word or follow the structure above.

/t/w/i/t/t/e/r/<username>-<timestamp> OR

twitter/<username>-<timestamp>
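Here is a small sketch of the folder-per-word variant, under the same assumptions as before (symlinks available, hypothetical function names). How a tweet is split into words is my own guess: it lowercases the text and keeps runs of letters and digits, which is far cruder than a real search engine would be.

```python
import os
import re

WORD = re.compile(r"[a-z0-9]+")


def words_in(text):
    """Return the distinct lowercase words in a tweet."""
    return set(WORD.findall(text.lower()))


def index_tweet(index_root, tweet_path, text):
    """Symlink the tweet file into one folder per word it contains."""
    name = os.path.basename(tweet_path)
    target = os.path.abspath(tweet_path)
    for word in words_in(text):
        word_dir = os.path.join(index_root, word)
        os.makedirs(word_dir, exist_ok=True)
        link = os.path.join(word_dir, name)
        if not os.path.lexists(link):
            os.symlink(target, link)


def search(index_root, word):
    """Return the <username>-<timestamp> names of tweets containing the word."""
    word = word.lower()
    if not WORD.fullmatch(word):
        return []
    word_dir = os.path.join(index_root, word)
    if not os.path.isdir(word_dir):
        return []
    return sorted(os.listdir(word_dir))
```

A search for one word is then just a directory listing. Searching for several words would mean intersecting the listings, which this sketch leaves out.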

You would then have an application running on top, with a distributed cache and an API that makes access to the data easier than direct file access. Running on Linux, the kernel will take care of a large part of the caching and buffering automatically, as long as there is enough RAM on the box.

This can in theory be done without Hadoop in between, by separating the directory structures across multiple servers, but that can have complications of its own, especially with adding and removing boxes for scalability.

You are also likely to run into limits on the number of files / sub-directories but they can be solved by ‘archiving’ – multiple options for that too…

Thinking about this problem brought me back to the good old days of working on the search mechanism within megabus.com. We needed the site to deal with a large number of searches on limited hardware when the project was still classified as a pilot.

With some hard work and experimentation, we were able to reduce the search time to a tenth of the original time.

I’ll admit that I don’t know the details or the intricacies of the requirements that twitter has. I have probably over-simplified the problem but it was still fun to think about. If you can think of problems with this – let me know; I wanna turn them into opportunities 😉