As visitor traffic wanes, the social networking site has turned to massively parallel database analytics to get inside the minds of its members
MySpace was one of the first social networks to penetrate
mainstream consciousness, and between 2006 and 2008 it had more members than
any other. In July 2005, the site was acquired by News Corporation for $580
million, and celebrated its 100 millionth member soon after.
Since then, however, MySpace has been rather
eclipsed by rival Facebook. According to web traffic analysis site Compete.com,
Facebook attracts around 120 million unique visitors per month and is still
growing, while MySpace’s figure of around 70 million visitors per month
is gradually falling.
Back in 2008, when Facebook first overtook it in
visitor traffic, MySpace decided to focus on what made it unique: the number
of bands and singers that use the site to promote themselves, and the fact that
visitors therefore use it to discover new music.
That places discovery – the ability for users to
find music they don’t know but may like based on their existing interests – at
the heart of its comeback strategy. “Facebook is about people you already know;
LinkedIn is about your professional contacts,” says Don Watters, MySpace's
chief data architect. “But MySpace is about finding people and music that you
may not know yet. Our motto is ‘Discover and be discovered’.”
To enhance the potential for discovery on its
site, MySpace had to become better acquainted with the tastes and behavioural
patterns of individual members. Analysing the precise tastes and social
connections of each user requires vast quantities of data – it’s not good
enough to take a sample of the data and to generalise the findings to all
users. “It’s incredibly important that we are not just looking at a set of the
data,” says Watters.
At the time, MySpace’s data warehouse was not up
to the task. “It had been built when we were growing like crazy, and the main
thing was keeping the servers up,” says Watters.
In its search for new data warehousing
technology, the company evaluated technology from vendors including Teradata
and Netezza. However, says Watters, “we didn’t think any of it could scale
according to our needs.”
Instead it turned to Aster Data, whose “massively
parallel” database technology is based on Google’s MapReduce distributed
analytics engine, and therefore operates on clusters of cheap, commodity
hardware. That, Watters explains, means that the cluster can be scaled up
cost-effectively and in a granular fashion.
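For readers unfamiliar with the MapReduce pattern, the idea can be illustrated with a minimal single-machine sketch. It is not Aster Data's or MySpace's code: the listening events, the three-way partitioning and the function names below are all hypothetical, and a real cluster would run the map step on separate servers rather than in a loop.

```python
from collections import Counter
from functools import reduce

# Hypothetical (user, genre) listening events, split across three "nodes".
partitions = [
    [("alice", "indie"), ("bob", "metal")],
    [("alice", "indie"), ("carol", "jazz")],
    [("bob", "metal"), ("bob", "indie")],
]


def map_partition(events):
    """Map step: each node counts the (user, genre) pairs in its own slice."""
    return Counter(events)


def merge_counts(left, right):
    """Reduce step: combine the partial counts produced by two nodes."""
    return left + right


def run_job(parts):
    # On a real cluster each partition would be processed on a different server.
    partials = [map_partition(p) for p in parts]
    return reduce(merge_counts, partials, Counter())


totals = run_job(partitions)
```

Because each partition is counted independently and the partial results are only merged at the end, adding servers means adding partitions, which is the granular scaling Watters describes.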
Today, MySpace’s data warehousing cluster
consists of 120 servers and contains 190 terabytes of data. As well as
supporting the music and content recommendations on the site, the deployment
has allowed MySpace
to introduce such functionality as audience analytics for bands and their record labels.
Clearly, however, it has not yet been enough to
stem the gradual decline in MySpace’s visitor numbers. And while MySpace has
so far managed to generate greater advertising revenue than Facebook despite
having fewer visitors, that too is reported to be on the wane.
It is nevertheless a good idea to keep one eye on
the giants of the web as they wrestle with their data problems, because there
is every chance that mainstream businesses will be faced with similar
challenges before long.
SOURCE: http://www.information-age.com/channels/information-management/it-case-studies/1267988/myspace-taps-big-data-for-turnaround.thtml