Want to know how Apple's (AAPL) Genius song recommendation system for iTunes works? Apple engineer Erik Goldman offered some insights to users of the question-and-answer service Quora in a post back in May. While Goldman's post has since been deleted, Christopher Mims covered it in an MIT Technology Review story on Wednesday. Goldman's answer on Quora offered a sneak peek into the way big data analytics and aggregated personal information combine to personalize song recommendations and create custom content for iTunes customers. The Genius service boosts revenue for Apple, but insights into its workings could also benefit Web users as a whole.
Recommendation engines are the key to showing the entire Web on small devices, such as mobile phones, and to creating a hyperpersonalized surfing experience. For consumers, the Web has opened up billions of opportunities to find content, with much of it contained in the so-called long tail made famous by Wired's Chris Anderson. But mere mortals can't filter through all the possibilities to discover what the heck they want to read, watch, or listen to. Hence the popularity of recommendation engines and discovery services from such companies as Amazon.com (AMZN), Apple, Netflix (NFLX), and even Google (GOOG).
The heart of the Genius recommendation system is statistics applied to a large amount of data. The initial goal is to take an individual's playlist, measure how often certain elements (such as an artist) appear in it, and determine how significant each element might be in making a recommendation. To do that, the algorithms check the frequency of those elements in other Genius users' playlists to see which ones occur widely and which ones don't. Elements that show up almost everywhere say little about a listener's taste, while rare ones say a lot. This allows the system to compare playlists between people who like the same obscure bands rather than trying to draw conclusions based on the hundreds of millions of playlists that include Lady Gaga's Bad Romance.
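The weighting idea described above resembles the term-frequency, inverse-document-frequency scheme used in text search. The article does not say what formula Apple uses, so the following is only a minimal sketch under that assumption. The function name `artist_weights`, the smoothed logarithm, and the sample playlists are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def artist_weights(playlist, all_playlists):
    """Score each artist in `playlist`: high if frequent here, low if common everywhere.

    Assumption: a TF-IDF style weight stands in for whatever Genius really does.
    """
    n = len(all_playlists)

    # Count how many playlists contain each artist at least once.
    doc_freq = Counter()
    for other in all_playlists:
        doc_freq.update(set(other))

    counts = Counter(playlist)
    total = len(playlist)
    weights = {}
    for artist, count in counts.items():
        tf = count / total  # share of this playlist taken up by the artist
        # Smoothed inverse frequency: zero when the artist is in every playlist.
        idf = math.log((1 + n) / (1 + doc_freq[artist]))
        weights[artist] = tf * idf
    return weights


# Hypothetical data: one artist appears in every playlist, another in only two.
playlists = [
    ["Lady Gaga", "Obscure Band"],
    ["Lady Gaga", "Other Act"],
    ["Lady Gaga", "Obscure Band", "Third Act"],
]
print(artist_weights(playlists[0], playlists))
```

In this toy data the artist found in every playlist gets a weight of zero, so only the rarer artist contributes to any comparison between listeners.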
The second part of the approach relies on finding rules the recommendation engine can apply to your playlist to reduce the amount of data it must cycle through. These are the so-called latent factors. Christopher Mims writes:
Latent factors are what shake out when you do a particular kind of statistical analysis, called a factor analysis, on a set of data, looking for the hidden, unseen variables that cause the variation in all the different variables you're examining. Let's say that the variability in a dozen different variables turns out to be caused by just four or five "hidden" variables—those are your latent factors. They cause many other variables to move in more or less lock-step.
Discovering the hidden or "latent" factors in your data set is a handy way to reduce the size of the problem that you have to compute, and it works because humans are predictable: People who like Emo music are sad, and sad people also like the sound tracks to movie versions of vampire novels that are about yearning, etc. You might think of it as the mathematical expression of a stereotype—only it works.
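To make the idea of a few hidden variables driving many observed ones concrete, here is a small sketch. It uses a singular value decomposition of centered data (principal component analysis), a related but simpler technique than the factor analysis Mims describes, and nothing here reflects Apple's actual code. The function name `latent_projection`, the synthetic listener data, and the choice of two hidden factors are assumptions made for illustration. It requires NumPy.

```python
import numpy as np


def latent_projection(data, k):
    """Project rows of `data` onto the top-k directions of variation.

    Returns the reduced data and the share of total variance those k directions hold.
    Assumption: SVD-based PCA stands in for a full factor analysis.
    """
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    reduced = u[:, :k] * s[:k]  # coordinates of each row in the k-dimensional space
    return reduced, explained[:k].sum()


# Synthetic example: 50 listeners, 12 observed variables, 2 hidden taste factors.
rng = np.random.default_rng(0)
tastes = rng.normal(size=(50, 2))        # hidden factors per listener
loadings = rng.normal(size=(2, 12))      # how each factor drives each variable
observed = tastes @ loadings + 0.01 * rng.normal(size=(50, 12))

reduced, share = latent_projection(observed, k=2)
print(reduced.shape, round(float(share), 4))
```

Because the twelve observed columns were generated from only two hidden ones, two directions capture nearly all of the variation, and the problem shrinks from twelve numbers per listener to two.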