Cracking the code

While idly browsing through the Daily Telegraph’s lists of the 100 books and movies that defined the noughties (here and here, respectively), I couldn’t help but notice a very interesting discrepancy. I had seen (if not always watched) 39 of the 100 movies on the list but had read only 2 of the 100 books: one of them a rather bland novel that had me simply shrug my shoulders (The No 1 Ladies’ Detective Agency by Alexander McCall Smith) and the other a complete atrocity against literature and civilization in general (none other than The Da Vinci Code).

Now this had me scratching my head in puzzlement.

It could be that my personal criteria for approaching books and movies are somehow very different: that I am a lot less discerning when it comes to movies than I am when choosing which book to read next, and that my taste in literature is consequently a lot more snobbish and eclectic than my taste in movies. It is also true that while it seldom takes longer than 2 hours to see a movie, there are relatively few books that can be consumed in less than a couple of days (unless you’re Harold Bloom, of course, in which case you can read War and Peace in the time it would take to watch Shrek), and thus the decision whether to see a movie is more trivial, so to speak, and less of a commitment. It is quite possible to watch movies rather passively, spacing out at home or on an airplane, while reading a book takes at least a somewhat conscious effort. Ultimately, however, I don’t think all this gets us very far in explaining the 39:2 score. There must be some other reason why I haven’t read 98 of the 100 books that define the decade.

This all got me thinking a bit more deeply about this curious thing we refer to as “taste”, as in “taste for music” or “taste for a certain kind of literature”. When I refer to “books that I like” (or those that I don’t), and I want to get beyond a simple ostensive statement, what qualitative set of features am I actually talking about? It is quite easy to specify my likes and dislikes when talking about particular books (movies, songs, paintings or whatnot), but it is not easy at all to put my finger on what precisely I like about them.

As things are, there seem to be two principal ways of doing it. The first, functional way would be to say that even if there is such a set of qualitative features, it really doesn’t matter. This is why we are comfortable asking our friends “read anything interesting lately?” and then heeding the recommendation, assuming that a certain level of interpersonal compatibility also translates into similar tastes: if they liked it, then there’s a good chance that we will too. And there seem to be pretty good grounds to believe that this is true. Pierre Bourdieu has done some research on the topic and found that, although we are nominally completely free to like or dislike whatever the hell we choose, our actual tastes are remarkably similar to those of the other people in our immediate social surroundings. We learn to like things not because of their innate qualities but simply because other people like them, and thus our musical, literary or whatever other sympathies cluster together. This is, of course, the idea behind phenomena such as J.K. Rowling, Dan Brown and Stieg Larsson, the Beatles or Oasis. In all those cases, their literary or musical merits (or respective lack thereof) matter a lot less than the fact that they are liked by so many. Incidentally, this is how a recommendation engine works: it ultimately treats us as members of a crowd, even if the crowd in question is a pretty small and elitist one. In music, this is how the self-proclaimed internet music revolution operates, profiling you by the music you seem to like and then suggesting artists from the playlists of people whose listening habits overlap significantly with yours.
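For the curious, the crowd-based idea can be sketched in a few lines of Python. This is only a toy illustration of recommending by overlapping playlists, not the way any real service is implemented; the function names (jaccard, recommend), the listener names and the playlists are all made up for the example.

```python
def jaccard(a, b):
    """Overlap between two sets of liked artists, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(target, likes, top_n=3):
    """Suggest artists liked by listeners whose taste overlaps with the target's."""
    mine = likes[target]
    scores = {}
    for other, theirs in likes.items():
        if other == target:
            continue
        weight = jaccard(mine, theirs)
        # Every artist the target does not know yet gets credit
        # proportional to how similar the recommending listener is.
        for artist in theirs - mine:
            scores[artist] = scores.get(artist, 0.0) + weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [artist for artist, score in ranked[:top_n] if score > 0]


# Hypothetical listeners and playlists, invented for illustration only.
likes = {
    "me": {"Fela Kuti", "Jimi Hendrix"},
    "ann": {"Fela Kuti", "Jimi Hendrix", "Tony Allen"},
    "bob": {"Britney Spears", "Metallica"},
}

print(recommend("me", likes))
```

Note that nothing in this sketch knows what any of the music sounds like; the only input is who listens to what.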

However, there are people who insist upon taking a different, ontological approach. Here’s an article on how Pandora, an alternative service, works; it’s an interesting and, at least to me, quite counterintuitive idea. The crowd-based approach is completely oblivious of the content it streams to you, differentiating between Britney Spears and Metallica only by their different listener bases. Pandora, however, employs a bunch of specialists with PhDs in musicology who pick every musical piece apart into its constituent parts and assign a numerical value to each of them. As a result, Pandora completely disregards who listens to (or even who performs) any particular tune and makes its suggestions based on what could be called the genome of the musical piece: a set of quantitative similarities rather than a number of shared listeners. And with more than 6.5 million subscribers, they seem to be doing something right.
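The content-based idea can be sketched just as briefly. What follows is a guess at the general shape of such a comparison, assuming each track is reduced to a short list of numeric scores and that similarity is measured with cosine similarity; the trait names, the track names and the numbers are hypothetical, and Pandora’s actual attributes and scoring are not described in the source.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equally long lists of trait scores."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(seed, catalogue):
    """Return the track whose trait scores are closest to the seed track's."""
    others = (track for track in catalogue if track != seed)
    return max(others, key=lambda t: cosine(catalogue[seed], catalogue[t]))


# Hypothetical traits, scored 0 to 1: (distortion, tempo, vocal prominence).
catalogue = {
    "track_a": (0.9, 0.8, 0.3),
    "track_b": (0.8, 0.7, 0.4),
    "track_c": (0.1, 0.5, 0.9),
}

print(most_similar("track_a", catalogue))
```

Here the listeners have vanished entirely: the only input is the scores assigned to the music itself.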

If this approach works with music, then there should be no reason why we shouldn’t be able to sequence the genome of literary works in a similar way. If it is possible to quantify the level of “emotion” in a Jimi Hendrix guitar solo or the tone of Fela Kuti’s voice, then it should also be possible to score how “engaged” Nabokov is compared to, say, Kafka. And as it turns out, there are people crazy enough to try to do just that.

Apparently, a group of Swedish physicists undertook a study in which they examined some of the formal properties of literary works by authors such as Thomas Hardy, DH Lawrence and Herman Melville, quantifying the rate at which new words appear in their texts and looking for a distinctive pattern, a sort of “literary fingerprint”. If you find this intriguing, then here is the article itself, but be warned that it is not for the squeamish: I once used to be reasonably good with maths, but what I found in there had me recoil in terror.
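The basic quantity, at least, is easy to picture. The sketch below simply counts how many distinct words have appeared after each word of a text; it is my own minimal illustration of the “rate of appearance of new words”, not the method or the mathematics of the study, and the function name and the crude word-splitting rule are assumptions.

```python
import re


def vocabulary_growth(text):
    """Entry i is the number of distinct words among the first i + 1 words."""
    seen = set()
    curve = []
    # Crude tokenisation (an assumption): lowercase runs of letters and apostrophes.
    for word in re.findall(r"[a-z']+", text.lower()):
        seen.add(word)
        curve.append(len(seen))
    return curve


print(vocabulary_growth("the cat saw the other cat"))  # [1, 2, 3, 3, 4, 4]
```

An author who keeps reaching for new words produces a curve that climbs steeply, while one who recycles a small vocabulary flattens out early.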

Once all is said and done, the proofs have run their course and the mathematical dust has settled, the authors come to a rather interesting (and very Borgesian) conclusion:

These findings lead us towards the meta book concept—the writing of a text can be described by a process where the author pulls a piece of text out of a large mother book (the meta book) and puts it down on paper. This meta book is an imaginary infinite book which gives a representation of the word frequency characteristics of everything that a certain author could ever think of writing.

Now, if we take this insight and approach it the way Roland Barthes might (claiming that, in terms of what they say or mean, books are “read” rather than “written”), then it would follow that when it comes to literary taste, each reader also has his or her own meta book, consisting of everything that this particular reader would like, or find personally moving or meaningful. A kind of reader’s fingerprint, a receptive literary genome, if you like.

I do wonder what mine would look like.


