ICWSM Keynote: Jon Kleinberg – Meme-tracking, Diffusion, and the Flow of Online Information

Intersection of news media, technology, and the political process.  Modern social media (SM) is a disruptive technology, much as radio and TV were in the 20th century.  How does information transmitted broadly by the media interact with the personal influence arising from social networks?

SM erases the difference between global and local influence, turning it into more of a continuum.  The speed of media reporting is increasing, contributing to a 24-hour news cycle.  “A Challenge to healthy discourse.”  Online media also adds complexity to how political info flows through social networks.

The dynamics of the global news cycle

Examined whether the ‘news cycle’ is just a metaphorical construct or something visible in data.  If it’s visible, can we measure it and describe it?  Used data from Spinn3r: 1M news articles and blog posts per day from 20K sources.

What basic “units” make up the news cycle?  They should aggregate articles, vary on the scale of days, and be computable over half a terabyte of data.  Look for “memes”: text fragments, phrases, and quotes that travel through many articles.  The phrase variants form a weighted, directed, acyclic graph of mutations, and the goal is to delete a set of edges of minimum total weight such that each remaining component has a single “sink” node.  This problem is NP-hard, but one can apply heuristics based on keeping a single outgoing edge from each quote.  The result is a neat stacked graph showing the relative frequency of the stories tied to each quote over time.
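The single-outgoing-edge idea can be sketched as follows. This is a minimal illustration, not the method from the talk: it assumes edges point from a phrase variant toward a fuller version of the quote, and it uses a simple greedy rule (keep each node's heaviest outgoing edge), which is a hypothetical choice. Because every non-sink node keeps exactly one outgoing edge and the graph is acyclic, following the kept edges from any node ends at exactly one sink, so each resulting cluster has a single sink.

```python
from collections import defaultdict


def partition_by_sink(edges):
    """Greedy sketch: keep one outgoing edge per phrase, then group phrases by sink.

    edges: dict mapping (src, dst) -> weight for a DAG of phrase variants.
    Returns (clusters, deleted_weight), where clusters maps each sink to the
    set of phrases that reach it through the kept edges.
    """
    best = {}      # src -> (dst, weight) of the heaviest outgoing edge (assumed rule)
    nodes = set()
    for (src, dst), w in edges.items():
        nodes.update((src, dst))
        if src not in best or w > best[src][1]:
            best[src] = (dst, w)

    def sink_of(node):
        # Follow kept edges until reaching a node with no outgoing edge.
        # This terminates because the input graph is acyclic.
        while node in best:
            node = best[node][0]
        return node

    clusters = defaultdict(set)
    for node in nodes:
        clusters[sink_of(node)].add(node)

    # Total weight of the edges the greedy rule dropped.
    deleted = sum(w for (src, dst), w in edges.items() if best[src][0] != dst)
    return dict(clusters), deleted
```

For example, with edges a→c (3), a→d (1), b→c (2), e→d (4), phrase a keeps its heavier edge to c, giving clusters {a, b, c} and {d, e} with a deleted weight of 1.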

Uses analogies to describe the temporal variation: e.g., species competing for resources in an ecosystem, or biological systems that synchronize so that a small number of individuals dominate at any point in time.  A model to describe this might include an imitation term and a recency term.
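A toy simulation makes the two terms concrete. The talk only names an imitation term and a recency term; the functional forms here are assumptions: a thread’s weight is its current article count (imitation) multiplied by an exponential decay in its age (recency), and the parameter names and values (`new_thread_prob`, `decay`) are hypothetical.

```python
import math
import random


def simulate_news_cycle(steps, new_thread_prob=0.1, decay=0.05, seed=0):
    """Toy imitation + recency model of competing quote threads.

    At each step one new article appears. With probability new_thread_prob it
    starts a new thread; otherwise it quotes an existing thread j chosen with
    weight counts[j] * exp(-decay * age_j), i.e. imitation times recency.
    """
    rng = random.Random(seed)
    born = []     # step at which each thread started
    counts = []   # articles quoting each thread so far (imitation term)
    history = []  # which thread each article joined
    for t in range(steps):
        if not born or rng.random() < new_thread_prob:
            born.append(t)
            counts.append(1)
            history.append(len(born) - 1)
            continue
        # Recency term: older threads are exponentially less likely to be picked.
        weights = [c * math.exp(-decay * (t - b)) for c, b in zip(counts, born)]
        j = rng.choices(range(len(born)), weights=weights)[0]
        counts[j] += 1
        history.append(j)
    return counts, history
```

Bucketing `history` into time windows and stacking the per-thread counts gives curves in which a few threads dominate for a while and then fade, which is the qualitative behavior the analogies describe.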

Found a 2.5-hour gap between a story’s peak intensity in the mainstream media and its peak in the blogs.

Can also use the data to find stories where blogs lead the media.
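A simple way to express both observations is a peak-to-peak lag per story. The notes don’t say how the lag was computed; this sketch assumes each story has mention counts for media and blogs over the same time bins, and the helper names are hypothetical. A negative lag marks a story where blogs led the media.

```python
def peak_lag_hours(media_counts, blog_counts, bin_hours=1.0):
    """Hours by which the blog peak trails the media peak (negative if blogs lead).

    Both inputs are per-bin mention counts for one story over the same time bins.
    """
    media_peak = max(range(len(media_counts)), key=media_counts.__getitem__)
    blog_peak = max(range(len(blog_counts)), key=blog_counts.__getitem__)
    return (blog_peak - media_peak) * bin_hours


def blog_led_stories(stories, bin_hours=1.0):
    """Names of stories whose blog coverage peaked before mainstream coverage.

    stories: dict mapping story name -> (media_counts, blog_counts).
    """
    return [name for name, (media, blogs) in stories.items()
            if peak_lag_hours(media, blogs, bin_hours) < 0]
```

With half-hour bins, media counts `[0, 2, 9, 5, 1, 0, 0]` and blog counts `[0, 0, 1, 3, 4, 8, 2]` peak three bins apart, a 1.5-hour lag.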

The spread of political messages through social networks

One approach: use chain-letter petitions as ‘tracers’ through the global social network.  These are useful because 1) they are viral, spreading only via email, and 2) they come with their own tracer (the list of signatures).  We can’t see the full tree, but copies get posted to mailing lists, which can be found by search engines.  So they can build a partial tree, compensating for mutations in the signature lists.

Genetic mutation turns out to be a good analogy: all kinds of mutations happen (people erase names, put funny names in the middle, etc.).
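Merging the recovered copies into one partial tree can be sketched like this. It is an illustration under stated assumptions, not the reconstruction used in the work: consecutive names in a copy are treated as a parent-to-child forwarding step, and when copies disagree about someone’s parent (for instance because a name was erased), a hypothetical majority vote picks the parent seen most often.

```python
from collections import Counter, defaultdict


def build_signature_tree(copies):
    """Merge observed copies of a chain letter into one partial tree.

    copies: list of signature lists, each ordered from the first signer onward.
    Returns a dict mapping each signer to the parent they most likely got the
    letter from (assumed: the most frequently observed predecessor).
    """
    votes = defaultdict(Counter)  # child -> Counter of observed parents
    for names in copies:
        for parent, child in zip(names, names[1:]):
            votes[child][parent] += 1
    # Majority vote smooths over mutations such as erased names.
    return {child: counts.most_common(1)[0][0] for child, counts in votes.items()}
```

In the copies `["A", "B", "C", "D"]`, `["A", "B", "C", "E"]`, and `["A", "C", "F"]`, the third copy has lost B, but C is still attached under B because two copies support that link. Majority voting alone doesn’t guarantee an acyclic result for heavily mutated inputs.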

Built the tree from two chain letters, and its shape looked strange.  If we live in a small-world network (six degrees of separation), why is the tree so deep and narrow, like a depth-first search tree?  One possible explanation is timing effects, assuming that nodes act on messages after some delay.
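“Deep and narrow” can be made measurable by comparing a tree’s depth with its widest level. This hypothetical helper assumes the tree is given as a child-to-parent map (for example, one produced by merging signature lists); a small-world broadcast would suggest a shallow, wide tree, whereas a depth-first-like tree has large depth and small width.

```python
from collections import Counter


def depth_and_width(parent):
    """Return (depth, max width) of a tree given as a child -> parent map.

    depth: number of edges on the longest root-to-leaf path.
    width: the largest number of nodes found at any single level.
    """
    if not parent:
        return 0, 0
    nodes = set(parent) | set(parent.values())
    level = {}

    def assign_level(node):
        # Walk up until reaching a root or a node whose level is known,
        # then fill in levels on the way back down.
        path = []
        while node not in level and node in parent:
            path.append(node)
            node = parent[node]
        d = level.setdefault(node, 0)  # a root gets level 0
        for n in reversed(path):
            d += 1
            level[n] = d

    for node in nodes:
        assign_level(node)
    per_level = Counter(level.values())
    return max(per_level), max(per_level.values())
```

A chain A→B→C with two children under C has depth 3 and width 2, while a star with three children under A has depth 1 and width 3.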

So we can make some initial analogies, such as mutation and biology.  But these are complex, global phenomena that require richer models and a deeper knowledge of human behavior.  Ideas from computing and online media will be crucial to the next steps.