Archive for May 18th, 2009
ICWSM Session 3: Ranking
CourseRank: A Closed-Community Social System through the Magnifying Glass
This paper discusses a social-media course selection site for Stanford University. It combines official university course information, grade distributions, and course reviews with user-generated comments, reviews, etc. It has course planning and recommendation tools, plus course clouds for finding courses related to certain topics.
85% of Stanford undergrads use the site, way more than open community sites.
Using Transactional Information to Predict Link Strength in Online Social Networks
Analyzed the Purdue Facebook network. Generated different friend graphs for: Friends, Wall Posting, Pictures. The Wall/Picture graphs have a much lower InDegree/OutDegree than the Friends network. This may indicate that wall postings are a better indicator of who your ‘real’ friends are. I thought it was interesting that, on average, 21 people write on a person’s Wall, but that person writes on only 7 people’s Walls.
Used the ‘Top Friends’ application as ‘truth’ of who your top friends are. This paper compares 3 types of supervised learning algorithms, and four types of features to predict link strength through four separate experiments.
- Experiment 1: Found 12 of 15 top features are network transactional features, with wall information the most useful.
- Experiment 2: Network transactional features had highest accuracy
- Experiment 3: Compared link type. Wall features had best accuracy. Picture information quite bad
- Experiment 4: Bagged decision trees had the best accuracy. 97% of the performance comes from network transactional features
Network transactional features take into account transactions from person A to person B, moderated by the number of transactions A makes to everyone else.
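To make the idea concrete for myself, here is a minimal sketch of one way such a feature could be computed. The notes above don't record the paper's exact formula, so the ratio used here (A's transactions to B divided by all of A's outgoing transactions) is my assumption, and `transactional_strength` and `wall_posts` are hypothetical names and data.

```python
from collections import Counter


def transactional_strength(transactions, a, b):
    """Share of a's outgoing transactions that are directed at b.

    `transactions` is a list of (sender, receiver) pairs, e.g. wall posts.
    This ratio is an assumed stand-in, not the paper's exact feature.
    """
    outgoing = Counter(dst for src, dst in transactions if src == a)
    total = sum(outgoing.values())
    if total == 0:
        return 0.0  # a never posted to anyone
    return outgoing[b] / total


# Hypothetical wall-post log: (who posted, whose wall)
wall_posts = [
    ("ann", "bob"), ("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
    ("bob", "ann"),
]

print(transactional_strength(wall_posts, "ann", "bob"))
print(transactional_strength(wall_posts, "bob", "ann"))
```

Here ann sends half of her posts to bob, while bob's only post goes to ann, so the same pair gets a different score in each direction.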
RevRank: A Fully Unsupervised Algorithm for Selecting the Most Helpful Book Reviews
We use product reviews to make purchasing decisions. Many reviews (on Amazon) are repetitive, contribute little, are poorly written, or go unnoticed (and, as we learned this morning, can be confusing or plagiarized). Amazon has User Voting, which has some problems (imbalance vote bias, early bird bias, Winner Circle bias).
This work locates helpful reviews based on dominant concepts. Term Dominance is similar to TF-IDF.
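Since Term Dominance is described as similar to TF-IDF, here is a minimal sketch of plain TF-IDF as a reference point. This is not RevRank's Term Dominance formula, which these notes don't capture; the function name `tf_idf` and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    `docs` is a list of token lists. Score = (term count / doc length)
    * log(number of docs / number of docs containing the term).
    """
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical, already-tokenized reviews
reviews = [
    ["great", "plot", "great", "characters"],
    ["great", "book"],
    ["boring", "plot"],
]
scores = tf_idf(reviews)
print(max(scores[0], key=scores[0].get))
```

In the first toy review, "characters" outscores "great" even though "great" appears twice, because "great" also shows up in another review and is therefore less distinctive.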
Examined 12,000 reviews of 5 books. Compared algorithm to a human user vote and random sample.
RevRank did a good job of finding ‘helpful’ reviews, better than the other two conditions.
ICWSM Session 2: Psychology and Users
Does showing off help to make friends? Experimenting a sociological game on self-exhibition and social networks
Used the site http://socialgeek.com to gather data on how people would choose to portray themselves on a social network site. Showed them various types of pictures (provocative, standard, showing off, body immodesty) and asked whether they would use the photo as a profile picture.
Found a correlation between number of friends and self-exhibition. They suggest that people may use ‘show off’ type pictures to gain online friends.
Also found that people like to be friends with people like them (similar age, socio-economic status, etc.), except that people in the study preferred to be friends with women.
I think this is sort of related to our study of online profile photos, but I’m not exactly sure how. This paper used a different personality scale, so I’m not clear how we can compare the two.
What Are They Blogging About? Personality, Topic and Motivation in Blogs
One way to categorize motivation to blog:
- Internal (documenting life, catharsis)
- External (Interests, Opinions)
Using the Five Factor Personality Model to make some hypotheses about personality. Did some text analysis on a blog corpus from BlogMetrics using LIWC text analysis tool, as implemented in TAWC. For bloggers high in these factors:
Neuroticism: self-therapy/catharsis – focusing on self and venting purely negative feelings.
Extraversion: Talk a lot about themselves and other people. Use lots of 1st person, 2nd person, 3rd person pronouns. Used lots of positive emotion words.
Openness: Review/evaluation of leisure interests from personal perspective
Conscientiousness: Faithfully document life going on around them, references to others. Lots of talk about their job, people around them.
Agreeableness: positive self-talk focus, negative emotions and leisure activities avoided.
As part of the tutorials yesterday, I took a simple Five Factor Personality test, where I scored high on Agreeableness and very low on Neuroticism. I’d like to look at my blogs and see whether these findings describe my own behavior.
A Social Identity Approach to Identify Familiar Strangers in a Social Network
Familiar Strangers: People you observe repeatedly but do not know. In real life, people you see daily on the train. Online, people with similar blogging behavior and interests who are not on the same social network. It would be nice to find these people, to understand more niche interests, do predictive modeling and trend analysis, etc.
This is interesting because it focuses on trying to find and connect people with narrow, niche interests (the long tail of the blogosphere).
They use a Social Identity approach. People cluster contacts into meaningful groups. So we really propagate the search through relevant clusters of contacts. We limit the search space.
Used blog tags and content to generate a vector that describes a blog, and then compared blogs using cosine similarity. Clustered with k-means. Compared their results against 1) an exhaustive approach, 2) a random approach.
Results indicate the Social Identity approach has accuracy between 80-90%, depending on the dataset: much better accuracy than random, and much faster than an exhaustive search.
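As a rough illustration of the vector comparison step, here is a minimal sketch that represents each blog as a bag of tags and compares blogs with the cosine measure. The tag lists are made up, `cosine_similarity` is my own helper name, and the paper's actual feature construction and k-means clustering step are not reproduced here.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical blogs described by their tag counts
blog_a = Counter(["knitting", "yarn", "patterns"])
blog_b = Counter(["knitting", "yarn", "crochet"])
blog_c = Counter(["politics", "election"])

print(cosine_similarity(blog_a, blog_b))
print(cosine_similarity(blog_a, blog_c))
```

Two blogs that share two of their three tags score about 0.67, while blogs with no tags in common score 0, which is the kind of signal a clustering step could then group on.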
This research assumes an egocentric search. You look first at people that are connected to you in the network. But that doesn’t seem realistic. I can find familiar strangers on sites like delicious.com or twitter via tag cloud, rather than searching first through my contacts. I asked the speaker this question. He suggested that his approach would be helpful in locating people near the cluster of people that use a particular tag, but not the precise tag.
You Are Where You Edit: Locating Wikipedia Contributors through Edit Histories
Exploring the increasingly prevalent role of geography on the web. Allows geographically informed content retrieval, filtering. Potential invasion of privacy. Looked at Wikipedia Geopages – pages that correspond to a physical location in the real world, with lat/long coordinates.
This paper wants to know if we can characterize the location of the people who contribute to geopages. Used DBPedia to bootstrap finding the geopages. There’s a tradeoff in that Wikipedia only records a single point, instead of an extent/area.
330K geopages. They want to find contributors with a large number of edits to geopages constrained to a small area (~ 70mi x 70mi).
Over half of contributors make most of their edits on 1-2 pages. Looked at 100 random user pages to determine motivation: most people live in that place, or were born there.
ICWSM Session #1: Community
Gesundheit! Modeling Contagion through Facebook News Feed
How do ideas spread on social networks? Some people say you have to get ‘influencers’ to spread the word. More recently, Watts and others are saying anyone can really start the propagation. Did a study on Facebook news feeds.
I thought it was interesting that one person doesn’t really start a chain of ideas. Rather, cliques kind of join together to create one large connected network. So it’s not really as important to influence the influencers. If you can get ‘the people’ behind your idea, the idea kind of propagates on its own.
Seeking and Offering Expertise across Categories
About a Chinese Question/Answer site, similar to Yahoo! Answers, where users spend and earn points based on asking and answering questions (askers offer a certain number of points for answers to their questions).
Community Structure and Information Flow in Usenet: Improving Analysis with a Thread Ownership Model
This is another examination of idea diffusion through a network, this time on Usenet. The sample is much larger than in the Gesundheit paper: 19.6M articles (6.2M cross-posted) over 4.5 years (1/04-6/08).
Found a Power-law relationship between # authors and # edges.
Also looked at Reciprocity: percentage of mutual reply edges. Reciprocity is high in European groups.
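A minimal sketch of how reciprocity could be computed from a list of reply edges. I'm assuming it means the fraction of directed reply edges whose reverse edge also exists; the paper may count mutual pairs differently, and the `replies` data is made up.

```python
def reciprocity(edges):
    """Fraction of directed edges (a, b) for which (b, a) also exists.

    Duplicate edges and self-replies are ignored. This definition is an
    assumption, not necessarily the one used in the paper.
    """
    edge_set = {(a, b) for a, b in edges if a != b}
    if not edge_set:
        return 0.0
    mutual = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)


# Hypothetical reply edges: (replier, author replied to)
replies = [("u1", "u2"), ("u2", "u1"), ("u1", "u3"), ("u4", "u1")]
print(reciprocity(replies))
```

In this toy group, only u1 and u2 reply to each other, so two of the four edges are mutual.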
Similarity between groups: Jaccard coefficient for cross-posts, i.e. the number of articles shared between 2 groups divided by the total number of distinct articles across both groups. Can also do Shared Authors. Drew some clusters and network graphs (edges weighted by similarity).
Given those things, they propose a Thread Ownership Model – a way to define the portion of a reply thread that belongs to different groups (this helps address the cross-posting). They define Devotion of an author to a group: Percent of articles exclusive to a group.
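For reference, a minimal sketch of the standard Jaccard coefficient applied to two groups' article sets (the same function works on author sets). The group contents and message IDs here are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty groups: define similarity as 0
    return len(a & b) / len(union)


# Hypothetical article IDs posted to each group (m3, m4 are cross-posted)
group_x = {"m1", "m2", "m3", "m4"}
group_y = {"m3", "m4", "m5"}
print(jaccard(group_x, group_y))
```

With two cross-posted articles out of five distinct articles overall, the similarity is 0.4, which could then serve as the edge weight in the network graph.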