Fitzgerald Steele

Usability, User Experience, Social Media, Web Design and Development…

Archive for September 2008

Python tip: Sorting a list by its contents


Not sure why it was so hard for me to use the sorted() function this morning.

I’m playing with the FriendFeed API. I wanted to retrieve the entries of a FriendFeed room and then sort the returned entries by the number of comments…

Getting the latest 30 entries is a simple HTTP call.  FriendFeed returns a JSON structure, which is nicely parsed with simplejson (soon to be part of the Python standard library).

import operator
import simplejson
import urllib2

r = urllib2.urlopen('http://friendfeed.com/api/feed/room/science21')
json = simplejson.loads(r.read())

# I just want to look at the entries in the room
e = json['entries']
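The parsing step can be tried without touching the network. The sketch below uses the standard-library json module, which offers the same loads() call as simplejson, and is written in Python 3 syntax. The payload is made up: it assumes only the fields this post relies on, namely a top-level 'entries' list whose items carry a 'title' and a 'comments' list.

```python
import json

# Hypothetical sample shaped like the response described above. The real
# feed has more fields; these titles and comments are invented.
sample_response = """
{"entries": [
  {"title": "Open notebook science", "comments": [{"body": "Nice"}, {"body": "Agreed"}]},
  {"title": "Preprint servers", "comments": []},
  {"title": "Peer review 2.0", "comments": [{"body": "Hmm"}]}
]}
"""

feed = json.loads(sample_response)  # JSON text -> nested dicts and lists
entries = feed["entries"]           # the list of entry dicts

for entry in entries:
    print(entry["title"], len(entry["comments"]))
```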

I want to sort the entries by the number of comments.  In other words, I want to use the number of comments as a sort key.  Luckily, Python’s sorted() function has a key argument, which takes a callable that returns a single value to be used as the key.  We use the operator module to build the key function:

# key function: the number of comments on one entry
ln = lambda x: len(operator.getitem(x, 'comments'))
esorted = sorted(e, key=ln, reverse=True)

for i in esorted:
    print i['title'].encode('utf-8'), len(i['comments'])
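Here is the same idea as a small self-contained sketch in Python 3 syntax, run against made-up entries rather than a live feed. The comment_count helper is a hypothetical name introduced for illustration; it does the same job as the lambda above.

```python
import operator

# Made-up entries carrying only the two fields the post relies on.
entries = [
    {"title": "A", "comments": ["c1"]},
    {"title": "B", "comments": ["c1", "c2", "c3"]},
    {"title": "C", "comments": []},
]

def comment_count(entry):
    """Sort key: the number of comments on one entry."""
    return len(entry["comments"])

# Most-commented entries first.
by_comments = sorted(entries, key=comment_count, reverse=True)
titles = [entry["title"] for entry in by_comments]
print(titles)  # ['B', 'A', 'C']

# operator.getitem(x, 'comments') is just another spelling of x['comments'].
same_object = operator.getitem(entries[0], "comments") is entries[0]["comments"]
```

Since operator.getitem(x, 'comments') and x['comments'] fetch the same object, the plain subscript form is enough for a key function like this one.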

That took me all morning to get, which shows some of my Python programming limitations.  On my first attempt, I tried this:

In [94]: es = sorted(e, key=len(operator.itemgetter('comments')))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/fitzgeraldsteele/<ipython console> in <module>()

TypeError: object of type 'operator.itemgetter' has no len()

Now I recall that an object must have a .__len__() method in order to work with len().  But operator.itemgetter() returns a callable that has no .__len__(), and I was passing that callable itself to len() instead of the value it fetches, hence the TypeError exception I got.
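The failure and one way around it can be reproduced in a few lines. This is a minimal sketch in Python 3 syntax; the entry dict is made up, and get_comments is a name chosen here for illustration.

```python
import operator

get_comments = operator.itemgetter("comments")

# len() on the getter itself fails: the getter is a callable, not a sequence.
error_message = None
try:
    len(get_comments)
except TypeError as err:
    error_message = str(err)

# Calling the getter on an entry first yields a list, which does have a length.
entry = {"title": "Example", "comments": ["first", "second"]}  # made-up entry
count = len(get_comments(entry))
print(count)  # 2

# Wrapping that call in a lambda defers it until sorted() supplies each entry.
key_func = lambda x: len(get_comments(x))
ordered = sorted([entry, {"title": "Quiet", "comments": []}], key=key_func)
```

The key argument needs a function that sorted() can call once per element, so the len() call has to live inside that function rather than wrap it.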

Then I tried this:

esorted = sorted(e, key=lambda x: len(operator.getitem(x, 'comments')))

This is very similar to the solution I ended up with, but it returns the list in a different order.  I’m not sure why yet. Update: Duh…we get a different order because the first one has reverse=True, and the second one doesn’t.
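The difference between the two calls is easy to see on a small made-up list (Python 3 syntax). The sketch also shows a detail worth knowing: because sorted() is stable, reverse=True keeps entries with equal keys in their original relative order, so the result is not simply the ascending list flipped end to end.

```python
# Made-up entries; A and D tie with one comment each.
entries = [
    {"title": "A", "comments": ["c1"]},
    {"title": "B", "comments": ["c1", "c2"]},
    {"title": "C", "comments": []},
    {"title": "D", "comments": ["c1"]},
]

count = lambda x: len(x["comments"])

# Default order: fewest comments first.
ascending = [e["title"] for e in sorted(entries, key=count)]

# reverse=True: most comments first, ties still in original order (A before D).
descending = [e["title"] for e in sorted(entries, key=count, reverse=True)]

print(ascending)   # ['C', 'A', 'D', 'B']
print(descending)  # ['B', 'A', 'D', 'C']
```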

Written by fitzgeraldsteele

September 11, 2008 at 5:05 pm

Towards Evaluating Social Media for Scholarly Communication


There are A LOT of people and organizations looking at ways of using modern web technologies (web2.0, social media, collaboration, and other buzzwords as well) to enhance the creation, modification, and dissemination of their research and other scholarly work.  There’s even a conference going on right now discussing the matter (see some of the conference discussion on FriendFeed).  And it seems like every day there is a host of new tools, startups, and web sites to enhance collaboration, sharing, and communication between scientists.

With so many web sites evolving so quickly, there was a call to figure out how to critically evaluate, compare, and contrast the tools.  One way to look at all the sites and tools is to examine how well they achieve the core goals of scientific communication *:

  • Registration of a new idea or claim to an individual or group of collaborators
  • Certification / peer-review of a claim
  • Awareness / access to the details of the claim
  • Archival of the claim
  • Reward for the registrant(s)

Imagine we could assign each site or tool a score along each of these goals.  We could then plot the cumulative score on a radar graph like this:

[Figure: Radar graph visualization of social media for scholarly communication]

This type of graph can help decision makers visualize how well different systems fulfill the different goals of scholarly communication, where they are lacking, and what opportunities exist for developing future tools.
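As a rough sketch of the geometry behind such a graph, the snippet below (Python 3 syntax) places one axis per goal evenly around a circle and turns a list of scores into polygon vertices. Everything in it is an assumption made for illustration: the 0-5 scale, the two tool names, and the scores themselves are placeholders, not measurements.

```python
import math

GOALS = ["Registration", "Certification", "Awareness", "Archival", "Reward"]

# Placeholder scores on an assumed 0-5 scale, one per goal, in GOALS order.
scores = {
    "Hypothetical journal": [5, 5, 3, 5, 5],
    "Hypothetical social site": [4, 2, 5, 2, 1],
}

def radar_vertices(values):
    """Return (x, y) polygon vertices, one per goal, evenly spaced around a circle."""
    n = len(values)
    vertices = []
    for i, value in enumerate(values):
        angle = 2 * math.pi * i / n  # axis i sits at this angle
        vertices.append((value * math.cos(angle), value * math.sin(angle)))
    return vertices

for name, values in scores.items():
    print(name, "total:", sum(values))
    for goal, (x, y) in zip(GOALS, radar_vertices(values)):
        print("  %-13s (%.2f, %.2f)" % (goal, x, y))
```

Joining each tool's vertices in order gives its outline on the radar graph; a larger, rounder outline means the tool covers more of the five goals.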

Note that the scores in the image are not at all rigorously determined.  I made up some quick estimates for a few sites, and compared them to made-up estimates for publication in a high impact journal such as Nature.  I made up the estimates based on the following loose criteria:

Registration: A contribution or claim can be attributed to an individual or a set of contributors, with a creation date time stamp and revision history.

Certification: A contribution can be rated by others.  There can be a few, influential raters (e.g. an editorial board) or many (e.g. crowd sourcing/collaborative filtering).  Ratings can be anonymous or attributed.  Ratings can be meta-rated (e.g. the Slashdot moderation system).  Ratings can be simple thumbs up/thumbs down, or with comments/feedback.

Awareness/Access: Users can identify new contributions, as well as contributions that are relevant to their interests.  Awareness tools can range from passive (user must browse/search) to active (system recommendations based on clustering or collaborative filtering).  Entries and metadata for contributions are queryable/accessible via a publicly documented API, open standard, or format.

Archival: A contribution can be identified and accessed by a single URI (possibly with multiple resource URLs; see Fielding’s thesis on REST).  Associations to similar data via metadata are present.  Contributions are exportable into a documented open standard or format.  Contributions will be available at the URI for the foreseeable future.

Reward: A contribution counts toward professional career advancement, or standing within the academic discipline. (Obviously, social media is currently lacking in this area).

These criteria attempt to combine the traditional requirements for scholarly communication with the modern needs/expectations of web2.0 technologies and open science.  There is certainly room for refinement, and I’d welcome comments from the peanut gallery.

Hmm…maybe there’s room for a collaborative filtering type tool for science web2.0 tools.  People can rate sites on each of these, and other interesting criteria…

* See Roosendaal, H., & Geurts, P. A. T. M. (1998). Forces and functions in scientific communication. In . Retrieved July 25, 2008, from http://www.physik.uni-oldenburg.de/conferences/crisp97/roosendaal.html.  Also Van de Sompel, H., Payette, S., Erickson, J., Lagoze, C., & Warner, S. (2004). Rethinking Scholarly Communication: Building the System that Scholars Deserve. D-Lib Magazine, 10(9). Retrieved August 12, 2008, from http://www.dlib.org/dlib/september04/vandesompel/09vandesompel.html.

Written by fitzgeraldsteele

September 9, 2008 at 11:00 pm

New blog for my blatherings


Ok…I’m starting a new blog.  I needed a place to talk about some of my research/work thoughts.  Facebook didn’t seem right.  Neither did my home/personal/family blog.  So here we are.  Let’s get to it!

Written by fitzgeraldsteele

September 9, 2008 at 9:34 pm
