I enjoyed reading Sway: The Irresistible Pull of Irrational Behavior (Amazon, nice review). There are many cognitive biases that affect how we think, and the authors did a nice job of distilling the research on cognitive bias into an accessible popular science book. The book made me think about how I approach web design and evaluation.
Traditional usability testing has its roots in psychology research methods. You spend lots of time designing the study — randomizing participants into experimental groups, ensuring you don’t ask leading or prompting questions, calculating statistical significance or confidence intervals of findings, etc — so that these cognitive biases are minimized or factored out.
Agile development typically features 1-3 week sprints, forcing UX designers to shorten traditional evaluation methods, use guerrilla usability testing, or do whatever they can to get SOME user feedback in the time allotted. UX designers have been actively discussing how to integrate UX design into agile development teams (or search for "agile ux"). But in speeding up design testing and evaluation, we may become more susceptible to allowing cognitive biases to creep in and taint our study results.
There are two biases in particular I think we need to watch out for.
Confirmation bias is the tendency to search for or interpret information in a way that confirms one's preconceptions (ScienceDaily, or Nickerson 1998 if you're more psychology-paper inclined).
Give me an example!
You've spent the last couple of days iterating on an information architecture for a new site. You're doing a quick evaluation of a paper prototype with three users in three separate sessions, to determine whether the latest iteration is the best. You're watching the users interact with the prototype, asking them to think out loud and asking open-ended questions to encourage them to talk. However, you've invested time and effort in this latest prototype: the stakeholders have approved it, and you either really like it or are sick of it and want to move on to the next thing. Confirmation bias might lead you to focus your questioning on the behaviors you expected to see, on findings that confirm or validate your design, or to discount some of the negative comments you hear.
Diagnosis bias is the tendency to label things based on our initial impressions, and our difficulty or inability to change our minds after that initial impression is made.
Give me an example!
You've got some ideas for the next web2.0/cloud/service/mashup/[buzzword]*, and you're doing user research to prioritize new feature development. A study participant comes in and says, "I don't know what browser I use…I just fire up AOL to get on the internet." Ouch, you think to yourself, how am I going to get any useful info from this yokel? Diagnosis bias might make you miss the fact that, while they don't do much online at home, their online activity at work makes them a perfect candidate.
How do we avoid biases in Agile UX?
The Sway authors take a small stab at answering this question in the Epilogue of their book, but their answers are a bit simplistic. I guess this is understandable. Humans developed these biases because they help us solve problems related to surviving in an unstable outdoor environment, and to do so in nearly constant motion (Brain Rules). Sometimes you need to make quick, simplifying judgments in order to survive or gain an advantage. So clearly there is no turnkey, 3-step process for overcoming them.
Obviously there is a need to find an appropriate balance between experimental design rigor and doing the least amount of work to get the most value. Here are a few suggestions:
- Be Aware of Cognitive Biases
You can't do anything about these biases if you don't know about them. And you just read this post, so check this one off your list.
- Make a List of Your Assumptions; Reevaluate Assumptions Across Sprints
This is basically just trying to externalize your assumptions and biases. If you can put them out there, and make plans to revisit them over time, it might be easier to catch when they have clouded your judgment. Also, if you publicize your working assumptions it gives others a chance to critique them, or see if many people would draw the same conclusions.
- Think About How to Disprove Your Assumptions, Rather Than How to Prove Them
This goes back to your Research Methods class…it is difficult to objectively critique something if it is not falsifiable. I'm not saying you have to set up null hypothesis tests. But you can change your design evaluation thinking from "how would I know this is good?" to "how would I know if this is bad?" Try to identify 3-4 signs that would indicate something is wrong. If none of those signs are evident, you can be more confident that you're on the right track.
Russ Unger, Todd Zaki Warfel
Traditional barriers to UX research: time and money; customer/stakeholder perceived value; and the attitude that "We don't have time and money to do this."
Argue instead, "We don't have time and money NOT to do this." Spend a few hours doing some research so you can make data-driven design decisions.
Each method has strengths and weaknesses. It's good to combine multiple methods.
The Burrito Lunch
- Send out an email: if you fit a profile, come do this and we'll give you lunch
- Chocolate snacks are a helpful way to get people to fill out surveys
- Use social media and other tools (e.g., Twitter, Facebook) to get people to give feedback
- Couple this with web analytics data
Man on the Street
- Simply go out and ask people, and note the trends.
User Research: “You never ask the question you really want answered. If you ask the question you want answered, you’ll miss all kinds of rich information.”
User research: one of the benefits of agile UX methods is that you can bring prototypes to user research sessions. That gives you access to users, lets you validate the current design, and feeds research for future sessions.
Designing the Box
- Get people together with some Sharpies and paper, and ask them to design the box the product, tool, or service would come in.
If this were a COTS product, what key features would need to be front and center? As a UX designer, this gives you insight into the thought processes involved in prioritizing features.
Guidelines for asking better research questions
- Provide context: "Did you have coffee yesterday? How much coffee did you have yesterday? Was that a normal day? How about the day before that?" It's our job as researchers to ask questions and identify the trends, amounts, etc., rather than asking for them directly.
- It's helpful to start with a most recent or most memorable experience.
- Start broad and open-ended
- Funnel and narrow your questions
People want to tell you about their lives. If you can facilitate in a way that allows them to tell their own stories, people are willing to talk. Another way to help people talk: “I’ll share a story about me, then you share a story about you.”
The blog-drought over the last month has been largely due to a big project at work, which has now gone live! *
The project was basically an order form, with various smarts to filter questions based on customer type, adhere to company business rules, etc. In the process of designing and developing the form and getting client feedback, we found we needed a means of tracking issues, bugs, and feature requests that was more robust than email. After quickly reviewing several issue-tracking options (HP Quality Center, ActiveCollab, trac, Mantis), we decided to try Redmine, mostly because it seemed easy to install (it was), it supports the issue management process we need, and it supports multiple projects and LDAP authentication out of the box.
I installed Redmine on my local Mac Pro workstation and was running it from the command line in Terminal. The problem with that, of course, is that if I close that Terminal window, or log out of my computer, I terminate Redmine and no one else can use it. Redmine lets you run it as a daemon (an always-running process) with ruby script/server webrick -d -e "production".
I think, however, the more 'Mac-approved' way to do this is to use launchd, which is Apple's attempt to standardize long-running and/or timed processes. Basically, they're using launchd to replace a host of unixy tools like cron, rc.d, etc. An additional benefit is that launchd can restart a job if it dies for some reason.
launchd looks for a configuration file in plist format in one of several places on your hard drive (see the launchd man page). That config file tells launchd which program to run, any program arguments, whether it should run continuously or on demand, how often it should run, and a whole host of other things. Here's my plist file for Redmine:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>KeepAlive</key>
    <true/>
    <key>Label</key>
    <string>org.act.communications.website.redmine</string>
    <key>ProgramArguments</key>
    <array>
        <string>ruby</string>
        <string>script/server</string>
        <string>webrick</string>
        <string>-e</string>
        <string>production</string>
    </array>
    <key>QueueDirectories</key>
    <array/>
    <key>RunAtLoad</key>
    <true/>
    <key>StandardErrorPath</key>
    <string>log/error.log</string>
    <key>StandardOutPath</key>
    <string>log/access.log</string>
    <key>WatchPaths</key>
    <array/>
    <key>WorkingDirectory</key>
    <string>/Users/Jerry/code/redmine/</string>
</dict>
</plist>
If I might paraphrase, that XML file tells launchd:
- this job will be called org.act.communications.website.redmine
- Run the process on system load (RunAtLoad=true key)
- If the process dies, bring it back to life (the KeepAlive=true key)
- Change the working directory to /Users/Jerry/code/redmine
- Run the program arguments (ruby script/server webrick -e production)
- Send StandardOut and StandardError to the appropriate log files
By saving this file in one of the directories launchd monitors (in this case, /Library/LaunchDaemons), launchd reads the plist file and starts the program automatically. Pretty straightforward, except that it's not fun to write or edit XML plist files by hand. Not to worry, there are at least two launchd GUI programs. I picked Lingon, mostly because I found it first.
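If you want to load the job right away (rather than waiting for the next boot), you can also do it by hand with launchctl. Here's a minimal sketch from Terminal, assuming the plist above is saved as org.act.communications.website.redmine.plist (a filename matching its Label):

# copy the job definition to the directory launchd monitors for third-party daemons
sudo cp org.act.communications.website.redmine.plist /Library/LaunchDaemons/

# load (and, because RunAtLoad is true, start) the job now
sudo launchctl load /Library/LaunchDaemons/org.act.communications.website.redmine.plist

# confirm the job shows up in launchd's job list
sudo launchctl list | grep org.act.communications.website.redmine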
* I've been a professional programmer/developer/designer for nearly 10 years. Almost all my work to date has been on non-public, internal applications for clients. This may be my first work that 1) is publicly viewable, and 2) is meant for a large audience.
Today we launched a new version of our National Career Readiness Certificate website (I think it looks great; since I didn't work on it, I can't take any credit for it).
In the final minutes before launch, we received a request to remove one of the main links from the front page — I’m guessing there was a desire to remove words from the page, reduce clutter, etc. The problem was, our design team thought the link was important…it provided key information we thought the users would find helpful. This type of discussion is difficult to resolve because you’ve got paying clients saying, “we want this,” and you need to find some way to say, “no, I don’t think you really do” without offending anyone’s sense of ownership.
It helps to have some objective metrics or measures that help move people from an opinion-based discussion to data-driven decision making. We happened to have one. About a month ago, I used CrazyEgg to generate a clickmap of the page. Of the 1000 clicks we recorded, 23% were on the link in question — twice the number of clicks of the second most popular link. That makes a strong case that the link is something users are looking for and attracted to when they visit the site, and that it should probably survive the redesign. The client agreed, and the link survived. (We're planning another set of clickmap measurements to confirm that it's still important to have on the new site.)
We were able to back up our UX design intuition with some hard numbers and some effective visualizations (clickmaps make pretty pictures that make an immediate impact on clients and stakeholders), to help the team make data-driven decisions about the site user experience. This is something I hope to continue to do in our organization.
About a month ago I wrote a simple web response timer, because I didn't quickly find a tool out there that could do it.
I should have looked harder.
I finally found pylot — a Python program for running HTTP load tests. Pylot does what I needed to do (calculate some statistics around page response time) and a whole lot more. Pylot is designed to benchmark/load test HTTP web services, so you can profile arbitrary URLs with either HTTP GET or POST requests. Or you can just give it a simple URL or two, like I did. You define a simple XML file with your "test cases" (the URLs you want to profile, along with any parameters), and give it some runtime parameters (the number of virtual users that will hit the URL, the request interval, ramp-up time, and whether or not to launch a GUI to watch the stats in real time).
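To make that concrete, here's a rough sketch of what the test case file and run command look like. The XML element names and command-line flags below are from memory of pylot's sample files and README, so treat them as assumptions and check them against the pylot documentation before relying on them:

# write a minimal pylot test case file (element names from memory of pylot's samples)
cat > testcases.xml <<'EOF'
<testcases>
    <case>
        <url>http://www.example.com/</url>
    </case>
    <case>
        <url>http://www.example.com/reports/monthly</url>
    </case>
</testcases>
EOF

# then point pylot at it; for example, 5 virtual agents for 60 seconds with the real-time GUI
# (flag names from memory; see pylot's README for the exact options):
# python run.py -x testcases.xml -a 5 -d 60 -g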
Much simpler than what I did. And it generates much prettier reports and graphs, without the need for loading into a separate statistics program.
I'm doing some contract work assisting on some usability tests of a commercial recipe/cooking site. As we lead study participants through various parts of the site and ask them to complete various tasks, one thing we are attempting to capture is their expectations and desires for how the site should behave, the information they feel should be accessible, etc. In Usability/User-Centered Design/User Experience terms, we want to make sure the site information architecture and navigation match the users' mental model (the biggest design errors happen when there is a mismatch between the users' mental model and the system model).
So we ask them questions like, "What do you expect to happen when you click that link?" or "What do you think will happen when you move the mouse here?" This is a fairly typical line of questioning; I've used it on a number of similar studies.
But today I noticed something about how the users were responding that made me question whether these questions really return valuable information. It seemed to me that users didn't have clearly defined expectations or opinions of what would or should happen when they clicked a link. They wanted to just click the link and see what happened. If the new page seemed to get them closer to their goal, great (Ed Chi or someone else at Xerox PARC would say the information scent grew stronger). If not, they were happy to go back to the previous page through whatever means were handy (normally the back button, but they'd also use in-page navigation).
To use an economic analogy, it was as though clicking on web page links had become so cheap, easy, automatic, and reversible that people didn’t bother to invest the time, energy, and effort to decide what the link should do. It is cheaper to just click it, and if they didn’t like where they were headed, go back.
This may indicate that asking people about their expectations while browsing a website is not a very valid way of testing site usability, because evaluating the expected or perceived value of a link may not be something people really do. It's not the way people use the website. It may be better to find ways to capture user behavior rather than stated expectations (as has long been shown, what users say and what they do are often different). Do they follow the 'shortest path' to find a particular bit of information? What side paths did they take? There are lots of quantitative and qualitative usability/user experience metrics out there, but I suspect each project team will have to define the particular metrics of interest for their project.
I also wonder if users of a transactional type of site, or users doing some type of significant task, might show different behavior. Say, for example, we're talking about users of an internal company management application. They may make real expectation or value judgments on links as they go about company business. Who knows…
I wonder if there is some way to test this? Maybe evaluate usage of two types of sites — one business transactional, and one commercial informational. Somehow look at reaction times of deciding what to do next? If there is a difference, then I would expect to see longer reaction times on the business site than the commercial site.
As part of my new job as ACT's User Experience Designer, one of my duties will be to do some usability testing of the websites we develop. This is a brand-new position at ACT, so in addition to educating people about what goes into User Experience Design/User-Centered Design/Usability Engineering (the buzzwords change about every 5 years, it seems), I'm looking for a set of usability testing tools that I can use on projects. This week, I'm playing with Silverback, a Mac-only usability testing tool from Clearleft.
Silverback has two things going for it. First, it's pretty simple: it uses the computer's webcam to record the user's face, records all the screen activity (highlighting mouse clicks with a pleasing graphic), combines the two, and exports a video of the test session to a QuickTime .mov file. Second, at $50, it's crazy affordable.
So here’s how I set up my computer to do usability testing of a web site with Silverback. I have a G5 Mac Pro with 2 monitors, and an old-school external iSight.
- Turned off the second monitor
- I reset the monitor resolution to 1024×768, which is what our web logs show to be the most common screen resolution among our visitors.
- I wanted to hide my desktop image, and all the cluttered documents on my desktop, so as not to distract participants. A nifty donationware app called Camouflage took care of that.
- I used Firefox as the browser for this usability study, and created a new, blank Firefox profile specifically for it. To do this, you open Terminal and invoke Firefox with the -ProfileManager command-line option (see the command sketch after this list). I set the home page of the new profile to the page under study, and added a couple of bookmarks to the bookmark bar for the other pages I would ask participants to look at during the course of the study.
- Started Silverback. When participants arrived, I had them situate themselves in front of the computer and webcam, and started the Silverback recording session. Participants performed the tasks I asked them to do while Silverback recorded audio, video, and on-screen activity.
- When they were done with the tasks, I stopped Silverback recording. I then asked them to fill out a post-test questionnaire, which I was able to quickly set up using a Google Spreadsheet Form.
- After the participant leaves, you can export the video file to .mov. This can take a while — about 2-3 hours for the half-hour test sessions I ran. File sizes range from 1-2 GB per half-hour session (you can customize video sizes and other options that affect both the size and speed of the export).
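For reference, here's roughly what the Firefox profile setup mentioned in the list above looks like from Terminal. The binary path may vary by Firefox version, and the profile name "usability-test" is just an example:

# open Firefox's Profile Manager to create a clean, empty profile for testing
/Applications/Firefox.app/Contents/MacOS/firefox -ProfileManager

# later, launch Firefox directly into that profile by name ("usability-test" is hypothetical)
/Applications/Firefox.app/Contents/MacOS/firefox -P usability-test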
That's basically it. And that's the point. Silverback is designed to be a quick, inexpensive, unobtrusive way to do usability testing, and I'd say they've made an excellent start. I went from zero to a fully functioning usability testing workstation in the time it takes to download the software. If you've got a Mac laptop with an Apple Remote, you can also add bookmarks to the video so you can mark interesting events during the study.
I tried adding some feedback to their product customer service page, but I couldn’t get the site to work. Some things I hope to see in future versions:
- A preview of the audio levels…some way to make sure the user is speaking loudly enough, and that there isn’t too much background noise
- It would be nice to have some facility for pre- and post-test questionnaires, demographics, etc. Maybe keep a database of questions so that they can be reused in subsequent tests.
Anyway, congratulations to the staff at Clearleft. They’ve got my $50, and I look forward to future iterations of Silverback.