Archive for the ‘programming’ Category
Learning Web Frameworks: Expression Engine, Acquia Drupal, Ruby on Rails
I work for a corporate website team. When our internal ‘clients’ want a new web project — new page, contact or registration form, major page redesign, etc — they download, print, fill out, and mail us a paper form. We recognized the waste involved with this paper form, and the irony that the website team does not have a web enabled project request form. We do most of our work in base PHP, but we’re also evaluating various web toolsets/frameworks/CMSs to adopt for more of our projects going forward. So we used this small form as a little case study. Let’s implement this form in a number of different tools. That way we get practical experience in these different technologies, and we can see first hand what attributes we’re looking for in a web framework.
Project Requirements
We met for a half-hour just to make our project requirements explicit. We documented the requirements in a story map, and made a quick wireframe:
- Client enters their contact details (name, department, phone, email)
- Optional: Pull the contact details from the enterprise LDAP server
- Client enters project details (name, description, budget code, deadline date, and whether the project has VP approval)
- On submission, show confirmation page and email confirmation to client. Also send email notification to our project managers that a new request has been made
- Members of our team may view the list of projects (especially budget codes, which we need for our time sheets), and see project details
- Project managers can add an estimate (number of hours we think the project will take), and estimate description
- When the estimate is created, automatically email an estimate to the client
Now, to build it…
Expression Engine
I got the sense that Expression Engine started its life as a blogging-only platform, and someone said, “hey…we could probably use this as a general purpose CMS too!” I felt that if I wanted a blog or multi-blog site, EE would be great, but we clearly did not understand the “EE Way.” EE does not come with a form builder out of the box, but we found the FreeForm module, which got us on our way. There’s no GUI form builder or admin panel; it requires a developer to build the form, which was a major downside for our team. At the end of a 3 hour sprint, we had an Expression Engine form, styled how we wanted, submitting project requests and sending the project request emails to our managers, but not a confirmation to the client. We didn’t get to any edit estimate functionality, and certainly didn’t have a way to email the client when the estimate was made.
We never really felt like we ‘got’ ExpressionEngine. It would definitely take a bit more time to figure out if this is the right tool for our CMS needs. The developers on our team felt like it would’ve been much simpler to just write the thing in plain-old PHP.
Verdict: With 4 people working for 3 hours, we got some minimal functionality, but we must be missing something.
Acquia Drupal
Ahh, Drupal. Apparently THE open source CMS. Yet, to paraphrase Peter Parker’s uncle, with great power comes great complexity, and a steep learning curve. Acquia is a batteries-included Drupal distribution, with optional paid technical support, which is attractive to our company. Acquia includes the Webform module, which is a GUI, drag/drop form builder inside the Drupal admin interface. While the team was at lunch, I built the form by myself. Furthermore, the Webform module provides a nice admin interface that allows for viewing all the results, updating entries, even downloading the submissions into a CSV file. One thing I really liked about the Webform module was that it comes with an area where you can post your own custom PHP code to be run after a submission…very cool. Another nice feature was the conditional emails: if the submission selects A, email person A; if it selects B, email person B; etc.
I had it in the default Acquia Marina template. I don’t think we really get how to use Drupal’s templating system yet, so I’m not sure how we would apply our own custom styles/templates. But I’m confident that we’ll be able to understand Drupal templating before Expression Engine internals (EE is based on CodeIgniter, which by the way I really like as a web framework).
Verdict: The webform module is the bomb: 1 person, 15 minutes, done. Not sure how to bend Drupal templates to make it look how we want, though.
Ruby on Rails
I’ve been looking for an opportunity to try a non-blog/wiki type Rails app. I checked out the Head First Rails book from the local library, which taught me a lot.
It turns out Rails is tailor made for this type of job. Rails scaffolding gives you an Apple-esque out of the box experience. One command line basically got the full app functionality. Given the name of the database model, and a list of field names and types, rails generates a database model, and the controller and view code for basic CRUD functionality:
ruby script/generate scaffold project first_name:string last_name:string email:string phone:string project_name:string project_description:text date_needed:date budget_code:integer vp_approval:boolean
Rails really gives you lots of tools to do the things you probably want to do in a web app. Need to add/change the database schema after the initial generate? Migrations are your friend. Send emails that get triggered on code events? Use Action Mailer (ruby script/generate mailer) and you’re mostly done. The views/templating/partials make sense. Scaffolding also gives you infrastructure for unit tests and acceptance tests (via the Cucumber behavior driven development framework).
So I was super excited about how quickly I could get something up and running. I called over one of our graphic artists and one of our project managers to give them a demo of how to create, scaffold, migrate, and configure a Rails app. And I got mixed reactions. Yes, scaffolding is impressive, but you still have to be a bit of a techie/programmer to understand how to do it. I think they were a bit intimidated by the command line knowledge required, as well as the need to learn a new programming language. As I created a number of apps to show different parts of the generate/scaffolding script, the PM noted, “well, it looks like sometimes it’s easier just to build a whole new application rather than fix one.” I thought that was insightful…she recognized that the auto-generated code makes it really ‘cheap’ to build a new application.
Verdict: Convention over configuration is great, as long as you can learn the conventions. Once I started to learn the conventions, I could do a lot in a very little time.
Conclusion
At the risk of sounding cliché, you have to pick the right tool for the job, for your team. For this single, stand-alone project, we’ll probably roll with the Rails app. We can get that up and running super quickly. I don’t know that we would ‘bet the farm’ for our entire site on Rails just yet, though. Big learning curve for our department. We’re really looking for a content management system. We’ve got several instances of WordPress we use, so we’re used to that. Personally, I’m leaning toward Drupal. Yes, high learning curve, but it has a really flexible content model, lots of modules, an active developer community, support for workflow and fine-grained authentication/authorization. And, with Acquia you have commercial support.
But of course the door is still open, and we’re looking at other toys to play with. I’m going to try doing this form in Pylons, which looks heavily influenced by Ruby on Rails, with a couple of advantages: 1) it is written in Python, which I already know, and 2) it is less opinionated about which tools you use at various points in the web stack. It scaffolds a REST-style controller, but not the views, models, or tests.
Agile2009 Recap
I had planned to liveblog the Agile2009 sessions I attended, but ran into a couple of roadblocks*. You can see the Agile2009 sessions I DID blog. Also, Jackson Fox went ahead and wrote a great recap from a UX perspective. It might be more interesting for me to talk about how well I accomplished my goals for attending.
I had three goals for attending Agile2009:
- Learn methods, tools, and best practices for improving my user experience (UX) research and design,
- Help my department to adopt Agile practices in order to respond to increasingly demanding projects and shorter deadlines, and
- Learn ways to more effectively team with colleagues in other parts of our organization to develop and deliver outstanding UX on our enterprise tools.
I was a blend of several of the conference personas: obviously Deanna the UI Designer, but I also felt some affinity for Carlos the Internal Coach, Peter the Programmer, Alex the Architect and David the Developer, even Tara the Tester.
Agile UX Research and Design
I’ve already blogged about some of the UX sessions I attended. I would say that there were two broad categories of UX sessions (at least, that I attended): Agile UX methods and processes (guerrilla user research, persona development, task analysis grids or storyboards) and experience reports (how do we do UX on an agile time schedule, with an agile development team). The method and process talks were very high level, almost introductory, and I felt they didn’t have enough time to get into any real discussion of how to use these UX methods in real life, any more than reading about the techniques on the web would. I did appreciate the 3 hr tutorials from Jeff Patton on Personas and Mike Cohn on User Stories. Coming out of those, I felt like I got a bit of ‘meat,’ which I could chew on: they gave a good framework for how to think about using these tools, instead of just a technique.
From the experience reports, the main message I got was, “here’s what sort of worked for us…we had to be agile and learn, and this is what happened in our shop…good luck with yours.” I had a hard time seeing how some of these were generalizable, or how I might use some of the lessons learned. Clearly, everyone is feeling the time pressure for faster deliverables within Scrum sprints. This means that UX practitioners have to be UX generalists: skilled in visual design, info architecture, interaction design, even development. I think the best takeaway was just networking with the people giving and attending the sessions, and finding a community of people I could talk to about approaches I’m thinking about taking.
I was a bit saddened and disillusioned at the perceptible distance between the UX and Agile Development communities. The developers are still saying, “I’ve gotta get this code feature out the door, I’ll let you know when I need to talk to a designer,” and the UX people are still asking, “how do I get the working respect and integration with the development team?” One concrete example of this: Todd Warfel and others did a set of three sessions on the whole Agile UX process: Performing User Research, Distilling and Communicating Research Results (mainly via personas), and finally developing wireframes. I noted that the user research session was sparsely attended and almost solely by UX people, the session on personas had a few more, and the session on wireframing was jam-packed. I think this is another example of developers trying to see how to get straight to the design, even though designers know there’s a process involved in getting to the design. Not sure how this dichotomy will play out. I don’t think anyone does yet.
Overall, definitely a positive experience, though. I was really impressed with the LiveAid stage…several UX designers did some Agile UX methods (guerrilla user research, personas, wireframes), then teamed with several agile developers at the conference to create an iPhone app for a non-profit: http://www.manoamano.org. In addition to raising nearly $5000 for the charity at the closing banquet, the exercise of doing UX research and design, development and shipping a live app in 3 days was inspiring.
Agile in our Department
At my company the webteam is undergoing a bit of an evolution, where our roles are changing from simply maintaining the public static websites to designing and developing some mission-critical web applications. Our internal clients are asking our team to provide new (to our company) types of online interactivity and functionality, and the demand for us to be a mini-IT department is increasing. What agile tools/options should we look at in order to keep up with the demand and still deliver outstanding webapps?
One way I think we can improve is to better integrate unit testing, acceptance testing and continuous integration into our team workflow. I went to a couple talks on CI – I think mostly by CI software vendors.
I went to a couple of agile coaching sessions to get a sense for how to ramp up teams, and get people started doing things an ‘agile way.’ I was impressed by some of the coaching strategies for starting up teams shared by Lyssa Adkins. The session on user stories from Mike Cohn was again a great example of how to collect/gather/uncover requirements. In the OpenJam, I poked my head in on some people reviewing Kanban project management.
One takeaway here is that ‘agile software development’ is more of a mindset change. It’s a set of principles, from which several methods, tools, and best practices have been developed. But as warned on numerous occasions, just adopting some or all of the methods without understanding the principles often leads to poor results. I think the key here is to focus on the core principles, and adopt the set of practices that make sense for your team wherever they’re at. I like the idea of the Scrum style: setting a sprint timebox for a delivery of something of business value, and then meeting afterwards for a retrospective and planning of the next sprint. I’m planning on experimenting with a WIP board for a short sprint on a project this week.
Another takeaway is that, at the end of the day, all these agile documentation and project management methods and tools (user stories, personas, work in progress boards, pair programming, Scrum, Kanban, Blitz Planning) aren’t really rocket science. They all center around creating externalized, shared visualizations of the problem and solution spaces. The goal, it seems, is to extract the implicit knowledge wrapped up in each team member’s head and put it in a shared, accessible place as simply as possible. This allows the whole team to see the whole picture, and allows communication and ideas to flow as quickly as possible. That’s why agile emphasizes small, co-located teams over large distributed teams: fewer barriers to communication. That’s why agile emphasizes user stories on index cards, simple personas, and face to face communication over documentation: it’s easier to exchange ideas and information, to re-arrange and update the knowledge-base. Agile emphasizes working code as documentation because the working code is a clear concrete thing that the entire team can gather around and say, “yes, this is right” or “no, it should be some other way.”
Agile in our Enterprise
I’m going to hold this for a future blog post…this one is long enough as it is.
Other Conference Observations
- I thought I’d be hearing lots of people at this conference saying, “do it in Rails” or “Django” or <name your new-fangled web framework>; I was surprised at all the java and .NET technical talks.
- I loved the Musick Masti sessions over lunches and at night. I got to go down and jam on a sax they had there, as well as play various percussion and other instruments. That was a lot of fun. That seemed to be a theme the conference organizers were going for…things should be fun! The Monday night social/networking event was a good example: various Wiis and other games were strewn about the vendor hall, providing opportunities to interact with other people through games.
- SO…MUCH…SWAG. Wow…I brought home almost a whole suitcase full of free books, balls, silly putty, race cars, planning poker cards, t-shirts.
* 1) dodgy conference wi-fi access. You’d think ubiquitous wi-fi access would be a given at a techy-geek conference like this. It was great in the Open Jam area where people could congregate and talk about whatever, but it wasn’t as good elsewhere, 2) my Airport wireless card has been giving out for the last few weeks, and finally died at the conference. Luckily, the Chicago Apple Store is not far from the conference venue — they sent it in and fixed it for free, even though my AppleCare expired in April (I love Apple’s customer service, and the fact they know that I’m pro
Find the Latest SVN Repository Tag with Ruby
Here’s a quick and dirty (and I mean DIRTY – use at your own risk) way to extract the latest tag from a SVN repository. This is a helper method I wrote for a Capistrano deploy script:
# Quick and dirty means to pull the latest tag name from an SVN repository
# TODO: Error checking - it should have some
# TODO: Check to see if the repository exists before running code
# TODO: Return nothing if the tag doesn't look right (eg, the tag name is 'tags')
# Given an SVN repository URL, return the name of the latest tag in the repo/tags directory.
# * Assumes the standard SVN setup: trunk/, tags/, branches/. It will append the /tags/ directory to the end of the URL
# * Does ZERO error checking...returns whatever it finds.
def get_last_svn_tag(repo)
  # verbose log of the most recent commit under tags/, including changed paths
  txt = `svn log #{repo}/tags/ --limit=1 -v`
  # thanks to http://www.txt2re.com/ for this
  re1 = '.*?' # Non-greedy match on filler
  re2 = '((?:\\/[\\w\\.\\-]+)+)' # Unix Path 1
  re = (re1 + re2)
  m = Regexp.new(re, Regexp::IGNORECASE)
  match = m.match(txt)
  if match
    unixpath1 = match[1]
    puts "(#{unixpath1})"
    # the tag name is the last component of the changed path
    latesttag = File.basename(unixpath1)
    puts "Tag name: #{latesttag}"
    return latesttag
  end
end

tag = get_last_svn_tag("http://some/repo")
puts "Tagname: #{tag}"
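For comparison, here is a minimal Python sketch of just the parsing step, with two of the TODOs above handled (return nothing when there is no match, or when the name is just 'tags'). It works on text that has already been captured from `svn log -v`, so it does not call svn itself. The function name `latest_tag_from_log` and the sample log are made up for illustration, and it assumes the changed path contains no spaces.

```python
import posixpath
import re

# Matches one entry under "Changed paths:" in `svn log -v` output,
# e.g. "   A /tags/release-1.2 (from /trunk:41)".
CHANGED_PATH = re.compile(r"^[ \t]+[AMDR][ \t]+(/\S+)", re.MULTILINE)


def latest_tag_from_log(log_text, tags_dir="tags"):
    """Return the tag name found in `svn log -v --limit 1` output, or None."""
    match = CHANGED_PATH.search(log_text)
    if match is None:
        return None  # empty or unexpected output
    name = posixpath.basename(match.group(1).rstrip("/"))
    # A bare ".../tags" path means the directory itself changed, not a tag.
    return None if name in ("", tags_dir) else name


# Hypothetical sample of what `svn log <repo>/tags/ --limit 1 -v` prints.
SAMPLE_LOG = """\
------------------------------------------------------------------------
r42 | jdoe | 2009-08-20 10:00:00 -0500 (Thu, 20 Aug 2009) | 1 line
Changed paths:
   A /tags/release-1.2 (from /trunk:41)

Tagging release 1.2
------------------------------------------------------------------------
"""

print(latest_tag_from_log(SAMPLE_LOG))
```

Keeping the parsing separate from the svn call makes it easy to test against saved log output before wiring it into a deploy script.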
User Stories
Mike Cohn – (slides)
Software requirements are a communication problem; balance is critical. If business side dominates, functionality and dates mandated with little regard for constraints. If tech dominates, we make them speak technical jargon and we lose the business need, drivers.
We cannot perfectly predict a software schedule.
So…we make decisions based on the information we have, but do it often
We spread decision-making across the project, rather than making one set of decisions…
Stories
Stories are (Three C’s, Ron Jeffries)
- Card (Note card) – Most visible part
- Conversation – Promise from dev team to product owner: “We will come talk to you before we start”
- Confirmation – Acceptance criteria
Short story about a system feature told from the perspective of a user.
As a …<user> I want to…<goal> so that…<reason>
Sometimes, you need more detail about a story. You can:
- Create new, smaller, more specific stories.
- Define ‘conditions of satisfaction,’ which are really acceptance tests. “What does the product owner need to see so that we can know this story is done?” This basically becomes the ’script’ for the Sprint review.
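To make the template concrete, here is a small sketch of a story as a data structure: the card text plus its conditions of satisfaction. The class name `UserStory`, its fields, and the example story are my own illustration, not something from the session.

```python
from dataclasses import dataclass, field


@dataclass
class UserStory:
    role: str    # the <user>
    goal: str    # the <goal>
    reason: str  # the <reason>
    # Acceptance criteria the product owner needs to see to call it done.
    conditions: list = field(default_factory=list)

    def card(self):
        """Render the story in the 'As a ... I want to ... so that ...' form."""
        return f"As a {self.role}, I want to {self.goal} so that {self.reason}."


# Hypothetical example story.
story = UserStory(
    role="job seeker",
    goal="search for jobs by location",
    reason="I only see jobs I could commute to",
    conditions=["Search by city returns only jobs in that city"],
)
print(story.card())
```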
User Roles
Broaden scope from looking at one user
- Allow users to vary by:
- What they use the software for
- How they use software
This is different than a persona, where personas are about a specific user (based on research), designed to induce empathy in the design team. User roles are more broad, describe types of users.
Can do user role brainstorming to identify different roles:
Thinking about your product, everyone writes a role on an index card.
- Brainstorming, no judgement on what the roles are
- Put related roles near each other
- Combine, consolidate, remove
Advantages of using roles
Avoid saying “The user”.
Can also have system and programmer users. “As a payment verification system, I want all transactions to be well-formed XML…”
Stories, Themes, Epics
User Story – Description of desired functionality told from user perspective
Theme – Collection of related user stories
Epic – Large User Story
Themes, Epics don’t really imply size…Themes aren’t necessarily bigger or smaller than Epics. They’re just labels.
New Python File — TextMate Template
TextMate has the ability to create new files based on a template you create. It had a few templates for Python, but nothing exactly like I want.
When I start a new python script or module, I want to:
- Follow the Pythonista style
- Parse some command line arguments – usually an input file
- Enable logging (either file based, or to the console)
I created a new TextMate template to do those things.
- Select Bundles > Bundle Editor > Show Bundle Editor
- Select and Open the Python Bundle
- Find the Python Bundle Templates
- Copy one into a new Template. Give it a sensible name
- Replace the template.py text with the following…
#!/usr/bin/env python
# encoding: utf-8
"""
untitled.py
Created by Jerry Steele on 2009-08-12.
Copyright (c) 2009 ACT. All rights reserved.
"""
import os
import sys
import logging
import optparse

LOG = None


def process_command_line(argv):
    """
    Return a 2-tuple: (settings object, args list).
    `argv` is a list of arguments, or `None` for ``sys.argv[1:]``.
    """
    global LOG
    if argv is None:
        argv = sys.argv[1:]
    # initialize the parser object:
    parser = optparse.OptionParser(
        formatter=optparse.TitledHelpFormatter(width=78),
        add_help_option=None)
    # define options here:
    parser.add_option("-f", "--file", dest="filename",
                      help="read data from FILENAME")
    parser.add_option("-v", "--verbose", dest="verbose", default=False,
                      action='store_true',
                      help="log debug messages to the console")
    parser.add_option("-L", "--log", dest="logfile",
                      help="also write error messages to LOGFILE")
    parser.add_option(  # customized description; put --help last
        '-h', '--help', action='help',
        help='Show this help message and exit.')
    options, args = parser.parse_args(argv)
    # check number of arguments, verify values, etc.:
    # set up logging (only when --verbose is given)
    if options.verbose:
        LOG = setlogging(options.logfile)
    if not options.filename:
        pass
        #LOG.error("Input filename not specified")
        #parser.error("You must supply an input file")
    # further process settings & args if necessary
    return options, args


def main(argv=None):
    settings, args = process_command_line(argv)
    # application code here, like:
    # run(settings, args)
    return 0  # success


def setlogging(logfile=None):
    consolelevel = logging.DEBUG
    logger = logging.getLogger(__name__)
    logger.setLevel(consolelevel)
    # create formatter and add it to the handlers
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    # create console handler that logs everything from debug up
    ch = logging.StreamHandler()
    ch.setLevel(consolelevel)
    ch.setFormatter(formatter)
    # add the handlers to logger
    logger.addHandler(ch)
    # create file handler which logs error messages
    if logfile:
        filelevel = logging.ERROR
        fh = logging.FileHandler(logfile)
        fh.setLevel(filelevel)
        fh.setFormatter(formatter)
        logger.addHandler(fh)
    #test logging
    logger.debug("debug message")
    logger.info("info message")
    logger.warning("warn message")
    logger.error("error message")
    logger.critical("critical message")
    return logger


if __name__ == '__main__':
    status = main()
    sys.exit(status)
Even if you don’t use TextMate, you can still use this to quickstart Python modules. Just remove/replace the $TM_ variables.
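On newer Pythons, optparse is deprecated in favor of argparse, so here is a rough sketch of the same three options with argparse. This is an alternative take, not part of the original template; the helper name `build_parser` is made up, and the logging setup is cut down to a single `basicConfig` call.

```python
import argparse
import logging


def build_parser():
    """Build a parser with the same -f, -v and -L options as the template."""
    parser = argparse.ArgumentParser(description="Process an input file.")
    parser.add_argument("-f", "--file", dest="filename",
                        help="read data from FILENAME")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="log debug messages to the console")
    parser.add_argument("-L", "--log", dest="logfile",
                        help="also write error messages to LOGFILE")
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:]
    settings = build_parser().parse_args(argv)
    level = logging.DEBUG if settings.verbose else logging.WARNING
    logging.basicConfig(
        level=level,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    logging.debug("settings: %s", settings)
    # application code here
    return 0  # success


print(build_parser().parse_args(["-f", "in.txt", "-v"]))
```

argparse adds `-h/--help` on its own, so the custom help option from the template is not needed here.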
Mac Text Editors…TextMate wins!
A while ago I talked about choosing a text editor for web coding (html, javascript, php, python). I’ve been using both BBEdit (for which our company has a site license) and TextMate (30 day trial, but it seemed like it reset itself when time ran out).
I liked TextMate so much better than BBEdit, I actually went and bought my own personal license for TextMate (which they’ve said I can use at work). It just seems to fit me better. The main advantages for me are:
- smart quoting, brackets, parens, etc. Highlight a section, press left-paren, and it automatically puts the right one in the right place. This is a huge time saver for me.
- Fancy html hotkeys. There are all kinds of nice shortcuts for editing html that seem to really make the process go faster. My favorite is ^-Shift-W, which wraps the current selection in a tag.
- Excellent command completion
- Nice integration with SVN (via the Subversion bundle)
- Run (Python, PHP) scripts right from the editor.
- Cmd-R to reload the current page in all browsers that are currently running on your computer
These are things I use every day to really speed up my workflow, and BBEdit wasn’t able to keep up.
One feature BBEdit has that TextMate appears to lack is the ability to open/save files on a remote SFTP server. It is sometimes nice just to open and hack away on a file on a remote server. [UPDATE: In the comments, I learned that the CyberDuck FTP client allows you to simulate using TextMate for files on remote servers.] The closest I could get with TextMate was to use MacFuse to mount the remote server as a local filesystem, and then use TextMate to edit the file. However, it seems like TextMate and MacFuse don’t play nicely together: the connection always hung and I couldn’t get it to work.
Launchd example: Start web server at boot-time
The blog-drought over the last month has been largely due to a big project at work, which has now gone live! *
The project was basically an order form, with various smarts to filter questions based on customer type, adhere to company business rules, etc. In the process of designing and developing the form, and getting client feedback, we found we needed a means to track issues, bugs, and feature requests that was more robust than email. After quickly reviewing several issue tracking options (HP Quality Center, ActiveCollab, trac, Mantis), we decided to try Redmine, mostly because it seemed easy to install (it was), supports the issue management process we need, and supports multiple projects and LDAP authentication out of the box.
I’ve installed Redmine on my local Mac Pro workstation, and was running it through the command line in Terminal. The problem with that, of course, is that if I close that Terminal window, or log out of my computer, I terminate Redmine and no one else can use it. Redmine allows you to run it as a daemon, an always running process (ruby script/server webrick -d -e "production").
I think, however, the more ‘Mac Approved’ way to do this is to use launchd, which is Apple’s attempt to standardize long-running and/or timed processes. Basically they’re using launchd to replace a host of unixy tools like cron, rc.d, etc. An additional benefit is that it can also restart a job if for some reason it dies.
launchd looks for a configuration file in plist format in one of several places on your hard drive (see the launchd man page). That config file tells launchd which program to run, any program arguments, whether it should run continuously or on demand, how often it should run, and a whole host of other things. Here’s my plist file for redmine:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>KeepAlive</key>
    <true/>
    <key>Label</key>
    <string>org.act.communications.website.redmine</string>
    <key>ProgramArguments</key>
    <array>
        <string>ruby</string>
        <string>script/server</string>
        <string>webrick</string>
        <string>-e</string>
        <string>production</string>
    </array>
    <key>QueueDirectories</key>
    <array/>
    <key>RunAtLoad</key>
    <true/>
    <key>StandardErrorPath</key>
    <string>log/error.log</string>
    <key>StandardOutPath</key>
    <string>log/access.log</string>
    <key>WatchPaths</key>
    <array/>
    <key>WorkingDirectory</key>
    <string>/Users/Jerry/code/redmine/</string>
</dict>
</plist>
If I might paraphrase, that XML file tells launchd:
- this job will be called org.act.communications.website.redmine
- Run the process on system load (RunAtLoad=true key)
- If the process dies, bring it back to life (the KeepAlive =true key)
- Change the working directory to /Users/Jerry/code/redmine
- Run the program arguments (ruby script/server webrick -e production)
- send StandardOut and StandardError to the appropriate log files
By saving this file in one of the directories launchd monitors (in this case, /Library/LaunchDaemons), launchd automatically reads the plist file and starts the program. Pretty straightforward, except that it’s not fun to write/edit XML plist files by hand. Not to worry, there are at least two launchd GUI programs. I picked Lingon, mostly because I found it first.
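If you would rather script it than use a GUI, Python's standard plistlib module can write the XML for you. This is only a sketch of that idea, not what I actually did: it rebuilds the settings from the plist above as a plain dict (leaving out the two empty arrays) and prints the XML, and you would still need to save the output into a directory launchd monitors.

```python
import plistlib

# Same settings as the plist shown above; the keys are launchd's own.
job = {
    "Label": "org.act.communications.website.redmine",
    "ProgramArguments": ["ruby", "script/server", "webrick", "-e", "production"],
    "WorkingDirectory": "/Users/Jerry/code/redmine/",
    "RunAtLoad": True,   # start when launchd loads the job
    "KeepAlive": True,   # restart the process if it dies
    "StandardOutPath": "log/access.log",
    "StandardErrorPath": "log/error.log",
}

xml_bytes = plistlib.dumps(job)  # XML is plistlib's default output format
print(xml_bytes.decode("utf-8"))
```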
* I’ve been a professional programmer/developer/designer for nearly 10 years. Almost all my work to date has been on non-public, internal applications for clients. This may be my first work that 1) is publicly viewable, and 2) is meant for a large audience.
Even Simpler Web Response Testing in Python with Pylot
About a month ago I wrote a simple web response timer, because I didn’t quickly find a tool out there that could do it.
I should have looked harder.
I finally found Pylot, a Python program for running HTTP load tests. Pylot does what I needed to do (calculate some statistics around page response time) and a whole lot more. Pylot is designed to benchmark/load test HTTP web services, so you can profile arbitrary URLs with either HTTP GET or POST requests. Or, you can just give it a simple URL or two like I did. You define a simple XML file with your “test cases” (the URLs you want to profile, along with any parameters), and give it some runtime parameters (number of virtual users that will hit the URL, request interval, rampup time, whether or not to launch a GUI to watch the stats in real time).
Much simpler than what I did. And it generates much prettier reports and graphs, without the need for loading into a separate statistics program.
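For a sense of the kind of numbers involved, here is a bare-bones sketch of timing repeated requests and summarizing them. To be clear, this is not Pylot and not my original script; the function names are made up, and the `fetch` argument is a stand-in so the example runs without a network. For a real URL you might pass something like `lambda u: urllib.request.urlopen(u).read()`.

```python
import statistics
import time


def time_requests(fetch, url, runs=5):
    """Call fetch(url) `runs` times; return the elapsed seconds of each call."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Basic response-time statistics for a list of timings in seconds."""
    return {
        "runs": len(timings),
        "min": min(timings),
        "max": max(timings),
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
    }


# Stand-in fetch that does nothing, so this runs offline.
print(summarize(time_requests(lambda url: None, "http://example.invalid/")))
```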
Using Regular Expressions and Generators to Tokenize a File
Generator Tricks for System Programmers really opened my eyes to the utility of python generators. Yesterday I took an opportunity to use one, and bone up on my regular expression ninja skills to boot.
I received an email from a colleague containing some references to reports our company had written, so that we can post them in appropriate places on our website. He clearly went through some pains to organize the information in a very human-readable manner:
CATEGORY 1
Report Title1 (report type year)
Report Title2 (report type year)
CATEGORY 2
...
Which is great. The trouble is, I also need the existing URL associated with each of these reports. And it may make sense to pull this list from a database, so I’d like to treat each one of those as a row with certain attributes:
NAME | TYPE | YEAR | CATEGORY
Oh sure, I could copy the text into Excel and put everything into columns by hand, but where’s the fun of that? Python to the rescue!
Python generators are kind of like a list, but without the list. Like a list, it is a sequence of things (ints, strings, other lists, objects, etc). Like a list, you can iterate through each value of the generator. But instead of storing the entire list in memory, it evaluates some function to generate the next value. This means that generators can more efficiently deal with very large sets of inputs. You’re not loading the whole input set into memory, instead you just ask for the ‘next’ value, and the generator decides what to spit out. It might mean reading the next line of a file, adding or modifying an object, or calculating the next value in the Fibonacci sequence.
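The Fibonacci case makes a nice minimal illustration. The sketch below is not from the report-parsing script; it just shows a generator function that never builds a list and only computes a value when something asks for it.

```python
from itertools import islice


def fibonacci():
    """Yield Fibonacci numbers forever, computing each one only when asked."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# islice takes just the first ten values from the endless generator.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```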
Check out the System Admin’s Guide to Generators for a more full explanation, or look up the documentation on the yield statement. Python also has a nice shorthand for generator expressions, that is very similar to how list comprehensions are done. It often leads to pretty clean, readable code. Here’s the key section of code from this example:
regex = re.compile(r"(?P<name>.*) \((?P<type>.*) (?P<year>\d*)\)")
with open('researchreports.txt') as infile:
    lines = (l.rstrip() for l in infile)
    matches = ((regex.search(l), l) for l in lines)
    newline = (matcherfunction(m) for m in matches)
First, we define the regular expression used to parse the line, and extract the report name, type, and year. The next four lines do the actual work:
- Open the file for reading
- From the open file, spit out each line, stripping off whitespace from the end
- For each of those lines, run the regular expression. Spit out a tuple of (Match Object, original line).
- For each of those tuples, run the matcher function, which spits out the tuple (name, type, year), or a one-element tuple holding the original line when that line wasn’t in the right format.
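The steps above can be tried without the input file. This is only a sketch: the sample lines are made up, and matcherfunction is restated so the block runs by itself. It shows what each kind of line turns into once it has passed through the chain of generators.

```python
import re

regex = re.compile(r"(?P<name>.*) \((?P<type>.*) (?P<year>\d*)\)")


def matcherfunction(m):
    """Return (name, type, year) on a match, else the original line as a 1-tuple."""
    match, line = m
    if match:
        return (match.group('name'), match.group('type'), match.group('year'))
    return (line,)


# Hypothetical input standing in for the lines of researchreports.txt
sample = [
    "CATEGORY 1\n",
    "Widget Market Outlook (white paper 2009)\n",
    "not a report line\n",
]

lines = (l.rstrip() for l in sample)             # strip trailing whitespace
matches = ((regex.search(l), l) for l in lines)  # (match or None, line)
rows = list(matcherfunction(m) for m in matches)

for row in rows:
    print(row)
```

Category headings and anything else that does not look like "Title (type year)" pass through untouched, so nothing from the original email is silently dropped.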
That’s basically the end of the magic. The rest is just writing out to a csv file. Python’s CSV module to the rescue. Here’s the whole code in case you’re interested.
#!/usr/bin/env python
from csv import writer
import re
def matcherfunction(m):
    """if we have a MatchObject, return the parsed output. if not, return the original line"""
    if m[0]:
        return (m[0].group('name'), m[0].group('type'), m[0].group('year'))
    else:
        # trailing comma: a one-element tuple, so writerows() treats it as a one-cell row
        return m[1],
# name: everything before " (", type: text inside the parentheses, year: trailing digits
regex = re.compile(r"(?P<name>.*) \((?P<type>.*) (?P<year>\d*)\)")
with open('researchreports.txt') as infile:
    lines = (l.rstrip() for l in infile)
    matches = ((regex.search(l), l) for l in lines)
    newline = (matcherfunction(m) for m in matches)
    # The generators are lazy, so the output must be written while infile is still open
    with open('researchreports.csv', 'w') as outf:
        csvfile = writer(outf)
        # URL and CATEGORIES are left empty here, to be filled in afterwards
        headers = ['NAME', 'TYPE', 'YEAR', 'URL', 'CATEGORIES']
        csvfile.writerow(headers)
        csvfile.writerows(newline)
Did I overcomplicate the problem? Probably. I could’ve just read in the whole file as a string, and then done a global regex search/replace. But that would be problematic if I were dealing with a huge input file. The first advantage of this approach is that it doesn’t matter how many rows there are; it’ll march through them with no worries about memory limitations. Second, it’ll be easier to modify and reuse this approach than a custom RegEx. Finally, it apparently fits my mental model of how to solve the problem.
What I really want is for someone to show me how to do this in one line with awk/sed. =)
Simple Web Response Time Testing with Python
For my day job, I’m creating a series of HTML pages that each have a table that shows how our various services and solutions map onto problems our customers are likely to have. The main site is currently thousands of static HTML pages, with a bit of PHP thrown into a few pages to do page footers. We’re working on upgrading to a dynamic CMS-type site. In the meantime, I used the opportunity to learn a bit more about PHP and I wrote a small function to generate the table HTML given a JSON document describing the table headers, rows, and content.
As I was debugging the pages, I felt like there was sometimes a noticeable delay in rendering the page that wasn’t there on the existing static pages. Was this my imagination, or something that our users might notice and complain about? Hmm, I don’t have any web profiling software, and I couldn’t find anything that I could quickly install and run. And I had some time. Looks like I have to write some code. In the immortal words of Leeroy Jenkins, “Let’s Do This!”
Python timeit Module
Python’s mantra is Batteries Included, implying that for whatever coding task you have, there’s probably something in the standard library that will do much of what you want. You shouldn’t have to go and write something completely from scratch. I knew about Python’s time module. I was planning on using it to mark the time before fetching my webpage, mark the time after fetching the page, and compare the two. But I stumbled onto the timeit module, which makes it even a bit easier. Timeit basically wraps up that logic of marking time before and after some bit of code in a convenient package. You give the timeit.Timer() class a bit of code that you want to time. The timeit() method will run the code a specified number of times (default 1,000,000) and return the total time, in seconds, taken by all of those executions; divide by the count if you want an average. The repeat() method will call timeit() a specified number of times, and return a list of those totals.
In action, it looks like this:
import timeit
# Request the page 100 times, time the response time
t = timeit.Timer("h.request('http://PAGE/URL',headers={'cache-control':'no-cache'})", "from httplib2 import Http; h=Http()")
times_p1 = t.repeat(100,1)
Three lines of code…not bad. The Timer() class takes two strings as parameters: 1) The Python code you would like repeated and timed, 2) Setup code, which runs once at the start of each timing run and is not included in the measured time. If you’re familiar with Unit Testing, then the 2nd parameter is like the setUp() method. Notice I’m using the httplib2 library instead of the standard urllib library. I like httplib2 for requesting urls because I’m familiar with it, it combines requesting the url and reading its contents, and it’s really good about dealing with caching. In this case, I don’t want a cached response, so I send a cache-control: no-cache header with each request.
The last line instructs my Timer() to run 100 sets of my test code, with 1 trial per set. The output is a list of 100 times, one per request.
The documentation for timeit.repeat() gives some good advice on how much stock to put into these numbers, including a caution that the mean and standard deviation are often less useful than the minimum when you are measuring how fast a snippet can run. But what I really wanted to know was whether or not my page took significantly longer to load than a similar page with no dynamic content. I expanded my code to repeatedly time a second, static page, and write the two lists as two columns of a csv file.
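To see what timeit() and repeat() hand back without involving a web server, here is a small sketch that times a throwaway statement instead of a page request. The statement sum(values) and the counts are arbitrary stand-ins chosen for illustration.

```python
import timeit

# The statement to time, plus setup code that is run but not timed
t = timeit.Timer("sum(values)", "values = list(range(1000))")

# timeit() returns the TOTAL seconds for all 100 executions
total = t.timeit(number=100)
per_call = total / 100

# repeat() calls timeit() five times and returns the five totals as a list
runs = t.repeat(repeat=5, number=100)

print("total for 100 calls: %.6f s" % total)
print("average per call:    %.8f s" % per_call)
print("five repeated totals:", runs)
```

With number=1, as in the page test, each entry in the list is simply the time for a single execution.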
import timeit
from csv import writer
# Hit the dynamic page 100 times, time the response time
t = timeit.Timer("h.request('http://PAGE1/URL',headers={'cache-control':'no-cache'})","from httplib2 import Http; h=Http()")
times_p1 = t.repeat(100,1)
# Now hit a similar static page 100 times
t = timeit.Timer("h.request('http://PAGE2/URL', headers={'cache-control':'no-cache'})","from httplib2 import Http; h=Http()")
times_p2 = t.repeat(100,1)
# Write the times to a CSV file, one column per page
times = zip(times_p1, times_p2)
with open('times.csv','w') as f:
    w = writer(f)
    w.writerows(times)
Note we’re using the Python with statement from Python 2.5+, which encapsulates some of the try/except/finally logic you’d normally write when opening a file. Because I had even more spare time, I imported my new times.csv file into a statistics program (SPSS) to calculate the means, and perform a T-Test to see if the means of the two columns are statistically different. I also could have used various statistics scripting tools: scipy, R, for example. But I didn’t have THAT much time.
There was a statistically significant difference. The dynamic page was, on average, about 1.2 ms slower than the static page. This makes practically no difference to the user experience of the page, and makes my development life much easier (and also illustrates how practical significance may differ from statistical significance). I’ll continue to generate pages dynamically.
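For anyone without SPSS handy, the comparison can be roughed out with the standard library alone. This is a sketch under assumptions: the post doesn’t say which flavour of T-Test was run, so I’ve used Welch’s t statistic (unequal variances), the helper name welch_t is hypothetical, and the sample times are invented rather than taken from the real measurements. It stops at the t statistic; getting a p-value needs a t distribution, which is where something like scipy earns its keep.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical response times in seconds, standing in for the two CSV columns
dynamic_times = [0.052, 0.049, 0.055, 0.051, 0.053]
static_times = [0.050, 0.048, 0.052, 0.049, 0.051]

difference_ms = (mean(dynamic_times) - mean(static_times)) * 1000
t_stat = welch_t(dynamic_times, static_times)

print("mean difference: %.2f ms, t = %.2f" % (difference_ms, t_stat))
```

A large t value only says the difference is unlikely to be noise; whether a millisecond or two matters to a visitor is a separate question.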