Revisiting Perl and Python's Speed

April 02, 2012

I was really surprised to see the discussion that was generated as the result of my previous post comparing the speed of Python and Perl. Many people much wiser than me posted a lot of valuable comments and suggestions, and two people were kind enough to post total rewrites of my routines which (to nobody's surprise) were much faster than the codes I wrote.

A few people (both here on the blog and in other discussions) raised legitimate points:

- My Python code was recompiling the regex every loop iteration because I was confused by how regex compilation and regex match objects work. Fixing this problem alone increased speed by 10%-25%.
- The timings I posted were sub-second and someone suggested that startup overhead may have been hurting Python. To address this, I used a more "real-life" input file that was 3,750 MB rather than the 8.588 MB input file I used earlier.
- The style of Perl I was using was archaic, and the style of Python I was using wasn't terribly Pythonic. I live in a programming bubble; I learned both of these languages from their respective O'Reilly books and that's it. I don't know anyone who knows either Perl or Python in real life, and I have never seen anyone else's code in either language. But as it turns out, poorly written Perl and poorly written Python follow the same trends as well-written Perl and Python (see below).
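
The regex point can be illustrated with a minimal sketch. This is not the code from the previous post: the pattern, the sample lines, and both function names are hypothetical, and the example is written for Python 3 rather than the Python 2.6.5 used in the timings.

```python
import re

# Hypothetical pattern and input, for illustration only.
PATTERN = r"^(\w+)\s+(\d+)"
SAMPLE_LINES = ["alpha 1", "beta 22", "no match here!", "gamma 333"]


def extract_recompiling(lines):
    """Call re.compile() inside the loop, repeating that work on every line."""
    results = []
    for line in lines:
        rx = re.compile(PATTERN)  # redone on each iteration
        m = rx.match(line)
        if m:
            results.append((m.group(1), int(m.group(2))))
    return results


def extract_precompiled(lines):
    """Compile the pattern once, then reuse the compiled object in the loop."""
    rx = re.compile(PATTERN)  # done once, outside the loop
    results = []
    for line in lines:
        m = rx.match(line)
        if m:
            results.append((m.group(1), int(m.group(2))))
    return results
```

Both functions return the same matches; only the placement of `re.compile()` differs. Python's `re` module caches recently compiled patterns, so the in-loop version mostly pays for a repeated function call and cache lookup, and how much that costs will depend on the pattern and the input.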

So as to be a little more scientific about this (since I am a scientist and all), here are my starting parameters:

Software
- Ubuntu Server 10.04 LTS
- Python 2.6.5 provided by the distribution
- Perl 5.10.1 provided by the distribution
- data resides on an ext4 lvm
Hardware
- HP DL360 G7
- 2x Xeon X5672, 3200 MHz
- 24GB DDR3 RAM
- data resides on 6Gbit SAS RAID5
Codes
- "Old Perl" code is the code shown in my previous post.
- "Old Python" code is also shown in my previous post.
- "New Perl" code is the code written by gnustavo.
- "New Python" code is the code written by Paul Davis.

Methodology: I ran each code on the same 3750 MB input file five times in serial succession. Each execution was timed using the `time` builtin provided by the bash 4.1.5(1) included with Ubuntu 10.04. stdout was redirected straight to /dev/null.
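
The same procedure can be approximated from Python, as a rough sketch. It is a stand-in for the bash `time` builtin, not the method used for the numbers below: `time_runs` and its parameters are hypothetical names, and the command passed in is whatever invokes the script under test.

```python
import subprocess
import sys
import time


def time_runs(cmd, trials=5):
    """Run cmd `trials` times in sequence and return each walltime in seconds.

    stdout is discarded, mirroring a redirect to /dev/null.
    """
    walltimes = []
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        walltimes.append(time.perf_counter() - start)
    return walltimes


if __name__ == "__main__":
    # Placeholder command: a trivial Python one-liner instead of the real scripts.
    demo_cmd = [sys.executable, "-c", "print('hello')"]
    for i, seconds in enumerate(time_runs(demo_cmd, trials=3), start=1):
        print(f"Trial {i}: {seconds:.3f} s")
```

Unlike `time`, this measures only wall-clock time and includes the small overhead of launching the process from Python.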

Code	Mean walltime (s)	Trial 1 (s)	Trial 2 (s)	Trial 3 (s)	Trial 4 (s)	Trial 5 (s)
Old Python	309.032	310.971	308.228	311.331	307.170	307.461
New Python	176.880	178.099	174.742	175.463	178.235	177.863
Old Perl	167.051	166.916	165.911	167.361	168.735	166.333
New Perl	126.860	125.913	124.709	130.125	127.809	125.746
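
The walltime column is the mean of the five trials. A short sketch that recomputes it and the relative gap between languages is given here; the per-trial numbers are copied from the table, while the variable and function names are made up for illustration.

```python
from statistics import mean

# Per-trial walltimes in seconds, copied from the table above.
trials = {
    "Old Python": [310.971, 308.228, 311.331, 307.170, 307.461],
    "New Python": [178.099, 174.742, 175.463, 178.235, 177.863],
    "Old Perl": [166.916, 165.911, 167.361, 168.735, 166.333],
    "New Perl": [125.913, 124.709, 130.125, 127.809, 125.746],
}

means = {name: mean(values) for name, values in trials.items()}


def percent_longer(slow, fast):
    """How much longer `slow` takes than `fast`, as a percentage of `fast`."""
    return 100.0 * (means[slow] / means[fast] - 1.0)


if __name__ == "__main__":
    for name, value in means.items():
        print(f"{name}: {value:.3f} s")
    print(f"New Python vs New Perl: {percent_longer('New Python', 'New Perl'):.1f}% longer")
    print(f"Old Python vs Old Perl: {percent_longer('Old Python', 'Old Perl'):.1f}% longer")
```

By this measure the rewritten Python takes roughly 39% longer than the rewritten Perl, and the original Python takes roughly 85% longer than the original Perl.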

So even the cleaner code runs about 40% faster in Perl than in Python (126.9 s versus 176.9 s mean walltime), which is not far off from the 50% slowdown I noted with my two crummier versions of the code. Furthermore, it seems easier for a relative novice like myself to write inefficient Python code than inefficient Perl code. Of course, it's also easier to write Perl code that doesn't do what you expect, and trying to understand someone else's code is a crapshoot.

Judging by what others have told me and what some of the comments have pointed out, though, Python just isn't optimized for "practical extraction and reporting." Maybe someday I'll find a use for Python in my work.

In case the links to the codes I used ever go bad, here they are on pastebin:

I'd post the input files I used, but I don't have anywhere I can host 3.7 GB files. If you're interested in the input data, let me know and I can send a private link.

Glenn K. Lockwood