When Speed Is King: C vs Lisp

Yannick Gingras is a fellow Lisp and fractal enthusiast. Like me, he has also entered the Apress Fractal Contest. Although Yannick does fractals in Lisp (fract is pretty cool), he chose to enter a C program. We talked about this on irc.freenode.net #lisp (scroll down to 01:30:41). I guess Yannick really wants that PSP ;-)

Although I'm disappointed that there isn't a Lisp entry from Yannick because his approach is different from mine, he was kind enough to share his C fractal entry. (A copy is here.) This gave me a chance to compare my Lisp code performance against Yannick's C code performance:

david@apostrophe:~/usr/src/apress-fractal/ygingras/fast_fract-1.0
$ time src/fast_fract ../../sampleinput.txt

real    0m3.559s
user    0m3.456s
sys     0m0.050s
david@apostrophe:~/usr/src/apress-fractal
$ time ./run sampleinput.txt
Generating fractals from "sampleinput.txt"

File: #P"outputpic/image01.jpeg"
left: -1.9999999, right: 1.9999999, bottom: -1.9999999, top: 1.9999999
File: #P"outputpic/image02.jpeg"
left: 0.338, right: 0.778, bottom: -0.3312, top: 0.1088
File: #P"outputpic/image03.jpeg"
left: -1.7904, right: -1.4355613, bottom: -0.4392, top: -0.0843613
File: #P"outputpic/image04.jpeg"
left: -1.0132, right: -0.4417714, bottom: 0.1448, top: 0.7162286
File: #P"outputpic/image05.jpeg"
left: 1.1408, right: 1.2733301, bottom: -0.4236, top: -0.2910699
File: #P"outputpic/image06.jpeg"
left: -1.7464, right: -1.5645818, bottom: -1.5524, top: -1.3705817
File: #P"outputpic/image07.jpeg"
left: 1.1588, right: 1.5016571, bottom: -1.3704, top: -1.027543
File: #P"outputpic/image08.jpeg"
left: -0.5892, right: -0.4881899, bottom: 1.508, top: 1.6090101
File: #P"outputpic/image09.jpeg"
left: -0.8404, right: -0.7445096, bottom: 1.618, top: 1.7138904
File: #P"outputpic/image10.jpeg"
left: -0.3536, right: -0.1606175, bottom: -1.1744, top: -0.9814175
File: #P"outputpic/image11.jpeg"
left: -1.2624, right: -1.0124, bottom: 0.814, top: 1.064
File: #P"outputpic/image12.jpeg"
left: 0.5652, right: 0.810483, bottom: -0.4008, top: -0.155517
File: #P"outputpic/image13.jpeg"
left: 0.676, right: 0.7555455, bottom: 1.2732, top: 1.3527455
File: #P"outputpic/image14.jpeg"
left: 0.412, right: 0.5424348, bottom: 0.0676, top: 0.1980348
File: #P"outputpic/image15.jpeg"
left: -1.8728, right: -1.7478, bottom: -1.4096, top: -1.2846
File: #P"outputpic/image16.jpeg"
left: 1.8312, right: 1.9878265, bottom: 0.9712, top: 1.1278265
File: #P"outputpic/image17.jpeg"
left: 0.1364, right: 0.9545818, bottom: -1.9428, top: -1.1246182
File: #P"outputpic/image18.jpeg"
left: -0.2828, right: -0.1578, bottom: 1.7404, top: 1.8654
File: #P"outputpic/image19.jpeg"
left: -1.5624, right: -0.7499, bottom: -1.68, top: -0.8675
File: #P"outputpic/image20.jpeg"
left: -1.1324, right: -0.9367478, bottom: -0.5792, top: -0.3835478
File: #P"outputpic/image21.jpeg"
left: 1.216, right: 1.4120784, bottom: -1.572, top: -1.3759216
DONE


real    0m9.492s
user    0m9.104s
sys     0m0.177s
david@apostrophe:~/usr/src/apress-fractal
$ cpuid
 eax in    eax      ebx      ecx      edx
00000000 00000002 756e6547 6c65746e 49656e69
00000001 00000683 00000001 00000000 0383f9ff
00000002 03020101 00000000 00000000 0c040841

Vendor ID: "GenuineIntel"; CPUID level 2

Intel-specific functions:
Version 00000683:
Type 0 - Original OEM
Family 6 - Pentium Pro
Model 8 - Pentium III/Pentium III Xeon - internal L2 cache
Stepping 3
Reserved 0

Brand index: 1 [Celeron processor]

Feature flags 0383f9ff:
FPU    Floating Point Unit
VME    Virtual 8086 Mode Enhancements
DE     Debugging Extensions
PSE    Page Size Extensions
TSC    Time Stamp Counter
MSR    Model Specific Registers
PAE    Physical Address Extension
MCE    Machine Check Exception
CX8    COMPXCHG8B Instruction
SEP    Fast System Call
MTRR   Memory Type Range Registers
PGE    PTE Global Flag
MCA    Machine Check Architecture
CMOV   Conditional Move and Compare Instructions
FGPAT  Page Attribute Table
PSE-36 36-bit Page Size Extension
MMX    MMX instruction set
FXSR   Fast FP/MMX Streaming SIMD Extensions save/restore
SSE    Streaming SIMD Extensions instruction set

TLB and cache info:
01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries
02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries
03: Data TLB: 4KB pages, 4-way set assoc, 64 entries
41: 2nd-level cache: 128KB, 4-way set assoc, 32 byte line size
08: 1st-level instruction cache: 16KB, 4-way set assoc, 32 byte line size
04: Data TLB: 4MB pages, 4-way set assoc, 8 entries
0c: 1st-level data cache: 16KB, 4-way set assoc, 32 byte line size
david@apostrophe:~/usr/src/apress-fractal
$ free -k
             total       used       free     shared    buffers     cached
Mem:        386288     351492      34796          0      12804     230436
-/+ buffers/cache:     108252     278036
Swap:      2000084        796    1999288
david@apostrophe:~/usr/src/apress-fractal
$ uname -a
Linux apostrophe 2.6.11.9-686 #1 Thu Aug 25 04:40:51 EDT 2005 i686 GNU/Linux

The CPU is a 566Mhz Celeron. This is my webserver.

To do the timings, I first ran the program to get data and code into cache. Then I did a single timing run. This is as close to scientific as I care to bother with. The timings tend to be fairly consistent in my experience, so there is no real point in running a whole lot of them. As you can see, Yannick's code is about 2.667x faster than mine in this test. Slowing my code down is the fact that I print out the status, calculate an extra region (there are 21 regions in the sampleinput.txt file), and I have the overhead of launching SBCL from a shell script. I do a bit better if I run from the REPL:

david@apostrophe:~/usr/src/apress-fractal
$ sbcl --core bin/apress-fractal.core 
This is SBCL 0.9.4, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
* (time (generate-fractals-from-input-file "sampleinput.txt"))
File: #P"outputpic/image01.jpeg"
left: -1.9999999, right: 1.9999999, bottom: -1.9999999, top: 1.9999999
File: #P"outputpic/image02.jpeg"
left: 0.338, right: 0.778, bottom: -0.3312, top: 0.1088
File: #P"outputpic/image03.jpeg"
left: -1.7904, right: -1.4355613, bottom: -0.4392, top: -0.0843613
File: #P"outputpic/image04.jpeg"
left: -1.0132, right: -0.4417714, bottom: 0.1448, top: 0.7162286
File: #P"outputpic/image05.jpeg"
left: 1.1408, right: 1.2733301, bottom: -0.4236, top: -0.2910699
File: #P"outputpic/image06.jpeg"
left: -1.7464, right: -1.5645818, bottom: -1.5524, top: -1.3705817
File: #P"outputpic/image07.jpeg"
left: 1.1588, right: 1.5016571, bottom: -1.3704, top: -1.027543
File: #P"outputpic/image08.jpeg"
left: -0.5892, right: -0.4881899, bottom: 1.508, top: 1.6090101
File: #P"outputpic/image09.jpeg"
left: -0.8404, right: -0.7445096, bottom: 1.618, top: 1.7138904
File: #P"outputpic/image10.jpeg"
left: -0.3536, right: -0.1606175, bottom: -1.1744, top: -0.9814175
File: #P"outputpic/image11.jpeg"
left: -1.2624, right: -1.0124, bottom: 0.814, top: 1.064
File: #P"outputpic/image12.jpeg"
left: 0.5652, right: 0.810483, bottom: -0.4008, top: -0.155517
File: #P"outputpic/image13.jpeg"
left: 0.676, right: 0.7555455, bottom: 1.2732, top: 1.3527455
File: #P"outputpic/image14.jpeg"
left: 0.412, right: 0.5424348, bottom: 0.0676, top: 0.1980348
File: #P"outputpic/image15.jpeg"
left: -1.8728, right: -1.7478, bottom: -1.4096, top: -1.2846
File: #P"outputpic/image16.jpeg"
left: 1.8312, right: 1.9878265, bottom: 0.9712, top: 1.1278265
File: #P"outputpic/image17.jpeg"
left: 0.1364, right: 0.9545818, bottom: -1.9428, top: -1.1246182
File: #P"outputpic/image18.jpeg"
left: -0.2828, right: -0.1578, bottom: 1.7404, top: 1.8654
File: #P"outputpic/image19.jpeg"
left: -1.5624, right: -0.7499, bottom: -1.68, top: -0.8675
File: #P"outputpic/image20.jpeg"
left: -1.1324, right: -0.9367478, bottom: -0.5792, top: -0.3835478
File: #P"outputpic/image21.jpeg"
left: 1.216, right: 1.4120784, bottom: -1.572, top: -1.3759216
Evaluation took:
  9.362 seconds of real time
  9.047625 seconds of user run time
  0.151977 seconds of system run time
  0 page faults and
  5,454,976 bytes consed.
NIL
* (time (generate-fractals-from-input-file "sampleinput.txt"))
File: #P"outputpic/image01.jpeg"
left: -1.9999999, right: 1.9999999, bottom: -1.9999999, top: 1.9999999
File: #P"outputpic/image02.jpeg"
left: 0.338, right: 0.778, bottom: -0.3312, top: 0.1088
File: #P"outputpic/image03.jpeg"
left: -1.7904, right: -1.4355613, bottom: -0.4392, top: -0.0843613
File: #P"outputpic/image04.jpeg"
left: -1.0132, right: -0.4417714, bottom: 0.1448, top: 0.7162286
File: #P"outputpic/image05.jpeg"
left: 1.1408, right: 1.2733301, bottom: -0.4236, top: -0.2910699
File: #P"outputpic/image06.jpeg"
left: -1.7464, right: -1.5645818, bottom: -1.5524, top: -1.3705817
File: #P"outputpic/image07.jpeg"
left: 1.1588, right: 1.5016571, bottom: -1.3704, top: -1.027543
File: #P"outputpic/image08.jpeg"
left: -0.5892, right: -0.4881899, bottom: 1.508, top: 1.6090101
File: #P"outputpic/image09.jpeg"
left: -0.8404, right: -0.7445096, bottom: 1.618, top: 1.7138904
File: #P"outputpic/image10.jpeg"
left: -0.3536, right: -0.1606175, bottom: -1.1744, top: -0.9814175
File: #P"outputpic/image11.jpeg"
left: -1.2624, right: -1.0124, bottom: 0.814, top: 1.064
File: #P"outputpic/image12.jpeg"
left: 0.5652, right: 0.810483, bottom: -0.4008, top: -0.155517
File: #P"outputpic/image13.jpeg"
left: 0.676, right: 0.7555455, bottom: 1.2732, top: 1.3527455
File: #P"outputpic/image14.jpeg"
left: 0.412, right: 0.5424348, bottom: 0.0676, top: 0.1980348
File: #P"outputpic/image15.jpeg"
left: -1.8728, right: -1.7478, bottom: -1.4096, top: -1.2846
File: #P"outputpic/image16.jpeg"
left: 1.8312, right: 1.9878265, bottom: 0.9712, top: 1.1278265
File: #P"outputpic/image17.jpeg"
left: 0.1364, right: 0.9545818, bottom: -1.9428, top: -1.1246182
File: #P"outputpic/image18.jpeg"
left: -0.2828, right: -0.1578, bottom: 1.7404, top: 1.8654
File: #P"outputpic/image19.jpeg"
left: -1.5624, right: -0.7499, bottom: -1.68, top: -0.8675
File: #P"outputpic/image20.jpeg"
left: -1.1324, right: -0.9367478, bottom: -0.5792, top: -0.3835478
File: #P"outputpic/image21.jpeg"
left: 1.216, right: 1.4120784, bottom: -1.572, top: -1.3759216
Evaluation took:
  8.998 seconds of real time
  8.699678 seconds of user run time
  0.084987 seconds of system run time
  0 page faults and
  4,243,440 bytes consed.
NIL
* (/ 8.998 3.559)

2.5282383
* (quit)

As you can see, this gets me to only 2.528x slower than Yannick's code. Yannick has pointed out that this comparison is still not quite fair. For one, as I've already mentioned, Yannick's code does only 20 regions, not 21. Another is that for the sampleinput.txt file, Yannick's code is running fewer iterations, producing lower quality images. Yannick has also told me that his code uses fork() to multiplex io and computation. Most of his time is spent in the jpeg encoding. Interestingly enough, that is where my code also spends most of its time. Another factor is the vagueness of the Apress specifications for the fractal generator code. As a result, comparing my Lisp code against Yannick's C code is somewhat of an apples to oranges comparison.

My code can be optimized further, but that would be a lot of effort. In the end, the C code is still faster and I am unlikely to be able to change that. In spite of all that, I'm rather pleased that I was able to approach the performance of heavily optimized C code with Lisp. That's not a bad deal.