When Speed Is King: C vs Lisp
Yannick Gingras is a fellow Lisp and fractal enthusiast. Like me, he has also entered the Apress Fractal Contest. Although Yannick does fractals in Lisp (fract is pretty cool), he chose to enter a C program. We talked about this on irc.freenode.net #lisp (scroll down to 01:30:41). I guess Yannick really wants that PSP ;-)
Although I'm disappointed that there isn't a Lisp entry from Yannick because his approach is different from mine, he was kind enough to share his C fractal entry. (A copy is here.) This gave me a chance to compare my Lisp code performance against Yannick's C code performance:
david@apostrophe:~/usr/src/apress-fractal/ygingras/fast_fract-1.0
$ time src/fast_fract ../../sampleinput.txt
real 0m3.559s
user 0m3.456s
sys 0m0.050s
david@apostrophe:~/usr/src/apress-fractal
$ time ./run sampleinput.txt
Generating fractals from "sampleinput.txt"
File: #P"outputpic/image01.jpeg"
left: -1.9999999, right: 1.9999999, bottom: -1.9999999, top: 1.9999999
File: #P"outputpic/image02.jpeg"
left: 0.338, right: 0.778, bottom: -0.3312, top: 0.1088
File: #P"outputpic/image03.jpeg"
left: -1.7904, right: -1.4355613, bottom: -0.4392, top: -0.0843613
File: #P"outputpic/image04.jpeg"
left: -1.0132, right: -0.4417714, bottom: 0.1448, top: 0.7162286
File: #P"outputpic/image05.jpeg"
left: 1.1408, right: 1.2733301, bottom: -0.4236, top: -0.2910699
File: #P"outputpic/image06.jpeg"
left: -1.7464, right: -1.5645818, bottom: -1.5524, top: -1.3705817
File: #P"outputpic/image07.jpeg"
left: 1.1588, right: 1.5016571, bottom: -1.3704, top: -1.027543
File: #P"outputpic/image08.jpeg"
left: -0.5892, right: -0.4881899, bottom: 1.508, top: 1.6090101
File: #P"outputpic/image09.jpeg"
left: -0.8404, right: -0.7445096, bottom: 1.618, top: 1.7138904
File: #P"outputpic/image10.jpeg"
left: -0.3536, right: -0.1606175, bottom: -1.1744, top: -0.9814175
File: #P"outputpic/image11.jpeg"
left: -1.2624, right: -1.0124, bottom: 0.814, top: 1.064
File: #P"outputpic/image12.jpeg"
left: 0.5652, right: 0.810483, bottom: -0.4008, top: -0.155517
File: #P"outputpic/image13.jpeg"
left: 0.676, right: 0.7555455, bottom: 1.2732, top: 1.3527455
File: #P"outputpic/image14.jpeg"
left: 0.412, right: 0.5424348, bottom: 0.0676, top: 0.1980348
File: #P"outputpic/image15.jpeg"
left: -1.8728, right: -1.7478, bottom: -1.4096, top: -1.2846
File: #P"outputpic/image16.jpeg"
left: 1.8312, right: 1.9878265, bottom: 0.9712, top: 1.1278265
File: #P"outputpic/image17.jpeg"
left: 0.1364, right: 0.9545818, bottom: -1.9428, top: -1.1246182
File: #P"outputpic/image18.jpeg"
left: -0.2828, right: -0.1578, bottom: 1.7404, top: 1.8654
File: #P"outputpic/image19.jpeg"
left: -1.5624, right: -0.7499, bottom: -1.68, top: -0.8675
File: #P"outputpic/image20.jpeg"
left: -1.1324, right: -0.9367478, bottom: -0.5792, top: -0.3835478
File: #P"outputpic/image21.jpeg"
left: 1.216, right: 1.4120784, bottom: -1.572, top: -1.3759216
DONE
real 0m9.492s
user 0m9.104s
sys 0m0.177s
david@apostrophe:~/usr/src/apress-fractal
$ cpuid
eax in eax ebx ecx edx
00000000 00000002 756e6547 6c65746e 49656e69
00000001 00000683 00000001 00000000 0383f9ff
00000002 03020101 00000000 00000000 0c040841
Vendor ID: "GenuineIntel"; CPUID level 2
Intel-specific functions:
Version 00000683:
Type 0 - Original OEM
Family 6 - Pentium Pro
Model 8 - Pentium III/Pentium III Xeon - internal L2 cache
Stepping 3
Reserved 0
Brand index: 1 [Celeron processor]
Feature flags 0383f9ff:
FPU Floating Point Unit
VME Virtual 8086 Mode Enhancements
DE Debugging Extensions
PSE Page Size Extensions
TSC Time Stamp Counter
MSR Model Specific Registers
PAE Physical Address Extension
MCE Machine Check Exception
CX8 COMPXCHG8B Instruction
SEP Fast System Call
MTRR Memory Type Range Registers
PGE PTE Global Flag
MCA Machine Check Architecture
CMOV Conditional Move and Compare Instructions
FGPAT Page Attribute Table
PSE-36 36-bit Page Size Extension
MMX MMX instruction set
FXSR Fast FP/MMX Streaming SIMD Extensions save/restore
SSE Streaming SIMD Extensions instruction set
TLB and cache info:
01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries
02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries
03: Data TLB: 4KB pages, 4-way set assoc, 64 entries
41: 2nd-level cache: 128KB, 4-way set assoc, 32 byte line size
08: 1st-level instruction cache: 16KB, 4-way set assoc, 32 byte line size
04: Data TLB: 4MB pages, 4-way set assoc, 8 entries
0c: 1st-level data cache: 16KB, 4-way set assoc, 32 byte line size
david@apostrophe:~/usr/src/apress-fractal
$ free -k
total used free shared buffers cached
Mem: 386288 351492 34796 0 12804 230436
-/+ buffers/cache: 108252 278036
Swap: 2000084 796 1999288
david@apostrophe:~/usr/src/apress-fractal
$ uname -a
Linux apostrophe 2.6.11.9-686 #1 Thu Aug 25 04:40:51 EDT 2005 i686 GNU/Linux
The CPU is a 566Mhz Celeron. This is my webserver.
To do the timings, I first ran the program to get data and code into cache. Then I did a single timing run. This is as close to scientific as I care to bother with. The timings tend to be fairly consistent in my experience, so there is no real point in running a whole lot of them. As you can see, Yannick's code is about 2.667x faster than mine in this test. Slowing my code down is the fact that I print out the status, calculate an extra region (there are 21 regions in the sampleinput.txt file), and I have the overhead of launching SBCL from a shell script. I do a bit better if I run from the REPL:
david@apostrophe:~/usr/src/apress-fractal $ sbcl --core bin/apress-fractal.core This is SBCL 0.9.4, an implementation of ANSI Common Lisp. More information about SBCL is available at <http://www.sbcl.org/>. SBCL is free software, provided as is, with absolutely no warranty. It is mostly in the public domain; some portions are provided under BSD-style licenses. See the CREDITS and COPYING files in the distribution for more information. * (time (generate-fractals-from-input-file "sampleinput.txt")) File: #P"outputpic/image01.jpeg" left: -1.9999999, right: 1.9999999, bottom: -1.9999999, top: 1.9999999 File: #P"outputpic/image02.jpeg" left: 0.338, right: 0.778, bottom: -0.3312, top: 0.1088 File: #P"outputpic/image03.jpeg" left: -1.7904, right: -1.4355613, bottom: -0.4392, top: -0.0843613 File: #P"outputpic/image04.jpeg" left: -1.0132, right: -0.4417714, bottom: 0.1448, top: 0.7162286 File: #P"outputpic/image05.jpeg" left: 1.1408, right: 1.2733301, bottom: -0.4236, top: -0.2910699 File: #P"outputpic/image06.jpeg" left: -1.7464, right: -1.5645818, bottom: -1.5524, top: -1.3705817 File: #P"outputpic/image07.jpeg" left: 1.1588, right: 1.5016571, bottom: -1.3704, top: -1.027543 File: #P"outputpic/image08.jpeg" left: -0.5892, right: -0.4881899, bottom: 1.508, top: 1.6090101 File: #P"outputpic/image09.jpeg" left: -0.8404, right: -0.7445096, bottom: 1.618, top: 1.7138904 File: #P"outputpic/image10.jpeg" left: -0.3536, right: -0.1606175, bottom: -1.1744, top: -0.9814175 File: #P"outputpic/image11.jpeg" left: -1.2624, right: -1.0124, bottom: 0.814, top: 1.064 File: #P"outputpic/image12.jpeg" left: 0.5652, right: 0.810483, bottom: -0.4008, top: -0.155517 File: #P"outputpic/image13.jpeg" left: 0.676, right: 0.7555455, bottom: 1.2732, top: 1.3527455 File: #P"outputpic/image14.jpeg" left: 0.412, right: 0.5424348, bottom: 0.0676, top: 0.1980348 File: #P"outputpic/image15.jpeg" left: -1.8728, right: -1.7478, bottom: -1.4096, top: -1.2846 File: #P"outputpic/image16.jpeg" left: 1.8312, right: 1.9878265, bottom: 0.9712, top: 1.1278265 File: #P"outputpic/image17.jpeg" left: 0.1364, right: 0.9545818, bottom: -1.9428, top: -1.1246182 File: #P"outputpic/image18.jpeg" left: -0.2828, right: -0.1578, bottom: 1.7404, top: 1.8654 File: #P"outputpic/image19.jpeg" left: -1.5624, right: -0.7499, bottom: -1.68, top: -0.8675 File: #P"outputpic/image20.jpeg" left: -1.1324, right: -0.9367478, bottom: -0.5792, top: -0.3835478 File: #P"outputpic/image21.jpeg" left: 1.216, right: 1.4120784, bottom: -1.572, top: -1.3759216 Evaluation took: 9.362 seconds of real time 9.047625 seconds of user run time 0.151977 seconds of system run time 0 page faults and 5,454,976 bytes consed. NIL * (time (generate-fractals-from-input-file "sampleinput.txt")) File: #P"outputpic/image01.jpeg" left: -1.9999999, right: 1.9999999, bottom: -1.9999999, top: 1.9999999 File: #P"outputpic/image02.jpeg" left: 0.338, right: 0.778, bottom: -0.3312, top: 0.1088 File: #P"outputpic/image03.jpeg" left: -1.7904, right: -1.4355613, bottom: -0.4392, top: -0.0843613 File: #P"outputpic/image04.jpeg" left: -1.0132, right: -0.4417714, bottom: 0.1448, top: 0.7162286 File: #P"outputpic/image05.jpeg" left: 1.1408, right: 1.2733301, bottom: -0.4236, top: -0.2910699 File: #P"outputpic/image06.jpeg" left: -1.7464, right: -1.5645818, bottom: -1.5524, top: -1.3705817 File: #P"outputpic/image07.jpeg" left: 1.1588, right: 1.5016571, bottom: -1.3704, top: -1.027543 File: #P"outputpic/image08.jpeg" left: -0.5892, right: -0.4881899, bottom: 1.508, top: 1.6090101 File: #P"outputpic/image09.jpeg" left: -0.8404, right: -0.7445096, bottom: 1.618, top: 1.7138904 File: #P"outputpic/image10.jpeg" left: -0.3536, right: -0.1606175, bottom: -1.1744, top: -0.9814175 File: #P"outputpic/image11.jpeg" left: -1.2624, right: -1.0124, bottom: 0.814, top: 1.064 File: #P"outputpic/image12.jpeg" left: 0.5652, right: 0.810483, bottom: -0.4008, top: -0.155517 File: #P"outputpic/image13.jpeg" left: 0.676, right: 0.7555455, bottom: 1.2732, top: 1.3527455 File: #P"outputpic/image14.jpeg" left: 0.412, right: 0.5424348, bottom: 0.0676, top: 0.1980348 File: #P"outputpic/image15.jpeg" left: -1.8728, right: -1.7478, bottom: -1.4096, top: -1.2846 File: #P"outputpic/image16.jpeg" left: 1.8312, right: 1.9878265, bottom: 0.9712, top: 1.1278265 File: #P"outputpic/image17.jpeg" left: 0.1364, right: 0.9545818, bottom: -1.9428, top: -1.1246182 File: #P"outputpic/image18.jpeg" left: -0.2828, right: -0.1578, bottom: 1.7404, top: 1.8654 File: #P"outputpic/image19.jpeg" left: -1.5624, right: -0.7499, bottom: -1.68, top: -0.8675 File: #P"outputpic/image20.jpeg" left: -1.1324, right: -0.9367478, bottom: -0.5792, top: -0.3835478 File: #P"outputpic/image21.jpeg" left: 1.216, right: 1.4120784, bottom: -1.572, top: -1.3759216 Evaluation took: 8.998 seconds of real time 8.699678 seconds of user run time 0.084987 seconds of system run time 0 page faults and 4,243,440 bytes consed. NIL * (/ 8.998 3.559) 2.5282383 * (quit)
As you can see, this gets me to only 2.528x slower than Yannick's code. Yannick has pointed out that this comparison is still not quite fair. For one, as I've already mentioned, Yannick's code does only 20 regions, not 21. Another is that for the sampleinput.txt file, Yannick's code is running fewer iterations, producing lower quality images. Yannick has also told me that his code uses fork() to multiplex io and computation. Most of his time is spent in the jpeg encoding. Interestingly enough, that is where my code also spends most of its time. Another factor is the vagueness of the Apress specifications for the fractal generator code. As a result, comparing my Lisp code against Yannick's C code is somewhat of an apples to oranges comparison.
My code can be optimized further, but that would be a lot of effort. In the end, the C code is still faster and I am unlikely to be able to change that. In spite of all that, I'm rather pleased that I was able to approach the performance of heavily optimized C code with Lisp. That's not a bad deal.