Thursday, April 24, 2008

Comparison of NET2005 to MFC6

machine: eng-azhurav64
MFC6 36.27
NET2005 39.14 and that's with SSE2! WTF?
NET2005 without progress bar: 38.9 so it's not that...

machine: eng-ckorda
MFC6 40.50
NET2005 43.97 with SSE2, FP = fast
NET2005 45.90 with SSE2, FP = precise
NET2005 38.05 SSE disabled, FP = fast
NET2005 36.78 SSE disabled, FP = precise <-- fastest!!!

Looks like SSE2 is SLOWER than the FPU for the rendering loop.
Need to hand-code it and process two pixels at once!
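
A rough sketch of what "two pixels at once" could look like, written with SSE2 intrinsics instead of hand-coded assembler. This is not the actual rendering loop: it assumes a Mandelbrot-style escape-time iteration (z = z*z + c, bailout at |z|^2 > 4), and the names escape_scalar and escape_sse2 are made up for illustration. Two horizontally adjacent pixels share the same imaginary coordinate and ride in the two double lanes of one XMM register. Whether this actually beats the FPU still has to be measured.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// Scalar reference (hypothetical): iterations spent inside the bailout radius.
int escape_scalar(double cr, double ci, int max_iter)
{
    assert(max_iter >= 0);
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter) {
        double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)
            break;
        zi = 2.0 * zr * zi + ci;
        zr = zr2 - zi2 + cr;
        ++n;
    }
    return n;
}

// Two pixels per call: cr[0] and cr[1] are the real coordinates of two
// adjacent pixels on the same row, ci is their shared imaginary coordinate.
void escape_sse2(const double cr[2], double ci, int max_iter, int out[2])
{
    assert(max_iter >= 0);
    const __m128d vcr  = _mm_loadu_pd(cr);
    const __m128d vci  = _mm_set1_pd(ci);
    const __m128d four = _mm_set1_pd(4.0);
    const __m128d one  = _mm_set1_pd(1.0);
    __m128d zr = _mm_setzero_pd();
    __m128d zi = _mm_setzero_pd();
    __m128d count  = _mm_setzero_pd();        // per-lane iteration counters
    __m128d active = _mm_cmpeq_pd(zr, zr);    // all bits set: both lanes live

    for (int n = 0; n < max_iter; ++n) {
        __m128d zr2 = _mm_mul_pd(zr, zr);
        __m128d zi2 = _mm_mul_pd(zi, zi);
        // A lane stays live only while |z|^2 <= 4; once dead it stays dead.
        __m128d inside = _mm_cmple_pd(_mm_add_pd(zr2, zi2), four);
        active = _mm_and_pd(active, inside);
        if (_mm_movemask_pd(active) == 0)
            break;                            // both pixels have escaped
        count = _mm_add_pd(count, _mm_and_pd(active, one));
        __m128d zrzi = _mm_mul_pd(zr, zi);
        zi = _mm_add_pd(_mm_add_pd(zrzi, zrzi), vci);   // 2*zr*zi + ci
        zr = _mm_add_pd(_mm_sub_pd(zr2, zi2), vcr);     // zr^2 - zi^2 + cr
    }

    double c[2];
    _mm_storeu_pd(c, count);
    out[0] = static_cast<int>(c[0]);
    out[1] = static_cast<int>(c[1]);
}
```

The loop keeps running until both lanes have escaped, so a pair where one pixel is deep inside the set and its neighbour escapes early wastes half the register. That is one plausible reason a vectorised loop may gain less than 2x.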

deep zoom benchmark (same coords but 640 x 480, 1x)
machine: eng-ckorda
MFC6 253.27
NET2005 213.03 significantly faster (about 16% less time)

eng-sfreed: 1024x768, 4096, 4x, GMP = 49 minutes... that's 29 frames per day at most
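
Quick check of the frames-per-day figure, assuming one machine rendering around the clock at a constant 49 minutes per frame: 1440 / 49 is about 29.4, so 29 whole frames. The helper name frames_per_day is just for illustration.

```cpp
#include <cassert>

// Whole frames that fit into one day at a fixed render time per frame.
int frames_per_day(int minutes_per_frame)
{
    assert(minutes_per_frame > 0);
    const int minutes_per_day = 24 * 60;          // 1440
    return minutes_per_day / minutes_per_frame;   // integer division drops the partial frame
}
```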

Thursday, April 17, 2008

GmpBench

coordinates: deep zoom +132 (768 bits).frs
checksum: 24454011
------------------------
machine: ckci-home (Pentium III)
compiler: VC++ 6.0
GMP version: 4.1.2
GMP target: P6
GmpBench: 446.96
Fractice: 445.71

machine: eng-ckorda (2.66 GHz Intel CoreDuo)
compiler: VC++ 6.0
GMP version: 4.1.2 (my port)
GMP target: Pentium4
GmpBench: 106.02, 105.86

machine: eng-ckorda (2.66 GHz Intel CoreDuo)
compiler: .NET 2005
GMP version: 4.1.2 (my port)
GMP target: Pentium4
GmpBench: 105.25, 105.33

machine: eng-ckorda (2.66 GHz Intel CoreDuo)
compiler: .NET 2005
GMP version: 4.2.2 (Gladman's port)
GMP target: Pentium4
GmpBench: 111.29, 111.34
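
For reference, a minimal sketch of the "run it twice and compare" pattern behind the paired figures above. This is not the GmpBench source: time_passes and relative_spread are hypothetical names, and it uses std::chrono (C++11), which neither compiler in these notes provides.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Run the workload `passes` times and return the elapsed seconds of each pass.
template <typename Work>
std::vector<double> time_passes(Work work, int passes)
{
    assert(passes > 0);
    std::vector<double> seconds;
    for (int i = 0; i < passes; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Relative gap between the slowest and fastest pass: (max - min) / min.
// A small value means the runs are repeatable enough to compare builds.
double relative_spread(const std::vector<double>& seconds)
{
    assert(!seconds.empty());
    auto mm = std::minmax_element(seconds.begin(), seconds.end());
    assert(*mm.first > 0.0);
    return (*mm.second - *mm.first) / *mm.first;
}
```

Each pair of runs above differs by well under 1%, while the gap between the 4.1.2 and 4.2.2 builds is around 6%, so the difference between the ports is not measurement noise.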

Many differences between Gladman's 4.2 and my 4.1, likely candidates include
config.h (alloc differences!)
gmp-mparams.h
gmp-impl.h
longlong.h (long long is enabled in Gladman but not mine)

Wednesday, April 16, 2008

GMP assembler port

benchmark
pc: z / eng-ckorda
file: deep zoom +136 (768 bits).frs
bits: 768

ASM      pass1  pass2
-------  -----  -----
generic  173.6  174.3
x86       66.7   65.9
P6        60.0   61.0
P4        59.0   59.0
.NET      71.0   71.0

x86 gives a huge gain, about 2.6x faster than generic!!
P6 is probably worth it (another ~9% over x86), but P4 not (only ~2% over P6)
.NET is slower? WTF?
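
The ratios can be checked with a few lines of code. A small sketch, assuming the table values are elapsed times (lower is better) and averaging the two passes; the Result struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>

// One row of the benchmark table.
struct Result {
    const char* target;
    double pass1;
    double pass2;
};

// Average of the two passes.
double mean_time(const Result& r)
{
    return 0.5 * (r.pass1 + r.pass2);
}

// How many times faster `r` is than `baseline`.
double speedup(const Result& baseline, const Result& r)
{
    assert(mean_time(r) > 0.0);
    return mean_time(baseline) / mean_time(r);
}

// Print the speedup of every row relative to the first row.
void print_speedups(const Result* results, int count)
{
    assert(count > 0);
    for (int i = 0; i < count; ++i)
        std::printf("%-8s %.2fx\n", results[i].target,
                    speedup(results[0], results[i]));
}
```

Fed with the five rows above and generic as the baseline, this gives roughly 2.6x for x86 and roughly 2.9x for both P6 and P4.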

Monday, April 14, 2008

HPAlib test results

default
HPA 6.741
GMP 15.267

deep zoom to +23.frs
HPA 86
GMP 224

Conclusion:
HPA is between 2.3 and 2.6 times faster than GMP (15.267 / 6.741 and 224 / 86). Unfortunately it runs out of precision at around +32, so it's not much of an improvement over the FPU.