32-bit vs. 64-bit FractServ benchmarks
32-bit code: GMP 4.1.2 with P4 assembler
64-bit code: MPIR 1.3.0 rc3 with Core2 assembler
test project: bench mpir64.frp
Record mode, only two frames, one to local machine, one to server being benchmarked
AFAIK all machines are Core2 running Vista64 except wzenge (i7 running Vista64)
PC        32-bit   64-bit   gain (32-bit / 64-bit)
ckeny        466      186   2.51
nstone       523      219   2.39
dtopp        525     1064   0.49  <-- WTF?!? better check this one
dpeder       545      225   2.42
bbetts       463      193   2.40
sfreed       521      215   2.42
jperre       683      535   1.28  <-- ? another mystery
wzenge       251       99   2.54  <-- i7
Porting Fractice itself will take more time. It took quite a bit of fussing to get the code to compile cleanly in 64-bit. Some of the more common issues are described succinctly in Intel's article Code Cleaning MFC/ATL Applications for 64-Bit Intel Architecture: basically polymorphic data types (e.g. INT_PTR), the DoModal return value, the SendMessage return value, failure to use WPARAM/LPARAM in prototypes, and item data (e.g. SetItemData). Some issues they don't mention: CArray's GetSize now returns INT_PTR (64 bits in a 64-bit build), and the prototype of OnTimer changed (its nIDEvent argument is now UINT_PTR).
The key to my solution is this block of code, which is included by stdafx.h:
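#ifdef _WIN64
// Explicit 64-to-32-bit narrowing casts, compact and easy to grep for
#define INT64TO32(x) static_cast<int>(x)
#define UINT64TO32(x) static_cast<UINT>(x)
// GCL_HBRBACKGROUND is only defined for 32-bit builds; map it to the pointer-sized index
#define GCL_HBRBACKGROUND GCLP_HBRBACKGROUND
typedef INT_PTR W64INT;
typedef UINT_PTR W64UINT;
// Replace MFC's untyped arrays with CArrayEx, whose GetSize returns int
#include "ArrayEx.h"
typedef CArrayEx<DWORD, DWORD> CDWordArrayEx;
#define CDWordArray CDWordArrayEx
typedef CArrayEx<void*, void*> CPtrArrayEx;
#define CPtrArray CPtrArrayEx
typedef CArrayEx<BYTE, BYTE> CByteArrayEx;
#define CByteArray CByteArrayEx
#else
typedef int W64INT;
typedef UINT W64UINT;
// In 32-bit builds the values are already 32 bits, so the casts are no-ops
#define INT64TO32(x) (x)
#define UINT64TO32(x) (x)
#endif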
Anywhere I need to cast to 32-bit, I use INT64TO32(x) or UINT64TO32(x), which makes the changes compact and easy to find. For example, getting item data from a control is a very common case:
int idx = INT64TO32(m_List.GetItemData(nItem));
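A possible refinement, sketched here under assumptions: the macros as written truncate silently if a value ever does exceed 32 bits. A debug-checked variant keeps the same call sites but asserts that no bits are lost. The helper names CheckedInt64To32 and CheckedUint64To32 are hypothetical, and std::intptr_t/std::uintptr_t stand in for INT_PTR/UINT_PTR so the sketch compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: portable stand-ins for the Windows pointer-sized types.
// On Win64, INT_PTR/UINT_PTR are 64-bit, like std::intptr_t/std::uintptr_t.
typedef std::intptr_t  W64INT;
typedef std::uintptr_t W64UINT;

// Hypothetical debug-checked versions of INT64TO32/UINT64TO32: the same
// narrowing static_cast, but a debug build asserts that no bits are lost.
inline int CheckedInt64To32(W64INT x)
{
    assert(x >= std::numeric_limits<int>::min() &&
           x <= std::numeric_limits<int>::max());
    return static_cast<int>(x);
}

inline unsigned int CheckedUint64To32(W64UINT x)
{
    assert(x <= std::numeric_limits<unsigned int>::max());
    return static_cast<unsigned int>(x);
}

#define INT64TO32(x)  CheckedInt64To32(x)
#define UINT64TO32(x) CheckedUint64To32(x)
```

In a release build (NDEBUG defined) the asserts compile away, so the cost is the same as the plain static_cast.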
Another common case is ON_MESSAGE handlers, or any other handler where the arguments are generic WPARAM/LPARAM and you're using them as int or UINT.
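Here is a minimal sketch of such a handler, assuming portable stand-ins for WPARAM, LPARAM and LRESULT (in Win64 these are UINT_PTR, LONG_PTR and LONG_PTR respectively). The class and handler names are hypothetical; the point is that the prototype keeps the generic types and the narrowing happens in one greppable place.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: portable stand-ins for the Win64 message types.
typedef std::uintptr_t WPARAM;   // UINT_PTR in Win64
typedef std::intptr_t  LPARAM;   // LONG_PTR in Win64
typedef std::intptr_t  LRESULT;  // LONG_PTR in Win64
typedef unsigned int   UINT;

#define INT64TO32(x)  static_cast<int>(x)
#define UINT64TO32(x) static_cast<UINT>(x)

// Hypothetical ON_MESSAGE-style handler: the prototype keeps WPARAM/LPARAM,
// and each argument is narrowed explicitly where it's used as UINT or int.
class CFrameCounter {
public:
    LRESULT OnSetRange(WPARAM wParam, LPARAM lParam)
    {
        m_nFirst = UINT64TO32(wParam);  // first frame index, fits in 32 bits
        m_nCount = INT64TO32(lParam);   // frame count, fits in 32 bits
        return 0;
    }
    UINT m_nFirst = 0;
    int  m_nCount = 0;
};
```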
Since I already have a wrapper (CArrayEx), and I don't require arrays with more than 2 billion elements, I can get away with hiding CArray's GetSize behind one that returns a 32-bit int (GetSize isn't virtual, so this is name hiding rather than a true override, but that's all I need). I also redefine CDWordArray, CPtrArray, CByteArray etc. as CArrayEx instantiations, so that they inherit the 32-bit GetSize. This avoids LOTS of tedious rewriting of code that doesn't need to be 64-bit anyway.
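The GetSize trick might look roughly like this. It's a sketch, not the actual ArrayEx.h: CArrayStandIn is a hypothetical std::vector-based stand-in for MFC's CArray (whose GetSize returns INT_PTR in 64-bit builds), and the single template parameter is a simplification of CArray's TYPE/ARG_TYPE pair.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: minimal stand-in for MFC's CArray; std::ptrdiff_t plays the
// role of INT_PTR, the 64-bit return type of CArray::GetSize in Win64.
template <class TYPE>
class CArrayStandIn {
public:
    std::ptrdiff_t GetSize() const { return static_cast<std::ptrdiff_t>(m_data.size()); }
    void Add(const TYPE& v) { m_data.push_back(v); }
    const TYPE& operator[](int i) const { return m_data[static_cast<std::size_t>(i)]; }
protected:
    std::vector<TYPE> m_data;
};

// Sketch of a CArrayEx-style wrapper. GetSize is not virtual, so this
// declaration hides the base-class version for code that uses CArrayEx.
template <class TYPE>
class CArrayEx : public CArrayStandIn<TYPE> {
public:
    int GetSize() const
    {
        std::ptrdiff_t n = CArrayStandIn<TYPE>::GetSize();
        assert(n <= INT_MAX);  // we never expect more than 2 billion elements
        return static_cast<int>(n);
    }
};

// Redirect an old MFC array name, as in the stdafx.h block.
typedef CArrayEx<std::uint32_t> CDWordArrayEx;
#define CDWordArray CDWordArrayEx

// Typical legacy loop: compiles without truncation warnings because
// GetSize() now returns int.
inline std::uint32_t SumAll(const CDWordArray& a)
{
    std::uint32_t sum = 0;
    for (int i = 0; i < a.GetSize(); i++)
        sum += a[i];
    return sum;
}
```

One consequence of hiding rather than overriding: code that reaches the array through a pointer or reference to the base class still gets the 64-bit GetSize.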
I define polymorphic types using W64INT or W64UINT. Yes, I could use INT_PTR/DWORD_PTR, but the extra indirection doesn't hurt a bit, and I'm getting tired of M$ changing the rules. Once burned, twice shy. The most common cases are:
Timer IDs (the SetTimer return value)
OnTimer nIDEvent argument
DoModal return value
SerializeElements nCount argument
The only serious bug so far was in CDib::Serialize, which reads/writes a BITMAP struct. The issue is that one of BITMAP's members (bmBits) is a pointer, which means the struct had to grow in 64-bit Windows, from 24 bytes to 32 bytes: the four LONGs and two WORDs still take 20 bytes, but the pointer is now 8 bytes and must be 8-byte aligned, so 4 bytes of padding precede it. My code doesn't use bmBits, but that doesn't matter. I guess I should have known better, but BITMAP has been around so long I tend to think of it as bedrock. It's certainly not an internal struct or anything. This is what you might call a "breaking change" in Windows. I could have stored the size of the struct in the archive, but I didn't. I could have made my own struct and copied the BITMAP members I care about to/from it, but I was too lazy. So I'm left with a minor kludge:
void CDib::Serialize(CArchive& ar)
{
    // The BITMAP struct got bigger in 64-bit Windows, due to the bmBits member
    // being a pointer. To keep our archives compatible with 32-bit Windows, we
    // must use the original size of BITMAP. The 64-bit load case leaves bmBits
    // uninitialized, but since we don't use bmBits here it doesn't matter.
#ifdef _WIN64
    static const int BITMAP_SIZE = 24;  // size of BITMAP in 32-bit Windows
#else
    static const int BITMAP_SIZE = sizeof(BITMAP);
#endif
    if (ar.IsStoring()) {
        BITMAP bmp;
        if (m_pBits == NULL || !GetBitmap(&bmp))
            AfxThrowArchiveException(CArchiveException::genericException, ar.m_strFileName);
        ar.Write(&bmp, BITMAP_SIZE);
        ar.Write(m_pBits, bmp.bmWidthBytes * bmp.bmHeight);
    } else {
        BITMAP bmp;
        ar.Read(&bmp, BITMAP_SIZE);
        if (!Create(bmp.bmWidth, bmp.bmHeight, bmp.bmBitsPixel))
            AfxThrowArchiveException(CArchiveException::genericException, ar.m_strFileName);
        ar.Read(m_pBits, bmp.bmWidthBytes * bmp.bmHeight);
    }
}
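Where the 24 and 32 byte sizes come from can be checked with a portable mirror of BITMAP. This is a sketch: BitmapMirror is a hypothetical name, fixed-width integers replace LONG and WORD (which are 32 and 16 bits in both Win32 and Win64), and it assumes the usual ABI rule that a pointer is aligned to its own size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: field-for-field mirror of the Win32 BITMAP struct.
struct BitmapMirror {
    std::int32_t  bmType;
    std::int32_t  bmWidth;
    std::int32_t  bmHeight;
    std::int32_t  bmWidthBytes;
    std::uint16_t bmPlanes;
    std::uint16_t bmBitsPixel;
    void*         bmBits;  // pointer: 4 bytes in Win32, 8 bytes in Win64
};

// The six integer members take 4*4 + 2*2 = 20 bytes. A 4-byte pointer can
// follow immediately (24 bytes total); an 8-byte pointer is aligned to
// offset 24, leaving 4 bytes of padding, for 32 bytes total.
constexpr std::size_t kFieldBytes = 4 * 4 + 2 * 2;
constexpr std::size_t kExpectedBitsOffset = sizeof(void*) == 8 ? 24 : 20;
constexpr std::size_t kExpectedSize = kExpectedBitsOffset + sizeof(void*);
```

The first 24 bytes of the 64-bit struct still hold all six integer members plus the padding, which is why reading and writing BITMAP_SIZE bytes round-trips everything Serialize actually uses.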
There are still some outstanding problems.
1. Fractice movie recording depends on the BmpToAvi DLL, which is 32-bit code. I sure as hell don't want to deal with porting all that nasty DirectShow filter code to 64-bit. Instead I plan to run BmpToAvi as a separate 32-bit application, and the DLL will just be a 64-bit proxy for it. The DLL will send commands to the app using registered messages. The commands will show the compressor dialog, open an AVI, add a frame to the AVI, close the AVI, etc. It won't be without its difficulties, but it's got to be easier than debugging 64-bit filter chains.
2. Since the 64-bit Visual C++ compiler doesn't support inline assembler, the Mandelbrot/Mandelbar SSE2 code is a problem. The options are either to rewrite it using intrinsics, or to port it to YASM and make it an external function with "C" linkage. I don't have to bench the intrinsics to know that they would generate horribly inefficient code; I'll take Lee Avery's word for it. We're talking about the innermost triple-nested loop code here, so "critical path" is an understatement. So really, an external YASM function is the only option. That's a significant project but also a highly worthwhile one, and not just because I'll be able to put 64-bit assembler on my resume.
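For item 1, the command protocol between the 64-bit proxy DLL and the 32-bit BmpToAvi app could be sketched along these lines. Everything here is hypothetical: the command codes, the AviServerSketch class and its return codes. The codes would travel as the WPARAM/LPARAM of a registered message, kept to 32 bits because the receiver is a 32-bit process. One caveat the sketch notes but doesn't implement: frame pixels can't be passed as a pointer between processes, so they would need shared memory or WM_COPYDATA.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical command codes sent by the 64-bit proxy DLL as the WPARAM
// of a registered message; all values fit in 32 bits.
enum AviCmd : std::uint32_t {
    AVICMD_SHOW_COMPRESSOR_DLG = 1,
    AVICMD_OPEN,
    AVICMD_ADD_FRAME,
    AVICMD_CLOSE
};

// Hypothetical state machine for the 32-bit app's message handler. A real
// handler would call into the BmpToAvi code; frame data would arrive via
// shared memory or WM_COPYDATA (not shown).
class AviServerSketch {
public:
    // Returns 0 on success, -1 if the command is invalid in the current state.
    std::int32_t Dispatch(std::uint32_t cmd, std::uint32_t arg)
    {
        switch (cmd) {
        case AVICMD_SHOW_COMPRESSOR_DLG:
            return 0;  // would show the compressor dialog
        case AVICMD_OPEN:
            if (m_bOpen) return -1;
            m_bOpen = true;
            m_nFrames = 0;
            m_nFrameRate = arg;  // hypothetical: frame rate passed as LPARAM
            return 0;
        case AVICMD_ADD_FRAME:
            if (!m_bOpen) return -1;
            ++m_nFrames;
            return 0;
        case AVICMD_CLOSE:
            if (!m_bOpen) return -1;
            m_bOpen = false;
            return 0;
        default:
            return -1;
        }
    }
    bool m_bOpen = false;
    std::uint32_t m_nFrames = 0;
    std::uint32_t m_nFrameRate = 0;
};
```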
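For item 2, the C++ side of an external YASM routine might look something like this. The name and signature of MandelIterAsm are hypothetical; extern "C" turns off C++ name mangling so the assembler file can export that exact symbol. Alongside it is a scalar reference version of the escape-time loop, which could be handy for validating the assembler output. For the Mandelbar (tricorn), z = conj(z)^2 + c, which just flips the sign of the 2*zr*zi term.

```cpp
#include <cassert>

// Hypothetical signature for the external YASM routine. Declared only;
// it is neither defined nor called in this sketch.
extern "C" int MandelIterAsm(double cr, double ci, int maxIter, int mandelbar);

// Scalar reference version of the escape-time loop, for checking the
// assembler results. Returns the iteration at which |z|^2 exceeded 4,
// or maxIter if it never did.
inline int MandelIterRef(double cr, double ci, int maxIter, bool mandelbar)
{
    double zr = 0.0, zi = 0.0;
    for (int i = 0; i < maxIter; ++i) {
        double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)
            return i;  // escaped after i iterations
        double t = 2.0 * zr * zi;
        zi = (mandelbar ? -t : t) + ci;  // conj(z)^2 negates the cross term
        zr = zr2 - zi2 + cr;
    }
    return maxIter;  // treated as inside the set
}
```

For example, c = i stays bounded under the Mandelbrot iteration (it cycles between -1+i and -i) but escapes at the third step under Mandelbar (0, i, -1+i, 3i).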