Chris Miller
Been tracking down the most unusual bug I've seen in 40 years of programming (mostly in assembler and C). It shows up in the test suite for an object-oriented XBase language implementation that has been ported everywhere from the Palm Pilot to OS/2, the BlackBerry tablet, Windows and, now, QNX 7 (from QNX 4, from QNX 2).
I've tracked it down to a read() failure. A read(int fd, char *buffer, int bytes) call returns the full byte count with errno=0, but the wrong bytes end up in the buffer (and some of it is not overwritten at all). A second read() from the same file offset into a different buffer returns the correct bytes. Sometimes it only shows up after several passes through the test suite, but 100 passes is always enough. Running another process that pounds the file system at the same time seems to reduce, if not eliminate, the problem.
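One way to catch the failure in the act is to re-read every range with pread() into a separate scratch buffer and compare it with what read() delivered. This is a minimal sketch, assuming POSIX pread() is available on the target; `reads_agree` is a hypothetical helper name, not something from the original code base.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical diagnostic helper (not from the original code base):
 * read n bytes at `offset` with lseek() + read() into `buf`, re-read the
 * same range with pread() into a separate scratch buffer, and compare.
 * Returns 1 if both reads agree, 0 if they differ, -1 on an I/O or
 * allocation error. Leaves the file offset just past the first read. */
int reads_agree(int fd, off_t offset, char *buf, size_t n)
{
    assert(buf != NULL || n == 0);

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    ssize_t got = read(fd, buf, n);
    if (got < 0)
        return -1;

    char *check = malloc(n ? n : 1);   /* avoid malloc(0) */
    if (check == NULL)
        return -1;

    /* pread() reads at an explicit offset and does not move the file offset. */
    ssize_t again = pread(fd, check, n, offset);
    int result = -1;
    if (again >= 0)
        result = (again == got &&
                  (got == 0 || memcmp(buf, check, (size_t)got) == 0));
    free(check);
    return result;
}
```

A mismatch (return value 0) pinpoints the offset and length of a bad read while the evidence is still in both buffers.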
But the weirdest thing for me is what "fixes" it. Instead of reading n bytes directly into the buffer passed to my cover function, I malloc an n-byte temporary buffer, read into that, memcpy those n bytes into the caller's buffer, and free the temporary. Then all tests pass. To me this is a functional no-op, and it seems to rule out memory corruption and compiler errors (which I've tested for to a fair extent anyway). I'd be curious to hear anyone's thoughts on this.
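The workaround described above can be sketched as a cover function along these lines. The name `bounce_read` and its signature are assumptions for illustration; the sketch copies only as many bytes as read() actually reports and preserves read()'s errno across free().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cover function (name and signature are assumptions) showing
 * the bounce-buffer workaround: read() into a freshly malloc'ed buffer, then
 * memcpy() into the caller's buffer. Returns read()'s result, or -1 if the
 * temporary buffer cannot be allocated. */
ssize_t bounce_read(int fd, char *buffer, size_t bytes)
{
    assert(buffer != NULL || bytes == 0);

    char *tmp = malloc(bytes ? bytes : 1);   /* avoid malloc(0) */
    if (tmp == NULL)
        return -1;

    ssize_t got = read(fd, tmp, bytes);
    int saved_errno = errno;                 /* keep read()'s errno across free() */
    if (got > 0)
        memcpy(buffer, tmp, (size_t)got);    /* copy only the bytes actually read */
    free(tmp);
    errno = saved_errno;
    return got;
}
```

Functionally this returns the same bytes as a direct read(); the only thing read() sees differently is the destination address.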