Buffer overflows using NT 3.1
05 Aug 2023Recently I attempted to use NT 3.1 as a development system, performing editing, compiling and debugging on an NT 3.1 host. This revealed some bugs in the software I was writing, but also uncovered some new (to me) bugs in the operating system itself.
For context, Yori has a debug mode memory allocator that places all allocations to end at a page boundary and marks the next page as no access. This causes any buffer overflows to immediately crash. Running on NT 3.1 exercised that path unexpectedly.
First bug: GetTempPathW
The following is a short snippet of code from GetTempPathW:
77884691 call 778a68e0 ; Call underlying function which does most of the work 77884696 mov ecx,eax ; save the length of the string, in bytes 77884698 shr ecx,1 ; right shift to convert bytes to characters 7788469a or eax,eax ; Check if zero 7788469c jz 778846eb ; If zero, leave 7788469e cmp eax,esi ; Check if returned size is within buffer size 778846a0 jnb 778846eb ; If not, leave 778846a2 lea edx,dword ptr [edi+ecx*2] ; Get a pointer to the buffer after the end of the string 778846a5 cmp word ptr [edx-02],005c ; Check for trailing backslash 778846aa jz 778846e0 ; If it's there, leave 778846ac lea ecx,dword ptr [eax+02] ; Calculate number of bytes with an additional WCHAR 778846af cmp esi,ecx ; Check if it fits in buffer 778846b1 jbe 778846d0 ; If not, leave 778846b3 mov word ptr [edx],005c ; Write trailing backslash to end of buffer 778846b8 mov word ptr [edi+eax*2+02],0000 ; Write NULL to buffer start plus BYTE count ; multiplied by two, plus two
The final instruction clearly intended to write to the character index but instead wrote to the byte index, thereby writing the NULL terminator well off the end of the buffer. Once the bug is known the instructions above look rather silly, because the "correct" final instruction would have been a relatively simple:
mov word ptr [edx+02], 0000
...which is smaller and simpler than the contortion the compiler needed to generate the bug. I worked around this by supplying an excessively long buffer to GetTempPath, and manually NULL terminating the result.
Obviously this bug is potentially serious in that it will stomp a NULL over random memory. But one mitigating factor is that the ANSI functions, such as GetTempPathA, end up using a scratch buffer for Unicode strings and converting the result back into the caller's buffer. For any program using the ANSI function where the temporary path was less than 128 characters - which would almost always be the case - this NULL is written into the scratch buffer with no ill-effects.
Second bug: All of user32?
The second piece of code seems to be common to a lot of string manipulation in User32:
76a0a595 lea edx,dword ptr [00000005+eax*2] ; Convert characters to bytes and add 5 (huh?) 76a0a59c and edx,fffffffc ; Strip any trailing bytes to form a 32 bit multiple 76a0a59f mov edi,dword ptr [esp+0c] 76a0a5a3 lea eax,dword ptr [edx+edi] 76a0a5a6 cmp dword ptr [esp+10],eax 76a0a5aa jb 76a0a5c6 76a0a5ac cmp edi,eax 76a0a5ae jnb 76a0a5c6 76a0a5b0 mov ecx,edx ; Load the number of bytes to copy into ecx 76a0a5b2 shr ecx,02 ; Shift right so it's the number of 32 bit copy operations 76a0a5b5 rep movsd ; Perform 32 bit copies 76a0a5b7 mov ecx,edx ; Load the number of bytes to copy 76a0a5b9 and ecx,00000003 ; Remove anything that was 32 bit aligned above 76a0a5bc rep movsb ; Copy the remaining bytes
This is a bit odd because the code at the bottom of the function clearly intends to round down to the number of 32 bit aligned values, copy those, then perform a byte-by-byte copy of the remainder. But the code at the top is adding 5 presumably to round up to the next whole 32 bit value. With a debug allocator, this code routinely copies one WCHAR off the end of the buffer and crashes. I "fixed" this by ensuring there's always one more WCHAR than required. Fortunately Yori doesn't use User32 very much, since it's primarily a set of command line tools.
Both of these bugs seemed a little odd because with a debug memory allocator they're trivial to find. But on the other hand, the reason I use a debug memory allocator is because of hard-learned experience that they find bugs, and that experience came into existence after seeing this class of bug in early Windows development. A lot of Windows code was written before learning this lesson.