Visual C++ 6 linking for NT 3.1
27 May 2017
For years I naively trusted that Visual C++ 6 would not target anything older than NT 3.51, because the linker defaulted to subsystem 4.0. However, the linker will target an older subsystem if one is explicitly specified. It's important to remember that the subsystem version for NT 3.1 is 3.10, not 3.1; the linker will rightly reject an attempt to target 3.1.
Having discovered this, and tried to use it, I noticed that the executable was larger when targeting NT 3.1. This seemed odd to me since it's possible to support older OSes by just changing the subsystem version field. Just to test my sanity, I manually edited the subsystem version field of a 4.0 binary, then took both back to NT 3.1, and they both ran fine. So why the size difference?
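A minimal sketch of that kind of manual edit, in Python. The function name `set_subsystem_version` is made up for this example, and the offsets are assumptions taken from the documented PE layout: `e_lfanew` at 0x3C in the DOS header, and MajorSubsystemVersion and MinorSubsystemVersion at 48 and 50 bytes into the optional header.

```python
import struct

E_LFANEW_OFFSET = 0x3C         # DOS header field holding the offset of "PE\0\0"
OPTIONAL_HEADER_SKIP = 4 + 20  # PE signature plus the COFF file header
SUBSYSTEM_VERSION_OFFSET = 48  # MajorSubsystemVersion; the minor field follows it


def set_subsystem_version(image, major, minor):
    """Patch the subsystem version in a PE image held in a bytearray.

    Returns the previous (major, minor) pair.
    """
    if bytes(image[:2]) != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if bytes(image[pe_offset:pe_offset + 4]) != b"PE\0\0":
        raise ValueError("missing PE signature")
    field = pe_offset + OPTIONAL_HEADER_SKIP + SUBSYSTEM_VERSION_OFFSET
    old = struct.unpack_from("<HH", image, field)
    struct.pack_into("<HH", image, field, major, minor)
    return old
```

Calling `set_subsystem_version(image, 3, 10)` on the contents of a 4.0 binary would mark it as targeting NT 3.1. Note that the minor version is 10, not 1, for the same reason the linker wants 3.10.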

The short answer is that the NT 3.1 version contains two more sections than the 4.0 version:
Dump of file sdir31.exe
File Type: EXECUTABLE IMAGE
  Summary
        1000 .bss
        1000 .data
        1000 .idata
        4000 .rdata
        8000 .text
Dump of file sdir40.exe
File Type: EXECUTABLE IMAGE
  Summary
        1000 .data
        4000 .rdata
        8000 .text
There are two new sections: .bss and .idata. However, this raises more questions: what are they, and why are they there?
The what part is straightforward. .bss is a section that means, "here are symbols that should be initialized to zero. Don't bother recording the zeroes in the file, just create them when the program starts." As an optimization, that makes complete sense; if a program had a lot of zero data, this would be a big win. In sdir's case, though, it's not a win at all, because there's just not much data like that. Remember, every section with data on disk needs to be aligned to some value, in this case 512 bytes; so if a program has only a small amount of zero-initialized data and another section has free space for it, it's more efficient to just store the zeroes and eliminate .bss entirely. That seems to be true for .bss in sdir, and it seems to be what the linker does when targeting 4.0. The relevant headers from the NT 3.1 build:
SECTION HEADER #2
    .bss name
      34 virtual size
SECTION HEADER #4
   .data name
      C8 virtual size
    E000 virtual address
     200 size of raw data
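The fields in that dump come straight from the section table, which is easy to read by hand. Here is a rough sketch in Python; `list_sections` is a name I made up, and the layout is an assumption based on the documented PE format: the table starts after the optional header, and each entry is 40 bytes with the name in the first 8, followed by virtual size, virtual address and size of raw data.

```python
import struct

SECTION_HEADER_SIZE = 40  # size of one section table entry


def list_sections(image):
    """Return (name, virtual_size, virtual_address, raw_size) per section."""
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if bytes(image[pe_offset:pe_offset + 4]) != b"PE\0\0":
        raise ValueError("missing PE signature")
    # NumberOfSections and SizeOfOptionalHeader live in the COFF file header,
    # which starts right after the 4-byte signature.
    (section_count,) = struct.unpack_from("<H", image, pe_offset + 6)
    (optional_size,) = struct.unpack_from("<H", image, pe_offset + 20)
    table = pe_offset + 24 + optional_size
    sections = []
    for index in range(section_count):
        base = table + index * SECTION_HEADER_SIZE
        name = bytes(image[base:base + 8]).rstrip(b"\0").decode("ascii", "replace")
        virtual_size, virtual_address, raw_size = struct.unpack_from(
            "<III", image, base + 8)
        sections.append((name, virtual_size, virtual_address, raw_size))
    return sections
```

Run over the contents of a binary, a section like .bss would be expected to show a nonzero virtual size with little or no raw data, since its contents are created at load time rather than stored in the file.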
The other new section is .idata. .idata means, "this is writable data that's initialized to some value, and by convention, describes the import table." The convention part is interesting, because there's already a section that contains writable initialized data, and it's just as capable of containing an import table as anything else, so it makes sense to just combine the two. In the NT 3.1 build, both carry identical flags:
SECTION HEADER #4
   .data name
      C8 virtual size
...
C0000040 flags
         Initialized Data
         Read Write
SECTION HEADER #5
  .idata name
     6E6 virtual size
...
C0000040 flags
         Initialized Data
         Read Write
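To make the flags value less opaque, here is a small Python sketch that decodes it. The bit values are the standard section characteristics from the PE format; the table covers only the handful of bits relevant here, and `decode_section_flags` is a name invented for the example.

```python
# A subset of the PE section characteristic bits (winnt.h names in comments).
SECTION_FLAGS = [
    (0x00000020, "Code"),                # IMAGE_SCN_CNT_CODE
    (0x00000040, "Initialized Data"),    # IMAGE_SCN_CNT_INITIALIZED_DATA
    (0x00000080, "Uninitialized Data"),  # IMAGE_SCN_CNT_UNINITIALIZED_DATA
    (0x20000000, "Execute"),             # IMAGE_SCN_MEM_EXECUTE
    (0x40000000, "Read"),                # IMAGE_SCN_MEM_READ
    (0x80000000, "Write"),               # IMAGE_SCN_MEM_WRITE
]


def decode_section_flags(flags):
    """Return the labels of every known bit set in a characteristics value."""
    return [label for bit, label in SECTION_FLAGS if flags & bit]


data_flags = 0xC0000040   # .data, from the dump above
idata_flags = 0xC0000040  # .idata, from the dump above

print(decode_section_flags(data_flags))  # ['Initialized Data', 'Read', 'Write']
print(data_flags == idata_flags)         # True
```

Since the two values are bit-for-bit identical, nothing in the flags distinguishes .idata from .data as far as the loader's memory protection is concerned.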
So the NT 3.1 version contains 0x34 bytes in a .bss section, 0xC8 bytes in a .data section, and 0x6E6 bytes in a .idata section. On disk, .data and .idata need 0x200 + 0x800 bytes, plus the headers for all three sections. Combined, the three hold 0x7E2 bytes, which fits in 0x800 bytes on disk, eliminating 0x200 bytes as well as two section headers.
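The arithmetic is simple enough to check in a few lines of Python. The sizes are the virtual sizes from the dumps above, the 0x200 file alignment is the 512-byte value mentioned earlier, and `align_up` is just a helper written for this example.

```python
FILE_ALIGNMENT = 0x200  # 512-byte on-disk alignment


def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


bss, data, idata = 0x34, 0xC8, 0x6E6

# NT 3.1 layout: .bss stores nothing on disk, .data and .idata are padded separately.
separate = align_up(data, FILE_ALIGNMENT) + align_up(idata, FILE_ALIGNMENT)

# 4.0 layout: everything is folded into a single section.
merged = align_up(bss + data + idata, FILE_ALIGNMENT)

print(hex(separate), hex(merged), hex(separate - merged))  # 0xa00 0x800 0x200
```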
Edit: One other consequence is that each section is aligned on its own 4 KB page in memory, so the two extra sections increase the amount of writable memory in the process by 8 KB. Not huge, but also not necessary; the same optimizations that decrease disk footprint also decrease memory footprint.
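The memory side can be estimated the same way, counting pages instead of disk blocks. This sketch assumes 4 KB pages and uses the virtual sizes from the dumps; `pages_needed` is a helper made up for the example.

```python
PAGE_SIZE = 0x1000  # 4 KB pages


def pages_needed(size):
    """Number of whole pages required to hold size bytes."""
    return -(-size // PAGE_SIZE)  # ceiling division


section_sizes = (0x34, 0xC8, 0x6E6)  # .bss, .data, .idata

separate_pages = sum(pages_needed(size) for size in section_sizes)
merged_pages = pages_needed(sum(section_sizes))
saved_bytes = (separate_pages - merged_pages) * PAGE_SIZE

print(separate_pages, merged_pages, hex(saved_bytes))  # 3 1 0x2000
```

That matches the summaries at the top: three 0x1000 entries for .bss, .data and .idata in the NT 3.1 build against a single 0x1000 .data in the 4.0 build.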
Going back to why, though, I don't think I'll ever know. It really looks like asking for a 3.1 binary just disabled a pile of link-time optimizations. Those were just as effective on NT 3.1 as on 4.0, as evidenced by the fact that the 4.0 program runs just fine on 3.1. It's possible that there is some environment out there (Win32s perhaps?) that really needs a separate .idata, but I haven't seen it yet. .idata appeared to go away when moving from Visual C++ 5 to Visual C++ 6, but Visual C++ 6 was still supporting Windows 95 and NT 3.51, so the .idata requirement can't have existed on those platforms, and Visual C++ 5 didn't support anything else. The .bss elimination optimization is a total no-brainer: it's hard to imagine a system that is capable of initializing data to values through .data but has a hard dependency on any initialization to zero being performed through .bss.
My best guess is that in order to maintain compatibility with older OSes the linker team just forked a pile of logic for "new" platforms and enhanced that logic while leaving the old logic untouched. That's most unsatisfying because it implies there's a set of things that older OSes really do need to be performed differently, which I just haven't found yet.