gpt_loader.sys revisited, file read problem |hardwarefetish.com

By dose | June 28, 2015

It’s been over a year since I last analyzed and fixed a bug in the Paragon GPT
loader driver, which enables us Windows XP users to use GPT-partitioned drives
beyond 2TB in size.
Last time, I fixed a severe bug that caused the driver to crash.
This time, a user reported a strange bug with the driver in the comments section,
one that I had also experienced once but ignored at first:

The Problem

When reading files that are located beyond the 2TB area, massive memory usage
occurs and, as a result, the computer slows down to a crawl. This is especially
a problem if you are copying large files from a hard disk larger than 2TB.
More information about the problem can be found in the comments on the last
fix, where a user reported it.

Time to take a further look at the problem:

When copying a file from T: to S:, where T: is the GPT drive that gpt_loader.sys
handles, the physical memory usage increases a lot during the copy and the
lazy writer thread starts flushing data to the SOURCE file on T:, as can be seen
in Filemon:
22:37:23,1432674 TOTALCMD.EXE 1152 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.833.536, Length: 32.768
22:37:23,1434848 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 18.677.760, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1437603 System 4 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.899.072, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1437950 TOTALCMD.EXE 1152 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.833.536, Length: 32.768
22:37:23,1438333 TOTALCMD.EXE 1152 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.866.304, Length: 32.768
22:37:23,1438498 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 18.743.296, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1439166 TOTALCMD.EXE 1152 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.866.304, Length: 32.768
22:37:23,1440081 TOTALCMD.EXE 1152 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.899.072, Length: 32.768
22:37:23,1442021 System 4 FASTIO_ACQUIRE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS EndingOffset: 65.536
22:37:23,1442089 System 4 WriteFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 0, Length: 65.536, I/O Flags: Non-cached, Paging I/O
22:37:23,1442179 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 18.808.832, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1442302 System 4 FASTIO_ACQUIRE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS EndingOffset: 131.072
22:37:23,1442350 System 4 WriteFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 65.536, Length: 65.536, I/O Flags: Non-cached, Paging I/O
22:37:23,1442485 System 4 FASTIO_ACQUIRE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS EndingOffset: 196.608
22:37:23,1442530 System 4 WriteFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 131.072, Length: 65.536, I/O Flags: Non-cached, Paging I/O
22:37:23,1442665 System 4 FASTIO_ACQUIRE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS EndingOffset: 262.144
22:37:23,1442716 System 4 WriteFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 196.608, Length: 65.536, I/O Flags: Non-cached, Paging I/O
22:37:23,1442846 System 4 FASTIO_ACQUIRE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS EndingOffset: 327.680
22:37:23,1442893 System 4 WriteFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 262.144, Length: 65.536, I/O Flags: Non-cached, Paging I/O
22:37:23,1443031 System 4 FASTIO_ACQUIRE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS EndingOffset: 393.216
22:37:23,1443078 System 4 WriteFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 327.680, Length: 65.536, I/O Flags: Non-cached, Paging I/O
22:37:23,1444213 System 4 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.964.608, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1444334 TOTALCMD.EXE 1152 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.899.072, Length: 32.768
22:37:23,1444757 TOTALCMD.EXE 1152 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.931.840, Length: 32.768
22:37:23,1445042 TOTALCMD.EXE 1152 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.931.840, Length: 32.768
22:37:23,1445740 TOTALCMD.EXE 1152 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.964.608, Length: 32.768
22:37:23,1445810 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 18.874.368, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1448946 System 4 FASTIO_RELEASE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS
22:37:23,1449500 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 18.939.904, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1452567 System 4 FASTIO_RELEASE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS
22:37:23,1453098 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 19.005.440, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1455882 System 4 FASTIO_RELEASE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS
22:37:23,1456700 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 19.070.976, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1459316 System 4 FASTIO_RELEASE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS
22:37:23,1460420 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 19.136.512, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1462664 System 4 FASTIO_RELEASE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS
22:37:23,1465207 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 19.202.048, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1466096 System 4 FASTIO_RELEASE_FOR_MOD_WRITE T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS
22:37:23,1468814 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 19.267.584, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1470371 TOTALCMD.EXE 1152 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.964.608, Length: 32.768
22:37:23,1470723 System 4 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 33.030.144, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1471336 TOTALCMD.EXE 1152 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.997.376, Length: 32.768
22:37:23,1471603 TOTALCMD.EXE 1152 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 32.997.376, Length: 32.768
22:37:23,1471962 TOTALCMD.EXE 1152 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 33.030.144, Length: 32.768
22:37:23,1472509 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 19.333.120, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1475671 System 4 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 33.095.680, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1476093 System 4 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 19.398.656, Length: 65.536, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O
22:37:23,1476369 TOTALCMD.EXE 1152 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 33.030.144, Length: 32.768
22:37:23,1477412 TOTALCMD.EXE 1152 ReadFile T:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 33.062.912, Length: 32.768
22:37:23,1477688 TOTALCMD.EXE 1152 WriteFile S:\(2013-12-01 23-01) Mama Illegal ORF2 N.ts SUCCESS Offset: 33.062.912, Length: 32.768
…

Looking at the call stack of the first write to the source file, the write
operation originates in nt!MiMappedPageWriter.
This is the memory manager's mapped page writer, a lazy-writer thread that
periodically sweeps through the dirty pages of mapped files and flushes them to disk.
So the first conclusion is that there must be a memory-mapped view of
the source file whose pages became dirty for some reason. As they are dirty, the
system needs to keep their content cached, which in turn seems to create the huge
memory usage that ignores the disk cache limits.
Dirty pages also need to be flushed back to disk, which causes additional load
for no reason and may theoretically even be dangerous, as a file that is
only being read might get corrupted on power loss. As the files being read normally
don’t get corrupted, I assume that the pages are marked dirty even though
their content hasn’t changed.
The thread is processing the MmMappedPageWriterList.

Now when trying with another file not in cache and checking some statistics,
it can be seen that there are many dirty pages for the SOURCE file being copied
(which are being flushed back to disk):

lkd> !memusage
...
Control Valid Standby Dirty Shared Locked PageTables  name
8a0c75d8   748  77632 577680     0     0     0  mapped_file( (2013-12-28 20-10) Polt- ORF2 N.ts )
8a074c80   208  654244  1636     0     8     0  mapped_file( (2013-12-28 20-10) Polt- ORF2 N.ts )
...

Look at the amount of dirty pages!

lkd> !ca 8a0c75d8

ControlArea  @ 8a0c75d8
Segment      e55cf488  Flink      00000000  Blink        00000000
Section Ref         1  Pfn Ref       2cc80  Mapped Views        4
User Ref            0  WaitForDel        0  Flush Count         0
File Object  8a182408  ModWriteCount     0  System Views        4

Flags (8080) File WasPurged

File: \(2013-12-28 20-10) Polt- ORF2 N.ts

Segment @ e55cf488
Type nt!_MAPPED_FILE_SEGMENT not found.
lkd> !ca 8a074c80

ControlArea  @ 8a074c80
Segment      e386d998  Flink      00000000  Blink        00000000
Section Ref         1  Pfn Ref       2cc50  Mapped Views        2
User Ref            0  WaitForDel        0  Flush Count         0
File Object  8a1d4628  ModWriteCount     0  System Views        2

Flags (8080) File WasPurged

File: \(2013-12-28 20-10) Polt- ORF2 N.ts

Segment @ e386d998
Type nt!_MAPPED_FILE_SEGMENT not found.
lkd> !fileobj 8a182408

\(2013-12-28 20-10) Polt- ORF2 N.ts

Device Object: 0x8b560be8   \Driver\gpt_loader
Vpb: 0x8b57a450
Event signalled
Access: Read SharedRead

Flags:  0xc0062
Synchronous IO
Sequential Only
Cache Supported
Handle Created
Fast IO Read

FsContext: 0xe5584850    FsContext2: 0xe55849a8
Private Cache Map: 0x89e93b50
CurrentByteOffset: 2cc50000
Cache Data:
Section Object Pointers: 8a1dba3c
Shared Cache Map: 89e93a78         File Offset: 2cc50000
Vacb: 8b5d87f8
Your data is at: d3ad0000
lkd> !fileobj 8a1d4628

\(2013-12-28 20-10) Polt- ORF2 N.ts

Device Object: 0x8b578e30   \Driver\Ftdisk
Vpb: 0x8b586af0
Event signalled
Access: Read Write SharedRead SharedWrite

Flags:  0x43062
Synchronous IO
Sequential Only
Cache Supported
Modified
Size Changed
Handle Created

FsContext: 0xe1584990    FsContext2: 0xe1584ae8
Private Cache Map: 0x89dd72d0
CurrentByteOffset: 2cc50000
Cache Data:
Section Object Pointers: 896adb14
Shared Cache Map: 89dd71f8         File Offset: 2cc50000
Vacb: 8b5dba68
Your data is at: c2ad0000

lkd> !object 8a182408
Object: 8a182408  Type: (8b60ee70) File
ObjectHeader: 8a1823f0 (old version)
HandleCount: 1  PointerCount: 3
Directory Object: 00000000  Name: \(2013-12-28 20-10) Polt- ORF2 N.ts {HarddiskGptVolume1}

So this is the file being READ from the GPT disk, as suspected, and it has dirty
pages for some unknown reason.
It is possible that the view originates from the cache manager.
The cache manager normally has some sort of write throttling so that the available
cache memory cannot be exceeded, but as this occurs on a file that is only READ, the
throttling doesn’t have any effect here, leading to excessive memory usage.

So it is time to have a look at what gpt_loader is actually doing in its
processing routine for read/write. Translated to pseudo-C code, it’s
basically the following (heavily shortened to the relevant calls):

ATA_PASS_THROUGH_DIRECT InputBuffer;
IO_STATUS_BLOCK IoStatusBlock;
NTSTATUS Status;
KEVENT Event;
PIRP AtaIRP;
union { // The compiled code reuses the same stack slot for both values
  USHORT AtaFlags;
  BOOL bRead;
} flg;
DWORD IoStatusInformation; // Returned later in Irp->IoStatus.Information as number of bytes transferred
DWORD nSectors = IoGetCurrentStackLocation(Irp)->Parameters.Read.ByteOffset.QuadPart / this->dw124;
DWORD nSectorsRead; // Number of sectors transferred in the current loop iteration

flg.bRead = IoGetCurrentStackLocation(Irp)->MajorFunction == IRP_MJ_READ;
InputBuffer.DataBuffer = MmGetSystemAddressForMdlSafe(Irp->MdlAddress, HighPagePriority);
flg.AtaFlags = ATA_FLAGS_48BIT_COMMAND | ATA_FLAGS_USE_DMA | (flg.bRead?ATA_FLAGS_DATA_IN:ATA_FLAGS_DATA_OUT);

for (IoStatusInformation = 0; nSectors > 0; IoStatusInformation+=InputBuffer.DataTransferLength)
{
  // Transfer at most 31 pages (31 * 4096 bytes) per pass-through request
  nSectorsRead = nSectors>(31 * (4096 / this->nBytesPerSector))?(31 * (4096 / this->nBytesPerSector)):nSectors;
  InputBuffer.DataTransferLength = nSectorsRead * this->nBytesPerSector;
  InputBuffer.AtaFlags = flg.AtaFlags;
  // Omitted here: Fill InputBuffer with ATA-read command and data to read/write ...
  KeInitializeEvent(&Event, NotificationEvent, FALSE);
  AtaIRP = IoBuildDeviceIoControlRequest(
    IOCTL_ATA_PASS_THROUGH_DIRECT,
    this->DeviceObject,
    &InputBuffer,
    sizeof(InputBuffer),
    &InputBuffer,
    sizeof(InputBuffer),
    FALSE,
    &Event,
    &IoStatusBlock);
  if ((Status = IoCallDriver(this->DeviceObject, AtaIRP)) == STATUS_PENDING) {
    // Request was queued: wait for completion and fetch the final status
    KeWaitForSingleObject(&Event, Executive, KernelMode, FALSE, NULL);
    Status = IoStatusBlock.Status;
  }
  if (!NT_SUCCESS(Status)) break;
  nSectors -= nSectorsRead;
  // Advance within the MDL-mapped buffer for the next chunk
  InputBuffer.DataBuffer = (PUCHAR)InputBuffer.DataBuffer + nSectorsRead * this->nBytesPerSector;
}

Reading the documentation and looking at the code, we can see that the
IOCTL_ATA_PASS_THROUGH_DIRECT call requires not an MDL but a virtual address
to read the data to. So the driver does the obvious: it gets the virtual
address from the MDL via MmGetSystemAddressForMdlSafe and passes that pointer
to the lower-level ATA driver so that the buffer gets read and filled.
Seems fine, right? And it obviously works.
But from what I can see, the following happens down the chain, which causes the
unpleasant phenomenon mentioned above:
The lower-level driver atapi.sys needs an MDL to read to, so
in IdeAtaPassThroughSetupIrp it calls IoAllocateMdl
with the virtual address passed in, assigns the MDL to Irp->MdlAddress,
locks it for write access with MmProbeAndLockPages and passes the call
through to the next driver. When the pass-through is done, it calls
its function IdeAtaPassThroughFreeIrp, which does MmUnlockPages(Irp->MdlAddress).
On unlock, the page table entries of the pages locked for writing are marked as Modified,
causing the unpleasant behaviour mentioned above.

Fixing it

So in order to circumvent this problem, the gpt_loader.sys driver would instead
need to allocate a buffer of 0x1F000 bytes (the maximum size supported
for a block is 4096 * 31, and it’s better to allocate the buffer once and reuse
it than to allocate and free it on every call, which looks a bit
expensive), let the lower-level ATAPI driver read into that buffer
and then memcpy the read data from this buffer to the input buffer,
which avoids marking the pages dirty.

Now can this be fixed with patching? It seems to be quite hard, as we must
actually add instructions to the driver without increasing its size or
overwriting vital functions.
The first problem is the buffer space. This turns out to be easy. In generateLoader,
memory for the handling class is allocated with:

HandlerClass = malloc_pool(0x154u, NonPagedPool);

.00010877: 57                           push        edi
.00010878: 6854010000                   push        000000154
.0001087D: E8B4650000                   call       .000016E36

So we just add 0x1F000 to the size of the class structure and address
HandlerClass+0x154 as the buffer. This also ensures that it gets freed properly
on exit without the need to add a free function:

.00010878: 6854F10100                   push        00001F154

The harder part is fixing the processIrp routine. Looking at the pseudo-code
above, we basically need to change the routine to the following:

ATA_PASS_THROUGH_DIRECT InputBuffer;
IO_STATUS_BLOCK IoStatusBlock;
NTSTATUS Status;
KEVENT Event;
PIRP AtaIRP;
union { // The compiled code reuses the same stack slot for both values
  USHORT AtaFlags;
  BOOL bRead;
} flg;
DWORD IoStatusInformation; // Returned later in Irp->IoStatus.Information as number of bytes transferred
DWORD nSectors = IoGetCurrentStackLocation(Irp)->Parameters.Read.ByteOffset.QuadPart / this->dw124;
DWORD nSectorsRead; // Number of sectors transferred in the current loop iteration
PBYTE Buffer = MmGetSystemAddressForMdlSafe(Irp->MdlAddress, HighPagePriority); // Buffer of the caller

flg.bRead = IoGetCurrentStackLocation(Irp)->MajorFunction == IRP_MJ_READ;
// On read, let the lower driver fill our own buffer in the class; on write, pass the MDL buffer directly
InputBuffer.DataBuffer = flg.bRead?this->offs154:Buffer;
flg.AtaFlags = ATA_FLAGS_48BIT_COMMAND | ATA_FLAGS_USE_DMA | (flg.bRead?ATA_FLAGS_DATA_IN:ATA_FLAGS_DATA_OUT);

for (IoStatusInformation = 0; nSectors > 0; IoStatusInformation+=InputBuffer.DataTransferLength)
{
  // Transfer at most 31 pages (31 * 4096 bytes) per pass-through request
  nSectorsRead = nSectors>(31 * (4096 / this->nBytesPerSector))?(31 * (4096 / this->nBytesPerSector)):nSectors;
  InputBuffer.DataTransferLength = nSectorsRead * this->nBytesPerSector;
  InputBuffer.AtaFlags = flg.AtaFlags;
  // Omitted here: Fill InputBuffer with ATA-read command and data to read/write ...
  KeInitializeEvent(&Event, NotificationEvent, FALSE);
  AtaIRP = IoBuildDeviceIoControlRequest(
    IOCTL_ATA_PASS_THROUGH_DIRECT,
    this->DeviceObject,
    &InputBuffer,
    sizeof(InputBuffer),
    &InputBuffer,
    sizeof(InputBuffer),
    FALSE,
    &Event,
    &IoStatusBlock);
  if ((Status = IoCallDriver(this->DeviceObject, AtaIRP)) == STATUS_PENDING) {
    // Request was queued: wait for completion and fetch the final status
    KeWaitForSingleObject(&Event, Executive, KernelMode, FALSE, NULL);
    Status = IoStatusBlock.Status;
  }
  if (!NT_SUCCESS(Status)) break;
  nSectors -= nSectorsRead;
  if (flg.AtaFlags & ATA_FLAGS_DATA_IN) {
    // Read: copy the chunk from our buffer to the caller and advance only the destination
    RtlCopyMemory(Buffer, InputBuffer.DataBuffer, InputBuffer.DataTransferLength);
    Buffer += nSectorsRead * this->nBytesPerSector;
  } else InputBuffer.DataBuffer = (PUCHAR)InputBuffer.DataBuffer + nSectorsRead * this->nBytesPerSector; // Write: advance the source as before
}

First, we need more space on the stack for our pointer:

.00015DEA: 8BFF          mov    edi,edi
.00015DEC: 55            push   ebp
.00015DED: 8BEC          mov    ebp,esp
.00015DEF: 81EC8C000000  sub    esp,00000008C
.00015DF5: A1008C0100    mov    eax,[00018C00]

So, change it to sub esp, 90h, so that [ebp-90h] is our new pointer:

.00015DEF: 81EC90000000  sub    esp,000000090

As there is new code to add, we need to create a new section for the code,
because there is not enough space to stuff all that into the original function.
We can cut 0x200 bytes off the end of the .reloc section and create a new
code section for our code there.
But due to the alignment of .reloc, we also have to change the
section table to remove the discardable flag of .reloc, otherwise our code
will vanish when .reloc gets discarded. This unfortunately adds 1.75KB of
memory usage to our driver, but that shouldn’t hurt you too much,
I guess 😉
Next we have to ensure that our new buffer pointer gets initialized properly
with the target and that InputBuffer.DataBuffer gets set up correctly to point to our
new buffer. Here is the original code where the buffer gets initialized:

.00015E92: 8945CC        mov    [ebp][-34],eax        ; InputBuffer.DataBuffer
.00015E95: 3BC3          cmp    eax,ebx
.00015E97: 7517          jne    .000015EB0
.00015E99: BE170000C0    mov    esi,0C0000017

We are moving this to a separate routine in order to be able to place a
call in here:

.00015E92: E8694F0000    call   .00001AE00
.00015E97: 7517          jne    .000015EB0

Now there is one very important thing to consider: The routine we are patching is
a read/write routine, so we only need to do all that buffer copy magic on read,
not on write or we will be toast!
[ebp][-49] contains a flag that is set when reading and not set when writing.
We can use that.

In our new routine at 00001AE00:

0001AE00: 385DB7         cmp    [ebp][-49],bl         ; Check if we want to read or write
0001AE03: 8BD8           mov    ebx,eax               ; On write, set eax buffer directly like it used to be
0001AE05: 740C           je     .00001AE13            ; Jump on write, on read instead:
0001AE07: 898570FFFFFF   mov    [ebp][-00000090],eax  ; Fill our stack variable with ptr to dest buffer
0001AE0D: 8D9E54010000   lea    ebx,[esi][00000154]   ; Pointer to buffer in Class that we allocated on read
0001AE13: 895DCC         mov    [ebp][-34],ebx        ; Set InputBuffer.DataBuffer to class-buffer on read, to eax (direct MDL buffer) on write
0001AE16: 33DB           xor    ebx,ebx               ; Restore abused ebx to 0
0001AE18: 3BC3           cmp    eax,ebx               ; Do comparison we had to eliminate for CALL
0001AE1A: C3             retn                         ; ...and back

Next comes the part IoStatusInformation+=InputBuffer.DataTransferLength at the
end of the loop, which needs to be adapted so that the content of the temporary buffer
can be copied to the I/O buffer from the MDL:

.00016001: 11559C        adc    [ebp][-64],edx
.00016004: 0FAF45A8      imul   eax,[ebp][-58]
.00016008: 0145CC        add    [ebp][-34],eax       ; InputBuffer.DataBuffer+=eax
.0001600B: 8B45C0        mov    eax,[ebp][-40]       ; eax=InputBuffer.DataTransferLength
.0001600E: 014594        add    [ebp][-6C],eax       ; IoStatusInformation+=eax
.00016011: 395DA0        cmp    [ebp][-60],ebx       ; nSectors==0?
.00016014: 0F87C6FEFFFF  ja     .000015EE0
.0001601A: EB32          jmps   .00001604E

ecx isn’t used for anything in this routine from this point on, so
we can reuse it as the counter for memcpy without saving it.
eax also isn’t used anywhere else, so we can just fill ecx instead
of eax here and use eax for incrementing the buffer pointer later
(as DataTransferLength can theoretically be < nSectorsRead*BytesPerSector
on incomplete reads, although that shouldn’t happen).
But we also need to skip InputBuffer.DataBuffer+=eax on read operations,
as we are always reading into the same temp buffer and only incrementing
the dest pointer we copy the memory to. Therefore we move
eax=InputBuffer.DataTransferLength up to overwrite the add and then call
our new routine at AE20:

.00016008: 8B4DC0        mov    ecx,[ebp][-40]       ; ecx=InputBuffer.DataTransferLength
.0001600B: 014D94        add    [ebp][-6C],ecx       ; IoStatusInformation+=ecx
.0001600E: E80D4E0000    call   .00001AE20           ; Call our new routine
.00016013: 90            nop                         ; Padding

In our new routine, we can finally do the copy.
As copying up to 0x1F000 bytes with rep movsb doesn’t look particularly fast,
we better use memcpy, which is located at 6CF2 in our driver version.
Unfortunately the bRead flag is a union with the USHORT AtaFlags, which gets set at
5ED3, so we have to test for the ATA flags now:

Read = ATA_FLAGS_48BIT_COMMAND | ATA_FLAGS_USE_DMA | ATA_FLAGS_DATA_IN = 0x1A,
Write = ATA_FLAGS_48BIT_COMMAND | ATA_FLAGS_USE_DMA | ATA_FLAGS_DATA_OUT = 0x1C

.0001AE20: 8A5DB6        mov    bl,[ebp][-4A]        ; Fetch ATA Flags
.0001AE23: 80FB1C        cmp    bl,01C               ; Read or write?
.0001AE26: 7505          jne    .00001AE2D           ; Jump on read, on write do:
.0001AE28: 0145CC        add    [ebp][-34],eax       ; InputBuffer.DataBuffer+=eax
.0001AE2B: EB1A          jmps   .00001AE47           ; Do other missing stuff and back, on read do:
.0001AE2D: 50            push   eax                  ; Save eax to increment Dest ptr
.0001AE2E: 51            push   ecx                  ; Length = ecx
.0001AE2F: FF75CC        push   d,[ebp][-34]         ; Source = InputBuffer.DataBuffer
.0001AE32: FFB570FFFFFF  push   d,[ebp][-00000090]   ; Destination = Dest ptr to current loc in MDL
.0001AE38: E8B5BEFFFF    call   memcpy               ; ntoskrnl.exe
.0001AE3D: 83C40C        add    esp,00C              ; Fix the stack
.0001AE40: 58            pop    eax                  ; Restore eax
.0001AE41: 018570FFFFFF  add    [ebp][-00000090],eax ; Update Dest ptr to current loc in MDL
.0001AE47: 33DB          xor    ebx,ebx              ; Restore ebx
.0001AE49: 395DA0        cmp    [ebp][-60],ebx       ; nSectors == 0? (we deleted this with our call)
.0001AE4C: C3            retn                        ; ...and back

Patch

Of course, use this at your own risk. I do not guarantee anything, but for me
this fixes the bug and the driver now works flawlessly 🙂
I wrote a little patcher that
patches the driver accordingly. Just run it and,
if it patched successfully, reboot the system to load the fixed version of the
driver.
Feel free to try it, and if you are also suffering from this problem, you can
leave a comment on whether this actually fixes it for you too.

If you haven’t done so already, also apply the first patch, which fixes a
BSOD problem. Every patch addresses one specific problem only, so this patch
doesn’t contain the fixes from the BSOD patch.

For those who use crappy antivirus programs like Antivir: don’t get fooled by the generic signature match TR/Downloader.gen (which is really stupid, as the executable doesn’t even call any Internet functions, so how should it download anything??); you can check with Virustotal.
If you have such an antivirus program, use this build instead, which is a larger executable but isn’t subject to false positives.

26 comments

Comments

Isidro - 07/6/2015 at 03:50
I have two GPT_LOADER, one that comes with Hitachi HD (although it does work without an Hitachi), and the one in Paragon. Patch1 solved BSOD. Patch2 caused read errors (without GPT drives): Chipset intel B85, two 2TB drives using standard partitions. Returning to non Patch2 solved read errors. Will test the non Hitachi GPT_LOADER and report later.
Isidro - 07/8/2015 at 05:54
Non Hitachi GPT_loader works ok with both patches.
Only annoyance (of all versions) is the BSOD when one tries to disable an HD from Device Manager, and the impossibility to detect a newly hot-plugged HD. (Detect new hardware does not detect it if gptloader is installed.)
Atlan - 07/8/2015 at 18:59
GPT Loader is a SCSI Driver and works only in Pata Mode and not for sata mode or raid. Pata doesn’t support hot plugging for HDs. If you switched the SATA Controller Mode in Bios from SATA (AHCI) to PATA Compatible, so you can’t changed HDs by running System, if it using GPT Driver. That means Harddisk bigger than 2TB or smaller 2TB but with GPT Partition. 2TB Harddisk in MBR shouldn’t resulting in BSOD by disabling them. If you can make different Settings for Controller, so only the Controller set to PATA will result in BSOD. That is normal. Only SATA supporting the hotplug.

The Hitachi GPT Loader is a special Version from Paragon for Hitachi. There are some differents only Hitachi knows. Paragon doesn’t offer support for the hitachi Version of GPT Loader.
Atlan - 07/8/2015 at 19:05
With the GPT Loader by Paragon Disk Manager Professional 12 both patches works great here, now. Really good work. All my test was positive successfully. No bugs or errors by copy transfer. Files are ever binary identical on my tests. No problem by system performance anymore. Thanks.
DCT - 10/18/2015 at 13:03
Hello! I’ve found another issue using gpt_loader – BSOD 0x00000035 NO_MORE_IRP_STACK_LOCATIONS
It seem to happen when there are many HDD/Flash drives connected to motherboard and there are read/write operations with 3TB HDD. The bug is severe on my configuration when there are 7 HDDs connected but not happenned when there are only 2-3 HDDs.
BSOD is caused by PartMgr.sys+921 (according to BlueScreenView).
The current way to fix it is to substitute disk.sys and partmgr.sys by files from windows 2003 server sp2 (the server OS is more tolerant to many HDDs, the files could be unpacked from sp2 update of windows 2003 server available on Microsoft site). But the substitution of windows xp original files are not quite a good idea.
Is there any smart way to fix this bug? For example, is it possible to change maximum size of IRP_STACK of PartMgr.sys? Or could it be fixed by patching gpt_loader.sys to use less IRP?
DCT - 10/18/2015 at 14:37
> Only annoyance (of all versions), is the BSOD when one tries to disable an HD from Device Manager, and the imposibility to detect a new HD hot plugged. (Detect new hardware does not detect it if gptloader installed).

See my reply in 1st patch topic how to fix it.
Rickz - 05/9/2017 at 08:48
@dose

any advices to fully reproduce the bug’s behaviour before using this patch??

i want to to fully test, disabling the AHCI/drivers did part of the trick for me, i wonder if i will have to use this patch, how to test my disk ? any recommendations ??

My Best Regards
dose - 05/9/2017 at 10:10
@Rickz:
You can fill up your hard disks with files until you are sure that you are using >2TB. Then create a huge file of 100GB or something like that, try to copy it to another hard disk and watch memory usage and disk activity after copying the file. Even after copying the file, the disk activity on the source disk should be high, as the cache manager is unnecessarily writing back the file that has been read.
If you only have, say, 2GB of RAM and a swap file enabled, your system may also become slow due to the high amount of paging activity.
Rickz - 05/10/2017 at 06:25
“writing back the file that has been read”
that sounds dangerous some how, could be data became corrupted ??
i mean i have almost a partition of 2.2 TB full, only 186 GB is empty , other partition has 1.2 TB free, the disk has a total of 5 TB capacity,

i’m thinking of copying some mkv movies , total of 100 GB as you sugessted , to other disk, posibly an external usb one. that could be the test you’re telling us ?

please confirm

My Best Regards
Rickz - 05/10/2017 at 06:27
@dose
just in case that matters, i have a swap file enabled and 3.5 GB of memory
dose - 05/10/2017 at 13:01
@Rickz: Under normal circumstances, no data corruption should occur, as the exact same data is written back from the memory pages, but it may slightly increase the chance of data corruption in case of an unexpected power loss or other disruptions of the writer thread.
To reproduce the issue, you must ensure that the data you read from the drive is located beyond the 2TB area. Due to the fragmentation of the drive, this may or may not be the case, but as your drives are pretty full, the chance is high that you will hit the bug when you read a big file that you recently wrote to the disk.
You can check what’s going on with Sysinternals’ Process Monitor. If the System process also writes your source file back as mentioned in the article, you have hit the bug.
Rickz - 05/11/2017 at 05:46
@dose
i applied your second patch because i trust you know what you’re doing here,
still haven’t make the long test.
i enabled AHCI back so my ssd will increase performance, had to use adaptive restore from paragon to remove some drivers that wouldn’t let me boot after enabling AHCI, system started ok, what i see is that i have some suspicious file called gzflt.sys which is in every driver of the other 2 disk attached
i don’t remember exactly which driver originally use windows for the drives.
this screen is the same for the 3 drives i have:
https://ibb.co/fBwKVk
i want to revert back to only oficial windows driver and gpt loaer for 5 TB drive, could you point me in the right direction?
looking around i found out that gzflt.sys maybe belong to bitdefender, i never have installed this AV before, no idea if this suspicious file is safe or not.
please help needed!
Rickz - 05/11/2017 at 06:05
correction: it is not gzflt, it is Gdflt.sys
it doesn’t even have properties to check the company. does paragon use this file?
dose - 05/11/2017 at 12:00
@Rickz: Gdflt.sys is not part of Paragon GPT loader.
It seems to be Gigabyte GA-Z68X-UD5-B3 (rev. 1.0) 3TB+ Unlock:
https://www.gigabyte.com/microsite/276/3tb.html

Judging from the description, it allows you to use space beyond the 3TB area and to create partitions in that space, which wouldn’t be possible in a default Windows XP installation. As I neither know nor use this utility, I cannot give you any advice on it.
Rickz - 05/11/2017 at 13:19
@dose
thanks for looking it up!! i remember i installed this driver, which didn’t work, and used the uninstaller to remove it, so i see some programs are garbage that leave traces behind even when removed.

how do you recommend uninstalling this file or these drivers?
dose - 05/11/2017 at 15:33
@Rickz: Drivers normally have an .inf file shipped with them, which should contain uninstall information.
You can try to run it by using
RunDll32 setupapi.dll,InstallHinfSection DefaultUninstall 132 <path-to-inf-file>
If no .inf file is available, you can try to remove the filter driver via the devcon utility, e.g.:
devcon classfilter Ports upper !<filter-name>

Use at your own risk.
Rickz - 05/13/2017 at 17:40
Regards,

for the record, and for anyone who would like to test,
according to this info:
http://www.overclock.net/t/1227636/how-to-change-sata-modes-after-windows-installation

anyone can switch from ide to ahci and vice versa. i checked, and both keys on my system have:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\iaStor]
"Start"=dword:00000000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\PCIIde]
"Start"=dword:00000000

i have switched between both modes with no BSOD, and when i checked again the values hadn’t changed.
anyway, my 5 TB disk (2 partitions) did not appear in disk management. since i’m interested in having ahci for the performance of my ssd boot disk, i went looking for a utility like the gigabyte one that is supposed to get past the XP limits for big hard drives.
i have an asus board with an intel p43 chipset, so i downloaded this utility:
http://event.asus.com/mb/2010/Disk_Unlocker/
i installed it and restarted windows, and now my disk is working with ahci. i haven’t uninstalled paragon; i believe having both drivers is what made this possible. so far everything is working.
while testing and uninstalling/installing everything, i broke my paragon installation that had been working with ahci: the disk never appeared again in disk management, and i couldn’t reproduce the working state. that’s why i installed disk unlocker from Asus; the disk appeared again and is working. i have applied the 2 patches from @dose to make sure paragon works as it should. i will report back if something goes wrong.

My Best Regards
Rickz - 05/21/2017 at 00:50
@dose

unfortunately, i have bad news: the bug still persists. i tried to copy a 4 GB video file to the partition where 182 GB are free. the copy process starts OK and says 30 seconds remain, then the time increases to 1 minute, 2 minutes, 25 minutes... this happens when the copy reaches about half of the file. the system becomes unstable and unresponsive, so i had to force a shutdown. i tried reproducing this twice with the same result
Rickz - 05/21/2017 at 01:10
btw the copy process never finished; i compared the CRC32 of both files and they are not the same.
dose - 05/21/2017 at 09:11
@Rickz: As said, you can verify if it is the bug described here using Sysinternals Process Monitor.
It could also be related to AHCI usage or Disk Unlocker, I’m not a clairvoyant 😉
What you can try to do is to split your partitions so that they don’t exceed 2TB in size.
Rickz - 05/22/2017 at 09:27
@dose
switched back to IDE/Enhanced mode, as the Asus bios calls it, and the file was copied OK to the partition with 186 GB free. on the other partition, which still has 1.08 TB free, the file was also copied fine in AHCI. my conclusion, i guess: when the free space is less than half of the partition size, the problem shows up while copying in AHCI mode. am i wrong?
both partitions are 2.26/2.27 TB in size

My Best Regards
Rickz - 05/22/2017 at 09:32
i believe a patch could be made to get past those limits, so this driver could work in both modes, ahci and ide
dose - 05/23/2017 at 10:46
@Rickz: As said, the driver only operates with IDE commands, there is no way around that.
If the driver down the chain that tries to translate the IDE commands to AHCI somehow has problems, you need to contact the driver’s vendor for further support.
xpuser - 02/20/2018 at 13:31
Hi dose, I tried to use your patch but unfortunately it doesn’t work for me. It gives me a BSOD (the 0x0000007B error) on boot, even in safe mode, and it is not possible to start Recovery (for another reason) to check the disks. I managed to restore the original driver and I can boot now.

My HD is 4TB, and it is split into two partitions. The first partition is 2048MB and the second is ~17000MB. I have the memory problem with the second partition only.
I would appreciate any help you can provide, thanks.
xpuser - 02/23/2018 at 23:04
I was wrong, I get the error because Windows can’t find or start the driver. I probably have a different driver.
Georg - 04/23/2018 at 23:16
Installed the gptpatch patches. Everything worked well.
Recently, I often get a BSOD error in win32k.sys.
Please tell me how to solve the problem.

Full text of BSOD
STOP: 0X0000008e (0xC0000005, 0xBF84ADAE, 0xB2160A60, 0x00000000)
win32k.sys – Address BF84ADAE base at BF800000, DateStamp 59e3bd69
