Thursday, November 24, 2011

Investigating an SR.sys Issue

This week I looked at an interop issue with SR.sys and I want to share the investigation and the results. The filter I was looking at (MyFilter) is an SFO type filter, which completes certain IRP_MJ_CREATEs for specific FILE_OBJECTs and then implements all the requests for them. As usual, great care must be taken that none of the FILE_OBJECTs that MyFilter owns (SFOs) are ever seen below my filter (in other filters or in the file system). Otherwise those filters or the file system might think they own the FILE_OBJECTs, start walking their private structures (like the FCB or CCB) and get very confused, ending with a bugcheck in the good case or data corruption in the bad case.
The problem I was investigating was exactly one of these, where somehow one of my FILE_OBJECTs ended up on the lower file system, NTFS. This is what the stack looked like:
f511ee9c f840cedc f511ef88 82035578 f511eed8 Ntfs!NtfsDecodeFileObject+0x37
f511ef10 f840b49c f511ef88 81fd5900 81cfb770 Ntfs!NtfsCommonQueryInformation+0x56
f511ef74 f840b4d5 f511ef88 81fd5900 00000001 Ntfs!NtfsFsdDispatchSwitch+0x12a
f511f098 804ef18f 81cfb770 81fd5900 81fd5900 Ntfs!NtfsFsdDispatchWait+0x1c
f511f0a8 f849852d 81d1fc10 f511f16e f511f15c nt!IopfCallDriver+0x31
f511f0d4 f849282c 81cfb770 82035578 f511f16e sr!SrQueryInformationFile+0x99
f511f100 f8492f33 00000034 82035578 f511f15c sr!SrpGetFileName+0x32
f511f270 f84936d1 81d1fc10 f511f2e0 f511f2d7 sr!SrpExpandPathOfFileName+0x19f
f511f290 f8493873 81d1fc10 823896a0 f511f2e0 sr!SrpGetFileNameFromFileObject+0xe7
f511f3f4 f848e8c2 81d1fc10 823896a0 00000000 sr!SrFileAlreadyExists+0x5f
f511f44c 804ef18f 81d1fc10 00000002 82130c98 sr!SrCreate+0x19c
f511f45c f84ab6c3 823896a0 82130ca8 823ca2e0 nt!IopfCallDriver+0x31
f511f48c 804ef18f 81e32020 82130c98 82130c98 fltMgr!FltpCreate+0x1d9
f511f49c 805831fa 81ccdb68 81fddfd4 f511f634 nt!IopfCallDriver+0x31
f511f57c 805bf444 81ccdb80 00000000 81fddf30 nt!IopParseDevice+0xa12
f511f5f4 805bb9d0 00000000 f511f634 00000640 nt!ObpLookupObjectName+0x53c
f511f648 80576033 00000000 00000000 36039c00 nt!ObOpenObjectByName+0xea
f511f6c4 80576a20 f511f848 00100002 f511f850 nt!IopCreateFile+0x407
f511f70c f84ad5b9 f511f848 00100002 f511f850 nt!IoCreateFileSpecifyDeviceObjectHint+0x52
f511f7b8 f84ada28 81d2d2d0 81d31008 f511f848 fltMgr!FltCreateFileEx+0x113
f511f7fc f500ea00 81d2d2d0 81d31008 f511f848 fltMgr!FltCreateFile+0x36
f511f868 f500ec87 81fc0994 81d31008 00000000 myfilter!MyCreateFile+0x100
Looking at the stack, one thing looks really strange: FltMgr is calling into SR, which is calling into NTFS. This indicates that SR is somehow loaded between FltMgr and NTFS. So I decided to see what the file system stack looks like:
1: kd> !fltkd.volumes

Volume List: 820210a0 "Frame 0" 
   FLT_VOLUME: 820ad668 "\Device\WebDavRedirector"
   FLT_VOLUME: 81d01168 "\Device\LanmanRedirector"
   FLT_VOLUME: 82005b48 "\Device\HGFS"
   FLT_VOLUME: 81ccbc18 "\Device\VhdDisk00000003"
      FLT_INSTANCE: 81d31008 "MyFilter Default Instance" "137000"
   FLT_VOLUME: 81d01c18 "\Device\VhdDisk00000002"
   FLT_VOLUME: 81cc8c18 "\Device\VhdDisk00000001"
   FLT_VOLUME: 8207fbd8 "\Device\HarddiskDmVolumes\PhysicalDmVolumes\BlockVolume1"
   FLT_VOLUME: 81e9a8a0 "\Device\HarddiskVolume2"
   FLT_VOLUME: 81d2e008 "\Device\HarddiskVolume1"
1: kd> !fltkd.volume 81ccbc18 

FLT_VOLUME: 81ccbc18 "\Device\VhdDisk00000003"
   FLT_OBJECT: 81ccbc18  [04000000] Volume
      RundownRef               : 0x00000004 (2)
      PointerCount             : 0x00000001 
      PrimaryLink              : [81d01c24-82005b54] 
   Frame                    : 82021000 "Frame 0" 
   Flags                    : [00000004] SetupNotifyCalled
   FileSystemType           : [00000002] FLT_FSTYPE_NTFS
   VolumeLink               : [81d01c24-82005b54] 
   DeviceObject             : 81e32020 
   DiskDeviceObject         : 81ccdb80 
   VolumeInNextFrame        : 00000000 
   Guid                     : "" 
   CDODeviceName            : "\Ntfs" 
   CDODriverName            : "\FileSystem\Ntfs" 
   Callbacks                : (81ccbca8)
   ContextLock              : (81ccbe38)
   VolumeContexts           : (81ccbe70)  Count=0
   StreamListCtrls          : (81ccbe74)  rCount=56 
   NameCacheCtrl            : (81ccbeb8)
   InstanceList             : (81ccbc64)
      FLT_INSTANCE: 81d31008 "MyFilter Default Instance" "137000"
1: kd> !devstack 81e32020 
  !DevObj   !DrvObj            !DevExt   ObjectName
> 81e32020  \FileSystem\FltMgr 81e320d8  
  81d1fb58  \FileSystem\sr     81d1fc10  
  81cfb770  \FileSystem\Ntfs   81cfb828 
So as you can see, SR is indeed loaded between NTFS and FltMgr's frame0. As I mentioned in my post on how the file system stack is layered, in XP FltMgr doesn't create frame0 immediately; instead it waits for the first minifilter to register. Since there was no other minifilter on my system, FltMgr never created frame0 until I manually loaded my filter. However, since SR is a boot start driver, by the time FltMgr initialized frame0 SR was already attached and FltMgr had no option but to attach on top of it.
Now, the FILE_OBJECT that NTFS chokes on is 82035578 and it is indeed an SFO. Looking at the stack we can see that it first appears when SR calls SrpGetFileName. There were two possibilities. Either I had leaked my SFO below my filter (in this operation or at some point in the past) and SR got it and was using it, or SR got its own FILE_OBJECT from my filter by issuing a request above my filter (most likely to the top of the stack). So I decided to see what the function that calls SrpGetFileName does (the function is rather long so I trimmed it a bit, but it's still quite long so I tried to highlight things):
1: kd> uf sr!SrpExpandPathOfFileName
sr!SrpExpandPathOfFileName:
f8492d94 8bff            mov     edi,edi
f8492d96 55              push    ebp
…
f8492e3a 56              push    esi
f8492e3b 6800080000      push    800h
f8492e40 56              push    esi
f8492e41 56              push    esi
f8492e42 8385d4feffff02  add     dword ptr [ebp-12Ch],2
f8492e49 56              push    esi
f8492e4a 56              push    esi
f8492e4b 6821400000      push    4021h
f8492e50 6a01            push    1
f8492e52 6a03            push    3
f8492e54 6880000000      push    80h
f8492e59 56              push    esi
f8492e5a 8d85b0feffff    lea     eax,[ebp-150h]
f8492e60 50              push    eax
f8492e61 8d85b8feffff    lea     eax,[ebp-148h]
f8492e67 50              push    eax
f8492e68 6800001000      push    100000h
f8492e6d 8d85d8feffff    lea     eax,[ebp-128h]
f8492e73 50              push    eax
f8492e74 c785b8feffff18000000 mov dword ptr [ebp-148h],18h
f8492e7e 89b5bcfeffff    mov     dword ptr [ebp-144h],esi
f8492e84 c785c4feffff00020000 mov dword ptr [ebp-13Ch],200h
f8492e8e 899dc0feffff    mov     dword ptr [ebp-140h],ebx
f8492e94 89b5c8feffff    mov     dword ptr [ebp-138h],esi
f8492e9a 89b5ccfeffff    mov     dword ptr [ebp-134h],esi
f8492ea0 e8296bffff      call    sr!IoCreateFileSpecifyDeviceObjectHint (f84899ce)
f8492ea5 3bc6            cmp     eax,esi
f8492ea7 8985e8feffff    mov     dword ptr [ebp-118h],eax
f8492ead 7d08            jge     sr!SrpExpandPathOfFileName+0x123 (f8492eb7)

sr!SrpExpandPathOfFileName+0x11b:
f8492eaf c60701          mov     byte ptr [edi],1
f8492eb2 e9c0010000      jmp     sr!SrpExpandPathOfFileName+0x2e3 (f8493077)

sr!SrpExpandPathOfFileName+0x123:
f8492eb7 56              push    esi
f8492eb8 8d85dcfeffff    lea     eax,[ebp-124h]
f8492ebe 50              push    eax
f8492ebf a1389b48f8      mov     eax,dword ptr [sr!_imp__IoFileObjectType (f8489b38)]
f8492ec4 56              push    esi
f8492ec5 ff30            push    dword ptr [eax]
f8492ec7 56              push    esi
f8492ec8 ffb5d8feffff    push    dword ptr [ebp-128h]
f8492ece ff15349b48f8    call    dword ptr [sr!_imp__ObReferenceObjectByHandle (f8489b34)]
f8492ed4 3bc6            cmp     eax,esi
f8492ed6 8985e8feffff    mov     dword ptr [ebp-118h],eax
f8492edc 0f8c95010000    jl      sr!SrpExpandPathOfFileName+0x2e3 (f8493077)

sr!SrpExpandPathOfFileName+0x14e:
f8492ee2 8b85e0feffff    mov     eax,dword ptr [ebp-120h]
f8492ee8 ff7048          push    dword ptr [eax+48h]
f8492eeb ff15149c48f8    call    dword ptr [sr!_imp__IoGetAttachedDevice (f8489c14)]
f8492ef1 ffb5dcfeffff    push    dword ptr [ebp-124h]
f8492ef7 8bf8            mov     edi,eax
f8492ef9 ff15309b48f8    call    dword ptr [sr!_imp__IoGetRelatedDeviceObject (f8489b30)]
f8492eff 3bc7            cmp     eax,edi
f8492f01 7418            je      sr!SrpExpandPathOfFileName+0x187 (f8492f1b)

sr!SrpExpandPathOfFileName+0x16f:
f8492f03 8b85d0feffff    mov     eax,dword ptr [ebp-130h]
f8492f09 c785e8feffffd40000c0 mov dword ptr [ebp-118h],0C00000D4h
f8492f13 c60001          mov     byte ptr [eax],1
f8492f16 e95c010000      jmp     sr!SrpExpandPathOfFileName+0x2e3 (f8493077)

sr!SrpExpandPathOfFileName+0x187:
f8492f1b 8d85ecfeffff    lea     eax,[ebp-114h]
f8492f21 50              push    eax
f8492f22 ffb5dcfeffff    push    dword ptr [ebp-124h]
f8492f28 ffb5e0feffff    push    dword ptr [ebp-120h]
f8492f2e e8c7f8ffff      call    sr!SrpGetFileName (f84927fa)
…
So looking at the function we can see that SR is calling IoCreateFileSpecifyDeviceObjectHint and then gets the FILE_OBJECT associated with the handle it has (by calling ObReferenceObjectByHandle). From this call we can infer that the handle is stored in the local variable @ebp-128h and that the FILE_OBJECT is stored in the variable @ebp-124h. Then SR compares the device of the FILE_OBJECT with the device it is attached to and if they don't match it fails with status 0C00000D4h (STATUS_NOT_SAME_DEVICE; also please note that the status gets put into @ebp-118h). Then the function calls SrpGetFileName with the FILE_OBJECT it got from the call to IoCreateFileSpecifyDeviceObjectHint. Since all these are stored in local variables we can get the value of EBP for that function from the ChildEBP column (0xf511f270) and see which FILE_OBJECT they got:
1: kd> !error 0C00000D4h
Error code: (NTSTATUS) 0xc00000d4 (3221225684) - {Incorrect Volume}  The target file of a rename request is located on a different device than the source of the rename request.
1: kd> dp f511f270-0x128
f511f148  800005f4 82035578 81d1fc10 0000001e
f511f158  00000000 00fe0000 f511f16e 000000fe
f511f168  00000000 01000000 f511f0a8 f511f858
f511f178  f511f32c f83ea75b e113f858 ffffffff
f511f188  f83e5b5c e1cbada0 f511f858 81de4290
f511f198  00000000 00000000 f511f118 f511f1bc
f511f1a8  f511f32c f83ea75b f511f1c8 f511f2dc
f511f1b8  804e1ec4 f511f1fc f511f1d8 81cfb850
1: kd> !handle 800005f4 

PROCESS 81f43020  SessionId: 0  Cid: 0714    Peb: 7ffde000  ParentCid: 03ec
    DirBase: 02b40320  ObjectTable: e1518008  HandleCount:  32.
    Image: ifstest.exe

Kernel handle table at e1004000 with 345 entries in use

800005f4: Object: 82035578  GrantedAccess: 00100000 Entry: e1004be8
Object: 82035578  Type: (823aead0) File
    ObjectHeader: 82035560 (old version)
        HandleCount: 1  PointerCount: 2
        Directory Object: 00000000  Name: \opcreatg\ {VhdDisk00000003}
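As an aside, the address arithmetic behind the dp command above is easy to check by hand. The following is a minimal user-mode sketch, not debugger or SR code: the helper name local_slot_addr and the macro names are hypothetical, and the constants are simply copied from the stack trace and the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the stack trace and disassembly above. */
#define SR_FRAME_EBP        0xf511f270u  /* ChildEBP of sr!SrpExpandPathOfFileName */
#define HANDLE_OFFSET       0x128u       /* handle lives at [ebp-128h] */
#define FILE_OBJECT_OFFSET  0x124u       /* FILE_OBJECT pointer lives at [ebp-124h] */

/* Hypothetical helper: address of the local stored at [ebp - offset]
 * in a 32-bit x86 frame. */
static uint32_t local_slot_addr(uint32_t ebp, uint32_t offset)
{
    return ebp - offset;
}

/* Checks both slots against the first row of the dp output. */
static void check_sr_locals(void)
{
    assert(local_slot_addr(SR_FRAME_EBP, HANDLE_OFFSET) == 0xf511f148u);
    assert(local_slot_addr(SR_FRAME_EBP, FILE_OBJECT_OFFSET) == 0xf511f14cu);
}
```

These two addresses are the first two DWORDs in the dp output: 800005f4 (the handle) at f511f148 and 82035578 (the FILE_OBJECT) at f511f14c.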
So as we can see, SR actually got one of my SFOs by sending a create to the top of the stack and then sent it directly to NTFS in a query. This was puzzling because SR is actually using IoCreateFileSpecifyDeviceObjectHint, which is exactly what I expected it would use to target the IRP_MJ_CREATE appropriately. However, when looking at the call we can see that the DeviceObject parameter is passed in as "push esi", and it's hard to track exactly what a given register's value was at the time of the call without carefully analyzing the code. In this case however it seems we got lucky because a lot of other parameters are set up using the "push esi" instruction, which means that either SR called IoCreateFileSpecifyDeviceObjectHint with a lot of the parameters set to the DEVICE_OBJECT or that it passed in NULL for the DEVICE_OBJECT. Looking at where ESI is initialized (which is not in the chunk of code I pasted here) we can see that indeed ESI is set to 0 and so now we know what is going on:
  1. SR calls IoCreateFileSpecifyDeviceObjectHint and sends a request to the top of the stack.
  2. SR takes the handle and resolves it to a FILE_OBJECT
  3. SR compares the DEVICE_OBJECT for the FILE_OBJECT it just got with the DEVICE_OBJECT it is attached to and if they are different it fails with STATUS_NOT_SAME_DEVICE.
  4. Finally SR uses the FILE_OBJECT in a call that is targeted below itself and thus my SFO reaches NTFS.
I'm not exactly sure why SR does this. It looks like the code was written so that it used IoCreateFileSpecifyDeviceObjectHint but then the DeviceObject was passed in as NULL which effectively changes it to IoCreateFile. But then the code itself checks whether the device for the FILE_OBJECT it gets is the same as the one it's attached on, which is something IoCreateFileSpecifyDeviceObjectHint would have done if used with a DeviceObject parameter. Anyway the problem is that SR sent requests to two different points on the stack, the IRP_MJ_CREATE to the top of the stack and then subsequent requests below itself on the stack and thus it runs into trouble when things change between the top of the stack and the altitude where SR is located. Had SR sent the IRP_MJ_CREATE below itself (which is the right behavior from a layering perspective) or the subsequent requests to the top of the stack (which is actually still wrong because it could lead to infinite loops and such) then it would have avoided this problem.
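The layering rule at the heart of this bug can be captured in a very small model. What follows is a toy user-mode sketch, assuming a three-layer stack (MyFilter on top, then SR, then NTFS); all the names are hypothetical and nothing here is real kernel or SR code.

```c
#include <assert.h>

/* Toy layers, ordered top to bottom. Names are hypothetical labels. */
enum { LAYER_MYFILTER = 0, LAYER_SR = 1, LAYER_NTFS = 2 };

struct toy_file_object {
    int owner;  /* layer that completed the create and owns the FILE_OBJECT */
};

/* A create entering at `entry` flows down; MyFilter completes it only if it
 * sees it (entry at the top) and the name is one it serves as an SFO. */
static struct toy_file_object toy_create(int entry, int is_sfo_name)
{
    struct toy_file_object fo;
    fo.owner = (entry == LAYER_MYFILTER && is_sfo_name) ? LAYER_MYFILTER
                                                        : LAYER_NTFS;
    return fo;
}

/* A later request entering at `entry` is only safe if it passes through the
 * owner on its way down, i.e. it enters at or above the owning layer. */
static int toy_request_is_safe(int entry, const struct toy_file_object *fo)
{
    return entry <= fo->owner;
}

static void check_toy_stack(void)
{
    struct toy_file_object fo;

    /* What SR did: create at the top, query sent below itself. */
    fo = toy_create(LAYER_MYFILTER, 1);
    assert(!toy_request_is_safe(LAYER_NTFS, &fo));

    /* Create and query both sent below SR: NTFS owns the object. */
    fo = toy_create(LAYER_NTFS, 1);
    assert(toy_request_is_safe(LAYER_NTFS, &fo));

    /* Create and query both sent to the top: consistent in this model,
     * though as noted above it could lead to infinite loops and such. */
    fo = toy_create(LAYER_MYFILTER, 1);
    assert(toy_request_is_safe(LAYER_MYFILTER, &fo));
}
```

The only unsafe combination in the model is the one SR used: the create enters at one point of the stack and the follow-up request enters at a lower one.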

Thursday, November 17, 2011

Controlling the Load Order of File System Filters

In this post I'd like to talk about the factors that contribute to the loading order of file system filters. Of course, if all the filters on a system are minifilters then the load order is completely determined by their altitudes. But as it happens there are still some legacy filters out there and so one does occasionally have to deal with order inversions.
Before we go any further I'd like to add some links to some documentation that describes this. For minifilter altitudes there is the Load Order Groups and Altitudes for Minifilter Drivers MSDN page. For how the load order of regular drivers is calculated there is KB article 115486 on How To Control Device Driver Load Order. Also, for a refresher on how file system filters attach in general I'll refer you to my blog (Part1 and Part2) and to my post on the FLTP_FRAME structure.
It is useful to quickly go over how legacy file system filters typically attach:
  1. When the driver is loaded it calls IoRegisterFsRegistrationChange() to find the file system control device objects (CDOs).
  2. Then the driver attaches to each CDO and enumerates any existing volumes (VDOs) and attaches to them.
  3. For any mount request that arrives on the CDO the legacy file system filter can attach right in the mount path.
However, please note that it's possible (though not very common) that the file system filter has a user mode component that tells the filter to attach to some specific volume and as such it's possible that the legacy filter attaches to the volume out-of-order (or rather in no particular order). As you can expect it's impossible to predict or control the load order in this case because the filter will simply attach on top of whatever happens to be the top of the stack at that time.
Filter Manager is a legacy filter and it follows the usual legacy filter steps, but there is an added twist. FltMgr can attach to the same volume multiple times (each attachment is called a frame) and each such attachment follows the same steps as if it were a complete new filter (FltMgr attaches to the CDOs for all the file systems, enumerates volumes and attaches to them and so on). The frame right on top of the file system is frame 0 and the one on top of it is frame 1 and so on. The decision that a new frame is required is made when a new filter is registered and it is based on the following factors:
  • whether the altitude for the filter is higher than the highest altitude in the top frame. Each frame has a bottom altitude and a top altitude and any filter with the altitude in that range belongs to that frame. On my Win7 machine when FltMgr creates frame0 it sets a bottom altitude of 0 and a top altitude of 49999 (though I'm not sure why or what the guarantees around this top altitude are; this post also indicates that things used to be different at some point). Naturally if the altitude already fits in one of the existing frames then the filter will be placed in that frame at the right place.
  • whether one or more legacy filters have attached on top of FltMgr. If no legacy filters have attached then FltMgr simply changes the altitude of the top frame to the altitude of the filter and registers the filter with that frame.
  • OS version. On XP, if the filter can't fit in the existing frames and there is a new legacy filter attached then FltMgr simply creates a new frame that has the bottom altitude right above the one of the previous top frame. In Vista and newer OSes the behavior is a bit different. If the altitude for the new filter is higher than the upper altitude of the topmost frame and a legacy filter has attached, then FltMgr tries to identify what type of filter the legacy filter is by looking at its LoadOrderGroup, and based on that it generates a fake altitude for the legacy filter. If the top frame's upper altitude is below that fake altitude, FltMgr raises the top frame's upper altitude right up to the legacy filter's fake altitude (legacy filters can only attach on top of the topmost frame and so only that frame's upper altitude changes). At this point FltMgr checks whether the filter will now fit in the top frame (which might happen since its altitude range has been extended) and if the new filter still doesn't fit there it will finally create a new frame.
Since DriverEntry is where most legacy filters attach to CDOs and most minifilters call FltRegisterFilter() from, the order in which events happen can generally be inferred just by looking at the load order group for each filter (legacy and mini) and by following the rules. Just to illustrate this let's say that we have two minifilters, MF1 (altitude 134999 which should make it a virtualization filter) and MF2 (altitude 324999 which makes it an anti-virus filter) and a legacy encryption filter, LF1. All the filters are BOOT start drivers. Ideally all of these would have their appropriate Load Order Group and things would work just fine. However I'd like to show a couple of scenarios where things can go wrong. For the scenarios please assume that all the drivers that I'm not specifically calling out are loading in their appropriate group.
  1. Let's say that MF1 discovers that there is no "FSFilter Virtualization" group on XP and decides to change its Load Order Group to the next group, the FSFilter Encryption group. Now what will happen is that either LF1 or MF1 can be started first (depending most likely on the order in which they were installed on the system). Let's say we are on XP and LF1 loads first. When MF1 loads and calls FltRegisterFilter(), FltMgr will see that it has frame 0 with a range of 0-49999 and, since LF1 is loaded and this is XP, FltMgr will create Frame1 with the range 49999-134999 and load MF1 into that frame. The net result here is that MF1 is layered above LF1. However, all of Frame1 is on top of LF1 and so all the minifilters in the groups that fall into that range (FSFilter Copy Protection, FSFilter Security Enhancer, FSFilter Open File, FSFilter Physical Quota Management, FSFilter Virtualization and so on) will be above LF1, which might lead to issues in the long run. Now, on Vista and newer OSes the behavior for this scenario will be different. FltMgr will figure out that LF1 is an encryption filter and it will extend the range of frame0 to 0-149999 so that it covers the encryption range and so everything will be layered correctly. In my opinion it would be better if MF1 would select a load order group that is below the FSFilter Virtualization group for XP, FSFilter Physical Quota Management, which would guarantee that the minifilter loads before any legacy encryption filters and thus avoid the altitude inversion.
  2. Another possible scenario is where MF2 wants to load as early as possible and so it sets the LoadOrderGroup to FSFilter Bottom so that it loads and attaches really early on. In this case FltMgr will extend frame0 from 0-49999 to 0-324999 and load the minifilter in frame0. Then, when it loads LF1 it will attach it on top of frame0 and so now all the minifilters in frame0 will see only encrypted file data flowing through. As you can expect this will likely lead to problems at some point, either for MF2 or for some other filter that might be added to the system at a later time.
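The frame selection rules and the two scenarios above can be replayed with a small model. This is only a sketch of the behavior as described in this post, not FltMgr's actual code: the struct and function names are hypothetical, the fake altitude of 149999 for a legacy encryption filter is taken from scenario 1, and a new frame's bottom altitude is simplified to the previous frame's top altitude (matching the 49999-134999 range in the example).

```c
#include <assert.h>

#define TOY_MAX_FRAMES 4

struct toy_frame { long bottom; long top; };

struct toy_fltmgr {
    struct toy_frame frames[TOY_MAX_FRAMES];
    int  count;                /* frames in use; frames[count-1] is the top frame */
    int  is_xp;                /* 1 = XP rules, 0 = Vista and newer */
    int  legacy_on_top;        /* a legacy filter attached above the top frame */
    long legacy_fake_altitude; /* Vista+: derived from the LoadOrderGroup */
};

static void toy_fltmgr_init(struct toy_fltmgr *m, int is_xp,
                            int legacy_on_top, long legacy_fake_altitude)
{
    m->frames[0].bottom = 0;
    m->frames[0].top = 49999;  /* frame0 range observed in the post */
    m->count = 1;
    m->is_xp = is_xp;
    m->legacy_on_top = legacy_on_top;
    m->legacy_fake_altitude = legacy_fake_altitude;
}

/* Returns the index of the frame the minifilter lands in, or -1 if the
 * model runs out of frames. */
static int toy_register_minifilter(struct toy_fltmgr *m, long altitude)
{
    struct toy_frame *top = &m->frames[m->count - 1];
    int i;

    /* Rule 1: an altitude that fits an existing frame goes there. */
    for (i = 0; i < m->count; i++)
        if (altitude >= m->frames[i].bottom && altitude <= m->frames[i].top)
            return i;

    /* Rule 2: no legacy filter on top, so just grow the top frame. */
    if (!m->legacy_on_top) {
        top->top = altitude;
        return m->count - 1;
    }

    /* Rule 3 (Vista+ only): stretch the top frame up to the legacy
     * filter's fake altitude and retry the fit. */
    if (!m->is_xp) {
        if (top->top < m->legacy_fake_altitude)
            top->top = m->legacy_fake_altitude;
        if (altitude <= top->top)
            return m->count - 1;
    }

    /* Otherwise create a new frame above the legacy filter. */
    if (m->count == TOY_MAX_FRAMES)
        return -1;
    m->frames[m->count].bottom = top->top;
    m->frames[m->count].top = altitude;
    m->count++;
    m->legacy_on_top = 0;
    return m->count - 1;
}
```

With LF1 attached, registering MF1 (134999) under the XP rules yields a second frame with the range 49999-134999, while under the Vista rules frame0 grows to 0-149999 and MF1 lands in it. Registering MF2 (324999) with no legacy filter attached grows frame0 to 0-324999.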
There really isn't much more I can say on the subject. All of it is fairly well documented except for how FltMgr does the frame altitude adjustment for Vista+, which isn't very complicated. I'll wrap things up with a couple of things I think filters should be aware of.
  • Minifilters still need to set an appropriate LoadOrderGroup, they can't just rely on the altitude mechanism because of interaction with legacy filters.
  • Legacy filters must use an appropriate LoadOrderGroup as well and they must also call IoRegisterFsRegistrationChange() (or IoRegisterFsRegistrationChangeMountAware() where available), otherwise FltMgr will not become aware of the legacy filter and will keep adding minifilters in the existing top frame, leading to very interesting bugs.
  • Even though it's allowed that minifilters call FltRegisterFilter() at some later time (as opposed to registering directly from DriverEntry), it's generally better to call FltRegisterFilter() from DriverEntry, which will associate the minifilter with the appropriate frame and perform any altitude adjustment and should reduce interop issues with legacy filters.
  • Once a frame is no longer the top frame (i.e. after a new frame is created) its altitude range can no longer change at all. Only the top frame can change its altitude range, and only the upper altitude can change.
  • The upper altitude range for a frame can never decrease, it only increases.

Thursday, November 10, 2011

Filters And IRP_MJ_QUERY_INFORMATION


IRP_MJ_QUERY_INFORMATION is a request that file system filters must interact with quite frequently, either to process it or to issue a query to get some information from the underlying file system. The semantics are fairly simple and fairly well documented but still there are some implementation details that might make things interesting for a filter.
Looking at the documentation for IRP_MJ_QUERY_INFORMATION the following phrase stands out:


Although the FileAccessInformation, FileAlignmentInformation, and FileModeInformation information types can also be passed as a parameter to ZwQueryInformationFile, this information is file-system-independent. Thus ZwQueryInformationFile supplies this information directly, without sending an IRP_MJ_QUERY_INFORMATION request to the file system.


What this means is that the IO manager can extract the information from some other place, and considering this information can be requested on a per-handle basis, it's pretty clear that the information must come from the FILE_OBJECT or the handle information that the IO manager keeps in its handle tables. And indeed, if we look at the information classes that are singled out (FileAccessInformation, FileAlignmentInformation, and FileModeInformation) we can see where the information might come from in each case:

  • FileAccessInformation - the access rights for each handle are managed by the IO manager internally and they are not visible on the FILE_OBJECT or even stored in the file system. So even if the IO manager were to send an IRP, the file system itself wouldn't be able to answer the request because it just doesn't have that information. I should mention that this isn't necessarily true for remote file systems since the remote file system must perform access checks for the requests it receives over the wire anyway.
  • FileAlignmentInformation - this alignment is not something required by the file system anyway. This is related to the storage device on top of which the file system is mounted and so the IO manager could get it by querying the storage device. However, that's not really necessary since each DEVICE_OBJECT has an AlignmentRequirement member (again, this might not be true for remote file systems).
  • FileModeInformation - this information comes from the FILE_OBJECT and it's pretty transparent how it maps to the various FILE_OBJECT flags.

Frankly I expected to see another information class on the list, the FilePositionInformation. I thought the current pointer is maintained in the FILE_OBJECT->CurrentByteOffset and so the IO manager could just get it from there and not bother to send a request into the file system (after all the information must be stored on a per-FILE_OBJECT basis and so it can't be in the SCB or anything like that).
Anyway, it's interesting to see what the FastFat file system does for these requests. So looking at \src\filesys\fastfat\Win7\fileinfo.c we can see that for the three information classes mentioned in the documentation the IRP would be failed with STATUS_INVALID_PARAMETER. I was also curious to see what FastFat does for FilePositionInformation and while FastFat doesn't fail the request, it does what I thought it would do, which is to return the FILE_OBJECT->CurrentByteOffset value.
So far it's all pretty clear and it's not really problematic for filters since in most cases they don't really care about these information classes anyway and there is no chance to get an IRP_MJ_QUERY_INFORMATION request for any of them from the IO manager. However, there is a twist here. The FileAllInformation class includes all four information classes mentioned above (FileAccessInformation, FileAlignmentInformation, FileModeInformation and FilePositionInformation) and is actually sent in the form of an IRP. So how does the file system get that information?
Looking again at the FastFat implementation we can see the code fragment that is used to implement the FileAllInformation call:
            case FileAllInformation:

                //
                //  For the all information class we'll typecast a local
                //  pointer to the output buffer and then call the
                //  individual routines to fill in the buffer.
                //

                AllInfo = Buffer;
                Length -= (sizeof(FILE_ACCESS_INFORMATION)
                           + sizeof(FILE_MODE_INFORMATION)
                           + sizeof(FILE_ALIGNMENT_INFORMATION));

                FatQueryBasicInfo( IrpContext, Fcb, FileObject, &AllInfo->BasicInformation, &Length );
                FatQueryStandardInfo( IrpContext, Fcb, &AllInfo->StandardInformation, &Length );
                FatQueryInternalInfo( IrpContext, Fcb, &AllInfo->InternalInformation, &Length );
                FatQueryEaInfo( IrpContext, Fcb, &AllInfo->EaInformation, &Length );
                FatQueryPositionInfo( IrpContext, FileObject, &AllInfo->PositionInformation, &Length );
                FatQueryNameInfo( IrpContext, Fcb, Ccb, &AllInfo->NameInformation, &Length );

                break;
So as we can see, FastFat doesn't even attempt to return the data for those information classes mentioned in the documentation. So how are they populated? My first guess was that the IO manager populates them after the request completes and before returning the buffer to the caller. But when I tried to validate my assumption by setting a write breakpoint on the location in the buffer where the FILE_ACCESS_INFORMATION structure is, the breakpoint never got hit in the path I expected it to. After some more investigation I realized that by the time my filter got the request, the FILE_ACCESS_INFORMATION was already populated:

1: kd> kn
 # ChildEBP RetAddr  
00 a625eb6c 96016aeb myfilter!PreQueryInformation+0x29c
01 a625ebd8 960199f0 fltmgr!FltpPerformPreCallbacks+0x34d
02 a625ebf0 96019f01 fltmgr!FltpPassThroughInternal+0x40
03 a625ec14 9601a3ba fltmgr!FltpPassThrough+0x203
04 a625ec44 828884bc fltmgr!FltpDispatch+0xb4
05 a625ec5c 82aa8f24 nt!IofCallDriver+0x63
06 a625ed18 8288f44a nt!NtQueryInformationFile+0x779

1: kd> ?? ((PFILE_ALL_INFORMATION)Data->Iopb->Parameters.QueryFileInformation.InfoBuffer)->AccessInformation
struct _FILE_ACCESS_INFORMATION
   +0x000 AccessFlags      : 0x120089
So the way NtQueryInformationFile works for FileAllInformation is by populating the buffer with the information it has access to before sending it to the file system and then the file system fills in the rest. With this in mind there are a couple of things that filters must be careful about:

  • When processing a FileAllInformation request the filter must be careful not to overwrite the information that was already written by the IO manager. So don't call RtlZeroMemory() for that buffer or reuse it for some other purpose. Also, if completing a FileAllInformation query by assembling bits from some other sources (other queries into an underlying file system or some such) the filter must be careful about how it copies the data into the user's buffer. I've seen cases where in response to a FileAllInformation request the filter allocated its own buffer, sent its request using FltQueryInformationFile() and then copied the resulting buffer over the user's buffer, and that is broken. This is because FltQueryInformationFile() is not meant to be identical to ZwQueryInformationFile(). It is simply a wrapper over allocating an IRP and sending the request to the file system, so some (all?) of the requests that would be completed by the IO manager without sending an IRP will just fail for FltQueryInformationFile().
  • Filters that implement more of the file system functionality need to behave more like a file system so for example a filter that owns its own FILE_OBJECTs must make sure to keep the CurrentByteOffset updated since the FilePositionInformation request might be completed above them by the IO manager or some other filter that will simply look in the FILE_OBJECT.
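To make the first point concrete, here is a user-mode sketch of the difference between copying a privately queried buffer wholesale and merging it field by field. The struct below is a deliberately simplified stand-in (one integer per sub-structure, in the same order as FILE_ALL_INFORMATION), and both function names are hypothetical; a real filter would of course work with the actual structures and lengths.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for FILE_ALL_INFORMATION: one value per member. */
struct toy_all_info {
    unsigned long basic;
    unsigned long standard;
    unsigned long internal;
    unsigned long ea;
    unsigned long access;     /* pre-filled by the IO manager */
    unsigned long position;
    unsigned long mode;       /* pre-filled by the IO manager */
    unsigned long alignment;  /* pre-filled by the IO manager */
    unsigned long name;
};

/* Broken: overwrites the members the IO manager already populated. */
static void toy_copy_wholesale(struct toy_all_info *user,
                               const struct toy_all_info *from_fs)
{
    memcpy(user, from_fs, sizeof(*user));
}

/* Careful: copies only the members the file system is expected to fill in,
 * mirroring the list of FatQuery* calls in the FastFat fragment. */
static void toy_copy_fs_members(struct toy_all_info *user,
                                const struct toy_all_info *from_fs)
{
    user->basic    = from_fs->basic;
    user->standard = from_fs->standard;
    user->internal = from_fs->internal;
    user->ea       = from_fs->ea;
    user->position = from_fs->position;
    user->name     = from_fs->name;
}

static void check_all_info_merge(void)
{
    struct toy_all_info user, fs;

    memset(&fs, 0, sizeof(fs));
    fs.basic = 1;                 /* some value returned by the file system */

    memset(&user, 0, sizeof(user));
    user.access = 0x120089;       /* AccessFlags value seen in the debugger */
    toy_copy_wholesale(&user, &fs);
    assert(user.access == 0);     /* the IO manager's data is gone */

    memset(&user, 0, sizeof(user));
    user.access = 0x120089;
    toy_copy_fs_members(&user, &fs);
    assert(user.access == 0x120089 && user.basic == 1);
}
```

The wholesale copy loses the AccessFlags value because the private buffer never had it; the member-wise copy leaves the access, mode and alignment members untouched.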
Finally, I wanted to mention one particular documentation page on MSDN that I find very useful when dealing with Information classes, the page for FileInformation Classes. I have a hard time remembering which ones are only for set and which ones are query-only and which are both and how they are handled and this page helps a lot. Please note however that this page is written with remote file systems in mind and so some of the information isn't exactly the same for local file systems. Still I find it quite useful whenever I have to deal with this topic.

Thursday, November 3, 2011

Byte Range Locks and IRP_MJ_CLEANUP


Byte range locks are not a complicated concept but there are some interesting implementation details that might make life hard for a filter. I ran into this a couple of days ago when I was tracking down some IFS test failures related to locking (in particular the UnlockRangeOnCloseTest test from the FileLocking group).
Byte range locks are documented fairly well, at least when compared with other concepts. There is the Lock 'Em Up - Byte Range Locking OSR article and an MSDN page on Locking and Unlocking Byte Ranges in Files. However, for this discussion, the relevant feature is described in the user mode API for locking files, LockFileEx(). This is the quote:


If a process terminates with a portion of a file locked or closes a file that has outstanding locks, the locks are unlocked by the operating system. However, the time it takes for the operating system to unlock these locks depends upon available system resources. Therefore, it is recommended that your process explicitly unlock all files it has locked when it terminates. If this is not done, access to these files may be denied if the operating system has not yet unlocked them.


So what this means is that a process doesn't necessarily have to release all its locks on a file before closing its handle, and the OS will release the locks on its behalf (though this is not the recommended way of doing things). There is an interesting aspect here that is worth noting. Whenever the documentation says that something happens automatically for a handle when it's closed, I immediately think about what happens with handles in different processes that point to the same object. For example, what happens when a file is opened with handle A (HA) in process A and then process A creates process B in such a way that process B inherits the handle from process A (HB)? Both HA and HB point to the same FILE_OBJECT, and when the first handle is closed nothing particularly interesting happens for the file system (the IRP_MJ_CLEANUP only gets sent when the last handle to a FILE_OBJECT is closed). For the rest of this post let's assume that HA is closed first and then HB is closed, so closing HB is what prompts the IO manager to send the IRP_MJ_CLEANUP call.
So now let's look at what happens in FastFat to handle this case. Looking at the code that processes IRP_MJ_CLEANUP (in \src\filesys\fastfat\Win7\cleanup.c) we find this block of code:
            //
            //  Unlock all outstanding file locks.
            //

            (VOID) FsRtlFastUnlockAll( &Fcb->Specific.Fcb.FileLock,
                                       FileObject,
                                       IoGetRequestorProcess( Irp ),
                                       NULL );
There are two interesting things to note about this call.

  • First we can see that a process is passed in (and this is the process associated with the IRP which FastFat gets from IoGetRequestorProcess()). Moreover, the process is a mandatory parameter, as we can see from the declaration for FsRtlFastUnlockAll():
    NTSTATUS FsRtlFastUnlockAll(
      __in      PFILE_LOCK FileLock,
      __in      PFILE_OBJECT FileObject,
      __in      PEPROCESS ProcessId,
      __in_opt  PVOID Context
    );
    
    The documentation clearly states that the locks that are released are specific to a process, so during IRP_MJ_CLEANUP FastFat will automatically release the locks taken by the process on whose handle the IRP_MJ_CLEANUP call came (in our example, the process that closed handle HB). But what about the locks acquired on handle HA? Are they going to be left behind?
  • The second interesting thing to note is that the FILE_LOCK structure is a private member of the FCB, not part of the FSRTL_ADVANCED_FCB_HEADER. So the IO manager can't know where that structure is located without specific knowledge about each file system and as such it can't call FsRtlFastUnlockAll by itself.
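
To make the process scoping concrete, here is a small user mode model of it. This is only a sketch and not kernel code: ModelLock, ModelFileLock, model_lock(), model_unlock_all() and model_count_locks() are hypothetical names invented for the illustration, integers stand in for PFILE_OBJECT and PEPROCESS, and lock conflicts are not modeled at all. The only thing it tries to show is the matching rule described above, where an unlock-all releases the locks of one process on one FILE_OBJECT and nothing else.

```c
#include <assert.h>
#include <stddef.h>

/* User mode model only: none of these are the real kernel types. */
#define MAX_LOCKS 16

typedef struct {
    int file_object;             /* stand-in for PFILE_OBJECT */
    int process;                 /* stand-in for PEPROCESS */
    unsigned long long offset;   /* start of the locked range */
    unsigned long long length;   /* length of the locked range */
    int in_use;                  /* nonzero while the lock is held */
} ModelLock;

typedef struct {
    ModelLock locks[MAX_LOCKS];  /* stand-in for the per-FCB FILE_LOCK */
} ModelFileLock;

/* Record a lock. Returns 0 on success, -1 if the table is full. */
int model_lock(ModelFileLock *fl, int file_object, int process,
               unsigned long long offset, unsigned long long length)
{
    for (size_t i = 0; i < MAX_LOCKS; i++) {
        if (!fl->locks[i].in_use) {
            fl->locks[i] = (ModelLock){ file_object, process, offset, length, 1 };
            return 0;
        }
    }
    return -1;
}

/* Release only the locks owned by this (file object, process) pair.
 * Returns the number of locks released. */
int model_unlock_all(ModelFileLock *fl, int file_object, int process)
{
    int released = 0;
    for (size_t i = 0; i < MAX_LOCKS; i++) {
        ModelLock *l = &fl->locks[i];
        if (l->in_use && l->file_object == file_object && l->process == process) {
            l->in_use = 0;
            released++;
        }
    }
    return released;
}

/* Count the locks that are still held. */
int model_count_locks(const ModelFileLock *fl)
{
    int count = 0;
    for (size_t i = 0; i < MAX_LOCKS; i++) {
        if (fl->locks[i].in_use) {
            count++;
        }
    }
    return count;
}
```

In this model, if process A and process B each take one lock on the same file object and an unlock-all is then issued for process B (as the cleanup for HB would do), the lock taken by process A is still in the table, which is exactly the question raised above.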

Searching for FsRtlFastUnlockAll() in the FastFat source we find one other place where it is called, the FatFastUnlockAll() function (in \src\filesys\fastfat\Win7\lockctrl.c). As the name suggests, FatFastUnlockAll() is a fast IO callback for FastFat and it does little more than release all the byte range locks associated with the calling process. This looks like a good mechanism for the IO manager to instruct the file system to release all the locks when a handle is closed. However, there was still one puzzling aspect. FastIO is supposed to be optional, so what happens if a filter fails the FastIO or a file system doesn't implement it at all? I expected there would be an IRP equivalent for this FastIO, but there is no other place in the code where FsRtlFastUnlockAll() is called. Well, in fact there is an IRP equivalent for the FastIO but it is not explicitly processed by the FastFat file system. Instead all the lock processing associated with the IRP_MJ_LOCK_CONTROL IRP is handled inside FatCommonLockControl(), which simply calls FsRtlProcessFileLock() and lets the FsRtl package handle it.
Finally, now that we know how the IO manager calls the file system to tell it to release the locks associated with a process, there is one more twist. Does the IO manager issue an unlock-all every time a handle is closed? Or, if not, how does it know when to do it? Clearly it doesn't need to do it for the last handle (since the file system's IRP_MJ_CLEANUP routine will do it), but what about the other handles? It turns out that there is an optimization here. Whenever the IO manager issues a byte range lock request to the file system it sets the FILE_OBJECT->LockOperation boolean to TRUE. Then, whenever it is closing a handle, if FILE_OBJECT->LockOperation is set it knows that it must notify the file system to release any potential locks. Please note that this flag appears to never be cleared (i.e. even if a process locks and then unlocks all the ranges so that there are no locks to release when closing the handle), so don't be surprised if you receive the unlock-all request in your filter even when there are no locked ranges.
So to summarize things, this is the logic involved here:

  • On every lock operation the IO manager sets FILE_OBJECT->LockOperation. It is worth mentioning that LockOperation is never actually used by the file system (at least not that I've seen in any file system I've looked at).
  • When a handle is closed, if the FILE_OBJECT->LockOperation is set then the IO manager knows there were some locks taken on the FILE_OBJECT and so it must release them. So the IO manager will issue the IRP_MJ_LOCK_CONTROL IRP with the IRP_MN_UNLOCK_ALL minor function (or it will call the FastIO equivalent) to tell the file system to release all the locks. However, this is not necessary if this is the last handle for the FILE_OBJECT because the IO manager will issue the IRP_MJ_CLEANUP IRP in that case and the file system will release all the locks for that process anyway.
  • When a file system processes the IRP_MJ_CLEANUP IRP it must also release all the byte range locks for the FILE_OBJECT for that process.
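
The summary above can be sketched as a tiny user mode state machine. This is a model under stated assumptions, not the IO manager's actual code: ModelFileObject, CloseAction, model_issue_lock(), model_issue_unlock() and model_close_handle() are hypothetical names, and the handle count is tracked as a plain integer.

```c
#include <assert.h>
#include <stdbool.h>

/* What the modeled IO manager sends to the file system on a handle close. */
typedef enum {
    ACTION_NONE,        /* nothing lock related is sent */
    ACTION_UNLOCK_ALL,  /* models IRP_MJ_LOCK_CONTROL / IRP_MN_UNLOCK_ALL (or the FastIO) */
    ACTION_CLEANUP      /* models IRP_MJ_CLEANUP; the file system unlocks there */
} CloseAction;

typedef struct {
    bool lock_operation;  /* models FILE_OBJECT->LockOperation */
    int  handle_count;    /* handles currently open on this file object */
} ModelFileObject;

/* Any lock request sets the flag. */
void model_issue_lock(ModelFileObject *fo)
{
    fo->lock_operation = true;
}

/* Unlocking does not clear the flag, matching the observation in the text. */
void model_issue_unlock(ModelFileObject *fo)
{
    (void)fo;
}

/* Decide what is sent when one handle is closed. */
CloseAction model_close_handle(ModelFileObject *fo)
{
    assert(fo->handle_count > 0);
    fo->handle_count--;
    if (fo->handle_count == 0) {
        return ACTION_CLEANUP;      /* last handle: cleanup covers the unlock */
    }
    if (fo->lock_operation) {
        return ACTION_UNLOCK_ALL;   /* earlier handle, locks were taken at some point */
    }
    return ACTION_NONE;
}
```

With two handles open, a lock followed by an unlock still leaves the flag set, so closing the first handle yields ACTION_UNLOCK_ALL and closing the second yields ACTION_CLEANUP. A file object that never saw a lock request yields ACTION_NONE for the first close.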

Ok, so now let's look at some of the problems that filters might introduce or might run into:

  • A filter that acquires locks on a FILE_OBJECT without going through the IO manager (i.e. without calling ZwLockFile() but by issuing its own IO (IRP or FLT_CALLBACK_DATA)) should also set the FILE_OBJECT->LockOperation flag so that the IO manager knows locks have been taken on that file, because otherwise it'll be really complicated to release the locks at the right time.
  • A filter that duplicates a handle for a FILE_OBJECT might also change the behavior a bit depending on when it closes the handle. If, for example, it closes the handle after the user has closed theirs, then the IRP_MJ_CLEANUP IRP will be sent for the filter's close and not the user's close. Now, the IO manager should handle this properly and frankly I don't see any problem with it off the top of my head, but it's something to keep in mind.
  • When a filter calls ZwClose (or FltClose) for a handle it has opened, the IoGetRequestorProcess() call for the IRP_MJ_CLEANUP IRP will return the system process, so the file system will release all byte range locks on the FILE_OBJECT in the system process. This might be broken if, for example, there are two handles, H1 and H2, for the same FILE_OBJECT in the system process and a lock was taken on handle H1. If the filter then closes H2, the IO manager finds FILE_OBJECT->LockOperation set and tells the file system to release all the locks in the system process for that FILE_OBJECT, and thus it releases the byte range lock that H1 had.
  • Also, there are some filters that open their own handles to certain files and then forward some requests that arrive on other files to the files they've opened (for example, some back-up filter might forward all IRP_MJ_WRITE requests for a file (foo.txt) to another file (foo.txt.bak)). Shadow File Object type filters will often exhibit the same behavior. Now, if they ever forward a byte range lock request to the file they've opened (by doing something like changing the TargetFileObject), then when they close their file that close will most likely not be in the same process as the process that requested the byte range lock originally, and so some ranges of the file they've opened might remain locked. In this case the filter might need to issue IRP_MJ_LOCK_CONTROL with IRP_MN_UNLOCK_ALL itself from the process context where the forwarded lock request originated.
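
The H1/H2 case from the list above is easy to reproduce in a user mode model. Again this is only a sketch: SharedFileObject, MODEL_SYSTEM_PROCESS, shared_take_lock() and shared_close_handle() are hypothetical names, a single byte range stands in for the whole lock list, and the process id value is arbitrary. The point is that the release is keyed on the process and never on the handle.

```c
#include <assert.h>
#include <stdbool.h>

/* User mode model only; every name below is hypothetical. */
enum { MODEL_SYSTEM_PROCESS = 4 };  /* arbitrary id for the model */

typedef struct {
    bool lock_operation;  /* models FILE_OBJECT->LockOperation */
    int  handle_count;    /* handles open on this file object */
    bool range_locked;    /* one byte range is enough to show the effect */
    int  lock_owner;      /* process that took the lock */
} SharedFileObject;

/* Take the single range lock from the given process. */
void shared_take_lock(SharedFileObject *fo, int process)
{
    fo->range_locked = true;
    fo->lock_owner = process;
    fo->lock_operation = true;
}

/* Close one handle from the given process. Both the unlock-all on an
 * earlier close and the cleanup on the last close release by process. */
void shared_close_handle(SharedFileObject *fo, int process)
{
    assert(fo->handle_count > 0);
    fo->handle_count--;
    bool last_handle = (fo->handle_count == 0);
    if ((last_handle || fo->lock_operation) &&
        fo->range_locked && fo->lock_owner == process) {
        fo->range_locked = false;
    }
}
```

Starting with two handles in the modeled system process, taking the lock (through H1) and then closing one handle (H2) leaves handle_count at 1 but range_locked false, so the remaining handle has silently lost its lock. If the closing process differs from the lock owner, the lock survives the non-final close, which is the situation in the last bullet where ranges stay locked.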

Finally, there is one more thing I'd like to say. There are no Flt equivalents for ZwLockFile or ZwUnlockFile, so a filter that wants to lock files on the file system below must issue its own requests. There are, however, some Flt functions for byte range locks (like FltProcessFileLock()), but they are meant for filters that implement byte range locks for some FILE_OBJECTs (like a file system would). For example, FltProcessFileLock() should be called where a file system would call FsRtlProcessFileLock(). Since FsRtlProcessFileLock() requires an IRP parameter, FltMgr had to implement a wrapper function that takes a FLT_CALLBACK_DATA structure instead of that IRP. This is not the case for all the FsRtlXxxLock() functions because not all of them take an IRP parameter (for example, FsRtlFastUnlockAll() doesn't take an IRP, so there is no Flt equivalent and a filter that implements file locks simply calls FsRtlFastUnlockAll() directly). Basically, a filter that implements file locks must mix calls to FsRtl functions with calls to Flt functions.