I am encountering a hang or BSOD issue with a SCSI miniport driver during
device testing under Windows XP. Can anyone shed some light on a cause or
suggest a possible solution?
We are testing a SCSI (SAS) target device. The test performs
single-block reads or writes and interrupts power for 10-100 ms at random
times during the I/Os to verify that the device behaves and recovers properly.
The test runs under Windows XP. We have our own miniport driver, to which
I can add DbgPrint calls for additional instrumentation. So far I have not
seen misbehavior of the target or the miniport driver. The timeline of events
common to both types of failure is:
1: Target is running I/Os normally.
2: Target gets a power glitch from the test.
3: Target re-initializes the bus correctly and becomes ready (about a 1.5
second delay).
4: Initiator issues read or write commands to the target up to the specified
queue depth (8 max).
5: Target responds correctly with power-up unit attention and not-ready
status to all commands.
6: Windows either blue-screens here or issues a mysterious request sense
command (I did not issue this command).
7: If the request sense command is issued, the miniport driver no
longer gets any commands from the application, and Windows usually does not
blue-screen. The command that does not complete appears to be stuck on a
busy queue.
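To confirm step 5 from the miniport's DbgPrint instrumentation, the sense data returned by the target can be classified before it is logged. The following is a minimal sketch, assuming the target returns fixed-format sense data (response code 0x70 or 0x71); the function and enum names are hypothetical and not part of any Windows API. The sense key and additional sense code values come from the SCSI standard: sense key 0x06 is UNIT ATTENTION, 0x02 is NOT READY, and ASC 0x29 reports a power on, reset, or bus device reset.

```c
#include <stddef.h>
#include <stdint.h>

/* Sense keys defined by the SCSI Primary Commands standard. */
enum { SENSE_KEY_NOT_READY = 0x02, SENSE_KEY_UNIT_ATTENTION = 0x06 };

/* Hypothetical classification, used only for logging in this sketch. */
typedef enum {
    SENSE_OTHER = 0,
    SENSE_NOT_READY,
    SENSE_POWER_ON_UNIT_ATTENTION,
    SENSE_INVALID
} sense_class_t;

/* Classify fixed-format sense data. Descriptor-format sense data
 * (response codes 0x72/0x73) is not handled and is reported as invalid. */
sense_class_t classify_sense(const uint8_t *sense, size_t len)
{
    if (sense == NULL || len < 14)
        return SENSE_INVALID;          /* too short to contain the ASC byte */

    uint8_t response = sense[0] & 0x7F;
    if (response != 0x70 && response != 0x71)
        return SENSE_INVALID;

    uint8_t key = sense[2] & 0x0F;     /* sense key: low nibble of byte 2 */
    uint8_t asc = sense[12];           /* additional sense code */

    if (key == SENSE_KEY_UNIT_ATTENTION && asc == 0x29)
        return SENSE_POWER_ON_UNIT_ATTENTION;
    if (key == SENSE_KEY_NOT_READY)
        return SENSE_NOT_READY;
    return SENSE_OTHER;
}
```

Logging the result per command tag would show whether each of the (up to 8) queued commands received the expected status after the power glitch.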
Note that the problem does not seem to occur if the queue depth is set to 1.
The problem also appears to be timing-dependent, as it does not occur with
every device model that is attached.
In the case of the BSOD, the bug check status is shown in Exhibit 1 below.
I have several memory dumps if anyone is interested. This fault is reported
in the SCSI port driver, and I have not been able to trace it back to its
cause. I am guessing Windows is trying to complete a command that has
already completed?
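One way to test the double-completion guess from inside the miniport is to track outstanding requests and flag any completion that does not match one. The following is a minimal sketch, not the actual driver code: all names are hypothetical, the request is treated as an opaque pointer (for example, the SRB address), and synchronization between the start and completion paths is ignored. In a real SCSIPORT miniport the two hooks would presumably sit in the HwStartIo routine and just before the ScsiPortNotification(RequestComplete, ...) call, with a DbgPrint on any failure.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUE_DEPTH 8   /* matches the maximum queue depth used in the test */

typedef struct {
    const void *request;    /* opaque request pointer, e.g. the SRB address */
    bool        in_flight;
} slot_t;

typedef struct {
    slot_t   slots[MAX_QUEUE_DEPTH];
    unsigned unexpected_completions;   /* completions with no matching start */
} tracker_t;

void tracker_init(tracker_t *t)
{
    for (size_t i = 0; i < MAX_QUEUE_DEPTH; i++) {
        t->slots[i].request = NULL;
        t->slots[i].in_flight = false;
    }
    t->unexpected_completions = 0;
}

/* Record a request handed to the hardware. Returns false if no slot is free,
 * which would mean more than MAX_QUEUE_DEPTH requests are outstanding. */
bool tracker_start(tracker_t *t, const void *request)
{
    for (size_t i = 0; i < MAX_QUEUE_DEPTH; i++) {
        if (!t->slots[i].in_flight) {
            t->slots[i].request = request;
            t->slots[i].in_flight = true;
            return true;
        }
    }
    return false;
}

/* Record a completion. Returns false (and counts it) if the request is not
 * currently outstanding, i.e. it was completed twice or never started. */
bool tracker_complete(tracker_t *t, const void *request)
{
    for (size_t i = 0; i < MAX_QUEUE_DEPTH; i++) {
        if (t->slots[i].in_flight && t->slots[i].request == request) {
            t->slots[i].in_flight = false;
            return true;
        }
    }
    t->unexpected_completions++;
    return false;
}
```

If unexpected_completions stays at zero up to the bug check, the miniport is probably not completing a request twice, and the second completion (if any) would have to originate elsewhere.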
In the case of the hang, the status of the outstanding IRP is shown in
Exhibit 2. The command is sent to the driver but does not reach the
miniport. What is Windows doing that it will not return a status or forward
the IRP?
Thanks in advance for any insight.
****EXHIBIT 1*****
Windows blue screen
1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 000000e8, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, bitfield :
bit 0 : value 0 = read operation, 1 = write operation
bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: 806e4a16, address which referenced memory
Debugging Details:
------------------
WRITE_ADDRESS: 000000e8
CURRENT_IRQL: 2
FAULTING_IP:
hal!KeAcquireInStackQueuedSpinLock+26
806e4a16 8711 xchg edx,dword ptr [ecx]
DEFAULT_BUCKET_ID: DRIVER_FAULT
BUGCHECK_STR: 0xA
PROCESS_NAME: Kodiak.exe
TRAP_FRAME: f79d4e50 -- (.trap 0xfffffffff79d4e50)
ErrCode = 00000002
eax=f79d4ed8 ebx=00000000 ecx=000000e8 edx=f79d4ed8 esi=00000000 edi=86fd59f0
eip=806e4a16 esp=f79d4ec4 ebp=f79d4ee4 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202
hal!KeAcquireInStackQueuedSpinLock+0x26:
806e4a16 8711 xchg edx,dword ptr [ecx] ds:0023:000000e8=????????
Resetting default scope
LAST_CONTROL_TRANSFER: from 804f8cb1 to 8052a834
STACK_TEXT:
f79d4a04 804f8cb1 00000003 f79d4d60 00000000 nt!RtlpBreakWithStatusInstruction
f79d4a50 804f989c 00000003 000000e8 806e4a16 nt!KiBugCheckDebugBreak+0x19
f79d4e30 80543930 0000000a 000000e8 00000002 nt!KeBugCheck2+0x574
f79d4e30 806e4a16 0000000a 000000e8 00000002 nt!KiTrap0E+0x238
f79d4ec4 804fc352 86fd59f0 86fd59b0 00000000 hal!KeAcquireInStackQueuedSpinLock+0x26
f79d4ee4 804f16ec 86fd59f0 00000000 00000000 nt!KeInsertQueueApc+0x20
f79d4f18 f73c28f8 86f36ae8 86fd59b0 f79d4f5c nt!IopfCompleteRequest+0x1d8
f79d4f28 f73c2436 86f6ddc8 00000001 00000000 SCSIPORT!SpCompleteRequest+0x5e
f79d4f5c f73c26f7 86f36ae8 86f6ddc8 f79d4fcb SCSIPORT!SpProcessCompletedRequest+0x632
f79d4fcc 805450bf 86f36aa4 86f36a30 00000000 SCSIPORT!ScsiPortCompletionDpc+0x2b5
f79d4ff4 80544c2b ee2c2c4c 00000000 00000000 nt!KiRetireDpcList+0x61
f79d4ff8 ee2c2c4c 00000000 00000000 00000000 nt!KiDispatchInterrupt+0x2b
WARNING: Frame IP not in any known module. Following frames may be wrong.
80544c2b 00000000 00000009 0081850f bb830000 0xee2c2c4c
STACK_COMMAND: kb
FOLLOWUP_IP:
SCSIPORT!SpCompleteRequest+5e
f73c28f8 5e pop esi
SYMBOL_STACK_INDEX: 7
SYMBOL_NAME: SCSIPORT!SpCompleteRequest+5e
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: SCSIPORT
IMAGE_NAME: SCSIPORT.SYS
DEBUG_FLR_IMAGE_TIMESTAMP: 41107b4b
FAILURE_BUCKET_ID: 0xA_W_SCSIPORT!SpCompleteRequest+5e
BUCKET_ID: 0xA_W_SCSIPORT!SpCompleteRequest+5e
Followup: MachineOwner
---------
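As a reading aid for the bug check 0xA arguments above, the Arg3 bitfield can be decoded mechanically. This is a minimal sketch using only the bit meanings printed by !analyze (bit 0 for read/write, bit 3 for execute); the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    bool is_write;     /* bit 0: 0 = read operation, 1 = write operation */
    bool is_execute;   /* bit 3: 1 = execute operation, where supported */
} access_type_t;

/* Decode Arg3 of IRQL_NOT_LESS_OR_EQUAL (0xA) as described by !analyze -v. */
access_type_t decode_bugcheck_a_arg3(uint32_t arg3)
{
    access_type_t a;
    a.is_write   = (arg3 & 0x1u) != 0;
    a.is_execute = (arg3 & 0x8u) != 0;
    return a;
}
```

For the dump above, Arg3 = 00000001 decodes to a write, which agrees with WRITE_ADDRESS: 000000e8. That address equals ecx in the trap frame, the operand the faulting xchg writes through, so the lock address passed to KeAcquireInStackQueuedSpinLock looks like a small offset from a NULL base pointer, although the dump alone does not prove that.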
****EXHIBIT 2*****
Below is the case where a command sent to the device does not return a
status. The IRP shown is the one which does not complete.
0: kd> !irp 85BBF348
Irp is active with 2 stacks 2 is current (= 0x85bbf3dc)
Mdl=85ba5580: No System Buffer: Thread 8643d440: Irp stack trace.
cmd flg cl Device File Completion-Context
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
orionLSI.sys
\Driver\OrionLSI*** ERROR: Module load completed but symbols could not be loaded for stsclass.SYS
stsclass
Args: 85b58268 00000000 00000000 00000000
0: kd> !devobj 86f788e8
Device object (86f788e8) is for:
OrionLSI3Port5Path0Target1Lun0 \Driver\OrionLSI DriverObject 86f63f38
Current Irp 00000000 RefCount 0 Type 00000007 Flags 00001050
Dacl e10201ec DevExt 86f789a0 DevObjExt 86f78e80 Dope 86f76d40 DevNode 86f77ee8
ExtensionFlags (0000000000)
AttachedDevice (Upper) 86f76030 \Driver\Disk
Device queue is busy -- Queue empty.