StorDiag Errors on Server 2016 with Samsung 970 EVO

A few months ago, I added a blazing-fast Samsung 970 EVO NVMe drive to my old Lenovo TS140 server. It’s been performing well, but I noticed that I’m getting thousands of these events every day:

Log Name:      Microsoft-Windows-Storage-ClassPnP/Operational
Source:        Microsoft-Windows-StorDiag
Date:          4/8/2019 6:30:18 PM
Event ID:      504
Task Category: Class
Level:         Error
Keywords:      Device I/O control request
User:          SYSTEM
Computer:      MYSERVER
Description:
Completing a failed IOCTL request.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="DeviceGUID">{F0913BC8-3C33-D4B6-41EF-B949621250A2}</Data>
    <Data Name="DeviceNumber">4</Data>
    <Data Name="Vendor">NULL</Data>
    <Data Name="Model">Samsung SSD 970 EVO 1TB</Data>
    <Data Name="FirmwareVersion">2B2QEXE7</Data>
    <Data Name="SerialNumber">1111_2222_3333_4444.</Data>
    <Data Name="IrpStatus">0xc0000472</Data>
    <Data Name="IoctlControlCode">0x2d9404</Data>
  </EventData>
</Event>

The event log is only 1MB. It stores about 2000 messages and takes about 90 minutes to fill. So I’m getting roughly 32,000 of these messages per day. When I see that kind of “thrashing,” I assume something is wrong.
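The daily figure follows from simple arithmetic on the numbers above. As a rough sketch (the function names are made up for illustration, and the 2000-event and 90-minute inputs are the approximate figures quoted in the post, not measured values):

```python
# Approximate figures from the post: about 2000 events fill the 1 MB log
# in about 90 minutes.
EVENTS_PER_FILL = 2000
MINUTES_PER_FILL = 90


def events_per_minute(events: int, minutes: float) -> float:
    """Average event rate per minute over one fill of the log."""
    return events / minutes


def events_per_day(events: int, minutes: float) -> float:
    """Scale the per-fill count up to a 24-hour day (1440 minutes)."""
    return events * 24 * 60 / minutes


if __name__ == "__main__":
    per_min = events_per_minute(EVENTS_PER_FILL, MINUTES_PER_FILL)
    per_day = events_per_day(EVENTS_PER_FILL, MINUTES_PER_FILL)
    print(f"{per_min:.1f} events per minute")
    print(f"{per_day:,.0f} events per day")
```

That works out to a little over 22 events per minute, which is where the "20+ per minute" figure later in the post comes from.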

I’ve had StorDiag 504 errors before, as reported here. In that situation, the solution was to disable and stop using a certain USB hub. That’s not an option here.

StarTech

I’m hosting the Samsung SSD in a StarTech PEX4M2E1 adapter. A very helpful StarTech support rep clarified a couple things for me:

  • The adapter doesn’t need drivers; it just routes the PCIe bus to the NVMe drive.
  • The IoctlControlCode values can be looked up on the www.ioctls.net site. The particular code in this event, 0x2d9404, is referenced here: http://www.ioctls.net/?filter=0x2d9404: “The IOCTL_STORAGE_MANAGE_DATA_SET_ATTRIBUTES control code communicates attribute information to the device for trim optimizations if the device supports it.”
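The control code can also be unpacked by hand, since Windows IOCTL codes pack four fields into one 32-bit value (the layout used by the CTL_CODE macro in the Windows SDK headers). The sketch below is only an illustration: the helper and class names are hypothetical, and the constant names in the comments are the SDK values as I understand them.

```python
from typing import NamedTuple


class IoctlFields(NamedTuple):
    device_type: int  # bits 16-31
    access: int       # bits 14-15
    function: int     # bits 2-13
    method: int       # bits 0-1


def decode_ioctl(code: int) -> IoctlFields:
    """Split a Windows IOCTL control code into its CTL_CODE fields."""
    return IoctlFields(
        device_type=(code >> 16) & 0xFFFF,
        access=(code >> 14) & 0x3,
        function=(code >> 2) & 0xFFF,
        method=code & 0x3,
    )


if __name__ == "__main__":
    fields = decode_ioctl(0x2D9404)
    # Expected meaning of each field, assuming the SDK constants:
    #   device_type 0x2d  -> FILE_DEVICE_MASS_STORAGE (IOCTL_STORAGE_BASE)
    #   access      2     -> FILE_WRITE_ACCESS
    #   function    0x501 -> IOCTL_STORAGE_MANAGE_DATA_SET_ATTRIBUTES
    #   method      0     -> METHOD_BUFFERED
    print(f"device_type=0x{fields.device_type:x} access={fields.access} "
          f"function=0x{fields.function:x} method={fields.method}")
```

The storage device type and function 0x501 are consistent with the ioctls.net lookup, so the site and the raw bits agree.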

Google the Russians

The StarTech rep suggested that “the drive does not support this TRIM function.” But which trim function? According to the specs on the Samsung 970 EVO page, the drive itself does support TRIM. But are there specific TRIM functions that are not supported? Let’s search the other data from the message: IrpStatus = 0xc0000472. This took me to a page in Russian that lists that error:

0xC0000472

STATUS_TRIM_READ_ZERO_NOT_SUPPORTED

LogicalBlockProvisioningReadZero Not Supported. The target device does not support read returning zeros from trimmed / unmapped blocks.
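For what it's worth, the IrpStatus value is an NTSTATUS code, and its top bits already say that it is an error rather than a warning. A minimal sketch of the standard NTSTATUS bit layout follows; the function name and the dictionary are illustrative and not part of any Windows API.

```python
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(status: int) -> dict:
    """Split a 32-bit NTSTATUS value into its documented bit fields."""
    return {
        "severity": SEVERITY_NAMES[(status >> 30) & 0x3],  # bits 30-31
        "customer": (status >> 29) & 0x1,                   # bit 29
        "facility": (status >> 16) & 0xFFF,                 # bits 16-27
        "code": status & 0xFFFF,                            # bits 0-15
    }


if __name__ == "__main__":
    # 0xC0000472 is the IrpStatus from the event above.
    print(decode_ntstatus(0xC0000472))
```

Severity "error" with facility 0 and code 0x472 matches the Level: Error shown in the event, and the customer bit of 0 marks it as a Microsoft-defined status rather than a vendor-specific one.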

I think I get that. When researching TRIM support, I found an article that used trimcheck to determine whether a device supports TRIM. Based on the screenshot in the article (I haven’t tried this), trimcheck writes data, trims it, then tries to confirm that that area of the drive now contains zeroed bytes (0x00). The trimcheck for the 970 SSD passed, but I’m guessing that Windows is issuing a different command that is not able to read the 0x00 bytes.

What to Do?

If a command always fails because the drive you are running it against does not support the command, the obvious thing would be to stop running the command. But how? According to ghacks.net, StorDiag is new in Windows 10 (and thus Server 2016). tenforums also covers it. But both only talk about running stordiag.exe manually for one-off analysis. How/where is StorDiag running automatically? I don’t see it in Task Scheduler (and honestly, 20+ invocations per minute would surprise me if it came from Task Scheduler).

I could probably disable the StorDiag errors entirely using the PowerShell command Stop-StorageDiagnosticLog (another thanks to the StarTech rep). But I prefer to leave error logs enabled so I get notified of, you know, errors.

Any other suggestions?

Update April 12, 2019

I realized that the repeated logging itself may be “using up” the SSD hosting my OS (which is not the Samsung 970) just by writing to the log so much. So I decided to go ahead and disable the log, even though that means I won’t see other errors. (I probably wouldn’t see them anyway as they would be lost in the flood of bogus errors.)

After reviewing the Stop-StorageDiagnosticLog PowerShell command, it looks like that is intended for interactive use (with the corresponding Start- and Get- commands). So I just disabled the event log in the GUI by right-clicking, selecting Properties, then unchecking Enable logging:

[Screenshot: StorDiag]
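The same setting can probably be scripted instead of clicked. The sketch below is an assumption and not what was done in this post: it builds a `wevtutil sl <log> /e:false` command (wevtutil is the built-in Windows event log utility) and only runs it when explicitly asked to on Windows. The function name and the `--apply` switch are hypothetical.

```python
import subprocess
import sys

# Log name taken from the event shown at the top of the post.
LOG_NAME = "Microsoft-Windows-Storage-ClassPnP/Operational"


def build_set_enabled_command(log_name: str, enabled: bool) -> list[str]:
    """Build the wevtutil argument list that enables or disables a log."""
    flag = "/e:true" if enabled else "/e:false"
    return ["wevtutil", "sl", log_name, flag]


if __name__ == "__main__":
    cmd = build_set_enabled_command(LOG_NAME, enabled=False)
    print(" ".join(cmd))
    # Only touch the system when asked to, and only on Windows.
    # Changing log settings needs an elevated (administrator) prompt.
    if "--apply" in sys.argv and sys.platform == "win32":
        subprocess.run(cmd, check=True)
```

Run without arguments, it just prints the command so it can be reviewed first. Calling the builder with `enabled=True` produces the command to turn the log back on.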

3 thoughts on “StorDiag Errors on Server 2016 with Samsung 970 EVO”

  1. Ted

    Have a look at the output of the following command; it should explain the last issue from this post:

    logman query providers Microsoft-Windows-StorDiag

  2. Mark Berry Post author

    Ted – That’s an interesting list of values related to the Microsoft-Windows-StorDiag event log, but I’m not clear on what it explains?

  3. Mike

    Thank you very much indeed!!

    I got thousands and thousands of EventID 521 errors – a few times per second…
    “The miniport logged an event” (AhciPortErrorRecovery, PortNumber 0, bus 0, target 0, LUN 0) which has to do with the SSD. All has been working fine for years and SMART does not report any errors.
    My guess(!) is that Windows expects the SSD to be DISK 0, whereas in my case it is DISK 1 (in Disk Management).

    Anyway, I disabled logging in Microsoft-Windows-Storage-Storport/Operational (like above screenshot) and finally… got rid of these ‘Errors’ in the eventlog.

    Again, thanks!
