fix pci devices duplicates#82
GregWhiteyBialas wants to merge 2 commits into jenningsloy318:master from
Co-authored-by: Will Szumski <will@stackhpc.com>
for _, pcieDevice := range pcieDevices {
	_, exists := processed[pcieDevice.ODataID]
	if exists {
		systemLogContext.WithField("operation", "system.PCIeDevices()").Info(fmt.Sprintf("Ignoring duplicate pci device: %s", pcieDevice.ODataID))
Can you see the duplicate devices in the Redfish response?
Yes. Here is a sample of what I got when I made a curl call:
"PCIeDevices": [ { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/129-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/129-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-7" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-4" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-8" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-7" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-2" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-8" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-1" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-3" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/160-8" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/160-7" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/160-7" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/160-3" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/160-8" },
Formatted to make it a little easier to read:
PCIeDevices:
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/129-0
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/129-0
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-7
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-4
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-8
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-7
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-2
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-8
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-1
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-3
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/128-0
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/160-8
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/160-7
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/160-7
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/160-3
- "@odata.id": /redfish/v1/Systems/System.Embedded.1/PCIeDevices/160-8
@stmcginnis Thanks! I should not have been so laconic in the description of this PR, so let me correct that. I will truncate all outputs for readability: Curl of and: As you can see Curl-ing this endpoint gives: So we see that this device has two functions, which can be accessed by curl: I see in the code that the list of PCI devices isn't used to get the PCI device functions, which are pulled from another API endpoint, so this fix shouldn't have the side effect of PCIFunction metrics not being scraped. I am not sure whether the duplicated items come from a bug in this firmware version or from a specific hardware configuration.
I think I've seen this bug in a vendor's firmware implementation. It would be good to raise it with your vendor so they can fix it. Redfish definitely should not be reporting the same device twice. Still, it would be good to make the handling of that condition more robust.
dougszumski
left a comment
If I understand correctly, a single PCI device with multiple functions can, at least on one set of hardware, appear multiple times in the list of PCI devices (with the same identifier). This causes a failure in the current code. This fix works around the issue by skipping any 'duplicate' devices in a backwards-compatible way. I have one minor suggestion, but otherwise I'm in favour of merging.
wg5.Add(len(pcieDevices))
for _, pcieDevice := range pcieDevices {
	_, exists := processed[pcieDevice.ODataID]
	if exists {
nit: In the absence of any unit tests it would be helpful to add the scrape data where we see this issue to the sampleOut folder.
A comment could also be added here, referencing the sample data.
Good idea. I have added the sample data and a comment referencing it.
@jenningsloy318 is it possible to move this forward somehow?
Hi,
On some systems, scraping fails and crashes the exporter because of duplicated PCI devices in the Redfish output.
This patch aims to work around this.