IRQ Balancing on HP ProLiant DL380 G6

Linux Kernel, Network, and Services configuration.
Author
sHk
Posts: 1
Joined: 2017-10-19 13:51

#1 Post by sHk »

Hi everyone,

I have a problem with Linux hardware interrupt request (IRQ) balancing on an HP ProLiant DL380 G6 with an Intel(R) Xeon(R) X5550 @ 2.67GHz. I'm using this host as an LIO iSCSI target, and the missing balancing is a serious performance bottleneck under high network load.

First, some general background information:

I'm using Jessie:
cat /etc/debian_version
8.9

A uname -a reveals: Linux host 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26) x86_64 GNU/Linux


And now regarding the problem:
  • A shortened cat of /proc/interrupts shows:
                CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
     82: 3002619    0    0    0    0    0    0    0    0    0     0     0     0     0     0     0  PCI-MSI-edge hpsa1
     83:  708306    0    0    0    0    0    0    0    0    0     0     0     0     0     0     0  PCI-MSI-edge hpsa1
     98:   34357    0    0    0    0    0    0    0    0    0     0     0     0     0     0     0  PCI-MSI-edge eth0-0
    107:  958957    0    0    0    0    0    0    0    0    0     0     0     0     0     0     0  PCI-MSI-edge eth1-0
    (hpsa is a kernel module for HP hardware RAID controllers.)
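
To put a number on how lopsided the distribution is, the per-CPU counts in /proc/interrupts can be summed. The following is only a minimal sketch: the function name irq_cpu_totals is made up for illustration, and it assumes the usual layout of a header row with one field per CPU followed by numbered IRQ rows.

```shell
#!/bin/sh
# Hypothetical helper: sum interrupt counts per CPU from a
# /proc/interrupts-style file (defaults to /proc/interrupts itself).
irq_cpu_totals() {
    awk '
        NR == 1 { ncpu = NF; next }        # header row: one field per CPU
        $1 ~ /^[0-9]+:$/ {                 # numbered IRQ rows only (skips NMI, LOC, ERR, ...)
            for (i = 1; i <= ncpu; i++) total[i] += $(i + 1)
        }
        END {
            for (i = 1; i <= ncpu; i++) printf "CPU%d %d\n", i - 1, total[i]
        }
    ' "${1:-/proc/interrupts}"
}
```

Calling irq_cpu_totals without arguments reads the live /proc/interrupts; on a host behaving like the one described here, every CPU other than CPU0 would be expected to show a total at or near 0.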
The SMP affinity mask is not the problem:
cat /proc/irq/82/smp_affinity_list
0-15
cat /proc/irq/83/smp_affinity_list
0-15
cat /proc/irq/98/smp_affinity_list
0-15
cat /proc/irq/107/smp_affinity_list
0-15
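
For reference, the kernel exposes the same affinity setting in two forms: smp_affinity_list as a CPU list (as shown above) and smp_affinity as a hex bitmask. The sketch below converts one into the other; cpulist_to_mask is a hypothetical helper name, and the sketch assumes CPU numbers small enough for shell arithmetic (well below 63).

```shell
#!/bin/sh
# Hypothetical helper: convert a CPU list such as "0-15" or "0,2,4-5"
# into a hex bitmask with one bit per CPU (bit 0 = CPU0).
cpulist_to_mask() {
    mask=0
    old_ifs=$IFS
    IFS=,                                  # split the list on commas
    for part in $1; do
        first=${part%-*}                   # "4-5" -> 4, "3" -> 3
        last=${part#*-}                    # "4-5" -> 5, "3" -> 3
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            mask=$(( mask | (1 << cpu) ))  # set the bit for this CPU
            cpu=$(( cpu + 1 ))
        done
    done
    IFS=$old_ifs
    printf '%x\n' "$mask"
}
```

cpulist_to_mask 0-15 prints ffff, i.e. all 16 CPUs are allowed, which is consistent with the claim that the affinity mask itself is not what keeps the interrupts on CPU0.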

1. I think the problem is specifically related to the hardware in use. On other hosts with different hardware setups, kernel IRQ balancing works as expected.

2. I know there is a user-space daemon, irqbalance, that compensates for this a little, but it is unsatisfactory to me because it does not balance as well as the integrated kernel balancing.

3. CPU0 is handling every hardware interrupt of every PCI device (not just NICs and disks, although networking and I/O are the most important in this use case). This suggests to me that the problem is not related to a specific PCI peripheral model.
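
Besides irqbalance, one possible workaround is to pin each IRQ to its own CPU by hand through /proc/irq/<N>/smp_affinity_list. This is static pinning, not the kernel-side balancing asked about, and whether this hardware honors it is not established here. The sketch below only prints the commands so they can be reviewed first; irq_spread_plan is a hypothetical name, and the round-robin assignment is just one simple policy.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) one pinning command per IRQ,
# assigning CPUs round-robin.
# Usage: irq_spread_plan NCPUS IRQ...
irq_spread_plan() {
    ncpus=$1
    shift
    i=0
    for irq in "$@"; do
        # Each printed command would need root when actually run.
        printf 'echo %d > /proc/irq/%d/smp_affinity_list\n' "$(( i % ncpus ))" "$irq"
        i=$(( i + 1 ))
    done
}
```

For the four IRQs listed above, irq_spread_plan 16 82 83 98 107 prints commands that would pin IRQ 82 to CPU0, 83 to CPU1, 98 to CPU2, and 107 to CPU3. If irqbalance is running it may overwrite such manual settings.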

Someone posted the same issue here about 5 years ago without being answered:
https://ubuntuforums.org/showthread.php ... a523776dc7

Details about my onboard Ethernet Controller reveal:
  • 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
    Subsystem: Hewlett-Packard Company NC382i Integrated Multi-port PCI Express Gigabit Server Adapter
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at f4000000 (64-bit, non-prefetchable) [size=32M]
    Capabilities: [48] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [50] Vital Product Data
        Product Name: HP NC382i Multifunction Gigabit Server Adapter
        Read-only fields:
            [PN] Part number: N/A
            [EC] Engineering changes: N/A
            [SN] Serial number: 0123456789
            [MN] Manufacture ID: 31 30 33 43
            [RV] Reserved: checksum good, 37 byte(s) reserved
        End
    Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [a0] MSI-X: Enable+ Count=9 Masked-
        Vector table: BAR=0 offset=0000c000
        PBA: BAR=0 offset=0000e000
    Capabilities: [ac] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 4096 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <2us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x2, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Device Serial Number 00-26-55-ff-fe-2b-a0-64
    Capabilities: [110 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

Has anyone ever seen something like this? Might a newer custom kernel solve this?

Thanks and best wishes,

sHk
