Friday 31 October 2014

Kernel 3.16.0 error on skb_warn_bad_offload with bnx2 and bridge

After upgrading to Ubuntu Server 14.10 I've been getting slow performance and constant messages in dmesg/syslog relating to skb_warn_bad_offload.  Possibly as a result of some change in the 3.16.0 kernel that ships with it.

It appears to be related to using bridges on top of an interface using the bnx2 kernel module.

While the error message related to gso (generic segmentation offload) the resolution is to turn off the 'scatter-gather' option for any bridge interface that has an IP address assigned which appears to also turn off various offload options on the interface (tso,gso).  I first tried just turning off tso and gso but this did not resolve the issue.

To see interface options: (I already have sg option off)
guy@cyclopes:~$ sudo ethtool -k br28
Features for br28:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: off
tx-scatter-gather: off
tx-scatter-gather-fraglist: off
tcp-segmentation-offload: off
tx-tcp-segmentation: off [requested on]
tx-tcp-ecn-segmentation: off [requested on]
tx-tcp6-segmentation: off [requested on]
udp-fragmentation-offload: off [requested on]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: on [fixed]
tx-gso-robust: off [requested on]
tx-fcoe-segmentation: off [requested on]
tx-gre-segmentation: on
tx-ipip-segmentation: on
tx-sit-segmentation: on
tx-udp_tnl-segmentation: on
tx-mpls-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

To turn off scatter gather:
sudo ethtool -K br98 sg off

If you have a lot of traffic on a bridge interface the performance of the machine is considerably reduced.  My load average was sitting at 18 - 19.  Probably due to the extra work of generating and logging the error messages.  Once sg was turned off on the bridge interface, load average came down to 0.18!

I have this set permanently in /etc/network/interfaces:
auto br98
iface br98 inet static
 address 10.0.7.136
 netmask 255.255.255.128
 vlan_raw_device bond0
 bridge_ports bond0.98
 pre-up /sbin/ethtool -K br98 sg off

Kernel log for Google:
Oct 31 06:47:22 cyclopes kernel: [28195.789432] WARNING: CPU: 3 PID: 0 at /build/buildd/linux-3.16.0/net/core/dev.c:2246 skb_warn_bad_offload+0xc7/0xe0()
Oct 31 06:47:22 cyclopes kernel: [28195.789435] : caps=(0x0000000004197ba9, 0x000000801fdb78e9) len=2962 data_len=2896 gso_size=1448 gso_type=1 ip_summed=3
Oct 31 06:47:22 cyclopes kernel: [28195.789437] Modules linked in: dm_snapshot dm_bufio dm_zero dm_service_time ses enclosure ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge 8021q garp stp mrp llc dm_round_robin radeon nfsd intel_powerclamp coretemp auth_rpcgss kvm_intel nfs_acl kvm nfs ttm crct10dif_pclmul bonding drm_kms_helper crc32_pclmul gpio_ich drm ghash_clmulni_intel lockd aesni_intel i2c_algo_bit aes_x86_64 sunrpc dm_multipath lp lpc_ich acpi_power_meter i7core_edac hpwdt parport ipmi_si fscache lrw gf128mul edac_core ipmi_msghandler joydev serio_raw glue_helper hpilo ablk_helper scsi_dh shpchp cryptd mac_hid hid_generic uas usbhid be2net bnx2 psmouse pata_acpi usb_storage hid vxlan hpsa
Oct 31 06:47:22 cyclopes kernel: [28195.789491] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W I   3.16.0-24-generic #32-Ubuntu
Oct 31 06:47:22 cyclopes kernel: [28195.789493] Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
Oct 31 06:47:22 cyclopes kernel: [28195.789495]  0000000000000009 ffff880d9fa23290 ffffffff8177fcbc ffff880d9fa232d8
Oct 31 06:47:22 cyclopes kernel: [28195.789498]  ffff880d9fa232c8 ffffffff8106fd8d ffff88045c9c3ce8 ffff880d6ecf6000
Oct 31 06:47:22 cyclopes kernel: [28195.789501]  0000000000000001 0000000000000000 0000000000000081 ffff880d9fa23328
Oct 31 06:47:22 cyclopes kernel: [28195.789505] Call Trace:
Oct 31 06:47:22 cyclopes kernel: [28195.789506]  <IRQ>  [<ffffffff8177fcbc>] dump_stack+0x45/0x56
Oct 31 06:47:22 cyclopes kernel: [28195.789517]  [<ffffffff8106fd8d>] warn_slowpath_common+0x7d/0xa0
Oct 31 06:47:22 cyclopes kernel: [28195.789519]  [<ffffffff8106fdfc>] warn_slowpath_fmt+0x4c/0x50
Oct 31 06:47:22 cyclopes kernel: [28195.789521]  [<ffffffff8139f827>] ? ___ratelimit+0x87/0x100
Oct 31 06:47:22 cyclopes kernel: [28195.789523]  [<ffffffff8167aa77>] skb_warn_bad_offload+0xc7/0xe0
Oct 31 06:47:22 cyclopes kernel: [28195.789526]  [<ffffffff8167d63d>] skb_checksum_help+0x13d/0x170
Oct 31 06:47:22 cyclopes kernel: [28195.789527]  [<ffffffff8168097d>] dev_hard_start_xmit+0x46d/0x5d0
Oct 31 06:47:22 cyclopes kernel: [28195.789530]  [<ffffffff81680e23>] __dev_queue_xmit+0x343/0x4f0
Oct 31 06:47:22 cyclopes kernel: [28195.789532]  [<ffffffff81680fe0>] dev_queue_xmit+0x10/0x20
Oct 31 06:47:22 cyclopes kernel: [28195.789535]  [<ffffffffc025af45>] vlan_dev_hard_start_xmit+0x95/0x130 [8021q]
Oct 31 06:47:22 cyclopes kernel: [28195.789537]  [<ffffffff81680845>] dev_hard_start_xmit+0x335/0x5d0
Oct 31 06:47:22 cyclopes kernel: [28195.789539]  [<ffffffff81680e23>] __dev_queue_xmit+0x343/0x4f0
Oct 31 06:47:22 cyclopes kernel: [28195.789541]  [<ffffffff816b1ee4>] ? nf_hook_slow+0x74/0x130
Oct 31 06:47:22 cyclopes kernel: [28195.789544]  [<ffffffffc0337a50>] ? deliver_clone+0x50/0x50 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789546]  [<ffffffff81680fe0>] dev_queue_xmit+0x10/0x20
Oct 31 06:47:22 cyclopes kernel: [28195.789550]  [<ffffffffc0337ac4>] br_dev_queue_push_xmit+0x74/0xa0 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789553]  [<ffffffffc0337b12>] br_forward_finish+0x22/0x60 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789556]  [<ffffffffc0337ba3>] __br_deliver+0x53/0x100 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789560]  [<ffffffffc0338093>] br_deliver+0x63/0x70 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789563]  [<ffffffffc03358bc>] br_dev_xmit+0x13c/0x270 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789564]  [<ffffffff81680845>] dev_hard_start_xmit+0x335/0x5d0
Oct 31 06:47:22 cyclopes kernel: [28195.789567]  [<ffffffff81680e23>] __dev_queue_xmit+0x343/0x4f0
Oct 31 06:47:22 cyclopes kernel: [28195.789568]  [<ffffffff81680fe0>] dev_queue_xmit+0x10/0x20
Oct 31 06:47:22 cyclopes kernel: [28195.789570]  [<ffffffff816be3e1>] ip_finish_output+0x511/0x850
Oct 31 06:47:22 cyclopes kernel: [28195.789572]  [<ffffffff816bf0a8>] ip_output+0x58/0x90
Oct 31 06:47:22 cyclopes kernel: [28195.789575]  [<ffffffff811788e5>] ? put_page+0x35/0x40
Oct 31 06:47:22 cyclopes kernel: [28195.789576]  [<ffffffff816be800>] ip_local_out_sk+0x30/0x40
Oct 31 06:47:22 cyclopes kernel: [28195.789578]  [<ffffffff816beb83>] ip_queue_xmit+0x143/0x3c0
Oct 31 06:47:22 cyclopes kernel: [28195.789580]  [<ffffffff816d5f34>] tcp_transmit_skb+0x484/0x940
Oct 31 06:47:22 cyclopes kernel: [28195.789582]  [<ffffffff816d65bc>] tcp_write_xmit+0x1cc/0xd60
Oct 31 06:47:22 cyclopes kernel: [28195.789584]  [<ffffffff816d73ce>] __tcp_push_pending_frames+0x2e/0xd0
Oct 31 06:47:22 cyclopes kernel: [28195.789586]  [<ffffffff816d291a>] tcp_rcv_established+0x11a/0x6e0
Oct 31 06:47:22 cyclopes kernel: [28195.789588]  [<ffffffff816dd465>] tcp_v4_do_rcv+0x1b5/0x4e0
Oct 31 06:47:22 cyclopes kernel: [28195.789590]  [<ffffffff81680d24>] ? __dev_queue_xmit+0x244/0x4f0
Oct 31 06:47:22 cyclopes kernel: [28195.789591]  [<ffffffff816df08e>] tcp_v4_rcv+0x6ce/0x7b0
Oct 31 06:47:22 cyclopes kernel: [28195.789596]  [<ffffffffc0236922>] ? bond_dev_queue_xmit+0x32/0x80 [bonding]
Oct 31 06:47:22 cyclopes kernel: [28195.789598]  [<ffffffff816b895a>] ip_local_deliver_finish+0xaa/0x220
Oct 31 06:47:22 cyclopes kernel: [28195.789600]  [<ffffffff816b8c68>] ip_local_deliver+0x48/0x80
Oct 31 06:47:22 cyclopes kernel: [28195.789602]  [<ffffffff816b85e4>] ip_rcv_finish+0x84/0x350
Oct 31 06:47:22 cyclopes kernel: [28195.789604]  [<ffffffff816b8f1f>] ip_rcv+0x27f/0x380
Oct 31 06:47:22 cyclopes kernel: [28195.789606]  [<ffffffff8167eb92>] __netif_receive_skb_core+0x572/0x820
Oct 31 06:47:22 cyclopes kernel: [28195.789608]  [<ffffffff816b1e5a>] ? nf_iterate+0x9a/0xb0
Oct 31 06:47:22 cyclopes kernel: [28195.789610]  [<ffffffff8167ee56>] __netif_receive_skb+0x16/0x70
Oct 31 06:47:22 cyclopes kernel: [28195.789611]  [<ffffffff8167eed3>] netif_receive_skb_internal+0x23/0x90
Oct 31 06:47:22 cyclopes kernel: [28195.789613]  [<ffffffff8167ef5c>] netif_receive_skb+0x1c/0x70
Oct 31 06:47:22 cyclopes kernel: [28195.789617]  [<ffffffffc0339026>] br_handle_frame_finish+0x1b6/0x3f0 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789621]  [<ffffffffc03400cb>] br_nf_pre_routing_finish+0x12b/0x3c0 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789625]  [<ffffffffc03404e7>] br_nf_pre_routing+0x187/0x740 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789629]  [<ffffffffc0338e70>] ? br_handle_local_finish+0x90/0x90 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789631]  [<ffffffff816b1e5a>] nf_iterate+0x9a/0xb0
Oct 31 06:47:22 cyclopes kernel: [28195.789634]  [<ffffffffc0338e70>] ? br_handle_local_finish+0x90/0x90 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789636]  [<ffffffff816b1ee4>] nf_hook_slow+0x74/0x130
Oct 31 06:47:22 cyclopes kernel: [28195.789640]  [<ffffffffc0338e70>] ? br_handle_local_finish+0x90/0x90 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789643]  [<ffffffffc0339408>] br_handle_frame+0x1a8/0x260 [bridge]
Oct 31 06:47:22 cyclopes kernel: [28195.789645]  [<ffffffff8167e8b2>] __netif_receive_skb_core+0x292/0x820
Oct 31 06:47:22 cyclopes kernel: [28195.789647]  [<ffffffff8101c2f9>] ? read_tsc+0x9/0x20
Oct 31 06:47:22 cyclopes kernel: [28195.789649]  [<ffffffff8167ee56>] __netif_receive_skb+0x16/0x70
Oct 31 06:47:22 cyclopes kernel: [28195.789651]  [<ffffffff8167eed3>] netif_receive_skb_internal+0x23/0x90
Oct 31 06:47:22 cyclopes kernel: [28195.789652]  [<ffffffff8167f950>] napi_gro_receive+0xc0/0xf0
Oct 31 06:47:22 cyclopes kernel: [28195.789657]  [<ffffffffc0090b09>] bnx2_rx_int+0x3c9/0x1050 [bnx2]
Oct 31 06:47:22 cyclopes kernel: [28195.789661]  [<ffffffff81366215>] ? blk_run_queue+0x35/0x40
Oct 31 06:47:22 cyclopes kernel: [28195.789665]  [<ffffffffc0091ae0>] bnx2_poll_work+0x350/0x450 [bnx2]
Oct 31 06:47:22 cyclopes kernel: [28195.789669]  [<ffffffffc0091c14>] bnx2_poll_msix+0x34/0xc0 [bnx2]
Oct 31 06:47:22 cyclopes kernel: [28195.789671]  [<ffffffff8167f2a2>] net_rx_action+0x142/0x250
Oct 31 06:47:22 cyclopes kernel: [28195.789673]  [<ffffffff810755f4>] __do_softirq+0x124/0x2e0
Oct 31 06:47:22 cyclopes kernel: [28195.789675]  [<ffffffff810759ad>] irq_exit+0xfd/0x110
Oct 31 06:47:22 cyclopes kernel: [28195.789676]  [<ffffffff8178ab06>] do_IRQ+0x56/0xe0
Oct 31 06:47:22 cyclopes kernel: [28195.789679]  [<ffffffff8178896d>] common_interrupt+0x6d/0x6d
Oct 31 06:47:22 cyclopes kernel: [28195.789679]  <EOI>  [<ffffffff81619159>] ? cpuidle_enter_state+0x49/0xc0
Oct 31 06:47:22 cyclopes kernel: [28195.789684]  [<ffffffff816192b7>] cpuidle_enter+0x17/0x20
Oct 31 06:47:22 cyclopes kernel: [28195.789686]  [<ffffffff810b9b77>] cpu_startup_entry+0x347/0x480
Oct 31 06:47:22 cyclopes kernel: [28195.789688]  [<ffffffff81045ac0>] start_secondary+0x230/0x2c0
Oct 31 06:47:22 cyclopes kernel: [28195.789689] ---[ end trace 83f6d496335214a4 ]---

No comments:

Post a Comment