Just a heads-up in case someone has this problem with their #Realtek RTL8168 NIC on #Linux!

Last December I discovered that one of the two NICs in a router / firewall PC would sporadically trigger the NETDEV Watchdog and then become soft-locked / unusable until a system reboot (for details see the linked toot below).

Analysis back then didn't give any conclusive leads, but it got me to switch from the in-tree #r8169 driver to #r8168-dkms. That didn't entirely fix the issue, but at least the NIC now "only" lost carrier sporadically every few hours for 1-2 seconds and then went back to working.

I found a #solution the other day:

  • use #r8168-dkms drivers (r8169 can't disable EEE?)

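  • add "options r8168 eee_enabled=0" to a .conf file under /etc/modprobe.d/ (e.g. /etc/modprobe.d/r8168.conf; modprobe ignores files without the .conf suffix) or put r8168.eee_enabled=0 on your kernel command line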

  • if needed, rebuild your initramfs
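
The steps above could be sketched roughly like this in Python. It is only an illustration: the helper names are made up, the parameter spelling eee_enabled is copied from this post (worth confirming against the output of `modinfo -p r8168`), and the sysfs read only returns something for parameters the driver exports as readable.

```python
from pathlib import Path

MODULE = "r8168"
# Spelling taken from the post above; treat it as an assumption and
# confirm the real parameter name with `modinfo -p r8168`.
PARAM = "eee_enabled"


def modprobe_line(module=MODULE, param=PARAM, value=0):
    """Line to place in a file such as /etc/modprobe.d/r8168.conf."""
    return f"options {module} {param}={value}"


def cmdline_arg(module=MODULE, param=PARAM, value=0):
    """Equivalent argument for the kernel command line (module.param=value)."""
    return f"{module}.{param}={value}"


def loaded_value(module=MODULE, param=PARAM, sysfs=Path("/sys/module")):
    """Return the value reported by the loaded module, or None if the
    module is not loaded or does not expose the parameter in sysfs."""
    path = Path(sysfs) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print(modprobe_line())
    print(cmdline_arg())
    print("value reported by loaded module:", loaded_value())
```

If the last line prints None, that does not necessarily mean the option was ignored; the driver may simply not export the parameter under /sys/module.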

I don't know why, but disabling Energy Efficient Ethernet (EEE) resolves the random carrier loss.
https://indiepocalypse.social/@heals/113724312920257181

Heals :heart_nb: (@heals@indiepocalypse.social)

Update: even running a constant ping on the affected NIC doesn't prevent the sporadic errors every 2-5 hours. If anything, I almost want to say it causes more.

The only good news for now is that with r8168 the interface just goes link down/up within a few seconds and everything keeps running normally, whereas r8169 left the NIC stuck in an unusable "up" state, constantly logging errors.

Seems I'll have to take the house offline sometime tonight and check the BIOS settings.

#Linux #r8169 #r8168

Hey #Linux friends out there - I could use some opinions / input on something I’ve been brooding over for a few days!

I have a small Intel N100 based server running various services / automations at my parent’s house. The box has a double-NIC running as a transparent bridge with some filtering and other network management applied.

Both NICs are identical Realtek on-board chips (10ec:8168 / sub: 10ec:0123) normally running on the in-tree #r8169 driver on kernel 6.12.6:

r8169 0000:01:00.0 eth0: RTL8168h/8111h, XX:XX:XX:XX:XX:XX, XID 541, IRQ 142
r8169 0000:01:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]

r8169 0000:03:00.0 eth1: jumbo features [frames: 9194 bytes, tx checksumming: ko]

One of them (eth1 / enp3s0) is regularly tossing me these errors:

r8169 0000:03:00.0 enp3s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).
r8169 0000:03:00.0 enp3s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
r8169 0000:03:00.0 enp3s0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 5317 ms
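
To get a rough idea of how often this happens, the watchdog lines can be pulled out of the kernel log with a small script. This is a minimal sketch, assuming the message format matches the lines above exactly (other kernel versions may word it differently); the function name is hypothetical.

```python
import re

# Matches kernel log lines shaped like the one quoted above:
#   r8169 0000:03:00.0 enp3s0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 5317 ms
WATCHDOG = re.compile(
    r"(?P<iface>[^\s:]+): NETDEV WATCHDOG: CPU: (?P<cpu>\d+): "
    r"transmit queue (?P<queue>\d+) timed out (?P<ms>\d+) ms"
)


def find_watchdog_events(lines):
    """Return one dict per watchdog timeout found in the given log lines."""
    events = []
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            events.append({
                "iface": match["iface"],
                "queue": int(match["queue"]),
                "ms": int(match["ms"]),
            })
    return events


if __name__ == "__main__":
    sample = [
        "r8169 0000:03:00.0 enp3s0: rtl_txcfg_empty_cond == 0 (loop: 42, delay: 100).",
        "r8169 0000:03:00.0 enp3s0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 5317 ms",
    ]
    for event in find_watchdog_events(sample):
        print(event)
```

Feeding it the lines from `journalctl -k` or `dmesg` instead of the sample list should work the same way.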

As far as I can tell, when this happens half of the bridge silently stops working. I can reach the PC from “my side”, which is connected to a router on eth0 / enp1s0, but devices on “the other side” are unreachable until I reboot.

Searching online wasn't very helpful, as the main solution other users with this issue get is “replace the NIC with something not Realtek!” - yeah, no, I can’t.

There are also bug reports on kernel.org (https://bugzilla.kernel.org/show_bug.cgi?id=209839) going as far back as 2020, but no clear solutions.

I turned off ASPM on the system and eventually switched to the r8168-DKMS drivers. On #r8168 the link will go down for 1-3 seconds but then fully recover. Not a great solution but a workaround I can live with for now.

Anyone got any ideas / similar experiences that could help shed some light on the problem?

209839 – r8169 (RTL8125B): "rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100)" and connectivity loss, caused by small fragmented datagrams

By chance I discovered the solution to the constant disconnections and performance drops of my Ethernet card. It turns out the card is a "Realtek RTL8111/8168/8411 PCIE Gigabit Ethernet" and I'm using it on ArchLinux, but the r8169 module was being loaded, which is similar but not the specific one for this card. The problem is solved just by installing the "r8168" package and rebooting ..... #archlinux #realtek #r8168