acrn-kernel/Documentation
David Ahern a6db4494d2 net: ipv4: Consider failed nexthops in multipath routes
Multipath route lookups should consider knowledge about next hops and not
select a hop that is known to be failed.

Example:

                     [h2]                   [h3]   15.0.0.5
                      |                      |
                     3|                     3|
                    [SP1]                  [SP2]--+
                     1  2                   1     2
                     |  |     /-------------+     |
                     |   \   /                    |
                     |     X                      |
                     |    / \                     |
                     |   /   \---------------\    |
                     1  2                     1   2
         12.0.0.2  [TOR1] 3-----------------3 [TOR2] 12.0.0.3
                     4                         4
                      \                       /
                        \                    /
                         \                  /
                          -------|   |-----/
                                 1   2
                                [TOR3]
                                  3|
                                   |
                                  [h1]  12.0.0.1

host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:

    root@h1:~# ip ro ls
    ...
    12.0.0.0/24 dev swp1  proto kernel  scope link  src 12.0.0.1
    15.0.0.0/16
            nexthop via 12.0.0.2  dev swp1 weight 1
            nexthop via 12.0.0.3  dev swp1 weight 1
    ...

If the link between tor3 and tor1 is down and the link between tor1
and tor2 then tor1 is effectively cut-off from h1. Yet the route lookups
in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and
ssh 15.0.0.5 gets the other. Connections that attempt to use the
12.0.0.2 nexthop fail since that neighbor is not reachable:

    root@h1:~# ip neigh show
    ...
    12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
    12.0.0.2 dev swp1  FAILED
    ...

The failed path can be avoided by considering known neighbor information
when selecting next hops. If the neighbor lookup fails we have no
knowledge about the nexthop, so give it a shot. If there is an entry
then only select the nexthop if the state is sane. This is similar to
what fib_detect_death does.

To maintain backward compatibility use of the neighbor information is
based on a new sysctl, fib_multipath_use_neigh.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11 15:16:13 -04:00
..
ABI Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux 2016-03-26 12:59:04 -07:00
DocBook Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux 2016-03-21 13:48:00 -07:00
EDID
PCI
RCU
accounting
acpi
aoe
arm ARM: SoC 64-bit changes for v4.6 2016-03-20 15:08:45 -07:00
arm64
auxdisplay
backlight
blackfin
block
blockdev cpqarray: remove it from the kernel 2016-03-14 09:06:01 -06:00
bus-devices
cdrom
cgroup-v1 Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-03-18 20:25:49 -07:00
cma
connector
console
cpu-freq
cpuidle
cris
crypto
development-process
device-mapper
devicetree wireless-drivers patches for 4.7 2016-04-11 11:58:12 -04:00
dmaengine
driver-model Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2016-03-19 16:31:54 -07:00
dvb
early-userspace
extcon
fault-injection
fb
features arm64 updates for 4.6: 2016-03-17 20:03:47 -07:00
filesystems mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage 2016-04-04 10:41:08 -07:00
firmware_class
fmc
fpga
frv
gpio
hid
hwmon
i2c
ia64
ide
iio
infiniband
input
ioctl
isdn
ja_JP
kbuild
kdump
ko_KR
laptops
leds
locking
m68k
memory-devices
metag
mic Char/Misc patches for 4.6-rc1 2016-03-17 13:47:50 -07:00
mips
misc-devices
mmc
mn10300
mtd
namespaces
netlabel
networking net: ipv4: Consider failed nexthops in multipath routes 2016-04-11 15:16:13 -04:00
nfc
nios2
nvdimm
nvmem
parisc
pcmcia
phy
platform
power PM / runtime: Document steps for device removal 2016-04-05 03:46:59 +02:00
powerpc
pps
prctl
pti
ptp Another relatively boring cycle for the docs tree: typo fixes, translation 2016-03-17 12:09:35 -07:00
rapidio rapidio: add mport char device driver 2016-03-22 15:36:02 -07:00
s390
scheduler
scsi
security
serial
sh
sound
spi
sysctl Merge branch 'akpm' (patches from Andrew) 2016-03-18 19:26:54 -07:00
target
thermal
timers Another relatively boring cycle for the docs tree: typo fixes, translation 2016-03-17 12:09:35 -07:00
tpm
trace
usb
vDSO
video4linux
virtual One of the largest releases for KVM... Hardly any generic improvement, 2016-03-16 09:55:35 -07:00
vm mm: thp: set THP defrag by default to madvise and add a stall-free defrag option 2016-03-17 15:09:34 -07:00
w1
watchdog Merge git://www.linux-watchdog.org/linux-watchdog 2016-03-19 19:35:51 -07:00
wimax
x86 x86/Documentation: Start documenting x86 topology 2016-03-29 10:45:04 +02:00
xtensa
zh_CN
00-INDEX
BUG-HUNTING
Changes
CodeOfConflict
CodingStyle
DMA-API-HOWTO.txt
DMA-API.txt
DMA-ISA-LPC.txt
DMA-attributes.txt
HOWTO
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
Intel-IOMMU.txt
Makefile
ManagementStyle
SAK.txt
SM501.txt
SecurityBugs
SubmitChecklist
SubmittingDrivers
SubmittingPatches
VGA-softcursor.txt
adding-syscalls.txt
applying-patches.txt
assoc_array.txt
atomic_ops.txt
bad_memory.txt
basic_profiling.txt
bcache.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
bus-virt-phys-mapping.txt
cachetlb.txt
cgroup-v2.txt Merge branch 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-03-21 10:05:13 -07:00
circular-buffers.txt
clk.txt
coccinelle.txt
cpu-hotplug.txt
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
devices.txt
digsig.txt
dma-buf-sharing.txt dma-buf: Update docs for SYNC ioctl 2016-03-21 09:26:45 +01:00
dontdiff
dynamic-debug-howto.txt
edac.txt
efi-stub.txt
eisa.txt
email-clients.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcov.txt
gdb-kernel-debugging.txt
highuid.txt
hsi.txt
hw_random.txt
hwspinlock.txt
init.txt
initrd.txt
intel_txt.txt
io-mapping.txt
io_ordering.txt
iostats.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kasan.txt mm, kasan: SLAB support 2016-03-25 16:37:42 -07:00
kcov.txt kernel: add kcov code coverage 2016-03-22 15:36:02 -07:00
kernel-doc-nano-HOWTO.txt
kernel-docs.txt
kernel-parameters.txt Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-20 19:08:56 -07:00
kernel-per-CPU-kthreads.txt
kmemcheck.txt
kmemleak.txt
kobject.txt
kprobes.txt
kref.txt
kselftest.txt
ldm.txt
local_ops.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lzo.txt
magic-number.txt
mailbox.txt
md-cluster.txt
md.txt
memory-barriers.txt documentation: Clarify compiler store-fusion example 2016-03-14 15:52:19 -07:00
memory-hotplug.txt memory-hotplug: add automatic onlining policy for the newly added memory 2016-03-15 16:55:16 -07:00
men-chameleon-bus.txt
module-signing.txt
mono.txt
nommu-mmap.txt
ntb.txt
numastat.txt
oops-tracing.txt
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pinctrl.txt
pnp.txt
preempt-locking.txt
printk-formats.txt mm, printk: introduce new format string for flags 2016-03-15 16:55:16 -07:00
pwm.txt
ramoops.txt
rbtree.txt
remoteproc.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt rtc: implement a sysfs interface for clock offset 2016-03-14 17:08:16 +01:00
serial-console.txt
sgi-ioc4.txt
smsc_ece1099.txt
sparse.txt
stable_api_nonsense.txt
stable_kernel_rules.txt
static-keys.txt
svga.txt
sysfs-rules.txt
sysrq.txt
this_cpu_ops.txt
ubsan.txt
unaligned-memory-access.txt
unicode.txt
unshare.txt
vfio.txt
vgaarbiter.txt
video-output.txt
vme_api.txt
volatile-considered-harmful.txt
workqueue.txt
xillybus.txt
xz.txt
zorro.txt