Age | Commit message (Collapse) | Author | Files | Lines |
|
The define makes it clearer that there is an important relationship
between the timeout for the async operation, and the wrapup time when
NetworkManager is quitting. Well, not for the time being. But in the future,
when we rework the quitting of NetworkManager.
|
|
NM_SHUTDOWN_TIMEOUT_MAX_MSEC is the maximum timeout for how long any
async operation may take. The idea is that during shutdown of NetworkManager
we give that much time to tear down. Then async operations may either implement
cancellation or not bother with that. But in any case, they must complete within
NM_SHUTDOWN_TIMEOUT_MAX_MSEC.
Actually, for the time being, this has no effect at all. I am talking about the
future here. See "Improve Shutdown of NetworkManager" in TODO. This patch
is preparation for that effort.
Anyway. Stopping pppd can take a longer time (5 seconds). That is
currently the (known) longest time how long any of our async operations
is allowed to take.
As all async operations must complete before NM_SHUTDOWN_TIMEOUT_MAX_MSEC,
and we want to wait at least 5 seconds for pppd, we need to increase the
wait time NM_SHUTDOWN_TIMEOUT_MAX_MSEC.
Also add and use NM_SHUTDOWN_TIMEOUT_5000_MSEC, which serves a similar
purpose as NM_SHUTDOWN_TIMEOUT_1500_MSEC.
|
|
At some places we scheduled a timeout in NM_SHUTDOWN_TIMEOUT_MAX_MSEC.
There, we want to make sure that we don't take longer than
NM_SHUTDOWN_TIMEOUT_MAX_MSEC. But this leaves the actual wait time
unspecified.
Those callers don't want to wait an undefined time. They really should
be clear about how long they wait. Hence, use NM_SHUTDOWN_TIMEOUT_1500_MSEC
which makes it clear this is 1500 msec but also chosen to be not longer than
NM_SHUTDOWN_TIMEOUT_MAX_MSEC.
|
|
When you have an async operation, you must make sure that
it is cancellable or completes in at most NM_SHUTDOWN_TIMEOUT_MAX_MSEC.
But NM_SHUTDOWN_TIMEOUT_MAX_MSEC leaves it undefined how long it is.
If you really want to wait for 1500msec, but also need to ensure
to stay within NM_SHUTDOWN_TIMEOUT_MAX_MSEC, then use
NM_SHUTDOWN_TIMEOUT_1500_MSEC. This has the semantic of guaranteeing
both.
|
|
The abbreviations "ms", "us", "ns" don't look good.
Spell out to "msec", "usec", "nsec" as done at other places.
Also, rename NM_SHUTDOWN_TIMEOUT_MS_WATCHDOG to
NM_SHUTDOWN_TIMEOUT_ADDITIONAL_MSEC.
Also, rename NM_SHUTDOWN_TIMEOUT_MS to NM_SHUTDOWN_TIMEOUT_MAX_MSEC.
There are different timeouts, and this is the maximum gracetime we
will give during shutdown to complete async operations.
Naming is hard, but I think these are better names.
|
|
The DPDK port will not have a link after the devbind which is needed for
configuring an interface to be a DPDK port. The MTU is being committed
during the link change but for DPDK ports there is no link.
The DPDK port MTU should be set on ovsdb right after the interface is
added to ovsdb. This way the users will be able to set MTU for DPDK
ports and modify it.
Please see the following results:
```
port 2: iface0 (dpdk: configured_rx_queues=1, configured_rxq_descriptors=2048, configured_tx_queues=3,
configured_txq_descriptors=2048, lsc_interrupt_mode=false, mtu=2000, requested_rx_queues=1,
requested_rxq_descriptors=2048, requested_tx_queues=3, requested_txq_descriptors=2048, rx_csum_offload=true, tx_tso_offload=false)
```
|
|
|
|
|
|
"examples/python/gi/checkpoint.py" is not only an example. It's also a
useful script for testing checkpoints.
Support a "--last" argument to specify the last checkpoint created.
Otherwise, when you are using this example from a test script, it
can be cumbersome to find the right checkpoint point.
Also, rename "client" and "nm_client" variables to "nmc". The purpose of
using the same variable name for the same thing is readability, but also
it works better when copy+paste snippets into the Python REPL.
|
|
_obj_states_externally_removed_track()
We can get a platform signal for any number of reasons. In particular,
we can get a signal that the object is present in platform, while the object
is tracked as zombie.
"Zombies" are objects that were actively configured by NetworkManager, but
now no longer and thus will need to be removed. We remember them as objects
that we need to delete.
The assertion was wrong. We don't need to handle the case "in_platform"
and linked in "os_zombie_lst" specially. If we get a signal that the
object exists while being a zombie, that is fine and not something to
handle specially.
Backtrace:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f6a208f1db5 in __GI_abort () at abort.c:79
#2 0x00007f6a212ed123 in g_assertion_message (domain=<optimized out>, file=<optimized out>, line=<optimized out>,
func=0x560e23ada2c0 <__func__.39909> "_obj_states_externally_removed_track", message=<optimized out>) at gtestutils.c:2533
#3 0x00007f6a2134620e in g_assertion_message_expr (domain=domain@entry=0x560e23b781a0 "nm", file=file@entry=0x560e23acec60 "src/core/nm-l3cfg.c", line=line@entry=920,
func=func@entry=0x560e23ada2c0 <__func__.39909> "_obj_states_externally_removed_track", expr=expr@entry=0x560e23ad1980 "c_list_is_empty(&obj_state->os_zombie_lst)") at gtestutils.c:2556
#4 0x0000560e23853f38 in _obj_states_externally_removed_track (self=self@entry=0x560e25f168e0, obj=<optimized out>, obj@entry=0x560e25e466a0, in_platform=in_platform@entry=1)
at src/core/nm-l3cfg.c:920
#5 0x0000560e2385b8ea in _nm_l3cfg_notify_platform_change (self=0x560e25f168e0, change_type=change_type@entry=NM_PLATFORM_SIGNAL_CHANGED, obj=0x560e25e466a0) at src/core/nm-l3cfg.c:1364
#6 0x0000560e23861251 in _platform_signal_cb (platform=<optimized out>, obj_type_i=<optimized out>, ifindex=<optimized out>, platform_object=0x560e25e466b8, change_type_i=2,
p_self=<optimized out>) at ./src/libnm-platform/nmp-object.h:443
#7 0x00007f6a1c4a914e in ffi_call_unix64 () at ../src/x86/unix64.S:76
#8 0x00007f6a1c4a8aff in ffi_call (cif=cif@entry=0x7fffac40e570, fn=fn@entry=0x560e23861100 <_platform_signal_cb>, rvalue=<optimized out>, avalue=avalue@entry=0x7fffac40e480)
at ../src/x86/ffi64.c:525
#9 0x00007f6a217fee85 in g_cclosure_marshal_generic (closure=<optimized out>, return_gvalue=<optimized out>, n_param_values=<optimized out>, param_values=<optimized out>,
invocation_hint=<optimized out>, marshal_data=<optimized out>) at gclosure.c:1490
#10 0x00007f6a217fe3bd in g_closure_invoke (closure=0x560e25df53c0, return_value=0x0, n_param_values=5, param_values=0x7fffac40e7a0, invocation_hint=0x7fffac40e720) at gclosure.c:804
#11 0x00007f6a21811945 in signal_emit_unlocked_R (node=node@entry=0x7f6a00008870, detail=detail@entry=0, instance=instance@entry=0x560e25ddd080, emission_return=emission_return@entry=0x0,
instance_and_params=instance_and_params@entry=0x7fffac40e7a0) at gsignal.c:3636
#12 0x00007f6a2181aa56 in g_signal_emit_valist (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>, var_args=var_args@entry=0x7fffac40e9c0) at gsignal.c:3392
#13 0x00007f6a2181b093 in g_signal_emit (instance=instance@entry=0x560e25ddd080, signal_id=<optimized out>, detail=detail@entry=0) at gsignal.c:3448
#14 0x0000560e2392deea in nm_platform_cache_update_emit_signal (self=0x560e25ddd080, cache_op=NMP_CACHE_OPS_UPDATED, obj_old=<optimized out>, obj_new=<optimized out>)
at src/libnm-platform/nm-platform.c:8824
#15 0x0000560e238fd520 in event_handler_recvmsgs () at src/libnm-platform/nm-linux-platform.c:7183
#16 0x0000560e238fdcbf in event_handler_read_netlink () at src/libnm-platform/nm-linux-platform.c:9403
#17 0x0000560e238ffab3 in delayed_action_handle_one () at src/libnm-platform/nm-linux-platform.c:6238
#18 0x0000560e238ffcae in delayed_action_handle_all () at src/libnm-platform/nm-linux-platform.c:6256
#19 0x0000560e23901acc in do_delete_object () at src/libnm-platform/nm-linux-platform.c:7392
#20 0x0000560e2390227c in ip4_address_delete () at src/libnm-platform/nm-linux-platform.c:8782
#21 0x0000560e23922709 in nm_platform_ip4_address_delete (self=self@entry=0x560e25ddd080, ifindex=ifindex@entry=150, address=16843009, plen=<optimized out>, peer_address=16843009)
at src/libnm-platform/nm-platform.c:3574
#22 0x0000560e239275ab in nm_platform_ip_address_sync (self=0x560e25ddd080, addr_family=addr_family@entry=2, ifindex=150, known_addresses=<optimized out>, known_addresses@entry=0x0,
addresses_prune=0x560e25e81aa0) at src/libnm-platform/nm-platform.c:3984
#23 0x0000560e23855e17 in _l3_commit_one (self=0x560e25f168e0, addr_family=2, commit_type=<optimized out>, l3cd_old=<optimized out>, changed_combined_l3cd=<optimized out>)
at src/core/nm-l3cfg.c:4256
#24 0x0000560e2385fc5c in _l3_commit (self=0x560e25f168e0, commit_type=NM_L3_CFG_COMMIT_TYPE_REAPPLY, is_idle=<optimized out>) at src/core/nm-l3cfg.c:4353
#25 0x0000560e239c6a6d in nm_device_cleanup (self=0x560e25e985e0, reason=<optimized out>, cleanup_type=CLEANUP_TYPE_DECONFIGURE) at src/core/devices/nm-device.c:15082
#26 0x0000560e239c7522 in _set_state_full (self=0x560e25e985e0, state=<optimized out>, reason=<optimized out>, quitting=0) at src/core/devices/nm-device.c:15467
#27 0x0000560e239cd482 in queued_state_set (user_data=user_data@entry=0x560e25e985e0) at src/core/devices/nm-device.c:15706
#28 0x00007f6a2131b27b in g_idle_dispatch (source=0x560e25ebab60, callback=0x560e239cd3d0 <queued_state_set>, user_data=0x560e25e985e0) at gmain.c:5579
#29 0x00007f6a2131e95d in g_main_dispatch (context=0x560e25d97bc0) at gmain.c:3193
#30 g_main_context_dispatch (context=context@entry=0x560e25d97bc0) at gmain.c:3873
#31 0x00007f6a2131ed18 in g_main_context_iterate (context=0x560e25d97bc0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3946
#32 0x00007f6a2131f042 in g_main_loop_run (loop=0x560e25d730f0) at gmain.c:4142
#33 0x0000560e237c06ec in main (argc=<optimized out>, argv=<optimized out>) at src/core/main.c:509
Fixes: 929eae245d5d ('l3cfg: implement NM_L3CFG_CONFIG_FLAGS_ASSUME_CONFIG_ONCE and rework object state')
|
|
|
|
This is intended to unbreak LXD by default.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1112
|
|
Since commit ecc73eb239e6 ('ovs-port: always remove the OVSDB entry on
slave release'), ovs port were removing the ovsdb entry upon being
un-enslaved, no matter what the reason for un-enslavement was. The idea
was to remove the stale ovsdb entry upon forcible device removal.
This cleanup is specific to OpenVSwitch, since for other device types,
the device master is the property of the slave and thus goes away along
with the device.
Turns out we're now removing the ovsdb entry even when the device
actually doesn't go away, but we're pretending it does because the
daemon is shutting down.
To add insult to injury, we generally end up removing one entry,
because the other ovsdb calls end up in a queue and don't get serviced
before the daemon shuts down. The result is a mess. (This patch
doesn't solve that -- if someone terminates the daemon with in-flight
ovsdb calls they're still out of luck).
Let's do the cleanup now only if the device was actually physically
removed.
Fixes-test: @NM_reboot_openvswitch_vlan_configuration
Fixes: ecc73eb239e6 ('ovs-port: always remove the OVSDB entry on slave release')
https://bugzilla.redhat.com/show_bug.cgi?id=2055665
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1117
|
|
Once the shutdown logic is in place, we don't want to shut down until
the OVSDB calls are serviced.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1118
|
|
There's a standard %_unitdir macro for this purpose, shipped with
systemd-rpm-macros.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1098
|
|
It's not used anywhere.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1098
|
|
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1110
|
|
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1107
|
|
This also causes leaks with recent glib, which can be found via valgrind.
Fixes: c7b386250369 ('platform: add network namespace support to platform')
|
|
gcc-12.0.1-0.8.fc36 is annoying with false positives.
It's related to g_error() and its `for(;;) ;`.
For example:
../src/libnm-glib-aux/nm-shared-utils.c: In function 'nm_utils_parse_inaddr_bin_full':
../src/libnm-glib-aux/nm-shared-utils.c:1145:26: error: dangling pointer to 'error' may be used [-Werror=dangling-pointer=]
1145 | error->message);
| ^~
/usr/include/glib-2.0/glib/gmessages.h:343:32: note: in definition of macro 'g_error'
343 | __VA_ARGS__); \
| ^~~~~~~~~~~
../src/libnm-glib-aux/nm-shared-utils.c:1133:31: note: 'error' declared here
1133 | gs_free_error GError *error = NULL;
| ^~~~~
/usr/include/glib-2.0/glib/gmessages.h:341:25: error: dangling pointer to 'addrbin' may be used [-Werror=dangling-pointer=]
341 | g_log (G_LOG_DOMAIN, \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
342 | G_LOG_LEVEL_ERROR, \
| ~~~~~~~~~~~~~~~~~~~~~~~
343 | __VA_ARGS__); \
| ~~~~~~~~~~~~
../src/libnm-glib-aux/nm-shared-utils.c:1141:13: note: in expansion of macro 'g_error'
1141 | g_error("unexpected assertion failure: could parse \"%s\" as %s, but not accepted by "
| ^~~~~~~
../src/libnm-glib-aux/nm-shared-utils.c:1112:14: note: 'addrbin' declared here
1112 | NMIPAddr addrbin;
| ^~~~~~~
I think the warning could potentially be useful and prevent real bugs.
So don't disable it altogether, but go through the effort to suppress it
at the places where it currently happens.
Note that NM_PRAGMA_WARNING_DISABLE_DANGLING_POINTER macro only expands
to suppressing the warning with __GNUC__ equal to 12. The purpose is to
only suppress the warning where we know we want to. Hopefully other gcc
versions don't have this problem.
I guess, we could also write a NM_COMPILER_WARNING() check in
"m4/compiler_options.m4", to disable the warning if we detect it. But
that seems too cumbersome.
|
|
New gcc-12.0.1-0.8.fc36 on Fedora rawhide likes to emit false
"-Wdangling-pointer" warnings with some g_error() uses. It seems
related to g_error()'s `for(;;) ;`.
As workaround, add a macro to suppress the warning.
But only do that for gcc-12. This bug hopefully gets fixed
and we don't want to suppress useful warnings too eagerly.
https://bugzilla.redhat.com/show_bug.cgi?id=2056613
|
|
Also, combine the different macros in the same #if/#else block.
The point of this is if you have a macro that does conditionally
NM_PRAGMA_WARNING_DISABLE(), then we need a way to balance the
push/pop.
|
|
Recent meson versions treat unknown options as error and break now the
build. Good from them. This was an oversight.
Fixes: bbb1f5df2f23 ('libnm: always build libnm with JSON validation')
|
|
G-IR currently lacks an annotation for associating async calls to their
_finish counterparts. As a result, vala's binding generator assumes that
corresponding function is just function name - _async + _finish. This
holds true for 99% of the time, but not here, because
nm_device_wifi_request_scan_options_async uses
nm_device_wifi_request_scan_finish instead of expected
nm_device_wifi_request_scan_options_finish (sharing it with
nm_device_wifi_request_scan_async). As such, a metadata entry is
required to point vala to correct finishing function.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1114
|
|
For some embedded systems, no cryptography is required at all (e.g when
only using Ethernet).
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1108
|
|
Since commit 40032f461415 ('cli: fix resetting values via property
alias'), nmcli sets NULL properties during interactive add (nmcli -a
connection add) when the user leaves the field blank. This can lead to
an invalid connection for properties that can't be empty like
infiniband.transport-mode; they should be left to the default value in
case of no value entered.
Fixes: 40032f461415 ('cli: fix resetting values via property alias')
Fixes-test: @inf_create_port_novice_mode
https://bugzilla.redhat.com/show_bug.cgi?id=2053603
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1111
|
|
|
|
ci-templates now works around the earlier problem to install CentOS 8
Linux containers. Re-add the tests.
This reverts commit b2d2b8d6fa05e6911c3a7599b35e38e149ddf873.
|
|
We need the latest fix to bootstrap CentOS 8 Linux containers.
See-also: https://gitlab.freedesktop.org/freedesktop/ci-templates/-/merge_requests/131
See-also: https://gitlab.freedesktop.org/freedesktop/ci-templates/-/merge_requests/132
|
|
See-also: https://stackoverflow.com/questions/70926799/centos-through-vm-no-urls-in-mirrorlist
See-also: https://techglimpse.com/failed-metadata-repo-appstream-centos-8/
|
|
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1099
|
|
wifi: choose a (stable) random channel for Wi-Fi hotspot
The channel depends on the SSID.
Based-on-patch-by: xiangnian <xiangnian@uniontech.com>
See-also: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1054
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1099
|
|
As we iterate over "self->num_freqs", we must not modify "freqs",
otherwise, the second and subsequenty frequencies in self->freqs[i]
cannot match.
Fixes: dd8c546ff052 ('2007-12-27 Dan Williams <dcbw@redhat.com>')
Fixes: ba8527ca58bd ('wifi: preliminary nl80211 patch')
|
|
And also after WIFI_GET_WIFI_DATA_NETNS(), which also is a common
preamble that validates input arguments.
|
|
pppd is a delicate flower. On orderly shutdown, it likes to tell the
other side. This seems to take at least a second even when no real
network latency is at play, on busy systems 1.5 seconds easily ends up
being inadequate.
A violent shutdown is generally okay apart from that it can leave
garbage (port lock) behind and the other side potentially confused for a
while.
As it happens, this interacts badly with modemu.pl which is used for
testing: the pseudo terminal in PPP line discipline mode has no idea
that the remote disconnected and while ModemManager is learning that
something wrong the hard way (AT command timing out, because the remote
still expects to talk PPP), the test times out.
Let's increase the timeout to something more reasonable.
https://bugzilla.redhat.com/show_bug.cgi?id=2049596
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1103
|
|
When you do
$ nmcli connection modify "$PROFILE" +ipv4.routing-rules 'uidrange 1000-1000 lookup 12345'
Error: failed to modify ipv4.routing-rules: rule is invalid: invalid priority.
That message seems confusing. Reword.
|
|
This is important. We must not swallow 77, which has the meaning
that the test was skipped.
Fixes: f65747f6e9ed ('tests: let "run-nm-test.sh" fail with exit code 1 on failure')
|
|
|
|
|
|
|
|
Ask systemd instead of hardcoding the path. While this is a bit nicer,
it should have precisely zero effect as the discovered path should be
the same as we were hard-coding.
We default to placing the unit file under the same $prefix as the user is
installing into. This seems to be the correct thing to do if the user is
installing to /usr/local (according to systemd-path(1),
/usr/local/lib/systemd/system is all right), but can install the unit
file into wrong path if the user chooses to install into some obscure
location. I guess it's their responsibility in the end.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1101
|
|
The DNS configuration for a wireguard connection should be added with
type "VPN".
Fixes: 58287cbcc0c8 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1102
|
|
This is vestigal. It has been in place, because we'd be turning off echo
ourselves when asking for password and needed to make sure we'd still
terminal in original state upon unexpected termination.
This shouldn't be necessary since commit 9d95e1f1753a ('clients/cli: use a
nicer password prompt') we let readline take care of this and also clean
up after itself in nmc_cleanup_readline().
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1100
|
|
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1097
|
|
Don't progress to the IP ready state until all objects are committed
to platform. Note that l3cfg has a 20 seconds timeout after which
unavailable objects are considered "definitely unavailable" and are
removed from the list.
Fixes-test: @ipv6_routes_with_src
https://bugzilla.redhat.com/show_bug.cgi?id=2043133
|
|
l3cfg has a "temp_not_available" list of objects that couldn't be
added to platform, but can be added once some preconditions become
true (for example, a IPv6 route with a "src" attribute requires a
non-tentative src address to be present).
Retry to commit those objects once all addresses have completed
ACD/DAD.
|
|
|
|
nm_l3_config_data_get_nameservers() returns a pointer to "struct in6_addr". Not
a pointer to pointers.
#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:389
#1 0x00007f8060dd9109 in memcpy (__len=<optimized out>, __src=0xfd, __dest=<optimized out>) at /usr/include/bits/string_fortified.h:29
#2 g_array_append_vals (len=1, data=0xfd, farray=0x55dd69332130) at ../glib/garray.c:522
#3 g_array_append_vals (farray=0x55dd69332130, data=0xfd, len=1) at ../glib/garray.c:509
#4 0x000055dd68d2a27d in _garray_inaddr_add (p_arr=<optimized out>, addr_family=<optimized out>, addr=0xfd) at src/core/nm-l3-config-data.c:295
#5 0x000055dd68ef6510 in nm_l3_config_data_add_nameserver (nameserver=<optimized out>, addr_family=10, self=0x55dd6949f900) at src/core/nm-l3-config-data.c:1442
#6 nm_device_copy_ip6_dns_config (self=0x55dd693c4420, from_device=<optimized out>) at src/core/devices/nm-device.c:10468
#7 0x00007f8060f28aba in _g_closure_invoke_va (param_types=0x0, n_params=<optimized out>, args=0x7fffed43d610, instance=0x55dd693c4420, return_value=0x0, closure=0x55dd693cdb10)
at ../gobject/gclosure.c:893
#8 g_signal_emit_valist (instance=0x55dd693c4420, signal_id=<optimized out>, detail=0, var_args=var_args@entry=0x7fffed43d610) at ../gobject/gsignal.c:3406
#9 0x00007f8060f28c03 in g_signal_emit (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>) at ../gobject/gsignal.c:3553
#10 0x000055dd68efd1fb in _dev_ipac6_start (self=0x55dd693c4420) at src/core/devices/nm-device.c:11348
#11 0x000055dd68efd698 in _dev_ipac6_start_continue (self=0x55dd693c4420) at src/core/devices/nm-device.c:11373
#12 _dev_ipll6_set_llstate (self=0x55dd693c4420, llstate=<optimized out>, lladdr=<optimized out>) at src/core/devices/nm-device.c:10576
#13 0x000055dd68e7915e in _emit_changed_on_idle_cb (user_data=user_data@entry=0x55dd6941ca50) at src/core/nm-l3-ipv6ll.c:221
#14 0x00007f8060e0639b in g_idle_dispatch (source=0x55dd693eea30, callback=0x55dd68e78fd0 <_emit_changed_on_idle_cb>, user_data=0x55dd6941ca50) at ../glib/gmain.c:5897
#15 0x00007f8060e0a05f in g_main_dispatch (context=0x55dd6922c800) at ../glib/gmain.c:3381
#16 g_main_context_dispatch (context=0x55dd6922c800) at ../glib/gmain.c:4099
#17 0x00007f8060e5f2a8 in g_main_context_iterate.constprop.0 (context=0x55dd6922c800, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4175
#18 0x00007f8060e09773 in g_main_loop_run (loop=0x55dd69211010) at ../glib/gmain.c:4373
#19 0x000055dd68d09c7b in main (argc=<optimized out>, argv=<optimized out>) at src/core/main.c:509
Fixes: 58287cbcc0c8 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
|
|
https://bugzilla.redhat.com/show_bug.cgi?id=1837254
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1093
|
|
Add support for IPv6 multipath routes, by treating them as single-hop
routes. Otherwise, we can easily end up with an inconsistent platform
cache.
Background:
-----------
Routes are hard. We have NMPlatform which is a cache of netlink objects.
That means, we have a hash table and we cache objects based on some
identity (nmp_object_id_equal()). So those objects must have some immutable,
indistinguishable properties that determine whether an object is the
same or a different one.
For routes and routing rules, this identifying property is basically a subset
of the attributes (but not all!). That makes it very hard, because tomorrow
kernel could add an attribute that becomes part of the identity, and NetworkManager
wouldn't recognize it, resulting in cache inconsistency by wrongly
thinking two different routes are one and the same. Anyway.
The other point is that we rely on netlink events to maintain the cache.
So when we receive a RTM_NEWROUTE we add the object to the cache, and
delete it upon RTM_DELROUTE. When you do `ip route replace`, kernel
might replace a (different!) route, but only send one RTM_NEWROUTE message.
We handle that by somehow finding the route that was replaced/deleted. It's
ugly. Did I say, that routes are hard?
Also, for IPv4 routes, multipath attributes are just a part of the
routes identity. That is, you add two different routes that only differ
by their multipath list, and then kernel does as you would expect.
NetworkManager does not support IPv4 multihop routes and just ignores
them.
Also, a multipath route can have next hops on different interfaces,
which goes against our current assumption, that an NMPlatformIP4Route
has an interface (or no interface, in case of blackhole routes). That
makes it hard to meaningfully support IPv4 routes. But we probably don't
have to, because we can just pretend that such routes don't exist and
our cache stays consistent (at least, until somebody calls `ip route
replace` *sigh*).
Not so for IPv6. When you add (`ip route append`) an IPv6 route that is
identical to an existing route -- except their multipath attribute -- then it
behaves as if the existing route was modified and the result is the
merged route with more next-hops. Note that in this case kernel will
only send a RTM_NEWROUTE message with the full multipath list. If we
would treat the multipath list as part of the route's identity, this
would be as if kernel deleted one routes and created a different one (the
merged one), but only sending one notification. That's a bit similar to
what happens during `ip route replace`, but it would be nightmare to
find out which route was thereby replaced.
Likewise, when you delete a route, then kernel will "subtract" the
next-hop and sent a RTM_DELROUTE notification only about the next-hop that
was deleted. To handle that, you would have to find the full multihop
route, and replace it with the remainder after the subtraction.
NetworkManager so far ignored IPv6 routes with more than one next-hop, this
means you can start with one single-hop route (that NetworkManger sees
and has in the platform cache). Then you create a similar route (only
differing by the next-hop). Kernel will merge the routes, but not notify
NetworkManager that the single-hop route is not longer a single-hop
route. This can easily cause a cache inconsistency and subtle bugs. For
IPv6 we MUST handle multihop routes.
Kernels behavior makes little sense, if you expect that routes have an
immutable identity and want to get notifications about addition/removal.
We can however make sense by it by pretending that all IPv6 routes are
single-hop! With only the twist that a single RTM_NEWROUTE notification
might notify about multiple routes at the same time. This is what the
patch does.
The Patch
---------
Now one RTM_NEWROUTE message can contain multiple IPv6 routes
(NMPObject). That would mean that nmp_object_new_from_nl() needs to
return a list of objects. But it's not implemented that way. Instead,
we still call nmp_object_new_from_nl(), and the parsing code can
indicate that there is something more, indicating the caller to call
nmp_object_new_from_nl() again in a loop to fetch more objects.
In practice, I think all RTM_DELROUTE messages for IPv6 routes are
single-hop. Still, we implement it to handle also multi-hop messages the
same way.
Note that we just parse the netlink message again from scratch. The alternative
would be to parse the first object once, and then clone the object and
only update the next-hop. That would be more efficient, but probably
harder to understand/implement.
https://bugzilla.redhat.com/show_bug.cgi?id=1837254#c20
|