diff options
author | Thomas Haller <thaller@redhat.com> | 2022-02-10 17:58:17 +0100 |
---|---|---|
committer | Beniamino Galvani <bgalvani@redhat.com> | 2022-06-08 11:10:08 +0200 |
commit | 87fe255c891ddad2c32c383c5a203259a0b05342 (patch) | |
tree | 16b4a29d2174ebaeb9967d73528ce91b5f0b7973 | |
parent | 99cd6ed25e02083e7c40ad15cbe76d58d080e669 (diff) |
platform: support IPv6 mulitpath routes and fix cache inconsistencybg/nm-1-34
Add support for IPv6 multipath routes, by treating them as single-hop
routes. Otherwise, we can easily end up with an inconsistent platform
cache.
Background:
-----------
Routes are hard. We have NMPlatform which is a cache of netlink objects.
That means, we have a hash table and we cache objects based on some
identity (nmp_object_id_equal()). So those objects must have some immutable,
indistinguishable properties that determine whether an object is the
same or a different one.
For routes and routing rules, this identifying property is basically a subset
of the attributes (but not all!). That makes it very hard, because tomorrow
kernel could add an attribute that becomes part of the identity, and NetworkManager
wouldn't recognize it, resulting in cache inconsistency by wrongly
thinking two different routes are one and the same. Anyway.
The other point is that we rely on netlink events to maintain the cache.
So when we receive a RTM_NEWROUTE we add the object to the cache, and
delete it upon RTM_DELROUTE. When you do `ip route replace`, kernel
might replace a (different!) route, but only send one RTM_NEWROUTE message.
We handle that by somehow finding the route that was replaced/deleted. It's
ugly. Did I say, that routes are hard?
Also, for IPv4 routes, multipath attributes are just a part of the
routes identity. That is, you add two different routes that only differ
by their multipath list, and then kernel does as you would expect.
NetworkManager does not support IPv4 multihop routes and just ignores
them.
Also, a multipath route can have next hops on different interfaces,
which goes against our current assumption, that an NMPlatformIP4Route
has an interface (or no interface, in case of blackhole routes). That
makes it hard to meaningfully support IPv4 routes. But we probably don't
have to, because we can just pretend that such routes don't exist and
our cache stays consistent (at least, until somebody calls `ip route
replace` *sigh*).
Not so for IPv6. When you add (`ip route append`) an IPv6 route that is
identical to an existing route -- except their multipath attribute -- then it
behaves as if the existing route was modified and the result is the
merged route with more next-hops. Note that in this case kernel will
only send a RTM_NEWROUTE message with the full multipath list. If we
would treat the multipath list as part of the route's identity, this
would be as if kernel deleted one routes and created a different one (the
merged one), but only sending one notification. That's a bit similar to
what happens during `ip route replace`, but it would be nightmare to
find out which route was thereby replaced.
Likewise, when you delete a route, then kernel will "subtract" the
next-hop and sent a RTM_DELROUTE notification only about the next-hop that
was deleted. To handle that, you would have to find the full multihop
route, and replace it with the remainder after the subtraction.
NetworkManager so far ignored IPv6 routes with more than one next-hop, this
means you can start with one single-hop route (that NetworkManger sees
and has in the platform cache). Then you create a similar route (only
differing by the next-hop). Kernel will merge the routes, but not notify
NetworkManager that the single-hop route is not longer a single-hop
route. This can easily cause a cache inconsistency and subtle bugs. For
IPv6 we MUST handle multihop routes.
Kernels behavior makes little sense, if you expect that routes have an
immutable identity and want to get notifications about addition/removal.
We can however make sense by it by pretending that all IPv6 routes are
single-hop! With only the twist that a single RTM_NEWROUTE notification
might notify about multiple routes at the same time. This is what the
patch does.
The Patch
---------
Now one RTM_NEWROUTE message can contain multiple IPv6 routes
(NMPObject). That would mean that nmp_object_new_from_nl() needs to
return a list of objects. But it's not implemented that way. Instead,
we still call nmp_object_new_from_nl(), and the parsing code can
indicate that there is something more, indicating the caller to call
nmp_object_new_from_nl() again in a loop to fetch more objects.
In practice, I think all RTM_DELROUTE messages for IPv6 routes are
single-hop. Still, we implement it to handle also multi-hop messages the
same way.
Note that we just parse the netlink message again from scratch. The alternative
would be to parse the first object once, and then clone the object and
only update the next-hop. That would be more efficient, but probably
harder to understand/implement.
https://bugzilla.redhat.com/show_bug.cgi?id=1837254#c20
(cherry picked from commit dac12a8d6178a6d82fc0913ad8825c8556380ba8)
(cherry picked from commit 698cf1092c39b31e0b2f08de963bc0440a444401)
-rw-r--r-- | src/libnm-platform/nm-linux-platform.c | 165 |
1 files changed, 125 insertions, 40 deletions
diff --git a/src/libnm-platform/nm-linux-platform.c b/src/libnm-platform/nm-linux-platform.c index 68f4f980c7..816739e565 100644 --- a/src/libnm-platform/nm-linux-platform.c +++ b/src/libnm-platform/nm-linux-platform.c @@ -1199,6 +1199,21 @@ _linktype_get_type(NMPlatform * platform, * libnl unility functions and wrappers ******************************************************************/ +typedef struct { + /* The first time, we are called with "iter_more" false. If there is only + * one message to parse, the callee can leave this at false and be done + * (meaning, it can just ignore the potential parsing of multiple messages). + * If there are multiple message, then set this to TRUE. We will continue + * the parsing as long as this flag stays TRUE and an object gets returned. */ + bool iter_more; + + union { + struct { + guint next_multihop; + } ip6_route; + }; +} ParseNlmsgIter; + #define NLMSG_TAIL(nmsg) ((struct rtattr *) (((char *) (nmsg)) + NLMSG_ALIGN((nmsg)->nlmsg_len))) /* copied from iproute2's addattr_l(). */ @@ -3363,7 +3378,7 @@ _new_from_nl_addr(struct nlmsghdr *nlh, gboolean id_only) /* Copied and heavily modified from libnl3's rtnl_route_parse() and parse_multipath(). */ static NMPObject * -_new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only) +_new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only, ParseNlmsgIter *parse_nlmsg_iter) { static const struct nla_policy policy[] = { [RTA_TABLE] = {.type = NLA_U32}, @@ -3376,17 +3391,21 @@ _new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only) [RTA_METRICS] = {.type = NLA_NESTED}, [RTA_MULTIPATH] = {.type = NLA_NESTED}, }; + guint multihop_idx; const struct rtmsg *rtm; struct nlattr * tb[G_N_ELEMENTS(policy)]; + int addr_family; gboolean IS_IPv4; nm_auto_nmpobj NMPObject *obj = NULL; int addr_len; struct { - gboolean is_present; + gboolean found; + gboolean has_more; int ifindex; NMIPAddr gateway; } nh = { - .is_present = FALSE, + .found = FALSE, + .has_more = FALSE, }; guint32 mss; guint32 window = 0; @@ -3396,6 +3415,10 @@ _new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only) guint32 mtu = 0; guint32 lock = 0; + nm_assert((parse_nlmsg_iter->iter_more && parse_nlmsg_iter->ip6_route.next_multihop > 0) + || (!parse_nlmsg_iter->iter_more && parse_nlmsg_iter->ip6_route.next_multihop == 0)); + multihop_idx = parse_nlmsg_iter->ip6_route.next_multihop; + if (!nlmsg_valid_hdr(nlh, sizeof(*rtm))) return NULL; @@ -3405,9 +3428,11 @@ _new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only) * only handle ~supported~ routes. *****************************************************************/ - if (rtm->rtm_family == AF_INET) + addr_family = rtm->rtm_family; + + if (addr_family == AF_INET) IS_IPv4 = TRUE; - else if (rtm->rtm_family == AF_INET6) + else if (addr_family == AF_INET6) IS_IPv4 = FALSE; else return NULL; @@ -3420,57 +3445,76 @@ _new_from_nl_route(struct nlmsghdr *nlh, gboolean id_only) /*****************************************************************/ - addr_len = IS_IPv4 ? sizeof(in_addr_t) : sizeof(struct in6_addr); + addr_len = nm_utils_addr_family_to_size(addr_family); if (rtm->rtm_dst_len > (IS_IPv4 ? 32 : 128)) return NULL; - /***************************************************************** - * parse nexthops. Only handle routes with one nh. - *****************************************************************/ - if (tb[RTA_MULTIPATH]) { - size_t tlen = nla_len(tb[RTA_MULTIPATH]); + size_t tlen; struct rtnexthop *rtnh; + guint idx; + tlen = nla_len(tb[RTA_MULTIPATH]); if (tlen < sizeof(*rtnh)) goto rta_multipath_done; rtnh = nla_data_as(struct rtnexthop, tb[RTA_MULTIPATH]); - if (tlen < rtnh->rtnh_len) goto rta_multipath_done; + idx = 0; while (TRUE) { - if (nh.is_present) { - /* we don't support multipath routes. */ - return NULL; - } - - nh.is_present = TRUE; - nh.ifindex = rtnh->rtnh_ifindex; - - if (rtnh->rtnh_len > sizeof(*rtnh)) { - struct nlattr *ntb[RTA_GATEWAY + 1]; + if (idx == multihop_idx) { + nh.found = TRUE; + nh.ifindex = rtnh->rtnh_ifindex; + if (rtnh->rtnh_len > sizeof(*rtnh)) { + struct nlattr *ntb[RTA_MAX + 1]; + + if (nla_parse_arr(ntb, + (struct nlattr *) RTNH_DATA(rtnh), + rtnh->rtnh_len - sizeof(*rtnh), + NULL) + < 0) + return NULL; - if (nla_parse_arr(ntb, - (struct nlattr *) RTNH_DATA(rtnh), - rtnh->rtnh_len - sizeof(*rtnh), - NULL) - < 0) + if (_check_addr_or_return_null(ntb, RTA_GATEWAY, addr_len)) + memcpy(&nh.gateway, nla_data(ntb[RTA_GATEWAY]), addr_len); + } + } else if (nh.found) { + /* we just parsed a nexthop, but there is yet another hop afterwards. */ + nm_assert(idx == multihop_idx + 1); + if (IS_IPv4) { + /* for IPv4, multihop routes are currently not supported. + * + * If we ever support them, then the next-hop list is part of the NMPlatformIPRoute, + * that is, for IPv4 we truly have multihop routes. Unlike for IPv6. + * + * For now, just error out. */ return NULL; + } - if (_check_addr_or_return_null(ntb, RTA_GATEWAY, addr_len)) - memcpy(&nh.gateway, nla_data(ntb[RTA_GATEWAY]), addr_len); + /* For IPv6 multihop routes, we need to remember to iterate again. + * For each next-hop, we will create a distinct single-hop NMPlatformIP6Route. */ + nh.has_more = TRUE; + break; } if (tlen < RTNH_ALIGN(rtnh->rtnh_len) + sizeof(*rtnh)) - goto rta_multipath_done; + break; tlen -= RTNH_ALIGN(rtnh->rtnh_len); rtnh = RTNH_NEXT(rtnh); + idx++; } -rta_multipath_done:; + } + +rta_multipath_done: + + if (!nh.found && multihop_idx > 0) { + /* something is wrong. We are called back to collect multi_idx, but the index + * is not there. We messed up the book keeping. */ + return nm_assert_unreachable_val(NULL); } if (tb[RTA_OIF] || tb[RTA_GATEWAY] || tb[RTA_FLOW]) { @@ -3482,19 +3526,23 @@ rta_multipath_done:; if (_check_addr_or_return_null(tb, RTA_GATEWAY, addr_len)) memcpy(&gateway, nla_data(tb[RTA_GATEWAY]), addr_len); - if (!nh.is_present) { + if (!nh.found) { /* If no nexthops have been provided via RTA_MULTIPATH * we add it as regular nexthop to maintain backwards * compatibility */ nh.ifindex = ifindex; nh.gateway = gateway; + nh.found = TRUE; } else { /* Kernel supports new style nexthop configuration, * verify that it is a duplicate and ignore old-style nexthop. */ - if (nh.ifindex != ifindex || memcmp(&nh.gateway, &gateway, addr_len) != 0) + if (nh.ifindex != ifindex || memcmp(&nh.gateway, &gateway, addr_len) != 0) { + /* we have a RTA_MULTIPATH attribute that does not agree. + * That seems not right. Error out. */ return NULL; + } } - } else if (!nh.is_present) + } else if (!nh.found) return NULL; /*****************************************************************/ @@ -3595,6 +3643,12 @@ rta_multipath_done:; obj->ip_route.r_rtm_flags = rtm->rtm_flags; obj->ip_route.rt_source = nmp_utils_ip_config_source_from_rtprot(rtm->rtm_protocol); + if (nh.has_more) { + parse_nlmsg_iter->iter_more = TRUE; + parse_nlmsg_iter->ip6_route.next_multihop = multihop_idx + 1; + } else + parse_nlmsg_iter->iter_more = FALSE; + return g_steal_pointer(&obj); } @@ -4036,7 +4090,8 @@ static NMPObject * nmp_object_new_from_nl(NMPlatform * platform, const NMPCache *cache, struct nl_msg * msg, - gboolean id_only) + gboolean id_only, + ParseNlmsgIter *parse_nlmsg_iter) { struct nlmsghdr *msghdr; @@ -4058,7 +4113,7 @@ nmp_object_new_from_nl(NMPlatform * platform, case RTM_NEWROUTE: case RTM_DELROUTE: case RTM_GETROUTE: - return _new_from_nl_route(msghdr, id_only); + return _new_from_nl_route(msghdr, id_only, parse_nlmsg_iter); case RTM_NEWRULE: case RTM_DELRULE: case RTM_GETRULE: @@ -6914,12 +6969,13 @@ event_valid_msg(NMPlatform *platform, struct nl_msg *msg, gboolean handle_events gboolean is_del = FALSE; gboolean is_dump = FALSE; NMPCache * cache = nm_platform_get_cache(platform); - - msghdr = nlmsg_hdr(msg); + ParseNlmsgIter parse_nlmsg_iter; if (!handle_events) return; + msghdr = nlmsg_hdr(msg); + if (NM_IN_SET(msghdr->nlmsg_type, RTM_DELLINK, RTM_DELADDR, @@ -6932,7 +6988,11 @@ event_valid_msg(NMPlatform *platform, struct nl_msg *msg, gboolean handle_events is_del = TRUE; } - obj = nmp_object_new_from_nl(platform, cache, msg, is_del); + parse_nlmsg_iter = (ParseNlmsgIter){ + .iter_more = FALSE, + }; + + obj = nmp_object_new_from_nl(platform, cache, msg, is_del, &parse_nlmsg_iter); if (!obj) { _LOGT("event-notification: %s: ignore", nl_nlmsghdr_to_str(msghdr, buf_nlmsghdr, sizeof(buf_nlmsghdr))); @@ -6960,7 +7020,7 @@ event_valid_msg(NMPlatform *platform, struct nl_msg *msg, gboolean handle_events NULL, 0)); - { + while (TRUE) { nm_auto_nmpobj const NMPObject *obj_old = NULL; nm_auto_nmpobj const NMPObject *obj_new = NULL; @@ -7084,6 +7144,31 @@ event_valid_msg(NMPlatform *platform, struct nl_msg *msg, gboolean handle_events default: break; } + + if (!parse_nlmsg_iter.iter_more) { + /* we are done. */ + return; + } + + /* There is a special case here. For IPv6 routes, kernel will merge/mangle routes + * that only differ by their next-hop, and pretend they are multi-hop routes. We + * untangle them, and pretend there are only single-hop routes. Hence, one RTM_{NEW,DEL}ROUTE + * message might be about multiple IPv6 routes (NMPObject). So, now let's parse the next... */ + + nm_assert(NM_IN_SET(msghdr->nlmsg_type, RTM_NEWROUTE, RTM_DELROUTE)); + + nm_clear_pointer(&obj, nmp_object_unref); + + obj = nmp_object_new_from_nl(platform, cache, msg, is_del, &parse_nlmsg_iter); + if (!obj) { + /* we are done. Usually we don't expect this, because we were told that + * there would be another object to collect, but there isn't one. Something + * unusual happened. + * + * the only reason why this actually could happen is if the next-hop data + * is invalid -- we didn't verify that it would be valid when we set iter_more. */ + return; + } } } |