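IPv6 path selection is performed mainly by the function fib6_select_path, described below.

If the fib6_info in the lookup result has no nexthop object configured and its sibling count is 0 (there are no multiple built-in route paths), the current route is the selected one. Likewise, if the sibling count is non-zero but the lookup has already matched the requested output interface, the route is also considered suitable. In both cases the code jumps to the out label at the end, which assigns the route's next hop to the nh member of the fib6_result structure (res->nh).

The generic route lookup, described later, calls rt6_device_match first to match the output interface; that is where have_oif_match comes from.

In addition, if the route found has a nexthop object, the output interface is already matched, and the next hop nh in fib6_result already has a value, the path has already been chosen and the function returns immediately. res->nh may be set, for example, in rt6_device_match or rt6_select, both described later.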

void fib6_select_path(const struct net *net, struct fib6_result *res,
                      struct flowi6 *fl6, int oif, bool have_oif_match,
                      const struct sk_buff *skb, int strict)
{
    struct fib6_info *sibling, *next_sibling;
    struct fib6_info *match = res->f6i;

    if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
        goto out;

    if (match->nh && have_oif_match && res->nh)
        return;

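If the multipath hash (mp_hash) is still zero, and the route either has no nexthop object or its nexthop object is a multipath group, rt6_multipath_hash computes the value and stores it in mp_hash. After that, for a route that uses a nexthop object, nexthop_path_fib6_result picks a suitable path inside the nexthop group.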

    /* We might have already computed the hash for ICMPv6 errors. In such
     * case it will always be non-zero. Otherwise now is the time to do it.
     */
    if (!fl6->mp_hash &&
        (!match->nh || nexthop_is_multipath(match->nh)))
        fl6->mp_hash = rt6_multipath_hash(net, fl6, skb, NULL);

    if (unlikely(match->nh)) {
        nexthop_path_fib6_result(res, fl6->mp_hash);
        return;
    }

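Otherwise, for a route without a nexthop object, the matched route itself is kept if mp_hash is less than or equal to its own fib_nh_upper_bound. If not, the siblings list is walked to find the first entry whose fib_nh_upper_bound is greater than or equal to mp_hash. If that entry's score is negative, the route that res->f6i points to is used; otherwise the sibling found by the walk is used.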

    if (fl6->mp_hash <= atomic_read(&match->fib6_nh->fib_nh_upper_bound))
        goto out;

    list_for_each_entry_safe(sibling, next_sibling, &match->fib6_siblings,
                             fib6_siblings) {
        const struct fib6_nh *nh = sibling->fib6_nh;
        int nh_upper_bound;

        nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
        if (fl6->mp_hash > nh_upper_bound)
            continue;
        if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
            break;
        match = sibling;
        break;
    }

out:
    res->f6i = match;
    res->nh = match->fib6_nh;
}
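
The sibling walk can be modeled in user space roughly as follows. This is only a sketch: struct path, select_sibling and the usable field (standing in for rt6_score_route() returning a non-negative score) are hypothetical names, and every path is assumed to be alive so that all bounds are non-negative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one route entry and its built-in next hop. */
struct path {
    uint32_t upper_bound; /* models fib_nh_upper_bound of a live path */
    int usable;           /* models rt6_score_route() >= 0 */
};

/*
 * paths[0] models the matched route (res->f6i); paths[1..n-1] model its
 * siblings in list order. Returns the index of the selected path.
 */
size_t select_sibling(const struct path *paths, size_t n, uint32_t mp_hash)
{
    size_t i;

    /* The matched route wins if its own range covers the hash. */
    if (mp_hash <= paths[0].upper_bound)
        return 0;

    for (i = 1; i < n; i++) {
        if (mp_hash > paths[i].upper_bound)
            continue;
        /* Only the first covering sibling is considered. */
        return paths[i].usable ? i : 0;
    }
    return 0; /* no sibling covers the hash: keep the matched route */
}
```

Note that the walk stops at the first sibling whose range covers the hash; if that sibling scores below zero, the originally matched route is kept rather than trying the next sibling.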

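Next-hop score

If the lookup did not specify an output interface, or the next-hop device is the specified output interface, the score is set to 2. Otherwise, if the next-hop interface differs from the specified one and the strict interface-match flag RT6_LOOKUP_F_IFACE is set, the function returns -3 (RT6_NUD_FAIL_HARD).

If the kernel supports the Router Advertisement route preference extension defined in RFC 4191 (CONFIG_IPV6_ROUTER_PREF), the decoded preference is shifted left by 2 bits and ORed into m. Three preference values are defined: 1 (low), 2 (medium), 3 (high).

If next-hop reachability checking is requested (RT6_LOOKUP_F_REACHABLE), RTF_NONEXTHOP is not set, and the next hop has a gateway, rt6_check_neigh checks whether the next hop is reachable. If it is not, the negative value n is returned.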

static int rt6_score_route(const struct fib6_nh *nh, u32 fib6_flags, int oif,
                           int strict)
{
    int m = 0;

    if (!oif || nh->fib_nh_dev->ifindex == oif)
        m = 2;

    if (!m && (strict & RT6_LOOKUP_F_IFACE))
        return RT6_NUD_FAIL_HARD;
#ifdef CONFIG_IPV6_ROUTER_PREF
    m |= IPV6_DECODE_PREF(IPV6_EXTRACT_PREF(fib6_flags)) << 2;
#endif
    if ((strict & RT6_LOOKUP_F_REACHABLE) &&
        !(fib6_flags & RTF_NONEXTHOP) && nh->fib_nh_gw_family) {
        int n = rt6_check_neigh(nh);
        if (n < 0)
            return n;
    }
    return m;
}
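
To make the bit layout of the score concrete, here is a small user-space sketch. The function score_layout and its parameters are hypothetical; it assumes CONFIG_IPV6_ROUTER_PREF is enabled, takes the already decoded preference (1, 2 or 3), and leaves out the reachability check.

```c
/*
 * Hypothetical model of the score layout:
 *   bit 1     interface match (value 2)
 *   bits 2-3  decoded router preference (1 low, 2 medium, 3 high)
 * Returns -3 (the value of RT6_NUD_FAIL_HARD) on a strict interface mismatch.
 */
int score_layout(int oif_matched, int strict_iface, int decoded_pref)
{
    int m = oif_matched ? 2 : 0;

    if (!m && strict_iface)
        return -3;

    m |= decoded_pref << 2;
    return m;
}
```

Because the preference occupies the higher bits, a high-preference route on a non-matching interface (score 12) outranks a medium-preference route on the matching interface (score 10) when strict interface matching is off.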

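rt6_check_neigh looks up the neighbour table using the next-hop device and gateway. If no entry exists, the gateway's reachability is unknown: with the ROUTER_PREF extension enabled the function returns RT6_NUD_SUCCEED (1), otherwise RT6_NUD_FAIL_DO_RR (-1).

If a neighbour entry is found and its state is valid (NUD_VALID), the function returns RT6_NUD_SUCCEED (1). With ROUTER_PREF enabled, a state with NUD_FAILED set yields RT6_NUD_FAIL_PROBE (-2), and any other state yields RT6_NUD_SUCCEED (1). Without ROUTER_PREF, a neighbour that is not NUD_VALID leaves the initial value RT6_NUD_FAIL_HARD (-3) in place.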

static enum rt6_nud_state rt6_check_neigh(const struct fib6_nh *fib6_nh)
{
    enum rt6_nud_state ret = RT6_NUD_FAIL_HARD;
    struct neighbour *neigh;

    rcu_read_lock_bh();
    neigh = __ipv6_neigh_lookup_noref(fib6_nh->fib_nh_dev,
                                      &fib6_nh->fib_nh_gw6);
    if (neigh) {
        read_lock(&neigh->lock);
        if (neigh->nud_state & NUD_VALID)
            ret = RT6_NUD_SUCCEED;
#ifdef CONFIG_IPV6_ROUTER_PREF
        else if (!(neigh->nud_state & NUD_FAILED))
            ret = RT6_NUD_SUCCEED;
        else
            ret = RT6_NUD_FAIL_PROBE;
#endif
        read_unlock(&neigh->lock);
    } else {
        ret = IS_ENABLED(CONFIG_IPV6_ROUTER_PREF) ?
              RT6_NUD_SUCCEED : RT6_NUD_FAIL_DO_RR;
    }
    rcu_read_unlock_bh();

    return ret;
}

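Multipath hash computation

The hash policy can be changed through the PROC file /proc/sys/net/ipv6/fib_multipath_hash_policy. The default is 0, meaning the hash is generated from the packet's layer 3 information. The possible values are: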

  • 0 Layer 3
  • 1 Layer 4
  • 2 Layer 3 or inner Layer 3 if present

static inline int ip6_multipath_hash_policy(const struct net *net)
{
    return net->ipv6.sysctl.multipath_hash_policy;
}
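
From user space the current policy can be checked by reading the sysctl file. The helper below is a minimal sketch: read_hash_policy is a hypothetical name, and the caller is expected to pass the path /proc/sys/net/ipv6/fib_multipath_hash_policy.

```c
#include <stdio.h>

/*
 * Reads an integer sysctl value from the given file.
 * Returns the value read, or -1 if the file cannot be opened or parsed.
 */
int read_hash_policy(const char *path)
{
    FILE *f = fopen(path, "r");
    int policy = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &policy) != 1)
        policy = -1;
    fclose(f);
    return policy;
}
```

A call such as read_hash_policy("/proc/sys/net/ipv6/fib_multipath_hash_policy") would be expected to return 0, 1 or 2 on a 5.10 kernel, or -1 when the file is unavailable.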

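For policy 0, if skb is set, ip6_multipath_l3_keys computes the flow keys; when the skb is an ICMPv6 error message, hash_keys is initialized from the inner IPv6 header instead. If skb is NULL, hash_keys is initialized from the flow information in fl6.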

/* if skb is set it will be used and fl6 can be NULL */
u32 rt6_multipath_hash(const struct net *net, const struct flowi6 *fl6,
                       const struct sk_buff *skb, struct flow_keys *flkeys)
{
    struct flow_keys hash_keys;
    u32 mhash;

    switch (ip6_multipath_hash_policy(net)) {
    case 0:
        memset(&hash_keys, 0, sizeof(hash_keys));
        hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
        if (skb) {
            ip6_multipath_l3_keys(skb, &hash_keys, flkeys);
        } else {
            hash_keys.addrs.v6addrs.src = fl6->saddr;
            hash_keys.addrs.v6addrs.dst = fl6->daddr;
            hash_keys.tags.flow_label = (__force u32)flowi6_get_flowlabel(fl6);
            hash_keys.basic.ip_proto = fl6->flowi6_proto;
        }
        break;

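For policy 1, if skb is set and it already carries an L4 hash, that hash is returned directly. Otherwise hash_keys is initialized from the packet: source and destination addresses, source and destination ports, and the protocol. If skb is NULL, the same fields are initialized from the flow information in fl6.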

    case 1:
        if (skb) {
            unsigned int flag = FLOW_DISSECTOR_F_STOP_AT_ENCAP;
            struct flow_keys keys;

            /* short-circuit if we already have L4 hash present */
            if (skb->l4_hash)
                return skb_get_hash_raw(skb) >> 1;

            memset(&hash_keys, 0, sizeof(hash_keys));

            if (!flkeys) {
                skb_flow_dissect_flow_keys(skb, &keys, flag);
                flkeys = &keys;
            }
            hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
            hash_keys.addrs.v6addrs.src = flkeys->addrs.v6addrs.src;
            hash_keys.addrs.v6addrs.dst = flkeys->addrs.v6addrs.dst;
            hash_keys.ports.src = flkeys->ports.src;
            hash_keys.ports.dst = flkeys->ports.dst;
            hash_keys.basic.ip_proto = flkeys->basic.ip_proto;
        } else {
            memset(&hash_keys, 0, sizeof(hash_keys));
            hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
            hash_keys.addrs.v6addrs.src = fl6->saddr;
            hash_keys.addrs.v6addrs.dst = fl6->daddr;
            hash_keys.ports.src = fl6->fl6_sport;
            hash_keys.ports.dst = fl6->fl6_dport;
            hash_keys.basic.ip_proto = fl6->flowi6_proto;
        }
        break;

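For policy 2, if skb is set and the IPv6 packet encapsulates another IPv4 or IPv6 packet, hash_keys is initialized from the inner packet. Otherwise, as with policy 0, ip6_multipath_l3_keys computes hash_keys.

If skb is NULL, hash_keys is initialized from the flow information in fl6.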

    case 2:
        memset(&hash_keys, 0, sizeof(hash_keys));
        hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
        if (skb) {
            struct flow_keys keys;

            if (!flkeys) {
                skb_flow_dissect_flow_keys(skb, &keys, 0);
                flkeys = &keys;
            }

            /* Inner can be v4 or v6 */
            if (flkeys->control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS) {
                hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
                hash_keys.addrs.v4addrs.src = flkeys->addrs.v4addrs.src;
                hash_keys.addrs.v4addrs.dst = flkeys->addrs.v4addrs.dst;
            } else if (flkeys->control.addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
                hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
                hash_keys.addrs.v6addrs.src = flkeys->addrs.v6addrs.src;
                hash_keys.addrs.v6addrs.dst = flkeys->addrs.v6addrs.dst;
                hash_keys.tags.flow_label = flkeys->tags.flow_label;
                hash_keys.basic.ip_proto = flkeys->basic.ip_proto;
            } else {
                /* Same as case 0 */
                hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
                ip6_multipath_l3_keys(skb, &hash_keys, flkeys);
            }
        } else {
            /* Same as case 0 */
            hash_keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
            hash_keys.addrs.v6addrs.src = fl6->saddr;
            hash_keys.addrs.v6addrs.dst = fl6->daddr;
            hash_keys.tags.flow_label = (__force u32)flowi6_get_flowlabel(fl6);
            hash_keys.basic.ip_proto = fl6->flowi6_proto;
        }
        break;
    }
    mhash = flow_hash_from_keys(&hash_keys);

    return mhash >> 1;
}

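The function flow_hash_from_keys below produces a u32 hash from the flow_keys. rt6_multipath_hash shifts that result right by one bit, so the value it returns fits in 31 bits.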

static inline u32 __flow_hash_from_keys(struct flow_keys *keys,
                                        const siphash_key_t *keyval)
{
    u32 hash;

    __flow_hash_consistentify(keys);

    hash = siphash(flow_keys_hash_start(keys),
                   flow_keys_hash_length(keys), keyval);
    if (!hash)
        hash = 1;

    return hash;
}

u32 flow_hash_from_keys(struct flow_keys *keys)
{
    __flow_hash_secret_init();
    return __flow_hash_from_keys(keys, &hashrnd);
}

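The function __flow_hash_consistentify makes sure both directions of a connection produce the same hash. As shown below, source and destination are swapped so that the numerically smaller address becomes the source address; the ports are swapped together with the addresses. When the two addresses are equal, the smaller port becomes the source port.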

/* Sort the source and destination IP (and the ports if the IP are the same),
 * to have consistent hash within the two directions
 */
static inline void __flow_hash_consistentify(struct flow_keys *keys)
{
    int addr_diff, i;

    switch (keys->control.addr_type) {
    case FLOW_DISSECTOR_KEY_IPV4_ADDRS:
        addr_diff = (__force u32)keys->addrs.v4addrs.dst -
                    (__force u32)keys->addrs.v4addrs.src;
        if ((addr_diff < 0) ||
            (addr_diff == 0 &&
             ((__force u16)keys->ports.dst <
              (__force u16)keys->ports.src))) {
            swap(keys->addrs.v4addrs.src, keys->addrs.v4addrs.dst);
            swap(keys->ports.src, keys->ports.dst);
        }
        break;
    case FLOW_DISSECTOR_KEY_IPV6_ADDRS:
        addr_diff = memcmp(&keys->addrs.v6addrs.dst,
                           &keys->addrs.v6addrs.src,
                           sizeof(keys->addrs.v6addrs.dst));
        if ((addr_diff < 0) ||
            (addr_diff == 0 &&
             ((__force u16)keys->ports.dst <
              (__force u16)keys->ports.src))) {
            for (i = 0; i < 4; i++)
                swap(keys->addrs.v6addrs.src.s6_addr32[i],
                     keys->addrs.v6addrs.dst.s6_addr32[i]);
            swap(keys->ports.src, keys->ports.dst);
        }
        break;
    }
}
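
The effect of this normalization can be illustrated with a simplified IPv4-only model. This is a sketch under assumptions: struct flow4 and flow4_consistentify are hypothetical names, and the addresses are compared directly as unsigned values rather than through the signed difference the kernel uses.

```c
#include <stdint.h>

/* Hypothetical, simplified flow key for IPv4. */
struct flow4 {
    uint32_t src, dst;
    uint16_t sport, dport;
};

/* Order the key so that both directions of a flow look identical. */
void flow4_consistentify(struct flow4 *k)
{
    if (k->dst < k->src ||
        (k->dst == k->src && k->dport < k->sport)) {
        uint32_t addr = k->src;
        uint16_t port = k->sport;

        /* Addresses and ports are always swapped together. */
        k->src = k->dst;
        k->dst = addr;
        k->sport = k->dport;
        k->dport = port;
    }
}
```

After normalization, a key built from a request and a key built from its reply compare equal field by field, so hashing either one gives the same result. The ports are not sorted on their own: they simply follow their addresses.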

upper_bound computation

If the current route entry has no siblings, or is being flushed, no rebalancing is done.

void rt6_multipath_rebalance(struct fib6_info *rt)
{
    struct fib6_info *first;
    int total;

    /* In case the entire multipath route was marked for flushing,
     * then there is no need to rebalance upon the removal of every
     * sibling route.
     */
    if (!rt->fib6_nsiblings || rt->should_flush)
        return;

    /* During lookup routes are evaluated in order, so we need to
     * make sure upper bounds are assigned from the first sibling
     * onwards.
     */
    first = rt6_multipath_first_sibling(rt);
    if (WARN_ON_ONCE(!first))
        return;

    total = rt6_multipath_total_weight(first);
    rt6_multipath_upper_bound_set(first, total);
}

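First, the leaf list of the fib6 node is walked to find the first route entry whose metric equals that of the current route rt and which qualifies for ECMP.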

static struct fib6_info *rt6_multipath_first_sibling(const struct fib6_info *rt)
{
    struct fib6_info *iter;
    struct fib6_node *fn;

    fn = rcu_dereference_protected(rt->fib6_node,
                                   lockdep_is_held(&rt->fib6_table->tb6_lock));
    iter = rcu_dereference_protected(fn->leaf,
                                     lockdep_is_held(&rt->fib6_table->tb6_lock));
    while (iter) {
        if (iter->fib6_metric == rt->fib6_metric &&
            rt6_qualify_for_ecmp(iter))
            return iter;
        iter = rcu_dereference_protected(iter->fib6_next,
                                         lockdep_is_held(&rt->fib6_table->tb6_lock));
    }

    return NULL;
}

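Routes generated from Neighbor Discovery Router Advertisements (RA), routes that use a nexthop object, and routes without a gateway cannot be combined with the current route rt for ECMP.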

/* fib entries using a nexthop object can not be coalesced into
 * a multipath route
 */
static inline bool rt6_qualify_for_ecmp(const struct fib6_info *f6i)
{
    /* the RTF_ADDRCONF flag filters out RA's */
    return !(f6i->fib6_flags & RTF_ADDRCONF) && !f6i->nh &&
           f6i->fib6_nh->fib_nh_gw_family;
}

Second, the total weight of the first leaf route entry found and of all its siblings is computed, skipping dead next hops.

/* only called for fib entries with builtin fib6_nh */
static bool rt6_is_dead(const struct fib6_info *rt)
{
    if (rt->fib6_nh->fib_nh_flags & RTNH_F_DEAD ||
        (rt->fib6_nh->fib_nh_flags & RTNH_F_LINKDOWN &&
         ip6_ignore_linkdown(rt->fib6_nh->fib_nh_dev)))
        return true;

    return false;
}

static int rt6_multipath_total_weight(const struct fib6_info *rt)
{
    struct fib6_info *iter;
    int total = 0;

    if (!rt6_is_dead(rt))
        total += rt->fib6_nh->fib_nh_weight;

    list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings) {
        if (!rt6_is_dead(iter))
            total += iter->fib6_nh->fib_nh_weight;
    }

    return total;
}

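Finally, the upper bound (fib_nh_upper_bound) of the first sibling's next hop is set: its own weight is shifted left by 31 bits, divided by the total weight computed above with rounding to the nearest integer, and then decremented by 1. A dead next hop gets an upper bound of -1.

After that, all siblings are visited in turn. The weight used to compute each sibling's upper_bound is the sum of its own weight and the weights of all live siblings before it, so the resulting upper_bound values increase monotonically.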

static void rt6_upper_bound_set(struct fib6_info *rt, int *weight, int total)
{
    int upper_bound = -1;

    if (!rt6_is_dead(rt)) {
        *weight += rt->fib6_nh->fib_nh_weight;
        upper_bound = DIV_ROUND_CLOSEST_ULL((u64) (*weight) << 31,
                                            total) - 1;
    }
    atomic_set(&rt->fib6_nh->fib_nh_upper_bound, upper_bound);
}

static void rt6_multipath_upper_bound_set(struct fib6_info *rt, int total)
{
    struct fib6_info *iter;
    int weight = 0;

    rt6_upper_bound_set(rt, &weight, total);

    list_for_each_entry(iter, &rt->fib6_siblings, fib6_siblings)
        rt6_upper_bound_set(iter, &weight, total);
}
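
A worked example helps to see how the bounds split the 31-bit hash space. The user-space sketch below mirrors the arithmetic above; compute_upper_bounds and div_round_closest are hypothetical names, and the dead array stands in for rt6_is_dead().

```c
#include <stddef.h>
#include <stdint.h>

/* Same rounding as the kernel's DIV_ROUND_CLOSEST_ULL(). */
static uint64_t div_round_closest(uint64_t x, uint32_t divisor)
{
    return (x + divisor / 2) / divisor;
}

/*
 * weight[i] and dead[i] describe path i; bound[i] receives its upper bound.
 * Dead paths get -1 and do not contribute to the cumulative weight.
 */
void compute_upper_bounds(const uint32_t *weight, const int *dead,
                          size_t n, int32_t *bound)
{
    uint32_t total = 0, cumulative = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (!dead[i])
            total += weight[i];

    for (i = 0; i < n; i++) {
        if (dead[i]) {
            bound[i] = -1;
            continue;
        }
        /* A live path implies total > 0, so the division is safe. */
        cumulative += weight[i];
        bound[i] = (int32_t)(div_round_closest((uint64_t)cumulative << 31,
                                               total) - 1);
    }
}
```

With weights 1, 2 and 1 and all paths alive, the bounds come out as 536870911, 1610612735 and 2147483647, so the three paths cover roughly 25%, 50% and 25% of the hash range. The last live path always ends at 2147483647, the largest 31-bit hash value.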

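Multipath nexthop groups

The core work is done by the function nexthop_select_path. Once the desired next hop is obtained, it is assigned to the nh member of the fib6_result. For a nexthop with the blackhole attribute, the type is set to RTN_BLACKHOLE and the RTF_REJECT flag is added.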

static inline void nexthop_path_fib6_result(struct fib6_result *res, int hash)
{
    struct nexthop *nh = res->f6i->nh;
    struct nh_info *nhi;

    nh = nexthop_select_path(nh, hash);

    nhi = rcu_dereference_rtnl(nh->nh_info);
    if (nhi->reject_nh) {
        res->fib6_type = RTN_BLACKHOLE;
        res->fib6_flags |= RTF_REJECT;
        res->nh = nexthop_fib6_nh(nh);
    } else {
        res->nh = &nhi->fib6_nh;
    }
}

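For a blackhole route, the nh member of fib6_result is taken from the first member of the nexthop group, or, when the nexthop is not a group, directly from the nexthop itself.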

static inline struct fib6_nh *nexthop_fib6_nh(struct nexthop *nh)
{
    struct nh_info *nhi;

    if (nh->is_group) {
        struct nh_group *nh_grp;

        nh_grp = rcu_dereference_rtnl(nh->nh_grp);
        nh = nexthop_mpath_select(nh_grp, 0);
        if (!nh)
            return NULL;
    }

    nhi = rcu_dereference_rtnl(nh->nh_info);
    if (nhi->family == AF_INET6)
        return &nhi->fib6_nh;

    return NULL;
}

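If the nexthop is not a group, nh is returned and no path selection is needed. Otherwise, all members of the group are walked: members whose upper_bound is smaller than the hash are skipped, and the first remaining member that passes the reachability check is returned. If none passes, the first member whose upper_bound covers the hash is returned as a fallback.

Unlike ordinary routes, nexthop objects always perform the reachability check.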

struct nexthop *nexthop_select_path(struct nexthop *nh, int hash)
{
    struct nexthop *rc = NULL;
    struct nh_group *nhg;
    int i;

    if (!nh->is_group)
        return nh;

    nhg = rcu_dereference(nh->nh_grp);
    for (i = 0; i < nhg->num_nh; ++i) {
        struct nh_grp_entry *nhge = &nhg->nh_entries[i];
        struct nh_info *nhi;

        if (hash > atomic_read(&nhge->upper_bound))
            continue;

        nhi = rcu_dereference(nhge->nh->nh_info);
        if (nhi->fdb_nh)
            return nhge->nh;

        /* nexthops always check if it is good and does
         * not rely on a sysctl for this behavior
         */
        switch (nhi->family) {
        case AF_INET:
            if (ipv4_good_nh(&nhi->fib_nh))
                return nhge->nh;
            break;
        case AF_INET6:
            if (ipv6_good_nh(&nhi->fib6_nh))
                return nhge->nh;
            break;
        }

        if (!rc)
            rc = nhge->nh;
    }

    return rc;
}

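ipv6_good_nh looks up the neighbour table using the next hop's interface and gateway and returns true if the entry is in a valid state (NUD_VALID). If no neighbour entry exists, the state defaults to NUD_REACHABLE, so the next hop is also treated as good.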

static bool ipv6_good_nh(const struct fib6_nh *nh)
{
    int state = NUD_REACHABLE;
    struct neighbour *n;

    rcu_read_lock_bh();

    n = __ipv6_neigh_lookup_noref_stub(nh->fib_nh_dev, &nh->fib_nh_gw6);
    if (n)
        state = n->nud_state;

    rcu_read_unlock_bh();

    return !!(state & NUD_VALID);
}
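
The selection and fallback logic of nexthop_select_path can be modeled as below. This is a sketch only: struct grp_entry and group_select are hypothetical names, the reachable field stands in for ipv4_good_nh()/ipv6_good_nh(), and the fdb_nh shortcut is left out.

```c
/* Hypothetical stand-in for one nexthop group member. */
struct grp_entry {
    int upper_bound; /* models nh_grp_entry upper_bound */
    int reachable;   /* models the result of the good-nexthop check */
};

/*
 * Returns the index of the selected member, or -1 when no member's
 * range covers the hash (the kernel function returns NULL in that case).
 */
int group_select(const struct grp_entry *e, int n, int hash)
{
    int rc = -1;
    int i;

    for (i = 0; i < n; i++) {
        if (hash > e[i].upper_bound)
            continue;
        if (e[i].reachable)
            return i;
        /* Remember the first covering member as a fallback. */
        if (rc < 0)
            rc = i;
    }
    return rc;
}
```

Unlike the sibling walk in fib6_select_path, this loop keeps going past an unreachable member and tries the following ones before falling back.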

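ICMPv6 hash computation

For ICMPv6, the mp_hash value is computed before the route lookup. In the path selection function fib6_select_path, mp_hash is not recomputed if it has already been set. After ip6_route_input_lookup finishes the route lookup, fib6_select_path is used for path selection.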

void ip6_route_input(struct sk_buff *skb)
{
    const struct ipv6hdr *iph = ipv6_hdr(skb);
    struct net *net = dev_net(skb->dev);
    int flags = RT6_LOOKUP_F_HAS_SADDR | RT6_LOOKUP_F_DST_NOREF;
    struct ip_tunnel_info *tun_info;
    struct flowi6 fl6 = {
        .flowi6_iif = skb->dev->ifindex,
        .daddr = iph->daddr,
        .saddr = iph->saddr,
        .flowlabel = ip6_flowinfo(iph),
        .flowi6_mark = skb->mark,
        .flowi6_proto = iph->nexthdr,
    };
    struct flow_keys *flkeys = NULL, _flkeys;

    tun_info = skb_tunnel_info(skb);
    if (tun_info && !(tun_info->mode & IP_TUNNEL_INFO_TX))
        fl6.flowi6_tun_key.tun_id = tun_info->key.tun_id;

    if (fib6_rules_early_flow_dissect(net, skb, &fl6, &_flkeys))
        flkeys = &_flkeys;

    if (unlikely(fl6.flowi6_proto == IPPROTO_ICMPV6))
        fl6.mp_hash = rt6_multipath_hash(net, &fl6, skb, flkeys);
    skb_dst_drop(skb);
    skb_dst_set_noref(skb, ip6_route_input_lookup(net, skb->dev,
                                                  &fl6, skb, flags));
}

Input/output route selection

The function ip6_pol_route first calls the table lookup function fib6_table_lookup and then the path selection function fib6_select_path.

struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
                               int oif, struct flowi6 *fl6,
                               const struct sk_buff *skb, int flags)
{
    struct fib6_result res = {};
    struct rt6_info *rt = NULL;
    int strict = 0;

    WARN_ON_ONCE((flags & RT6_LOOKUP_F_DST_NOREF) &&
                 !rcu_read_lock_held());

    strict |= flags & RT6_LOOKUP_F_IFACE;
    strict |= flags & RT6_LOOKUP_F_IGNORE_LINKSTATE;
    if (net->ipv6.devconf_all->forwarding == 0)
        strict |= RT6_LOOKUP_F_REACHABLE;

    rcu_read_lock();

    fib6_table_lookup(net, table, oif, fl6, &res, strict);
    if (res.f6i == net->ipv6.fib6_null_entry)
        goto out;

    fib6_select_path(net, &res, fl6, oif, false, skb, strict);

In fib6_table_lookup, after the route node (fib6_node) matching the flow's source and destination IP addresses is found, the function rt6_select fills in the lookup result fib6_result.

int fib6_table_lookup(struct net *net, struct fib6_table *table, int oif,
                      struct flowi6 *fl6, struct fib6_result *res, int strict)
{
    struct fib6_node *fn, *saved_fn;

    fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
    saved_fn = fn;

    if (fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF)
        oif = 0;

redo_rt6_select:
    rt6_select(net, fn, oif, res, strict);

If the route node fn that was found has no leaf, or its leaf is the null route entry, the null route entry is used.

static void rt6_select(struct net *net, struct fib6_node *fn, int oif,
                       struct fib6_result *res, int strict)
{
    struct fib6_info *leaf = rcu_dereference(fn->leaf);
    struct fib6_info *rt0;
    bool do_rr = false;
    int key_plen;

    /* make sure this function or its helpers sets f6i */
    res->f6i = NULL;

    if (!leaf || leaf == net->ipv6.fib6_null_entry)
        goto out;

    rt0 = rcu_dereference(fn->rr_ptr);
    if (!rt0)
        rt0 = leaf;

    /* Double check to make sure fn is not an intermediate node
     * and fn->leaf does not points to its child's leaf
     * (This might happen if all routes under fn are deleted from
     * the tree and fib6_repair_tree() is called on the node.)
     */
    key_plen = rt0->fib6_dst.plen;
#ifdef CONFIG_IPV6_SUBTREES
    if (rt0->fib6_src.plen)
        key_plen = rt0->fib6_src.plen;
#endif
    if (fn->fn_bit != key_plen)
        goto out;

    find_rr_leaf(fn, leaf, rt0, oif, strict, &do_rr, res);

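The function find_rr_leaf searches the node's round-robin list and leaf list for a suitable route entry. It first walks the list starting at rr_head, until the list ends or an entry with a different metric is reached; that entry is recorded in cont. It then walks the leaf list from its head until it reaches rr_head. Finally, if no suitable entry was found and cont is set, it walks the list starting at cont until the end; this last pass no longer sets cont, so the metric is not checked.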

static void find_rr_leaf(struct fib6_node *fn, struct fib6_info *leaf,
                         struct fib6_info *rr_head, int oif, int strict,
                         bool *do_rr, struct fib6_result *res)
{
    u32 metric = rr_head->fib6_metric;
    struct fib6_info *cont = NULL;
    int mpri = -1;

    __find_rr_leaf(rr_head, NULL, metric, res, &cont,
                   oif, strict, do_rr, &mpri);

    __find_rr_leaf(leaf, rr_head, metric, res, &cont,
                   oif, strict, do_rr, &mpri);

    if (res->f6i || !cont)
        return;

    __find_rr_leaf(cont, NULL, metric, res, NULL,
                   oif, strict, do_rr, &mpri);
}
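
The order in which the three passes visit the entries can be sketched in user space. The model below is an assumption-laden illustration: rr_visit_order is a hypothetical name, the leaf list is represented as an array of metrics with rr as the index of rr_head, expired entries are ignored, and the third pass is always taken, which corresponds to the case where the first two passes found no match.

```c
#include <assert.h>
#include <stddef.h>

/*
 * metric[0..n-1] models the leaf list in order; rr is the index of rr_head.
 * Writes the visited indices to out (room for 2 * n entries is enough)
 * and returns how many entries were visited.
 */
size_t rr_visit_order(const unsigned int *metric, size_t n, size_t rr,
                      size_t *out)
{
    size_t count = 0;
    size_t cont = n; /* n means "no continuation point recorded" */
    size_t i;

    assert(rr < n);

    /* Pass 1: from rr_head until the list ends or the metric changes. */
    for (i = rr; i < n; i++) {
        if (metric[i] != metric[rr]) {
            cont = i;
            break;
        }
        out[count++] = i;
    }

    /* Pass 2: from the list head up to, but not including, rr_head. */
    for (i = 0; i < rr; i++) {
        if (metric[i] != metric[rr]) {
            cont = i;
            break;
        }
        out[count++] = i;
    }

    /* Pass 3: from the continuation point to the end, any metric. */
    for (i = cont; i < n; i++)
        out[count++] = i;

    return count;
}
```

For metrics 10, 10, 10, 20, 20 with rr_head at index 1, the visit order is 1, 2, 0, 3, 4: the entries sharing rr_head's metric are tried round-robin first, and the higher-metric entries only afterwards.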

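For each route entry with an equal metric (in the first pass above, which starts at rr_head, the first entry is rr_head itself, so the metric necessarily matches), a route that uses a nexthop object (uncommon) is first checked for being a blackhole. If it is, the corresponding type and flags are set and the search ends; otherwise the functions nexthop_for_each_fib6_nh and rt6_nh_find_match perform the search.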

static void __find_rr_leaf(struct fib6_info *f6i_start,
                           struct fib6_info *nomatch, u32 metric,
                           struct fib6_result *res, struct fib6_info **cont,
                           int oif, int strict, bool *do_rr, int *mpri)
{
    struct fib6_info *f6i;

    for (f6i = f6i_start;
         f6i && f6i != nomatch;
         f6i = rcu_dereference(f6i->fib6_next)) {
        bool matched = false;
        struct fib6_nh *nh;

        if (cont && f6i->fib6_metric != metric) {
            *cont = f6i;
            return;
        }

        if (fib6_check_expired(f6i))
            continue;

        if (unlikely(f6i->nh)) {
            struct fib6_nh_frl_arg arg = {
                .flags  = f6i->fib6_flags,
                .oif    = oif,
                .strict = strict,
                .mpri   = mpri,
                .do_rr  = do_rr
            };

            if (nexthop_is_blackhole(f6i->nh)) {
                res->fib6_flags = RTF_REJECT;
                res->fib6_type = RTN_BLACKHOLE;
                res->f6i = f6i;
                res->nh = nexthop_fib6_nh(f6i->nh);
                return;
            }
            if (nexthop_for_each_fib6_nh(f6i->nh, rt6_nh_find_match,
                                         &arg)) {
                matched = true;
                nh = arg.nh;
            }

For a route with a built-in next hop (no nexthop object), the function find_match decides whether the route's next hop is usable.

        } else {
            nh = f6i->fib6_nh;
            if (find_match(nh, f6i->fib6_flags, oif, strict,
                           mpri, do_rr))
                matched = true;
        }
        if (matched) {
            res->f6i = f6i;
            res->nh = nh;
            res->fib6_flags = f6i->fib6_flags;
            res->fib6_type = f6i->fib6_type;
        }
    }
}

For routes that use a nexthop object, the handler rt6_nh_find_match also relies on find_match to decide whether the next hop is usable, exactly as for built-in routes.

static int rt6_nh_find_match(struct fib6_nh *nh, void *_arg)
{
    struct fib6_nh_frl_arg *arg = _arg;

    arg->nh = nh;
    return find_match(nh, arg->flags, arg->oif, arg->strict,
                      arg->mpri, arg->do_rr);
}

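A next hop with the RTNH_F_DEAD flag set is not suitable. A next hop whose link is down (RTNH_F_LINKDOWN) is not suitable either when its device is configured to ignore link-down routes (ip6_ignore_linkdown) and the lookup does not set RT6_LOOKUP_F_IGNORE_LINKSTATE. After these checks, rt6_score_route computes the next hop's score. Because find_match is normally called in a loop, the do_rr (round-robin) flag and the best score are updated only when this next hop scores higher than every earlier one. In the end, if do_rr is true, rt6_select tries to update the route node's rr_ptr.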

static bool find_match(struct fib6_nh *nh, u32 fib6_flags,
                       int oif, int strict, int *mpri, bool *do_rr)
{
    bool match_do_rr = false;
    bool rc = false;
    int m;

    if (nh->fib_nh_flags & RTNH_F_DEAD)
        goto out;

    if (ip6_ignore_linkdown(nh->fib_nh_dev) &&
        nh->fib_nh_flags & RTNH_F_LINKDOWN &&
        !(strict & RT6_LOOKUP_F_IGNORE_LINKSTATE))
        goto out;

    m = rt6_score_route(nh, fib6_flags, oif, strict);
    if (m == RT6_NUD_FAIL_DO_RR) {
        match_do_rr = true;
        m = 0; /* lowest valid score */
    } else if (m == RT6_NUD_FAIL_HARD) {
        goto out;
    }

    if (strict & RT6_LOOKUP_F_REACHABLE)
        rt6_probe(nh);

    /* note that m can be RT6_NUD_FAIL_PROBE at this point */
    if (m > *mpri) {
        *do_rr = match_do_rr;
        *mpri = m;
        rc = true;
    }
out:
    return rc;
}
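
How repeated find_match calls settle on a winner can be shown with a small model. This is a sketch: pick_best is a hypothetical name, the scores array holds the values rt6_score_route would return for each candidate in list order, and the dead and link-down checks as well as rt6_probe are left out. The literals -1 and -3 are the values of RT6_NUD_FAIL_DO_RR and RT6_NUD_FAIL_HARD.

```c
/*
 * Returns the index of the winning candidate, or -1 if none matched.
 * *do_rr reports whether the winner asked for round-robin.
 */
int pick_best(const int *scores, int n, int *do_rr)
{
    int mpri = -1; /* best score so far, as in find_rr_leaf() */
    int best = -1;
    int i;

    *do_rr = 0;
    for (i = 0; i < n; i++) {
        int match_do_rr = 0;
        int m = scores[i];

        if (m == -1) {        /* RT6_NUD_FAIL_DO_RR */
            match_do_rr = 1;
            m = 0;            /* lowest valid score */
        } else if (m == -3) { /* RT6_NUD_FAIL_HARD */
            continue;
        }

        /* Strictly greater: the earliest of equal scores wins. */
        if (m > mpri) {
            *do_rr = match_do_rr;
            mpri = m;
            best = i;
        }
    }
    return best;
}
```

A score of -2 (RT6_NUD_FAIL_PROBE) is never greater than the initial mpri of -1, so such a candidate is never selected in this model.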

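Generic route selection

Unlike fib6_table_lookup above, here, after fib6_node_lookup has found the route node, the function rt6_device_match looks for a suitable route entry, provided the node's leaf is not NULL.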

INDIRECT_CALLABLE_SCOPE struct rt6_info *ip6_pol_route_lookup(struct net *net,
                                            struct fib6_table *table,
                                            struct flowi6 *fl6,
                                            const struct sk_buff *skb,
                                            int flags)
{
    struct fib6_result res = {};
    struct fib6_node *fn;
    struct rt6_info *rt;

    if (fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF)
        flags &= ~RT6_LOOKUP_F_IFACE;

    rcu_read_lock();
    fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
restart:
    res.f6i = rcu_dereference(fn->leaf);
    if (!res.f6i)
        res.f6i = net->ipv6.fib6_null_entry;
    else
        rt6_device_match(net, &res, &fl6->saddr, fl6->flowi6_oif,
                         flags);

    if (res.f6i == net->ipv6.fib6_null_entry) {
        fn = fib6_backtrack(fn, &fl6->saddr);
        if (fn)
            goto restart;

        rt = net->ipv6.ip6_null_entry;
        dst_hold(&rt->dst);
        goto out;
    } else if (res.fib6_flags & RTF_REJECT) {
        goto do_create;
    }

    fib6_select_path(net, &res, fl6, fl6->flowi6_oif,
                     fl6->flowi6_oif != 0, skb, flags);

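If the lookup specifies neither an output interface nor a source address, a route that uses a nexthop object is checked for being a blackhole; if it is, the result is initialized as a blackhole and processing ends. For a built-in route, the next hop is taken directly. In both cases the next hop is then checked for the RTNH_F_DEAD flag; if the flag is not set, this next hop is used and processing ends.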

static void rt6_device_match(struct net *net, struct fib6_result *res,
                             const struct in6_addr *saddr, int oif, int flags)
{
    struct fib6_info *f6i = res->f6i;
    struct fib6_info *spf6i;
    struct fib6_nh *nh;

    if (!oif && ipv6_addr_any(saddr)) {
        if (unlikely(f6i->nh)) {
            nh = nexthop_fib6_nh(f6i->nh);
            if (nexthop_is_blackhole(f6i->nh))
                goto out_blackhole;
        } else {
            nh = f6i->fib6_nh;
        }
        if (!(nh->fib_nh_flags & RTNH_F_DEAD))
            goto out;
    }

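Otherwise, the list starting at this fib6_info is walked. Routes that use a nexthop object are matched by rt6_nh_dev_match; built-in routes are matched by __rt6_device_match. On a successful match the result is returned.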

    for (spf6i = f6i; spf6i; spf6i = rcu_dereference(spf6i->fib6_next)) {
        bool matched = false;

        if (unlikely(spf6i->nh)) {
            nh = rt6_nh_dev_match(net, spf6i->nh, res, saddr,
                                  oif, flags);
            if (nh)
                matched = true;
        } else {
            nh = spf6i->fib6_nh;
            if (__rt6_device_match(net, nh, saddr, oif, flags))
                matched = true;
        }
        if (matched) {
            res->f6i = spf6i;
            goto out;
        }
    }

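If none of the above matched, and the lookup specified an output interface with strict interface matching required, the next hop of the null entry is returned. Otherwise the original route's next hop is used, unless it is a blackhole or is marked RTNH_F_DEAD, in which case the null entry is used as well.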

    if (oif && flags & RT6_LOOKUP_F_IFACE) {
        res->f6i = net->ipv6.fib6_null_entry;
        nh = res->f6i->fib6_nh;
        goto out;
    }

    if (unlikely(f6i->nh)) {
        nh = nexthop_fib6_nh(f6i->nh);
        if (nexthop_is_blackhole(f6i->nh))
            goto out_blackhole;
    } else {
        nh = f6i->fib6_nh;
    }

    if (nh->fib_nh_flags & RTNH_F_DEAD) {
        res->f6i = net->ipv6.fib6_null_entry;
        nh = res->f6i->fib6_nh;
    }
out:
    res->nh = nh;
    res->fib6_type = res->f6i->fib6_type;
    res->fib6_flags = res->f6i->fib6_flags;
    return;

out_blackhole:
    res->fib6_flags |= RTF_REJECT;
    res->fib6_type = RTN_BLACKHOLE;
    res->nh = nh;
}

Kernel version: 5.10
