Based on Stefano's recent patch for faster tests. Allow the user to specify which addresses are translated when used by the guest, rather than always being the gateway address or nothing. We also allow this remapping to go to the host's global address (more precisely the address assigned to the guest) rather than just host loopback. Along the way to implementing that make many changes to clarify what various addresses we track mean, fixing a number of small bugs as well. Paul, amongst other things, I think this will allow podman to (finally) nicely address #19213, picking an address to remap to the host's external address with --map-guest-addr, much like it already uses --dns-forward. Changes in v2: * Assorted minor stylistic fixes based on Stefano's review * Change name of the new options from --nat-* to --map-* * Shorten descriptions of new options in --help (leave the full text to the man page) * Add fix for the fact that changing MTU causes IPv6 to be temporarily deconfigured during perf tests David Gibson (23): treewide: Use "our address" instead of "forwarding address" util: Helper for formatting MAC addresses treewide: Rename MAC address fields for clarity treewide: Use struct assignment instead of memcpy() for IP addresses conf: Use array indices rather than pointers for DNS array slots conf: More accurately count entries added in get_dns() conf: Move DNS array bounds checks into add_dns[46] conf: Move adding of a nameserver from resolv.conf into subfunction conf: Correct setting of dns_match address in add_dns6() conf: Treat --dns addresses as guest visible addresses conf: Remove incorrect initialisation of addr_ll_seen util: Correct sock_l4() binding for link local addresses treewide: Change misleading 'addr_ll' name Clarify which addresses in ip[46]_ctx are meaningful where Initialise our_tap_ll to ip6.gw when suitable fwd: Helpers to clarify what host addresses aren't guest accessible fwd: Split notion of "our tap address" from gateway for IPv4 Don't take "our" MAC address from the host conf, fwd: Split notion of gateway/router from guest-visible host address test: Reconfigure IPv6 address after changing MTU conf: Allow address remapped to host to be configured fwd: Distinguish translatable from untranslatable addresses on inbound fwd, conf: Allow NAT of the guest's assigned address arp.c | 4 +- conf.c | 318 +++++++++++++++++++++++++----------------- dhcp.c | 21 +-- dhcpv6.c | 21 +-- flow.c | 72 +++++----- flow.h | 18 +-- fwd.c | 170 +++++++++++++++++----- icmp.c | 4 +- ndp.c | 9 +- passt.1 | 43 +++++- passt.c | 2 +- passt.h | 53 +++++-- pasta.c | 14 +- tap.c | 12 +- tcp.c | 33 ++--- tcp_internal.h | 2 +- test/lib/setup | 11 +- test/passt_in_ns/dhcp | 73 ++++++++++ test/passt_in_ns/tcp | 38 +++-- test/passt_in_ns/udp | 22 +-- test/perf/passt_tcp | 37 ++--- test/perf/passt_udp | 31 ++-- test/perf/pasta_tcp | 29 ++-- test/perf/pasta_udp | 25 ++-- test/run | 4 +- udp.c | 12 +- util.c | 22 ++- util.h | 4 +- 28 files changed, 712 insertions(+), 392 deletions(-) create mode 100644 test/passt_in_ns/dhcp -- 2.46.0
The term "forwarding address" to indicate the local-to-passt address was well-intentioned, but ends up being kinda confusing. As discussed on a recent call, let's try "our" instead. (While we're there correct an error in flow_initiate_af()s comments where we referred to parameters by the wrong name). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- flow.c | 72 +++++++++++++++++++++++++------------------------- flow.h | 18 ++++++------- fwd.c | 70 ++++++++++++++++++++++++------------------------ icmp.c | 4 +-- tcp.c | 33 ++++++++++++----------- tcp_internal.h | 2 +- udp.c | 12 ++++----- 7 files changed, 106 insertions(+), 105 deletions(-) diff --git a/flow.c b/flow.c index 93b687dc..02631eb2 100644 --- a/flow.c +++ b/flow.c @@ -127,18 +127,18 @@ static struct timespec flow_timer_run; * @af: Address family (AF_INET or AF_INET6) * @eaddr: Endpoint address (pointer to in_addr or in6_addr) * @eport: Endpoint port - * @faddr: Forwarding address (pointer to in_addr or in6_addr) - * @fport: Forwarding port + * @oaddr: Our address (pointer to in_addr or in6_addr) + * @oport: Our port */ static void flowside_from_af(struct flowside *side, sa_family_t af, const void *eaddr, in_port_t eport, - const void *faddr, in_port_t fport) + const void *oaddr, in_port_t oport) { - if (faddr) - inany_from_af(&side->faddr, af, faddr); + if (oaddr) + inany_from_af(&side->oaddr, af, oaddr); else - side->faddr = inany_any6; - side->fport = fport; + side->oaddr = inany_any6; + side->oport = oport; if (eaddr) inany_from_af(&side->eaddr, af, eaddr); @@ -193,8 +193,8 @@ static int flowside_sock_splice(void *arg) * @tgt: Target flowside * @data: epoll reference portion for protocol handlers * - * Return: socket fd of protocol @proto bound to the forwarding address and port - * from @tgt (if specified). + * Return: socket fd of protocol @proto bound to our address and port from @tgt + * (if specified). */ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif, const struct flowside *tgt, uint32_t data) @@ -205,11 +205,11 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif, ASSERT(pif_is_socket(pif)); - pif_sockaddr(c, &sa, &sl, pif, &tgt->faddr, tgt->fport); + pif_sockaddr(c, &sa, &sl, pif, &tgt->oaddr, tgt->oport); switch (pif) { case PIF_HOST: - if (inany_is_loopback(&tgt->faddr)) + if (inany_is_loopback(&tgt->oaddr)) ifname = NULL; else if (sa.sa_family == AF_INET) ifname = c->ip4.ifname_out; @@ -309,11 +309,11 @@ static void flow_set_state(struct flow_common *f, enum flow_state state) pif_name(f->pif[INISIDE]), inany_ntop(&ini->eaddr, estr0, sizeof(estr0)), ini->eport, - inany_ntop(&ini->faddr, fstr0, sizeof(fstr0)), - ini->fport, + inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)), + ini->oport, pif_name(f->pif[TGTSIDE]), - inany_ntop(&tgt->faddr, fstr1, sizeof(fstr1)), - tgt->fport, + inany_ntop(&tgt->oaddr, fstr1, sizeof(fstr1)), + tgt->oport, inany_ntop(&tgt->eaddr, estr1, sizeof(estr1)), tgt->eport); else if (MAX(state, oldstate) >= FLOW_STATE_INI) @@ -321,8 +321,8 @@ static void flow_set_state(struct flow_common *f, enum flow_state state) pif_name(f->pif[INISIDE]), inany_ntop(&ini->eaddr, estr0, sizeof(estr0)), ini->eport, - inany_ntop(&ini->faddr, fstr0, sizeof(fstr0)), - ini->fport); + inany_ntop(&ini->oaddr, fstr0, sizeof(fstr0)), + ini->oport); } /** @@ -347,7 +347,7 @@ static void flow_initiate_(union flow *flow, uint8_t pif) * flow_initiate_af() - Move flow to INI, setting INISIDE details * @flow: Flow to change state * @pif: pif of the initiating side - * @af: Address family of @eaddr and @faddr + * @af: Address family of @saddr and @daddr * @saddr: Source address (pointer to in_addr or in6_addr) * @sport: Endpoint port * @daddr: Destination address (pointer to in_addr or in6_addr) @@ -384,10 +384,10 @@ const struct flowside *flow_initiate_sa(union flow *flow, uint8_t pif, inany_from_sockaddr(&ini->eaddr, &ini->eport, ssa); if (inany_v4(&ini->eaddr)) - ini->faddr = inany_any4; + ini->oaddr = inany_any4; else - ini->faddr = inany_any6; - ini->fport = dport; + ini->oaddr = inany_any6; + ini->oport = dport; flow_initiate_(flow, pif); return ini; } @@ -432,8 +432,8 @@ const struct flowside *flow_target(const struct ctx *c, union flow *flow, pif_name(f->pif[INISIDE]), inany_ntop(&ini->eaddr, estr, sizeof(estr)), ini->eport, - inany_ntop(&ini->faddr, fstr, sizeof(fstr)), - ini->fport); + inany_ntop(&ini->oaddr, fstr, sizeof(fstr)), + ini->oport); } if (tgtpif == PIF_NONE) @@ -561,12 +561,12 @@ static uint64_t flow_hash(const struct ctx *c, uint8_t proto, uint8_t pif, { struct siphash_state state = SIPHASH_INIT(c->hash_secret); - inany_siphash_feed(&state, &side->faddr); + inany_siphash_feed(&state, &side->oaddr); inany_siphash_feed(&state, &side->eaddr); return siphash_final(&state, 38, (uint64_t)proto << 40 | (uint64_t)pif << 32 | - (uint64_t)side->fport << 16 | + (uint64_t)side->oport << 16 | (uint64_t)side->eport); } @@ -587,7 +587,7 @@ static uint64_t flow_sidx_hash(const struct ctx *c, flow_sidx_t sidx) * information, and at least a forwarding port. */ ASSERT(pif != PIF_NONE && !inany_is_unspecified(&side->eaddr) && - side->eport != 0 && side->fport != 0); + side->eport != 0 && side->oport != 0); return flow_hash(c, FLOW_PROTO(f), pif, side); } @@ -709,20 +709,20 @@ static flow_sidx_t flowside_lookup(const struct ctx *c, uint8_t proto, * @pif: Interface of the flow * @af: Address family, AF_INET or AF_INET6 * @eaddr: Guest side endpoint address (guest local address) - * @faddr: Guest side forwarding address (guest remote address) + * @oaddr: Our guest side address (guest remote address) * @eport: Guest side endpoint port (guest local port) - * @fport: Guest side forwarding port (guest remote port) + * @oport: Our guest side port (guest remote port) * * Return: sidx of the matching flow & side, FLOW_SIDX_NONE if not found */ flow_sidx_t flow_lookup_af(const struct ctx *c, uint8_t proto, uint8_t pif, sa_family_t af, - const void *eaddr, const void *faddr, - in_port_t eport, in_port_t fport) + const void *eaddr, const void *oaddr, + in_port_t eport, in_port_t oport) { struct flowside side; - flowside_from_af(&side, af, eaddr, eport, faddr, fport); + flowside_from_af(&side, af, eaddr, eport, oaddr, oport); return flowside_lookup(c, proto, pif, &side); } @@ -732,22 +732,22 @@ flow_sidx_t flow_lookup_af(const struct ctx *c, * @proto: Protocol of the flow (IP L4 protocol number) * @pif: Interface of the flow * @esa: Socket address of the endpoint - * @fport: Forwarding port number + * @oport: Our port number * * Return: sidx of the matching flow & side, FLOW_SIDX_NONE if not found */ flow_sidx_t flow_lookup_sa(const struct ctx *c, uint8_t proto, uint8_t pif, - const void *esa, in_port_t fport) + const void *esa, in_port_t oport) { struct flowside side = { - .fport = fport, + .oport = oport, }; inany_from_sockaddr(&side.eaddr, &side.eport, esa); if (inany_v4(&side.eaddr)) - side.faddr = inany_any4; + side.oaddr = inany_any4; else - side.faddr = inany_any6; + side.oaddr = inany_any6; return flowside_lookup(c, proto, pif, &side); } diff --git a/flow.h b/flow.h index 078fd605..d167b654 100644 --- a/flow.h +++ b/flow.h @@ -140,14 +140,14 @@ extern const uint8_t flow_proto[]; /** * struct flowside - Address information for one side of a flow * @eaddr: Endpoint address (remote address from passt's PoV) - * @faddr: Forwarding address (local address from passt's PoV) + * @oaddr: Our address (local address from passt's PoV) * @eport: Endpoint port - * @fport: Forwarding port + * @oport: Our port */ struct flowside { - union inany_addr faddr; + union inany_addr oaddr; union inany_addr eaddr; - in_port_t fport; + in_port_t oport; in_port_t eport; }; @@ -162,8 +162,8 @@ static inline bool flowside_eq(const struct flowside *left, { return inany_equals(&left->eaddr, &right->eaddr) && left->eport == right->eport && - inany_equals(&left->faddr, &right->faddr) && - left->fport == right->fport; + inany_equals(&left->oaddr, &right->oaddr) && + left->oport == right->oport; } int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif, @@ -240,10 +240,10 @@ uint64_t flow_hash_insert(const struct ctx *c, flow_sidx_t sidx); void flow_hash_remove(const struct ctx *c, flow_sidx_t sidx); flow_sidx_t flow_lookup_af(const struct ctx *c, uint8_t proto, uint8_t pif, sa_family_t af, - const void *eaddr, const void *faddr, - in_port_t eport, in_port_t fport); + const void *eaddr, const void *oaddr, + in_port_t eport, in_port_t oport); flow_sidx_t flow_lookup_sa(const struct ctx *c, uint8_t proto, uint8_t pif, - const void *esa, in_port_t fport); + const void *esa, in_port_t oport); union flow; diff --git a/fwd.c b/fwd.c index dea36f6c..b546bc41 100644 --- a/fwd.c +++ b/fwd.c @@ -167,7 +167,7 @@ void fwd_scan_ports_init(struct ctx *c) static bool is_dns_flow(uint8_t proto, const struct flowside *ini) { return ((proto == IPPROTO_UDP) || (proto == IPPROTO_TCP)) && - ((ini->fport == 53) || (ini->fport == 853)); + ((ini->oport == 53) || (ini->oport == 853)); } /** @@ -184,33 +184,33 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto, const struct flowside *ini, struct flowside *tgt) { if (is_dns_flow(proto, ini) && - inany_equals4(&ini->faddr, &c->ip4.dns_match)) + inany_equals4(&ini->oaddr, &c->ip4.dns_match)) tgt->eaddr = inany_from_v4(c->ip4.dns_host); else if (is_dns_flow(proto, ini) && - inany_equals6(&ini->faddr, &c->ip6.dns_match)) + inany_equals6(&ini->oaddr, &c->ip6.dns_match)) tgt->eaddr.a6 = c->ip6.dns_host; - else if (!c->no_map_gw && inany_equals4(&ini->faddr, &c->ip4.gw)) + else if (!c->no_map_gw && inany_equals4(&ini->oaddr, &c->ip4.gw)) tgt->eaddr = inany_loopback4; - else if (!c->no_map_gw && inany_equals6(&ini->faddr, &c->ip6.gw)) + else if (!c->no_map_gw && inany_equals6(&ini->oaddr, &c->ip6.gw)) tgt->eaddr = inany_loopback6; else - tgt->eaddr = ini->faddr; + tgt->eaddr = ini->oaddr; - tgt->eport = ini->fport; + tgt->eport = ini->oport; /* The relevant addr_out controls the host side source address. This * may be unspecified, which allows the kernel to pick an address. */ if (inany_v4(&tgt->eaddr)) - tgt->faddr = inany_from_v4(c->ip4.addr_out); + tgt->oaddr = inany_from_v4(c->ip4.addr_out); else - tgt->faddr.a6 = c->ip6.addr_out; + tgt->oaddr.a6 = c->ip6.addr_out; /* Let the kernel pick a host side source port */ - tgt->fport = 0; + tgt->oport = 0; if (proto == IPPROTO_UDP) { /* But for UDP we preserve the source port */ - tgt->fport = ini->eport; + tgt->oport = ini->eport; } return PIF_HOST; @@ -230,13 +230,13 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto, const struct flowside *ini, struct flowside *tgt) { if (!inany_is_loopback(&ini->eaddr) || - (!inany_is_loopback(&ini->faddr) && !inany_is_unspecified(&ini->faddr))) { + (!inany_is_loopback(&ini->oaddr) && !inany_is_unspecified(&ini->oaddr))) { char estr[INANY_ADDRSTRLEN], fstr[INANY_ADDRSTRLEN]; debug("Non loopback address on %s: [%s]:%hu -> [%s]:%hu", pif_name(PIF_SPLICE), inany_ntop(&ini->eaddr, estr, sizeof(estr)), ini->eport, - inany_ntop(&ini->faddr, fstr, sizeof(fstr)), ini->fport); + inany_ntop(&ini->oaddr, fstr, sizeof(fstr)), ini->oport); return PIF_NONE; } @@ -248,20 +248,20 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto, /* Preserve the specific loopback adddress used, but let the kernel pick * a source port on the target side */ - tgt->faddr = ini->eaddr; - tgt->fport = 0; + tgt->oaddr = ini->eaddr; + tgt->oport = 0; - tgt->eport = ini->fport; + tgt->eport = ini->oport; if (proto == IPPROTO_TCP) tgt->eport += c->tcp.fwd_out.delta[tgt->eport]; else if (proto == IPPROTO_UDP) tgt->eport += c->udp.fwd_out.delta[tgt->eport]; /* Let the kernel pick a host side source port */ - tgt->fport = 0; + tgt->oport = 0; if (proto == IPPROTO_UDP) /* But for UDP preserve the source port */ - tgt->fport = ini->eport; + tgt->oport = ini->eport; return PIF_HOST; } @@ -280,7 +280,7 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, const struct flowside *ini, struct flowside *tgt) { /* Common for spliced and non-spliced cases */ - tgt->eport = ini->fport; + tgt->eport = ini->oport; if (proto == IPPROTO_TCP) tgt->eport += c->tcp.fwd_in.delta[tgt->eport]; else if (proto == IPPROTO_UDP) @@ -293,11 +293,11 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, /* Preserve the specific loopback adddress used, but let the * kernel pick a source port on the target side */ - tgt->faddr = ini->eaddr; - tgt->fport = 0; + tgt->oaddr = ini->eaddr; + tgt->oport = 0; if (proto == IPPROTO_UDP) /* But for UDP preserve the source port */ - tgt->fport = ini->eport; + tgt->oport = ini->eport; if (inany_v4(&ini->eaddr)) tgt->eaddr = inany_loopback4; @@ -307,26 +307,26 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, return PIF_SPLICE; } - tgt->faddr = ini->eaddr; - tgt->fport = ini->eport; + tgt->oaddr = ini->eaddr; + tgt->oport = ini->eport; - if (inany_is_loopback4(&tgt->faddr) || - inany_is_unspecified4(&tgt->faddr) || - inany_equals4(&tgt->faddr, &c->ip4.addr_seen)) { - tgt->faddr = inany_from_v4(c->ip4.gw); - } else if (inany_is_loopback6(&tgt->faddr) || - inany_equals6(&tgt->faddr, &c->ip6.addr_seen) || - inany_equals6(&tgt->faddr, &c->ip6.addr)) { + if (inany_is_loopback4(&tgt->oaddr) || + inany_is_unspecified4(&tgt->oaddr) || + inany_equals4(&tgt->oaddr, &c->ip4.addr_seen)) { + tgt->oaddr = inany_from_v4(c->ip4.gw); + } else if (inany_is_loopback6(&tgt->oaddr) || + inany_equals6(&tgt->oaddr, &c->ip6.addr_seen) || + inany_equals6(&tgt->oaddr, &c->ip6.addr)) { if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) - tgt->faddr.a6 = c->ip6.gw; + tgt->oaddr.a6 = c->ip6.gw; else - tgt->faddr.a6 = c->ip6.addr_ll; + tgt->oaddr.a6 = c->ip6.addr_ll; } - if (inany_v4(&tgt->faddr)) { + if (inany_v4(&tgt->oaddr)) { tgt->eaddr = inany_from_v4(c->ip4.addr_seen); } else { - if (inany_is_linklocal6(&tgt->faddr)) + if (inany_is_linklocal6(&tgt->oaddr)) tgt->eaddr.a6 = c->ip6.addr_ll_seen; else tgt->eaddr.a6 = c->ip6.addr_seen; diff --git a/icmp.c b/icmp.c index cb81c768..f514dbc9 100644 --- a/icmp.c +++ b/icmp.c @@ -125,13 +125,13 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref) ini->eport, seq); if (pingf->f.type == FLOW_PING4) { - const struct in_addr *saddr = inany_v4(&ini->faddr); + const struct in_addr *saddr = inany_v4(&ini->oaddr); const struct in_addr *daddr = inany_v4(&ini->eaddr); ASSERT(saddr && daddr); /* Must have IPv4 addresses */ tap_icmp4_send(c, *saddr, *daddr, buf, n); } else if (pingf->f.type == FLOW_PING6) { - const struct in6_addr *saddr = &ini->faddr.a6; + const struct in6_addr *saddr = &ini->oaddr.a6; const struct in6_addr *daddr = &ini->eaddr.a6; tap_icmp6_send(c, saddr, daddr, buf, n); diff --git a/tcp.c b/tcp.c index c0820ce7..f01fe8f9 100644 --- a/tcp.c +++ b/tcp.c @@ -361,8 +361,8 @@ static const char *tcp_flag_str[] __attribute((__unused__)) = { static int tcp_sock_init_ext [NUM_PORTS][IP_VERSIONS]; static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS]; -/* Table of guest side forwarding addresses with very low RTT (assumed - * to be local to the host), LRU +/* Table of our guest side addresses with very low RTT (assumed to be local to + * the host), LRU */ static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE]; @@ -663,7 +663,7 @@ static int tcp_rtt_dst_low(const struct tcp_tap_conn *conn) int i; for (i = 0; i < LOW_RTT_TABLE_SIZE; i++) - if (inany_equals(&tapside->faddr, low_rtt_dst + i)) + if (inany_equals(&tapside->oaddr, low_rtt_dst + i)) return 1; return 0; @@ -686,7 +686,7 @@ static void tcp_rtt_dst_check(const struct tcp_tap_conn *conn, return; for (i = 0; i < LOW_RTT_TABLE_SIZE; i++) { - if (inany_equals(&tapside->faddr, low_rtt_dst + i)) + if (inany_equals(&tapside->oaddr, low_rtt_dst + i)) return; if (hole == -1 && IN6_IS_ADDR_UNSPECIFIED(low_rtt_dst + i)) hole = i; @@ -698,7 +698,7 @@ static void tcp_rtt_dst_check(const struct tcp_tap_conn *conn, if (hole == -1) return; - low_rtt_dst[hole++] = tapside->faddr; + low_rtt_dst[hole++] = tapside->oaddr; if (hole == LOW_RTT_TABLE_SIZE) hole = 0; inany_from_af(low_rtt_dst + hole, AF_INET6, &in6addr_any); @@ -881,7 +881,7 @@ static void tcp_fill_header(struct tcphdr *th, { const struct flowside *tapside = TAPFLOW(conn); - th->source = htons(tapside->fport); + th->source = htons(tapside->oport); th->dest = htons(tapside->eport); th->seq = htonl(seq); th->ack_seq = htonl(conn->seq_ack_to_tap); @@ -913,7 +913,7 @@ static size_t tcp_fill_headers4(const struct tcp_tap_conn *conn, uint32_t seq) { const struct flowside *tapside = TAPFLOW(conn); - const struct in_addr *src4 = inany_v4(&tapside->faddr); + const struct in_addr *src4 = inany_v4(&tapside->oaddr); const struct in_addr *dst4 = inany_v4(&tapside->eaddr); size_t l4len = dlen + sizeof(*th); size_t l3len = l4len + sizeof(*iph); @@ -957,7 +957,7 @@ static size_t tcp_fill_headers6(const struct tcp_tap_conn *conn, size_t l4len = dlen + sizeof(*th); ip6h->payload_len = htons(l4len); - ip6h->saddr = tapside->faddr.a6; + ip6h->saddr = tapside->oaddr.a6; ip6h->daddr = tapside->eaddr.a6; ip6h->hop_limit = 255; @@ -992,7 +992,7 @@ size_t tcp_l2_buf_fill_headers(const struct tcp_tap_conn *conn, const uint16_t *check, uint32_t seq) { const struct flowside *tapside = TAPFLOW(conn); - const struct in_addr *a4 = inany_v4(&tapside->faddr); + const struct in_addr *a4 = inany_v4(&tapside->oaddr); if (a4) { return tcp_fill_headers4(conn, iov[TCP_IOV_TAP].iov_base, @@ -1417,15 +1417,15 @@ static void tcp_bind_outbound(const struct ctx *c, socklen_t sl; - pif_sockaddr(c, &bind_sa, &sl, PIF_HOST, &tgt->faddr, tgt->fport); - if (!inany_is_unspecified(&tgt->faddr) || tgt->fport) { + pif_sockaddr(c, &bind_sa, &sl, PIF_HOST, &tgt->oaddr, tgt->oport); + if (!inany_is_unspecified(&tgt->oaddr) || tgt->oport) { if (bind(s, &bind_sa.sa, sl)) { char sstr[INANY_ADDRSTRLEN]; flow_dbg(conn, "Can't bind TCP outbound socket to %s:%hu: %s", - inany_ntop(&tgt->faddr, sstr, sizeof(sstr)), - tgt->fport, strerror(errno)); + inany_ntop(&tgt->oaddr, sstr, sizeof(sstr)), + tgt->oport, strerror(errno)); } } @@ -1497,12 +1497,12 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af, conn = FLOW_SET_TYPE(flow, FLOW_TCP, tcp); if (!inany_is_unicast(&ini->eaddr) || ini->eport == 0 || - !inany_is_unicast(&ini->faddr) || ini->fport == 0) { + !inany_is_unicast(&ini->oaddr) || ini->oport == 0) { char sstr[INANY_ADDRSTRLEN], dstr[INANY_ADDRSTRLEN]; debug("Invalid endpoint in TCP SYN: %s:%hu -> %s:%hu", inany_ntop(&ini->eaddr, sstr, sizeof(sstr)), ini->eport, - inany_ntop(&ini->faddr, dstr, sizeof(dstr)), ini->fport); + inany_ntop(&ini->oaddr, dstr, sizeof(dstr)), ini->oport); goto cancel; } @@ -2100,7 +2100,8 @@ void tcp_listen_handler(struct ctx *c, union epoll_ref ref, goto cancel; /* FIXME: When listening port has a specific bound address, record that - * as the forwarding address */ + * as our address + */ ini = flow_initiate_sa(flow, ref.tcp_listen.pif, &sa, ref.tcp_listen.port); diff --git a/tcp_internal.h b/tcp_internal.h index 8b60aabc..aa8bb64f 100644 --- a/tcp_internal.h +++ b/tcp_internal.h @@ -44,7 +44,7 @@ #define TAPFLOW(conn_) (&((conn_)->f.side[TAPSIDE(conn_)])) #define TAP_SIDX(conn_) (FLOW_SIDX((conn_), TAPSIDE(conn_))) -#define CONN_V4(conn) (!!inany_v4(&TAPFLOW(conn)->faddr)) +#define CONN_V4(conn) (!!inany_v4(&TAPFLOW(conn)->oaddr)) #define CONN_V6(conn) (!CONN_V4(conn)) /* diff --git a/udp.c b/udp.c index 77312572..57dcc667 100644 --- a/udp.c +++ b/udp.c @@ -321,7 +321,7 @@ static void udp_splice_send(const struct ctx *c, size_t start, size_t n, static size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp, const struct flowside *toside, size_t dlen) { - const struct in_addr *src = inany_v4(&toside->faddr); + const struct in_addr *src = inany_v4(&toside->oaddr); const struct in_addr *dst = inany_v4(&toside->eaddr); size_t l4len = dlen + sizeof(bp->uh); size_t l3len = l4len + sizeof(*ip4h); @@ -333,7 +333,7 @@ static size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp, ip4h->saddr = src->s_addr; ip4h->check = csum_ip4_header(l3len, IPPROTO_UDP, *src, *dst); - bp->uh.source = htons(toside->fport); + bp->uh.source = htons(toside->oport); bp->uh.dest = htons(toside->eport); bp->uh.len = htons(l4len); csum_udp4(&bp->uh, *src, *dst, bp->data, dlen); @@ -357,15 +357,15 @@ static size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp, ip6h->payload_len = htons(l4len); ip6h->daddr = toside->eaddr.a6; - ip6h->saddr = toside->faddr.a6; + ip6h->saddr = toside->oaddr.a6; ip6h->version = 6; ip6h->nexthdr = IPPROTO_UDP; ip6h->hop_limit = 255; - bp->uh.source = htons(toside->fport); + bp->uh.source = htons(toside->oport); bp->uh.dest = htons(toside->eport); bp->uh.len = ip6h->payload_len; - csum_udp6(&bp->uh, &toside->faddr.a6, &toside->eaddr.a6, bp->data, dlen); + csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, bp->data, dlen); return l4len; } @@ -384,7 +384,7 @@ static void udp_tap_prepare(const struct mmsghdr *mmh, unsigned idx, struct udp_meta_t *bm = &udp_meta[idx]; size_t l4len; - if (!inany_v4(&toside->eaddr) || !inany_v4(&toside->faddr)) { + if (!inany_v4(&toside->eaddr) || !inany_v4(&toside->oaddr)) { l4len = udp_update_hdr6(&bm->ip6h, bp, toside, mmh[idx].msg_len); tap_hdr_update(&bm->taph, l4len + sizeof(bm->ip6h) + sizeof(udp6_eth_hdr)); -- 2.46.0
There are a couple of places where we somewhat messily open code formatting an Ethernet like MAC address for display. Add an eth_ntop() helper for this. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 7 +++---- dhcp.c | 5 ++--- util.c | 19 +++++++++++++++++++ util.h | 3 +++ 4 files changed, 27 insertions(+), 7 deletions(-) diff --git a/conf.c b/conf.c index ed097bdc..830f91a6 100644 --- a/conf.c +++ b/conf.c @@ -921,7 +921,8 @@ pasta_opts: */ static void conf_print(const struct ctx *c) { - char buf4[INET_ADDRSTRLEN], buf6[INET6_ADDRSTRLEN], ifn[IFNAMSIZ]; + char buf4[INET_ADDRSTRLEN], buf6[INET6_ADDRSTRLEN]; + char bufmac[ETH_ADDRSTRLEN], ifn[IFNAMSIZ]; int i; info("Template interface: %s%s%s%s%s", @@ -955,9 +956,7 @@ static void conf_print(const struct ctx *c) info("Namespace interface: %s", c->pasta_ifn); info("MAC:"); - info(" host: %02x:%02x:%02x:%02x:%02x:%02x", - c->mac[0], c->mac[1], c->mac[2], - c->mac[3], c->mac[4], c->mac[5]); + info(" host: %s", eth_ntop(c->mac, bufmac, sizeof(bufmac))); if (c->ifi4) { if (!c->no_dhcp) { diff --git a/dhcp.c b/dhcp.c index aa9f59da..acc5b03e 100644 --- a/dhcp.c +++ b/dhcp.c @@ -276,6 +276,7 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len) int dhcp(const struct ctx *c, const struct pool *p) { size_t mlen, dlen, offset = 0, opt_len, opt_off = 0; + char macstr[ETH_ADDRSTRLEN]; const struct ethhdr *eh; const struct iphdr *iph; const struct udphdr *uh; @@ -340,9 +341,7 @@ int dhcp(const struct ctx *c, const struct pool *p) return -1; } - info(" from %02x:%02x:%02x:%02x:%02x:%02x", - m->chaddr[0], m->chaddr[1], m->chaddr[2], - m->chaddr[3], m->chaddr[4], m->chaddr[5]); + info(" from %s", eth_ntop(m->chaddr, macstr, sizeof(macstr))); m->yiaddr = c->ip4.addr; mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len)); diff --git a/util.c b/util.c index 0b414045..7cb2c10f 100644 --- a/util.c +++ b/util.c @@ -676,6 +676,25 @@ const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size) return dst; } +/** eth_ntop() - Convert an Ethernet MAC address to text format + * @mac: MAC address + * @dst: Output buffer, minimum ETH_ADDRSTRLEN bytes + * @size: Size of buffer at @dst + * + * Return: On success, a non-null pointer to @dst, NULL on failure + */ +const char *eth_ntop(const unsigned char *mac, char *dst, size_t size) +{ + int len; + + len = snprintf(dst, size, "%02x:%02x:%02x:%02x:%02x:%02x", + mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]); + if (len < 0 || (size_t)len >= size) + return NULL; + + return dst; +} + /** str_ee_origin() - Convert socket extended error origin to a string * @ee: Socket extended error structure * diff --git a/util.h b/util.h index cb4d181c..a716849a 100644 --- a/util.h +++ b/util.h @@ -215,9 +215,12 @@ static inline const char *af_name(sa_family_t af) #define SOCKADDR_STRLEN MAX(SOCKADDR_INET_STRLEN, SOCKADDR_INET6_STRLEN) +#define ETH_ADDRSTRLEN (sizeof("00:11:22:33:44:55")) + struct sock_extended_err; const char *sockaddr_ntop(const void *sa, char *dst, socklen_t size); +const char *eth_ntop(const unsigned char *mac, char *dst, size_t size); const char *str_ee_origin(const struct sock_extended_err *ee); /** -- 2.46.0
c->mac isn't a great name, because it doesn't say whose mac address it is and it's not necessarily obvious in all the contexts we use it. Since this is specifically the address that we (passt/pasta) use on the tap interface, rename it to "our_tap_mac". Rename the "mac_guest" field to "guest_mac" to be grammatically consistent. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- arp.c | 4 ++-- conf.c | 10 +++++----- dhcpv6.c | 6 ++++-- ndp.c | 4 ++-- passt.c | 2 +- passt.h | 8 ++++---- pasta.c | 8 ++++---- tap.c | 12 ++++++------ 8 files changed, 28 insertions(+), 26 deletions(-) diff --git a/arp.c b/arp.c index 93b22c5d..53334dac 100644 --- a/arp.c +++ b/arp.c @@ -72,7 +72,7 @@ int arp(const struct ctx *c, const struct pool *p) ah->ar_op = htons(ARPOP_REPLY); memcpy(am->tha, am->sha, sizeof(am->tha)); - memcpy(am->sha, c->mac, sizeof(am->sha)); + memcpy(am->sha, c->our_tap_mac, sizeof(am->sha)); memcpy(swap, am->tip, sizeof(am->tip)); memcpy(am->tip, am->sip, sizeof(am->tip)); @@ -80,7 +80,7 @@ int arp(const struct ctx *c, const struct pool *p) l2len = sizeof(*eh) + sizeof(*ah) + sizeof(*am); memcpy(eh->h_dest, eh->h_source, sizeof(eh->h_dest)); - memcpy(eh->h_source, c->mac, sizeof(eh->h_source)); + memcpy(eh->h_source, c->our_tap_mac, sizeof(eh->h_source)); tap_send_single(c, eh, l2len); diff --git a/conf.c b/conf.c index 830f91a6..750fdc86 100644 --- a/conf.c +++ b/conf.c @@ -956,7 +956,7 @@ static void conf_print(const struct ctx *c) info("Namespace interface: %s", c->pasta_ifn); info("MAC:"); - info(" host: %s", eth_ntop(c->mac, bufmac, sizeof(bufmac))); + info(" host: %s", eth_ntop(c->our_tap_mac, bufmac, sizeof(bufmac))); if (c->ifi4) { if (!c->no_dhcp) { @@ -1289,7 +1289,7 @@ void conf(struct ctx *c, int argc, char **argv) if (c->mode != MODE_PASTA) die("--ns-mac-addr is for pasta mode only"); - parse_mac(c->mac_guest, optarg); + parse_mac(c->guest_mac, optarg); break; case 5: if (c->mode != MODE_PASTA) @@ -1500,7 +1500,7 @@ void conf(struct ctx *c, int argc, char **argv) break; case 'M': - parse_mac(c->mac, optarg); + parse_mac(c->our_tap_mac, optarg); break; case 'g': if (inet_pton(AF_INET6, optarg, &c->ip6.gw) && @@ -1629,9 +1629,9 @@ void conf(struct ctx *c, int argc, char **argv) nl_sock_init(c, false); if (!v6_only) - c->ifi4 = conf_ip4(ifi4, &c->ip4, c->mac); + c->ifi4 = conf_ip4(ifi4, &c->ip4, c->our_tap_mac); if (!v4_only) - c->ifi6 = conf_ip6(ifi6, &c->ip6, c->mac); + c->ifi6 = conf_ip6(ifi6, &c->ip6, c->our_tap_mac); if ((!c->ifi4 && !c->ifi6) || (*c->ip4.ifname_out && !c->ifi4) || (*c->ip6.ifname_out && !c->ifi6)) diff --git a/dhcpv6.c b/dhcpv6.c index 7dcca2a7..bbed41dc 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -574,8 +574,10 @@ void dhcpv6_init(const struct ctx *c) resp.server_id.duid_time = duid_time; resp_not_on_link.server_id.duid_time = duid_time; - memcpy(resp.server_id.duid_lladdr, c->mac, sizeof(c->mac)); - memcpy(resp_not_on_link.server_id.duid_lladdr, c->mac, sizeof(c->mac)); + memcpy(resp.server_id.duid_lladdr, + c->our_tap_mac, sizeof(c->our_tap_mac)); + memcpy(resp_not_on_link.server_id.duid_lladdr, + c->our_tap_mac, sizeof(c->our_tap_mac)); resp.ia_addr.addr = c->ip6.addr; } diff --git a/ndp.c b/ndp.c index 6dcb4872..9c0fef4a 100644 --- a/ndp.c +++ b/ndp.c @@ -247,7 +247,7 @@ int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr, memcpy(&na.target_addr, &ns->target_addr, sizeof(na.target_addr)); - memcpy(na.target_l2_addr.mac, c->mac, ETH_ALEN); + memcpy(na.target_l2_addr.mac, c->our_tap_mac, ETH_ALEN); } else if (ih->icmp6_type == RS) { size_t dns_s_len = 0; @@ -331,7 +331,7 @@ int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr, } dns_done: - memcpy(&ra.source_ll.mac, c->mac, ETH_ALEN); + memcpy(&ra.source_ll.mac, c->our_tap_mac, ETH_ALEN); } else { return 1; } diff --git a/passt.c b/passt.c index 4b3c306e..96374831 100644 --- a/passt.c +++ b/passt.c @@ -272,7 +272,7 @@ int main(int argc, char **argv) if ((!c.no_udp && udp_init(&c)) || (!c.no_tcp && tcp_init(&c))) exit(EXIT_FAILURE); - proto_update_l2_buf(c.mac_guest, c.mac); + proto_update_l2_buf(c.guest_mac, c.our_tap_mac); if (c.ifi4 && !c.no_dhcp) dhcp_init(); diff --git a/passt.h b/passt.h index ef684037..fe3e47d2 100644 --- a/passt.h +++ b/passt.h @@ -172,8 +172,8 @@ struct ip6_ctx { * @epollfd: File descriptor for epoll instance * @fd_tap_listen: File descriptor for listening AF_UNIX socket, if any * @fd_tap: AF_UNIX socket, tuntap device, or pre-opened socket - * @mac: Host MAC address - * @mac_guest: MAC address of guest or namespace, seen or configured + * @our_tap_mac: Pasta/passt's MAC on the tap link + * @guest_mac: MAC address of guest or namespace, seen or configured * @hash_secret: 128-bit secret for siphash functions * @ifi4: Index of template interface for IPv4, 0 if IPv4 disabled * @ip: IPv4 configuration @@ -226,8 +226,8 @@ struct ctx { int epollfd; int fd_tap_listen; int fd_tap; - unsigned char mac[ETH_ALEN]; - unsigned char mac_guest[ETH_ALEN]; + unsigned char our_tap_mac[ETH_ALEN]; + unsigned char guest_mac[ETH_ALEN]; uint64_t hash_secret[2]; unsigned int ifi4; diff --git a/pasta.c b/pasta.c index 1142f032..5f897d3e 100644 --- a/pasta.c +++ b/pasta.c @@ -294,10 +294,10 @@ void pasta_ns_conf(struct ctx *c) strerror(-rc)); /* Get or set MAC in target namespace */ - if (MAC_IS_ZERO(c->mac_guest)) - nl_link_get_mac(nl_sock_ns, c->pasta_ifi, c->mac_guest); + if (MAC_IS_ZERO(c->guest_mac)) + nl_link_get_mac(nl_sock_ns, c->pasta_ifi, c->guest_mac); else - rc = nl_link_set_mac(nl_sock_ns, c->pasta_ifi, c->mac_guest); + rc = nl_link_set_mac(nl_sock_ns, c->pasta_ifi, c->guest_mac); if (rc < 0) die("Couldn't set MAC address in namespace: %s", strerror(-rc)); @@ -392,7 +392,7 @@ void pasta_ns_conf(struct ctx *c) } } - proto_update_l2_buf(c->mac_guest, NULL); + proto_update_l2_buf(c->guest_mac, NULL); } /** diff --git a/tap.c b/tap.c index 87be3a6b..852d8376 100644 --- a/tap.c +++ b/tap.c @@ -118,8 +118,8 @@ static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto) struct ethhdr *eh = (struct ethhdr *)buf; /* TODO: ARP table lookup */ - memcpy(eh->h_dest, c->mac_guest, ETH_ALEN); - memcpy(eh->h_source, c->mac, ETH_ALEN); + memcpy(eh->h_dest, c->guest_mac, ETH_ALEN); + memcpy(eh->h_source, c->our_tap_mac, ETH_ALEN); eh->h_proto = ntohs(proto); return eh + 1; } @@ -946,9 +946,9 @@ void tap_add_packet(struct ctx *c, ssize_t l2len, char *p) eh = (struct ethhdr *)p; - if (memcmp(c->mac_guest, eh->h_source, ETH_ALEN)) { - memcpy(c->mac_guest, eh->h_source, ETH_ALEN); - proto_update_l2_buf(c->mac_guest, NULL); + if (memcmp(c->guest_mac, eh->h_source, ETH_ALEN)) { + memcpy(c->guest_mac, eh->h_source, ETH_ALEN); + proto_update_l2_buf(c->guest_mac, NULL); } switch (ntohs(eh->h_proto)) { @@ -1337,6 +1337,6 @@ void tap_sock_init(struct ctx *c) * sends us packets. Use the broadcast address so that our * first packets will reach it. */ - memset(&c->mac_guest, 0xff, sizeof(c->mac_guest)); + memset(&c->guest_mac, 0xff, sizeof(c->guest_mac)); } } -- 2.46.0
We rely on C11 already, so we can use clearer and more type-checkable struct assignment instead of mempcy() for copying IP addresses around. This exposes some "pointer could be const" warnings from cppcheck, so address those too. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 12 ++++++------ dhcpv6.c | 10 ++++++---- 2 files changed, 12 insertions(+), 10 deletions(-) diff --git a/conf.c b/conf.c index 750fdc86..9b05afeb 100644 --- a/conf.c +++ b/conf.c @@ -389,14 +389,14 @@ static void add_dns6(struct ctx *c, /* Guest or container can only access local addresses via redirect */ if (IN6_IS_ADDR_LOOPBACK(addr)) { if (!c->no_map_gw) { - memcpy(*conf, &c->ip6.gw, sizeof(**conf)); + **conf = c->ip6.gw; (*conf)++; if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match)) - memcpy(&c->ip6.dns_match, addr, sizeof(*addr)); + c->ip6.dns_match = *addr; } } else { - memcpy(*conf, addr, sizeof(**conf)); + **conf = *addr; (*conf)++; } @@ -632,7 +632,7 @@ static unsigned int conf_ip4(unsigned int ifi, ip4->prefix_len = 32; } - memcpy(&ip4->addr_seen, &ip4->addr, sizeof(ip4->addr_seen)); + ip4->addr_seen = ip4->addr; if (MAC_IS_ZERO(mac)) { int rc = nl_link_get_mac(nl_sock, ifi, mac); @@ -693,8 +693,8 @@ static unsigned int conf_ip6(unsigned int ifi, return 0; } - memcpy(&ip6->addr_seen, &ip6->addr, sizeof(ip6->addr)); - memcpy(&ip6->addr_ll_seen, &ip6->addr_ll, sizeof(ip6->addr_ll)); + ip6->addr_seen = ip6->addr; + ip6->addr_ll_seen = ip6->addr_ll; if (MAC_IS_ZERO(mac)) { rc = nl_link_get_mac(nl_sock, ifi, mac); diff --git a/dhcpv6.c b/dhcpv6.c index bbed41dc..1c595fd7 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -298,7 +298,8 @@ static struct opt_hdr *dhcpv6_ia_notonlink(const struct pool *p, { char buf[INET6_ADDRSTRLEN]; struct in6_addr req_addr; - struct opt_hdr *ia, *h; + const struct opt_hdr *h; + struct opt_hdr *ia; size_t offset; int ia_type; @@ -312,12 +313,13 @@ ia_ta: offset += sizeof(struct opt_ia_na); while ((h = dhcpv6_opt(p, &offset, OPT_IAAADR))) { - struct opt_ia_addr *opt_addr = (struct opt_ia_addr *)h; + const struct opt_ia_addr *opt_addr; if (ntohs(h->l) != OPT_VSIZE(ia_addr)) return NULL; - memcpy(&req_addr, &opt_addr->addr, sizeof(req_addr)); + opt_addr = (const struct opt_ia_addr *)h; + req_addr = opt_addr->addr; if (!IN6_ARE_ADDR_EQUAL(la, &req_addr)) { info("DHCPv6: requested address %s not on link", inet_ntop(AF_INET6, &req_addr, @@ -363,7 +365,7 @@ static size_t dhcpv6_dns_fill(const struct ctx *c, char *buf, int offset) srv->hdr.l = 0; } - memcpy(&srv->addr[i], &c->ip6.dns[i], sizeof(srv->addr[i])); + srv->addr[i] = c->ip6.dns[i]; srv->hdr.l += sizeof(srv->addr[i]); offset += sizeof(srv->addr[i]); } -- 2.46.0
Currently add_dns[46]() take a somewhat awkward double pointer to the entry in the c->ip[46].dns array to update. It turns out to be easier to work with indices into that array instead. This diff does add some lines, but it's comments, and will allow some future code reductions. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 73 +++++++++++++++++++++++++++++++++------------------------- 1 file changed, 41 insertions(+), 32 deletions(-) diff --git a/conf.c b/conf.c index 9b05afeb..2a52bc32 100644 --- a/conf.c +++ b/conf.c @@ -354,54 +354,65 @@ bind_all_fail: * add_dns4() - Possibly add the IPv4 address of a DNS resolver to configuration * @c: Execution context * @addr: Address found in /etc/resolv.conf - * @conf: Pointer to reference of current entry in array of IPv4 resolvers + * @idx: Index of free entry in array of IPv4 resolvers + * + * Return: Number of entries added (0 or 1) */ -static void add_dns4(struct ctx *c, const struct in_addr *addr, - struct in_addr **conf) +static unsigned add_dns4(struct ctx *c, const struct in_addr *addr, + unsigned idx) { + unsigned added = 0; + /* Guest or container can only access local addresses via redirect */ if (IN4_IS_ADDR_LOOPBACK(addr)) { if (!c->no_map_gw) { - **conf = c->ip4.gw; - (*conf)++; + c->ip4.dns[idx] = c->ip4.gw; + added++; if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match)) c->ip4.dns_match = c->ip4.gw; } } else { - **conf = *addr; - (*conf)++; + c->ip4.dns[idx] = *addr; + added++; } if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_host)) c->ip4.dns_host = *addr; + + return added; } /** * add_dns6() - Possibly add the IPv6 address of a DNS resolver to configuration * @c: Execution context * @addr: Address found in /etc/resolv.conf - * @conf: Pointer to reference of current entry in array of IPv6 resolvers + * @idx: Index of free entry in array of IPv6 resolvers + * + * Return: Number of entries added (0 or 1) */ -static void add_dns6(struct ctx *c, - struct in6_addr *addr, struct in6_addr **conf) +static unsigned add_dns6(struct ctx *c, struct in6_addr *addr, unsigned idx) { + unsigned added = 0; + /* Guest or container can only access local addresses via redirect */ if (IN6_IS_ADDR_LOOPBACK(addr)) { if (!c->no_map_gw) { - **conf = c->ip6.gw; - (*conf)++; + c->ip6.dns[idx] = c->ip6.gw; + added++; if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match)) c->ip6.dns_match = *addr; } } else { - **conf = *addr; - (*conf)++; + c->ip6.dns[idx] = *addr; + added++; } if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_host)) c->ip6.dns_host = *addr; + + return added; } /** @@ -410,18 +421,19 @@ static void add_dns6(struct ctx *c, */ static void get_dns(struct ctx *c) { - struct in6_addr *dns6 = &c->ip6.dns[0], dns6_tmp; - struct in_addr *dns4 = &c->ip4.dns[0], dns4_tmp; int dns4_set, dns6_set, dnss_set, dns_set, fd; + unsigned dns4_idx = 0, dns6_idx = 0; struct fqdn *s = c->dns_search; struct lineread resolvconf; + struct in6_addr dns6_tmp; + struct in_addr dns4_tmp; unsigned int added = 0; ssize_t line_len; char *line, *end; const char *p; - dns4_set = !c->ifi4 || !IN4_IS_ADDR_UNSPECIFIED(dns4); - dns6_set = !c->ifi6 || !IN6_IS_ADDR_UNSPECIFIED(dns6); + dns4_set = !c->ifi4 || !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns[0]); + dns6_set = !c->ifi6 || !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns[0]); dnss_set = !!*s->n || c->no_dns_search; dns_set = (dns4_set && dns6_set) || c->no_dns; @@ -442,17 +454,15 @@ static void get_dns(struct ctx *c) if (end) *end = 0; - if (!dns4_set && - dns4 - &c->ip4.dns[0] < ARRAY_SIZE(c->ip4.dns) - 1 + if (!dns4_set && dns4_idx < ARRAY_SIZE(c->ip4.dns) - 1 && inet_pton(AF_INET, p + 1, &dns4_tmp)) { - add_dns4(c, &dns4_tmp, &dns4); + dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx); added++; } - if (!dns6_set && - dns6 - &c->ip6.dns[0] < ARRAY_SIZE(c->ip6.dns) - 1 + if (!dns6_set && dns6_idx < ARRAY_SIZE(c->ip6.dns) - 1 && inet_pton(AF_INET6, p + 1, &dns6_tmp)) { - add_dns6(c, &dns6_tmp, &dns6); + dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx); added++; } } else if (!dnss_set && strstr(line, "search ") == line && @@ -1236,8 +1246,7 @@ void conf(struct ctx *c, int argc, char **argv) bool copy_addrs_opt = false, copy_routes_opt = false; enum fwd_ports_mode fwd_default = FWD_NONE; bool v4_only = false, v6_only = false; - struct in6_addr *dns6 = c->ip6.dns; - struct in_addr *dns4 = c->ip4.dns; + unsigned dns4_idx = 0, dns6_idx = 0; struct fqdn *dnss = c->dns_search; unsigned int ifi4 = 0, ifi6 = 0; const char *logfile = NULL; @@ -1662,13 +1671,13 @@ void conf(struct ctx *c, int argc, char **argv) if (!strcmp(optarg, "none")) { c->no_dns = 1; - dns4 = &c->ip4.dns[0]; + dns4_idx = 0; memset(c->ip4.dns, 0, sizeof(c->ip4.dns)); c->ip4.dns[0] = (struct in_addr){ 0 }; c->ip4.dns_match = (struct in_addr){ 0 }; c->ip4.dns_host = (struct in_addr){ 0 }; - dns6 = &c->ip6.dns[0]; + dns6_idx = 0; memset(c->ip6.dns, 0, sizeof(c->ip6.dns)); c->ip6.dns_match = (struct in6_addr){ 0 }; c->ip6.dns_host = (struct in6_addr){ 0 }; @@ -1678,15 +1687,15 @@ void conf(struct ctx *c, int argc, char **argv) c->no_dns = 0; - if (dns4 - &c->ip4.dns[0] < ARRAY_SIZE(c->ip4.dns) && + if (dns4_idx < ARRAY_SIZE(c->ip4.dns) && inet_pton(AF_INET, optarg, &dns4_tmp)) { - add_dns4(c, &dns4_tmp, &dns4); + dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx); continue; } - if (dns6 - &c->ip6.dns[0] < ARRAY_SIZE(c->ip6.dns) && + if (dns6_idx < ARRAY_SIZE(c->ip6.dns) && inet_pton(AF_INET6, optarg, &dns6_tmp)) { - add_dns6(c, &dns6_tmp, &dns6); + dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx); continue; } -- 2.46.0
get_dns() counts the number of guest DNS servers it adds, and gives an error if it couldn't add any. However, this count ignores the fact that add_dns[46]() may in some cases *not* add an entry. Use the array indices we're already tracking to get an accurate count. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/conf.c b/conf.c index 2a52bc32..d19013c1 100644 --- a/conf.c +++ b/conf.c @@ -427,7 +427,6 @@ static void get_dns(struct ctx *c) struct lineread resolvconf; struct in6_addr dns6_tmp; struct in_addr dns4_tmp; - unsigned int added = 0; ssize_t line_len; char *line, *end; const char *p; @@ -455,16 +454,12 @@ static void get_dns(struct ctx *c) *end = 0; if (!dns4_set && dns4_idx < ARRAY_SIZE(c->ip4.dns) - 1 - && inet_pton(AF_INET, p + 1, &dns4_tmp)) { + && inet_pton(AF_INET, p + 1, &dns4_tmp)) dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx); - added++; - } if (!dns6_set && dns6_idx < ARRAY_SIZE(c->ip6.dns) - 1 - && inet_pton(AF_INET6, p + 1, &dns6_tmp)) { + && inet_pton(AF_INET6, p + 1, &dns6_tmp)) dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx); - added++; - } } else if (!dnss_set && strstr(line, "search ") == line && s == c->dns_search) { end = strpbrk(line, "\n"); @@ -491,7 +486,7 @@ static void get_dns(struct ctx *c) out: if (!dns_set) { - if (!added) + if (!(dns4_idx + dns6_idx)) warn("Couldn't get any nameserver address"); if (c->no_dhcp_dns) -- 2.46.0
Every time we call add_dns[46] we need to first check if there's space in the c->ip[46].dns array for the new entry. We might as well make that check in add_dns[46]() itself. In fact it looks like the calls in get_dns() had an off by one error, not allowing the last entry of the array to be filled. So, that bug is also fixed by the change. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/conf.c b/conf.c index d19013c1..dfe417b4 100644 --- a/conf.c +++ b/conf.c @@ -363,6 +363,9 @@ static unsigned add_dns4(struct ctx *c, const struct in_addr *addr, { unsigned added = 0; + if (idx >= ARRAY_SIZE(c->ip4.dns)) + return 0; + /* Guest or container can only access local addresses via redirect */ if (IN4_IS_ADDR_LOOPBACK(addr)) { if (!c->no_map_gw) { @@ -395,6 +398,9 @@ static unsigned add_dns6(struct ctx *c, struct in6_addr *addr, unsigned idx) { unsigned added = 0; + if (idx >= ARRAY_SIZE(c->ip6.dns)) + return 0; + /* Guest or container can only access local addresses via redirect */ if (IN6_IS_ADDR_LOOPBACK(addr)) { if (!c->no_map_gw) { @@ -453,12 +459,10 @@ static void get_dns(struct ctx *c) if (end) *end = 0; - if (!dns4_set && dns4_idx < ARRAY_SIZE(c->ip4.dns) - 1 - && inet_pton(AF_INET, p + 1, &dns4_tmp)) + if (!dns4_set && inet_pton(AF_INET, p + 1, &dns4_tmp)) dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx); - if (!dns6_set && dns6_idx < ARRAY_SIZE(c->ip6.dns) - 1 - && inet_pton(AF_INET6, p + 1, &dns6_tmp)) + if (!dns6_set && inet_pton(AF_INET6, p + 1, &dns6_tmp)) dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx); } else if (!dnss_set && strstr(line, "search ") == line && s == c->dns_search) { @@ -1682,14 +1686,12 @@ void conf(struct ctx *c, int argc, char **argv) c->no_dns = 0; - if (dns4_idx < ARRAY_SIZE(c->ip4.dns) && - inet_pton(AF_INET, optarg, &dns4_tmp)) { + if (inet_pton(AF_INET, optarg, &dns4_tmp)) { dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx); continue; } - if (dns6_idx < ARRAY_SIZE(c->ip6.dns) && - inet_pton(AF_INET6, optarg, &dns6_tmp)) { + if (inet_pton(AF_INET6, optarg, &dns6_tmp)) { dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx); continue; } -- 2.46.0
get_dns() is already quite deeply nested, and future changes I have in mind will add more complexity. Prepare for this by splitting out the adding of a single nameserver to the configuration into its own function. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/conf.c b/conf.c index dfe417b4..4bb06b78 100644 --- a/conf.c +++ b/conf.c @@ -421,6 +421,29 @@ static unsigned add_dns6(struct ctx *c, struct in6_addr *addr, unsigned idx) return added; } +/** + * add_dns_resolv() - Possibly add ns from host resolv.conf to configuration + * @c: Execution context + * @nameserver: Nameserver address string from /etc/resolv.conf + * @idx4: Pointer to index of current entry in array of IPv4 resolvers + * @idx6: Pointer to index of current entry in array of IPv6 resolvers + * + * @idx4 or @idx6 may be NULL, in which case resolvers of the corresponding type + * are ignored. + */ +static void add_dns_resolv(struct ctx *c, const char *nameserver, + unsigned *idx4, unsigned *idx6) +{ + struct in6_addr ns6; + struct in_addr ns4; + + if (idx4 && inet_pton(AF_INET, nameserver, &ns4)) + *idx4 += add_dns4(c, &ns4, *idx4); + + if (idx6 && inet_pton(AF_INET6, nameserver, &ns6)) + *idx6 += add_dns6(c, &ns6, *idx6); +} + /** * get_dns() - Get nameserver addresses from local /etc/resolv.conf * @c: Execution context @@ -431,8 +454,6 @@ static void get_dns(struct ctx *c) unsigned dns4_idx = 0, dns6_idx = 0; struct fqdn *s = c->dns_search; struct lineread resolvconf; - struct in6_addr dns6_tmp; - struct in_addr dns4_tmp; ssize_t line_len; char *line, *end; const char *p; @@ -459,11 +480,9 @@ static void get_dns(struct ctx *c) if (end) *end = 0; - if (!dns4_set && inet_pton(AF_INET, p + 1, &dns4_tmp)) - dns4_idx += add_dns4(c, &dns4_tmp, dns4_idx); - - if (!dns6_set && inet_pton(AF_INET6, p + 1, &dns6_tmp)) - dns6_idx += add_dns6(c, &dns6_tmp, dns6_idx); + add_dns_resolv(c, p + 1, + dns4_set ? NULL : &dns4_idx, + dns6_set ? NULL : &dns6_idx); } else if (!dnss_set && strstr(line, "search ") == line && s == c->dns_search) { end = strpbrk(line, "\n"); -- 2.46.0
add_dns6() (but not add_dns4()) has a bug setting dns_match: it sets it to the given address, rather than the gateway address. This is doubly wrong: - We've just established the given address is a host loopback address the guest can't access - We've just set ip6.dns[] to tell the guest to use the gateway address, so it won't use the dns_match address we're setting Correct this to use the gateway address, like IPv4. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/conf.c b/conf.c index 4bb06b78..bf61c143 100644 --- a/conf.c +++ b/conf.c @@ -408,7 +408,7 @@ static unsigned add_dns6(struct ctx *c, struct in6_addr *addr, unsigned idx) added++; if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match)) - c->ip6.dns_match = *addr; + c->ip6.dns_match = c->ip6.gw; } } else { c->ip6.dns[idx] = *addr; -- 2.46.0
Although it's not 100% explicit in the man page, addresses given to the --dns option are intended to be addresses as seen by the guest. This differs from addresses taken from the host's /etc/resolv.conf, which must be translated to guest accessible versions in some cases. Our implementation is currently inconsistent on this: when using --dns-forward, you must usually also give --dns with the matching address, which is meaningful only in the guest's address view. However if you give --dns with a loopback addres, it will be translated like a host view address. Move the remapping logic for DNS addresses out of add_dns4() and add_dns6() into add_dns_resolv() so that it is only applied for host nameserver addresses, not for nameservers given explicitly with --dns. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 88 ++++++++++++++++++++++++++++----------------------------- passt.1 | 14 +++++---- 2 files changed, 52 insertions(+), 50 deletions(-) diff --git a/conf.c b/conf.c index bf61c143..3c102bcf 100644 --- a/conf.c +++ b/conf.c @@ -353,7 +353,7 @@ bind_all_fail: /** * add_dns4() - Possibly add the IPv4 address of a DNS resolver to configuration * @c: Execution context - * @addr: Address found in /etc/resolv.conf + * @addr: Guest nameserver IPv4 address * @idx: Index of free entry in array of IPv4 resolvers * * Return: Number of entries added (0 or 1) @@ -361,64 +361,29 @@ bind_all_fail: static unsigned add_dns4(struct ctx *c, const struct in_addr *addr, unsigned idx) { - unsigned added = 0; - if (idx >= ARRAY_SIZE(c->ip4.dns)) return 0; - /* Guest or container can only access local addresses via redirect */ - if (IN4_IS_ADDR_LOOPBACK(addr)) { - if (!c->no_map_gw) { - c->ip4.dns[idx] = c->ip4.gw; - added++; - - if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match)) - c->ip4.dns_match = c->ip4.gw; - } - } else { - c->ip4.dns[idx] = *addr; - added++; - } - - if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_host)) - c->ip4.dns_host = *addr; - - return added; + c->ip4.dns[idx] = *addr; + return 1; } /** * add_dns6() - Possibly add the IPv6 address of a DNS resolver to configuration * @c: Execution context - * @addr: Address found in /etc/resolv.conf + * @addr: Guest nameserver IPv6 address * @idx: Index of free entry in array of IPv6 resolvers * * Return: Number of entries added (0 or 1) */ -static unsigned add_dns6(struct ctx *c, struct in6_addr *addr, unsigned idx) +static unsigned add_dns6(struct ctx *c, const struct in6_addr *addr, + unsigned idx) { - unsigned added = 0; - if (idx >= ARRAY_SIZE(c->ip6.dns)) return 0; - /* Guest or container can only access local addresses via redirect */ - if (IN6_IS_ADDR_LOOPBACK(addr)) { - if (!c->no_map_gw) { - c->ip6.dns[idx] = c->ip6.gw; - added++; - - if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match)) - c->ip6.dns_match = c->ip6.gw; - } - } else { - c->ip6.dns[idx] = *addr; - added++; - } - - if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_host)) - c->ip6.dns_host = *addr; - - return added; + c->ip6.dns[idx] = *addr; + return 1; } /** @@ -437,11 +402,44 @@ static void add_dns_resolv(struct ctx *c, const char *nameserver, struct in6_addr ns6; struct in_addr ns4; - if (idx4 && inet_pton(AF_INET, nameserver, &ns4)) + if (idx4 && inet_pton(AF_INET, nameserver, &ns4)) { + if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_host)) + c->ip4.dns_host = ns4; + + /* Guest or container can only access local addresses via + * redirect + */ + if (IN4_IS_ADDR_LOOPBACK(&ns4)) { + if (c->no_map_gw) + return; + + ns4 = c->ip4.gw; + if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match)) + c->ip4.dns_match = c->ip4.gw; + } + *idx4 += add_dns4(c, &ns4, *idx4); + } + + if (idx6 && inet_pton(AF_INET6, nameserver, &ns6)) { + if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_host)) + c->ip6.dns_host = ns6; + + /* Guest or container can only access local addresses via + * redirect + */ + if (IN6_IS_ADDR_LOOPBACK(&ns6)) { + if (c->no_map_gw) + return; + + ns6 = c->ip6.gw; + + if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match)) + c->ip6.dns_match = c->ip6.gw; + } - if (idx6 && inet_pton(AF_INET6, nameserver, &ns6)) *idx6 += add_dns6(c, &ns6, *idx6); + } } /** diff --git a/passt.1 b/passt.1 index 3062b719..dca433b6 100644 --- a/passt.1 +++ b/passt.1 @@ -236,11 +236,15 @@ interface will be chosen instead. .TP .BR \-D ", " \-\-dns " " \fIaddr -Use \fIaddr\fR (IPv4 or IPv6) for DHCP, DHCPv6, NDP or DNS forwarding, as -configured (see options \fB--no-dhcp-dns\fR, \fB--dhcp-dns\fR, -\fB--dns-forward\fR) instead of reading addresses from \fI/etc/resolv.conf\fR. -This option can be specified multiple times. Specifying \fB-D none\fR disables -usage of DNS addresses altogether. +Instruct the guest (via DHCP, DHVPv6 or NDP) to use \fIaddr\fR (IPv4 +or IPv6) as a nameserver, as configured (see options +\fB--no-dhcp-dns\fR, \fB--dhcp-dns\fR) instead of reading addresses +from \fI/etc/resolv.conf\fR. This option can be specified multiple +times. Specifying \fB-D none\fR disables usage of DNS addresses +altogether. Unlike addresses from \fI/etc/resolv.conf\fR, \fIaddr\fR +is given to the guest without remapping. For example \fB--dns +127.0.0.1\fR will instruct the guest to use itself as nameserver, not +the host. .TP .BR \-\-dns-forward " " \fIaddr -- 2.46.0
Despite the names, addr_ll_seen does not relate to addr_ll the same way addr_seen relates to addr. addr_ll_seen is an observed address from the guest, whereas addr_ll is *our* link-local address for use on the tap link when we can't use an external endpoint address. It's used both for passt provided services (DHCPv6, NDP) and in some cases for connections from addresses the guest can't access. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 1 - 1 file changed, 1 deletion(-) diff --git a/conf.c b/conf.c index 3c102bcf..e5b5263f 100644 --- a/conf.c +++ b/conf.c @@ -720,7 +720,6 @@ static unsigned int conf_ip6(unsigned int ifi, } ip6->addr_seen = ip6->addr; - ip6->addr_ll_seen = ip6->addr_ll; if (MAC_IS_ZERO(mac)) { rc = nl_link_get_mac(nl_sock, ifi, mac); -- 2.46.0
When binding an IPv6 socket in sock_l4() we need to supply a scope id if the address is link-local. We check for this by comparing the given address to c->ip6.addr_ll. This is correct only by accident: while c->ip6.addr_ll is typically set to the host interface's link local address, the actual purpose of it is to provide a link local address for passt's private use on the tap interface. Instead set the scope id for any link-local address we're binding to. We're going to need something and this is what makes sense for sockets on the host. It doesn't make sense for PIF_SPLICE sockets, but those should always have loopback, not link-local addresses. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- util.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/util.c b/util.c index 7cb2c10f..f9f8113c 100644 --- a/util.c +++ b/util.c @@ -199,8 +199,7 @@ int sock_l4(const struct ctx *c, sa_family_t af, enum epoll_type type, if (bind_addr) { addr6.sin6_addr = *(struct in6_addr *)bind_addr; - if (!memcmp(bind_addr, &c->ip6.addr_ll, - sizeof(c->ip6.addr_ll))) + if (IN6_IS_ADDR_LINKLOCAL(bind_addr)) addr6.sin6_scope_id = c->ifi6; } return sock_l4_sa(c, type, &addr6, sizeof(addr6), ifname, -- 2.46.0
c->ip6.addr_ll is not like c->ip6.addr. The latter is an address for the guest, but the former is an address for our use on the tap link. Rename it accordingly, to 'our_tap_ll'. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 7 ++++--- dhcpv6.c | 2 +- fwd.c | 2 +- ndp.c | 2 +- passt.h | 4 ++-- 5 files changed, 9 insertions(+), 8 deletions(-) diff --git a/conf.c b/conf.c index e5b5263f..1130ce5d 100644 --- a/conf.c +++ b/conf.c @@ -713,7 +713,7 @@ static unsigned int conf_ip6(unsigned int ifi, rc = nl_addr_get(nl_sock, ifi, AF_INET6, IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL, - &prefix_len, &ip6->addr_ll); + &prefix_len, &ip6->our_tap_ll); if (rc < 0) { err("Couldn't discover IPv6 address: %s", strerror(-rc)); return 0; @@ -735,7 +735,7 @@ static unsigned int conf_ip6(unsigned int ifi, } if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) || - IN6_IS_ADDR_UNSPECIFIED(&ip6->addr_ll)) + IN6_IS_ADDR_UNSPECIFIED(&ip6->our_tap_ll)) return 0; return ifi; @@ -1027,7 +1027,8 @@ static void conf_print(const struct ctx *c) info(" router: %s", inet_ntop(AF_INET6, &c->ip6.gw, buf6, sizeof(buf6))); info(" our link-local: %s", - inet_ntop(AF_INET6, &c->ip6.addr_ll, buf6, sizeof(buf6))); + inet_ntop(AF_INET6, &c->ip6.our_tap_ll, + buf6, sizeof(buf6))); dns6: for (i = 0; !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns[i]); i++) { diff --git a/dhcpv6.c b/dhcpv6.c index 1c595fd7..fa1dcccc 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -456,7 +456,7 @@ int dhcpv6(struct ctx *c, const struct pool *p, if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) src = &c->ip6.gw; else - src = &c->ip6.addr_ll; + src = &c->ip6.our_tap_ll; mh = packet_get(p, 0, sizeof(*uh), sizeof(*mh), NULL); if (!mh) diff --git a/fwd.c b/fwd.c index b546bc41..dccc947d 100644 --- a/fwd.c +++ b/fwd.c @@ -320,7 +320,7 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) tgt->oaddr.a6 = c->ip6.gw; else - tgt->oaddr.a6 = c->ip6.addr_ll; + tgt->oaddr.a6 = c->ip6.our_tap_ll; } if (inany_v4(&tgt->oaddr)) { diff --git a/ndp.c b/ndp.c index 9c0fef4a..3a76b00a 100644 --- a/ndp.c +++ b/ndp.c @@ -344,7 +344,7 @@ dns_done: if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) rsaddr = &c->ip6.gw; else - rsaddr = &c->ip6.addr_ll; + rsaddr = &c->ip6.our_tap_ll; if (ih->icmp6_type == NS) { dlen = sizeof(struct ndp_na); diff --git a/passt.h b/passt.h index fe3e47d2..5e7e6a04 100644 --- a/passt.h +++ b/passt.h @@ -122,7 +122,7 @@ struct ip4_ctx { /** * struct ip6_ctx - IPv6 execution context * @addr: IPv6 address assigned to guest - * @addr_ll: Link-local IPv6 address on external, routable interface + * @our_tap_ll: Link-local IPv6 address for passt's use on tap * @addr_seen: Latest IPv6 global/site address seen as source from tap * @addr_ll_seen: Latest IPv6 link-local address seen as source from tap * @gw: Default IPv6 gateway @@ -136,7 +136,7 @@ struct ip4_ctx { */ struct ip6_ctx { struct in6_addr addr; - struct in6_addr addr_ll; + struct in6_addr our_tap_ll; struct in6_addr addr_seen; struct in6_addr addr_ll_seen; struct in6_addr gw; -- 2.46.0
Some are guest visible addresses and may not be valid on the host, others are host visible addresses and may not be valid on the guest. Rearrange and comment the ip[46]_ctx definitions to make it clearer which is which. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- passt.h | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/passt.h b/passt.h index 5e7e6a04..3b8a6283 100644 --- a/passt.h +++ b/passt.h @@ -104,15 +104,18 @@ enum passt_modes { * @no_copy_addrs: Don't copy all addresses when configuring namespace */ struct ip4_ctx { + /* PIF_TAP addresses */ struct in_addr addr; struct in_addr addr_seen; int prefix_len; struct in_addr gw; struct in_addr dns[MAXNS + 1]; struct in_addr dns_match; - struct in_addr dns_host; + /* PIF_HOST addresses */ + struct in_addr dns_host; struct in_addr addr_out; + char ifname_out[IFNAMSIZ]; bool no_copy_routes; @@ -122,12 +125,12 @@ struct ip4_ctx { /** * struct ip6_ctx - IPv6 execution context * @addr: IPv6 address assigned to guest - * @our_tap_ll: Link-local IPv6 address for passt's use on tap * @addr_seen: Latest IPv6 global/site address seen as source from tap * @addr_ll_seen: Latest IPv6 link-local address seen as source from tap * @gw: Default IPv6 gateway * @dns: DNS addresses for DHCPv6 and NDP, zero-terminated * @dns_match: Forward DNS query if sent to this address + * @our_tap_ll: Link-local IPv6 address for passt's use on tap * @dns_host: Use this DNS on the host for forwarding * @addr_out: Optional source address for outbound traffic * @ifname_out: Optional interface name to bind outbound sockets to @@ -135,16 +138,19 @@ struct ip4_ctx { * @no_copy_addrs: Don't copy all addresses when configuring namespace */ struct ip6_ctx { + /* PIF_TAP addresses */ struct in6_addr addr; - struct in6_addr our_tap_ll; struct in6_addr addr_seen; struct in6_addr addr_ll_seen; struct in6_addr gw; struct in6_addr dns[MAXNS + 1]; struct in6_addr dns_match; - struct in6_addr dns_host; + struct in6_addr our_tap_ll; + /* PIF_HOST addresses */ + struct in6_addr dns_host; struct in6_addr addr_out; + char ifname_out[IFNAMSIZ]; bool no_copy_routes; -- 2.46.0
In every place we use our_tap_ll, we only use it as a fallback if the IPv6 gateway address is not link-local. We can avoid that conditional at use time by doing it at initialisation of our_tap_ll instead. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 3 +++ dhcpv6.c | 5 +---- fwd.c | 5 +---- ndp.c | 5 +---- 4 files changed, 6 insertions(+), 12 deletions(-) diff --git a/conf.c b/conf.c index 1130ce5d..954f20ea 100644 --- a/conf.c +++ b/conf.c @@ -721,6 +721,9 @@ static unsigned int conf_ip6(unsigned int ifi, ip6->addr_seen = ip6->addr; + if (IN6_IS_ADDR_LINKLOCAL(&ip6->gw)) + ip6->our_tap_ll = ip6->gw; + if (MAC_IS_ZERO(mac)) { rc = nl_link_get_mac(nl_sock, ifi, mac); if (rc < 0) { diff --git a/dhcpv6.c b/dhcpv6.c index fa1dcccc..14a5c7e6 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -453,10 +453,7 @@ int dhcpv6(struct ctx *c, const struct pool *p, c->ip6.addr_ll_seen = *saddr; - if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) - src = &c->ip6.gw; - else - src = &c->ip6.our_tap_ll; + src = &c->ip6.our_tap_ll; mh = packet_get(p, 0, sizeof(*uh), sizeof(*mh), NULL); if (!mh) diff --git a/fwd.c b/fwd.c index dccc947d..75dc0151 100644 --- a/fwd.c +++ b/fwd.c @@ -317,10 +317,7 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, } else if (inany_is_loopback6(&tgt->oaddr) || inany_equals6(&tgt->oaddr, &c->ip6.addr_seen) || inany_equals6(&tgt->oaddr, &c->ip6.addr)) { - if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) - tgt->oaddr.a6 = c->ip6.gw; - else - tgt->oaddr.a6 = c->ip6.our_tap_ll; + tgt->oaddr.a6 = c->ip6.our_tap_ll; } if (inany_v4(&tgt->oaddr)) { diff --git a/ndp.c b/ndp.c index 3a76b00a..a1ee8349 100644 --- a/ndp.c +++ b/ndp.c @@ -341,10 +341,7 @@ dns_done: else c->ip6.addr_seen = *saddr; - if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) - rsaddr = &c->ip6.gw; - else - rsaddr = &c->ip6.our_tap_ll; + rsaddr = &c->ip6.our_tap_ll; if (ih->icmp6_type == NS) { dlen = sizeof(struct ndp_na); -- 2.46.0
We usually avoid NAT, but in a few cases we need to apply address translations. For inbound connections that happens for addresses which make sense to the host but are either inaccessible, or mean a different location from the guest's point of view. Add some helper functions to determine such addresses, and use them in fwd_nat_from_host(). In doing so clarify some of the reasons for the logic. We'll also have further use for these helpers in future. While we're there fix one unneccessary inconsistency between IPv4 and IPv6. We always translated the guest's observed address, but for IPv4 we didn't translate the guest's assigned address, whereas for IPv6 we did. Change this to translate both in all cases for consistency. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- fwd.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 87 insertions(+), 11 deletions(-) diff --git a/fwd.c b/fwd.c index 75dc0151..d6f8a233 100644 --- a/fwd.c +++ b/fwd.c @@ -170,6 +170,85 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini) ((ini->oport == 53) || (ini->oport == 853)); } +/** + * fwd_guest_accessible4() - Is IPv4 address guest-accessible + * @c: Execution context + * @addr: Host visible IPv4 address + * + * Return: true if @addr on the host is accessible to the guest without + * translation, false otherwise + */ +static bool fwd_guest_accessible4(const struct ctx *c, + const struct in_addr *addr) +{ + if (IN4_IS_ADDR_LOOPBACK(addr)) + return false; + + /* In socket interfaces 0.0.0.0 generally means "any" or unspecified, + * however on the wire it can mean "this host on this network". Since + * that has a different meaning for host and guest, we can't let it + * through untranslated. + */ + if (IN4_IS_ADDR_UNSPECIFIED(addr)) + return false; + + /* For IPv4, addr_seen is initialised to addr, so is always a valid + * address + */ + if (IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr) || + IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen)) + return false; + + return true; +} + +/** + * fwd_guest_accessible6() - Is IPv6 address guest-accessible + * @c: Execution context + * @addr: Host visible IPv6 address + * + * Return: true if @addr on the host is accessible to the guest without + * translation, false otherwise + */ +static bool fwd_guest_accessible6(const struct ctx *c, + const struct in6_addr *addr) +{ + if (IN6_IS_ADDR_LOOPBACK(addr)) + return false; + + if (IN6_ARE_ADDR_EQUAL(addr, &c->ip6.addr)) + return false; + + /* For IPv6, addr_seen starts unspecified, because we don't know what LL + * address the guest will take until we see it. Only check against it + * if it has been set to a real address. + */ + if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr_seen) && + IN6_ARE_ADDR_EQUAL(addr, &c->ip6.addr_seen)) + return false; + + return true; +} + +/** + * fwd_guest_accessible() - Is IPv[46] address guest-accessible + * @c: Execution context + * @addr: Host visible IPv[46] address + * + * Return: true if @addr on the host is accessible to the guest without + * translation, false otherwise + */ +static bool fwd_guest_accessible(const struct ctx *c, + const union inany_addr *addr) +{ + const struct in_addr *a4 = inany_v4(addr); + + if (a4) + return fwd_guest_accessible4(c, a4); + + return fwd_guest_accessible6(c, &addr->a6); +} + /** * fwd_nat_from_tap() - Determine to forward a flow from the tap interface * @c: Execution context @@ -307,18 +386,15 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, return PIF_SPLICE; } - tgt->oaddr = ini->eaddr; - tgt->oport = ini->eport; - - if (inany_is_loopback4(&tgt->oaddr) || - inany_is_unspecified4(&tgt->oaddr) || - inany_equals4(&tgt->oaddr, &c->ip4.addr_seen)) { - tgt->oaddr = inany_from_v4(c->ip4.gw); - } else if (inany_is_loopback6(&tgt->oaddr) || - inany_equals6(&tgt->oaddr, &c->ip6.addr_seen) || - inany_equals6(&tgt->oaddr, &c->ip6.addr)) { - tgt->oaddr.a6 = c->ip6.our_tap_ll; + if (!fwd_guest_accessible(c, &ini->eaddr)) { + if (inany_v4(&ini->eaddr)) + tgt->oaddr = inany_from_v4(c->ip4.gw); + else + tgt->oaddr.a6 = c->ip6.our_tap_ll; + } else { + tgt->oaddr = ini->eaddr; } + tgt->oport = ini->eport; if (inany_v4(&tgt->oaddr)) { tgt->eaddr = inany_from_v4(c->ip4.addr_seen); -- 2.46.0
ip4.gw conflates 3 conceptually different things, which (for now) have the same value: 1. The router/gateway address as seen by the guest 2. An address to NAT to the host with --no-map-gw isn't specified 3. An address to use as source when nothing else makes sense Case 3 occurs in two situations: a) for our DHCP responses - since they come from passt internally there's no naturally meaningful address for them to come from b) for forwarded connections coming from an address that isn't guest accessible (localhost or the guest's own address). (b) occurs even with --no-map-gw, and the expected behaviour of forwarding local connections requires it. For IPv6 role (3) is now taken by ip6.our_tap_ll (which usually has the same value as ip6.gw). For future flexibility we may want to make this "address of last resort" different from the gateway address, so split them logically for IPv4 as well. Specifically, add a new ip4.our_tap_addr field for the address with this role, and initialise it to ip4.gw for now. Unlike IPv6 where we can always get a link-local address, we might not be able to get a (non 0.0.0.0) address here (e.g. if the host is disconnected or only has a point to point link with no gateway address). In that case we have to disable forwarding of inbound connections with guest-inaccessible source addresses. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- dhcp.c | 8 ++++---- fwd.c | 10 +++++++--- passt.h | 2 ++ 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/dhcp.c b/dhcp.c index acc5b03e..353de323 100644 --- a/dhcp.c +++ b/dhcp.c @@ -345,9 +345,9 @@ int dhcp(const struct ctx *c, const struct pool *p) m->yiaddr = c->ip4.addr; mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len)); - memcpy(opts[1].s, &mask, sizeof(mask)); - memcpy(opts[3].s, &c->ip4.gw, sizeof(c->ip4.gw)); - memcpy(opts[54].s, &c->ip4.gw, sizeof(c->ip4.gw)); + memcpy(opts[1].s, &mask, sizeof(mask)); + memcpy(opts[3].s, &c->ip4.gw, sizeof(c->ip4.gw)); + memcpy(opts[54].s, &c->ip4.our_tap_addr, sizeof(c->ip4.our_tap_addr)); /* If the gateway is not on the assigned subnet, send an option 121 * (Classless Static Routing) adding a dummy route to it. @@ -377,7 +377,7 @@ int dhcp(const struct ctx *c, const struct pool *p) opt_set_dns_search(c, sizeof(m->o)); dlen = offsetof(struct msg, o) + fill(m); - tap_udp4_send(c, c->ip4.gw, 67, c->ip4.addr, 68, m, dlen); + tap_udp4_send(c, c->ip4.our_tap_addr, 67, c->ip4.addr, 68, m, dlen); return 1; } diff --git a/fwd.c b/fwd.c index d6f8a233..664b8ac6 100644 --- a/fwd.c +++ b/fwd.c @@ -387,10 +387,14 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, } if (!fwd_guest_accessible(c, &ini->eaddr)) { - if (inany_v4(&ini->eaddr)) - tgt->oaddr = inany_from_v4(c->ip4.gw); - else + if (inany_v4(&ini->eaddr)) { + if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr)) + /* No source address we can use */ + return PIF_NONE; + tgt->oaddr = inany_from_v4(c->ip4.our_tap_addr); + } else { tgt->oaddr.a6 = c->ip6.our_tap_ll; + } } else { tgt->oaddr = ini->eaddr; } diff --git a/passt.h b/passt.h index 3b8a6283..ecfed1e7 100644 --- a/passt.h +++ b/passt.h @@ -97,6 +97,7 @@ enum passt_modes { * @gw: Default IPv4 gateway * @dns: DNS addresses for DHCP, zero-terminated * @dns_match: Forward DNS query if sent to this address + * @our_tap_addr: IPv4 address for passt's use on tap * @dns_host: Use this DNS on the host for forwarding * @addr_out: Optional source address for outbound traffic * @ifname_out: Optional interface name to bind outbound sockets to @@ -111,6 +112,7 @@ struct ip4_ctx { struct in_addr gw; struct in_addr dns[MAXNS + 1]; struct in_addr dns_match; + struct in_addr our_tap_addr; /* PIF_HOST addresses */ struct in_addr dns_host; -- 2.46.0
When sending frames to the guest over the tap link, we need a source MAC address. Currently we take that from the MAC address of the main interface on the host, but that doesn't actually make much sense: * We can't preserve the real MAC address of packets from anywhere external so there's no transparency case here * In fact, it's confusingly different from how we handle IP addresses: whereas we give the guest the same IP as the host, we're making the host's MAC the one MAC that the guest *can't* use for itself. * We already need a fallback case if the host doesn't have an Ethernet like MAC (e.g. if it's connected via a point to point interface, such as a wireguard VPN). Change to just just use an arbitrary fixed MAC address - I've picked 9a:55:9a:55:9a:55. It's simpler and has the small advantage of making the fact that passt/pasta is in use typically obvious from guest side packet dumps. This can still, of course, be overridden with the -M option. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 40 ++++++---------------------------------- passt.h | 7 +++++++ util.h | 1 - 3 files changed, 13 insertions(+), 35 deletions(-) diff --git a/conf.c b/conf.c index 954f20ea..7eec3134 100644 --- a/conf.c +++ b/conf.c @@ -612,12 +612,10 @@ static int conf_ip4_prefix(const char *arg) * conf_ip4() - Verify or detect IPv4 support, get relevant addresses * @ifi: Host interface to attempt (0 to determine one) * @ip4: IPv4 context (will be written) - * @mac: MAC address to use (written if unset) * * Return: Interface index for IPv4, or 0 on failure. */ -static unsigned int conf_ip4(unsigned int ifi, - struct ip4_ctx *ip4, unsigned char *mac) +static unsigned int conf_ip4(unsigned int ifi, struct ip4_ctx *ip4) { if (!ifi) ifi = nl_get_ext_if(nl_sock, AF_INET); @@ -660,19 +658,7 @@ static unsigned int conf_ip4(unsigned int ifi, ip4->addr_seen = ip4->addr; - if (MAC_IS_ZERO(mac)) { - int rc = nl_link_get_mac(nl_sock, ifi, mac); - if (rc < 0) { - char ifname[IFNAMSIZ]; - - err("Couldn't discover MAC address for %s: %s", - if_indextoname(ifi, ifname), strerror(-rc)); - return 0; - } - - if (MAC_IS_ZERO(mac)) - memcpy(mac, MAC_LAA, ETH_ALEN); - } + ip4->our_tap_addr = ip4->gw; if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr)) return 0; @@ -684,12 +670,10 @@ static unsigned int conf_ip4(unsigned int ifi, * conf_ip6() - Verify or detect IPv6 support, get relevant addresses * @ifi: Host interface to attempt (0 to determine one) * @ip6: IPv6 context (will be written) - * @mac: MAC address to use (written if unset) * * Return: Interface index for IPv6, or 0 on failure. */ -static unsigned int conf_ip6(unsigned int ifi, - struct ip6_ctx *ip6, unsigned char *mac) +static unsigned int conf_ip6(unsigned int ifi, struct ip6_ctx *ip6) { int prefix_len = 0; int rc; @@ -724,19 +708,6 @@ static unsigned int conf_ip6(unsigned int ifi, if (IN6_IS_ADDR_LINKLOCAL(&ip6->gw)) ip6->our_tap_ll = ip6->gw; - if (MAC_IS_ZERO(mac)) { - rc = nl_link_get_mac(nl_sock, ifi, mac); - if (rc < 0) { - char ifname[IFNAMSIZ]; - err("Couldn't discover MAC address for %s: %s", - if_indextoname(ifi, ifname), strerror(-rc)); - return 0; - } - - if (MAC_IS_ZERO(mac)) - memcpy(mac, MAC_LAA, ETH_ALEN); - } - if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) || IN6_IS_ADDR_UNSPECIFIED(&ip6->our_tap_ll)) return 0; @@ -1287,6 +1258,7 @@ void conf(struct ctx *c, int argc, char **argv) c->tcp.fwd_in.mode = c->tcp.fwd_out.mode = FWD_UNSET; c->udp.fwd_in.mode = c->udp.fwd_out.mode = FWD_UNSET; + memcpy(c->our_tap_mac, MAC_OUR_LAA, ETH_ALEN); optind = 1; do { @@ -1657,9 +1629,9 @@ void conf(struct ctx *c, int argc, char **argv) nl_sock_init(c, false); if (!v6_only) - c->ifi4 = conf_ip4(ifi4, &c->ip4, c->our_tap_mac); + c->ifi4 = conf_ip4(ifi4, &c->ip4); if (!v4_only) - c->ifi6 = conf_ip6(ifi6, &c->ip6, c->our_tap_mac); + c->ifi6 = conf_ip6(ifi6, &c->ip6); if ((!c->ifi4 && !c->ifi6) || (*c->ip4.ifname_out && !c->ifi4) || (*c->ip6.ifname_out && !c->ifi6)) diff --git a/passt.h b/passt.h index ecfed1e7..c6c67ffc 100644 --- a/passt.h +++ b/passt.h @@ -26,6 +26,13 @@ union epoll_ref; #include "tcp.h" #include "udp.h" +/* Default address for our end on the tap interface. Bit 0 of byte 0 must be 0 + * (unicast) and bit 1 of byte 1 must be 1 (locally administered). Otherwise + * it's arbitrary. + */ +#define MAC_OUR_LAA \ + ((uint8_t [ETH_ALEN]){0x9a, 0x55, 0x9a, 0x55, 0x9a, 0x55}) + /** * union epoll_ref - Breakdown of reference for epoll fd bookkeeping * @type: Type of fd (tells us what to do with events) diff --git a/util.h b/util.h index a716849a..87b91e6c 100644 --- a/util.h +++ b/util.h @@ -96,7 +96,6 @@ #define PORT_IS_EPHEMERAL(port) ((port) >= PORT_EPHEMERAL_MIN) #define MAC_ZERO ((uint8_t [ETH_ALEN]){ 0 }) -#define MAC_LAA ((uint8_t [ETH_ALEN]){ BIT(1), 0, 0, 0, 0, 0 }) #define MAC_IS_ZERO(addr) (!memcmp((addr), MAC_ZERO, ETH_ALEN)) #ifndef __bswap_constant_16 -- 2.46.0
The @gw fields in the ip4_ctx and ip6_ctx give the (host's) default gateway. We use this for two quite distinct things: advertising the gateway that the guest should use (via DHCP, NDP and/or --config-net) and for a limited form of NAT. So that the guest can access services on the host, we map the gateway address within the guest to the loopback address on the host. Using the gateway address for this isn't necessarily the best choice for this purpose, certainly not for all circumstances. So, start off by splitting the notion of these into two different values: @guest_gw which is the gateway address the guest should use and @nat_host_loopback, which is the guest visible address to remap to the host's loopback. Usually nat_host_loopback will have the same value as guest_gw. However when --no-map-gw is specified we leave them unspecified instead. This means when we use nat_host_loopback, we don't need to separately check c->no_map_gw to see if it's relevant. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 61 +++++++++++++++++++++++++++++++-------------------------- dhcp.c | 10 ++++++---- fwd.c | 4 ++-- passt.h | 16 +++++++++------ pasta.c | 6 ++++-- 5 files changed, 55 insertions(+), 42 deletions(-) diff --git a/conf.c b/conf.c index 7eec3134..d4a3e9c7 100644 --- a/conf.c +++ b/conf.c @@ -410,12 +410,12 @@ static void add_dns_resolv(struct ctx *c, const char *nameserver, * redirect */ if (IN4_IS_ADDR_LOOPBACK(&ns4)) { - if (c->no_map_gw) + if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_host_loopback)) return; - ns4 = c->ip4.gw; + ns4 = c->ip4.map_host_loopback; if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match)) - c->ip4.dns_match = c->ip4.gw; + c->ip4.dns_match = c->ip4.map_host_loopback; } *idx4 += add_dns4(c, &ns4, *idx4); @@ -429,13 +429,13 @@ static void add_dns_resolv(struct ctx *c, const char *nameserver, * redirect */ if (IN6_IS_ADDR_LOOPBACK(&ns6)) { - if (c->no_map_gw) + if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback)) return; - ns6 = c->ip6.gw; + ns6 = c->ip6.map_host_loopback; if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match)) - c->ip6.dns_match = c->ip6.gw; + c->ip6.dns_match = c->ip6.map_host_loopback; } *idx6 += add_dns6(c, &ns6, *idx6); @@ -625,8 +625,9 @@ static unsigned int conf_ip4(unsigned int ifi, struct ip4_ctx *ip4) return 0; } - if (IN4_IS_ADDR_UNSPECIFIED(&ip4->gw)) { - int rc = nl_route_get_def(nl_sock, ifi, AF_INET, &ip4->gw); + if (IN4_IS_ADDR_UNSPECIFIED(&ip4->guest_gw)) { + int rc = nl_route_get_def(nl_sock, ifi, AF_INET, + &ip4->guest_gw); if (rc < 0) { err("Couldn't discover IPv4 gateway address: %s", strerror(-rc)); @@ -658,7 +659,7 @@ static unsigned int conf_ip4(unsigned int ifi, struct ip4_ctx *ip4) ip4->addr_seen = ip4->addr; - ip4->our_tap_addr = ip4->gw; + ip4->our_tap_addr = ip4->guest_gw; if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr)) return 0; @@ -686,8 +687,8 @@ static unsigned int conf_ip6(unsigned int ifi, struct ip6_ctx *ip6) return 0; } - if (IN6_IS_ADDR_UNSPECIFIED(&ip6->gw)) { - rc = nl_route_get_def(nl_sock, ifi, AF_INET6, &ip6->gw); + if (IN6_IS_ADDR_UNSPECIFIED(&ip6->guest_gw)) { + rc = nl_route_get_def(nl_sock, ifi, AF_INET6, &ip6->guest_gw); if (rc < 0) { err("Couldn't discover IPv6 gateway address: %s", strerror(-rc)); @@ -705,8 +706,8 @@ static unsigned int conf_ip6(unsigned int ifi, struct ip6_ctx *ip6) ip6->addr_seen = ip6->addr; - if (IN6_IS_ADDR_LINKLOCAL(&ip6->gw)) - ip6->our_tap_ll = ip6->gw; + if (IN6_IS_ADDR_LINKLOCAL(&ip6->guest_gw)) + ip6->our_tap_ll = ip6->guest_gw; if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) || IN6_IS_ADDR_UNSPECIFIED(&ip6->our_tap_ll)) @@ -969,7 +970,8 @@ static void conf_print(const struct ctx *c) info(" mask: %s", inet_ntop(AF_INET, &mask, buf4, sizeof(buf4))); info(" router: %s", - inet_ntop(AF_INET, &c->ip4.gw, buf4, sizeof(buf4))); + inet_ntop(AF_INET, &c->ip4.guest_gw, + buf4, sizeof(buf4))); } for (i = 0; !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns[i]); i++) { @@ -999,7 +1001,7 @@ static void conf_print(const struct ctx *c) info(" assign: %s", inet_ntop(AF_INET6, &c->ip6.addr, buf6, sizeof(buf6))); info(" router: %s", - inet_ntop(AF_INET6, &c->ip6.gw, buf6, sizeof(buf6))); + inet_ntop(AF_INET6, &c->ip6.guest_gw, buf6, sizeof(buf6))); info(" our link-local: %s", inet_ntop(AF_INET6, &c->ip6.our_tap_ll, buf6, sizeof(buf6))); @@ -1173,7 +1175,7 @@ fail: */ void conf(struct ctx *c, int argc, char **argv) { - int netns_only = 0; + int netns_only = 0, no_map_gw = 0; const struct option options[] = { {"debug", no_argument, NULL, 'd' }, {"quiet", no_argument, NULL, 'q' }, @@ -1202,7 +1204,7 @@ void conf(struct ctx *c, int argc, char **argv) {"no-dhcpv6", no_argument, &c->no_dhcpv6, 1 }, {"no-ndp", no_argument, &c->no_ndp, 1 }, {"no-ra", no_argument, &c->no_ra, 1 }, - {"no-map-gw", no_argument, &c->no_map_gw, 1 }, + {"no-map-gw", no_argument, &no_map_gw, 1 }, {"ipv4-only", no_argument, NULL, '4' }, {"ipv6-only", no_argument, NULL, '6' }, {"one-off", no_argument, NULL, '1' }, @@ -1503,18 +1505,18 @@ void conf(struct ctx *c, int argc, char **argv) parse_mac(c->our_tap_mac, optarg); break; case 'g': - if (inet_pton(AF_INET6, optarg, &c->ip6.gw) && - !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw) && - !IN6_IS_ADDR_LOOPBACK(&c->ip6.gw)) { + if (inet_pton(AF_INET6, optarg, &c->ip6.guest_gw) && + !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.guest_gw) && + !IN6_IS_ADDR_LOOPBACK(&c->ip6.guest_gw)) { if (c->mode == MODE_PASTA) c->ip6.no_copy_routes = true; break; } - if (inet_pton(AF_INET, optarg, &c->ip4.gw) && - !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.gw) && - !IN4_IS_ADDR_BROADCAST(&c->ip4.gw) && - !IN4_IS_ADDR_LOOPBACK(&c->ip4.gw)) { + if (inet_pton(AF_INET, optarg, &c->ip4.guest_gw) && + !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.guest_gw) && + !IN4_IS_ADDR_BROADCAST(&c->ip4.guest_gw) && + !IN4_IS_ADDR_LOOPBACK(&c->ip4.guest_gw)) { if (c->mode == MODE_PASTA) c->ip4.no_copy_routes = true; break; @@ -1637,11 +1639,14 @@ void conf(struct ctx *c, int argc, char **argv) (*c->ip6.ifname_out && !c->ifi6)) die("External interface not usable"); - if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.gw)) - c->no_map_gw = c->no_dhcp = 1; + if (c->ifi4 && !no_map_gw) + c->ip4.map_host_loopback = c->ip4.guest_gw; - if (c->ifi6 && IN6_IS_ADDR_UNSPECIFIED(&c->ip6.gw)) - c->no_map_gw = 1; + if (c->ifi6 && !no_map_gw) + c->ip6.map_host_loopback = c->ip6.guest_gw; + + if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.guest_gw)) + c->no_dhcp = 1; /* Inbound port options & DNS can be parsed now (after IPv4/IPv6 * settings) diff --git a/dhcp.c b/dhcp.c index 353de323..a06f1438 100644 --- a/dhcp.c +++ b/dhcp.c @@ -346,19 +346,21 @@ int dhcp(const struct ctx *c, const struct pool *p) m->yiaddr = c->ip4.addr; mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len)); memcpy(opts[1].s, &mask, sizeof(mask)); - memcpy(opts[3].s, &c->ip4.gw, sizeof(c->ip4.gw)); + memcpy(opts[3].s, &c->ip4.guest_gw, sizeof(c->ip4.guest_gw)); memcpy(opts[54].s, &c->ip4.our_tap_addr, sizeof(c->ip4.our_tap_addr)); /* If the gateway is not on the assigned subnet, send an option 121 * (Classless Static Routing) adding a dummy route to it. */ if ((c->ip4.addr.s_addr & mask.s_addr) - != (c->ip4.gw.s_addr & mask.s_addr)) { + != (c->ip4.guest_gw.s_addr & mask.s_addr)) { /* a.b.c.d/32:0.0.0.0, 0:a.b.c.d */ opts[121].slen = 14; opts[121].s[0] = 32; - memcpy(opts[121].s + 1, &c->ip4.gw, sizeof(c->ip4.gw)); - memcpy(opts[121].s + 10, &c->ip4.gw, sizeof(c->ip4.gw)); + memcpy(opts[121].s + 1, + &c->ip4.guest_gw, sizeof(c->ip4.guest_gw)); + memcpy(opts[121].s + 10, + &c->ip4.guest_gw, sizeof(c->ip4.guest_gw)); } if (c->mtu != -1) { diff --git a/fwd.c b/fwd.c index 664b8ac6..f99d2040 100644 --- a/fwd.c +++ b/fwd.c @@ -268,9 +268,9 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto, else if (is_dns_flow(proto, ini) && inany_equals6(&ini->oaddr, &c->ip6.dns_match)) tgt->eaddr.a6 = c->ip6.dns_host; - else if (!c->no_map_gw && inany_equals4(&ini->oaddr, &c->ip4.gw)) + else if (inany_equals4(&ini->oaddr, &c->ip4.map_host_loopback)) tgt->eaddr = inany_loopback4; - else if (!c->no_map_gw && inany_equals6(&ini->oaddr, &c->ip6.gw)) + else if (inany_equals6(&ini->oaddr, &c->ip6.map_host_loopback)) tgt->eaddr = inany_loopback6; else tgt->eaddr = ini->oaddr; diff --git a/passt.h b/passt.h index c6c67ffc..7cdba85e 100644 --- a/passt.h +++ b/passt.h @@ -101,7 +101,9 @@ enum passt_modes { * @addr: IPv4 address assigned to guest * @addr_seen: Latest IPv4 address seen as source from tap * @prefixlen: IPv4 prefix length (netmask) - * @gw: Default IPv4 gateway + * @guest_gw: IPv4 gateway as seen by the guest + * @map_host_loopback: Outbound connections to this address are NATted to the + * host's 127.0.0.1 * @dns: DNS addresses for DHCP, zero-terminated * @dns_match: Forward DNS query if sent to this address * @our_tap_addr: IPv4 address for passt's use on tap @@ -116,7 +118,8 @@ struct ip4_ctx { struct in_addr addr; struct in_addr addr_seen; int prefix_len; - struct in_addr gw; + struct in_addr guest_gw; + struct in_addr map_host_loopback; struct in_addr dns[MAXNS + 1]; struct in_addr dns_match; struct in_addr our_tap_addr; @@ -136,7 +139,9 @@ struct ip4_ctx { * @addr: IPv6 address assigned to guest * @addr_seen: Latest IPv6 global/site address seen as source from tap * @addr_ll_seen: Latest IPv6 link-local address seen as source from tap - * @gw: Default IPv6 gateway + * @guest_gw: IPv6 gateway as seen by the guest + * @map_host_loopback: Outbound connections to this address are NATted to the + * host's [::1] * @dns: DNS addresses for DHCPv6 and NDP, zero-terminated * @dns_match: Forward DNS query if sent to this address * @our_tap_ll: Link-local IPv6 address for passt's use on tap @@ -151,7 +156,8 @@ struct ip6_ctx { struct in6_addr addr; struct in6_addr addr_seen; struct in6_addr addr_ll_seen; - struct in6_addr gw; + struct in6_addr guest_gw; + struct in6_addr map_host_loopback; struct in6_addr dns[MAXNS + 1]; struct in6_addr dns_match; struct in6_addr our_tap_ll; @@ -213,7 +219,6 @@ struct ip6_ctx { * @no_dhcpv6: Disable DHCPv6 server * @no_ndp: Disable NDP handler altogether * @no_ra: Disable router advertisements - * @no_map_gw: Don't map connections, untracked UDP to gateway to host * @low_wmem: Low probed net.core.wmem_max * @low_rmem: Low probed net.core.rmem_max */ @@ -273,7 +278,6 @@ struct ctx { int no_dhcpv6; int no_ndp; int no_ra; - int no_map_gw; int low_wmem; int low_rmem; diff --git a/pasta.c b/pasta.c index 5f897d3e..0df7cb46 100644 --- a/pasta.c +++ b/pasta.c @@ -332,7 +332,8 @@ void pasta_ns_conf(struct ctx *c) if (c->ip4.no_copy_routes) { rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi, - AF_INET, &c->ip4.gw); + AF_INET, + &c->ip4.guest_gw); } else { rc = nl_route_dup(nl_sock, c->ifi4, nl_sock_ns, c->pasta_ifi, AF_INET); @@ -378,7 +379,8 @@ void pasta_ns_conf(struct ctx *c) if (c->ip6.no_copy_routes) { rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi, - AF_INET6, &c->ip6.gw); + AF_INET6, + &c->ip6.guest_gw); } else { rc = nl_route_dup(nl_sock, c->ifi6, nl_sock_ns, c->pasta_ifi, -- 2.46.0
In the TCP throughput tests, we adjust the guest's MTU in order to test various packet sizes. Some of those are below 1280 which causes IPv6 to be deconfigured on the guest interface. When we increase it above 1280 again, IPv6 is re-enabled and we get an address in the right prefix with NDP, but we don't get exactly the expected address back - that's only communicated with --config-net or DHCPv6. With changes to how we handle NAT this can cause some of the IPv6 tests to fail, because they don't use the address that passt/pasta expects, and the guest doesn't initiate any traffic which allows us to learn what the new address is. Work around this by re-invoking dhclient -6 between adjusting the MTU and running IPv6 test cases. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- test/perf/passt_tcp | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/test/perf/passt_tcp b/test/perf/passt_tcp index 695479f3..635998e1 100644 --- a/test/perf/passt_tcp +++ b/test/perf/passt_tcp @@ -112,6 +112,10 @@ bw __BW__ 7.0 8.0 iperf3k ns +# Reducing MTU below 1280 deconfigures IPv6, get our address back +guest dhclient -6 -x +guest dhclient -6 __IFNAME__ + tl TCP RR latency over IPv4: guest to host lat - lat - -- 2.46.0
Because the host and guest share the same IP address with passt/pasta, it's not possible for the guest to directly address the host. Therefore we allow packets from the guest going to a special "NAT to host" address to be redirected to the host, appearing there as though they have both source and destination address of loopback. Currently that special address is always the address of the default gateway (or none). That can be a problem if we want that gateway to be addressable by the guest. Therefore, allow the special "NAT to host" address to be overridden on the command line with a new --map-host-loopback option. In order to exercise and test it, update the passt_in_ns and perf tests to use this option and give different mapping addresses for the two layers of the environment. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 52 ++++++++++++++++++++++++++++-- passt.1 | 14 +++++++++ test/lib/setup | 11 +++++-- test/passt_in_ns/dhcp | 73 +++++++++++++++++++++++++++++++++++++++++++ test/passt_in_ns/tcp | 38 +++++++++++----------- test/passt_in_ns/udp | 22 +++++++------ test/perf/passt_tcp | 33 +++++++++---------- test/perf/passt_udp | 31 +++++++++--------- test/perf/pasta_tcp | 29 ++++++++--------- test/perf/pasta_udp | 25 ++++++++------- test/run | 4 +-- 11 files changed, 237 insertions(+), 95 deletions(-) create mode 100644 test/passt_in_ns/dhcp diff --git a/conf.c b/conf.c index d4a3e9c7..d605c215 100644 --- a/conf.c +++ b/conf.c @@ -817,6 +817,9 @@ static void usage(const char *name, FILE *f, int status) fprintf(f, " --no-dhcp-search No list in DHCP/DHCPv6/NDP\n"); fprintf(f, + " --map-host-loopback ADDR Translate ADDR to refer to host\n" + " can be specified zero to two times (for IPv4 and IPv6)\n" + " default: gateway address\n" " --dns-forward ADDR Forward DNS queries sent to ADDR\n" " can be specified zero to two times (for IPv4 and IPv6)\n" " default: don't forward DNS queries\n" @@ -959,6 +962,11 @@ static void conf_print(const struct ctx *c) info(" host: %s", eth_ntop(c->our_tap_mac, bufmac, sizeof(bufmac))); if (c->ifi4) { + if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_host_loopback)) + info(" NAT to host 127.0.0.1: %s", + inet_ntop(AF_INET, &c->ip4.map_host_loopback, + buf4, sizeof(buf4))); + if (!c->no_dhcp) { uint32_t mask; @@ -989,6 +997,11 @@ static void conf_print(const struct ctx *c) } if (c->ifi6) { + if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback)) + info(" NAT to host ::1: %s", + inet_ntop(AF_INET6, &c->ip6.map_host_loopback, + buf6, sizeof(buf6))); + if (!c->no_ndp && !c->no_dhcpv6) info("NDP/DHCPv6:"); else if (!c->no_ndp) @@ -1122,6 +1135,35 @@ static void conf_ugid(char *runas, uid_t *uid, gid_t *gid) } } +/** + * conf_nat() - Parse --map-host-loopback option + * @c: Execution context + * @arg: String argument to --map-host-loopback + * @no_map_gw: --no-map-gw flag, updated for "none" argument + */ +static void conf_nat(struct ctx *c, const char *arg, int *no_map_gw) +{ + if (strcmp(arg, "none") == 0) { + c->ip4.map_host_loopback = in4addr_any; + c->ip6.map_host_loopback = in6addr_any; + *no_map_gw = 1; + } + + if (inet_pton(AF_INET6, arg, &c->ip6.map_host_loopback) && + !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback) && + !IN6_IS_ADDR_LOOPBACK(&c->ip6.map_host_loopback) && + !IN6_IS_ADDR_MULTICAST(&c->ip6.map_host_loopback)) + return; + + if (inet_pton(AF_INET, arg, &c->ip4.map_host_loopback) && + !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_host_loopback) && + !IN4_IS_ADDR_LOOPBACK(&c->ip4.map_host_loopback) && + !IN4_IS_ADDR_MULTICAST(&c->ip4.map_host_loopback)) + return; + + die("Invalid address to remap to host: %s", optarg); +} + /** * conf_open_files() - Open files as requested by configuration * @c: Execution context @@ -1231,6 +1273,7 @@ void conf(struct ctx *c, int argc, char **argv) {"no-copy-routes", no_argument, NULL, 18 }, {"no-copy-addrs", no_argument, NULL, 19 }, {"netns-only", no_argument, NULL, 20 }, + {"map-host-loopback", required_argument, NULL, 21 }, { 0 }, }; const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt"; @@ -1400,6 +1443,9 @@ void conf(struct ctx *c, int argc, char **argv) netns_only = 1; *userns = 0; break; + case 21: + conf_nat(c, optarg, &no_map_gw); + break; case 'd': c->debug = 1; c->quiet = 0; @@ -1639,10 +1685,12 @@ void conf(struct ctx *c, int argc, char **argv) (*c->ip6.ifname_out && !c->ifi6)) die("External interface not usable"); - if (c->ifi4 && !no_map_gw) + if (c->ifi4 && !no_map_gw && + IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_host_loopback)) c->ip4.map_host_loopback = c->ip4.guest_gw; - if (c->ifi6 && !no_map_gw) + if (c->ifi6 && !no_map_gw && + IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback)) c->ip6.map_host_loopback = c->ip6.guest_gw; if (c->ifi4 && IN4_IS_ADDR_UNSPECIFIED(&c->ip4.guest_gw)) diff --git a/passt.1 b/passt.1 index dca433b6..e85d9885 100644 --- a/passt.1 +++ b/passt.1 @@ -327,6 +327,20 @@ namespace will be silently dropped. Disable Router Advertisements. Router Solicitations coming from guest or target namespace will be ignored. +.TP +.BR \-\-map-host-loopback " " \fIaddr +Translate \fIaddr\fR to refer to the host. Packets from the guest to +\fIaddr\fR will be redirected to the host. On the host such packets +will appear to have both source and destination of 127.0.0.1 or ::1. + +If \fIaddr\fR is 'none', no address is mapped (this implies +\fB--no-map-gw\fR). Only one IPv4 and one IPv6 address can be +translated, if the option is specified multiple times, the last one +takes effect. + +Default is to translate the guest's default gateway address, unless +\fB--no-map-gw\fR is given, in which case no address is mapped. + .TP .BR \-\-no-map-gw Don't remap TCP connections and untracked UDP traffic, with the gateway address diff --git a/test/lib/setup b/test/lib/setup index 9b39b9fe..f6dc6a16 100755 --- a/test/lib/setup +++ b/test/lib/setup @@ -124,7 +124,12 @@ setup_passt_in_ns() { [ ${DEBUG} -eq 1 ] && __opts="${__opts} -d" [ ${TRACE} -eq 1 ] && __opts="${__opts} --trace" - context_run_bg pasta "./pasta ${__opts} -t 10001,10002,10011,10012 -T 10003,10013 -u 10001,10002,10011,10012 -U 10003,10013 -P ${STATESETUP}/pasta.pid --config-net ${NSTOOL} hold ${STATESETUP}/ns.hold" + __map_host4=192.0.2.1 + __map_host6=2001:db8:9a55::1 + __map_ns4=192.0.2.2 + __map_ns6=2001:db8:9a55::2 + + context_run_bg pasta "./pasta ${__opts} -t 10001,10002,10011,10012 -T 10003,10013 -u 10001,10002,10011,10012 -U 10003,10013 -P ${STATESETUP}/pasta.pid --map-host-loopback ${__map_host4} --map-host-loopback ${__map_host6} --config-net ${NSTOOL} hold ${STATESETUP}/ns.hold" wait_for [ -f "${STATESETUP}/pasta.pid" ] context_setup_nstool qemu ${STATESETUP}/ns.hold @@ -139,11 +144,11 @@ setup_passt_in_ns() { if [ ${VALGRIND} -eq 1 ]; then context_run passt "make clean" context_run passt "make valgrind" - context_run_bg passt "valgrind --max-stackframe=$((4 * 1024 * 1024)) --trace-children=yes --vgdb=no --error-exitcode=1 --suppressions=test/valgrind.supp ./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid" + context_run_bg passt "valgrind --max-stackframe=$((4 * 1024 * 1024)) --trace-children=yes --vgdb=no --error-exitcode=1 --suppressions=test/valgrind.supp ./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid --map-host-loopback ${__map_ns4} --map-host-loopback ${__map_ns6}" else context_run passt "make clean" context_run passt "make" - context_run_bg passt "./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid" + context_run_bg passt "./passt -f ${__opts} -s ${STATESETUP}/passt.socket -t 10001,10011,10021,10031 -u 10001,10011,10021,10031 -P ${STATESETUP}/passt.pid --map-host-loopback ${__map_ns4} --map-host-loopback ${__map_ns6}" fi wait_for [ -f "${STATESETUP}/passt.pid" ] diff --git a/test/passt_in_ns/dhcp b/test/passt_in_ns/dhcp new file mode 100644 index 00000000..0ceed7c9 --- /dev/null +++ b/test/passt_in_ns/dhcp @@ -0,0 +1,73 @@ +# SPDX-License-Identifier: GPL-2.0-or-later +# +# PASST - Plug A Simple Socket Transport +# for qemu/UNIX domain socket mode +# +# PASTA - Pack A Subtle Tap Abstraction +# for network namespace/tap device mode +# +# test/passt/dhcp - Check DHCP and DHCPv6 functionality in passt mode +# +# Copyright (c) 2021 Red Hat GmbH +# Author: Stefano Brivio <sbrivio(a)redhat.com> + +gtools ip jq dhclient sed tr +htools ip jq sed tr head + +set MAP_NS4 192.0.2.2 +set MAP_NS6 2001:db8:9a55::2 + +test Interface name +gout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' +hout HOST_IFNAME ip -j -4 route show|jq -rM '[.[] | select(.dst == "default").dev] | .[0]' +hout HOST_IFNAME6 ip -j -6 route show|jq -rM '[.[] | select(.dst == "default").dev] | .[0]' +check [ -n "__IFNAME__" ] + +test DHCP: address +guest /sbin/dhclient -4 __IFNAME__ +gout ADDR ip -j -4 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[0].local' +hout HOST_ADDR ip -j -4 addr show|jq -rM '.[] | select(.ifname == "__HOST_IFNAME__").addr_info[0].local' +check [ "__ADDR__" = "__HOST_ADDR__" ] + +test DHCP: route +gout GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway' +hout HOST_GW ip -j -4 route show|jq -rM '[.[] | select(.dst == "default").gateway] | .[0]' +check [ "__GW__" = "__HOST_GW__" ] + +test DHCP: MTU +gout MTU ip -j link show | jq -rM '.[] | select(.ifname == "__IFNAME__").mtu' +check [ __MTU__ = 65520 ] + +test DHCP: DNS +gout DNS sed -n 's/^nameserver \([0-9]*\.\)\(.*\)/\1\2/p' /etc/resolv.conf | tr '\n' ',' | sed 's/,$//;s/$/\n/' +hout HOST_DNS sed -n 's/^nameserver \([0-9]*\.\)\(.*\)/\1\2/p' /etc/resolv.conf | head -n3 | tr '\n' ',' | sed 's/,$//;s/$/\n/' +check [ "__DNS__" = "__HOST_DNS__" ] || ( [ "__DNS__" = "__MAP_NS4__" ] && expr "__HOST_DNS__" : "127[.]" ) + +# FQDNs should be terminated by dots, but the guest DHCP client might omit them: +# strip them first +test DHCP: search list +gout SEARCH sed 's/\. / /g' /etc/resolv.conf | sed 's/\.$//g' | sed -n 's/^search \(.*\)/\1/p' | tr ' \n' ',' | sed 's/,$//;s/$/\n/' +hout HOST_SEARCH sed 's/\. / /g' /etc/resolv.conf | sed 's/\.$//g' | sed -n 's/^search \(.*\)/\1/p' | tr ' \n' ',' | sed 's/,$//;s/$/\n/' +check [ "__SEARCH__" = "__HOST_SEARCH__" ] + +test DHCPv6: address +guest /sbin/dhclient -6 __IFNAME__ +gout ADDR6 ip -j -6 addr show|jq -rM '[.[] | select(.ifname == "__IFNAME__").addr_info[] | select(.prefixlen == 128).local] | .[0]' +hout HOST_ADDR6 ip -j -6 addr show|jq -rM '[.[] | select(.ifname == "__HOST_IFNAME6__").addr_info[] | select(.scope == "global" and .deprecated != true).local] | .[0]' +check [ "__ADDR6__" = "__HOST_ADDR6__" ] + +test DHCPv6: route +gout GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway' +hout HOST_GW6 ip -j -6 route show|jq -rM '[.[] | select(.dst == "default").gateway] | .[0]' +check [ "__GW6__" = "__HOST_GW6__" ] + +# Strip interface specifier: interface names might differ between host and guest +test DHCPv6: DNS +gout DNS6 sed -n 's/^nameserver \([^:]*:\)\([^%]*\).*/\1\2/p' /etc/resolv.conf | tr '\n' ',' | sed 's/,$//;s/$/\n/' +hout HOST_DNS6 sed -n 's/^nameserver \([^:]*:\)\([^%]*\).*/\1\2/p' /etc/resolv.conf | tr '\n' ',' | sed 's/,$//;s/$/\n/' +check [ "__DNS6__" = "__HOST_DNS6__" ] || [ "__DNS6__" = "__MAP_NS6__" -a "__HOST_DNS6__" = "::1" ] + +test DHCPv6: search list +gout SEARCH6 sed 's/\. / /g' /etc/resolv.conf | sed 's/\.$//g' | sed -n 's/^search \(.*\)/\1/p' | tr ' \n' ',' | sed 's/,$//;s/$/\n/' +hout HOST_SEARCH6 sed 's/\. / /g' /etc/resolv.conf | sed 's/\.$//g' | sed -n 's/^search \(.*\)/\1/p' | tr ' \n' ',' | sed 's/,$//;s/$/\n/' +check [ "__SEARCH6__" = "__HOST_SEARCH6__" ] diff --git a/test/passt_in_ns/tcp b/test/passt_in_ns/tcp index cdb7060c..aaf340eb 100644 --- a/test/passt_in_ns/tcp +++ b/test/passt_in_ns/tcp @@ -15,6 +15,11 @@ gtools socat ip jq htools socat ip jq nstools socat ip jq +set MAP_HOST4 192.0.2.1 +set MAP_HOST6 2001:db8:9a55::1 +set MAP_NS4 192.0.2.2 +set MAP_NS6 2001:db8:9a55::2 + set TEMP_BIG __STATEDIR__/test_big.bin set TEMP_SMALL __STATEDIR__/test_small.bin set TEMP_NS_BIG __STATEDIR__/test_ns_big.bin @@ -36,16 +41,15 @@ check cmp __TEMP_NS_BIG__ __BASEPATH__/big.bin test TCP/IPv4: guest to host: big transfer hostb socat -u TCP4-LISTEN:10003 OPEN:__TEMP_BIG__,create,trunc -gout GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway' sleep 1 -guest socat -u OPEN:/root/big.bin TCP4:__GW__:10003 +guest socat -u OPEN:/root/big.bin TCP4:__MAP_HOST4__:10003 hostw check cmp __TEMP_BIG__ __BASEPATH__/big.bin test TCP/IPv4: guest to ns: big transfer nsb socat -u TCP4-LISTEN:10002 OPEN:__TEMP_NS_BIG__,create,trunc sleep 1 -guest socat -u OPEN:/root/big.bin TCP4:__GW__:10002 +guest socat -u OPEN:/root/big.bin TCP4:__MAP_NS4__:10002 nsw check cmp __TEMP_NS_BIG__ __BASEPATH__/big.bin @@ -59,7 +63,7 @@ check cmp __TEMP_BIG__ __BASEPATH__/big.bin test TCP/IPv4: ns to host (via tap): big transfer hostb socat -u TCP4-LISTEN:10003 OPEN:__TEMP_BIG__,create,trunc sleep 1 -ns socat -u OPEN:__BASEPATH__/big.bin TCP4:__GW__:10003 +ns socat -u OPEN:__BASEPATH__/big.bin TCP4:__MAP_HOST4__:10003 hostw check cmp __TEMP_BIG__ __BASEPATH__/big.bin @@ -95,16 +99,15 @@ check cmp __TEMP_NS_SMALL__ __BASEPATH__/small.bin test TCP/IPv4: guest to host: small transfer hostb socat -u TCP4-LISTEN:10003 OPEN:__TEMP_SMALL__,create,trunc -gout GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway' sleep 1 -guest socat -u OPEN:/root/small.bin TCP4:__GW__:10003 +guest socat -u OPEN:/root/small.bin TCP4:__MAP_HOST4__:10003 hostw check cmp __TEMP_SMALL__ __BASEPATH__/small.bin test TCP/IPv4: guest to ns: small transfer nsb socat -u TCP4-LISTEN:10002 OPEN:__TEMP_NS_SMALL__,create,trunc sleep 1 -guest socat -u OPEN:/root/small.bin TCP4:__GW__:10002 +guest socat -u OPEN:/root/small.bin TCP4:__MAP_NS4__:10002 nsw check cmp __TEMP_NS_SMALL__ __BASEPATH__/small.bin @@ -118,7 +121,7 @@ check cmp __TEMP_SMALL__ __BASEPATH__/small.bin test TCP/IPv4: ns to host (via tap): small transfer hostb socat -u TCP4-LISTEN:10003 OPEN:__TEMP_SMALL__,create,trunc sleep 1 -ns socat -u OPEN:__BASEPATH__/small.bin TCP4:__GW__:10003 +ns socat -u OPEN:__BASEPATH__/small.bin TCP4:__MAP_HOST4__:10003 hostw check cmp __TEMP_SMALL__ __BASEPATH__/small.bin @@ -152,17 +155,15 @@ check cmp __TEMP_NS_BIG__ __BASEPATH__/big.bin test TCP/IPv6: guest to host: big transfer hostb socat -u TCP6-LISTEN:10003 OPEN:__TEMP_BIG__,create,trunc -gout GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway' -gout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' sleep 1 -guest socat -u OPEN:/root/big.bin TCP6:[__GW6__%__IFNAME__]:10003 +guest socat -u OPEN:/root/big.bin TCP6:[__MAP_HOST6__]:10003 hostw check cmp __TEMP_BIG__ __BASEPATH__/big.bin test TCP/IPv6: guest to ns: big transfer nsb socat -u TCP6-LISTEN:10002 OPEN:__TEMP_NS_BIG__,create,trunc sleep 1 -guest socat -u OPEN:/root/big.bin TCP6:[__GW6__%__IFNAME__]:10002 +guest socat -u OPEN:/root/big.bin TCP6:[__MAP_NS6__]:10002 nsw check cmp __TEMP_NS_BIG__ __BASEPATH__/big.bin @@ -175,9 +176,8 @@ check cmp __TEMP_BIG__ __BASEPATH__/big.bin test TCP/IPv6: ns to host (via tap): big transfer hostb socat -u TCP6-LISTEN:10003 OPEN:__TEMP_BIG__,create,trunc -nsout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' sleep 1 -ns socat -u OPEN:__BASEPATH__/big.bin TCP6:[__GW6__%__IFNAME__]:10003 +ns socat -u OPEN:__BASEPATH__/big.bin TCP6:[__MAP_HOST6__]:10003 hostw check cmp __TEMP_BIG__ __BASEPATH__/big.bin @@ -190,6 +190,7 @@ guest cmp test_big.bin /root/big.bin test TCP/IPv6: ns to guest (using namespace address): big transfer guestb socat -u TCP6-LISTEN:10001 OPEN:test_big.bin,create,trunc +nsout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' nsout ADDR6 ip -j -6 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[0].local' sleep 1 ns socat -u OPEN:__BASEPATH__/big.bin TCP6:[__ADDR6__]:10001 @@ -212,17 +213,15 @@ check cmp __TEMP_NS_SMALL__ __BASEPATH__/small.bin test TCP/IPv6: guest to host: small transfer hostb socat -u TCP6-LISTEN:10003 OPEN:__TEMP_SMALL__,create,trunc -gout GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway' -gout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' sleep 1 -guest socat -u OPEN:/root/small.bin TCP6:[__GW6__%__IFNAME__]:10003 +guest socat -u OPEN:/root/small.bin TCP6:[__MAP_HOST6__]:10003 hostw check cmp __TEMP_SMALL__ __BASEPATH__/small.bin test TCP/IPv6: guest to ns: small transfer nsb socat -u TCP6-LISTEN:10002 OPEN:__TEMP_NS_SMALL__ sleep 1 -guest socat -u OPEN:/root/small.bin TCP6:[__GW6__%__IFNAME__]:10002 +guest socat -u OPEN:/root/small.bin TCP6:[__MAP_NS6__]:10002 nsw check cmp __TEMP_NS_SMALL__ __BASEPATH__/small.bin @@ -235,9 +234,8 @@ check cmp __TEMP_SMALL__ __BASEPATH__/small.bin test TCP/IPv6: ns to host (via tap): small transfer hostb socat -u TCP6-LISTEN:10003 OPEN:__TEMP_SMALL__,create,trunc -nsout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' sleep 1 -ns socat -u OPEN:__BASEPATH__/small.bin TCP6:[__GW6__%__IFNAME__]:10003 +ns socat -u OPEN:__BASEPATH__/small.bin TCP6:[__MAP_HOST6__]:10003 hostw check cmp __TEMP_SMALL__ __BASEPATH__/small.bin diff --git a/test/passt_in_ns/udp b/test/passt_in_ns/udp index 8a025131..3426ab91 100644 --- a/test/passt_in_ns/udp +++ b/test/passt_in_ns/udp @@ -15,6 +15,11 @@ gtools socat ip jq nstools socat ip jq htools socat ip jq +set MAP_HOST4 192.0.2.1 +set MAP_HOST6 2001:db8:9a55::1 +set MAP_NS4 192.0.2.2 +set MAP_NS6 2001:db8:9a55::2 + set TEMP __STATEDIR__/test.bin set TEMP_NS __STATEDIR__/test_ns.bin @@ -34,16 +39,15 @@ check cmp __TEMP_NS__ __BASEPATH__/medium.bin test UDP/IPv4: guest to host hostb socat -u UDP4-LISTEN:10003,null-eof OPEN:__TEMP__,create,trunc -gout GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway' sleep 1 -guest socat -u OPEN:/root/medium.bin UDP4:__GW__:10003,shut-null +guest socat -u OPEN:/root/medium.bin UDP4:__MAP_HOST4__:10003,shut-null hostw check cmp __TEMP__ __BASEPATH__/medium.bin test UDP/IPv4: guest to ns nsb socat -u UDP4-LISTEN:10002,null-eof OPEN:__TEMP_NS__,create,trunc sleep 1 -guest socat -u OPEN:/root/medium.bin UDP4:__GW__:10002,shut-null +guest socat -u OPEN:/root/medium.bin UDP4:__MAP_NS4__:10002,shut-null nsw check cmp __TEMP_NS__ __BASEPATH__/medium.bin @@ -57,7 +61,7 @@ check cmp __TEMP__ __BASEPATH__/medium.bin test UDP/IPv4: ns to host (via tap) hostb socat -u UDP4-LISTEN:10003,null-eof OPEN:__TEMP__,create,trunc sleep 1 -ns socat -u OPEN:__BASEPATH__/medium.bin UDP4:__GW__:10003,shut-null +ns socat -u OPEN:__BASEPATH__/medium.bin UDP4:__MAP_HOST4__:10003,shut-null hostw check cmp __TEMP__ __BASEPATH__/medium.bin @@ -93,17 +97,15 @@ check cmp __TEMP_NS__ __BASEPATH__/medium.bin test UDP/IPv6: guest to host hostb socat -u UDP6-LISTEN:10003,null-eof OPEN:__TEMP__,create,trunc -gout GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway' -gout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' sleep 1 -guest socat -u OPEN:/root/medium.bin UDP6:[__GW6__%__IFNAME__]:10003,shut-null +guest socat -u OPEN:/root/medium.bin UDP6:[__MAP_HOST6__]:10003,shut-null hostw check cmp __TEMP__ __BASEPATH__/medium.bin test UDP/IPv6: guest to ns nsb socat -u UDP6-LISTEN:10002,null-eof OPEN:__TEMP_NS__,create,trunc sleep 1 -guest socat -u OPEN:/root/medium.bin UDP6:[__GW6__%__IFNAME__]:10002,shut-null +guest socat -u OPEN:/root/medium.bin UDP6:[__MAP_NS6__]:10002,shut-null nsw check cmp __TEMP_NS__ __BASEPATH__/medium.bin @@ -116,9 +118,8 @@ check cmp __TEMP__ __BASEPATH__/medium.bin test UDP/IPv6: ns to host (via tap) hostb socat -u UDP6-LISTEN:10003,null-eof OPEN:__TEMP__,create,trunc -nsout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' sleep 1 -ns socat -u OPEN:__BASEPATH__/medium.bin UDP6:[__GW6__%__IFNAME__]:10003,shut-null +ns socat -u OPEN:__BASEPATH__/medium.bin UDP6:[__MAP_HOST6__]:10003,shut-null hostw check cmp __TEMP__ __BASEPATH__/medium.bin @@ -131,6 +132,7 @@ guest cmp test.bin /root/medium.bin test UDP/IPv6: ns to guest (using namespace address) guestb socat -u UDP6-LISTEN:10001,null-eof OPEN:test.bin,create,trunc +nsout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' nsout ADDR6 ip -j -6 addr show|jq -rM '.[] | select(.ifname == "__IFNAME__").addr_info[0].local' sleep 1 ns socat -u OPEN:__BASEPATH__/medium.bin UDP6:[__ADDR6__]:10001,shut-null diff --git a/test/perf/passt_tcp b/test/perf/passt_tcp index 635998e1..089d9530 100644 --- a/test/perf/passt_tcp +++ b/test/perf/passt_tcp @@ -15,6 +15,9 @@ gtools /sbin/sysctl ip jq nproc seq sleep iperf3 tcp_rr tcp_crr # From neper nstools /sbin/sysctl ip jq nproc seq sleep iperf3 tcp_rr tcp_crr htools bc head sed seq +set MAP_NS4 192.0.2.2 +set MAP_NS6 2001:db8:9a55::2 + test passt: throughput and latency guest /sbin/sysctl -w net.core.rmem_max=536870912 @@ -29,8 +32,6 @@ ns /sbin/sysctl -w net.ipv4.tcp_rmem="4096 524288 134217728" ns /sbin/sysctl -w net.ipv4.tcp_wmem="4096 524288 134217728" ns /sbin/sysctl -w net.ipv4.tcp_timestamps=0 -gout GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway' -gout GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway' gout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' hout FREQ_PROCFS (echo "scale=1"; sed -n 's/cpu MHz.*: \([0-9]*\)\..*$/(\1+10^2\/2)\/10^3/p' /proc/cpuinfo) | bc -l | head -n1 @@ -54,16 +55,16 @@ iperf3s ns 10002 bw - bw - guest ip link set dev __IFNAME__ mtu 1280 -iperf3 BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 4M +iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 4M bw __BW__ 1.2 1.5 guest ip link set dev __IFNAME__ mtu 1500 -iperf3 BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 4M +iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 4M bw __BW__ 1.6 1.8 guest ip link set dev __IFNAME__ mtu 9000 -iperf3 BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 8M +iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 8M bw __BW__ 4.0 5.0 guest ip link set dev __IFNAME__ mtu 65520 -iperf3 BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -w 16M +iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 16M bw __BW__ 7.0 8.0 iperf3k ns @@ -75,7 +76,7 @@ lat - lat - lat - nsb tcp_rr --nolog -6 -gout LAT tcp_rr --nolog -l1 -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p' +gout LAT tcp_rr --nolog -l1 -6 -c -H __MAP_NS6__ | sed -n 's/^throughput=\(.*\)/\1/p' lat __LAT__ 200 150 tl TCP CRR latency over IPv6: guest to host @@ -85,29 +86,29 @@ lat - lat - lat - nsb tcp_crr --nolog -6 -gout LAT tcp_crr --nolog -l1 -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p' +gout LAT tcp_crr --nolog -l1 -6 -c -H __MAP_NS6__ | sed -n 's/^throughput=\(.*\)/\1/p' lat __LAT__ 500 400 tr TCP throughput over IPv4: guest to host iperf3s ns 10002 guest ip link set dev __IFNAME__ mtu 256 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -w 1M +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 1M bw __BW__ 0.2 0.3 guest ip link set dev __IFNAME__ mtu 576 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -w 1M +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 1M bw __BW__ 0.5 0.8 guest ip link set dev __IFNAME__ mtu 1280 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -w 4M +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 4M bw __BW__ 1.2 1.5 guest ip link set dev __IFNAME__ mtu 1500 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -w 4M +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 4M bw __BW__ 1.6 1.8 guest ip link set dev __IFNAME__ mtu 9000 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -w 8M +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 8M bw __BW__ 4.0 5.0 guest ip link set dev __IFNAME__ mtu 65520 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -w 16M +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 16M bw __BW__ 7.0 8.0 iperf3k ns @@ -123,7 +124,7 @@ lat - lat - lat - nsb tcp_rr --nolog -4 -gout LAT tcp_rr --nolog -l1 -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p' +gout LAT tcp_rr --nolog -l1 -4 -c -H __MAP_NS4__ | sed -n 's/^throughput=\(.*\)/\1/p' lat __LAT__ 200 150 tl TCP CRR latency over IPv4: guest to host @@ -133,7 +134,7 @@ lat - lat - lat - nsb tcp_crr --nolog -4 -gout LAT tcp_crr --nolog -l1 -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p' +gout LAT tcp_crr --nolog -l1 -4 -c -H __MAP_NS4__ | sed -n 's/^throughput=\(.*\)/\1/p' lat __LAT__ 500 400 tr TCP throughput over IPv6: host to guest diff --git a/test/perf/passt_udp b/test/perf/passt_udp index f25c9033..4c66c41c 100644 --- a/test/perf/passt_udp +++ b/test/perf/passt_udp @@ -15,6 +15,9 @@ gtools /sbin/sysctl ip jq nproc sleep iperf3 udp_rr # From neper nstools ip jq sleep iperf3 udp_rr htools bc head sed +set MAP_NS4 192.0.2.2 +set MAP_NS6 2001:db8:9a55::2 + test passt: throughput and latency guest /sbin/sysctl -w net.core.rmem_max=16777216 @@ -22,10 +25,6 @@ guest /sbin/sysctl -w net.core.wmem_max=16777216 guest /sbin/sysctl -w net.core.rmem_default=16777216 guest /sbin/sysctl -w net.core.wmem_default=16777216 -gout GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway' -gout GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway' -gout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' - hout FREQ_PROCFS (echo "scale=1"; sed -n 's/cpu MHz.*: \([0-9]*\)\..*$/(\1+10^2\/2)\/10^3/p' /proc/cpuinfo) | bc -l | head -n1 hout FREQ_CPUFREQ (echo "scale=1"; printf '( %i + 10^5 / 2 ) / 10^6\n' $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq) ) | bc -l hout FREQ [ -n "__FREQ_CPUFREQ__" ] && echo __FREQ_CPUFREQ__ || echo __FREQ_PROCFS__ @@ -46,13 +45,13 @@ iperf3s ns 10002 bw - bw - -iperf3 BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 3G -l 1232 +iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -b 3G -l 1232 bw __BW__ 0.8 1.2 -iperf3 BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 4G -l 1452 +iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -b 4G -l 1452 bw __BW__ 1.0 1.5 -iperf3 BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 8G -l 8952 +iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -b 8G -l 8952 bw __BW__ 4.0 5.0 -iperf3 BW guest __GW6__%__IFNAME__ 10002 __TIME__ __OPTS__ -b 15G -l 64372 +iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -b 15G -l 64372 bw __BW__ 4.0 5.0 iperf3k ns @@ -64,7 +63,7 @@ lat - lat - lat - nsb udp_rr --nolog -6 -gout LAT udp_rr --nolog -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p' +gout LAT udp_rr --nolog -6 -c -H __MAP_NS6__ | sed -n 's/^throughput=\(.*\)/\1/p' lat __LAT__ 200 150 @@ -72,17 +71,17 @@ tr UDP throughput over IPv4: guest to host iperf3s ns 10002 # (datagram size) = (packet size) - 28: 20 bytes of IPv4 header, 8 of UDP header -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -b 1G -l 228 +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -b 1G -l 228 bw __BW__ 0.0 0.0 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -b 2G -l 548 +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -b 2G -l 548 bw __BW__ 0.4 0.6 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -b 3G -l 1252 +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -b 3G -l 1252 bw __BW__ 0.8 1.2 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -b 4G -l 1472 +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -b 4G -l 1472 bw __BW__ 1.0 1.5 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -b 8G -l 8972 +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -b 8G -l 8972 bw __BW__ 4.0 5.0 -iperf3 BW guest __GW__ 10002 __TIME__ __OPTS__ -b 15G -l 65492 +iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -b 15G -l 65492 bw __BW__ 4.0 5.0 iperf3k ns @@ -94,7 +93,7 @@ lat - lat - lat - nsb udp_rr --nolog -4 -gout LAT udp_rr --nolog -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p' +gout LAT udp_rr --nolog -4 -c -H __MAP_NS4__ | sed -n 's/^throughput=\(.*\)/\1/p' lat __LAT__ 200 150 diff --git a/test/perf/pasta_tcp b/test/perf/pasta_tcp index a443f5a9..d1ccf7d1 100644 --- a/test/perf/pasta_tcp +++ b/test/perf/pasta_tcp @@ -14,6 +14,9 @@ htools head ip seq bc sleep iperf3 tcp_rr tcp_crr jq sed nstools /sbin/sysctl nproc ip seq sleep iperf3 tcp_rr tcp_crr jq sed +set MAP_HOST4 192.0.2.1 +set MAP_HOST6 2001:db8:9a55::1 + test pasta: throughput and latency (local connections) ns /sbin/sysctl -w net.ipv4.tcp_rmem="131072 524288 134217728" @@ -122,8 +125,6 @@ te test pasta: throughput and latency (connections via tap) -nsout GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway' -nsout GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway' nsout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' set THREADS 2 set OPTS -Z -P __THREADS__ -i1 -O__OMIT__ @@ -137,16 +138,16 @@ tr TCP throughput over IPv6: ns to host iperf3s host 10003 ns ip link set dev __IFNAME__ mtu 1500 -iperf3 BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 512k +iperf3 BW ns __MAP_HOST6__ 10003 __TIME__ __OPTS__ -w 512k bw __BW__ 0.2 0.4 ns ip link set dev __IFNAME__ mtu 4000 -iperf3 BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 1M +iperf3 BW ns __MAP_HOST6__ 10003 __TIME__ __OPTS__ -w 1M bw __BW__ 0.3 0.5 ns ip link set dev __IFNAME__ mtu 16384 -iperf3 BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 8M +iperf3 BW ns __MAP_HOST6__ 10003 __TIME__ __OPTS__ -w 8M bw __BW__ 1.5 2.0 ns ip link set dev __IFNAME__ mtu 65520 -iperf3 BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -w 8M +iperf3 BW ns __MAP_HOST6__ 10003 __TIME__ __OPTS__ -w 8M bw __BW__ 2.0 2.5 iperf3k host @@ -156,7 +157,7 @@ lat - lat - lat - hostb tcp_rr --nolog -P 10003 -C 10013 -6 -nsout LAT tcp_rr --nolog -l1 -P 10003 -C 10013 -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p' +nsout LAT tcp_rr --nolog -l1 -P 10003 -C 10013 -6 -c -H __MAP_HOST6__ | sed -n 's/^throughput=\(.*\)/\1/p' hostw lat __LAT__ 150 100 @@ -165,7 +166,7 @@ lat - lat - lat - hostb tcp_crr --nolog -P 10003 -C 10013 -6 -nsout LAT tcp_crr --nolog -l1 -P 10003 -C 10013 -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p' +nsout LAT tcp_crr --nolog -l1 -P 10003 -C 10013 -6 -c -H __MAP_HOST6__ | sed -n 's/^throughput=\(.*\)/\1/p' hostw lat __LAT__ 1500 500 @@ -174,16 +175,16 @@ tr TCP throughput over IPv4: ns to host iperf3s host 10003 ns ip link set dev __IFNAME__ mtu 1500 -iperf3 BW ns __GW__ 10003 __TIME__ __OPTS__ -w 512k +iperf3 BW ns __MAP_HOST4__ 10003 __TIME__ __OPTS__ -w 512k bw __BW__ 0.2 0.4 ns ip link set dev __IFNAME__ mtu 4000 -iperf3 BW ns __GW__ 10003 __TIME__ __OPTS__ -w 1M +iperf3 BW ns __MAP_HOST4__ 10003 __TIME__ __OPTS__ -w 1M bw __BW__ 0.3 0.5 ns ip link set dev __IFNAME__ mtu 16384 -iperf3 BW ns __GW__ 10003 __TIME__ __OPTS__ -w 8M +iperf3 BW ns __MAP_HOST4__ 10003 __TIME__ __OPTS__ -w 8M bw __BW__ 1.5 2.0 ns ip link set dev __IFNAME__ mtu 65520 -iperf3 BW ns __GW__ 10003 __TIME__ __OPTS__ -w 8M +iperf3 BW ns __MAP_HOST4__ 10003 __TIME__ __OPTS__ -w 8M bw __BW__ 2.0 2.5 iperf3k host @@ -193,7 +194,7 @@ lat - lat - lat - hostb tcp_rr --nolog -P 10003 -C 10013 -4 -nsout LAT tcp_rr --nolog -l1 -P 10003 -C 10013 -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p' +nsout LAT tcp_rr --nolog -l1 -P 10003 -C 10013 -4 -c -H __MAP_HOST4__ | sed -n 's/^throughput=\(.*\)/\1/p' hostw lat __LAT__ 150 100 @@ -202,7 +203,7 @@ lat - lat - lat - hostb tcp_crr --nolog -P 10003 -C 10013 -4 -nsout LAT tcp_crr --nolog -l1 -P 10003 -C 10013 -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p' +nsout LAT tcp_crr --nolog -l1 -P 10003 -C 10013 -4 -c -H __MAP_HOST4__ | sed -n 's/^throughput=\(.*\)/\1/p' hostw lat __LAT__ 1500 500 diff --git a/test/perf/pasta_udp b/test/perf/pasta_udp index 9fed62e4..544bf173 100644 --- a/test/perf/pasta_udp +++ b/test/perf/pasta_udp @@ -14,6 +14,9 @@ htools bc head ip sleep iperf3 udp_rr jq sed nstools ip sleep iperf3 udp_rr jq sed +set MAP_HOST4 192.0.2.1 +set MAP_HOST6 2001:db8:9a55::1 + test pasta: throughput and latency (local traffic) hout FREQ_PROCFS (echo "scale=1"; sed -n 's/cpu MHz.*: \([0-9]*\)\..*$/(\1+10^2\/2)\/10^3/p' /proc/cpuinfo) | bc -l | head -n1 @@ -133,8 +136,6 @@ te test pasta: throughput and latency (traffic via tap) -nsout GW ip -j -4 route show|jq -rM '.[] | select(.dst == "default").gateway' -nsout GW6 ip -j -6 route show|jq -rM '.[] | select(.dst == "default").gateway' nsout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' info Throughput in Gbps, latency in µs, one thread at __FREQ__ GHz @@ -146,13 +147,13 @@ tr UDP throughput over IPv6: ns to host iperf3s host 10003 # (datagram size) = (packet size) - 48: 40 bytes of IPv6 header, 8 of UDP header -iperf3 BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 8G -l 1472 +iperf3 BW ns __MAP_HOST6__ 10003 __TIME__ __OPTS__ -b 8G -l 1472 bw __BW__ 0.3 0.5 -iperf3 BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 12G -l 3972 +iperf3 BW ns __MAP_HOST6__ 10003 __TIME__ __OPTS__ -b 12G -l 3972 bw __BW__ 0.5 0.8 -iperf3 BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 20G -l 16356 +iperf3 BW ns __MAP_HOST6__ 10003 __TIME__ __OPTS__ -b 20G -l 16356 bw __BW__ 3.0 4.0 -iperf3 BW ns __GW6__%__IFNAME__ 10003 __TIME__ __OPTS__ -b 30G -l 65472 +iperf3 BW ns __MAP_HOST6__ 10003 __TIME__ __OPTS__ -b 30G -l 65472 bw __BW__ 6.0 7.0 iperf3k host @@ -162,7 +163,7 @@ lat - lat - lat - hostb udp_rr --nolog -P 10003 -C 10013 -6 -nsout LAT udp_rr --nolog -P 10003 -C 10013 -6 -c -H __GW6__%__IFNAME__ | sed -n 's/^throughput=\(.*\)/\1/p' +nsout LAT udp_rr --nolog -P 10003 -C 10013 -6 -c -H __MAP_HOST6__ | sed -n 's/^throughput=\(.*\)/\1/p' hostw lat __LAT__ 200 150 @@ -171,13 +172,13 @@ tr UDP throughput over IPv4: ns to host iperf3s host 10003 # (datagram size) = (packet size) - 28: 20 bytes of IPv4 header, 8 of UDP header -iperf3 BW ns __GW__ 10003 __TIME__ __OPTS__ -b 8G -l 1472 +iperf3 BW ns __MAP_HOST4__ 10003 __TIME__ __OPTS__ -b 8G -l 1472 bw __BW__ 0.3 0.5 -iperf3 BW ns __GW__ 10003 __TIME__ __OPTS__ -b 12G -l 3972 +iperf3 BW ns __MAP_HOST4__ 10003 __TIME__ __OPTS__ -b 12G -l 3972 bw __BW__ 0.5 0.8 -iperf3 BW ns __GW__ 10003 __TIME__ __OPTS__ -b 20G -l 16356 +iperf3 BW ns __MAP_HOST4__ 10003 __TIME__ __OPTS__ -b 20G -l 16356 bw __BW__ 3.0 4.0 -iperf3 BW ns __GW__ 10003 __TIME__ __OPTS__ -b 30G -l 65492 +iperf3 BW ns __MAP_HOST4__ 10003 __TIME__ __OPTS__ -b 30G -l 65492 bw __BW__ 6.0 7.0 iperf3k host @@ -187,7 +188,7 @@ lat - lat - lat - hostb udp_rr --nolog -P 10003 -C 10013 -4 -nsout LAT udp_rr --nolog -P 10003 -C 10013 -4 -c -H __GW__ | sed -n 's/^throughput=\(.*\)/\1/p' +nsout LAT udp_rr --nolog -P 10003 -C 10013 -4 -c -H __MAP_HOST4__ | sed -n 's/^throughput=\(.*\)/\1/p' hostw lat __LAT__ 200 150 diff --git a/test/run b/test/run index 3b376639..cd6d7076 100755 --- a/test/run +++ b/test/run @@ -101,7 +101,7 @@ run() { VALGRIND=1 setup passt_in_ns test passt/ndp - test passt/dhcp + test passt_in_ns/dhcp test passt_in_ns/icmp test passt_in_ns/tcp test passt_in_ns/udp @@ -115,7 +115,7 @@ run() { VALGRIND=0 setup passt_in_ns test passt/ndp - test passt/dhcp + test passt_in_ns/dhcp test perf/passt_tcp test perf/passt_udp test perf/pasta_tcp -- 2.46.0
fwd_nat_from_host() needs to adjust the source address for new flows coming from an address which is not accessible to the guest. Currently we always use our_tap_addr or our_tap_ll. However in cases where the address is accessible to the guest via translation (i.e. via --map-host-loopback) then it makes more sense to use that translation, rather than the fallback mapping of our_tap_*. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- fwd.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fwd.c b/fwd.c index f99d2040..c55aea0b 100644 --- a/fwd.c +++ b/fwd.c @@ -386,7 +386,14 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, return PIF_SPLICE; } - if (!fwd_guest_accessible(c, &ini->eaddr)) { + if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_host_loopback) && + inany_equals4(&ini->eaddr, &in4addr_loopback)) { + /* Specifically 127.0.0.1, not 127.0.0.0/8 */ + tgt->oaddr = inany_from_v4(c->ip4.map_host_loopback); + } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback) && + inany_equals6(&ini->eaddr, &in6addr_loopback)) { + tgt->oaddr.a6 = c->ip6.map_host_loopback; + } else if (!fwd_guest_accessible(c, &ini->eaddr)) { if (inany_v4(&ini->eaddr)) { if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr)) /* No source address we can use */ -- 2.46.0
The guest is usually assigned one of the host's IP addresses. That means it can't access the host itself via its usual address. The --map-host-loopback option (enabled by default with the gateway address) allows the guest to contact the host. However, connections forwarded this way appear on the host to have originated from the loopback interface, which isn't always desirable. Add a new --map-guest-addr option, which acts similarly but forwarded connections will go to the host's external address, instead of loopback. If '-a' is used, so the guest's address is not the same as the host's, this will instead forward to whatever host-visible site is shadowed by the guest's assigned address. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 46 +++++++++++++++++++++++++++++----------------- fwd.c | 10 ++++++++++ passt.1 | 15 +++++++++++++++ passt.h | 6 ++++++ 4 files changed, 60 insertions(+), 17 deletions(-) diff --git a/conf.c b/conf.c index d605c215..67945086 100644 --- a/conf.c +++ b/conf.c @@ -820,6 +820,9 @@ static void usage(const char *name, FILE *f, int status) " --map-host-loopback ADDR Translate ADDR to refer to host\n" " can be specified zero to two times (for IPv4 and IPv6)\n" " default: gateway address\n" + " --map-guest-addr ADDR Translate ADDR to guest's address\n" + " can be specified zero to two times (for IPv4 and IPv6)\n" + " default: none\n" " --dns-forward ADDR Forward DNS queries sent to ADDR\n" " can be specified zero to two times (for IPv4 and IPv6)\n" " default: don't forward DNS queries\n" @@ -1136,29 +1139,32 @@ static void conf_ugid(char *runas, uid_t *uid, gid_t *gid) } /** - * conf_nat() - Parse --map-host-loopback option - * @c: Execution context - * @arg: String argument to --map-host-loopback - * @no_map_gw: --no-map-gw flag, updated for "none" argument + * conf_nat() - Parse --map-host-loopback or --map-guest-addr option + * @arg: String argument to option + * @addr4: IPv4 to update with parsed address + * @addr6: IPv6 to update with parsed address + * @no_map_gw: --no-map-gw flag, or NULL, updated for "none" argument */ -static void conf_nat(struct ctx *c, const char *arg, int *no_map_gw) +static void conf_nat(const char *arg, struct in_addr *addr4, + struct in6_addr *addr6, int *no_map_gw) { if (strcmp(arg, "none") == 0) { - c->ip4.map_host_loopback = in4addr_any; - c->ip6.map_host_loopback = in6addr_any; - *no_map_gw = 1; + *addr4 = in4addr_any; + *addr6 = in6addr_any; + if (no_map_gw) + *no_map_gw = 1; } - if (inet_pton(AF_INET6, arg, &c->ip6.map_host_loopback) && - !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback) && - !IN6_IS_ADDR_LOOPBACK(&c->ip6.map_host_loopback) && - !IN6_IS_ADDR_MULTICAST(&c->ip6.map_host_loopback)) + if (inet_pton(AF_INET6, arg, addr6) && + !IN6_IS_ADDR_UNSPECIFIED(addr6) && + !IN6_IS_ADDR_LOOPBACK(addr6) && + !IN6_IS_ADDR_MULTICAST(addr6)) return; - if (inet_pton(AF_INET, arg, &c->ip4.map_host_loopback) && - !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_host_loopback) && - !IN4_IS_ADDR_LOOPBACK(&c->ip4.map_host_loopback) && - !IN4_IS_ADDR_MULTICAST(&c->ip4.map_host_loopback)) + if (inet_pton(AF_INET, arg, addr4) && + !IN4_IS_ADDR_UNSPECIFIED(addr4) && + !IN4_IS_ADDR_LOOPBACK(addr4) && + !IN4_IS_ADDR_MULTICAST(addr4)) return; die("Invalid address to remap to host: %s", optarg); @@ -1274,6 +1280,7 @@ void conf(struct ctx *c, int argc, char **argv) {"no-copy-addrs", no_argument, NULL, 19 }, {"netns-only", no_argument, NULL, 20 }, {"map-host-loopback", required_argument, NULL, 21 }, + {"map-guest-addr", required_argument, NULL, 22 }, { 0 }, }; const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt"; @@ -1444,7 +1451,12 @@ void conf(struct ctx *c, int argc, char **argv) *userns = 0; break; case 21: - conf_nat(c, optarg, &no_map_gw); + conf_nat(optarg, &c->ip4.map_host_loopback, + &c->ip6.map_host_loopback, &no_map_gw); + break; + case 22: + conf_nat(optarg, &c->ip4.map_guest_addr, + &c->ip6.map_guest_addr, NULL); break; case 'd': c->debug = 1; diff --git a/fwd.c b/fwd.c index c55aea0b..2a0452fa 100644 --- a/fwd.c +++ b/fwd.c @@ -272,6 +272,10 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto, tgt->eaddr = inany_loopback4; else if (inany_equals6(&ini->oaddr, &c->ip6.map_host_loopback)) tgt->eaddr = inany_loopback6; + else if (inany_equals4(&ini->oaddr, &c->ip4.map_guest_addr)) + tgt->eaddr = inany_from_v4(c->ip4.addr); + else if (inany_equals6(&ini->oaddr, &c->ip6.map_guest_addr)) + tgt->eaddr.a6 = c->ip6.addr; else tgt->eaddr = ini->oaddr; @@ -393,6 +397,12 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback) && inany_equals6(&ini->eaddr, &in6addr_loopback)) { tgt->oaddr.a6 = c->ip6.map_host_loopback; + } else if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_guest_addr) && + inany_equals4(&ini->eaddr, &c->ip4.addr)) { + tgt->oaddr = inany_from_v4(c->ip4.map_guest_addr); + } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_guest_addr) && + inany_equals6(&ini->eaddr, &c->ip6.addr)) { + tgt->oaddr.a6 = c->ip6.map_guest_addr; } else if (!fwd_guest_accessible(c, &ini->eaddr)) { if (inany_v4(&ini->eaddr)) { if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr)) diff --git a/passt.1 b/passt.1 index e85d9885..79d134db 100644 --- a/passt.1 +++ b/passt.1 @@ -348,6 +348,21 @@ as destination, to the host. Implied if there is no gateway on the selected default route, or if there is no default route, for any of the enabled address families. +.TP +.BR \-\-map-guest-addr " " \fIaddr +Translate \fIaddr\fR in the guest to be equal to the guest's assigned +address on the host. That is, packets from the guest to \fIaddr\fR +will be redirected to the address assigned to the guest with \fB-a\fR, +or by default the host's global address. This allows the guest to +access services availble on the host's global address, even though its +own address shadows that of the host. + +If \fIaddr\fR is 'none', no address is mapped. Only one IPv4 and one +IPv6 address can be translated, and if the option is specified +multiple times, the last one for each address type takes effect. + +Default is no mapping. + .TP .BR \-4 ", " \-\-ipv4-only Enable IPv4-only operation. IPv6 traffic will be ignored. diff --git a/passt.h b/passt.h index 7cdba85e..031c9b66 100644 --- a/passt.h +++ b/passt.h @@ -104,6 +104,8 @@ enum passt_modes { * @guest_gw: IPv4 gateway as seen by the guest * @map_host_loopback: Outbound connections to this address are NATted to the * host's 127.0.0.1 + * @map_guest_addr: Outbound connections to this address are NATted to the + * guest's assigned address * @dns: DNS addresses for DHCP, zero-terminated * @dns_match: Forward DNS query if sent to this address * @our_tap_addr: IPv4 address for passt's use on tap @@ -120,6 +122,7 @@ struct ip4_ctx { int prefix_len; struct in_addr guest_gw; struct in_addr map_host_loopback; + struct in_addr map_guest_addr; struct in_addr dns[MAXNS + 1]; struct in_addr dns_match; struct in_addr our_tap_addr; @@ -142,6 +145,8 @@ struct ip4_ctx { * @guest_gw: IPv6 gateway as seen by the guest * @map_host_loopback: Outbound connections to this address are NATted to the * host's [::1] + * @map_guest_addr: Outbound connections to this address are NATted to the + * guest's assigned address * @dns: DNS addresses for DHCPv6 and NDP, zero-terminated * @dns_match: Forward DNS query if sent to this address * @our_tap_ll: Link-local IPv6 address for passt's use on tap @@ -158,6 +163,7 @@ struct ip6_ctx { struct in6_addr addr_ll_seen; struct in6_addr guest_gw; struct in6_addr map_host_loopback; + struct in6_addr map_guest_addr; struct in6_addr dns[MAXNS + 1]; struct in6_addr dns_match; struct in6_addr our_tap_ll; -- 2.46.0
On Wed, 21 Aug 2024 14:19:56 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:Based on Stefano's recent patch for faster tests. Allow the user to specify which addresses are translated when used by the guest, rather than always being the gateway address or nothing. We also allow this remapping to go to the host's global address (more precisely the address assigned to the guest) rather than just host loopback. Along the way to implementing that make many changes to clarify what various addresses we track mean, fixing a number of small bugs as well. Paul, amongst other things, I think this will allow podman to (finally) nicely address #19213, picking an address to remap to the host's external address with --map-guest-addr, much like it already uses --dns-forward. Changes in v2: * Assorted minor stylistic fixes based on Stefano's review * Change name of the new options from --nat-* to --map-* * Shorten descriptions of new options in --help (leave the full text to the man page) * Add fix for the fact that changing MTU causes IPv6 to be temporarily deconfigured during perf testsApplied! -- Stefano