[PATCH 0/4] RFC: Reworks and improvements to TCP activity timers
Here's a bunch of patches aimed at fixing bug 179, and reworking the
currently broken inactivity timer along the way.

I believe patches 1..2/4 are ready to go - I've tested them, and I'm happy
with how they're behaving. Patches 3..4/4 I think are correct, but I've
been getting bogged down in details trying to test them in the specific
FIN_WAIT_2 situation that occurs in bug 179. I'm sending this out for
comment, while I look at some other things to clear my head.

Caveats:

* Currently the inactivity timer uses an interval of 2h to match the
  intended behaviour of the existing non-working activity timeout.
  Arguably it should be much longer (several days), like the kernel NAT
  timeout for idle connection tracking.

* The keepalive interval is currently 30s. That's too short - I have it
  there just for testing. I was intending to change it to 300s (5m) for
  "real", but that's certainly open to discussion.

* This introduces two new fields in the connection structure, as the clock
  values for the two new timers. These are new 1-bit bool fields slotted
  into a 3-bit gap. Arguably these would be cleaner as new bits in the
  'flags' field. However, since we only have one spare bit there at the
  moment, that would require some more complex reorganization which I
  didn't want to tackle right now.

David Gibson (4):
  tcp: Remove non-working activity timeout mechanism
  tcp: Re-introduce inactivity timeouts based on a clock algorithm
  tcp: Extend tcp_send_flag() to send TCP keepalive segments
  tcp: Send TCP keepalive segments after a period of tap-side inactivity

 tcp.c          | 115 ++++++++++++++++++++++++++++++++++++++-----------
 tcp.h          |   6 ++-
 tcp_buf.c      |   4 ++
 tcp_conn.h     |   5 +++
 tcp_internal.h |   2 +
 tcp_vu.c       |   3 ++
 6 files changed, 109 insertions(+), 26 deletions(-)

-- 
2.52.0
We previously had a mechanism to remove TCP connections which were
inactive for 2 hours. That was broken for a long time, due to poor
interactions with the timerfd handling, so we removed it.
Adding this long-timescale timer onto the timerfd handling, which mostly
handles much shorter timeouts, is tricky to reason about. However, for the
inactivity timeouts we don't require precision. Instead, we can use
a 1-bit page replacement / "clock" algorithm. Every INACTIVITY_INTERVAL
(2 hours), a global timer marks every TCP connection as tentatively
inactive. That flag is cleared whenever we get any event on the
connection, either tap side or socket side.
If the inactive flag is still set when the next INACTIVITY_INTERVAL expires,
the connection has been inactive for an extended period, so we reset
and close it. In practice this means that connections will be removed
after 2-4 hours of inactivity.
This is not a true fix for bug 179, but it does mitigate the damage by
limiting the time that inactive connections will remain around.
Link: https://bugs.passt.top/show_bug.cgi?id=179
Signed-off-by: David Gibson
This mechanism was intended to remove connections which have had no
activity for two hours, even if they haven't been closed or reset
internally. It operated by setting the two-hour timeout if there were
no sooner TCP timeouts to schedule.
However, when the timer fires, the way we detect that the activity
timeout was the cause doesn't work: we reset the timer for another two
hours, then check whether the old timeout was two hours. But the old value
returned by timerfd_settime() is not the interval the timer was originally
armed with, only the time remaining. Since the timer has just fired, it
will essentially always be 0.
For now, just remove the mechanism, disarming the timer entirely if there
isn't another upcoming event. We'll re-introduce some sort of activity
timeout by a different means later.
Signed-off-by: David Gibson
Nits only:
On Wed, 4 Feb 2026 21:41:35 +1000 David Gibson wrote:

[...]
---
 tcp.c      | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
 tcp.h      |  4 +++-
 tcp_conn.h |  3 +++
 3 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/tcp.c b/tcp.c
index f8663369..acdac7df 100644
--- a/tcp.c
+++ b/tcp.c
@@ -198,6 +198,13 @@
  * TCP_INFO, with a representable range from RTT_STORE_MIN (100 us) to
  * RTT_STORE_MAX (3276.8 ms). The timeout value is clamped accordingly.
  *
+ * We also use a global interval timer for an activity timeout which doesn't
+ * require precision:
+ *
+ * - INACTIVITY_INTERVAL: if a connection has had no activity for an entire
+ *   interval, close and reset it. This means that idle connections (without
+ *   keepalives) will be removed between INACTIVITY_INTERVAL s and
Probably easier to read: ... seconds
+ * 2*INACTIVITY_INTERVAL s after the last activity.
same here.
  *
  * Summary of data flows (with ESTABLISHED event)
  * ----------------------------------------------
@@ -333,7 +340,8 @@ enum {
 #define RTO_INIT			1	/* s, RFC 6298 */
 #define RTO_INIT_AFTER_SYN_RETRIES	3	/* s, RFC 6298 */
-#define ACT_TIMEOUT			7200
+
+#define INACTIVITY_INTERVAL		7200	/* s */
 #define LOW_RTT_TABLE_SIZE		8
 #define LOW_RTT_THRESHOLD		10	/* us */
@@ -2254,6 +2262,8 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 		return 1;
 	}

+	conn->inactive = false;
+
 	if (th->ack && !(conn->events & ESTABLISHED))
 		tcp_update_seqack_from_tap(c, conn, ntohl(th->ack_seq));

@@ -2622,6 +2632,8 @@ void tcp_sock_handler(const struct ctx *c, union epoll_ref ref,
 		return;
 	}

+	conn->inactive = false;
+
 	if ((conn->events & TAP_FIN_ACKED) && (events & EPOLLHUP)) {
 		conn_event(c, conn, CLOSED);
 		return;
@@ -2872,18 +2884,50 @@ int tcp_init(struct ctx *c)
 	return 0;
 }

+/**
+ * tcp_inactivity() - Scan for and close long-inactive connections
+ * @:		Execution context
 * @c:		Execution context
 * @now:	Current timestamp
+ */
+static void tcp_inactivity(struct ctx *c, const struct timespec *now)
+{
+	union flow *flow;
+
+	if (now->tv_sec - c->tcp.inactivity_run < INACTIVITY_INTERVAL)
+		return;
+
+	debug("TCP inactivity scan");
+	c->tcp.inactivity_run = now->tv_sec;
+
+	flow_foreach(flow) {
+		struct tcp_tap_conn *conn = &flow->tcp;
+
+		if (flow->f.type != FLOW_TCP)
+			continue;
+
+		if (conn->inactive) {
+			/* No activity in this interval, reset */
+			flow_dbg(conn, "Inactive for at least %us, resetting",
+				 INACTIVITY_INTERVAL);
+			tcp_rst(c, conn);
+		}
+
+		/* Ready to check fot next interval */
for
+		conn->inactive = true;
+	}
+}
+
 /**
  * tcp_timer() - Periodic tasks: port detection, closed connections, pool refill
  * @c:		Execution context
  * @now:	Current timestamp
  */
-void tcp_timer(const struct ctx *c, const struct timespec *now)
+void tcp_timer(struct ctx *c, const struct timespec *now)
 {
-	(void)now;
-
 	tcp_sock_refill_init(c);
 	if (c->mode == MODE_PASTA)
 		tcp_splice_refill(c);
+
+	tcp_inactivity(c, now);
 }

 /**

diff --git a/tcp.h b/tcp.h
index 24b90870..e104d453 100644
--- a/tcp.h
+++ b/tcp.h
@@ -21,7 +21,7 @@ int tcp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 int tcp_listen(const struct ctx *c, uint8_t pif, unsigned rule,
	       const union inany_addr *addr, const char *ifname, in_port_t port);
 int tcp_init(struct ctx *c);
-void tcp_timer(const struct ctx *c, const struct timespec *now);
+void tcp_timer(struct ctx *c, const struct timespec *now);
 void tcp_defer_handler(struct ctx *c);

 void tcp_update_l2_buf(const unsigned char *eth_d);

@@ -38,6 +38,7 @@ extern bool peek_offset_cap;
  * @rto_max:		Maximum retry timeout (in s)
  * @syn_retries:	SYN retries using exponential backoff timeout
  * @syn_linear_timeouts: SYN retries before using exponential backoff timeout
+ * @inactivity_run:	Time we last scanned for inactive connections
  */
 struct tcp_ctx {
	struct fwd_ports fwd_in;
@@ -47,6 +48,7 @@ struct tcp_ctx {
	int rto_max;
	uint8_t syn_retries;
	uint8_t syn_linear_timeouts;
+	time_t inactivity_run;
 };

 #endif /* TCP_H */

diff --git a/tcp_conn.h b/tcp_conn.h
index 21cea109..7197ff63 100644
--- a/tcp_conn.h
+++ b/tcp_conn.h
@@ -16,6 +16,7 @@
  * @ws_from_tap:	Window scaling factor advertised from tap/guest
  * @ws_to_tap:		Window scaling factor advertised to tap/guest
  * @tap_mss:		MSS advertised by tap/guest, rounded to 2 ^ TCP_MSS_BITS
+ * @inactive:		No activity within the current INACTIVITY_INTERVAL
  * @sock:		Socket descriptor number
  * @events:		Connection events, implying connection states
  * @timer:		timerfd descriptor for timeout events
@@ -57,6 +58,8 @@ struct tcp_tap_conn {
	(conn->rtt_exp = MIN(RTT_EXP_MAX, ilog2(MAX(1, rtt / RTT_STORE_MIN))))
 #define RTT_GET(conn)	(RTT_STORE_MIN << conn->rtt_exp)

+	bool	inactive	:1;
+
	int	sock		:FD_REF_BITS;

	uint8_t	events;
-- Stefano
On Thu, Feb 05, 2026 at 01:17:28AM +0100, Stefano Brivio wrote:
Nits only:
On Wed, 4 Feb 2026 21:41:35 +1000 David Gibson wrote:

[...]

+ * - INACTIVITY_INTERVAL: if a connection has had no activity for an entire
+ *   interval, close and reset it. This means that idle connections (without
+ *   keepalives) will be removed between INACTIVITY_INTERVAL s and
Probably easier to read: ... seconds
+ * 2*INACTIVITY_INTERVAL s after the last activity.
same here.
Good idea, done.
[...]

+/**
+ * tcp_inactivity() - Scan for and close long-inactive connections
+ * @:		Execution context

 * @c:		Execution context
 * @now:	Current timestamp
Oops, fixed.
[...]

+		/* Ready to check fot next interval */
for
Wow, I really wasn't typing good that day. Fixed.
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.  http://www.ozlabs.org/~dgibson