On 14/11/2024 15:23, Stefano Brivio wrote:On Thu, 14 Nov 2024 11:23:11 +0100 Laurent Vivier <lvivier(a)redhat.com> wrote:In the loop "i < iov_cnt" is the number of available buffers collected previously. Usually the size of one buffer is 1536 bytes. We join the buffers here (when VIRTIO_NET_F_MRG_RXBUF is avaialble) to create frame with a size of "mss". iov_cnt is computed in tcp_vu_sock_recv(): this is the number of buffers we have collected from the queue to have enough space to store fillsize bytes. But if we don't have enough buffers in the queue ioc_cnt will be lower and the size of the data we will collect will be truncated. Thanks, LaurentOn 17/10/2024 02:10, Stefano Brivio wrote:Well, there we do: fill_bufs = DIV_ROUND_UP(wnd_scaled - already_sent, mss); if (fill_bufs > TCP_FRAMES) { fill_bufs = TCP_FRAMES; and we don't fetch more data than that from the socket (in one pass). Is this implicit in the i < iov_cnt loop condition here? That's the part I don't understand: how do we limit the amount of data we can dequeue from a socket in one single pass.We don't have this garantee. But I think it's the same for the socket version?+/** + * tcp_vu_data_from_sock() - Handle new data from socket, queue to vhost-user, + * in window + * @c: Execution context + * @conn: Connection pointer + * + * Return: Negative on connection reset, 0 otherwise + */ +int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) +{ + uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; + struct vu_dev *vdev = c->vdev; + struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; + const struct flowside *tapside = TAPFLOW(conn); + uint16_t mss = MSS_GET(conn); + size_t l2_hdrlen, fillsize; + int i, iov_cnt, iov_used; + int v4 = CONN_V4(conn); + uint32_t already_sent = 0; + const uint16_t *check; + struct iovec *first; + int frame_size; + int num_buffers; + ssize_t len; + + if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) { + flow_err(conn, + "Got packet, but RX virtqueue not usable yet"); + return 0; + } + + already_sent = conn->seq_to_tap - conn->seq_ack_from_tap; + + if (SEQ_LT(already_sent, 0)) { + /* RFC 761, section 2.1. */ + flow_trace(conn, "ACK sequence gap: ACK for %u, sent: %u", + conn->seq_ack_from_tap, conn->seq_to_tap); + conn->seq_to_tap = conn->seq_ack_from_tap; + already_sent = 0; + } + + if (!wnd_scaled || already_sent >= wnd_scaled) { + conn_flag(c, conn, STALLED); + conn_flag(c, conn, ACK_FROM_TAP_DUE); + return 0; + } + + /* Set up buffer descriptors we'll fill completely and partially. */ + + fillsize = wnd_scaled; + + if (peek_offset_cap) + already_sent = 0; + + iov_vu[0].iov_base = tcp_buf_discard; + iov_vu[0].iov_len = already_sent; + fillsize -= already_sent; + + /* collect the buffers from vhost-user and fill them with the + * data from the socket + */ + iov_cnt = tcp_vu_sock_recv(c, conn, v4, fillsize, &len); + if (iov_cnt <= 0) + return iov_cnt; + + len -= already_sent; + if (len <= 0) { + conn_flag(c, conn, STALLED); + vu_queue_rewind(vq, iov_cnt); + return 0; + } + + conn_flag(c, conn, ~STALLED); + + /* Likely, some new data was acked too. */ + tcp_update_seqack_wnd(c, conn, 0, NULL); + + /* initialize headers */ + l2_hdrlen = tcp_vu_l2_hdrlen(!v4); + iov_used = 0; + num_buffers = 0; + check = NULL; + frame_size = 0; + + /* iov_vu is an array of buffers and the buffer size can be + * smaller than the frame size we want to use but with + * num_buffer we can merge several virtio iov buffers in one packet + * we need only to set the packet headers in the first iov and + * num_buffer to the number of iov entries...this part is clear to me, what I don't understand is if we still have a way to guarantee that the sum of several buffers is big enough to fit frame_size bytes.