Re: [PATCH v8 7/8] vhost-user: add vhost-user

15 Nov 2024

On 14/11/2024 15:23, Stefano Brivio wrote:
...
On Thu, 14 Nov 2024 11:23:11 +0100
Laurent Vivier wrote:
...
On 17/10/2024 02:10, Stefano Brivio wrote:
...
...
+/**
+ * tcp_vu_data_from_sock() - Handle new data from socket, queue to vhost-user,
+ *			     in window
+ * @c:		Execution context
+ * @conn:	Connection pointer
+ *
+ * Return: Negative on connection reset, 0 otherwise
+ */
+int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)
+{
+	uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap;
+	struct vu_dev *vdev = c->vdev;
+	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
+	const struct flowside *tapside = TAPFLOW(conn);
+	uint16_t mss = MSS_GET(conn);
+	size_t l2_hdrlen, fillsize;
+	int i, iov_cnt, iov_used;
+	int v4 = CONN_V4(conn);
+	uint32_t already_sent = 0;
+	const uint16_t *check;
+	struct iovec *first;
+	int frame_size;
+	int num_buffers;
+	ssize_t len;
+
+	if (!vu_queue_enabled(vq) || !vu_queue_started(vq)) {
+		flow_err(conn,
+			 "Got packet, but RX virtqueue not usable yet");
+		return 0;
+	}
+
+	already_sent = conn->seq_to_tap - conn->seq_ack_from_tap;
+
+	if (SEQ_LT(already_sent, 0)) {
+		/* RFC 761, section 2.1. */
+		flow_trace(conn, "ACK sequence gap: ACK for %u, sent: %u",
+			   conn->seq_ack_from_tap, conn->seq_to_tap);
+		conn->seq_to_tap = conn->seq_ack_from_tap;
+		already_sent = 0;
+	}
+
+	if (!wnd_scaled || already_sent >= wnd_scaled) {
+		conn_flag(c, conn, STALLED);
+		conn_flag(c, conn, ACK_FROM_TAP_DUE);
+		return 0;
+	}
+
+	/* Set up buffer descriptors we'll fill completely and partially. */
+
+	fillsize = wnd_scaled;
+
+	if (peek_offset_cap)
+		already_sent = 0;
+
+	iov_vu[0].iov_base = tcp_buf_discard;
+	iov_vu[0].iov_len = already_sent;
+	fillsize -= already_sent;
+
+	/* collect the buffers from vhost-user and fill them with the
+	 * data from the socket
+	 */
+	iov_cnt = tcp_vu_sock_recv(c, conn, v4, fillsize, &len);
+	if (iov_cnt <= 0)
+		return iov_cnt;
+
+	len -= already_sent;
+	if (len <= 0) {
+		conn_flag(c, conn, STALLED);
+		vu_queue_rewind(vq, iov_cnt);
+		return 0;
+	}
+
+	conn_flag(c, conn, ~STALLED);
+
+	/* Likely, some new data was acked too. */
+	tcp_update_seqack_wnd(c, conn, 0, NULL);
+
+	/* initialize headers */
+	l2_hdrlen = tcp_vu_l2_hdrlen(!v4);
+	iov_used = 0;
+	num_buffers = 0;
+	check = NULL;
+	frame_size = 0;
+
+	/* iov_vu is an array of buffers and the buffer size can be
+	 * smaller than the frame size we want to use but with
+	 * num_buffer we can merge several virtio iov buffers in one packet
+	 * we need only to set the packet headers in the first iov and
+	 * num_buffer to the number of iov entries
...this part is clear to me, what I don't understand is if we still
have a way to guarantee that the sum of several buffers is big enough
to fit frame_size bytes.
We don't have this guarantee. But I think it's the same for the socket version?
Well, there we do:
	fill_bufs = DIV_ROUND_UP(wnd_scaled - already_sent, mss);
	if (fill_bufs > TCP_FRAMES) {
		fill_bufs = TCP_FRAMES;
and we don't fetch more data than that from the socket (in one pass).
Is this implicit in the i < iov_cnt loop condition here? That's the part
I don't understand: how do we limit the amount of data we can dequeue
from a socket in one single pass.
In the loop condition "i < iov_cnt", iov_cnt is the number of available buffers
collected previously. Usually, the size of one buffer is 1536 bytes. We join the
buffers here (when VIRTIO_NET_F_MRG_RXBUF is available) to create frames with a
size of "mss".

iov_cnt is computed in tcp_vu_sock_recv(): this is the number of buffers we have
collected from the queue to have enough space to store fillsize bytes. But if we
don't have enough buffers in the queue, iov_cnt will be lower and the size of the
data we will collect will be truncated.
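
To illustrate the truncation, here is a minimal sketch. It is not the actual
tcp_vu_sock_recv() code: struct rx_buf and collect_bufs() are hypothetical
names used only for this example, and per-frame header space is ignored. It
shows how the number of collected buffers, rather than an explicit frame
count, bounds how much we can read in one pass.

```c
#include <stddef.h>

/* Hypothetical stand-in for one guest RX buffer; not a passt type. */
struct rx_buf {
	size_t len;	/* capacity in bytes, e.g. 1536 */
};

/**
 * collect_bufs() - Collect buffers until @fillsize bytes fit (illustration)
 * @avail:	Buffers currently available in the queue
 * @n_avail:	Number of entries in @avail
 * @fillsize:	Bytes we would like to read from the socket
 * @capacity:	Set to the number of bytes we can actually read
 *
 * Return: number of buffers collected, lower than needed if the queue runs out
 */
static int collect_bufs(const struct rx_buf *avail, int n_avail,
			size_t fillsize, size_t *capacity)
{
	size_t total = 0;
	int cnt = 0;

	/* Stop when we have enough room, or when the queue is exhausted */
	while (cnt < n_avail && total < fillsize)
		total += avail[cnt++].len;

	/* With too few buffers, the read is truncated to what fits */
	*capacity = total < fillsize ? total : fillsize;

	return cnt;
}
```

With four 1536-byte buffers available and fillsize at 10000, this returns 4
and limits the read to 6144 bytes, so the "i < iov_cnt" loop never sees more
data than the collected buffers can hold.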

Thanks,
Laurent