Re: [PATCH v14 7/9] vhost-user: add vhost-user

27 Nov 2024

On Wed, 27 Nov 2024 10:48:41 +0100
Laurent Vivier wrote:
...
On 27/11/2024 10:45, Stefano Brivio wrote:
...
On Wed, 27 Nov 2024 10:09:53 +0100
Laurent Vivier wrote:
...
On 27/11/2024 05:47, Stefano Brivio wrote:
...
On Fri, 22 Nov 2024 17:43:34 +0100
Laurent Vivier wrote:
...
+/**
+ * tcp_vu_send_flag() - Send segment with flags to vhost-user (no payload)
+ * @c:		Execution context
+ * @conn:	Connection pointer
+ * @flags:	TCP flags: if not set, send segment only if ACK is due
+ *
+ * Return: negative error code on connection reset, 0 otherwise
+ */
+int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
+{
+	struct vu_dev *vdev = c->vdev;
+	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
+	const struct flowside *tapside = TAPFLOW(conn);
+	size_t l2len, l4len, optlen, hdrlen;
+	struct vu_virtq_element flags_elem[2];
+	struct tcp_payload_t *payload;
+	struct ipv6hdr *ip6h = NULL;
+	struct iovec flags_iov[2];
+	struct iphdr *iph = NULL;
+	struct ethhdr *eh;
+	uint32_t seq;
+	int elem_cnt;
+	int nb_ack;
+	int ret;
+
+	hdrlen = tcp_vu_hdrlen(CONN_V6(conn));
+
+	vu_set_element(&flags_elem[0], NULL, &flags_iov[0]);
+
+	elem_cnt = vu_collect(vdev, vq, &flags_elem[0], 1,
+			      hdrlen + sizeof(struct tcp_syn_opts), NULL);
Oops, I made this crash by starting a number of iperf3 client threads
on the host:
$ iperf3 -c localhost -p 6001 -Z -l 500 -w 256M -t 600 -P20
with a matching server in the guest, then terminating QEMU while the test
was running.
Details (I saw it first, then I reproduced it under gdb):
accepted connection from PID 3115463
NDP: received RS, sending RA
DHCP: offer to discover
      from 52:54:00:12:34:56
DHCP: ack to request
      from 52:54:00:12:34:56
NDP: sending unsolicited RA, next in 212s
Client connection closed
Program received signal SIGSEGV, Segmentation fault.
0x00005555555884f5 in vring_avail_idx (vq=0x555559343f10 ) at virtio.c:138
138		vq->shadow_avail_idx = le16toh(vq->vring.avail->idx);
(gdb) list
133	 *
134	 * Return: the available ring index of the given virtqueue
135	 */
136	static inline uint16_t vring_avail_idx(struct vu_virtq *vq)
137	{
138		vq->shadow_avail_idx = le16toh(vq->vring.avail->idx);
139	
140		return vq->shadow_avail_idx;
141	}
142	
(gdb) bt
#0  0x00005555555884f5 in vring_avail_idx (vq=0x555559343f10 ) at virtio.c:138
#1  vu_queue_empty (vq=vq@entry=0x555559343f10 ) at virtio.c:290
#2  vu_queue_pop (dev=dev@entry=0x555559343a00 , vq=vq@entry=0x555559343f10 , elem=elem@entry=0x7ffffff6f510) at virtio.c:505
#3  0x0000555555588c8c in vu_collect (vdev=vdev@entry=0x555559343a00 , vq=vq@entry=0x555559343f10 , elem=elem@entry=0x7ffffff6f510, max_elem=max_elem@entry=1,
      size=size@entry=74, frame_size=frame_size@entry=0x0) at vu_common.c:86
#4  0x000055555557e00e in tcp_vu_send_flag (c=0x7ffffff6f7a0, conn=0x5555555bd2d0 , flags=4) at tcp_vu.c:116
#5  0x0000555555578125 in tcp_send_flag (flags=4, conn=0x5555555bd2d0 , c=0x7ffffff6f7a0) at tcp.c:1278
#6  tcp_rst_do (conn=<optimized out>, c=<optimized out>) at tcp.c:1293
#7  tcp_timer_handler (c=c@entry=0x7ffffff6f7a0, ref=..., ref@entry=...) at tcp.c:2266
#8  0x0000555555558f26 in main (argc=<optimized out>, argv=<optimized out>) at passt.c:342
(gdb) p *vq
$1 = {vring = {num = 256, desc = 0x0, avail = 0x0, used = 0x0, log_guest_addr = 4338774592, flags = 0}, last_avail_idx = 35133, shadow_avail_idx = 35133, used_idx = 35133, signalled_used = 0,
    signalled_used_valid = false, notification = true, inuse = 0, call_fd = -1, kick_fd = -1, err_fd = -1, enable = 1, started = false, vra = {index = 0, flags = 0, desc_user_addr = 139660501995520,
      used_user_addr = 139660502000192, avail_user_addr = 139660501999616, log_guest_addr = 4338774592}}
(gdb) p *vq->vring.avail
Cannot access memory at address 0x0
...so we're sending a RST segment to the guest, but the ring doesn't
exist anymore.
By the way, I still have the gdb session running, if you need something
else out of it.
Now, I guess we should eventually introduce more comprehensive
handling of the case where the guest suddenly terminates (not specific
to vhost-user), but given that vu_cleanup() works as expected in this
case, I wonder if we shouldn't simply avoid calling vring_avail_idx()
(it has a single caller) by checking for !vring.avail in the caller, or
something like that.
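For illustration, a minimal model of the caller-side check proposed above. The structures are simplified, hypothetical stand-ins (not passt's struct vu_virtq), le16toh() is omitted, and the ring going away is modeled as avail becoming NULL, as in the gdb dump. The guard makes a torn-down ring look empty, so the index read is never reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: only the fields this sketch needs */
struct avail_model {
	uint16_t flags;
	uint16_t idx;			/* written by the guest */
};

struct virtq_model {
	struct avail_model *avail;	/* NULL once the ring is gone */
	uint16_t shadow_avail_idx;
	uint16_t last_avail_idx;
};

/* Like vring_avail_idx(): dereferences avail unconditionally */
static uint16_t avail_idx(struct virtq_model *vq)
{
	vq->shadow_avail_idx = vq->avail->idx;
	return vq->shadow_avail_idx;
}

/* Like vu_queue_empty() with the check done in the (single) caller:
 * a ring that no longer exists is reported as empty */
static bool queue_empty(struct virtq_model *vq)
{
	/* The virtqueue struct stays valid: only its ring mapping vanishes */
	assert(vq);

	if (!vq->avail)
		return true;

	if (vq->shadow_avail_idx != vq->last_avail_idx)
		return false;

	return avail_idx(vq) == vq->last_avail_idx;
}
```

With avail pointing to a live ring whose idx is ahead of last_avail_idx, queue_empty() returns false; once avail is NULL it returns true without touching ring memory.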
Yes, I think those are the lines I removed during the reviews:
	if (!vq->vring.avail)
		return true;
Ah, right:
https://archives.passt.top/passt-dev/20241114163859.7eeafa38@elisabeth/
...so, at least in our case, it's more than "sanity checks" after all.
:) Well, I guess it depends on the definition.
...
Could you try to check out virtio.c from v11?
That would take a rather lengthy rebase, but I tried to reintroduce all
the checks you had:
--

diff --git a/virtio.c b/virtio.c
index 6a97435..0598ff4 100644
--- a/virtio.c
+++ b/virtio.c
@@ -284,6 +284,9 @@ static int virtqueue_read_next_desc(const struct vring_desc *desc,
  */
 bool vu_queue_empty(struct vu_virtq *vq)
 {
+	if (!vq->vring.avail)
+		return true;
+
 	if (vq->shadow_avail_idx != vq->last_avail_idx)
 		return false;
 
@@ -327,6 +330,9 @@ static bool vring_can_notify(const struct vu_dev *dev, struct vu_virtq *vq)
  */
 void vu_queue_notify(const struct vu_dev *dev, struct vu_virtq *vq)
 {
+	if (!vq->vring.avail)
+		return;
+
 	if (!vring_can_notify(dev, vq)) {
 		debug("vhost-user: virtqueue can skip notify...");
 		return;
@@ -502,6 +508,9 @@ int vu_queue_pop(struct vu_dev *dev, struct vu_virtq *vq, struct vu_virtq_elemen
 	unsigned int head;
 	int ret;
 
+	if (!vq->vring.avail)
+		return -1;
+
 	if (vu_queue_empty(vq))
 		return -1;
 
@@ -591,6 +600,9 @@ void vu_queue_fill_by_index(struct vu_virtq *vq, unsigned int index,
 {
 	struct vring_used_elem uelem;
 
+	if (!vq->vring.avail)
+		return;
+
 	idx = (idx + vq->used_idx) % vq->vring.num;
 
 	uelem.id = htole32(index);
@@ -633,6 +645,9 @@ void vu_queue_flush(struct vu_virtq *vq, unsigned int count)
 {
 	uint16_t old, new;
 
+	if (!vq->vring.avail)
+		return;
+
 	/* Make sure buffer is written before we update index. */
 	smp_wmb();
 
--
and everything is fine with those: I tried doing a few nasty things and
didn't observe any issues.
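Along the same lines, a minimal model of why the check in vu_queue_pop() covers the path in the backtrace: a vu_collect()-style loop stops at the first failed pop, so on a ring that's gone it gathers zero elements instead of dereferencing NULL. Everything here is a hypothetical simplification (no descriptor chains, one fixed buffer length per entry, no endianness conversion), and what tcp_vu_send_flag() then does with elem_cnt isn't visible in the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring: not passt's struct vu_virtq */
struct ring_model {
	const uint16_t *avail_ring;	/* NULL once the ring is unmapped */
	uint16_t avail_idx;		/* entries made available by the guest */
	uint16_t last_avail_idx;	/* next entry we would consume */
	unsigned int num;		/* ring size */
	size_t buf_len;			/* length of each guest buffer */
};

/* Pop one head index: -1 if the ring is gone or has no buffers */
static int pop(struct ring_model *vq, unsigned int *head)
{
	if (!vq->avail_ring)
		return -1;

	assert(vq->num != 0);	/* a live ring has a size, see modulo below */

	if (vq->avail_idx == vq->last_avail_idx)
		return -1;

	*head = vq->avail_ring[vq->last_avail_idx % vq->num];
	vq->last_avail_idx++;
	return 0;
}

/* Gather up to max_elem buffers until size bytes are covered, returning
 * the number of elements collected: 0 on a torn-down ring */
static int collect(struct ring_model *vq, unsigned int *heads,
		   int max_elem, size_t size)
{
	size_t current = 0;
	int n = 0;

	while (current < size && n < max_elem) {
		if (pop(vq, &heads[n]) < 0)
			break;
		current += vq->buf_len;
		n++;
	}

	return n;
}
```

Calling collect(&vq, heads, 1, 74) mirrors the max_elem=1, size=74 call in frame #3: on a live ring with a buffer available it returns 1, and with avail_ring set to NULL it returns 0.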
Did I miss any check? Do you want to submit it as a follow-up patch? I
can also do that. I'd (still) rather avoid a re-post of v14 if possible.
As you prefer. Let me know.
It would save me some time if you could... it should be based on v14 as
it is.

I haven't had time yet to take care of the gcc warnings on 32-bit and
of the build failure on musl.

-- 
Stefano