On Tue, Aug 27, 2024 at 12:14:20AM +0200, Stefano Brivio wrote:
On Mon, 26 Aug 2024 15:26:44 +1000 David Gibson
wrote: On Thu, Aug 15, 2024 at 05:50:22PM +0200, Laurent Vivier wrote:
Add vhost_user.c and vhost_user.h that define the functions needed to implement vhost-user backend.
[...]
+static int vu_message_read_default(int conn_fd, struct vhost_user_msg *vmsg) +{ + char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * + sizeof(int))] = { 0 }; + struct iovec iov = { + .iov_base = (char *)vmsg, + .iov_len = VHOST_USER_HDR_SIZE, + }; + struct msghdr msg = { + .msg_iov = &iov, + .msg_iovlen = 1, + .msg_control = control, + .msg_controllen = sizeof(control), + }; + ssize_t ret, sz_payload; + struct cmsghdr *cmsg; + size_t fd_size; + + ret = recvmsg(conn_fd, &msg, MSG_DONTWAIT); + if (ret < 0) { + if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) + return 0; + return -1; + } + + vmsg->fd_num = 0; + for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; + cmsg = CMSG_NXTHDR(&msg, cmsg)) { + if (cmsg->cmsg_level == SOL_SOCKET && + cmsg->cmsg_type == SCM_RIGHTS) { + fd_size = cmsg->cmsg_len - CMSG_LEN(0); + ASSERT(fd_size / sizeof(int) <= + VHOST_MEMORY_BASELINE_NREGIONS);
IIUC, this could be tripped by a bug in the peer (qemu?) rather than in our own code. In which case I think a die() would be more appropriate than an ASSERT().
Ah, right, it wouldn't be our issue... what about neither, so that we don't crash if QEMU has an issue we could easily recover from?
It wasn't immediately obvious to me if we could reasily recover from that or not. [snip]
+ vdev->nregions = memory->nregions; + + debug("Nregions: %u", memory->nregions); + for (i = 0; i < vdev->nregions; i++) { + struct vhost_user_memory_region *msg_region = &memory->regions[i]; + struct vu_dev_region *dev_region = &vdev->regions[i]; + void *mmap_addr; + + debug("Region %d", i); + debug(" guest_phys_addr: 0x%016"PRIx64, + msg_region->guest_phys_addr); + debug(" memory_size: 0x%016"PRIx64, + msg_region->memory_size); + debug(" userspace_addr 0x%016"PRIx64, + msg_region->userspace_addr); + debug(" mmap_offset 0x%016"PRIx64, + msg_region->mmap_offset); + + dev_region->gpa = msg_region->guest_phys_addr; + dev_region->size = msg_region->memory_size; + dev_region->qva = msg_region->userspace_addr; + dev_region->mmap_offset = msg_region->mmap_offset; + + /* We don't use offset argument of mmap() since the + * mapped address has to be page aligned, and we use huge + * pages.
We do what now?
We do madvise(pkt_buf, TAP_BUF_BYTES, MADV_HUGEPAGE) in main(), but we're not using pkt_buf in this case, so I guess it's not relevant. I'm not sure if _passt_ calling madvise(..., MADV_HUGEPAGE) on the memory regions we get would have any effect, by the way.
Huh, I'd forgotten about that. AIUI qemu allocates the memory and we map it into passt, so I don't think our madvise() would have any effect here. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson