On Tue, Aug 27, 2024 at 12:14:20AM +0200, Stefano Brivio wrote:On Mon, 26 Aug 2024 15:26:44 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:It wasn't immediately obvious to me if we could reasily recover from that or not. [snip]On Thu, Aug 15, 2024 at 05:50:22PM +0200, Laurent Vivier wrote:Ah, right, it wouldn't be our issue... what about neither, so that we don't crash if QEMU has an issue we could easily recover from?Add vhost_user.c and vhost_user.h that define the functions needed to implement vhost-user backend. [...] +static int vu_message_read_default(int conn_fd, struct vhost_user_msg *vmsg) +{ + char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * + sizeof(int))] = { 0 }; + struct iovec iov = { + .iov_base = (char *)vmsg, + .iov_len = VHOST_USER_HDR_SIZE, + }; + struct msghdr msg = { + .msg_iov = &iov, + .msg_iovlen = 1, + .msg_control = control, + .msg_controllen = sizeof(control), + }; + ssize_t ret, sz_payload; + struct cmsghdr *cmsg; + size_t fd_size; + + ret = recvmsg(conn_fd, &msg, MSG_DONTWAIT); + if (ret < 0) { + if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) + return 0; + return -1; + } + + vmsg->fd_num = 0; + for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; + cmsg = CMSG_NXTHDR(&msg, cmsg)) { + if (cmsg->cmsg_level == SOL_SOCKET && + cmsg->cmsg_type == SCM_RIGHTS) { + fd_size = cmsg->cmsg_len - CMSG_LEN(0); + ASSERT(fd_size / sizeof(int) <= + VHOST_MEMORY_BASELINE_NREGIONS);IIUC, this could be tripped by a bug in the peer (qemu?) rather than in our own code. In which case I think a die() would be more appropriate than an ASSERT().Huh, I'd forgotten about that. AIUI qemu allocates the memory and we map it into passt, so I don't think our madvise() would have any effect here. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibsonWe do madvise(pkt_buf, TAP_BUF_BYTES, MADV_HUGEPAGE) in main(), but we're not using pkt_buf in this case, so I guess it's not relevant. I'm not sure if _passt_ calling madvise(..., MADV_HUGEPAGE) on the memory regions we get would have any effect, by the way.+ vdev->nregions = memory->nregions; + + debug("Nregions: %u", memory->nregions); + for (i = 0; i < vdev->nregions; i++) { + struct vhost_user_memory_region *msg_region = &memory->regions[i]; + struct vu_dev_region *dev_region = &vdev->regions[i]; + void *mmap_addr; + + debug("Region %d", i); + debug(" guest_phys_addr: 0x%016"PRIx64, + msg_region->guest_phys_addr); + debug(" memory_size: 0x%016"PRIx64, + msg_region->memory_size); + debug(" userspace_addr 0x%016"PRIx64, + msg_region->userspace_addr); + debug(" mmap_offset 0x%016"PRIx64, + msg_region->mmap_offset); + + dev_region->gpa = msg_region->guest_phys_addr; + dev_region->size = msg_region->memory_size; + dev_region->qva = msg_region->userspace_addr; + dev_region->mmap_offset = msg_region->mmap_offset; + + /* We don't use offset argument of mmap() since the + * mapped address has to be page aligned, and we use huge + * pages.We do what now?