On 19/12/2024 20:47, Stefano Brivio wrote:
> On Thu, 19 Dec 2024 12:13:59 +0100
> Laurent Vivier <lvivier(a)redhat.com> wrote:

The hypervisor is not aware of anything. It's only a bridge between the
two passt instances. But normally the destination side can report an
error, which aborts the migration on both sides.

In QEMU, the code is:

    while (true) {
        ssize_t read_ret;

        /* read the data from the backend */
        read_ret = RETRY_ON_EINTR(read(read_fd, transfer_buf, chunk_size));
        if (read_ret < 0) {
            ret = -errno;
            error_setg_errno(errp, -ret, "Failed to receive state");
            goto fail;
        }

        assert(read_ret <= chunk_size);
        qemu_put_be32(f, read_ret);

        if (read_ret == 0) {
            /* EOF */
            break;
        }

        /* send to destination QEMU */
        qemu_put_buffer(f, transfer_buf, read_ret);
    }

    /*
     * Back-end will not really care, but be clean and close our end of the pipe
     * before inquiring the back-end about whether transfer was successful
     */
    close(read_fd);

On the other side:

    while (true) {
        size_t this_chunk_size = qemu_get_be32(f);
        ssize_t write_ret;
        const uint8_t *transfer_pointer;

        if (this_chunk_size == 0) {
            /* End of state */
            break;
        }

        if (transfer_buf_size < this_chunk_size) {
            transfer_buf = g_realloc(transfer_buf, this_chunk_size);
            transfer_buf_size = this_chunk_size;
        }

        if (qemu_get_buffer(f, transfer_buf, this_chunk_size) <
                this_chunk_size)
        {
            error_setg(errp, "Failed to read state");
            ret = -EINVAL;
            goto fail;
        }

        transfer_pointer = transfer_buf;
        while (this_chunk_size > 0) {
            write_ret = RETRY_ON_EINTR(
                write(write_fd, transfer_pointer, this_chunk_size)
            );
            if (write_ret < 0) {
                ret = -errno;
                error_setg_errno(errp, -ret, "Failed to send state");
                goto fail;
            } else if (write_ret == 0) {
                error_setg(errp, "Failed to send state: Connection is closed");
                ret = -ECONNRESET;
                goto fail;
            }

            assert(write_ret <= this_chunk_size);
            this_chunk_size -= write_ret;
            transfer_pointer += write_ret;
        }
    }

    /*
     * Close our end, thus ending transfer, before inquiring the back-end about
     * whether transfer was successful
     */
    close(write_fd);

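To make the framing in the two QEMU loops above easier to see in
isolation, here is a minimal sketch using plain file descriptors instead
of a QEMUFile. The names relay_out(), relay_in(), read_full(),
write_full() and CHUNK_SIZE are hypothetical and exist only for this
illustration. Unlike QEMU, the sketch rejects oversized chunks instead
of growing the buffer. It is not QEMU code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

#define CHUNK_SIZE 4096	/* hypothetical; chosen for the sketch only */

/* Write exactly len bytes, retrying on EINTR and on short writes. */
static int write_full(int fd, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		p += n;
		len -= n;
	}
	return 0;
}

/* Read exactly len bytes; EOF before that counts as an error. */
static int read_full(int fd, void *buf, size_t len)
{
	uint8_t *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		p += n;
		len -= n;
	}
	return 0;
}

/* Source side: frame whatever the back-end writes until it closes its end. */
int relay_out(int read_fd, int stream_fd)
{
	uint8_t buf[CHUNK_SIZE];

	for (;;) {
		ssize_t n = read(read_fd, buf, sizeof(buf));
		uint32_t len;

		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0)
			return -1;

		assert((size_t)n <= sizeof(buf));
		len = htonl((uint32_t)n);	/* big-endian length prefix */
		if (write_full(stream_fd, &len, sizeof(len)))
			return -1;
		if (n == 0)	/* EOF: the zero length just sent ends the stream */
			return 0;
		if (write_full(stream_fd, buf, n))
			return -1;
	}
}

/* Destination side: unframe chunks and feed them to the back-end. */
int relay_in(int stream_fd, int write_fd)
{
	uint8_t buf[CHUNK_SIZE];

	for (;;) {
		uint32_t len;

		if (read_full(stream_fd, &len, sizeof(len)))
			return -1;
		len = ntohl(len);
		if (len == 0)	/* end of state */
			return 0;
		if (len > sizeof(buf))	/* sketch only: no buffer growth */
			return -1;
		if (read_full(stream_fd, buf, len) ||
		    write_full(write_fd, buf, len))
			return -1;
	}
}
```

The point of the sketch is that the bridge imposes no chunk size of its
own: each read from the back-end becomes one length-prefixed chunk, and
only the back-end closing its end produces the terminating zero length.
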
Moreover, I think it's important to know that the source side is stopped
at the end of migration, but it stays in a state from which it can be
restarted.

>> +/**
>> + * vu_migrate() -- Send/receive passt insternal state to/from QEMU
>
> Magic!
>
>> + * @vdev:		vhost-user device
>> + * @events:	epoll events
>> + */
>> +void vu_migrate(struct vu_dev *vdev, uint32_t events)
>> +{
>> +	int ret;
>> +
>> +	/* TODO: collect/set passt internal state
>> +	 * and use vdev->device_state_fd to send/receive it
>> +	 */
>> +	debug("vu_migrate fd %d events %x", vdev->device_state_fd, events);
>> +	if (events & EPOLLOUT) {
>
> I haven't really reviewed the series yet, but I have a preliminary
> question: how does the hypervisor tell us that we're writing too much?
>
> I guess we'll do a short write and we'll need to go back to EPOLLOUT?
> There's no minimum chunk size we can write, correct?

Correct. It works even if there is no write() in this code. The
hypervisor reads until we close the file descriptor (it's a pipe in
fact).

We must read until we get a close().

>> +		debug("Saving backend state");
>> +
>> +		/* send some stuff */
>> +		ret = write(vdev->device_state_fd, "PASST", 6);
>> +		/* value to be returned by VHOST_USER_CHECK_DEVICE_STATE */
>> +		vdev->device_state_result = ret == -1 ? -1 : 0;
>> +		/* Closing the file descriptor signals the end of transfer */
>> +		epoll_ctl(vdev->context->epollfd, EPOLL_CTL_DEL,
>> +			  vdev->device_state_fd, NULL);
>> +		close(vdev->device_state_fd);
>> +		vdev->device_state_fd = -1;
>> +	} else if (events & EPOLLIN) {
>
> ...and similarly here, I guess we'll get a short read?

Thanks,
Laurent

>> +		char buf[6];
>> +
>> +		debug("Loading backend state");
>> +		/* read some stuff */
>> +		ret = read(vdev->device_state_fd, buf, sizeof(buf));
>> +		/* value to be returned by VHOST_USER_CHECK_DEVICE_STATE */
>> +		if (ret != sizeof(buf)) {
>> +			vdev->device_state_result = -1;
>> +		} else {
>> +			ret = strncmp(buf, "PASST", sizeof(buf));
>> +			vdev->device_state_result = ret == 0 ? 0 : -1;
>> +		}
>> +	} else if (events & EPOLLHUP) {
>> +		debug("Closing migration channel");
>> +
>> +		/* The end of file signals the end of the transfer. */
>> +		epoll_ctl(vdev->context->epollfd, EPOLL_CTL_DEL,
>> +			  vdev->device_state_fd, NULL);
>> +		close(vdev->device_state_fd);
>> +		vdev->device_state_fd = -1;
>> +	}
>> +}
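
On the short write and short read questions in the quoted patch, here is
a minimal sketch of how a back-end could resume a partial transfer across
epoll events. It assumes the descriptor may be non-blocking, which the
patch does not state. struct xfer, xfer_send_step() and xfer_recv_step()
are hypothetical names for illustration, not passt code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one transfer; not part of the patch. */
struct xfer {
	char buf[64];	/* state to send, or room for state received */
	size_t len;	/* bytes to send, or bytes received so far */
	size_t off;	/* bytes already written (sending only) */
};

/* Returns 1 once everything is written, 0 to wait for the next EPOLLOUT,
 * -1 on error.
 */
int xfer_send_step(struct xfer *x, int fd)
{
	assert(x->off <= x->len);

	while (x->off < x->len) {
		ssize_t n = write(fd, x->buf + x->off, x->len - x->off);

		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return 0;	/* pipe full: resume on next EPOLLOUT */
		if (n <= 0)
			return -1;
		x->off += n;		/* short write: continue from here */
	}
	return 1;	/* caller closes fd to signal the end of transfer */
}

/* Returns 1 on EOF (peer closed), 0 to wait for the next EPOLLIN,
 * -1 on error or if more data arrives than the buffer can hold.
 */
int xfer_recv_step(struct xfer *x, int fd)
{
	for (;;) {
		size_t room = sizeof(x->buf) - x->len;
		char extra;
		ssize_t n;

		/* With a full buffer, probe one byte to tell EOF from overflow */
		n = read(fd, room ? x->buf + x->len : &extra, room ? room : 1);
		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return 0;
		if (n < 0)
			return -1;
		if (n == 0)
			return 1;	/* EOF: transfer complete */
		if (!room)
			return -1;	/* data beyond capacity */
		x->len += n;
	}
}
```

In this sketch the sender only closes the descriptor after
xfer_send_step() returns 1, and the receiver treats the data as complete
only when read() reports end of file, whatever number of reads it took to
get there.
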