.. _virtio-net:

Virtio-net
##########

Virtio-net is the para-virtualization solution used in ACRN for
networking. The ACRN Device Model emulates virtual NICs for the UOS,
and the frontend virtio network driver in the guest drives the virtual
NIC following the virtio specification. (Refer to :ref:`introduction`
and :ref:`virtio-hld` for background introductions to ACRN and Virtio.)

Here are some notes about Virtio-net support in ACRN:

- Legacy devices are supported; modern devices are not
- Two virtqueues are used in virtio-net: an RX queue and a TX queue
- Indirect descriptors are supported
- A TAP backend is supported
- The control queue is not supported
- NIC multiple queues are not supported

Network Virtualization Architecture
***********************************

ACRN's network virtualization architecture is shown below in
:numref:`net-virt-arch`. It illustrates the network virtualization
components that must cooperate for the UOS to send data to and receive
data from the outside world.

.. figure:: images/network-virt-arch.png
   :align: center
   :width: 900px
   :name: net-virt-arch

   Network Virtualization Architecture

(The green components are parts of the ACRN solution, while the gray
components are parts of the Linux kernel.)

Let's explore these components further.

SOS/UOS Network Stack:
  This is the standard Linux TCP/IP stack, currently the most
  feature-rich TCP/IP implementation.

virtio-net Frontend Driver:
  This is the standard driver in the Linux kernel for virtual Ethernet
  devices. This driver matches devices with PCI vendor ID 0x1AF4 and PCI
  Device ID 0x1000 (for legacy devices, as used here) or 0x1041 (for
  modern devices). The virtual NIC supports two virtqueues, one for
  transmitting packets and the other for receiving packets. The frontend
  driver places empty buffers into one virtqueue for receiving packets,
  and enqueues outgoing packets into the other virtqueue for
  transmission. The size of each virtqueue is 1024, configurable in the
  virtio-net backend driver.

ACRN Hypervisor:
  The ACRN hypervisor is a type 1 hypervisor, running directly on
  bare-metal hardware, and suitable for a variety of IoT and embedded
  device solutions. It fetches and analyzes the guest instructions, puts
  the decoded information into the shared page as an IOREQ, and notifies
  or interrupts the VHM module in the SOS for processing.

VHM Module:
  The Virtio and Hypervisor Service Module (VHM) is a kernel module in
  the Service OS (SOS) acting as a middle layer to support the device
  model and the hypervisor. The VHM forwards an IOREQ to the virtio-net
  backend driver for processing.

ACRN Device Model and virtio-net Backend Driver:
  The ACRN Device Model (DM) gets an IOREQ from the shared page and calls
  the virtio-net backend driver to process the request. The backend
  driver receives the data in the shared virtqueue and sends it to the
  TAP device.

Bridge and Tap Device:
  Bridge and tap are standard virtual network infrastructure components.
  They play an important role in communication among the SOS, the UOS,
  and the outside world.

IGB Driver:
  IGB is the Linux kernel driver for the physical Network Interface Card
  (NIC), responsible for sending data to and receiving data from the
  physical NIC.

The virtual network card (NIC) is implemented as a virtio legacy device
in the ACRN device model (DM). It is registered as a PCI virtio device
to the guest OS (UOS) and uses the standard virtio-net driver in the
Linux kernel (the guest kernel should be built with
``CONFIG_VIRTIO_NET=y``).

The virtio-net backend in the DM forwards the data received from the
frontend to the TAP device, then from the TAP device to the bridge, and
finally from the bridge to the physical NIC driver, and vice versa for
returning data from the NIC to the frontend.

ACRN Virtio-Network Calling Stack
*********************************

The components of ACRN network virtualization are shown in the
architecture diagram in :numref:`net-virt-arch`. In this section, we use
UOS data transmission (TX) and reception (RX) examples to explain, step
by step, how these components work together to implement ACRN network
virtualization.

Initialization in Device Model
==============================

**virtio_net_init**

- Present the frontend with a virtual PCI-based NIC
- Set up control plane callbacks
- Set up data plane callbacks, including TX and RX
- Set up the TAP backend (see the sketch after this list)
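
Setting up the TAP backend boils down to standard Linux TAP file I/O.
Below is a minimal, illustrative sketch (not the actual ACRN DM code; the
function name and error handling are simplified) of how a backend can open
an existing tap interface such as ``acrn_tap0``:

.. code-block:: c

   #include <fcntl.h>
   #include <string.h>
   #include <unistd.h>
   #include <sys/ioctl.h>
   #include <net/if.h>
   #include <linux/if_tun.h>

   /* Illustrative only: open a tap device so the backend can read and write
    * raw Ethernet frames on it. */
   static int example_open_tap(const char *name)
   {
           struct ifreq ifr;
           int fd = open("/dev/net/tun", O_RDWR);

           if (fd < 0)
                   return -1;

           memset(&ifr, 0, sizeof(ifr));
           ifr.ifr_flags = IFF_TAP | IFF_NO_PI;   /* raw frames, no extra header */
           strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

           if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
                   close(fd);
                   return -1;
           }
           return fd;
   }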

Initialization in virtio-net Frontend Driver
============================================

**virtio_pci_probe**

- Construct the virtio device from the virtual PCI device and register it
  on the virtio bus (see the skeleton at the end of this subsection)

**virtio_dev_probe --> virtnet_probe --> init_vqs**

- Register the network driver
- Set up the shared virtqueues
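
For reference, the skeleton below shows the general shape of a virtio
driver registration on the virtio bus; it is illustrative only (the
``example_`` names are placeholders), not the actual virtio_net probe
code:

.. code-block:: c

   #include <linux/module.h>
   #include <linux/virtio.h>
   #include <linux/virtio_ids.h>

   /* A virtio driver matches by virtio device ID (VIRTIO_ID_NET) once
    * virtio_pci_probe has turned the PCI device (vendor 0x1AF4) into a
    * virtio device on the virtio bus. */
   static const struct virtio_device_id example_id_table[] = {
           { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
           { 0 },
   };

   static int example_probe(struct virtio_device *vdev)
   {
           /* The real driver allocates the netdev and sets up the shared
            * RX/TX virtqueues here (virtnet_probe --> init_vqs). */
           return 0;
   }

   static void example_remove(struct virtio_device *vdev)
   {
   }

   static struct virtio_driver example_driver = {
           .driver.name = "example_virtio_net",
           .id_table    = example_id_table,
           .probe       = example_probe,
           .remove      = example_remove,
   };
   module_virtio_driver(example_driver);
   MODULE_DEVICE_TABLE(virtio, example_id_table);
   MODULE_LICENSE("GPL");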

ACRN UOS TX FLOW
================

The following shows the ACRN UOS network TX flow, using TCP as an
example, tracing the path through each layer:

**UOS TCP Layer**

.. code-block:: c

   tcp_sendmsg -->
   tcp_sendmsg_locked -->
   tcp_push_one -->
   tcp_write_xmit -->
   tcp_transmit_skb -->

**UOS IP Layer**

.. code-block:: c

   ip_queue_xmit -->
   ip_local_out -->
   __ip_local_out -->
   dst_output -->
   ip_output -->
   ip_finish_output -->
   ip_finish_output2 -->
   neigh_output -->
   neigh_resolve_output -->

**UOS MAC Layer**

.. code-block:: c

   dev_queue_xmit -->
   __dev_queue_xmit -->
   dev_hard_start_xmit -->
   xmit_one -->
   netdev_start_xmit -->
   __netdev_start_xmit -->

**UOS MAC Layer virtio-net Frontend Driver**

.. code-block:: c

   start_xmit --> // virtual NIC driver xmit in virtio_net
   xmit_skb -->
   virtqueue_add_outbuf --> // add out buffer to shared virtqueue
   virtqueue_add -->

   virtqueue_kick --> // notify the backend
   virtqueue_notify -->
   vp_notify -->
   iowrite16 --> // trap here, HV will first get notified
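
The two ``virtqueue_*`` calls above are the standard Linux virtqueue API.
Below is a minimal, illustrative sketch (not the actual virtio_net code;
the function and variable names are placeholders) of how a frontend driver
exposes one outgoing buffer to the backend and then kicks it:

.. code-block:: c

   #include <linux/scatterlist.h>
   #include <linux/virtio.h>

   /* Illustrative only: enqueue one outgoing packet buffer and notify the host. */
   static int example_xmit_one(struct virtqueue *tx_vq, void *pkt, unsigned int len)
   {
           struct scatterlist sg;
           int err;

           sg_init_one(&sg, pkt, len);

           /* Expose the buffer to the device through the shared TX virtqueue. */
           err = virtqueue_add_outbuf(tx_vq, &sg, 1, pkt, GFP_ATOMIC);
           if (err)
                   return err;

           /* Notify the backend; in ACRN this I/O write traps into the hypervisor. */
           virtqueue_kick(tx_vq);
           return 0;
   }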

**ACRN Hypervisor**

.. code-block:: c

   vmexit_handler --> // vmexit because VMX_EXIT_REASON_IO_INSTRUCTION
   pio_instr_vmexit_handler -->
   emulate_io --> // ioreq can't be processed in HV, forward it to VHM
   acrn_insert_request_wait -->
   fire_vhm_interrupt --> // interrupt SOS, VHM will get notified

**VHM Module**

.. code-block:: c

   vhm_intr_handler --> // VHM interrupt handler
   tasklet_schedule -->
   io_req_tasklet -->
   acrn_ioreq_distribute_request --> // ioreq can't be processed in VHM, forward it to the DM
   acrn_ioreq_notify_client -->
   wake_up_interruptible --> // wake up DM to handle ioreq

**ACRN Device Model / virtio-net Backend Driver**

.. code-block:: c

   handle_vmexit -->
   vmexit_inout -->
   emulate_inout -->
   pci_emul_io_handler -->
   virtio_pci_write -->
   virtio_pci_legacy_write -->
   virtio_net_ping_txq --> // kick the TX thread to process; the notify path returns
   virtio_net_tx_thread --> // this is the TX thread
   virtio_net_proctx --> // call the corresponding backend (tap) to process
   virtio_net_tap_tx -->
   writev --> // write data to the tap device
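
The final ``writev`` above is ordinary file I/O on the TAP file descriptor.
A simplified, illustrative sketch (the function name is a placeholder; the
real backend first gathers the frame from the TX virtqueue into the iovec
array):

.. code-block:: c

   #include <sys/uio.h>

   /* Illustrative only: push one guest frame, already gathered from the TX
    * virtqueue into iov[0..n-1], out through the TAP file descriptor. The TAP
    * device injects the frame into the SOS stack, where the bridge forwards
    * it toward the physical NIC. */
   static ssize_t example_tap_tx(int tapfd, struct iovec *iov, int n)
   {
           return writev(tapfd, iov, n);
   }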

**SOS TAP Device Forwarding**

.. code-block:: c

   do_writev -->
   vfs_writev -->
   do_iter_write -->
   do_iter_readv_writev -->
   call_write_iter -->
   tun_chr_write_iter -->
   tun_get_user -->
   netif_receive_skb -->
   netif_receive_skb_internal -->
   __netif_receive_skb -->
   __netif_receive_skb_core -->

**SOS Bridge Forwarding**

.. code-block:: c

   br_handle_frame -->
   br_handle_frame_finish -->
   br_forward -->
   __br_forward -->
   br_forward_finish -->
   br_dev_queue_push_xmit -->

**SOS MAC Layer**

.. code-block:: c

   dev_queue_xmit -->
   __dev_queue_xmit -->
   dev_hard_start_xmit -->
   xmit_one -->
   netdev_start_xmit -->
   __netdev_start_xmit -->

**SOS MAC Layer IGB Driver**

.. code-block:: c

   igb_xmit_frame --> // IGB physical NIC driver xmit function

ACRN UOS RX FLOW
================

The following shows the ACRN UOS network RX flow, using TCP as an example.
Let's start by receiving a device interrupt. (Note that the hypervisor
gets notified first when an interrupt arrives, even in passthrough
cases.)

**Hypervisor Interrupt Dispatch**

.. code-block:: c

   vmexit_handler --> // vmexit because VMX_EXIT_REASON_EXTERNAL_INTERRUPT
   external_interrupt_vmexit_handler -->
   dispatch_interrupt -->
   common_handler_edge -->
   ptdev_interrupt_handler -->
   ptdev_enqueue_softirq --> // interrupt will be delivered in bottom-half softirq

**Hypervisor Interrupt Injection**

.. code-block:: c

   do_softirq -->
   ptdev_softirq -->
   vlapic_intr_msi --> // insert the interrupt into SOS

   start_vcpu --> // VM entry here, will process the pending interrupts

**SOS MAC Layer IGB Driver**

.. code-block:: c

   do_IRQ -->
   ...
   igb_msix_ring -->
   igb_poll -->
   napi_gro_receive -->
   napi_skb_finish -->
   netif_receive_skb_internal -->
   __netif_receive_skb -->
   __netif_receive_skb_core -->

**SOS Bridge Forwarding**

.. code-block:: c

   br_handle_frame -->
   br_handle_frame_finish -->
   br_forward -->
   __br_forward -->
   br_forward_finish -->
   br_dev_queue_push_xmit -->

**SOS MAC Layer**

.. code-block:: c

   dev_queue_xmit -->
   __dev_queue_xmit -->
   dev_hard_start_xmit -->
   xmit_one -->
   netdev_start_xmit -->
   __netdev_start_xmit -->

**SOS MAC Layer TAP Driver**

.. code-block:: c

   tun_net_xmit --> // notify and wake up the reader process

**ACRN Device Model / virtio-net Backend Driver**

.. code-block:: c

   virtio_net_rx_callback --> // the tap fd gets notified and this function is invoked
   virtio_net_tap_rx --> // read data from tap, prepare virtqueue, insert interrupt into the UOS
   vq_endchains -->
   vq_interrupt -->
   pci_generate_msi -->
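
On the RX side, the TAP interaction mirrors the TX flow: the backend reads
a frame from the tap file descriptor into the empty buffers the frontend
posted on the RX virtqueue and then injects an interrupt. An illustrative
sketch (names are placeholders, not the actual DM code):

.. code-block:: c

   #include <sys/uio.h>

   /* Illustrative only: pull one frame from the TAP device directly into the
    * guest buffers described by iov[0..n-1], i.e. the RX virtqueue entries.
    * Afterwards the backend marks the descriptors used and raises an MSI
    * (vq_endchains --> pci_generate_msi). */
   static ssize_t example_tap_rx(int tapfd, struct iovec *iov, int n)
   {
           return readv(tapfd, iov, n);
   }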

**VHM Module**

.. code-block:: c

   vhm_dev_ioctl --> // process the IOCTL and call hypercall to inject the interrupt
   hcall_inject_msi -->

**ACRN Hypervisor**

.. code-block:: c

   vmexit_handler --> // vmexit because VMX_EXIT_REASON_VMCALL
   vmcall_vmexit_handler -->
   hcall_inject_msi --> // insert interrupt into UOS
   vlapic_intr_msi -->

**UOS MAC Layer virtio_net Frontend Driver**

.. code-block:: c

   vring_interrupt --> // virtio-net frontend driver interrupt handler
   skb_recv_done --> // registered by virtnet_probe-->init_vqs-->virtnet_find_vqs
   virtqueue_napi_schedule -->
   __napi_schedule -->
   virtnet_poll -->
   virtnet_receive -->
   receive_buf -->
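
The frontend's RX path uses the receive-side counterparts of the virtqueue
calls shown in the TX flow. A minimal, illustrative sketch (names are
placeholders) of refilling the RX virtqueue and collecting a filled buffer:

.. code-block:: c

   #include <linux/scatterlist.h>
   #include <linux/virtio.h>

   /* Illustrative only: keep the RX virtqueue stocked with empty buffers. */
   static void example_refill_rx(struct virtqueue *rx_vq, void *buf, unsigned int size)
   {
           struct scatterlist sg;

           sg_init_one(&sg, buf, size);
           virtqueue_add_inbuf(rx_vq, &sg, 1, buf, GFP_ATOMIC);
           virtqueue_kick(rx_vq);
   }

   /* Illustrative only: fetch the next buffer the backend has filled (or NULL). */
   static void *example_poll_rx(struct virtqueue *rx_vq, unsigned int *len)
   {
           return virtqueue_get_buf(rx_vq, len);
   }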

**UOS MAC Layer**

.. code-block:: c

   napi_gro_receive -->
   napi_skb_finish -->
   netif_receive_skb_internal -->
   __netif_receive_skb -->
   __netif_receive_skb_core -->

**UOS IP Layer**

.. code-block:: c

   ip_rcv -->
   ip_rcv_finish -->
   dst_input -->
   ip_local_deliver -->
   ip_local_deliver_finish -->

**UOS TCP Layer**

.. code-block:: c

   tcp_v4_rcv -->
   tcp_v4_do_rcv -->
   tcp_rcv_established -->
   tcp_data_queue -->
   tcp_queue_rcv -->
   __skb_queue_tail -->

   sk->sk_data_ready --> // application will get notified

How to Use
==========

The network infrastructure shown in :numref:`net-virt-infra` needs to be
prepared in the SOS before we start. We need to create a bridge and at
least one tap device (two tap devices are needed to create a dual
virtual NIC), and attach both the physical NIC and the tap device to the
bridge.

.. figure:: images/network-virt-sos-infrastruct.png
   :align: center
   :width: 900px
   :name: net-virt-infra

   Network Infrastructure in SOS

You can use Linux commands (e.g. ``ip`` and ``brctl``) to create this
network. In our case, we use systemd to automatically create the network
by default; a manual equivalent is sketched after the list below. You can
check the files with prefix 50- in the SOS
``/usr/lib/systemd/network/``:

- `50-acrn.netdev <https://raw.githubusercontent.com/projectacrn/acrn-hypervisor/master/tools/acrnbridge/acrn.netdev>`__
- `50-acrn.network <https://raw.githubusercontent.com/projectacrn/acrn-hypervisor/master/tools/acrnbridge/acrn.network>`__
- `50-acrn_tap0.netdev <https://raw.githubusercontent.com/projectacrn/acrn-hypervisor/master/tools/acrnbridge/acrn_tap0.netdev>`__
- `50-eth.network <https://raw.githubusercontent.com/projectacrn/acrn-hypervisor/master/tools/acrnbridge/eth.network>`__
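
If you prefer to create this infrastructure manually rather than through
the systemd units above, an equivalent sequence with ``ip`` and ``brctl``
looks roughly like this (the interface names match the ones used elsewhere
in this document; adjust them for your system):

.. code-block:: none

   # create the bridge and attach the physical NIC (enp3s0 in this example)
   brctl addbr acrn-br0
   brctl addif acrn-br0 enp3s0

   # create a tap device for the UOS and attach it to the bridge
   ip tuntap add dev acrn_tap0 mode tap
   brctl addif acrn-br0 acrn_tap0

   # bring everything up
   ip link set acrn_tap0 up
   ip link set acrn-br0 up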

When the SOS is started, run ``ifconfig`` to show the devices created by
this systemd configuration:

.. code-block:: none

   acrn-br0   Link encap:Ethernet  HWaddr B2:50:41:FE:F7:A3
              inet addr:10.239.154.43  Bcast:10.239.154.255  Mask:255.255.255.0
              inet6 addr: fe80::b050:41ff:fefe:f7a3/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:226932 errors:0 dropped:21383 overruns:0 frame:0
              TX packets:14816 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:100457754 (95.8 Mb)  TX bytes:83481244 (79.6 Mb)

   acrn_tap0  Link encap:Ethernet  HWaddr F6:A7:7E:52:50:C6
              UP BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

   enp3s0     Link encap:Ethernet  HWaddr 98:4F:EE:14:5B:74
              inet6 addr: fe80::9a4f:eeff:fe14:5b74/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:279174 errors:0 dropped:0 overruns:0 frame:0
              TX packets:69923 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:107312294 (102.3 Mb)  TX bytes:87117507 (83.0 Mb)
              Memory:82200000-8227ffff

   lo         Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:16 errors:0 dropped:0 overruns:0 frame:0
              TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:1216 (1.1 Kb)  TX bytes:1216 (1.1 Kb)

Run ``brctl show`` to see the bridge ``acrn-br0`` and attached devices:

.. code-block:: none

   bridge name     bridge id               STP enabled     interfaces

   acrn-br0        8000.b25041fef7a3       no              acrn_tap0
                                                           enp3s0

Add a PCI slot to the device model (``acrn-dm``) command line; the MAC
address is optional:

.. code-block:: none

   -s 4,virtio-net,<tap_name>,[mac=<XX:XX:XX:XX:XX:XX>]
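
For example, to attach the ``acrn_tap0`` device created above (the MAC
address here is only an illustration):

.. code-block:: none

   -s 4,virtio-net,acrn_tap0,mac=00:16:3E:39:0F:CD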

When the UOS is launched, run ``ifconfig`` to check the network.
``enp0s4`` is the virtual NIC created by acrn-dm:

.. code-block:: none

   enp0s4     Link encap:Ethernet  HWaddr 00:16:3E:39:0F:CD
              inet addr:10.239.154.186  Bcast:10.239.154.255  Mask:255.255.255.0
              inet6 addr: fe80::216:3eff:fe39:fcd/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:140 errors:0 dropped:8 overruns:0 frame:0
              TX packets:46 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:110727 (108.1 Kb)  TX bytes:4474 (4.3 Kb)

   lo         Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Performance Estimation
======================

We've introduced the network virtualization solution in ACRN, from the
top-level architecture to the detailed TX and RX flows. Currently, both
the control plane and the data plane are processed in the ACRN device
model, which may introduce some overhead. This is not a bottleneck for
1 Gbps NICs or below, and virtualized network bandwidth can come very
close to native bandwidth. For high-speed NICs (e.g. 10 Gbps and above),
it is necessary to separate the data plane from the control plane; vhost
can be used for that acceleration. For most IoT scenarios, processing in
user space is simple and reasonable.