Some DM's virtual timer devices use CLOCK_REALTIME as either clock
counter source or period timer source. Including:
- virtual RTC
- virtual PIT
- virtual HPET
According to Linux Manual, CLOCK_REALTIME is the 'wall clock' which is
affected by discontinuous jumps in the system time.
The issue is that service VM system time could be changed, either by
root user manually or by NTP automatically calibration.
When that happens, DM's virtual timer devices which relays on
CLOCK_REALTIME will experience discontinuous time jump, and become
inaccurate. It would affect both time stamp read value and period timer.
Especially when service VM system time is moved backwards, WaaG's system
software will lost response and be stalled for quite a long time.
To solve this issue, we need to switch CLOCK_REALTIME to
CLOCK_MONOTONIC. As it represents:
'A nonsettable monotonically increasing clock that measures time from
some unspecified point in the past that does not change after system
startup'
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
In the triple fault handler, post-launched VMs are instantly turned
off. Now a vm event is generated simultaneously. So that
developers can capture the event and decide what to do with it. (e.g.,
logging and populating diagnostics, or poweroff VM)
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When the virtual PM port is written, we can infer that guest has just
initiated a poweroff action. So we send a poweroff event upon this port
write. The DM event handler will try to emit it (to Libvirt).
Developers can write app/script to decide what to do with this event.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
This patch adds support for HV vrtc vm_event.
RTC change event is sent upon each date/time reg write. Those events
will be handled in DM. DM will try to emit an RTC change event(to
Libvirt) based on its strategy. Only support post-launched VMs.
The DM event handler has already implemented the rtc chanage event.
Those events will be processed the same way as vrtc events from DM vrtc.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When a guest OS performs an RTC change action, we wish this event be
captured by developers, and then they can decide what to do with it.
(e.g., whether to change physical RTC)
There are some facts that makes RTC change event a bit complicated:
- There are 7 RTC date/time regs (year, month…). They can only be
updated one by one.
- RTC time is not reliable before date/time update is finished.
- Guests can update RTC date/time regs in any order.
- Guests may update RTC date/time regs during either RTC halted or not
halted.
A single date/time update event is not reliable. We have to wait for
the guest to finish the update process. So the DM's event handler
sets up a timer, and wait for some time (1 second). If no more change
happens befor the timer expires, we can conclude that the RTC
change has been done. Then the rtc change event is emitted.
This logic of event handler can be used to process HV vrtc time change
event too.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
The dm vrtc has been using time(NULL) as the vrtc base time. When
service VM system time is adjusted, the vrtc will experience time jump
which will make the vrtc time inaccurate. Change the source of base
time to monotonic time can resolve this issue, as the monotonic time is
not setable.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
Through it is best to halt the RTC before changing date/time, still some
OSes just write date/time while RTC is not halted. Currently the DM vRTC
has already dealt the situation where openBSD writes century byte out
side of vRTC halt by updating vRTC time on century byte writes.
Now WaaG is found writing all date/time regs outside of vRTC halt.
Because those date/time writes are not updated instantly, WaaG’s vRTC
time is not actually changed.
This bug has not affected anything till now when we are adding support
to RTC change vm_event.
To make WaaG’s vRTC work properly, this patch adds vRTC time update on
all date/time writes outside of vRTC halt.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
The idea of event throttle is to allow only curtain mounts of vm_events
to be emitted per second. This feature is implemented with an event
counter and a timer_fd periodic timer. Event counter increases until it
reaches the throttle rate limit, then the periodic timer resets the
counter in each time window.
Events exceed the throttle rate are dropped.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
The default event handler generates the vm_event message in json format,
then emit it through command monitor.
The event data json txt is currently leaved as blank. When a specific
event type is implemented, its event data generate handler can be added
correspondingly.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
This patch added vm_event support in command monitor, so that vm_event
can be sent to a client (e.g., Libvirt) through the monitor.
As the command monitor works in socket server mode, the vm_event sending
process is designed in this way:
1. If a client wishes to receive vm_event, it issues a
REGISTER_VM_EVENT_CLIENT command to the monitor.
2. Command monitor then handles the REGISTER_VM_EVENT_CLIENT command. If
it is legitimate, the client is registered as as vm_event receiver.
The command monitor then send a ACK to the client, and keeps the socket
connection.
3. When a vm_event is generated, the command monitor send it out through
the socket connection.
4. Only one event client is allowed.
5. The registration is cancelled on socket disconnection.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
This patch creates a thread for vm_event delivery. The thread uses epoll
to poll event notifications, then read out the msg data queued in sbuf.
An event handler is called upon success receiving. Both HV and DM event
sources share the same process.
Also vm_event tx API for DM event source is added in this patch.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
This patch adds vm_event sbuf and notification initialization.
We have 2 types of event source: DM and HV, and they are slightly
different:
- Sbuf for DM event source is a memery page shared between threads.
Event notifications are delivered by userspace eventfd.
- While for hv event source, sbuf is a memery page shared with HV. Its
address(GPA) is shared to HV through hypercall. Its notifications
are generated by HV upcall, then delivered by kernel/userspace eventfd.
A sbuf message path acts like a one way ‘tunnel’, so a data structure
‘vm_event_tunnel’ is created to organize those sbufs.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
The sbuf will be used by DM to send and receive vm_events.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
This patch creates vm_event support in HV, including:
1. Create vm_event data type.
2. Add vm_event sbuf and its initializer. The sbuf will be allocated by
DM in Service VM. Its page address will then be share to HV through
hypercall.
3. Add an API to send the HV generated event.
Tracked-On: #8547
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Currently lpc emulates all the supported COM ports no matter it is
configured or not in command line. Change the behavior to only emulate
those specified in command line.
Tracked-On: #8537
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
Extend the devicemodel lpc uart emulation support to COM4. Since
COM1 is usually used for hv console and COM2 is taken by S5 feature,
only COM1 and COM2 is not enough.
Tracked-On: #8537
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
Currently, for pre-launched VM, ACPI table is generated only when "reboot=acpi"
is added to cmdline in the scenario file. This patch removes this check support
that ACPI table must be generated automatedly for the Pre-launched VM.
Tracked-On: #8546
Signed-off-by: Kunhui-Li <kunhuix.li@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When enable TPM2 tag in acrn scenario file, can't pass compile:
File "../hypervisor/../misc/config_tools/acpi_gen/bin_gen.py", line 128, in tpm2_acpi_gen
ctype_data.start_method_specific_parameters[i] = int(start_method_parameters[i], 16)
ValueError: invalid literal for int() with base 16: '\n '
Fix this issue in tpm2_acpi_gen.
Tracked-On: #8516
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Currently ivshmem device can only be configurated as single function
device(bdf.f = 0) on bus 0. This greatly limits the number of ivshmem
devices we can create. This patch is to enable multiple function bit in
HEADER_TYPE config register, so that we can create many more ivshmem
devices by using different function numbers on one bus:dev.
The multi function device bit is to be set on ivshmem devices whose function
number equls 0. PCI spec describe it as: ‘When Set, indicates that the
Device may contain multiple Functions, but not necessarily.’, So if this
dev is the only one on the bus:dev, it is still OK.
Tracked-On: #8520
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Some motherboards exposes MCFG1/MCFG2 instead of one ACPI MCFG table,
thus no /sys/firmware/acpi/tables/MCFG exists but MCFG1 and MCFG2.
Read MCFG1 if MCFG doesn't exist.
The same issue report/fix is at https://github.com/intel/pcm/issues/74.
Tracked-On: #8514
Signed-off-by: Xin Zhang <xin.x.zhang@intel.com>
Currently only devices on usb bus 0-4, port 0-19 can be passthrough to
the emulated XHCI controller. Remove this unnecessary limit.
Some unused definitions are also removed.
Tracked-On: #8506
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
The changelog file is automatically generated from git commit history
when building, it should not be tracked in git source tree.
Tracked-On: #6688
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Per BVT (Borrowed Virtual Time) scheduler design, following per thread
parameters are required to tune scheduling behaviour.
- weight
The time sharing of a thread on CPU.
- warp
Boost value of virtual time of a thread (time borrowed from future) to
reduce Effective Virtual Time to prioritize the thread.
- warp_limit
Max warp time in one warp.
- unwarp_period
Min unwarp time after a warp.
As of now, only weight is in use to tune virtual time ratio of VCPU
threads from different VMs. Others parameters are for future extension.
Tracked-On: #8500
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Add four per-vm bvt parameters as the initial bvt parameter values for
vCPU threads.
- bvt_weight
The time sharing of a thread on CPU.
- bvt_warp_value
Boost value of virtual time of a thread (time borrowed from future) to
reduce Effective Virtual Time to prioritize the thread.
- bvt_warp_limit
Max warp time in one warp.
- bvt_unwarp_period
Min unwarp time after a warp.
Tracked-On: #8500
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Abstract out schedulers config data for vCPU threads and other hypervisor
threads to sched_params structure. And it's used to initialize per
thread scheduler private data. The sched_params for vCPU threads come
from vm_config generated by config tools while other hypervisor threads
need give them explicitly.
Tracked-On: #8500
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Add clamp macro to clamp a value within a range.
Tracked-On: #8500
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
make_request sets the request bit, and signal_event wakes the vcpu
thread. If we signal_event comes first, the target vCPU has a chance to
sleep again before processing the request bit.
Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
When all vCPU threads on one pCPU are put to sleep (e.g., when all
guests execute HLT), hv would schedule to idle thread. Currently the
idle thread executes PAUSE which does not enter any c-state and consumes
a lot of power. This patch is to support HLT in the idle thread.
When we switch to HLT, we have to make sure events that would wake a
vCPU must also be able to wake the pCPU. Those events are either
generated by local interrupt or issued by other pCPUs followed by an
ipi kick.
Each of them have an interrupt involved, so they are also able to wake
the halted pCPU. Except when the pCPU has just scheduled to idle thread
but not yet halted, interrupts could be missed.
sleep-------schedule to idle------IRQ ON---HLT--(kick missed)
^
wake---kick|
This areas should be protected. This is done by a safe halt
mechanism leveraging STI instruction’s delay effect (same as Linux).
vCPUs with lapic_pt or hv with CONFIG_KEEP_IRQ_DISABLED=y does not allow
interrupts in root mode, so they could never wake from HLT (INIT kick
does not wake HLT in root mode either). They should continue using PAUSE
in idle.
Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Host doorbell array write can be asynchronous, so add an async thread
which is used to deal doorbell write.
Tracked-On: #8504
Signed-off-by: Yuanyuan Zhao <yuanyuan.zhao@intel.com>
Reviewed-by: Jian Jun Chen <jian.jun.chen@intel.com>
When bvt scheduler picks up a thread to run, it sets up a counter
‘run_countdown’ to determine how many ticks it should remain running.
Then the timer will decrease run_countdown by 1 on every 1000Hz tick
interrupt, until it reaches 0. The tick interrupt consumes a lot of
power during idle (if we are using HLT in idle thread).
This patch is to switch the 1000 HZ timer to a dynamic one, which only
interrupt on run_countdown expires.
Tracked-On: #8507
Signed-off-by: Wu Zhou <wu.zhou@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
ACRN boot fails when size of ivshmem device is
configured to 512MB, this patch allocates this memory
from E820 table instead of reserving in hypervisor.
Tracked-On: #8502
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
~0UL is widely used to specify the maximum memory size
when calling e820_alloc_memory(), this patch to define
a MACRO for it to avoid using this magic number.
Tracked-On: #8502
Signed-off-by: Yonghua Huang <yonghua.huang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
For a pdev which allocated to prelaunched VM or owned by HV, we need to check
whether it is a multifuction dev at function 0. If yes we have to emulate a
dummy function dev in Service VM, otherwise the sub-function devices will be
lost in guest OS pci probe process.
Tracked-On: #8492
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
When we do init_all_dev_config() in pci.c, the pdevs added to pci dev_config
will be exposed to Service VM or passthru to prelauched VM. The original code
would find service VM config in every pci pdev init loop, this is unnecessary
and definitely impact performance. Here we generate Service VM config pointer
with config tool so that init_one_dev_config() could refer service VM config
directly.
Tracked-On: #8491
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
Signed-off-by: Victor Sun <victor.sun@intel.com>
Rename is_allocated_to_prelaunched_vm to allocate_to_prelaunched_vm as
it not only checks whether the PCI device is allocated to a Pre-launched
VM but also associate it with Pre-launched VM's dev_config.
Tracked-On: #8491
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
1. using sockets_req_system_shutdown_user_vm_handler and sockets_req_system_reboot_user_vm_handler
wrapping function req_user_vm_shutdown_reboot
2. Modify the variable names SHUTDOWN_REQ to SYS_REBOOT_REQ
3. Add a handler for "ACK_REQ_SYS_REBOOT"
Tracked-On: #8431
Signed-off-by: Gaofei Sun <gaofeix.sun@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
Only Windows virtual machines can restart ACRN through life_mngr.
Added Linux and rtvm to restart acrn in life_mngr.
Tracked-On: #8431
Signed-off-by: Gaofei Sun <gaofeix.sun@intel.com>
Reviewed-by: Li Fei <fei1.li@intel.com>
On some platforms, the last e820 entry may not be of type E820_TYPE_RAM,
such as E820_TYPE_ACPI_NVS which may also be used by Service VM.
So we need take all e820 entry types into account when finding the upper
bound of Service VM EPT mapping.
Tracked-On: #8495
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Signed-off-by: Qiang Zhang <qiang4.zhang@intel.com>
The ivshmem spec define the BAR0 offset > 16 are reserved.
So ACRN need ignore all operation when offset out of range.
Tracked-On: #8487
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
Add the ivshmem_dev_lock to protect races between release
and allocations.
Tracked-On: ##8486
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <junjie.mao@intel.com>
ACRN sets the ivposition to the VM_ID in ivshmem_server_bind_peer().
This value should be saved in the ivshmem_device until unbind.
It is wrong to clear ivs_dev->mmio in the ivshmem_vbar_map(),
Instead, it should clear the ivshmem_device structure in the
create_ivshmem_device to ensure the same initial states
after VM reboot case.
Tracked-On: #8485
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Junjie Mao <Junjie.mao@intel.com>