This patch enables redundant virtual routers.
1. To enable this feature, the database currently needs to be updated with the
following SQL (a UI option will come later):
UPDATE network_offerings SET redundant_router=1 WHERE guest_type="Virtual" AND
system_only=0;
2. The system tries to start the two routers on different hosts. If there is
only one host in the zone, both routers are started on that host.
3. Failover is handled by keepalived and connection tracking by conntrackd.
There is one master router and one backup router. The status of each router
(master or backup) can now be queried from the domain_router database table;
the management server updates it every 30s by default (see the sketch after
these notes).
4. The routers for the same zone use the same external NIC (same IP and MAC).
The failover script ensures that only one external NIC is present on the
network at any time.
5. The management server currently cannot stop one of the routers if both of
them report as master. This feature is on the todo list.
After both routers are up, disconnecting either of them should not affect the
guest network, and established connections (HTTP, SSH, etc.) should keep
working. Failover of the gateway should take about 3~4 seconds.
Currently the patch works with KVM. VMware and XenServer support will follow soon.
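Below is a minimal sketch of the 30-second status poll mentioned in point 3, assuming hypothetical names (RouterStatusChecker, RouterDao, queryKeepalivedState); the actual management-server code is structured differently but follows the same idea:

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: every interval ask each redundant router for its
// keepalived state and persist it so it can be read from domain_router.
public class RouterStatusChecker {
    interface Router { long getId(); }
    interface RouterDao {
        List<Router> listRedundantRouters();
        void updateRedundantState(long routerId, String state);   // backs the domain_router table
    }
    interface RouterAgent { String queryKeepalivedState(Router router); }   // returns "MASTER" or "BACKUP"

    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final RouterDao routerDao;
    private final RouterAgent agent;

    RouterStatusChecker(RouterDao routerDao, RouterAgent agent) {
        this.routerDao = routerDao;
        this.agent = agent;
    }

    void start(long intervalSeconds) {   // 30s by default per the notes above
        executor.scheduleAtFixedRate(this::updateStates, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    private void updateStates() {
        for (Router router : routerDao.listRedundantRouters()) {
            routerDao.updateRedundantState(router.getId(), agent.queryKeepalivedState(router));
        }
    }
}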
status 9815: resolved fixed
The type cast should be done only after making sure that the command was successful; otherwise you may have the base Answer returned.
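A minimal illustration of the pattern, with stub Answer types standing in for the real ones:

// Hypothetical sketch: check the success flag on the base Answer before
// down-casting, since a failed command may return only the base Answer.
public class AnswerCastExample {
    interface Answer { boolean getResult(); String getDetails(); }
    interface CopyVolumeAnswer extends Answer { String getVolumePath(); }   // stand-in for a concrete subclass

    static String handle(Answer answer) {
        if (answer == null || !answer.getResult()) {
            // Failure path: may be the base Answer, so no cast is attempted.
            return null;
        }
        CopyVolumeAnswer copyAnswer = (CopyVolumeAnswer) answer;   // safe only after the success check
        return copyAnswer.getVolumePath();
    }
}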
Changes:
- Throw an exception if the deployment plan passed into start() cannot be satisfied by the current constraints (for example, the root volume has already been created in a pool in a different cluster).
Changes:
- Added a new parameter to pass in a deployment plan during VM start
- If a hostId is passed in to the DeployVMCmd (only allowed for a root admin to test a host), a plan is passed in to start the VM in that host's datacenter, pod and cluster, and on that host
- If a plan is passed in during start but the VM's root volume is READY, the plan of the root volume takes precedence; in that case the plan passed in is not used.
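A small sketch of how the plan can be derived from the admin-supplied hostId (all names here are illustrative stand-ins for the real classes):

// Hypothetical sketch: pin the plan's zone, pod, cluster and host to the
// location of the host the root admin asked for.
public class PlanFromHostExample {
    interface Host { long getDataCenterId(); long getPodId(); long getClusterId(); long getId(); }

    // Simple value holder standing in for the real deployment-plan class.
    static class Plan {
        final long dcId; final Long podId; final Long clusterId; final Long hostId;
        Plan(long dcId, Long podId, Long clusterId, Long hostId) {
            this.dcId = dcId; this.podId = podId; this.clusterId = clusterId; this.hostId = hostId;
        }
    }

    static Plan planForHost(Host host) {
        return new Plan(host.getDataCenterId(), host.getPodId(), host.getClusterId(), host.getId());
    }
}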
Changes:
- When the ROOT volume of a VM is found to be READY, changed the planner to reuse the pool for every volume (root or data) that is READY and whose pool is neither in maintenance nor in the avoid set
- If the ROOT volume is not ready, we do not consider the DATA disk; both get re-allocated.
- When a pool is reused for a ready volume, the planner does not call the storage pool allocators, and such volumes are not assigned a pool in the deployment destination returned by the planner. Accordingly, the StorageManager::prepare method won't recreate these volumes since they are not mentioned in the destination.
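The reuse rule, reduced to a sketch with stubbed types (the real planner works over the deployment destination and avoid set, but the condition is the same in spirit):

import java.util.Set;

// Hypothetical sketch: a READY volume keeps its current pool only while that
// pool is usable; otherwise it goes back through the storage pool allocators.
public class PoolReuseExample {
    interface Pool { boolean isInMaintenance(); }
    interface Volume { boolean isReady(); Pool getPool(); }

    static boolean reuseExistingPool(Volume rootVolume, Volume volume, Set<Pool> avoidSet) {
        if (!rootVolume.isReady()) {
            return false;   // root not READY: both root and data disks get re-allocated
        }
        Pool pool = volume.getPool();
        return volume.isReady() && pool != null
                && !pool.isInMaintenance() && !avoidSet.contains(pool);
    }
}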
Changes:
- The reason was that the old volume's templateId was being updated before volume creation was attempted. So on the retry, we did not find a difference between the volume's templateId and the VM's templateId and did not enter the recreation logic.
- The fix is to set the new volume's templateId to the VM's templateId while creating the new volume. The old volume's templateId stays the same, and the old volume is marked as 'Destroy' once the new volume is created.
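A tiny sketch of the fix, with stubbed fields (the real code works on the volume VOs):

// Hypothetical sketch: the new volume gets the VM's templateId; the old
// volume keeps its original templateId so a retry still detects the mismatch.
public class VolumeRecreateExample {
    static class Volume { Long templateId; String state = "Ready"; }

    static Volume recreateRootVolume(Volume oldVolume, long vmTemplateId) {
        Volume newVolume = new Volume();
        newVolume.templateId = vmTemplateId;   // new volume tracks the VM's template
        oldVolume.state = "Destroy";           // old volume is only marked for cleanup
        return newVolume;
    }
}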
Changes:
While starting a System VM:
- In case the ROOT volume is READY, we check whether the volume's templateId matches the system VM's template.
- If it does not match, we update the volume's templateId and ask the deployment planner to reassign a pool to this volume even though it is READY.
In general:
- If a root volume is READY, we remove its entry from the deploy destination before calling StorageManager::prepare()
- StorageManager creates a volume only if a pool is assigned to it in the deploy destination passed to it.
- If a volume has no pool assigned to it in the deploy destination, it means the volume is ready and already has a pool allocated to it.
- Update system vm_instance's template_id if it does not match the system vm template.
- Use _templateDao.findSystemVMTemplate to find the latest system vm template.
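A condensed sketch of the system-VM check (stub types; the real code compares against the template returned by _templateDao.findSystemVMTemplate):

// Hypothetical sketch: if the READY root volume was built from an outdated
// system template, retag it and force the planner to assign a pool again.
public class SystemVmTemplateCheckExample {
    interface Volume { boolean isReady(); long getTemplateId(); void setTemplateId(long templateId); }

    static boolean needsPoolReassignment(Volume rootVolume, long latestSystemTemplateId) {
        if (rootVolume.isReady() && rootVolume.getTemplateId() != latestSystemTemplateId) {
            rootVolume.setTemplateId(latestSystemTemplateId);   // retag to the latest system VM template
            return true;                                        // planner should pick a pool even though READY
        }
        return false;
    }
}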
Changes:
- The planner must reassign the storage pool if the template id for system VMs has changed. The StorageManager must then recreate the volume if it has been reassigned. This is needed to support automatic update of the system VM template.
Changes:
- When migration fails, we try to do cleanup on the destination host agent. The AgentUnavailableException thrown in this cleanup was not caught.
- Because of that, other cleanup such as reverting the allocated capacity and the VM state was skipped.
- The fix is to catch the AgentUnavailableException so that the rest of the cleanup can happen.
- Also corrected the exceptions thrown in various migration failure cases.
- In case the VM is still starting, HA should schedule a retry. Introduced a special migration exception for handling this.
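A minimal sketch of the cleanup fix, with stubbed types (only the structure matters: the exception must not skip the local cleanup):

// Hypothetical sketch: catching AgentUnavailableException so that reverting
// the allocated capacity and the VM state still happens.
public class MigrationCleanupExample {
    static class AgentUnavailableException extends Exception { }
    interface DestinationAgent { void cleanup() throws AgentUnavailableException; }

    static void onMigrationFailure(DestinationAgent destination, Runnable revertCapacity, Runnable revertVmState) {
        try {
            destination.cleanup();
        } catch (AgentUnavailableException e) {
            // Destination agent unreachable: log and continue, do not abort the cleanup.
        }
        revertCapacity.run();   // previously skipped when the exception escaped
        revertVmState.run();
    }
}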
- Added a new flag 'allocation_state' to zone, pod, cluster and host
- The possible values for this flag are 'Enabled' or 'Disabled'
- When a new zone, pod, cluster or host is added, allocation_state is 'Disabled' by default.
- For an existing zone, pod, cluster or host, the state is 'Enabled'.
- All Add/Update/List commands for zone, pod, cluster and host can now take a new 'allocationstate' parameter
- If 'allocation_state' is 'Disabled', allocators skip that zone, pod, cluster or host.
- For a root admin, ListZones lists all zones including the 'Disabled' zones. But for any other user, the 'Disabled' zones are not included in the response.
- For any use case that creates/deploys/adds/registers a resource and takes a zone as a parameter, we now check whether the zone is 'Disabled'. If so, the operation can only be performed by a root admin. Adding volumes, snapshots and templates are examples of this use case.
- To enable the root admin to test a particular pod/cluster/host, the deployVM command takes a 'host_id' parameter that can be passed in only by a root admin.
If this parameter is passed in by the admin, the allocators do not search for hosts and use only that host. Storage pools are searched in the cluster of that host.
If the VM cannot be deployed to that host, the allocators and deployVM fail without retrying.
This is root-admin-only functionality.
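A sketch of the allocator-side check (stub types; the real check lives in the individual allocators):

// Hypothetical sketch: a Disabled zone/pod/cluster/host is skipped unless the
// caller is a root admin explicitly targeting it via host_id.
public class AllocationStateExample {
    enum AllocationState { Enabled, Disabled }
    interface Grouping { AllocationState getAllocationState(); }   // zone, pod, cluster or host

    static boolean isUsable(Grouping scope, boolean callerIsRootAdmin, boolean explicitHostRequested) {
        if (scope.getAllocationState() == AllocationState.Disabled) {
            return callerIsRootAdmin && explicitHostRequested;   // admin testing of disabled capacity
        }
        return true;
    }
}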
---------------------
Service API changes:
---------------------
- ManagementServer will expose a new API:
Pair<List<HostVO>, List<Long>> listHostsForMigrationOfVM(UserVm vm, Long startIndex, Long pageSize)
The API returns the list of all hosts in the VM's cluster minus the current host, as well as a list of hostIds that appear to have enough CPU and RAM capacity to host this VM.
- ListHostsCmd will call this service API if virtualmachineid is present in the request.
- MigrateVmCmd is the new command added that takes in virtualmachineid and destination hostid
- UserVmService will expose a new API: UserVm migrateVirtualMachine(UserVm vm, Host destinationHost)
------------------------------------
The API throws an error in the following cases:
------------------------------------
- The user is not a root admin ('Permission denied').
- The VM uses local storage; we cannot migrate it, so 'listHosts' will throw an error.
- We fail to migrate the VM to the chosen host.
- The API currently supports migration for XenServer only, so an error is thrown
if the hypervisor is not XenServer (e.g. KVM, vSphere).
- The destination host is not in the same cluster as the source host.
- The VM is not in the Running state.
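A rough caller-side sketch of the two service APIs above; every type here is a stub so the snippet stands alone, and the selection logic is illustrative only:

import java.util.List;

// Hypothetical sketch: list candidate hosts for the VM, pick the first one that
// reports enough CPU and RAM capacity, and migrate the VM to it.
public class MigrateVmExample {
    interface Host { }
    interface HostVO extends Host { long getId(); }
    interface UserVm { }
    static class Pair<A, B> {
        final A first; final B second;
        Pair(A first, B second) { this.first = first; this.second = second; }
    }
    interface ManagementServer {
        Pair<List<HostVO>, List<Long>> listHostsForMigrationOfVM(UserVm vm, Long startIndex, Long pageSize);
    }
    interface UserVmService {
        UserVm migrateVirtualMachine(UserVm vm, Host destinationHost);
    }

    static UserVm migrateToFirstHostWithCapacity(ManagementServer mgmt, UserVmService vmService, UserVm vm) {
        Pair<List<HostVO>, List<Long>> hosts = mgmt.listHostsForMigrationOfVM(vm, 0L, 20L);
        for (HostVO candidate : hosts.first) {
            if (hosts.second.contains(candidate.getId())) {   // candidate has enough CPU and RAM
                return vmService.migrateVirtualMachine(vm, candidate);
            }
        }
        return null;                                          // no suitable host found
    }
}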
Bug 7723 - merge or re-write host tagging into master / 2.2
Bug 7627 - Need more logging for Allocators
Bug 8317 - Add better resource allocation failure messages
Changes for the Deployment Planner to use the host and storage pool allocators to find a deployment destination.
Also includes the changes for the host tag feature.
Improved the logging for allocators.
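As a rough illustration of the host tag feature, a host is considered only if it carries the tag required by the offering (assumption: the required tag comes from the service offering; names are illustrative):

import java.util.Set;

// Hypothetical sketch: host tag filter applied by a host allocator.
public class HostTagExample {
    static boolean matchesRequiredTag(Set<String> hostTags, String requiredTag) {
        return requiredTag == null || hostTags.contains(requiredTag);   // no tag required, or the host carries it
    }
}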
2) Add consoleproxy.launch.max to allow specifying the maximum number of console proxy VMs that can be launched within a zone (default is 10 if not specified). This is to prevent possible DoS attacks or uncontrolled usage of system resources (see the sketch below).
3) Remove some unused code.
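The cap from item 2, reduced to a sketch (illustrative names; the real code reads the consoleproxy.launch.max configuration value):

// Hypothetical sketch: refuse to launch another console proxy VM in a zone
// once the configured limit (default 10) has been reached.
public class ConsoleProxyLaunchCapExample {
    static final int DEFAULT_MAX_PROXIES_PER_ZONE = 10;   // default for consoleproxy.launch.max

    static boolean canLaunchAnotherProxy(int runningProxiesInZone, Integer configuredMax) {
        int max = (configuredMax != null) ? configuredMax : DEFAULT_MAX_PROXIES_PER_ZONE;
        return runningProxiesInZone < max;
    }
}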
status 7811: resolved fixed
Other fixes:
* vmExpunge: clean up LB/PF rules after the VM has been marked as Expunging in the DB, to avoid the situation where a user recovers a VM in the middle of the expunge job.
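A minimal sketch of the ordering (stub types; the state names are assumptions): the state change to Expunging happens first, so a concurrent recover can no longer succeed while the rules are being removed:

// Hypothetical sketch: mark the VM Expunging before touching LB/PF rules.
public class ExpungeOrderExample {
    interface VmDao { boolean updateState(long vmId, String fromState, String toState); }
    interface RulesManager { void cleanupLbAndPfRules(long vmId); }

    static boolean expunge(long vmId, VmDao vmDao, RulesManager rules) {
        if (!vmDao.updateState(vmId, "Destroyed", "Expunging")) {
            return false;                      // e.g. the user recovered the VM first
        }
        rules.cleanupLbAndPfRules(vmId);       // safe now: recover is no longer possible
        return true;
    }
}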
status 7348: resolved fixed
More fixes:
* Update user_statistics on each domR stop/reboot
* Reset dhcpData/userData as a part of domR stop/reboot
* More logging for domR commands