CLOUDSTACK-9348: NioConnection improvementsReopened PR with squashed changes for a re-review and testing after https://github.com/apache/cloudstack/pull/1493 and sub-sequent PRs got reverted
* pr/1549:
CLOUDSTACK-9348: NioConnection improvements
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Taking fast and efficient volume snapshots with XenServer (and your storage provider)A XenServer storage repository (SR) and virtual disk image (VDI) each have UUIDs that are immutable.
This poses a problem for SAN snapshots, if you intend on mounting the underlying snapshot SR alongside the source SR (duplicate UUIDs).
VMware has a solution for this called re-signaturing (so, in other words, the snapshot UUIDs can be changed).
This PR only deals with the CloudStack side of things, but it works in concert with a new XenServer storage manager created by CloudOps (this storage manager enables re-signaturing of XenServer SR and VDI UUIDs).
I have written Marvin integration tests to go along with this, but cannot yet check those into the CloudStack repo as they rely on SolidFire hardware.
If anyone would like to see these integration tests, please let me know.
JIRA ticket: https://issues.apache.org/jira/browse/CLOUDSTACK-9281
Here's a video I made that shows this feature in action:
https://www.youtube.com/watch?v=YQ3pBeL-WaA&list=PLqOXKM0Bt13DFnQnwUx8ZtJzoyDV0Uuye&index=13
* pr/1403:
Faster logic to see if a cluster supports resigning
Support for backend snapshots with XenServer
Signed-off-by: Will Stevens <williamstevens@gmail.com>
- Unit test to demonstrate denial of service attack
The NioConnection uses blocking handlers for various events such as connect,
accept, read, write. In case a client connects NioServer (used by
agent mgr to service agents on port 8250) but fails to participate in SSL
handshake or just sits idle, this would block the main IO/selector loop in
NioConnection. Such a client could be either malicious or aggresive.
This unit test demonstrates such a malicious client that can perform a
denial-of-service attack on NioServer that blocks it to serve any other client.
- Use non-blocking SSL handshake
- Uses non-blocking socket config in NioClient and NioServer/NioConnection
- Scalable connectivity from agents and peer clustered-management server
- Removes blocking ssl handshake code with a non-blocking code
- Protects from denial-of-service issues that can degrade mgmt server responsiveness
due to an aggressive/malicious client
- Uses separate executor services for handling ssl handshakes
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This reverts commit 7ce0e10fbc, reversing
changes made to 29ba71f2db.
This was reverted because it seemed to be related to an issue
when doing a DeployDC, causing an `addHost` error.
Notify listeners when a host has been added to a cluster, is about to be removed from a cluster, or has been removed from a cluster
This PR addresses the following JIRA ticket:
https://issues.apache.org/jira/browse/CLOUDSTACK-8813
The problem is that there needs to be notifications sent when a host is added to, about to be removed from, and removed from a cluster.
Such notifications can be used for many purposes. For example, it can allow storage plug-ins to update ACLs on their storage systems. Also, it can allow us to clean up IQNs from ESXi hosts that are no longer needed.
* pr/816:
CLOUDSTACK-8813: Notify listeners when a host has been added to a cluster, is about to be removed from a cluster, or has been removed from a cluster
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Support access to a host’s out-of-band management interface (e.g. IPMI, iLO,
DRAC, etc.) to manage host power operations (on/off etc.) and querying current
power state in CloudStack.
Given the wide range of out-of-band management interfaces such as iLO and iDRA,
the service implementation allows for development of separate drivers as plugins.
This feature comes with a ipmitool based driver that uses the
ipmitool (http://linux.die.net/man/1/ipmitool) to communicate with any
out-of-band management interface that support IPMI 2.0.
This feature allows following common use-cases:
- Restarting stalled/failed hosts
- Powering off under-utilised hosts
- Powering on hosts for provisioning or to increase capacity
- Allowing system administrators to see the current power state of the host
For testing this feature `ipmisim` can be used:
https://pypi.python.org/pypi/ipmisim
FS:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Out-of-band+Management+for+CloudStack
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
CLOUDSTACK-9348: Use non-blocking SSL handshake in NioConnection/Link- Uses non-blocking socket config in NioClient and NioServer/NioConnection
- Scalable connectivity from agents and peer clustered-management server
- Removes blocking ssl handshake code with a non-blocking code
- Protects from denial-of-service issues that can degrade mgmt server responsiveness
due to an aggressive/malicious client
- Uses separate executor services for handling connect/accept events
Changes are covered the NioTest so I did not write a new test, advise how we can improve this. Further, I tried to invest time on writing a benchmark test to reproduce a degraded server but could not write it deterministic-ally (sometimes fails/passes but not always). Review, CI testing and feedback requested /cc @swill @jburwell @DaanHoogland @wido @remibergsma @rafaelweingartner @GabrielBrascher
* pr/1493:
CLOUDSTACK-9348: Use non-blocking SSL handshake
CLOUDSTACK-9348: Unit test to demonstrate denial of service attack
Signed-off-by: Will Stevens <williamstevens@gmail.com>
- Uses non-blocking socket config in NioClient and NioServer/NioConnection
- Scalable connectivity from agents and peer clustered-management server
- Removes blocking ssl handshake code with a non-blocking code
- Protects from denial-of-service issues that can degrade mgmt server responsiveness
due to an aggressive/malicious client
- Uses separate executor services for handling ssl handshakes
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
CLOUDSTACK-8847: ListServiceOfferings is returning incompatible tagged offerings when called with VM idWhen calling listServiceOfferings with VM id as parameter. It is returning incompatible tagged offerings. It should only list all compatible tagged offerings. Compatible means the new service offering should contain all the tags of the existing service offering(Existing offering SUBSET of new offering). If that is the case It should list in the result and can be upgraded to that offering.
* pr/1321:
CLOUDSTACK-8847: ListServiceOfferings is returning incompatible tagged offerings when called with VM id
Signed-off-by: Will Stevens <williamstevens@gmail.com>
CLOUDSTACK-9130: Make RebootCommand similar to start/stop/migrate agent commands w.r.t. "execute in sequence" flag
RebootCommand now behaves in the same way as start/stop/migrate agent commands w.r.t. to sequential/parallel execution.
* pr/1200:
CLOUDSTACK-9130: Make RebootCommand similar to start/stop/migrate agent commands w.r.t. "execute in sequence" flag RebootCommand now behaves in the same way as start/stop/migrate agent commands w.r.t. to sequential/parallel execution.
Signed-off-by: Will Stevens <williamstevens@gmail.com>
CLOUDSTACK-9196: Fixing null pointer exception when vm meta data is synced on upgraded setuphttps://issues.apache.org/jira/browse/CLOUDSTACK-9196
NullPointerException can occur if XenServer reports non-existing VM in cloud DB.
* pr/1274:
CLOUDSTACK-9196: Fixing null pointer exception when vm meta data is synced on upgraded setup.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Removed unused variables from "NetworkStateListener" classWe removed the following variables from "com.cloud.network.NetworkStateListener"
. UsageEventDao _usageEventDao
. NetworkDao _networkDao
We changed the EventBus s_eventBus variable to private, the constructor not to use those variables and applied this change in classes com.cloud.network.IpAddressManagerImpl and org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
* pr/1261:
Removed unused variables from class NetworkStateListener
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
CLOUDSTACK-9185: [VMware DRS] VM sync failed with exception due to out-of-band changesSummary: The target "ClusteredVirtualMachineManagerImpl.HandlePowerStateReport" invoked during the VM power state sync is not found as HandlePowerStateReport was not implemented in ClusteredVirtualMachineManagerImpl and was private in VirtualMachineManagerImpl, which was resulting in InvocationTargetException. Changed HandlePowerStateReport() in VirtualMachineManagerImpl to protected.
* pr/1256:
CLOUDSTACK-9185: [VMware DRS] VM sync failed with exception due to out-of-band changes
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
CLOUDSTACK-8860: improve error messages in VM deployment code path.improved the error messages in vm deployment code path. added some more data to the error messages and also fixed some errors using internal ids to use uuids.
* pr/864:
CLOUDSTACK-8860: improve error messages in VM deployment code path.
Signed-off-by: Remi Bergsma <github@remi.nl>
* 4.7:
CLOUDSTACK-9154 - Sets the pub interface down when all guest nets are gone
CLOUDSTACK-9187 - Makes code ready for more something like ethXXXX, if we ever get that far
CLOUDSTACK-9188 - Reads network GC interval and wait from configDao
CLOUDSTACK-9187 - Fixes interface allocation to VRRP instances
CLOUDSTACK-9187 - Adds test to cover multiple nics and nic removal
CLOUDSTACK-9154 - Adds test to cover nics state after GC
CLOUDSTACK-9154 - Returns the guest iterface that is marked as added
Conflicts:
engine/orchestration/src/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java
[4.7] Critical VPCVR issues fixed: CLOUDSTACK-9154; CLOUDSTACK-9187; and CLOUDSTACK-9188This PR applies the same fixes as in the PR #1259, but against branch 4.7.
Please refer to PR #1259 for the tests results and all the comments already made there.
Issues fixed are:
* CLOUDSTACK-9154: rVPC doesn't recover from cleaning up of network garbage collector
* CLOUDSTACK-9187: rVPC routers in Master/Master due to concurrency problem when writing the keepalivd.conf
* CLOUDSTACK-9188: NetworkGarbageCollector is not using gc.interval and gc.wait from settings
Those changes have been covered by 2 new tests added to ```smoke/test_vpc_redundant.py```:
* test_04_rvpc_network_garbage_collector_nics
* test_05_rvpc_multi_tiers
The test ```test_04_rvpc_network_garbage_collector_nics``` depends on the global settings for the network.gc.interval and gc.wait. If one wants the test to run quicker, please change the settings (default is 600 seconds for each) and restart the Management Server before running the tests. I would suggest to set it to 60 seconds.
In addition, the NetworkGarbageCollector was redefining the settings above mentioned and not reading their values through ConfigDao. Due to that, the settings were not being applied properly and the test was waiting to long to check the VPC routers.
* pr/1277:
CLOUDSTACK-9154 - Sets the pub interface down when all guest nets are gone
CLOUDSTACK-9187 - Makes code ready for more something like ethXXXX, if we ever get that far
CLOUDSTACK-9188 - Reads network GC interval and wait from configDao
CLOUDSTACK-9187 - Fixes interface allocation to VRRP instances
CLOUDSTACK-9187 - Adds test to cover multiple nics and nic removal
CLOUDSTACK-9154 - Adds test to cover nics state after GC
CLOUDSTACK-9154 - Returns the guest iterface that is marked as added
Signed-off-by: Remi Bergsma <github@remi.nl>
* 4.7:
Implement CheckHealthCommand for NSX controllers
Fix log message that refers to agent, not host
Prevent NullPointerException when host does not belong to a pod
Summary: The target "ClusteredVirtualMachineManagerImpl.HandlePowerStateReport" invoked during the VM power state sync is not found as HandlePowerStateReport was not implemented in ClusteredVirtualMachineManagerImpl and was private in VirtualMachineManagerImpl, which was resulting in InvocationTargetException. Changed HandlePowerStateReport() in VirtualMachineManagerImpl to protected.
CLOUDSTACK-9134: set device_id as the first device_id not in use instead of nic count
when we restart vpc tiers, the old nics will be removed, and create a new nic.
however, the device_id was set to the nic count, which may be already used.
this commit get the first device_id not in use as the device_id of new nic.
This issue also happen when we add multiple networks to a vm and remove them.
* pr/1209:
CLOUDSTACK-9134: set device_id as the first device_id not in use instead of nic count
Signed-off-by: Daan Hoogland <daan@onecht.net>
when we restart vpc tiers, the old nics will be removed, and create a new nic.
however, the device_id was set to the nic count, which may be already used.
this commit get the first device_id not in use as the device_id of new nic.
This issue also happen when we add multiple networks to a vm and remove them.
CLOUDSTACK-9105: Logging enhancement: Handle/reference to track API calls end to end in the MS logs
Added logid to logging framework, now all API call logs can be tracked with this id end to end
* pr/1167:
CLOUDSTACK-9105: Logging enhancement: Handle/reference to track API calls end to end in the MS logs Added logid to logging framework, now all API call logs can be tracked with this id end to end
Signed-off-by: Daan Hoogland <daan@onecht.net>
CLOUDSTACK-9047 rename enumsmake enums adhere to best practice naming conventions
* pr/1049:
CLOUDSTACK-9046 rename enums to adhere to naming conventions
CLOUDSTACK-9046 renamed enums in kvm plugin
CLOUDSTACK-9047 use 'State's only with context there are more types called 'State' (or to be called so but now 'state') So remove imports and prepend their enclosing class/context to them.
Signed-off-by: Daan Hoogland <daan@onecht.net>
* 4.6:
CLOUDSTACK-9075 - Uses the same vlan since it should have been already released
CLOUDSTACK-9075 - Adds VPC static routes test
CLOUDSTACK-9075 - Covers Private GW ACL with Redundant VPCs
CLOUDSTACK-9075 - Add method to get list of Physical Networks per zone
CLOUDSTACK-6276 Removing unused parameter in integration test for projects
CLOUDSTACK-6276 Removing unused parameter in integration test
CLOUDSTACK-6276 Fixing affinity groups for projects
CLOUDSTACK-6276 Fixing affinity groups for projectsWith some contributions from @resmo and @ustcweizhou.
This closes https://github.com/apache/cloudstack/pull/508
To test manually (need at least 2 hosts):
Create a project
Create an affinity group in that project
Deploy a vm with that affinity group
Deploy a second vm with that affinity group
They should be on different hosts
Ran old and new tests for affinity groups on the simulator
Test create affinity group as admin in project ... === TestName: test_01_admin_create_aff_grp_for_project | Status : SUCCESS ===
ok
Test create affinity group as domain admin for projects ... === TestName: test_02_doadmin_create_aff_grp_for_project | Status : SUCCESS ===
ok
Test create affinity group as user for projects ... === TestName: test_03_user_create_aff_grp_for_project | Status : SUCCESS ===
ok
Test create affinity group that exists (same name) for projects ... === TestName: test_4_user_create_aff_grp_existing_name_for_project | Status : SUCCESS ===
ok
#Delete Affinity Group by id. ... === TestName: test_01_delete_aff_grp_by_id | Status : SUCCESS ===
ok
#Delete Affinity Group by id should fail for user not in project ... === TestName: test_02_delete_aff_grp_by_id_another_user | Status : SUCCESS ===
ok
test DeployVM in anti-affinity groups ... === TestName: test_01_deploy_vm_anti_affinity_group | Status : SUCCESS ===
ok
test DeployVM in anti-affinity groups with more vms than hosts. ... === TestName: test_02_deploy_vm_anti_affinity_group_fail_on_not_enough_hosts | Status : SUCCESS ===
ok
List affinity group for a vm for projects ... === TestName: test_01_list_aff_grps_for_vm | Status : SUCCESS ===
ok
List multiple affinity groups associated with a vm for projects ... === TestName: test_02_list_multiple_aff_grps_for_vm | Status : SUCCESS ===
ok
List affinity groups by id for projects ... === TestName: test_03_list_aff_grps_by_id | Status : SUCCESS ===
ok
List Affinity Groups by name for projects ... === TestName: test_04_list_aff_grps_by_name | Status : SUCCESS ===
ok
List Affinity Groups by non-existing id for projects ... === TestName: test_05_list_aff_grps_by_non_existing_id | Status : SUCCESS ===
ok
List Affinity Groups by non-existing name for projects ... === TestName: test_06_list_aff_grps_by_non_existing_name | Status : SUCCESS ===
ok
List affinity group should list all for a vms associated with that group for projects ... === TestName: test_07_list_all_vms_in_aff_grp | Status : SUCCESS ===
ok
Update the list of affinityGroups by using affinity groupids ... === TestName: test_01_update_aff_grp_by_ids | Status : SUCCESS ===
ok
----------------------------------------------------------------------
Ran 16 tests in 581.706s
OK
Deploy vm as Admin in Affinity Group belonging to regular user (should fail) ... === TestName: test_01_deploy_vm_another_user | Status : SUCCESS ===
ok
Create Affinity Group as admin for regular user ... === TestName: test_02_create_aff_grp_user | Status : SUCCESS ===
ok
List Affinity Groups as admin for all the users ... === TestName: test_03_list_aff_grp_all_users | Status : SUCCESS ===
ok
List Affinity Groups belonging to admin user ... === TestName: test_04_list_all_admin_aff_grp | Status : SUCCESS ===
ok
List Affinity Groups belonging to regular user passing account id and domain id ... === TestName: test_05_list_all_users_aff_grp | Status : SUCCESS ===
ok
List Affinity Groups belonging to regular user passing group id ... === TestName: test_06_list_all_users_aff_grp_by_id | Status : SUCCESS ===
ok
Delete Affinity Group belonging to regular user ... === TestName: test_07_delete_aff_grp_of_other_user | Status : SUCCESS ===
ok
Test create affinity group as admin ... === TestName: test_01_admin_create_aff_grp | Status : SUCCESS ===
ok
Test create affinity group as domain admin ... === TestName: test_02_doadmin_create_aff_grp | Status : SUCCESS ===
ok
Test create affinity group as user ... === TestName: test_03_user_create_aff_grp | Status : SUCCESS ===
ok
Test create affinity group that exists (same name) ... === TestName: test_04_user_create_aff_grp_existing_name | Status : SUCCESS ===
ok
Test create affinity group with existing name but within different account ... === TestName: test_05_create_aff_grp_same_name_diff_acc | Status : SUCCESS ===
ok
Test create affinity group of non-existing type ... === TestName: test_06_create_aff_grp_nonexisting_type | Status : SUCCESS ===
ok
Delete Affinity Group by name ... === TestName: test_01_delete_aff_grp_by_name | Status : SUCCESS ===
ok
Delete Affinity Group as admin for an account ... === TestName: test_02_delete_aff_grp_for_acc | Status : SUCCESS ===
ok
Delete Affinity Group which has vms in it ... === TestName: test_03_delete_aff_grp_with_vms | Status : SUCCESS ===
ok
Delete Affinity Group with id which does not belong to this user ... === TestName: test_05_delete_aff_grp_id | Status : SUCCESS ===
ok
Delete Affinity Group by name which does not belong to this user ... === TestName: test_06_delete_aff_grp_name | Status : SUCCESS ===
ok
Delete Affinity Group by id. ... === TestName: test_08_delete_aff_grp_by_id | Status : SUCCESS ===
ok
Root admin should be able to delete affinity group of other users ... === TestName: test_09_delete_aff_grp_root_admin | Status : SUCCESS ===
ok
Deploy VM without affinity group ... === TestName: test_01_deploy_vm_without_aff_grp | Status : SUCCESS ===
ok
Deploy VM by aff grp name ... === TestName: test_02_deploy_vm_by_aff_grp_name | Status : SUCCESS ===
ok
Deploy VM by aff grp id ... === TestName: test_03_deploy_vm_by_aff_grp_id | Status : SUCCESS ===
ok
test DeployVM in anti-affinity groups ... === TestName: test_04_deploy_vm_anti_affinity_group | Status : SUCCESS ===
ok
Deploy vms by affinity group id ... === TestName: test_05_deploy_vm_by_id | Status : SUCCESS ===
ok
Deploy vm in affinity group of another user by name ... === TestName: test_06_deploy_vm_aff_grp_of_other_user_by_name | Status : SUCCESS ===
ok
Deploy vm in affinity group of another user by id ... === TestName: test_07_deploy_vm_aff_grp_of_other_user_by_id | Status : SUCCESS ===
ok
Deploy vm in multiple affinity groups ... === TestName: test_08_deploy_vm_multiple_aff_grps | Status : SUCCESS ===
ok
Deploy multiple vms in multiple affinity groups ... === TestName: test_09_deploy_vm_multiple_aff_grps | Status : SUCCESS ===
ok
Deploy VM by aff grp name and id ... === TestName: test_10_deploy_vm_by_aff_grp_name_and_id | Status : SUCCESS ===
ok
List affinity group for a vm ... === TestName: test_01_list_aff_grps_for_vm | Status : SUCCESS ===
ok
List multiple affinity groups associated with a vm ... === TestName: test_02_list_multiple_aff_grps_for_vm | Status : SUCCESS ===
ok
List affinity groups by id ... === TestName: test_03_list_aff_grps_by_id | Status : SUCCESS ===
ok
List Affinity Groups by name ... === TestName: test_04_list_aff_grps_by_name | Status : SUCCESS ===
ok
List Affinity Groups by non-existing id ... === TestName: test_05_list_aff_grps_by_non_existing_id | Status : SUCCESS ===
ok
List Affinity Groups by non-existing name ... === TestName: test_06_list_aff_grps_by_non_existing_name | Status : SUCCESS ===
ok
List affinity group should list all for a vms associated with that group ... === TestName: test_07_list_all_vms_in_aff_grp | Status : SUCCESS ===
ok
Update the list of affinityGroups by using affinity groupids ... === TestName: test_01_update_aff_grp_by_ids | Status : SUCCESS ===
ok
Update the list of affinityGroups by using affinity groupnames ... === TestName: test_02_update_aff_grp_by_names | Status : SUCCESS ===
ok
Update the list of affinityGroups for vm which is not associated ... === TestName: test_03_update_aff_grp_for_vm_with_no_aff_grp | Status : SUCCESS ===
ok
Update the list of Affinity Groups to empty list ... SKIP: Skip - Failing - work in progress
Update the list of Affinity Groups on running vm ... === TestName: test_05_update_aff_grp_on_running_vm | Status : SUCCESS ===
ok
----------------------------------------------------------------------
Ran 42 tests in 976.432s
OK (SKIP=1)
* pr/1134:
CLOUDSTACK-6276 Removing unused parameter in integration test for projects
CLOUDSTACK-6276 Removing unused parameter in integration test
CLOUDSTACK-6276 Fixing affinity groups for projects
Signed-off-by: Remi Bergsma <github@remi.nl>
Removed unused code from the EngineHostDao Implementation After analysing the code within the EngineHostDaoImpl class, we noticed that methods:
countBy;
findByGuid;
findAndUpdateDirectAgentToLoad;
findAndUpdateApplianceToLoad;
markHostsAsDisconnected;
listAllUpAndEnabledNonHAHosts;
findLostHosts;
getRunningHostCounts;
getNextSequence;
countRoutingHostsByDataCenter;
updateResourceState;
findByTypeNameAndZoneId;
findHypervisorHostInCluster;
lockOneRandomRow;
And variables:
status_logger;
state_logger;
Have no usage. Thus, in order to clean up the code, we decided to remove them from EngineHostDaoImpl and its interface (EngineHostDao).
All of EngineHostDaoImpl's attributes were set to private, given that they are not accessed by any other class.
* pr/942:
Removed unused code from the EngineHostDao Implementation.
Signed-off-by: Remi Bergsma <github@remi.nl>
* 4.6:
CLOUDSTACK-9020: UI enhancements from metrics view
CLOUDSTACK-9064: The users should be able to create multiple Guest Shared Networks in same Vlan ID, same Physical Network and same network, just with a different IP ranges.
CLOUDSTACK-9078: Gave scripts executable permissions.
CLOUDSTACK-9076: Changed ownership of directory /var/lib/cloudstack to cloud.
Cannot list vlanipranges by keyword
CLOUDSTACk-9002: VM deployment is successful even when dhcp entry command fails - Fixed
Reason: The return value of the call to accept() function in the applyRules() function of BasicNetworkTopology.java was not checked for success or failure. As a result even if it fails, exception was not thrown and VM deployment went ahead without any errors.
Fix: Added the necessary checks.
* pr/995:
CLOUDSTACk-9002: VM deployment is successful even when dhcp entry command fails - Fixed
Signed-off-by: Remi Bergsma <github@remi.nl>
There 2 things which has been changed.
* We look on power_state_update_time instead of update_time. Didn't make sense to me at all to look at update_time.
* Due DB update optimisation, powerState will only be updated if < MAX_CONSECUTIVE_SAME_STATE_UPDATE_COUNT. That is why we can not rely on these information unless we make sure these are up to date.
Cloudstack 8816 entityuuid missing in some of the eventsIn some of the events generated, entity uuid was missing making it difficult to find the entity. Fixed the same.
Tested it on rabbitmq instance.
There are the events before after the fix:
Before
--------------------------------------------------------------------------------
routing_key: management-server.ActionEvent.ACCOUNT-DELETE.Account.*
exchange: cloudstack-events
message_count: 2
payload:
{"eventDateTime":"2015-09-04 17:59:24 +0530","status":"Scheduled","description":"deleting User test4 (id: 28) and accountId \u003d 28","event":"ACCOUNT.DELETE","Account":"c09e2e81-8edc-4c27-b072-25005b522b63","account":"bd73dc2e-35c0-11e5-b094-d4ae52cb9af0","user":"bd7ea748-35c0-11e5-b094-d4ae52cb9af0"}
payload_bytes: 304
payload_encoding: string
redelivered: False
--------------------------------------------------------------------------------
routing_key: management-server.AsyncJobEvent.complete.Account.*
exchange: cloudstack-events
message_count: 0
payload: {"cmdInfo":"{\"id\":\"9dd3abc2-3f8b-4852-aa60-a74b234acb13\",\"response\":\"json\",\"sessionkey\":\"5ig1ItP2_5v-mgY4cVJbJN5hw_w\",\"ctxDetails\":\"
{\\\"interface com.cloud.user.Account\\\":\\\"9dd3abc2-3f8b-4852-aa60-a74b234acb13\\\"}
\",\"cmdEventType\":\"ACCOUNT.DELETE\",\"expires\":\"2015-09-07T11:11:56+0000\",\"ctxUserId\":\"2\",\"signatureversion\":\"3\",\"httpmethod\":\"GET\",\"uuid\":\"9dd3abc2-3f8b-4852-aa60-a74b234acb13\",\"ctxAccountId\":\"2\",\"ctxStartEventId\":\"447\"}","instanceType":"Account","jobId":"5004989d-0cde-4922-8afa-66bf38b75ea7","status":"SUCCEEDED","processStatus":"0","commandEventType":"ACCOUNT.DELETE","resultCode":"0","command":"org.apache.cloudstack.api.command.admin.account.DeleteAccountCmd","jobResult":"org.apache.cloudstack.api.response.SuccessResponse/null/
{\"success\":true}
","account":"bd73dc2e-35c0-11e5-b094-d4ae52cb9af0","user":"bd7ea748-35c0-11e5-b094-d4ae52cb9af0"}
payload_bytes: 914
payload_encoding: string
redelivered: False
--------------------------------------------------------------------------------
After
--------------------------------------------------------------------------------
routing_key: management-server.ActionEvent.ACCOUNT-DELETE.Account.e5e2db91-414d-484c-99d5-c4e265c14ad8
exchange: cloudstack-events
message_count: 13
payload: {"eventDateTime":"2015-09-07 17:32:26 +0530","status":"Completed","description":"Successfully completed deleting account. Account Id: 45","event":"ACCOUNT.DELETE","entityuuid":"e5e2db91-414d-484c-99d5-c4e265c14ad8","entity":"com.cloud.user.Account","account":"bd73dc2e-35c0-11e5-b094-d4ae52cb9af0","user":"bd7ea748-35c0-11e5-b094-d4ae52cb9af0"}
payload_bytes: 344
payload_encoding: string
redelivered: True
--------------------------------------------------------------------------------
routing_key: management-server.AsyncJobEvent.complete.Account.e5e2db91-414d-484c-99d5-c4e265c14ad8
exchange: cloudstack-events
message_count: 12
payload: {"cmdInfo":"{\"id\":\"e5e2db91-414d-484c-99d5-c4e265c14ad8\",\"response\":\"json\",\"sessionkey\":\"8AJVbn8HIpg5LZ_VaVfSPs_QN2k\",\"ctxDetails\":\"{\\\"interface com.cloud.user.Account\\\":\\\"e5e2db91-414d-484c-99d5-c4e265c14ad8\\\"}\",\"cmdEventType\":\"ACCOUNT.DELETE\",\"expires\":\"2015-09-07T12:17:42+0000\",\"ctxUserId\":\"2\",\"signatureversion\":\"3\",\"httpmethod\":\"GET\",\"uuid\":\"e5e2db91-414d-484c-99d5-c4e265c14ad8\",\"ctxAccountId\":\"2\",\"ctxStartEventId\":\"465\"}","instanceType":"Account","instanceUuid":"e5e2db91-414d-484c-99d5-c4e265c14ad8","jobId":"0bb08486-6d9f-4e9f-bfef-b7463c42e71b","status":"SUCCEEDED","processStatus":"0","commandEventType":"ACCOUNT.DELETE","resultCode":"0","command":"org.apache.cloudstack.api.command.admin.account.DeleteAccountCmd","jobResult":"org.apache.cloudstack.api.response.SuccessResponse/null/{\"success\":true}","account":"bd73dc2e-35c0-11e5-b094-d4ae52cb9af0","user":"bd7ea748-35c0-11e5-b094-d4ae52cb9af0"}
payload_bytes: 968
payload_encoding: string
redelivered: True
--------------------------------------------------------------------------------
* pr/782:
CLOUDSTACK-8816 Systemvm reboot event doesnt have uuids. Fixed the same
CLOUDSTACK-8816: Project UUID is not showing for some of operations in RabbitMQ.
CLOUDSTACK-8816: entity uuid missing in create network event
CLOUDSTACK-8816: instance uuid is missing in events for delete account
CLOUDSTACK-8816 Fixed entityUuid missing in some cases is events
Signed-off-by: Rajani Karuturi <rajani.karuturi@citrix.com>
This reverts commit cd7218e241, reversing
changes made to f5a7395cc2.
Reason for Revert:
noredist build failed with the below error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project cloud-plugin-hypervisor-vmware: Compilation failure
[ERROR] /home/jenkins/acs/workspace/build-master-noredist/plugins/hypervisors/vmware/src/com/cloud/hypervisor/guru/VMwareGuru.java:[484,12] error: non-static variable logger cannot be referenced from a static context
[ERROR] -> [Help 1]
even the normal build is broken as reported by @koushik-das on dev list
http://markmail.org/message/nngimssuzkj5gpbz
CLOUDSTACK-8754: VM migration triggered by dynamic scaling is failingThis is caused by serialization failure for VmWorkMigrateForScale object. Replaced DeployDestination member present in VmWorkMigrateForScale with serializable types.
Haven't added any unit test as couldn't find a clean way to do it. Simply adding a test to do (de)serialization won't help catch any new member addition to the type which breaks serializability.
* pr/725:
CLOUDSTACK-8754: VM migration triggered by dynamic scaling is failing This is caused by serialization failure for VmWorkMigrateForScale object. Replaced DeployDestination member present in VmWorkMigrateForScale with serializable types.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This is caused by serialization failure for VmWorkMigrateForScale object. Replaced DeployDestination member
present in VmWorkMigrateForScale with serializable types.
Changed methodnames according to Nic.java refactor.
Fixed NicVO.java due to regression from Nic.java refactor.
Fixed VmWareGuru.java after Nic.java refactor.
See issue CLOUDSTACK-8736 for ongoing effort to clean up network code.
Instead of putting the host to Alert state immediately, the investigators should be allowed to run for some time based on alert.wait global config.
At the end of this interval if the host state still cannot be determined then put the host in Alert. Also updated some of the log messages.
This closes#621
- New implementation uses nanoseconds. Due to that, the places where the Profiler is used as a Monitor and/or
a stopwatch will suffer with the difference in the return
- Also added a getDuration(), which returns the time in nanoseconds in case someone wants to use it instead
- Added an extra test to check if the getDuration() works fine with nanoseconds
- Fixed the test that checks the time in milliseconds: I added an error margin to cover the test better
Signed-off-by: Daan Hoogland <daan@onecht.net>
During ping task, while scanning and updating status of all VMs on the host that are stuck in a transitional state
and are missing from the power report, do so only for VMs that are not removed.
(cherry picked from commit de7173a0ed)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
During ping task, while scanning and updating status of all VMs on the host that are stuck in a transitional state
and are missing from the power report, do so only for VMs that are not removed.
* removed the "is redundant" flag form the addVpcRouterToGuestNetwork() method
* removed the "is redundant" flag from the removeVpcRouterFromGuestNetwork() method
* changed the path of the master.py file in the keepalived.conf.temp file
* the call to routerDao.addRouterToGuestNetwork() in the VpcRouterDeploymentDefinition is not needed. That step will be performed once a VM is created
- In addition, when restarting a VPC the routers will have the guest net configured, if any exists.
* Pushing the POM.xml as well, to use the old Jetty for now. Could not fix the logging problem. Will replace the POM with master version after VPC is done.
Fixed on 4.4 and master but not on 4.5, cherry-picked on 4.5 using commit
fbafc957dc
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Conflicts:
engine/orchestration/src/com/cloud/agent/manager/DirectAgentAttache.java
pool, the source and destination pools cannot be local and cluster/zone and vice versa.
Cloudstack detects it and throws a exception. However, the end user only sees an
unexpected exception and not the reason for failure. Fixed it by making sure the
reason for the failure is correctly captured and shown to the end user.
(cherry picked from commit cffae8eef0)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Conflicts:
server/src/com/cloud/storage/VolumeApiServiceImpl.java
When migration fails instead of returning NULL, throw the exception.
(cherry picked from commit a5a65c7b55)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
If VM has been cold migrated across different VMware DCs, then unregister the VM from source host.
(cherry picked from commit 15b348632d)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Before registering a VM check if a different CS VM with same name exists in vCenter.
(cherry picked from commit 33179cce56)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Separate global config to enable/disable Storage Migration during normal deployment
Introduced a configuration parameter named enable.storage.migration
(cherry picked from commit c55bc0b2d1)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
During vmsync if StopCommand (issued as part of PowerOff/PowerMissing report) fails to stop VM (since VM is running on HV),
don't transition VM state to "Stopped" in CS db. Also added a check to throw ConcurrentOperationException if vm state is not
"Running" after start operation.
During vmsync if StopCommand (issued as part of PowerOff/PowerMissing report) fails to stop VM (since VM is running on HV),
don't transition VM state to "Stopped" in CS db. Also added a check to throw ConcurrentOperationException if vm state is not
"Running" after start operation.
Changes:
- When there is HA we try to redeploy the affected vm using regular planners and if that fails we retry using the special planner for HA (which skips checking disable threshold)
Now because of job framework the InsufficientCapacittyException gets masked and the special planners are not called. Job framework needs to be fixed to rethrow the correct exception.
- Also the VM Work Job framework is not setting the DeploymentPlanner to the VmWorkJob. So the HA Planner being passed by HAMgr was not getting used.
- Now the job framework sets the planner passed in by any caller of the VM Start operation, to the job
Changes:
- When there is HA we try to redeploy the affected vm using regular planners and if that fails we retry using the special planner for HA (which skips checking disable threshold)
Now because of job framework the InsufficientCapacittyException gets masked and the special planners are not called. Job framework needs to be fixed to rethrow the correct exception.
- Also the VM Work Job framework is not setting the DeploymentPlanner to the VmWorkJob. So the HA Planner being passed by HAMgr was not getting used.
- Now the job framework sets the planner passed in by any caller of the VM Start operation, to the job
On default iptables rules are updated to add ACCEPT egress traffic.
If the network egress default policy is false, CS remove ACCEPT and adds the DROP rule which
is egress default rule when there are no other egress rules.
If the CS network egress default policy is true, CS won't configure any default rule for egress because
router already came up to accept egress traffic. If there are already egress rules for network then the
egress rules get applied on VR.
For isolated network with out firewall service, VR default allows egress traffic (guestnetwork --> public network)
On default iptables rules are updated to add ACCEPT egress traffic.
If the network egress default policy is false, CS remove ACCEPT and adds the DROP rule which
is egress default rule when there are no other egress rules.
If the CS network egress default policy is true, CS won't configure any default rule for egress because
router already came up to accept egress traffic. If there are already egress rules for network then the
egress rules get applied on VR.
For isolated network with out firewall service, VR default allows egress traffic (guestnetwork --> public network)
root cause:
when vmsync reports system VM is down, CCP doesn't release the VM resource before starting it.
fix:
make sure cleanup is called for a VM when it is reported as Stopped
root cause:
when vmsync reports system VM is down, CCP doesn't release the VM resource before starting it.
fix:
make sure cleanup is called for a VM when it is reported as Stopped
pool, the source and destination pools cannot be local and cluster/zone and vice versa.
Cloudstack detects it and throws a exception. However, the end user only sees an
unexpected exception and not the reason for failure. Fixed it by making sure the
reason for the failure is correctly captured and shown to the end user.
Unnecessary exception in MS logs while removing default NIC from VM. Following changes are made:
1. Changed the exception from CloudRuntimeException to InvalidParameterValueExecption.
2. Moved out validation logic to UserVMManagerImpl from VirtualMachineManagerImpl.
3. Handling InvalidParameterValueException from async API calls so that they are not logged as ERROR in MS logs.
Unnecessary exception in MS logs while removing default NIC from VM. Following changes are made:
1. Changed the exception from CloudRuntimeException to InvalidParameterValueExecption.
2. Moved out validation logic to UserVMManagerImpl from VirtualMachineManagerImpl.
3. Handling InvalidParameterValueException from async API calls so that they are not logged as ERROR in MS logs.
(outside cloudstack), the state of the vm is not updated in cloudstack db. The
ping task was not checking for resource (host) status by default. The power
state of the vms is returned as part of the resource status. Fixed the issue by
making sure ping task atleast tries once to get the resource status.
(cherry picked from commit 55b4ead495)
(outside cloudstack), the state of the vm is not updated in cloudstack db. The
ping task was not checking for resource (host) status by default. The power
state of the vms is returned as part of the resource status. Fixed the issue by
making sure ping task atleast tries once to get the resource status.
Separate global config to enable/disable Storage Migration during normal deployment
Introduced a configuration parameter named enable.storage.migration
1. While destroying a ROOT volume do the lookup of the associated VM under the DC and not just cluster.
2. In case of VMware, during VM start if a volume is being recreated no need to detach the old volume because
we now expunge it immediately and don't wait for the storage cleanup task to run.
- Check to see if network is implemented changed from 'state == Implementing||Implemented' to 'state == Implemented'.
The earlier check was a hack to prevent the issue described below.
- At the time of implementing network (using implementNetwork() method), if the VR needs to be deployed then it follows
the same path of regular VM deployment. This leads to a nested call to implementNetwork() while preparing VR nics. This
flow creates issues in dealing with network state transitions. The original call puts network in "Implementing" state
and then the nested call again tries to put it into same state resulting in issues. In order to avoid it, implementNetwork()
call for VR is replaced with below code.
cleanup the rules then destroy
fix adds a provision to specify if cleanup is needed on network on
shutdown. VR is marked as to not to require network rules clean up on
network shutdown as the VR is destroyed and recreated.
ran the simulator tests that test network life cycle
The following changes are made:
- Check to see if network is implemented changed from 'state == Implementing||Implemented' to 'state == Implemented'.
The earlier check was a hack to prevent the issue described below.
- At the time of implementing network (using implementNetwork() method), if the VR needs to be deployed then
it follows the same path of regular VM deployment. This leads to a nested call to implementNetwork() while
preparing VR nics. This flow creates issues in dealing with network state transitions. The original call
puts network in "Implementing" state and then the nested call again tries to put it into same state resulting
in issues. In order to avoid it, implementNetwork() call for VR is replaced with below code.
requires storage migration resulting in failure of VM migration. This also improves
the hostsformigration api. Firstly we were trying to list all hosts and then
finding suitable storage pools for all volumes and then we were checking whether
vm migration requires storage migration to that host. Now the process is updated.
We are checking for only those volumes which are not in zone wide primary store.
We are verifying by comparing volumes->poolid->clusterid to host clusterid. If it
uses local or clusterids are different then verifying whether host has suitable
storage pools for the volume of the vm to be migrated too.
Changes:
PodId in which the router should get started was not being saved to the DB due to the VO's setter method not following the setXXX format. So when planner loaded the router from DB, it always got podId as null and that would allow planner to deploy the router in any pod. If the router happens to start in a different pod than the user VM, the Vm fails to start since the Dhcp service check fails.
Fixed the VO's setPodId method, that was causing the DB save operation fail.
List of changes:
1. Created a separate thread pool for handling cron and ping tasks. The size of the pool is based on direct.agent.pool.size. The existing direct agent pool will run all commands other than cron and ping.
2. For normal tasks (generated as part of user/admin API calls), if throttle limit is reached then tasks get queued up for subsequent execution once threads are available.
3. For cron and ping tasks (internally generated by MS like ping, VM sync etc.), if throttle limit is reached then these gets rejected. Since these are internally generated these can be rejected without any issues.
Added a new flag 'checkBeforeCleanup' to StopCommand based on which check is done to see if VM is running in HV host.
If VM is running then in this case it is not stopped and the operation bails out.
Also modified the MS code to call the StopCommand with appropriate value for the flag based on the context.
Currently it is only set to 'true' when called from the new vmsync logic based on powerstate of VM. For rest it
is set to 'false' meaning no change in behaviour.
And when the flag is updated on the resource accordingly generate usage events again.
Also when display flag is false in deployvm cmd it should be false for the volumes associated with the vm as well
template is downloading, template_store_ref has leftover not in ready
state, when create vm from that template, the code doesn't check either
zone id, nor template_store_ref state.
Conflicts:
engine/orchestration/src/org/apache/cloudstack/engine/orchestration/VolumeOrchestrator.java
CLOUDSTACK-4762 : Enabling VGPU support for XenServer.
This feature is to enable the GPU-passthrough and vGPU functionality,
with the help of this feature, admins/users will be able to leverage
the GPU graphics unit power by deploying a virtul machine with GPU or
vGPU support or by changing the service offering of an existing VM
at any later point of time. There GPU/vGPU enabled VMs are able to run
graphical applications.
For now, this feature is only supported with XenServer hypervisor but
can be extended to add the support of other hypervisors.
PrepareForMigrationCommand, so that destination hypervisor can
mount pool. This further exposed an issue for KVM where iso
was not getting cleaned up upon successful migration, fixed as well.
introduces a force option in delete network to forcifully delete a
network. This comes handy in rare cases where network fails to implenet
and network is in shutdown state, but network shutdown to rollback
implement process fails as well.
Conflicts:
api/src/org/apache/cloudstack/api/command/user/network/DeleteNetworkCmd.java
server/src/com/cloud/user/DomainManagerImpl.java
This is happening as concurrent operations are happening on the same VM. Earlier this was not seen as all vm operations were synchronized at agent layer. By making execute.in.sequence
global config to false this restriction is no longer there. In the latest code operations to a single vm are synchronized by maintaining a job queue. In some scenarios the destroy vm operation
was not going through this job queue mechanism and so was resulting in failures due to simultaneous operations.
Adding the missing file
During HA and maintenance call different planners (if the original planners are not able to find capacity) which skip some heurestics
Changes:
- The vmprofile owner passed in to the planner should be the VM's account and not the caller
- Do not do the access check for Root Admin
Conflicts:
server/src/com/cloud/deploy/DeploymentPlanningManagerImpl.java
local Ip addresses do not get released resulting in all the link local
Ip addresses being consumed eventually.
fix ensure Nics with reservation strategy 'Start' should go through
release phase in the Nic life cycle so that release is performed before
Nic is removed to avoid resource leaks.
1. Egress default policy rules is send to the firewall provider. It is up to the
provider to configure the rules.
2. The default policy rules are send for both allow and deny default policy.
3. On network shutdown rules for delete are send.
4. For VR and SRX, by default deny the traffic. So no default rule to deny traffic is required.
Changes:
- During Vm migration while finding a new host within the cluster, we need to set the storagepool Id to the deployment plan too.
- This will indicate the planner that the volumes are ready and no need to find new pool
- This in turn will prevent the threshold check done during the pool allocation. This step is not needed since there is no need to allocate pools newly.
- Thus the migration wont fail because th threshold check fails.
Changes:
- Added 'virtualmachineid' parameter to the createVolume API to specify a VM for the volume. The Vm should be in 'Running' or 'Stopped' state.
- This parameter is used only when createVolume API is called using snapshotid parameter
- When this parameter is set, the volume is created from the snapshot in the pod/cluster of the VM. Also the volume is then attached to the VM in the same request
- If attach Volume fails but create has succeeded, the API errors out but the Volume created remains available. User may attach the same volume later
- When Vm is provided, but if no storage pool is available in the VM's pod/cluster then the volume is not created and API fails.
Volume create usage event and resource count werent getting registered. Check its type rather than it is UserVm since the code is coming from VirtualMachineManager.
The parameter is optional and true by default to preserve the original behavior
Conflicts:
api/src/org/apache/cloudstack/api/command/user/vpc/CreateVPCCmd.java
engine/orchestration/src/org/apache/cloudstack/engine/orchestration/NetworkOrchestrator.java
In VMware during VM start the existing disk information is used to configure the VMs. So even if a new disk is created using the new template VM continues to use the old disk.
Once the old root disk is marked for destroy force expunge it and sync the new disk into the VM folder before VM start
Implemented commands that are required for VR to bootup and Vm deployment to work
Modified hyperv agent code, to deploy VR with Boot Args, boot args passed to VR using KVP Exchange Component.
Fix for VR to boot up and get configured with boot args, Fixed issue in VolumeOrchestrator
Implemented SetFirewallRulesCommand in HyperV Resource
Implemented VR network commands to provide the necessary services from VR
Fixed hyperv localstorage path encode url issue. encode is converting space to '+'
Cloudstack sends requests to directly managed HV hosts (direct agents) using the direct agent thread pool. The size of the pool is determined by global config direct.agent.pool.size defaulted to 500.
Currently there is no restriction on the number of threads a direct agent can use from this shared thread pool to send requests to the host. This is fine as long as the host is responding to requests
in a reasonable amount of time. But if there is a considerable delay in getting response, the thread remain blocked for that much time. As more commands are send to the slow host threads keep getting
blocked. This can eventually lead to a situation where requests to healthy hosts cannot be processed as there are not enough free threads.
The problem being addressed here is to localize the impact of few bad hosts, so that entire management server is not affected.
One such way is to throttle based on the # of outstanding requests on per host basis. The outstanding requests to a host will be a % of direct agent pool size. This is configurable based on
direct.agent.thread.cap. The default value is 0.1 or 10%, a value of 1 would mean the old behavior where there is no upper cap. This will ensure that the impacted host will be bound by a upper cap on the number of threads it can use to process requests and not the entire pool.
commit c9ee0d12e191e803fb341f3f96e95ca434a36f6c
Author: Wei Zhou <w.zhou@leaseweb.com>
Date: Wed Oct 23 16:55:10 2013 +0200
CLOUDSTACK-4931, CLOUDSTACK-4937: setDetails to user VMs only
(cherry picked from commit a94acc5a43)
commit fe1586c71377bc6d219db2dcf088c40b65dd1fc4
Author: Anthony Xu <anthony.xu@citrix.com>
Date: Tue Oct 22 11:20:27 2013 -0700
CLOUDSTACK-4649:
vm sync tracks the pv driver version for xenserver
Anthony
commit 56a218f66eda540b4b4b04030ee71fc6863f8532
Author: Anthony Xu <anthony.xu@citrix.com>
Date: Mon Oct 21 16:10:07 2013 -0700
CLOUDSTACK-4649:
xs 6.1/6.2 introduce the new virtual platform, so there are two virtual platforms, windows PV driver version must match virtual platforms,
this patch tracks PV driver versions in vm details and template details.
Anthony
commit 4e85d28c678a6f96b5b70d8d33fc60f9d1ea3df6
Author: Laszlo Hornyak <laszlo.hornyak@gmail.com>
Date: Mon Oct 21 21:17:33 2013 +0200
removed unused static field
- s_httpClientManager was not used
Signed-off-by: Laszlo Hornyak <laszlo.hornyak@gmail.com>
commit d4121fa26023db236f7396cea455ef090672ae9a
Author: Chris Suich <chris.suich@netapp.com>
Date: Tue Oct 22 10:45:22 2013 -0400
Updated DataMotionServiceImpl and ApiResponseHelper based on review feedback.
commit aaf026e1e4204d405bcda2ae4f1a01b1d0f7e7cb
Author: Chris Suich <chris.suich@netapp.com>
Date: Thu Oct 17 14:27:12 2013 -0400
Added context to strategy sorting error responses
Added TODOs for DRYing out pickStrategy() overloading
commit a221f4aa3fb2ddc255bc35cf753f98f88f5bf44e
Author: Chris Suich <chris.suich@netapp.com>
Date: Wed Oct 16 09:57:28 2013 -0400
Updated inefficient strategy sorting/selection
Removed unnecessary canRevertSnapshot from PrimaryDataStoreDriver
Other general cleaup and fixes from reviews
commit 7d58949c6a1b7e853e891b59387a9620e8cd7a91
Author: Chris Suich <chris.suich@netapp.com>
Date: Mon Oct 14 14:01:22 2013 -0400
Added volume snapshot revert capability to SnapshotResponse
Updated UI to hide/show snapshot revert action per snapshot
Signed-off-by: Edison Su <sudison@gmail.com>
xs 6.1/6.2 introduce the new virtual platform, so there are two virtual platforms, windows PV driver version must match virtual platforms,
this patch tracks PV driver versions in vm details and template details.
Anthony
Introduction of a new Transaction API that is more consistent with the style
of Spring's transaction managment. The existing Transaction class was renamed
to TransactionLegacy. All of the non-DAO code in the management server has been
updated to use the new Transaction API.
was discussed on the mailing list as a useful debugging tool, currently
the log prints the DB id of the agent, which makes admins have to look
it up to know where the Command was run.
ACS is now comprised of a hierarchy of spring application contexts.
Each plugin can contribute configuration files to add to an existing
module or create it's own module.
Additionally, for the mgmt server, ACS custom AOP is no longer used
and instead we use Spring AOP to manage interceptors.
The managed context framework provides a simple way to add logic
to ACS at the various entry points of the system. As threads are
launched and ran listeners can be registered for onEntry or onLeave
of the managed context. This framework will be used specifically
to handle DB transaction checking and setting up the CallContext.
This framework is need to transition away from ACS custom AOP to
Spring AOP.
Various classes are using member injection to inject extensible objects.
Really those object should come from an AdapterList that is injected in.
This patch switches the code to use setter injection that will later allow
spring to inject an AdapterList or something similar to allow
extensibility.
Initial patch for VXLAN support.
Fully functional, hopefully, for GuestNetwork - AdvancedZone.
Patch Note:
in cloudstack-server
- Add isolation method VXLAN
- Add VxlanGuestNetworkGuru as plugin for VXLAN isolation
- Modify NetworkServiceImpl to handle extended vNet range for VXLAN isolation
- Add VXLAN isolation option in zoneWizard UI
in cloudstack-agent (kvm)
- Add modifyvxlan.sh script that handle bridge/vxlan interface manipulation script
-- Usage is exactly same to modifyvlan.sh
- BridgeVifDriver will call modifyvxlan.sh instead of modifyvlan.sh when VXLAN is used for isolation
Database changes:
- No change in database structure.
- VXLAN isolation uses same tables that VLAN uses to store vNet allocation status.
Known Issue and/or TODO:
- Some resource still says 'VLAN' in log even if VXLAN is used
- in UI, "Network - GuestNetworks" dosen't display VNI
-- VLAN ID field displays "N/A"
- Documentation!
Signed-off-by : Toshiaki Hatano <haeena@haeena.net>
Issue happens as there are more than one thread processing connect for a host simultaneously. The VM full sync. is not designed to work in this scenario and as a result user VMs may get stopped incorrectly.
Direct agent scan task runs at regular intervals (direct.agent.scan.interval defaulted to 90 secs) and identifies hosts that needs to be processed for connect. In a normal scenario hosts mostly get connected within that interval and there are no issues. But if due to some reason the connect process takes more time and is not completed by the time next agent scan runs. In this case, based on the db. state same hosts may get picked up again. And then there will be situations where more than one thread is processing connect for the same host.
The fix is to check if there is a thread already processing connect for a host and in this case all subsequent threads for that host will simply bail out. Also there may be a scenario where one thread already completed processing connect but another thread already got scheduled before that and will again repeat the same. This is also prevented by putting appropriate checks.
Changes:
- Passing the avoid set generated by the first pass of deployment to the second try.
- The second try is done, when the first pass that uses a reserved plan fails to deploy on the reserved host, to search over the entire zone again
Changes:
- Locking the group and save reservation mechanism done by DPM
- Added admin operation to cleanup VM reservations
- DPM will also cleanup VM reservations on startup
This feature allows a user to deploy VMs only in the resources dedicated to his account or domain.
1. Resources(Zones, Pods, Clusters or hosts) can be dedicated to an account or domain.
Implemented 12 new APIs to dedicate/list/release resources:
- dedicateZone, listDedicatedZones, releaseDedicatedZone for a Zone.
- dedicatePod, listDedicatedPods, releaseDedicatedPod for a Pod.
- dedicateCluster, listDedicatedClusters, releaseDedicatedCluster for a Cluster
- dedicateHost, listDedicatedHosts, releaseDedicatedHost for a Host.
2. Once a resource(eg. pod) is dedicated to an account, other resources(eg. clusters/hosts) inside that cannot be further dedicated.
3. Once a resource is dedicated to a domain, other resources inside that can be further dedicated to its sub-domain or account.
4. If any resource (eg.cluster) is dedicated to a account/domain, then resources(eg. Pod) above that cannot be dedicated to different accounts/domain (not belonging to the same domain)
5. To use Explicit dedication, user needs to create an Affinity Group of type 'ExplicitDedication'
6. A VM can be deployed with the above affinity group parameter as an input.
7. A new ExplicitDedicationProcessor has been added which will process the affinity group of type 'Explicit Dedication' for a deployment of a VM that demands dedicated resources.
This processor implements the AffinityGroupProcessor adapter. This processor will update the avoid list.
8. A VM requesting dedication will be deployed on dedicatd resources if available with the user account.
9. A VM requesting dedication can also be deployed on the dedicated resources available with the parent domains iff no dedicated resources are available with the current user's account or
domain.
10. A VM (without dedication) can be deployed on shared host but not on dedicated hosts.
11. To modify the dedication, the resource has to be released first.
12. Existing Private zone functionality has been redirected to Explicit dedication of zones.
13. Updated the db upgrade schema script. A new table "dedicated_resources" has been added.
14. Added the right permissions in commands.properties
15. Unit tests: For the new APIs and Service, added unit tests under : plugins/dedicated-resources/test/org/apache/cloudstack/dedicated/DedicatedApiUnitTest.java
16. Marvin Test: To dedicate host, create affinity group, deploy-vm, check if vm is deployed on the dedicated host.
Changes:
- In VolumeReservationVO, the getter method of a column had a typo, causing us to create a wrong searchbuilder. It was searching over the 'id' column instead of 'vm_reservation_id' causing
- This bug was causing the vm deployment to choose a wrong pool during deployment since the search was choosing incorrectly
- This bug in the GenericSearchBuilder is also fixed - if the getter method does not use the standard 'get' or 'is' prefix, one should annotate that method using
@Column(name = "<column_name>") and indicate which column this method refers to. This will cause the GenericSearchBuilder to identify the field correctly.
Changes:
- There is no good mechanism currently to figure out if the deployment failed due to affinity groups only
- We can just hint the user that the deployment might have failed due to the affinity groups and ask to review the input
Changes:
- Cloud-engine 2 step reserver and deploy flow was not retrying out of clusters, if there are no resources in the volume's cluster.
- Fixed this by letting the reservationm step not error out and continue to let deploy step find out resources outside cluster
The bug was found was Harikrishna P. when iso was used, in case of Isos, the
create vm from scratch which fails due to factory being used to get the object
which is not spring injected
Signed-off-by: Rohit Yadav <bhaisaab@apache.org>
Supporting kickstart in CloudStack baremetal
able to start vm
Conflicts:
client/tomcatconf/componentContext.xml.in
server/src/com/cloud/baremetal/BareMetalTemplateAdapter.java
server/src/com/cloud/baremetal/BareMetalVmManagerImpl.java
server/src/com/cloud/vm/UserVmManagerImpl.java
Corresponding getter/setter is renamed too.
Reason is GenericDao does not update the field unless the method name matches the field name; the setter of this VO was one such case.