Update volume chain_info to NULL during cold migration.
Otherwise during VM start, CCP will configure and try to power-on the VM with wrong disk information.
(cherry picked from commit 7b32b8a268)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
pool, the source and destination pools cannot be local and cluster/zone and vice versa.
CloudStack detects this and throws an exception. However, the end user only sees an
unexpected exception and not the reason for the failure. Fixed it by making sure the
reason for the failure is correctly captured and shown to the end user.
(cherry picked from commit cffae8eef0)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Conflicts:
server/src/com/cloud/storage/VolumeApiServiceImpl.java
During VM creation, if vm.instancename.flag is set to true and hypervisor type is VMware, check if VM with the same hostname already exists in the zone.
(cherry picked from commit 5f9e4fddf3)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
When migration fails, throw the exception instead of returning NULL.
(cherry picked from commit a5a65c7b55)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
If VM has been cold migrated across different VMware DCs, then unregister the VM from source host.
(cherry picked from commit 15b348632d)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Before registering a VM, check if a different CS VM with the same name exists in vCenter.
(cherry picked from commit 33179cce56)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Fixed the following:
- Destroying volume in 'UploadAbandoned' state resulted in NPE
- Existing upload volume functionality interfered with this, added proper checks to prevent that
Call removeRawUsageRecords with an interval (> 0) to clean up the cloud_usage
table by removing records older than that number of days before the current date.
If it runs when the usage job execution time is near, it fails and alerts the
user to try again after a 15-minute window.
There is an issue with the async job scheduler: if this API were async, it would try
to search for and remove the job from the cloud_usage.async_job table and fail, which is
why this API is sync and extends BaseCmd.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
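A minimal Python sketch of the interval-based cleanup described above, for illustration only. It is not the CloudStack implementation; the function name records_to_remove and the strict "older than" comparison are assumptions.

```python
from datetime import date, timedelta

def records_to_remove(record_dates, interval_days, today):
    """Return the record dates older than `interval_days` before `today`.

    Hypothetical helper: assumes "older than" means strictly before the cutoff.
    """
    if interval_days <= 0:
        raise ValueError("interval must be greater than 0")
    cutoff = today - timedelta(days=interval_days)
    return [d for d in record_dates if d < cutoff]

# Example: with a 10-day interval on 2015-01-31 the cutoff is 2015-01-21.
today = date(2015, 1, 31)
dates = [date(2015, 1, 1), date(2015, 1, 20), date(2015, 1, 30)]
stale = records_to_remove(dates, 10, today)
```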
Separate global config to enable/disable Storage Migration during normal deployment
Introduced a configuration parameter named enable.storage.migration
(cherry picked from commit c55bc0b2d1)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
While taking a snapshot of a volume, CS chooses the endpoint to perform backup snapshot operation by selecting any host that has the storage containing the volume mounted on it.
Instead, if the volume is attached to a VM, the endpoint chosen by CS should be the host that contains the VM.
(cherry picked from commit a75a431373)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
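The endpoint choice described above can be sketched in Python as follows. This is only an illustration of the preference order; select_endpoint and its arguments are hypothetical names, and picking the first host stands in for "any host".

```python
def select_endpoint(hosts_with_storage, vm_host=None):
    """Pick the host that should run the backup snapshot operation.

    hosts_with_storage: hosts that have the volume's storage mounted.
    vm_host: host running the VM the volume is attached to, or None.
    """
    if vm_host is not None:
        # Volume is attached to a VM: use the host that contains the VM.
        return vm_host
    if not hosts_with_storage:
        raise LookupError("no host has the storage mounted")
    # Volume is detached: any host with the storage mounted will do.
    return hosts_with_storage[0]
```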
When we download a volume, an entry is created in the volume_store_ref table.
The volume entry is marked as ready once the download_url is generated.
When we then migrate that volume, one more entry is created with the same volume id,
and its state is marked as allocated. Later, during volume migration, we list only one
data object in the data store for the state transition. If the listed entry's state is
allocated, migration passes; otherwise it fails.
This fix removes the randomness by giving priority to the volume entry made for
migration (download_url/extracturl is null in the migration case). Giving priority in the
download volume case is not needed, as there is only one entry in that case and so no randomness.
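A small Python sketch of the prioritisation described above, assuming each volume_store_ref row is represented as a dict with a download_url key. The helper name pick_store_entry is hypothetical, and the real fix lives in the Java data store code.

```python
def pick_store_entry(entries):
    """Choose one store entry for a volume, preferring the migration entry.

    An entry created for migration has no download URL, so it is preferred
    over an entry left behind by a volume download.
    """
    if not entries:
        return None
    migration_entries = [e for e in entries if e.get("download_url") is None]
    return migration_entries[0] if migration_entries else entries[0]

rows = [
    {"volume_id": 7, "state": "Ready", "download_url": "http://example.invalid/v7"},
    {"volume_id": 7, "state": "Allocated", "download_url": None},
]
chosen = pick_store_entry(rows)
```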
During vmsync if StopCommand (issued as part of PowerOff/PowerMissing report) fails to stop VM (since VM is running on HV),
don't transition VM state to "Stopped" in CS db. Also added a check to throw ConcurrentOperationException if vm state is not
"Running" after start operation.
on calling GetUploadParamsForVolume, persisting the metadata to db
validating the account limits and incrementing the appropriate limits
encoded the metadata on management server using preshared key
Changes:
- This is a race condition between the deleteDomain thread and AccountChecker thread. DeleteDomain thread marks the domain as inactive and proceeds for cleanup, AccountChecker thread that runs at the same time cleans up any domains marked as inactive.
- When the DeleteDomain thread finds that domain is already removed, it need not error out since the domain deletion has already happened
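The idea of tolerating an already-removed domain can be sketched in Python as below. This is a simplified single-threaded illustration under the assumption that domains live in a dict; delete_domain is a hypothetical name, not the CloudStack method.

```python
def delete_domain(domains, domain_id):
    """Delete a domain, treating "already removed" as success.

    domains: dict mapping domain id to a record with a "state" field.
    """
    domain = domains.get(domain_id)
    if domain is None:
        # Another cleanup pass already removed it; nothing left to do.
        return True
    domain["state"] = "Inactive"
    # pop() with a default does not fail if the entry vanished meanwhile.
    domains.pop(domain_id, None)
    return True

domains = {1: {"state": "Active"}}
first = delete_domain(domains, 1)
second = delete_domain(domains, 1)  # domain is gone, still succeeds
```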
Changes:
- When there is HA we try to redeploy the affected VM using the regular planners, and if that fails we retry using the special planner for HA (which skips checking the disable threshold).
Because of the job framework, the InsufficientCapacityException gets masked and the special planner is not called. The job framework needs to be fixed to rethrow the correct exception.
- Also, the VM Work Job framework was not setting the DeploymentPlanner on the VmWorkJob, so the HA planner passed by HAMgr was not getting used.
- Now the job framework sets the planner passed in by any caller of the VM Start operation on the job.
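The retry flow above depends on the capacity exception reaching the caller unmasked. A minimal Python sketch of that flow follows; InsufficientCapacityError, start_with_ha_fallback and fake_start are hypothetical stand-ins for the Java classes and calls.

```python
class InsufficientCapacityError(Exception):
    """Stand-in for the capacity exception named in the text."""

def start_with_ha_fallback(vm, start, regular_planner, ha_planner):
    """Try the regular planner first, then retry with the HA planner."""
    try:
        return start(vm, regular_planner)
    except InsufficientCapacityError:
        # Only reachable if the job layer rethrows the original exception.
        return start(vm, ha_planner)

def fake_start(vm, planner):
    """Toy start operation: the regular planner never finds capacity."""
    if planner == "regular":
        raise InsufficientCapacityError(vm)
    return (vm, planner)

result = start_with_ha_fallback("vm-1", fake_start, "regular", "ha")
```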
Upgrade fails if the value is set using plain-text encoding; the value needs to
be encrypted (if a key was provided during db setup).
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 6321a29e43)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
By default, iptables rules are updated to ACCEPT egress traffic.
If the network egress default policy is false, CS removes the ACCEPT rule and adds the DROP rule, which
is the egress default rule when there are no other egress rules.
If the CS network egress default policy is true, CS won't configure any default rule for egress because
the router already came up accepting egress traffic. If there are already egress rules for the network, then the
egress rules get applied on the VR.
For an isolated network without the firewall service, the VR allows egress traffic by default (guest network --> public network)
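A simplified Python sketch of the resulting egress behaviour, for illustration only. The function egress_chain and the ordering (explicit rules first, default last) are assumptions, not the VR's actual iptables layout.

```python
def egress_chain(default_allow, egress_rules=()):
    """Return the effective egress rules, ending with the default action.

    default_allow: the network's egress default policy (True = allow).
    egress_rules: explicit egress rules configured on the network.
    """
    chain = list(egress_rules)
    # Policy true keeps the router's initial ACCEPT; policy false swaps in DROP.
    chain.append("ACCEPT" if default_allow else "DROP")
    return chain
```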
Changes:
- Upgrades maven-war plugin to 4.5 (faster war packaging)
- Upgrade spring framework to latest minor release
- Upgrade ehcache, jasypt, httpclient, httpcore and other core dependencies
- Upgrade to latest ipv6 library, fix unit test NetUtilsTest
- httpcore and httpclient are sharing same version variable
- commons-httpclient is different from httpclient; the fix gives it a separate var
- Apidocs failed to generate and got stuck with the new reflections version; for now
we will continue using 0.9.8
Newer dependencies can be listed using:
mvn versions:display-dependency-updates -Dnoredist -Dsimulator -P developer,systemvm
Testing:
- Tested using Maven 3.2.1
- Local noredist build with unit tests succeeds
- CloudStack mgmt server started, basic business layer tests work
- Observed 10-15% build time improvement using new maven-war plugin
Branch: bugfix/4.5-8011 (commits are squashed in favour of a linear history)
Pull request:
https://github.com/apache/cloudstack/pull/50
This closes #50
TravisCI build summary:
https://travis-ci.org/shapeblue/cloudstack/builds/42902172
- Build passes with unit tests
- Apidocs generates successfully
- Most integration tests pass, some fail due to timeout errors, second re-run
passes some of them
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit fac7bfc5d5)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Conflicts:
pom.xml
remove 441to450 ddl
(cherry picked from commit 5578616143)
(cherry picked from commit f18d6238b0)
Conflicts:
engine/schema/src/com/cloud/upgrade/DatabaseUpgradeChecker.java
schema: Add upgrade paths from 4.3.2 to 4.4.0
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 73c62837b5)
Conflicts:
engine/schema/src/com/cloud/upgrade/DatabaseUpgradeChecker.java
engine/schema/src/com/cloud/upgrade/dao/Upgrade441to450.java
setup/db/db/schema-441to450.sql
merged new work from schema-441to450.sql into schema-442to450.sql
root cause:
when vmsync reports system VM is down, CCP doesn't release the VM resource before starting it.
fix:
make sure cleanup is called for a VM when it is reported as Stopped
Revert "CLOUDSTACK-7073: Added domainId field to the user table in order to restrict duplicated users creation on the db level"
This reverts commit 5a96d8ef5c.
Conflicts:
setup/db/db/schema-440to450.sql
When a template is copied from secondary to primary storage, we were trying to release a lock
twice: once in the create/copy base image function and once in the create/copy base image
complete callback routine. This caused the exception reported in the bug. Fixed by
updating the code to make sure we release the lock in the copy base image function only, as
this is the place where we acquired the lock.
Unnecessary exception in MS logs while removing the default NIC from a VM. The following changes are made:
1. Changed the exception from CloudRuntimeException to InvalidParameterValueException.
2. Moved the validation logic out of VirtualMachineManagerImpl and into UserVMManagerImpl.
3. Handling InvalidParameterValueException from async API calls so that they are not logged as ERROR in MS logs.
GPU enabled hosts from non-GPU VM deployment.
Cluster reordering is based on the number of unique host tags in a cluster;
the cluster with the most unique host tags is put at the end of the list.
Hosts with GPU capability get tagged with implicit tags defined by the
global config param 'implicit.host.tags' at the time of host discovery.
Also added FirstFitPlannerTest unit test file.
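The reordering rule above can be sketched in Python as follows. This is an illustration of the ordering only, not the FirstFitPlanner code; reorder_clusters and the input shape are assumptions.

```python
def reorder_clusters(cluster_host_tags):
    """Order cluster ids by their number of unique host tags, ascending.

    cluster_host_tags: dict mapping a cluster id to a list of per-host tag
    lists. Clusters with the most unique tags end up last.
    """
    def unique_tag_count(cluster_id):
        hosts = cluster_host_tags[cluster_id]
        return len({tag for host_tags in hosts for tag in host_tags})

    # sorted() is stable, so clusters with equal counts keep their input order.
    return sorted(cluster_host_tags, key=unique_tag_count)

clusters = {
    "c1": [["gpu", "ssd"], ["gpu"]],  # 2 unique tags
    "c2": [[]],                       # 0 unique tags
    "c3": [["ssd"]],                  # 1 unique tag
}
ordered = reorder_clusters(clusters)
```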
Also, when the ssvm is destroyed, all the download urls are expired so that they are cleaned up in the next run by the new ssvm.
(cherry picked from commit ce90837357)
GPU enabled hosts from non-GPU VM deployment.
Cluster reordering is based on the number of unique host tags in a cluster;
the cluster with the most unique host tags is put at the end of the list.
Hosts with GPU capability get tagged with implicit tags defined by the
global config param 'implicit.host.tags' at the time of host discovery.
Also added FirstFitPlannerTest unit test file.
(cherry picked from commit 39fe766c2b)
While expunging a volume, CS chooses the endpoint to perform delete operation by selecting any host that has the storage containing the volume mounted on it.
Instead, if the volume to be deleted is attached to a VM, the endpoint chosen by CCP should be the host that contains the VM.
(cherry picked from commit f1e3e83bbf)
(outside cloudstack), the state of the vm is not updated in cloudstack db. The
ping task was not checking for resource (host) status by default. The power
state of the vms is returned as part of the resource status. Fixed the issue by
making sure the ping task tries at least once to get the resource status.
(cherry picked from commit 55b4ead495)
This adds an upgrade path from 4.3.1 to 4.4.0, the implementation of which
simply extends the Upgrade430to440 as there was no schema change between 4.3.0
and 4.3.1
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 208399354f)
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
In the VM secondary IPs case, static NAT is configured to the VM's primary/secondary IPs:
IP1-->vm1Ip1, IP2-->vm1Ip2
While destroying the VM, delete all static NATs associated with the VM
1. While destroying a ROOT volume do the lookup of the associated VM under the DC and not just cluster.
2. In case of VMware, during VM start if a volume is being recreated no need to detach the old volume because
we now expunge it immediately and don't wait for the storage cleanup task to run.
- Check to see if network is implemented changed from 'state == Implementing||Implemented' to 'state == Implemented'.
The earlier check was a hack to prevent the issue described below.
- At the time of implementing network (using implementNetwork() method), if the VR needs to be deployed then it follows
the same path of regular VM deployment. This leads to a nested call to implementNetwork() while preparing VR nics. This
flow creates issues in dealing with network state transitions. The original call puts network in "Implementing" state
and then the nested call again tries to put it into same state resulting in issues. In order to avoid it, implementNetwork()
call for VR is replaced with below code.
cleanup the rules then destroy
fix adds a provision to specify if cleanup is needed on the network on
shutdown. VR is marked as not requiring network rules cleanup on
network shutdown, as the VR is destroyed and recreated.
ran the simulator tests that test network life cycle
requires storage migration resulting in failure of VM migration. This also improves
the hostsformigration API. Previously we listed all hosts, then
found suitable storage pools for all volumes, and then checked whether
VM migration requires storage migration to that host. Now the process is updated.
We check only those volumes which are not in a zone-wide primary store.
We verify by comparing volumes->poolid->clusterid to the host clusterid. If the volume
uses local storage or the cluster ids are different, we then verify whether the host has suitable
storage pools for the volume of the VM to be migrated to.
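The updated check can be sketched in Python as below, under stated assumptions: pools are dicts with a scope of "ZONE", "CLUSTER" or "HOST" (local) and a cluster_id, and the helper name volumes_needing_storage_check is hypothetical.

```python
def volumes_needing_storage_check(volumes, pools, host_cluster_id):
    """Return names of volumes that need a suitable pool on the target host.

    volumes: list of dicts with "name" and "pool_id".
    pools: dict mapping pool id to {"scope": ..., "cluster_id": ...}.
    """
    needs_check = []
    for volume in volumes:
        pool = pools[volume["pool_id"]]
        if pool["scope"] == "ZONE":
            # Zone-wide primary storage is reachable from every host.
            continue
        if pool["scope"] == "HOST" or pool["cluster_id"] != host_cluster_id:
            needs_check.append(volume["name"])
    return needs_check

pools = {
    1: {"scope": "ZONE", "cluster_id": None},
    2: {"scope": "CLUSTER", "cluster_id": 10},
    3: {"scope": "HOST", "cluster_id": 10},
}
volumes = [
    {"name": "root", "pool_id": 2},
    {"name": "data1", "pool_id": 1},
    {"name": "data2", "pool_id": 3},
]
same_cluster = volumes_needing_storage_check(volumes, pools, 10)
other_cluster = volumes_needing_storage_check(volumes, pools, 20)
```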
to support IOPS capacity control in a cluster wide storage pool and a
local storage pool
to enable hypervisor type check, storage type check for root volume and
avoid list check
Since the original commit (31de58edab) contained
a bug, it was reverted; this commit is a revised one.