This fixes the following:
- Unchecked thread growth in RemoteEndHostEndPoint
- Potential NPE while finding EP for a storage/scope
Unbounded thread growth can be reproduced with following findings:
- Every unreachable template would produce 6 new threads (in a single
ScheduledExecutorService instance) spaced by 10 seconds
- Every reachable template url without the template would produce 1 new
thread (and one ScheduledExecutorService instance), it errors out quickly without
causing more thread growth.
- Every valid url will produce upto 10 threads as the same ep (endpoint
instance) will be reused to query upload/download (async callback)
progresses.
Every RemoteHostEndPoint instances creates its own
ScheduledExecutorService instance which is why in the jstack dump, we
see several threads that share the prefix RemoteHostEndPoint-{1..10}
(given poolsize is defined as 10, it uses suffixes 1-10).
This fixes the discovered thread leakage with following notes:
- Instead of ScheduledExecutorService instance, a cached pool could be
used instead and was implemented, and with `static` scope to be reused
among other future RemoteHostEndPoint instances.
- It was not clear why we would want to wait when we've Answers returned
from the remote EP, and therefore a scheduled/delayed Runnable was
not required at all for processing answers. ScheduledExecutorService
was therefore not really required, moved to ExecutorService instead.
- Another benefit of using a cached pool is that it will shutdown
threads if they are not used in 60 seconds, and they get re-used for
future runnable submissions.
- Caveat: the executor service is still unbounded, however, the use-case
that this method is used for short jobs to check upload/download
progresses fits the case here.
- Refactored CmdRunner to not use/reference objects from parent class.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Snapshots are not deleted resulting unexpected storage consumption in case of VMware.
Steps to reproduce this issue :
In VMware setup, create a snapshot of volume say Snap1.
After successful creation of snapshot Snap1, create new snapshot of same volume say Snap2.snapshots
While Snap2 is in BackingUp state, delete Snap1.
Snap1 will disappear from Web UI, but when we check secondary storage, files associated with Snap1 still persists even after cleanup job is performed.
In snapshot_store_ref table in DB, Snap1 will be in ready state instead of Destroyed.
Also, in snapshots table, status of Snap1 will be Destroyed but removed column will be null and will never change to the date of snapshot removal.
Fix for this issue :
In VMware, snapshot chain is not maintained, instead full snapshot is taken every time.
So, it makes sense not to assign parent snapshot id for the snapshot. In this way, every snapshot will be individual and can be deleted successfully whenever required.
This PR adds an ability to Pass a new parameter, locationType,
to the “createSnapshot” API command. Depending on the locationType,
we decide where the snapshot should go in case of managed storage.
There are two possible values for the locationType param
1) `Standard`: The standard operation for managed storage is to
keep the snapshot on the device. For non-managed storage, this will
be to upload it to secondary storage. This option will be the
default.
2) `Archive`: Applicable only to managed storage. This will
keep the snapshot on the secondary storage. For non-managed
storage, this will result in an error.
The reason for implementing this feature is to avoid a single
point of failure for primary storage. Right now in case of managed
storage, if the primary storage goes down, there is no easy way
to recover data as all snapshots are also stored on the primary.
This features allows us to mitigate that risk.
CLOUDSTACK-9238: Fix URL length to 2048 for all url fields in VOI will update the PR to add max field length in the API commands too
* pr/1567:
API: update url field max length
not needed on host table
Fix URL length to 2048 for all url fields in VO
Signed-off-by: Will Stevens <williamstevens@gmail.com>
Taking fast and efficient volume snapshots with XenServer (and your storage provider)A XenServer storage repository (SR) and virtual disk image (VDI) each have UUIDs that are immutable.
This poses a problem for SAN snapshots, if you intend on mounting the underlying snapshot SR alongside the source SR (duplicate UUIDs).
VMware has a solution for this called re-signaturing (so, in other words, the snapshot UUIDs can be changed).
This PR only deals with the CloudStack side of things, but it works in concert with a new XenServer storage manager created by CloudOps (this storage manager enables re-signaturing of XenServer SR and VDI UUIDs).
I have written Marvin integration tests to go along with this, but cannot yet check those into the CloudStack repo as they rely on SolidFire hardware.
If anyone would like to see these integration tests, please let me know.
JIRA ticket: https://issues.apache.org/jira/browse/CLOUDSTACK-9281
Here's a video I made that shows this feature in action:
https://www.youtube.com/watch?v=YQ3pBeL-WaA&list=PLqOXKM0Bt13DFnQnwUx8ZtJzoyDV0Uuye&index=13
* pr/1403:
Faster logic to see if a cluster supports resigning
Support for backend snapshots with XenServer
Signed-off-by: Will Stevens <williamstevens@gmail.com>
The S3 implementation is far from finished, this commit focusses on the bases.
- Upgrade AWS SDK to latest version.
- Rewrite S3 Template downloader.
- Rewrite S3Utils utility class.
- Improve addImageStoreS3 API command.
- Split various classes for convenience.
- Various minor improvements and code optimalisations.
A side effect of the new AWS SDK is that it, by default, uses the V4 signature. Therefore I added an option to specify the Signer, so it stays compatible with previous versions.
Guys, can you review it? things need to be discussed:
(1) this supports KVM/QCOW2 only. Anyone want to implement for other Hypervisor/format ?
(2) The original data volume (on primary storage) will be removed.
(3) The script uses the default timeout in libvirtComputingResource. Do we need to add one in global configuration (like copy.volume.wait or backup.snapshot.wait, create.volume.from.snapshot.wait)
(4) In scripts/storage/qcow2/managesnapshot.sh, I use "qemu-img convert -f qcow2 -O qcow2" to copy the snapshot from secondary to primary (hence there is no base image file), instead of "cp -f", this is because convert is faster than cp in my testing.
* pr/732:
CLOUDSTACK-5863: revert volume snapshot for KVM/QCOW2
Signed-off-by: Wei Zhou <w.zhou@tech.leaseweb.com>
This reverts commit cd7218e241, reversing
changes made to f5a7395cc2.
Reason for Revert:
noredist build failed with the below error:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:compile (default-compile) on project cloud-plugin-hypervisor-vmware: Compilation failure
[ERROR] /home/jenkins/acs/workspace/build-master-noredist/plugins/hypervisors/vmware/src/com/cloud/hypervisor/guru/VMwareGuru.java:[484,12] error: non-static variable logger cannot be referenced from a static context
[ERROR] -> [Help 1]
even the normal build is broken as reported by @koushik-das on dev list
http://markmail.org/message/nngimssuzkj5gpbz
This patch makes it possible to expose VM UUID to subsystems, this can be
useful for implementing VM Snapshots for KVM in future.
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
This reverts commit 180afe52e5.
This broke the build on master:
master build broken with the below error http://jenkins.buildacloud.org/job/build-master-slowbuild/2101/consoleText
```
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /home/jenkins/acs/workspace/build-master-slowbuild/plugins/hypervisors/xenserver/test/com/cloud/hypervisor/xenserver/resource/wrapper/xenbase/CitrixRequestWrapperTest.java:[1581,51] error: constructor CreateVMSnapshotCommand in class CreateVMSnapshotCommand cannot be applied to given types;
[ERROR] required: String,String,VMSnapshotTO,List<VolumeObjectTO>,String
found: String,VMSnapshotTO,List<VolumeObjectTO>,String
reason: actual and formal argument lists differ in length
/home/jenkins/acs/workspace/build-master-slowbuild/plugins/hypervisors/xenserver/test/com/cloud/hypervisor/xenserver/resource/wrapper/xenbase/CitrixRequestWrapperTest.java:[1623,53] error: no suitable constructor found for RevertToVMSnapshotCommand(String,VMSnapshotTO,List<VolumeObjectTO>,String)
[INFO] 2 errors
[INFO] -------------------------------------------------------------
```
This was PR #717 towards 4.5
This patch makes it possible to expose VM UUID to subsystems, this can be
useful for implementing VM Snapshots for KVM in future.
This was PR #717 towards 4.5
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
(cherry picked from commit 0062ff2672)
Signed-off-by: Remi Bergsma <github@remi.nl>
... of new volumes. Following changes are implemented 1. Disable or enable a pool with the
updateStoragePool api. A new 'enabled' parameter added for the same. 2. When a
pool is disabled the state of the pool is updated to 'Disabled' in the db. On
enabling it is updated back to 'Up'. Alert is raised when a pool is disabled or
enabled. 3. Updated other storage providers to also honour the disabled state.
4. A disabled pool is skipped by allocators for provisioing of new volumes. 5.
Since the allocators skip a disabled pool for provisioning of volumes, the
volumes are also not listed as a destination for volume migration.
FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Disabling+Storage+Pool+for+Provisioning
This closes#257
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
inconsistent state (NotUploaded or UploadInProgress) and doesn't transition to any terminal
state (Uploaded, Uploaded). As a result these volumes/templates cannot be removed. Fix is to handle such
volume/template entries correctly.
Fixed the following:
- Destroying volume in 'UploadAbandoned' state resulted in NPE
- Existing upload volume functionality interfered with this, added proper checks to prevent that
When we download volume then we create entry in volume_store_ref table.
We mark the volume entry to ready state once download_url gets generated.
When we migrate that volume, then again one more entry is created with same volume id.
Its state is marked as allocated. Later we try to list only one dataobject in datastore
for state transition during volume migration. If the listed volume's state is allocated
then migration passes otherwise it fails.
Below fix will remove the randomness and give priority to volume entry which is made for
migration (download_url/extracturl will be null in case of migration). Giving priority to
download volume case is not needed as there will be only one entry in that case so no randomness.
While taking a snapshot of a volume, CS chooses the endpoint to perform backup snapshot operation by selecting any host that has the storage containing the volume mounted on it.
Instead, if the volume is attached to a VM, the endpoint chosen by CS should be the host that contains the VM.
While expunging a volume, CS chooses the endpoint to perform delete operation by selecting any host that has the storage containing the volume mounted on it.
Instead, if the volume to be deleted is attached to a VM, the endpoint chosen by CCP should be the host that contains the VM.
to support IOPS capacity control in a cluster wide storage pool and a
local storage pool
to enable hypervisor type check, storage type check for root volume and
avoid list check
Since original commit(31de58edab) contained
a bug, it was reverted and this commit is a revised one.
to support IOPS capacity control in a cluster wide storage pool and a
local storage pool
to enable hypervisor type check, storage type check for root volume and
avoid list check
1. Adding the missing Template/Volume URLs expiration functionality
2. Improvement - While deleting the volume during expiration use rm -rf as vmware now contains directoy
3. Improvement - Use standard Answer so that the error gets logged in case deletion of expiration link didnt work fine.
4. Improvement - In case of domain change, expire the old urls
with hostid included was passed to the local storage pool allocator, it returned all the local
storage pools in the cluster, instead of just the local pool on the given host in the plan.
This was happening the search at a host level was happening only for data disk. Fixed this.
Additionally, the query to list the storage pools on a host was failing if the pool did have
tags. Fixed the query too.
CLOUDSTACK-6802: Fix for not being able to attach data disk on local. This issue gets fixed
with the above issue too. The query to list pools on a host was failing if there were no
tags on the storage pool.
template is downloading, template_store_ref has leftover not in ready
state, when create vm from that template, the code doesn't check either
zone id, nor template_store_ref state.
Conflicts:
engine/orchestration/src/org/apache/cloudstack/engine/orchestration/VolumeOrchestrator.java
storage pool (SMB) and attached to a running vm can be live migrated to another shared storage
pool. Also a vm and its volumes can be live migrated to another host and storage pool respectively.
2) Corrected some logging in MidoNetPublicNetworkGuru - removed .toString method call on the objects in the log body as toString is called on the object by default when use log4j
encoded. This cause createStoragePool or addImageStore command to fail if special
characters were present. Updated the code to pass user, password and domain as part
of details while adding primary or secondary. Also made changes on server side to
handle it.
Create two storage pools, one with storage tag X, one with storage tag Y.
Create a service offering with storage tag X.
Create a disk offering with storage tag Y.
Attempt to deploy a virtual machine with a datadisk, using given offerings, it fails.
Deployment planner keeps a global object 'avoid'. It loops through each volume to
be created, asking storage allocators for matching pools, passing this avoid object.
First disk matches a pool or pools, adds ALL other pools to avoid object, then
deployment planner attaches matching pools to a list for that disk.
Second disk matches a pool, adds all other pools to avoid object, then deployment
planner says "wait, matching pool is in avoid, can't use it". Oops. In fact, at this
point ALL pools are in avoid (unless there are other pools that have both tags).
Need to remove matching pool from the avoid set during each select phase.