Commit e3cf7fd

api,agent,server,engine-schema: scalability improvements
The following changes and improvements have been added:

- Improvements in handling of PingRoutingCommand
  1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
  2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
  3. Optimized scanning of stalled VMs
- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`
- Added caching framework based on the Caffeine in-memory caching library, https://github.com/ben-manes/caffeine
- Added caching for account/user role API access; expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.
- Added caching for account/user role API access with expiration after write set to 60 seconds.
- Added caching for some recurring DB retrievals
  1. CapacityManager - listing service offerings - beneficial in host capacity calculation
  2. LibvirtServerDiscoverer - existing host for the cluster - beneficial for host joins
  3. DownloadListener - hypervisors for zone - beneficial for host joins
  4. VirtualMachineManagerImpl - VMs in progress - beneficial for processing stalled VMs during PingRoutingCommands
- Optimized MS list retrieval for agent connect
- Optimized finding the ready systemvm template for a zone
- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used, mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to hosts concerning background tasks
- Changes in agent-agentmanager connection with NIO client-server classes
  1. Optimized the use of the executor service
  2. Refactored Agent class to better handle connections.
  3. Do SSL handshakes within worker threads
  4. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control the number of new connections the management server handles at a time. `agent.ssl.handshake.timeout` can be used to set the number of seconds after which an SSL handshake times out at the MS end.
  5. On the agent side, backoff and SSL handshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.
- Improvements in StatsCollection - minimize DB retrievals.
- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.
- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. The worker thread approach is currently followed only for NFS and ScaleIO pools.
- Minor improvements in resource limit calculations wrt DB retrievals

Signed-off-by: Abhishek Kumar <[email protected]>
Co-authored-by: Abhishek Kumar <[email protected]>
Co-authored-by: Rohit Yadav <[email protected]>
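The min/max handshake worker settings described in the commit message can be pictured as a bounded thread pool. The following is a minimal sketch under that assumption, not the commit's NIO server code: the class `HandshakePoolSketch`, the method `newHandshakePool`, the 60-second keep-alive, and the rejection policy are all hypothetical choices made for illustration.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the CloudStack NIO server implementation.
class HandshakePoolSketch {

    // Builds a pool that keeps minWorkers threads alive and grows up to maxWorkers.
    // A SynchronousQueue hands each task directly to a thread, so the pool actually
    // grows towards maxWorkers under load instead of queueing behind the core threads.
    static ThreadPoolExecutor newHandshakePool(int minWorkers, int maxWorkers) {
        if (minWorkers < 0 || maxWorkers < 1 || minWorkers > maxWorkers) {
            throw new IllegalArgumentException("expected 0 <= min <= max and max >= 1");
        }
        return new ThreadPoolExecutor(
                minWorkers, maxWorkers,
                60L, TimeUnit.SECONDS,          // assumed idle keep-alive for extra workers
                new SynchronousQueue<>(),
                // When all workers are busy, the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newHandshakePool(2, 4);
        for (int i = 0; i < 8; i++) {
            final int id = i;
            pool.execute(() -> System.out.println(
                    "handshake " + id + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With this shape, raising the maximum lets more handshakes proceed in parallel, while the caller-runs policy applies back-pressure to whichever thread accepts connections once every worker is busy. Whether the real implementation rejects, queues, or blocks in that situation is not stated in the commit message.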
1 parent 019f2c6 commit e3cf7fd

128 files changed: +3072 -2041 lines changed

.python-version

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.6
+3.10

agent/conf/agent.properties

Lines changed: 6 additions & 0 deletions
@@ -433,3 +433,9 @@ iscsi.session.cleanup.enabled=false
 
 # Implicit host tags managed by agent.properties
 # host.tags=
+
+# Timeout(in seconds) for SSL handshake when agent connects to server
+#ssl.handshake.timeout=
+
+# Wait(in seconds) during agent reconnections
+#backoff.seconds=

agent/src/main/java/com/cloud/agent/Agent.java

Lines changed: 401 additions & 371 deletions
Large diffs are not rendered by default.

agent/src/main/java/com/cloud/agent/AgentShell.java

Lines changed: 33 additions & 24 deletions
@@ -16,29 +16,6 @@
 // under the License.
 package com.cloud.agent;
 
-import com.cloud.agent.Agent.ExitStatus;
-import com.cloud.agent.dao.StorageComponent;
-import com.cloud.agent.dao.impl.PropertiesStorage;
-import com.cloud.agent.properties.AgentProperties;
-import com.cloud.agent.properties.AgentPropertiesFileHandler;
-import com.cloud.resource.ServerResource;
-import com.cloud.utils.LogUtils;
-import com.cloud.utils.ProcessUtil;
-import com.cloud.utils.PropertiesUtil;
-import com.cloud.utils.backoff.BackoffAlgorithm;
-import com.cloud.utils.backoff.impl.ConstantTimeBackoff;
-import com.cloud.utils.exception.CloudRuntimeException;
-import org.apache.commons.daemon.Daemon;
-import org.apache.commons.daemon.DaemonContext;
-import org.apache.commons.daemon.DaemonInitException;
-import org.apache.commons.lang.math.NumberUtils;
-import org.apache.commons.lang3.BooleanUtils;
-import org.apache.commons.lang3.StringUtils;
-import org.apache.logging.log4j.Logger;
-import org.apache.logging.log4j.LogManager;
-import org.apache.logging.log4j.core.config.Configurator;
-
-import javax.naming.ConfigurationException;
 import java.io.File;
 import java.io.FileNotFoundException;
 import java.io.IOException;
@@ -53,6 +30,31 @@
 import java.util.Properties;
 import java.util.UUID;
 
+import javax.naming.ConfigurationException;
+
+import org.apache.commons.daemon.Daemon;
+import org.apache.commons.daemon.DaemonContext;
+import org.apache.commons.daemon.DaemonInitException;
+import org.apache.commons.lang.math.NumberUtils;
+import org.apache.commons.lang3.BooleanUtils;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+import org.apache.logging.log4j.core.config.Configurator;
+
+import com.cloud.agent.Agent.ExitStatus;
+import com.cloud.agent.dao.StorageComponent;
+import com.cloud.agent.dao.impl.PropertiesStorage;
+import com.cloud.agent.properties.AgentProperties;
+import com.cloud.agent.properties.AgentPropertiesFileHandler;
+import com.cloud.resource.ServerResource;
+import com.cloud.utils.LogUtils;
+import com.cloud.utils.ProcessUtil;
+import com.cloud.utils.PropertiesUtil;
+import com.cloud.utils.backoff.BackoffAlgorithm;
+import com.cloud.utils.backoff.impl.ConstantTimeBackoff;
+import com.cloud.utils.exception.CloudRuntimeException;
+
 public class AgentShell implements IAgentShell, Daemon {
     protected static Logger LOGGER = LogManager.getLogger(AgentShell.class);
 
@@ -406,7 +408,9 @@ public void init(String[] args) throws ConfigurationException {
 
         LOGGER.info("Defaulting to the constant time backoff algorithm");
         _backoff = new ConstantTimeBackoff();
-        _backoff.configure("ConstantTimeBackoff", new HashMap<String, Object>());
+        Map<String, Object> map = new HashMap<>();
+        map.put("seconds", _properties.getProperty("backoff.seconds"));
+        _backoff.configure("ConstantTimeBackoff", map);
     }
 
     private void launchAgent() throws ConfigurationException {
@@ -455,6 +459,11 @@ public void launchNewAgent(ServerResource resource) throws ConfigurationExceptio
         agent.start();
     }
 
+    @Override
+    public Integer getSslHandshakeTimeout() {
+        return AgentPropertiesFileHandler.getPropertyValue(AgentProperties.SSL_HANDSHAKE_TIMEOUT);
+    }
+
     public synchronized int getNextAgentId() {
         return _nextAgentId++;
     }
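The `init` hunk above passes the raw `backoff.seconds` property to the backoff algorithm, and that value is `null` when the property is left commented out in `agent.properties`. A minimal sketch of how such a parameter map could be built and consumed follows. `BackoffConfigSketch`, `buildBackoffParams`, and `resolveBackoffMillis` are hypothetical names, and the fallback behaviour is an assumption; how `ConstantTimeBackoff` actually treats a missing value is not shown in this diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration; ConstantTimeBackoff itself is not reproduced here.
class BackoffConfigSketch {

    // Mirrors the diff: the raw property value (possibly null) is stored under "seconds".
    static Map<String, Object> buildBackoffParams(Properties agentProperties) {
        Map<String, Object> params = new HashMap<>();
        params.put("seconds", agentProperties.getProperty("backoff.seconds"));
        return params;
    }

    // Assumed consumer behaviour: use defaultSeconds when the value is missing or is
    // not a positive integer, then convert the result to milliseconds.
    static long resolveBackoffMillis(Map<String, Object> params, long defaultSeconds) {
        Object raw = params.get("seconds");
        long seconds = defaultSeconds;
        if (raw instanceof String) {
            try {
                long parsed = Long.parseLong(((String) raw).trim());
                if (parsed > 0) {
                    seconds = parsed;
                }
            } catch (NumberFormatException ignored) {
                // Unparseable value: keep the default.
            }
        }
        return seconds * 1000L;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolveBackoffMillis(buildBackoffParams(props), 5));
        props.setProperty("backoff.seconds", "12");
        System.out.println(resolveBackoffMillis(buildBackoffParams(props), 5));
    }
}
```

The default of 5 seconds used in `main` is a placeholder for the example, not a value taken from the commit.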

agent/src/main/java/com/cloud/agent/IAgentShell.java

Lines changed: 2 additions & 0 deletions
@@ -70,4 +70,6 @@ public interface IAgentShell {
     String getConnectedHost();
 
     void launchNewAgent(ServerResource resource) throws ConfigurationException;
+
+    Integer getSslHandshakeTimeout();
 }

agent/src/main/java/com/cloud/agent/properties/AgentProperties.java

Lines changed: 7 additions & 0 deletions
@@ -810,6 +810,13 @@ public Property<Integer> getWorkers() {
      */
     public static final Property<String> HOST_TAGS = new Property<>("host.tags", null, String.class);
 
+    /**
+     * Timeout for SSL handshake in seconds
+     * Data type: Integer.<br>
+     * Default value: <code>null</code>
+     */
+    public static final Property<Integer> SSL_HANDSHAKE_TIMEOUT = new Property<>("ssl.handshake.timeout", null, Integer.class);
+
     public static class Property <T>{
         private String name;
         private T defaultValue;

api/src/main/java/org/apache/cloudstack/acl/RoleService.java

Lines changed: 5 additions & 0 deletions
@@ -30,6 +30,11 @@ public interface RoleService {
     ConfigKey<Boolean> EnableDynamicApiChecker = new ConfigKey<>("Advanced", Boolean.class, "dynamic.apichecker.enabled", "false",
             "If set to true, this enables the dynamic role-based api access checker and disables the default static role-based api access checker.", true);
 
+    ConfigKey<Integer> DynamicApiCheckerCachePeriod = new ConfigKey<>("Advanced", Integer.class,
+            "dynamic.apichecker.cache.period", "0",
+            "Defines the expiration time in seconds for the Dynamic API Checker cache, determining how long cached data is retained before being refreshed. If set to zero then caching will be disabled",
+            false);
+
     boolean isEnabled();
 
     /**
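The description of `dynamic.apichecker.cache.period` (expire after write, zero disables caching) can be illustrated with a small hand-rolled cache. This sketch deliberately does not use Caffeine, which the commit relies on; `ExpiringCacheSketch`, its constructor, and the injectable clock are hypothetical and exist only to make the expiry rule visible and testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hand-rolled illustration of expire-after-write; the commit itself uses Caffeine.
class ExpiringCacheSketch<K, V> {

    private static final class Entry<T> {
        final T value;
        final long writtenAtMillis;

        Entry(T value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final long periodMillis;
    private final LongSupplier clockMillis;
    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();

    // periodSeconds plays the role of the cache period; 0 (or less) disables caching.
    ExpiringCacheSketch(int periodSeconds, LongSupplier clockMillis) {
        this.periodMillis = periodSeconds * 1000L;
        this.clockMillis = clockMillis;
    }

    // Returns the cached value if it was written less than periodMillis ago,
    // otherwise reloads it and records the new write time.
    V get(K key, Function<K, V> loader) {
        if (periodMillis <= 0) {
            return loader.apply(key); // caching disabled: always hit the loader
        }
        long now = clockMillis.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.writtenAtMillis >= periodMillis) {
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }

    public static void main(String[] args) {
        ExpiringCacheSketch<String, Boolean> cache =
                new ExpiringCacheSketch<>(60, System::currentTimeMillis);
        // "role-1:listHosts" is a made-up key standing in for a role/API pair.
        System.out.println(cache.get("role-1:listHosts", k -> Boolean.TRUE));
    }
}
```

A period of zero keeps the pre-change behaviour (every check goes to the loader), which matches the config key's default of "0" in the hunk above. The trade-off for a non-zero period is that permission changes may take up to that many seconds to become visible.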

api/src/main/java/org/apache/cloudstack/api/command/admin/domain/ListDomainsCmd.java

Lines changed: 5 additions & 2 deletions
@@ -100,7 +100,7 @@ public EnumSet<DomainDetails> getDetails() throws InvalidParameterValueException
             dv = EnumSet.of(DomainDetails.all);
         } else {
             try {
-                ArrayList<DomainDetails> dc = new ArrayList<DomainDetails>();
+                ArrayList<DomainDetails> dc = new ArrayList<>();
                 for (String detail : viewDetails) {
                     dc.add(DomainDetails.valueOf(detail));
                 }
@@ -142,7 +142,10 @@ protected void updateDomainResponse(List<DomainResponse> response) {
         if (CollectionUtils.isEmpty(response)) {
             return;
         }
-        _resourceLimitService.updateTaggedResourceLimitsAndCountsForDomains(response, getTag());
+        EnumSet<DomainDetails> details = getDetails();
+        if (details.contains(DomainDetails.all) || details.contains(DomainDetails.resource)) {
+            _resourceLimitService.updateTaggedResourceLimitsAndCountsForDomains(response, getTag());
+        }
         if (!getShowIcon()) {
             return;
         }
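The second hunk above skips the tagged resource limit update unless the caller asked for `all` or `resource` details. The gating logic on its own can be sketched as follows. The nested `DomainDetails` enum here is a local stand-in with assumed constants, and `DetailGateSketch` and `needsResourceCounts` are hypothetical names, not part of the CloudStack API.

```java
import java.util.EnumSet;

// Hypothetical illustration of gating an expensive step on requested details.
class DetailGateSketch {

    // Local stand-in for the real enum; the constant set is an assumption.
    enum DomainDetails { all, resource, min }

    // True only when the response needs resource limits and counts filled in.
    static boolean needsResourceCounts(EnumSet<DomainDetails> details) {
        return details.contains(DomainDetails.all) || details.contains(DomainDetails.resource);
    }

    public static void main(String[] args) {
        EnumSet<DomainDetails> requested = EnumSet.of(DomainDetails.min);
        if (needsResourceCounts(requested)) {
            System.out.println("would compute tagged resource limits and counts");
        } else {
            System.out.println("skipping resource limit computation");
        }
    }
}
```

Callers that only need a minimal listing therefore avoid the extra DB work, which is consistent with the commit's stated goal of reducing retrievals in resource limit calculations.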

api/src/main/java/org/apache/cloudstack/api/command/user/account/ListAccountsCmd.java

Lines changed: 4 additions & 1 deletion
@@ -149,7 +149,10 @@ protected void updateAccountResponse(List<AccountResponse> response) {
         if (CollectionUtils.isEmpty(response)) {
             return;
         }
-        _resourceLimitService.updateTaggedResourceLimitsAndCountsForAccounts(response, getTag());
+        EnumSet<DomainDetails> details = getDetails();
+        if (details.contains(DomainDetails.all) || details.contains(DomainDetails.resource)) {
+            _resourceLimitService.updateTaggedResourceLimitsAndCountsForAccounts(response, getTag());
+        }
         if (!getShowIcon()) {
             return;
         }

api/src/main/java/org/apache/cloudstack/outofbandmanagement/OutOfBandManagementService.java

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ public interface OutOfBandManagementService {
     long getId();
     boolean isOutOfBandManagementEnabled(Host host);
     void submitBackgroundPowerSyncTask(Host host);
-    boolean transitionPowerStateToDisabled(List<? extends Host> hosts);
+    boolean transitionPowerStateToDisabled(List<Long> hostIds);
 
     OutOfBandManagementResponse enableOutOfBandManagement(DataCenter zone);
     OutOfBandManagementResponse enableOutOfBandManagement(Cluster cluster);
