Pivotal BOSH Director Setup Error – Could not find VM for stemcell ‘sc-b0131c8f-ef44-456b-8e7c-df3951236d29’

I was trying to install Pivotal Container Service (PKS) on vSphere. I set up the initial Operations Manager .ova appliance without issue. Since I was deploying on vSphere, the next step was to configure the BOSH Director installation through the vSphere tile. I ran through the configuration and tried to deploy once, and it failed. I tried again and was stopped dead by the above error, over and over. I believe this came up because I deleted the bosh/0 VM and then had the installer run again.

When in this state, it continually fails with the following error:

Could not find VM for stemcell ‘sc-b0131c8f-ef44-456b-8e7c-df3951236d29’

I had no idea what that meant, so I found this on the tech support site:
https://discuss.pivotal.io/hc/en-us/articles/115000488247-OpsManager-Install-Updates-error-Could-not-find-VM-for-stemcell-xxxxx-

It was the same error, but I didn’t even have the BOSH Director set up yet, so the article didn’t apply.

The full log readout is below:

{"type": "step_started", "id": "bosh_product.deploying"}
===== 2018-05-10 20:29:13 UTC Running "/usr/local/bin/bosh --no-color --non-interactive --tty create-env /var/tempest/workspaces/default/deployments/bosh.yml"
Deployment manifest: ‘/var/tempest/workspaces/default/deployments/bosh.yml’
Deployment state: ‘/var/tempest/workspaces/default/deployments/bosh-state.json’

Started validating
Validating release ‘bosh’… Finished (00:00:00)
Validating release ‘bosh-vsphere-cpi’… Finished (00:00:00)
Validating release ‘uaa’… Finished (00:00:01)
Validating release ‘credhub’… Finished (00:00:00)
Validating release ‘bosh-system-metrics-server’… Finished (00:00:01)
Validating release ‘os-conf’… Finished (00:00:00)
Validating release ‘backup-and-restore-sdk’… Finished (00:00:04)
Validating cpi release… Finished (00:00:00)
Validating deployment manifest… Finished (00:00:00)
Validating stemcell… Finished (00:00:03)
Finished validating (00:00:12)

Started installing CPI
Compiling package ‘ruby-2.4-r3/8471dec5da9ecc321686b8990a5ad2cc84529254’… Finished (00:00:00)
Compiling package ‘iso9660wrap/82cd03afdce1985db8c9d7dba5e5200bcc6b5aa8’… Finished (00:00:00)
Compiling package ‘vsphere_cpi/3049e51ead9d72268c1f6dfb5b471cbc7e2d6816’… Finished (00:00:00)
Installing packages… Finished (00:00:00)
Rendering job templates… Finished (00:00:01)
Installing job ‘vsphere_cpi’… Finished (00:00:00)
Finished installing CPI (00:00:02)

Starting registry… Finished (00:00:00)
Uploading stemcell ‘bosh-vsphere-esxi-ubuntu-trusty-go_agent/3541.12’… Skipped [Stemcell already uploaded] (00:00:00)

Started deploying
Creating VM for instance ‘bosh/0’ from stemcell ‘sc-b0131c8f-ef44-456b-8e7c-df3951236d29’… Failed (00:00:02)
Failed deploying (00:00:02)

Stopping registry… Finished (00:00:00)
Cleaning up rendered CPI jobs… Finished (00:00:00)

Deploying:
Creating instance ‘bosh/0’:
Creating VM:
Creating vm with stemcell cid ‘sc-b0131c8f-ef44-456b-8e7c-df3951236d29’:
CPI 'create_vm' method responded with error: CmdError{"type":"Unknown","message":"Could not find VM for stemcell 'sc-b0131c8f-ef44-456b-8e7c-df3951236d29'","ok_to_retry":false}

Exit code 1
===== 2018-05-10 20:29:31 UTC Finished "/usr/local/bin/bosh --no-color --non-interactive --tty create-env /var/tempest/workspaces/default/deployments/bosh.yml"; Duration: 18s; Exit Status: 1
{"type": "step_finished", "id": "bosh_product.deploying"}
Exited with 1.

I did end up resolving this by deleting the bosh-state.json file. It apparently held stale setup information about the stemcell: the log above shows the upload being skipped with “Stemcell already uploaded”, after which the installer tried to create the VM from a stemcell that the CPI could not find.

I was able to SSH into the PKS Operations Manager VM and run this to fix it:

sudo rm /var/tempest/workspaces/default/deployments/bosh-state.json

Then, I was able to re-run the deployment with success.
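
Before deleting the file, it can be worth confirming that it really does reference the stemcell CID from the error. The following is a minimal sketch, not part of my original troubleshooting. It assumes the state file has been copied to a machine where Node.js is available, and it makes no assumption about the file's layout: it simply walks whatever JSON it is given. The function name findValuePaths and the sample object are made up for illustration.

```javascript
// Hypothetical helper: walk any parsed JSON value and return the paths
// whose value is exactly `needle` (for example, a stemcell CID).
function findValuePaths(node, needle, path) {
  path = path || "$";
  var hits = [];
  if (node === needle) {
    hits.push(path);
  } else if (Array.isArray(node)) {
    node.forEach(function (item, index) {
      hits = hits.concat(findValuePaths(item, needle, path + "[" + index + "]"));
    });
  } else if (node !== null && typeof node === "object") {
    Object.keys(node).forEach(function (key) {
      hits = hits.concat(findValuePaths(node[key], needle, path + "." + key));
    });
  }
  return hits;
}

// Illustrative input only; the keys below are invented, not the real file layout.
var sampleState = {
  example_list: [{ example_cid: "sc-b0131c8f-ef44-456b-8e7c-df3951236d29" }]
};
console.log(findValuePaths(sampleState, "sc-b0131c8f-ef44-456b-8e7c-df3951236d29"));
```

With the real file, the input would come from parsing the copied bosh-state.json with JSON.parse. If the CID from the error shows up in the result, that would be consistent with the state file holding the stale stemcell reference.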

vRealize Orchestrator HTTP-REST – Cannot execute the request; Read timed out

I recently ran into an issue with the HTTP-REST plugin in vRO that took some experimentation to understand. I kept getting a “Read timed out” error whenever my workflows made a REST call that took more than 60 seconds to return a response. There is an operationTimeout property you can set to govern this, but I found it is ignored under certain circumstances. This is very confusing, since you can examine the operationTimeout property and it *appears* correct. I had to do quite a bit of testing to get to the bottom of the behavior.

In my implementation, I was using vRO 7.2 and transient HTTP-REST host objects to make my REST calls. I favored that approach over using the HTTP-REST configuration workflows to add and remove every host and combination of operations I could someday invent; the workflow-based approach seemed somewhat inflexible.

Here is my basic testing workflow:

[Screenshot: Test Workflow]

Here is the code in the script:

//  userName, password, and useTransientHost are input parameters.

var uri = "https://myrestapihost.domain.com/api/DoSomething/id44";
var method = "GET";
var body = "";  // For POST/PUT body content.  This has to be a JSON string, e.g.  body = '{ "p1": 1, "p2": 2 }';
var httpRestHost = null;

if ( useTransientHost )
{
  System.log("Using Transient host.");
  //  Create a dynamic REST host:

  var restHost = RESTHostManager.createHost("dynamicRequest");
  restHost.operationTimeout = 900;  //  This gets ignored!!!
  httpRestHost = RESTHostManager.createTransientHostFrom(restHost);
  httpRestHost.operationTimeout = 900;  //  Set it here too, just to be really really sure.
}
else
{
  System.log("Using NON-Transient host.");
  httpRestHost = RESTHostManager.getHost("71998784-d590-426d-8945-75ec0b1ad7b4");		//  Use the ID For your HTTP-REST host here
  httpRestHost.operationTimeout = 900;  //  This gets ignored!!!
}

System.log("OperationTimeout  is set to: " + httpRestHost.operationTimeout.toString());

//  Create the authentication:
var authParams = ['Shared Session', userName, password];
var authenticationObject = RESTAuthenticationManager.createAuthentication('Basic', authParams);
httpRestHost.authentication = authenticationObject;

//  Split the URI into the base URL (up to and including the last "/") and the endpoint:
var lastSlash = uri.lastIndexOf("/");
var urlEndpoint = uri.substring(lastSlash + 1);
uri = uri.substring(0, lastSlash + 1);

httpRestHost.url = uri;

//  REST client only accepts method in all UPPER CASE:
method = method.toUpperCase();

var request = httpRestHost.createRequest(method, urlEndpoint, body);
request.contentType = "application/json";

System.debug("REST request to URI: " + method + " " + request.fullUrl);

var response = request.execute();   //  This should have a 900-second timeout
System.debug("Response status Code: " + response.statusCode);

if ( response.contentAsString )
{
  System.debug("Response: " + response.contentAsString);
}

I added three input parameters:

  • userName
  • password
  • useTransientHost

When I call it using a transient host, it always times out after 60 seconds, no matter what value I set operationTimeout to. Here is the output from the run:

[2017-03-28 11:51:55.740] [I] Using Transient host.
[2017-03-28 11:51:55.747] [I] OperationTimeout is set to: 900
[2017-03-28 11:51:55.751] [D] REST request to URI: GET https://myrestapihost.domain.com/api/DoSomething/id44
[2017-03-28 11:52:55.902] [E] Error in (Workflow:Example REST API Call / HTTP Rest Call (item1)#43) Cannot execute the request: ; Read timed out

You can see the 60s timeout despite the fact the operationTimeout property was set to 900.

I ran it again and referenced a non-transient HTTP host:

[2017-03-28 12:00:24.618] [I] Using NON-Transient host.
[2017-03-28 12:00:24.629] [I] OperationTimeout is set to: 900
[2017-03-28 12:00:24.634] [D] REST request to URI: GET https://myrestapihost.domain.com/api/DoSomething/id44
[2017-03-28 12:02:24.740] [E] Error in (Workflow:Example REST API Call / HTTP Rest Call (item1)#46) Cannot execute the request: ; Read timed out

In this case, it timed out after 120 seconds, not 60 (or 900). The 120 came from the value I entered for operationTimeout when I created the host using the HTTP-REST/Configuration/Add a REST host workflow:

[Screenshot: Test Host Settings]
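
The elapsed times in the two runs can be checked directly from the log timestamps. This is a small sketch of that arithmetic, written as plain JavaScript rather than as part of the workflow; the function names parseLogTimestamp and elapsedSeconds are made up for illustration, and the log lines are shortened copies of the ones shown above.

```javascript
// Parse the leading "[YYYY-MM-DD hh:mm:ss.SSS]" stamp of a log line into
// milliseconds. Date.UTC is used so the local time zone cannot skew the difference.
function parseLogTimestamp(line) {
  var m = /^\[(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})\]/.exec(line);
  if (!m) {
    throw new Error("No timestamp found in: " + line);
  }
  return Date.UTC(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6], +m[7]);
}

// Seconds between the request being logged and the error being logged.
function elapsedSeconds(requestLine, errorLine) {
  return (parseLogTimestamp(errorLine) - parseLogTimestamp(requestLine)) / 1000;
}

// Transient host run, then non-transient host run.
console.log(elapsedSeconds("[2017-03-28 11:51:55.751] [D] REST request", "[2017-03-28 11:52:55.902] [E] Error"));
console.log(elapsedSeconds("[2017-03-28 12:00:24.634] [D] REST request", "[2017-03-28 12:02:24.740] [E] Error"));
```

The two results come out at roughly 60.2 and 120.1 seconds, which lines up with the 60-second default and the 120-second value configured on the host, not with the 900 set in code.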

So, in the end, the following appears to be true:

  • operationTimeout defaults to 60 seconds.
  • Though it appears you can, you CANNOT override it by setting the operationTimeout property in code (if that is the case, it should be a read-only property).
  • Instead, vRO uses the operationTimeout set on the HTTP-REST host object when you create (or update) it using the configuration workflows.
  • Transient hosts always use the 60-second timeout. I found no way to override this.

However:

You CAN override just about everything else, including URI and authentication.

This means I can get around this by adding a dummy HTTP-REST host as per normal using the Add a REST host workflow:

[Screenshots: TestHost1, TestHost2, TestHost3]

The URL, authentication, and other settings do not matter; they can be overridden in your code, as I did in my example above. The ONLY setting that matters is the operationTimeout (and perhaps the connectionTimeout, which may well have the same issue, but I never tested it).

Then reference the host ID as I did in the code above and override the URL, authentication, and whatever else you need to.
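
The workaround can be wrapped in a small helper. This is a sketch under assumptions, not a tested vRO action: the function name createRequestViaDummyHost and the options object are hypothetical, and the host and authentication managers are passed in as arguments (RESTHostManager and RESTAuthenticationManager inside vRO) so that the function can also be exercised with stubs outside vRO. It only uses the calls already shown in the script above.

```javascript
// Hypothetical helper for the dummy-host workaround.
function createRequestViaDummyHost(hostManager, authManager, dummyHostId, options) {
  // The persisted host supplies the operationTimeout configured via the workflow.
  var host = hostManager.getHost(dummyHostId);
  if (!host) {
    throw new Error("No HTTP-REST host found for ID " + dummyHostId);
  }

  // Everything else is overridden per call, as in the test script.
  var lastSlash = options.uri.lastIndexOf("/");
  host.url = options.uri.substring(0, lastSlash + 1);
  host.authentication = authManager.createAuthentication(
    "Basic", ["Shared Session", options.userName, options.password]);

  var request = host.createRequest(
    options.method.toUpperCase(),
    options.uri.substring(lastSlash + 1),
    options.body || "");
  request.contentType = "application/json";
  return request;
}
```

Inside a scriptable task this would be called with RESTHostManager, RESTAuthenticationManager, the ID of the dummy host, and an object holding uri, method, userName, password and an optional body, followed by execute() on the returned request.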

I’ve engaged VMware tech support to log this as a bug. I really think the operationTimeout should be settable or read-only.  We’ll see where that goes…

UPDATE 4/6/2017 – I just got final word back from VMware tech support engineering.  The behavior I noted above is normal behavior, and the workaround I proposed is the accepted workaround.  Nothing to see here…

I did file a feature request to have the operationTimeout property either be programmatically changeable or be marked read-only to reduce confusion.