Introduction

The CUDO Compute API uses the Google AIP-193 Error Model for all error responses. This provides a consistent, machine-readable structure while keeping messages helpful for humans. Every non-2xx response body will be a single JSON object shaped like:
{
	"code": 400,
	"status": "INVALID_ARGUMENT",
	"message": "Machine type 'gpu-a100-x99' is not available in data center gb-bournemouth-1.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.BadRequest",
			"fieldViolations": [
				{ "field": "spec.machineType", "description": "Unknown machine type 'gpu-a100-x99'" }
			]
		},
		{
			"@type": "type.googleapis.com/google.rpc.ErrorInfo",
			"domain": "compute.cudo.dev",
			"reason": "MACHINE_TYPE_UNAVAILABLE",
			"metadata": {
				"requestedMachineType": "gpu-a100-x99",
				"availableMachineTypes": ["gpu-a100-x2", "gpu-a100-x4"],
				"dataCenterId": "gb-bournemouth-1"
			}
		}
	]
}

Fields

The status string is the canonical error code name (e.g. INVALID_ARGUMENT). The numeric HTTP status is returned in the HTTP response status line and repeated in code for client convenience.
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| code | number | Always | HTTP status code (e.g. 404). Mirrors the response status. |
| status | string | Always | Canonical error code (AIP-193). Upper snake case. |
| message | string | Always | End-user readable summary (English, sentence form). Not for programmatic parsing. |
| details | array<object> | Sometimes | Zero or more structured detail objects (typed via @type). |
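As a minimal sketch of how a client might read these four fields, the snippet below parses a response body into a small typed object. The names `ErrorEnvelope` and `parse_error_body` are hypothetical and not part of any CUDO SDK; the only assumption taken from the table is that details may be absent.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ErrorEnvelope:
    """Hypothetical client-side view of the error envelope described above."""
    code: int
    status: str
    message: str
    details: list[dict[str, Any]] = field(default_factory=list)


def parse_error_body(body: str) -> ErrorEnvelope:
    """Parse the JSON body of a non-2xx response into an ErrorEnvelope."""
    data = json.loads(body)
    return ErrorEnvelope(
        code=int(data["code"]),
        status=str(data["status"]),
        message=str(data["message"]),
        # `details` is only sometimes present, so fall back to an empty list.
        details=list(data.get("details") or []),
    )


err = parse_error_body(
    '{"code": 404, "status": "NOT_FOUND", "message": "Volume not found."}'
)
print(err.status, err.code, len(err.details))
```

Branching on `err.status` rather than `err.message` keeps the client aligned with the guidance that messages are not meant for programmatic parsing.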

Detail objects

Each element in details is a typed envelope whose @type URL identifies the Protobuf message schema of that element. We use the standard google.rpc types. Examples:
| @type | Purpose |
| --- | --- |
| type.googleapis.com/google.rpc.BadRequest | Field validation problems (fieldViolations). |
| type.googleapis.com/google.rpc.ErrorInfo | Domain + reason + metadata (actionable metadata). |
| type.googleapis.com/google.rpc.RetryInfo | Backoff hints when a retry is appropriate. |
| type.googleapis.com/google.rpc.QuotaFailure | Which quota bucket was exceeded. |
| type.googleapis.com/google.rpc.DebugInfo | Low-level debugging info (rarely returned in production). |
If you need richer structured data for automation, always prefer reading details over parsing message.
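One way to read details without touching message is to look entries up by their @type. The sketch below assumes the details array has already been decoded from JSON into Python dictionaries; `find_detail`, `field_violations`, and `error_reason` are illustrative helper names, not an official client API. Unrecognised @type values are simply skipped.

```python
from typing import Any, Optional

# Prefix shared by the standard google.rpc detail types listed above.
TYPE_PREFIX = "type.googleapis.com/google.rpc."


def find_detail(details: list[dict[str, Any]], short_name: str) -> Optional[dict[str, Any]]:
    """Return the first detail whose @type ends in the given google.rpc name."""
    wanted = TYPE_PREFIX + short_name
    for detail in details:
        if detail.get("@type") == wanted:
            return detail
    return None


def field_violations(details: list[dict[str, Any]]) -> dict[str, str]:
    """Map each violating field to its description (empty if no BadRequest)."""
    bad_request = find_detail(details, "BadRequest")
    if bad_request is None:
        return {}
    return {v["field"]: v["description"] for v in bad_request.get("fieldViolations", [])}


def error_reason(details: list[dict[str, Any]]) -> Optional[str]:
    """Return ErrorInfo.reason when an ErrorInfo detail is present."""
    info = find_detail(details, "ErrorInfo")
    return None if info is None else info.get("reason")


# Details taken from the sample response at the top of this page.
sample_details = [
    {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
            {"field": "spec.machineType", "description": "Unknown machine type 'gpu-a100-x99'"}
        ],
    },
    {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "domain": "compute.cudo.dev",
        "reason": "MACHINE_TYPE_UNAVAILABLE",
        "metadata": {"dataCenterId": "gb-bournemouth-1"},
    },
]
print(field_violations(sample_details))
print(error_reason(sample_details))
```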

Canonical codes used

Below is the subset of AIP-193 / gRPC canonical codes you may encounter and how we map them to HTTP statuses:
| Canonical Code | Typical HTTP | Meaning | Example Scenario |
| --- | --- | --- | --- |
| INVALID_ARGUMENT | 400 | Request parameter is malformed / invalid. | Unsupported machineType value. |
| FAILED_PRECONDITION | 400 / 409 | Operation violates current resource state. | Trying to resize volume while attached in read-only mode. |
| OUT_OF_RANGE | 400 | Value outside allowed range/limit. | Disk size below minimum. |
| NOT_FOUND | 404 | Resource does not exist or is not visible. | Volume ID not found in project. |
| ALREADY_EXISTS | 409 | Attempt to create a duplicate resource. | SSH key fingerprint already present. |
| PERMISSION_DENIED | 403 | Authenticated but lacks permission / scope. | API key missing write:machines scope. |
| UNAUTHENTICATED | 401 | Missing / invalid credentials. | Expired API key token. |
| RESOURCE_EXHAUSTED | 429 | Quota or capacity exhausted. | GPU quota reached in a data center. |
| ABORTED | 409 | Concurrent modification conflict. | Optimistic lock / ETag mismatch. |
| CANCELLED | 499 | Client cancelled request (if observed). | HTTP client aborted connection. |
| DEADLINE_EXCEEDED | 504 | Server couldn’t finish before timeout. | Long-running image import timed out. |
| INTERNAL | 500 | Unexpected internal error. | Unclassified exception. |
| UNIMPLEMENTED | 501 | Method not implemented / allowed. | Endpoint exists in spec but disabled. |
| UNAVAILABLE | 503 | Temporary outage or maintenance. | Data center capacity scaling event. |
| DATA_LOSS | 500 | Irrecoverable data corruption. | (Extremely rare) Integrity failure. |
| UNKNOWN | 500 | Unclassified error when no better code fits. | Edge case fallback. |
If you receive a code not listed here, treat it like UNKNOWN and contact support.
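The fallback rule above can be sketched as a small normalisation step. The `CanonicalCode` enum and `normalize_status` function are hypothetical names chosen for illustration; the enum members mirror the codes in the table.

```python
from enum import Enum


class CanonicalCode(Enum):
    """Canonical codes listed in the table above."""
    INVALID_ARGUMENT = "INVALID_ARGUMENT"
    FAILED_PRECONDITION = "FAILED_PRECONDITION"
    OUT_OF_RANGE = "OUT_OF_RANGE"
    NOT_FOUND = "NOT_FOUND"
    ALREADY_EXISTS = "ALREADY_EXISTS"
    PERMISSION_DENIED = "PERMISSION_DENIED"
    UNAUTHENTICATED = "UNAUTHENTICATED"
    RESOURCE_EXHAUSTED = "RESOURCE_EXHAUSTED"
    ABORTED = "ABORTED"
    CANCELLED = "CANCELLED"
    DEADLINE_EXCEEDED = "DEADLINE_EXCEEDED"
    INTERNAL = "INTERNAL"
    UNIMPLEMENTED = "UNIMPLEMENTED"
    UNAVAILABLE = "UNAVAILABLE"
    DATA_LOSS = "DATA_LOSS"
    UNKNOWN = "UNKNOWN"


def normalize_status(status: str) -> CanonicalCode:
    """Map a status string to a known code, treating anything else as UNKNOWN."""
    try:
        return CanonicalCode(status)
    except ValueError:
        # A code that is not in the table is handled like UNKNOWN.
        return CanonicalCode.UNKNOWN


print(normalize_status("NOT_FOUND"))
print(normalize_status("SOME_FUTURE_CODE"))
```

Normalising early means the rest of a client only has to handle a closed set of values, even if new codes are added later.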

Examples

Below are representative responses for common categories. Only standard google.rpc detail types are used (no custom types) to keep automation straightforward.

Invalid Arguments (400)

{
	"code": 400,
	"status": "INVALID_ARGUMENT",
	"message": "Invalid fields in request.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.BadRequest",
			"fieldViolations": [
				{ "field": "machineTypeId", "description": "Unknown machine type 'gpu-a100-x99'." },
				{ "field": "bootDiskSizeGib", "description": "Minimum is 20 GiB." }
			]
		}
	]
}

Out of Range (400)

{
	"code": 400,
	"status": "OUT_OF_RANGE",
	"message": "machineCount 200 exceeds the maximum of 128.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.BadRequest",
			"fieldViolations": [
				{ "field": "machineCount", "description": "Max allowed is 128." }
			]
		}
	]
}

Failed Precondition (409)

{
	"code": 409,
	"status": "FAILED_PRECONDITION",
	"message": "Cluster must be in STATE_ACTIVE before resizing.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.ErrorInfo",
			"domain": "compute.cudo.dev",
			"reason": "CLUSTER_STATE_INVALID",
			"metadata": {
				"clusterId": "clu-abc123",
				"currentState": "STATE_UPDATING",
				"requiredState": "STATE_ACTIVE",
				"operation": "RESIZE"
			}
		}
	]
}

Not Found (404)

{
	"code": 404,
	"status": "NOT_FOUND",
	"message": "Cluster 'clu-missing99' not found in project 'proj-123'.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.ErrorInfo",
			"domain": "compute.cudo.dev",
			"reason": "CLUSTER_NOT_FOUND",
			"metadata": {
				"clusterId": "clu-missing99",
				"projectId": "proj-123"
			}
		}
	]
}

Resource Exhausted (429)

{
	"code": 429,
	"status": "RESOURCE_EXHAUSTED",
	"message": "GPU quota exceeded for project 'proj-123' in data center 'gb-bournemouth-1'.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.QuotaFailure",
			"violations": [
				{
					"subject": "projects/proj-123/dataCenterIds/gb-bournemouth-1/gpuModelId:gpu-a100",
					"description": "Limit 16 reached"
				}
			]
		},
		{ "@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "30s" }
	]
}

Unavailable (503)

{
	"code": 503,
	"status": "UNAVAILABLE",
	"message": "Data center control plane temporarily unavailable. Retry later.",
	"details": [
		{ "@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "5s" }
	]
}

Internal (500)

{
	"code": 500,
	"status": "INTERNAL",
	"message": "An internal error occurred. Try again or contact support if it persists.",
	"details": []
}

Client handling guidance

| Category | Guidance |
| --- | --- |
| Validation (INVALID_ARGUMENT, FAILED_PRECONDITION, OUT_OF_RANGE) | Do not retry until you modify the request. Surface field violations directly to users. |
| Authentication / Authorization (UNAUTHENTICATED, PERMISSION_DENIED) | Refresh or request proper credentials / scopes. Retries without change will fail. |
| Capacity / Quota (RESOURCE_EXHAUSTED) | Implement exponential backoff respecting any RetryInfo.retryDelay. Consider requesting quota increase. |
| Transient (UNAVAILABLE, ABORTED, DEADLINE_EXCEEDED) | Safe to retry with exponential backoff + jitter. Preserve idempotency via client-specified request IDs if endpoint supports it. |
| Not Found (NOT_FOUND) | Confirm the resource ID and project. Recreate if appropriate. Do not blind retry. |
| Conflict (ALREADY_EXISTS, ABORTED) | Adjust resource name or re-fetch latest state before retrying. |
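The guidance above could be encoded as a lookup from status to a handling category, as in the sketch below. The `Handling` enum and both function names are hypothetical. Two choices here are assumptions rather than documented behaviour: ABORTED appears in two rows and is filed under the transient category (re-fetch the latest state before retrying it), and codes the guidance does not mention fall into a non-retryable `ESCALATE` bucket.

```python
from enum import Enum


class Handling(Enum):
    FIX_REQUEST = "modify the request before retrying"
    FIX_CREDENTIALS = "refresh or request proper credentials / scopes"
    BACKOFF_QUOTA = "back off, respect RetryInfo.retryDelay, consider a quota increase"
    RETRY_TRANSIENT = "retry with exponential backoff and jitter"
    CHECK_RESOURCE = "confirm the resource ID and project"
    RESOLVE_CONFLICT = "adjust the name or re-fetch state before retrying"
    ESCALATE = "not covered by the guidance; do not retry automatically"


_HANDLING = {
    "INVALID_ARGUMENT": Handling.FIX_REQUEST,
    "FAILED_PRECONDITION": Handling.FIX_REQUEST,
    "OUT_OF_RANGE": Handling.FIX_REQUEST,
    "UNAUTHENTICATED": Handling.FIX_CREDENTIALS,
    "PERMISSION_DENIED": Handling.FIX_CREDENTIALS,
    "RESOURCE_EXHAUSTED": Handling.BACKOFF_QUOTA,
    "UNAVAILABLE": Handling.RETRY_TRANSIENT,
    # Assumption: ABORTED is listed as both transient and conflict; this
    # sketch treats it as transient.
    "ABORTED": Handling.RETRY_TRANSIENT,
    "DEADLINE_EXCEEDED": Handling.RETRY_TRANSIENT,
    "NOT_FOUND": Handling.CHECK_RESOURCE,
    "ALREADY_EXISTS": Handling.RESOLVE_CONFLICT,
}


def handling_for(status: str) -> Handling:
    """Look up the handling category for a status string."""
    return _HANDLING.get(status, Handling.ESCALATE)


def is_auto_retryable(status: str) -> bool:
    """True when the guidance allows retrying the unchanged request."""
    return handling_for(status) in {Handling.BACKOFF_QUOTA, Handling.RETRY_TRANSIENT}


print(handling_for("OUT_OF_RANGE").value)
print(is_auto_retryable("UNAVAILABLE"), is_auto_retryable("NOT_FOUND"))
```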

Retry Strategy

We recommend capped exponential backoff (e.g. base=500ms, multiplier=1.6, max=30s, jitter 0-100%). If a RetryInfo.retryDelay is present, start with that delay before resuming your normal backoff sequence.
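A minimal sketch of this schedule is shown below, using the example parameters from the paragraph above. It rests on two assumptions: retryDelay arrives as a seconds string such as "30s" (as in the sample responses), and "jitter 0-100%" means sleeping a uniformly random fraction of the capped delay, which is only one possible reading. `parse_retry_delay` and `backoff_delays` are hypothetical names.

```python
import random
import re
from typing import Callable, Iterator, Optional


def parse_retry_delay(value: str) -> float:
    """Convert a duration string such as "30s" or "1.5s" into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return float(match.group(1))


def backoff_delays(
    retry_delay: Optional[float] = None,
    base: float = 0.5,
    multiplier: float = 1.6,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield sleep durations in seconds, one per retry attempt, indefinitely."""
    if retry_delay is not None:
        # Start with the server-provided hint, then resume normal backoff.
        yield retry_delay
    delay = base
    while True:
        # Assumed jitter: a random fraction (0-100%) of the current delay.
        yield delay * rng()
        delay = min(cap, delay * multiplier)


# Usage: take the first few delays after a RetryInfo hint of "5s".
schedule = backoff_delays(retry_delay=parse_retry_delay("5s"))
first_four = [next(schedule) for _ in range(4)]
print(first_four[0])
```

The generator never stops on its own, so a real caller would pair it with a maximum attempt count or an overall deadline.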

Do not parse human readable messages

Never parse message strings. Use status, structured details, and (when present) ErrorInfo.metadata for automation.

Versioning & stability

The error envelope fields listed above are stable. New canonical codes (rare) or new details types may appear without prior notice but will always be additive. Breaking changes (removal or semantic change of existing fields) are not planned. If absolutely required, they will be announced in the Changelog with at least 90 days’ notice.

Testing errors

During development you can deliberately trigger certain errors:
| Error | How to Trigger (Example) |
| --- | --- |
| INVALID_ARGUMENT | Supply a negative diskSizeGb. |
| NOT_FOUND | Use a random UUID for a volume ID. |
| UNAUTHENTICATED | Omit the Authorization header. |
| PERMISSION_DENIED | Use an API key lacking required scope. |
| RESOURCE_EXHAUSTED | Create resources until hitting published quota limits. |
| UNAVAILABLE | Rare; simulate by blocking network or during announced maintenance window. |

Support

Include the following when contacting support about an error:
  1. status & HTTP code
  2. Endpoint + HTTP method
  3. Timestamp
  4. (If relevant) The @type detail objects
This enables rapid trace correlation in our internal logs.
If you have suggestions for additional structured error detail types, please open an issue or contact support with your use case.