Introduction

The CUDO Compute API uses the Google AIP-193 Error Model for all error responses. This provides a consistent, machine-readable structure while keeping messages helpful for humans. Every non-2xx response body will be a single JSON object shaped like:

{
	"code": 400,
	"status": "INVALID_ARGUMENT",
	"message": "Machine type 'gpu-a100-x99' is not available in data center gb-bournemouth-1.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.BadRequest",
			"fieldViolations": [
				{ "field": "spec.machineType", "description": "Unknown machine type 'gpu-a100-x99'" }
			]
		},
		{
			"@type": "type.googleapis.com/google.rpc.ErrorInfo",
			"domain": "compute.cudo.dev",
			"reason": "MACHINE_TYPE_UNAVAILABLE",
			"metadata": {
				"requestedMachineType": "gpu-a100-x99",
				"availableMachineTypes": ["gpu-a100-x2", "gpu-a100-x4"],
				"dataCenterId": "gb-bournemouth-1"
			}
		}
	]
}

Fields

The status string is the canonical error code name (e.g. INVALID_ARGUMENT). The numeric HTTP status is carried in the HTTP response status line and repeated in code for client convenience.

Field	Type	Required	Description
`code`	number	Always	HTTP status code (e.g. 404). Mirrors the response status.
`status`	string	Always	Canonical error code (AIP-193). Upper snake case.
`message`	string	Always	End-user readable summary (English, sentence form). Not for programmatic parsing.
`details`	array<object>	Sometimes	Zero or more structured detail objects (typed via `@type`).
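
As an illustration of the field table above, the envelope can be parsed into a small typed structure. This is a minimal sketch rather than an official client: the `ApiError` class and `parse_error` function are hypothetical names, and `details` is treated as optional because the table marks it "Sometimes".

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ApiError:
    """Parsed error envelope (hypothetical helper type, not part of the API)."""
    code: int
    status: str
    message: str
    details: list[dict[str, Any]] = field(default_factory=list)


def parse_error(body: str) -> ApiError:
    """Parse a non-2xx response body into an ApiError.

    A missing or null `details` field becomes an empty list.
    """
    data = json.loads(body)
    return ApiError(
        code=int(data["code"]),
        status=str(data["status"]),
        message=str(data["message"]),
        details=list(data.get("details") or []),
    )


body = '{"code": 404, "status": "NOT_FOUND", "message": "Volume not found.", "details": []}'
err = parse_error(body)
print(err.status, err.code)
```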

Detail objects

Each element in details is a typed envelope whose @type URL identifies the Protobuf message type (schema) of the object. We use the standard google.rpc types. Examples:

`@type`	Purpose
`type.googleapis.com/google.rpc.BadRequest`	Field validation problems (`fieldViolations`).
`type.googleapis.com/google.rpc.ErrorInfo`	Domain, reason, and metadata for programmatic handling.
`type.googleapis.com/google.rpc.RetryInfo`	Backoff hints when a retry is appropriate.
`type.googleapis.com/google.rpc.QuotaFailure`	Which quota bucket was exceeded.
`type.googleapis.com/google.rpc.DebugInfo`	Low-level debugging info (rarely returned in production).

If you need richer structured data for automation, always prefer reading details over parsing message.
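
One way to read details is to look entries up by their `@type` and ignore any type the client does not recognize. The sketch below assumes the `type.googleapis.com/` prefix shown in the table; `find_detail` is a hypothetical helper name.

```python
from typing import Any, Optional

TYPE_PREFIX = "type.googleapis.com/"


def find_detail(details: list[dict[str, Any]], short_name: str) -> Optional[dict[str, Any]]:
    """Return the first detail whose @type is type.googleapis.com/<short_name>, or None."""
    wanted = TYPE_PREFIX + short_name
    for detail in details:
        if detail.get("@type") == wanted:
            return detail
    return None


details = [
    {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
            {"field": "spec.machineType", "description": "Unknown machine type 'gpu-a100-x99'"}
        ],
    },
    {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "domain": "compute.cudo.dev",
        "reason": "MACHINE_TYPE_UNAVAILABLE",
        "metadata": {"dataCenterId": "gb-bournemouth-1"},
    },
]

bad_request = find_detail(details, "google.rpc.BadRequest")
retry_info = find_detail(details, "google.rpc.RetryInfo")  # absent in this response
```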

Canonical codes used

Below is the subset of AIP-193 / gRPC canonical codes you may encounter and how we map them to HTTP statuses:

Canonical Code	Typical HTTP	Meaning	Example Scenario
INVALID_ARGUMENT	400	Request parameter is malformed / invalid.	Unsupported machineType value.
FAILED_PRECONDITION	400 / 409	Operation violates current resource state.	Trying to resize volume while attached in read-only mode.
OUT_OF_RANGE	400	Value outside allowed range/limit.	Disk size below minimum.
NOT_FOUND	404	Resource does not exist or is not visible.	Volume ID not found in project.
ALREADY_EXISTS	409	Attempt to create a duplicate resource.	SSH key fingerprint already present.
PERMISSION_DENIED	403	Authenticated but lacks permission / scope.	API key missing write:machines scope.
UNAUTHENTICATED	401	Missing / invalid credentials.	Expired API key token.
RESOURCE_EXHAUSTED	429	Quota or capacity exhausted.	GPU quota reached in a data center.
ABORTED	409	Concurrent modification conflict.	Optimistic lock / ETag mismatch.
CANCELLED	499	Client cancelled request (if observed).	HTTP client aborted connection.
DEADLINE_EXCEEDED	504	Server couldn’t finish before timeout.	Long-running image import timed out.
INTERNAL	500	Unexpected internal error.	Unclassified exception.
UNIMPLEMENTED	501	Method not implemented / allowed.	Endpoint exists in spec but disabled.
UNAVAILABLE	503	Temporary outage or maintenance.	Data center capacity scaling event.
DATA_LOSS	500	Irrecoverable data corruption.	(Extremely rare) Integrity failure.
UNKNOWN	500	Unclassified error when no better code fits.	Edge case fallback.

If you receive a code not listed here, treat it like UNKNOWN and contact support.
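
A client can apply the fallback rule above by normalizing any unlisted status to UNKNOWN before dispatching on it. A minimal sketch, where `KNOWN_STATUSES` simply copies the table in this section and `normalize_status` is a hypothetical helper name:

```python
# Canonical codes copied from the table in this section.
KNOWN_STATUSES = frozenset({
    "INVALID_ARGUMENT", "FAILED_PRECONDITION", "OUT_OF_RANGE", "NOT_FOUND",
    "ALREADY_EXISTS", "PERMISSION_DENIED", "UNAUTHENTICATED",
    "RESOURCE_EXHAUSTED", "ABORTED", "CANCELLED", "DEADLINE_EXCEEDED",
    "INTERNAL", "UNIMPLEMENTED", "UNAVAILABLE", "DATA_LOSS", "UNKNOWN",
})


def normalize_status(status: str) -> str:
    """Map any status not listed in the table to UNKNOWN."""
    return status if status in KNOWN_STATUSES else "UNKNOWN"


print(normalize_status("NOT_FOUND"))
print(normalize_status("SOME_FUTURE_CODE"))
```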

Examples

Below are representative responses for common categories. Only standard google.rpc detail types are used (no custom types) to keep automation straightforward.

Invalid Arguments (400)

{
	"code": 400,
	"status": "INVALID_ARGUMENT",
	"message": "Invalid fields in request.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.BadRequest",
			"fieldViolations": [
				{ "field": "machineTypeId", "description": "Unknown machine type 'gpu-a100-x99'." },
				{ "field": "bootDiskSizeGib", "description": "Minimum is 20 GiB." }
			]
		}
	]
}
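
To surface the violations in a response like the one above to users, a client might flatten every `BadRequest` detail into one line per field. This is a sketch under the assumption that each violation carries `field` and `description` as shown; `field_violation_messages` is a hypothetical helper name.

```python
from typing import Any

BAD_REQUEST = "type.googleapis.com/google.rpc.BadRequest"


def field_violation_messages(error: dict[str, Any]) -> list[str]:
    """Collect 'field: description' lines from all BadRequest details."""
    lines = []
    for detail in error.get("details") or []:
        if detail.get("@type") != BAD_REQUEST:
            continue  # skip other (or unrecognized) detail types
        for violation in detail.get("fieldViolations") or []:
            name = violation.get("field", "(unknown field)")
            lines.append(f"{name}: {violation.get('description', '')}")
    return lines


error = {
    "code": 400,
    "status": "INVALID_ARGUMENT",
    "message": "Invalid fields in request.",
    "details": [
        {
            "@type": BAD_REQUEST,
            "fieldViolations": [
                {"field": "machineTypeId", "description": "Unknown machine type 'gpu-a100-x99'."},
                {"field": "bootDiskSizeGib", "description": "Minimum is 20 GiB."},
            ],
        }
    ],
}

messages = field_violation_messages(error)
```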

Out of Range (400)

{
	"code": 400,
	"status": "OUT_OF_RANGE",
	"message": "machineCount 200 exceeds the maximum of 128.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.BadRequest",
			"fieldViolations": [
				{ "field": "machineCount", "description": "Max allowed is 128." }
			]
		}
	]
}

Precondition (409)

{
	"code": 409,
	"status": "FAILED_PRECONDITION",
	"message": "Cluster must be in STATE_ACTIVE before resizing.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.ErrorInfo",
			"domain": "compute.cudo.dev",
			"reason": "CLUSTER_STATE_INVALID",
			"metadata": {
				"clusterId": "clu-abc123",
				"currentState": "STATE_UPDATING",
				"requiredState": "STATE_ACTIVE",
				"operation": "RESIZE"
			}
		}
	]
}

Not Found (404)

{
	"code": 404,
	"status": "NOT_FOUND",
	"message": "Cluster 'clu-missing99' not found in project 'proj-123'.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.ErrorInfo",
			"domain": "compute.cudo.dev",
			"reason": "CLUSTER_NOT_FOUND",
			"metadata": {
				"clusterId": "clu-missing99",
				"projectId": "proj-123"
			}
		}
	]
}

Quota (429)

{
	"code": 429,
	"status": "RESOURCE_EXHAUSTED",
	"message": "GPU quota exceeded for project 'proj-123' in data center 'gb-bournemouth-1'.",
	"details": [
		{
			"@type": "type.googleapis.com/google.rpc.QuotaFailure",
			"violations": [
				{
					"subject": "projects/proj-123/dataCenterIds/gb-bournemouth-1/gpuModelId:gpu-a100",
					"description": "Limit 16 reached"
				}
			]
		},
		{ "@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "30s" }
	]
}
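
The `retryDelay` value in the example above is a string such as "30s". The sketch below converts it to seconds, assuming the value is always a decimal number followed by `s` (the two examples on this page use that form); anything else is treated as "no hint". `retry_delay_seconds` is a hypothetical helper name.

```python
import math
from typing import Any, Optional

RETRY_INFO = "type.googleapis.com/google.rpc.RetryInfo"


def retry_delay_seconds(error: dict[str, Any]) -> Optional[float]:
    """Return the RetryInfo delay in seconds, or None if absent or unparseable."""
    for detail in error.get("details") or []:
        if detail.get("@type") != RETRY_INFO:
            continue
        raw = detail.get("retryDelay")
        if not isinstance(raw, str) or not raw.endswith("s"):
            return None
        try:
            seconds = float(raw[:-1])
        except ValueError:
            return None
        # Reject negative, infinite, or NaN values rather than sleeping on them.
        return seconds if math.isfinite(seconds) and seconds >= 0 else None
    return None


quota_error = {
    "code": 429,
    "status": "RESOURCE_EXHAUSTED",
    "details": [{"@type": RETRY_INFO, "retryDelay": "30s"}],
}
delay = retry_delay_seconds(quota_error)
```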

Unavailable (503)

{
	"code": 503,
	"status": "UNAVAILABLE",
	"message": "Data center control plane temporarily unavailable. Retry later.",
	"details": [
		{ "@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "5s" }
	]
}

Internal (500)

{
	"code": 500,
	"status": "INTERNAL",
	"message": "An internal error occurred. Try again or contact support if it persists.",
	"details": []
}

Client handling guidance

Category	Guidance
Validation (`INVALID_ARGUMENT`, `FAILED_PRECONDITION`, `OUT_OF_RANGE`)	Do not retry until you modify the request. Surface field violations directly to users.
Authentication / Authorization (`UNAUTHENTICATED`, `PERMISSION_DENIED`)	Refresh or request proper credentials / scopes. Retries without change will fail.
Capacity / Quota (`RESOURCE_EXHAUSTED`)	Implement exponential backoff respecting any `RetryInfo.retryDelay`. Consider requesting quota increase.
Transient (`UNAVAILABLE`, `ABORTED`, `DEADLINE_EXCEEDED`)	Safe to retry with exponential backoff + jitter. Preserve idempotency via client-specified request IDs if endpoint supports it.
Not Found (`NOT_FOUND`)	Confirm the resource ID and project. Recreate if appropriate. Do not blind retry.
Conflict (`ALREADY_EXISTS`, `ABORTED`)	Adjust resource name or re-fetch latest state before retrying.
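
The guidance table can be encoded as a lookup from status to a coarse action. This is a sketch with hypothetical names (`Action`, `action_for`). Because ABORTED appears in both the Transient and Conflict rows, this example makes the assumption that re-fetching state before the retry is the safer choice; statuses the table does not mention fall through to `UNCLASSIFIED`.

```python
from enum import Enum


class Action(Enum):
    FIX_REQUEST = "modify the request before resending"
    FIX_CREDENTIALS = "refresh credentials or scopes"
    BACKOFF_RETRY = "retry with exponential backoff"
    REFETCH_RETRY = "rename or re-fetch latest state, then retry"
    CHECK_RESOURCE = "confirm resource ID and project"
    UNCLASSIFIED = "not covered by the guidance table"


_ACTIONS = {
    "INVALID_ARGUMENT": Action.FIX_REQUEST,
    "FAILED_PRECONDITION": Action.FIX_REQUEST,
    "OUT_OF_RANGE": Action.FIX_REQUEST,
    "UNAUTHENTICATED": Action.FIX_CREDENTIALS,
    "PERMISSION_DENIED": Action.FIX_CREDENTIALS,
    "RESOURCE_EXHAUSTED": Action.BACKOFF_RETRY,
    "UNAVAILABLE": Action.BACKOFF_RETRY,
    "DEADLINE_EXCEEDED": Action.BACKOFF_RETRY,
    "NOT_FOUND": Action.CHECK_RESOURCE,
    "ALREADY_EXISTS": Action.REFETCH_RETRY,
    "ABORTED": Action.REFETCH_RETRY,  # listed under both Transient and Conflict
}


def action_for(status: str) -> Action:
    """Return the coarse handling action for a canonical status string."""
    return _ACTIONS.get(status, Action.UNCLASSIFIED)


print(action_for("UNAVAILABLE").value)
```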

Retry Strategy

We recommend capped exponential backoff (e.g. base=500ms, multiplier=1.6, max=30s, jitter 0-100%). If a RetryInfo.retryDelay is present, start with that delay before resuming your normal backoff sequence.
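
The recommended schedule could be generated as follows. This sketch interprets "jitter 0-100%" as sleeping for a random fraction of the current capped backoff value, which is an assumption; the server-provided delay, when present, is used as the first wait without jitter. `backoff_delays` and its parameter names are hypothetical.

```python
import random
from typing import Iterator, Optional


def backoff_delays(
    attempts: int,
    retry_delay: Optional[float] = None,
    base: float = 0.5,
    multiplier: float = 1.6,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield up to `attempts` sleep durations in seconds.

    If `retry_delay` (from RetryInfo.retryDelay) is given, it is yielded first,
    then the normal capped exponential sequence follows.
    """
    if rng is None:
        rng = random.Random()
    remaining = attempts
    if retry_delay is not None and remaining > 0:
        yield retry_delay
        remaining -= 1
    ceiling = base
    for _ in range(remaining):
        # Jitter: pick a random point between 0 and the current ceiling.
        yield rng.uniform(0.0, min(ceiling, cap))
        ceiling = min(ceiling * multiplier, cap)


delays = list(backoff_delays(5, retry_delay=30.0, rng=random.Random(0)))
```

A caller would sleep for each yielded value between attempts and stop once the request succeeds or the generator is exhausted.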

Do not parse human readable messages

Never parse message strings. Use status, structured details, and (when present) ErrorInfo.metadata for automation.
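
For instance, instead of matching on the message text, a client can branch on `ErrorInfo.reason` and read alternatives from `metadata`. The sketch below relies only on the `MACHINE_TYPE_UNAVAILABLE` reason and the `availableMachineTypes` key that appear in the introductory example; whether other responses carry the same keys is an assumption, and `alternative_machine_types` is a hypothetical helper name.

```python
from typing import Any

ERROR_INFO = "type.googleapis.com/google.rpc.ErrorInfo"


def alternative_machine_types(error: dict[str, Any]) -> list[str]:
    """Return suggested machine types when the reason is MACHINE_TYPE_UNAVAILABLE."""
    for detail in error.get("details") or []:
        if detail.get("@type") != ERROR_INFO:
            continue
        if (
            detail.get("domain") == "compute.cudo.dev"
            and detail.get("reason") == "MACHINE_TYPE_UNAVAILABLE"
        ):
            metadata = detail.get("metadata") or {}
            return list(metadata.get("availableMachineTypes") or [])
    return []


error = {
    "code": 400,
    "status": "INVALID_ARGUMENT",
    "message": "Machine type 'gpu-a100-x99' is not available in data center gb-bournemouth-1.",
    "details": [
        {
            "@type": ERROR_INFO,
            "domain": "compute.cudo.dev",
            "reason": "MACHINE_TYPE_UNAVAILABLE",
            "metadata": {
                "requestedMachineType": "gpu-a100-x99",
                "availableMachineTypes": ["gpu-a100-x2", "gpu-a100-x4"],
                "dataCenterId": "gb-bournemouth-1",
            },
        }
    ],
}

alternatives = alternative_machine_types(error)
```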

Versioning & stability

The error envelope fields listed above are stable. New canonical codes (rare) or new details types may appear without prior notice but will always be additive. Breaking changes (removal or semantic change of existing fields) are not planned. If absolutely required, they will be announced in the Changelog with at least 90 days’ notice.

Testing errors

During development you can deliberately trigger certain errors:

Error	How to Trigger (Example)
INVALID_ARGUMENT	Supply a negative `diskSizeGb`.
NOT_FOUND	Use a random UUID for a volume ID.
UNAUTHENTICATED	Omit the `Authorization` header.
PERMISSION_DENIED	Use an API key lacking required scope.
RESOURCE_EXHAUSTED	Create resources until hitting published quota limits.
UNAVAILABLE	Rare; simulate by blocking network access, or test during an announced maintenance window.

Support

Include the following when contacting support about an error:

status & HTTP code
Endpoint + HTTP method
Timestamp
(If relevant) The @type detail objects

This enables rapid trace correlation in our internal logs.

If you have suggestions for additional structured error detail types, please open an issue or contact support with your use case.
