endpoints

Creates, updates, deletes, gets or lists an endpoints resource.

Overview

Name	`endpoints`
Type	Resource
Id	`google.aiplatform.endpoints`

Fields

The following fields are returned by SELECT queries:

get
list

Name	Datatype	Description
`name`	`string`	Output only. The resource name of the Endpoint.
`clientConnectionConfig`	`object`	Configurations that are applied to the endpoint for online prediction. (id: GoogleCloudAiplatformV1ClientConnectionConfig)
`createTime`	`string (google-datetime)`	Output only. Timestamp when this Endpoint was created.
`dedicatedEndpointDns`	`string`	Output only. DNS of the dedicated endpoint. Will only be populated if dedicated_endpoint_enabled is true. Depending on the features enabled, uid might be a random number or a string. For example, if fast_tryout is enabled, uid will be fasttryout. Format: `https://{endpoint_id}.{region}-{uid}.prediction.vertexai.goog`.
`dedicatedEndpointEnabled`	`boolean`	If true, the endpoint will be exposed through a dedicated DNS [Endpoint.dedicated_endpoint_dns]. Requests to the dedicated DNS are isolated from other users' traffic and have better performance and reliability. Note: Once you enable the dedicated endpoint, you can no longer send requests to the shared DNS {region}-aiplatform.googleapis.com. This limitation will be removed soon.
`deployedModels`	`array`	Output only. The models deployed in this Endpoint. To add or remove DeployedModels use EndpointService.DeployModel and EndpointService.UndeployModel respectively.
`description`	`string`	The description of the Endpoint.
`displayName`	`string`	Required. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.
`enablePrivateServiceConnect`	`boolean`	Deprecated: If true, expose the Endpoint via private service connect. Only one of the fields, network or enable_private_service_connect, can be set.
`encryptionSpec`	`object`	Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key. (id: GoogleCloudAiplatformV1EncryptionSpec)
`etag`	`string`	Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens.
`gdcConfig`	`object`	Configures the Google Distributed Cloud (GDC) environment for online prediction. Only set this field when the Endpoint is to be deployed in a GDC environment. (id: GoogleCloudAiplatformV1GdcConfig)
`genAiAdvancedFeaturesConfig`	`object`	Optional. Configuration for GenAiAdvancedFeatures. If the endpoint is serving GenAI models, advanced features like native RAG integration can be configured. Currently, only Model Garden models are supported. (id: GoogleCloudAiplatformV1GenAiAdvancedFeaturesConfig)
`labels`	`object`	The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
`modelDeploymentMonitoringJob`	`string`	Output only. Resource name of the Model Monitoring job associated with this Endpoint if monitoring is enabled by JobService.CreateModelDeploymentMonitoringJob. Format: `projects/{project}/locations/{location}/modelDeploymentMonitoringJobs/{model_deployment_monitoring_job}`
`network`	`string`	Optional. The full name of the Google Compute Engine network to which the Endpoint should be peered. Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network. Only one of the fields, network or enable_private_service_connect, can be set. Format: `projects/{project}/global/networks/{network}`. Where `{project}` is a project number, as in `12345`, and `{network}` is network name.
`predictRequestResponseLoggingConfig`	`object`	Configures the request-response logging for online prediction. (id: GoogleCloudAiplatformV1PredictRequestResponseLoggingConfig)
`privateServiceConnectConfig`	`object`	Optional. Configuration for private service connect. network and private_service_connect_config are mutually exclusive. (id: GoogleCloudAiplatformV1PrivateServiceConnectConfig)
`satisfiesPzi`	`boolean`	Output only. Reserved for future use.
`satisfiesPzs`	`boolean`	Output only. Reserved for future use.
`trafficSplit`	`object`	A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment.
`updateTime`	`string (google-datetime)`	Output only. Timestamp when this Endpoint was last updated.
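
The `trafficSplit` rule in the table above (percentages add up to 100, or the map is empty) can be checked before a value is submitted. The following is a minimal Python sketch, not part of the provider: the function name and the per-entry range check of 0 to 100 are illustrative assumptions.

```python
def validate_traffic_split(traffic_split: dict) -> None:
    """Raise ValueError unless the split is empty or its percentages sum to 100."""
    if not traffic_split:
        # An empty map means the Endpoint accepts no traffic.
        return
    for model_id, percent in traffic_split.items():
        # Assumption: each entry is a whole-number percentage between 0 and 100.
        is_int = isinstance(percent, int) and not isinstance(percent, bool)
        if not is_int or not 0 <= percent <= 100:
            raise ValueError(f"invalid percentage for {model_id!r}: {percent!r}")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")


# Hypothetical DeployedModel IDs, used only for illustration.
validate_traffic_split({"1111": 70, "2222": 30})
validate_traffic_split({})
```

A DeployedModel ID that is missing from the map simply receives no traffic, so the check only looks at the entries that are present.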

Methods

The following methods are available for this resource:

Name	Accessible by	Required Params	Optional Params	Description
`get`	`select`	`projectsId`, `locationsId`, `endpointsId`		Gets an Endpoint.
`list`	`select`	`projectsId`, `locationsId`	`filter`, `pageSize`, `pageToken`, `readMask`, `orderBy`, `gdcZone`	Lists Endpoints in a Location.
`create`	`insert`	`projectsId`, `locationsId`	`endpointId`	Creates an Endpoint.
`patch`	`update`	`projectsId`, `locationsId`, `endpointsId`	`updateMask`	Updates an Endpoint.
`update`	`update`	`projectsId`, `locationsId`, `endpointsId`		Updates an Endpoint with a long running operation.
`delete`	`delete`	`projectsId`, `locationsId`, `endpointsId`		Deletes an Endpoint.
`deploy_model`	`exec`	`projectsId`, `locationsId`, `endpointsId`		Deploys a Model into this Endpoint, creating a DeployedModel within it.
`undeploy_model`	`exec`	`projectsId`, `locationsId`, `endpointsId`		Undeploys a Model from an Endpoint, removing a DeployedModel from it, and freeing all resources it's using.
`mutate_deployed_model`	`exec`	`projectsId`, `locationsId`, `endpointsId`		Updates an existing deployed model. Updatable fields include `min_replica_count`, `max_replica_count`, `required_replica_count`, `autoscaling_metric_specs`, `disable_container_logging` (v1 only), and `enable_container_logging` (v1beta1 only).
`predict`	`exec`	`endpointsId`		Perform an online prediction.
`raw_predict`	`exec`	`projectsId`, `locationsId`, `endpointsId`		Perform an online prediction with an arbitrary HTTP payload. The response includes the following HTTP headers: * `X-Vertex-AI-Endpoint-Id`: ID of the Endpoint that served this prediction. * `X-Vertex-AI-Deployed-Model-Id`: ID of the Endpoint's DeployedModel that served this prediction.
`stream_raw_predict`	`exec`	`projectsId`, `locationsId`, `endpointsId`		Perform a streaming online prediction with an arbitrary HTTP payload.
`direct_predict`	`exec`	`projectsId`, `locationsId`, `endpointsId`		Perform a unary online prediction request to a gRPC model server for Vertex first-party products and frameworks.
`direct_raw_predict`	`exec`	`projectsId`, `locationsId`, `endpointsId`		Perform a unary online prediction request to a gRPC model server for custom containers.
`server_streaming_predict`	`exec`	`projectsId`, `locationsId`, `endpointsId`		Perform a server-side streaming online prediction request for Vertex LLM streaming.
`predict_long_running`	`exec`	`endpointsId`
`explain`	`exec`	`projectsId`, `locationsId`, `endpointsId`		Perform an online explanation. If deployed_model_id is specified, the corresponding DeployedModel must have explanation_spec populated. If deployed_model_id is not specified, all DeployedModels must have explanation_spec populated.
`generate_content`	`exec`	`endpointsId`		Generate content with multimodal inputs.
`stream_generate_content`	`exec`	`endpointsId`		Generate content with multimodal inputs with streaming support.
`count_tokens`	`exec`	`endpointsId`		Perform token counting.
`compute_tokens`	`exec`	`endpointsId`		Return a list of tokens based on the input text.
`fetch_predict_operation`	`exec`	`endpointsId`		Fetch an asynchronous online prediction operation.

Parameters

Parameters can be passed in the WHERE clause of a query. Check the Methods section to see which parameters are required or optional for each operation.

Name	Datatype	Description
`endpointsId`	`string`
`locationsId`	`string`
`projectsId`	`string`
`endpointId`	`string`
`filter`	`string`
`gdcZone`	`string`
`orderBy`	`string`
`pageSize`	`integer (int32)`
`pageToken`	`string`
`readMask`	`string (google-fieldmask)`
`updateMask`	`string (google-fieldmask)`

`SELECT` examples

get
list

Gets an Endpoint.

SELECT
name,
clientConnectionConfig,
createTime,
dedicatedEndpointDns,
dedicatedEndpointEnabled,
deployedModels,
description,
displayName,
enablePrivateServiceConnect,
encryptionSpec,
etag,
gdcConfig,
genAiAdvancedFeaturesConfig,
labels,
modelDeploymentMonitoringJob,
network,
predictRequestResponseLoggingConfig,
privateServiceConnectConfig,
satisfiesPzi,
satisfiesPzs,
trafficSplit,
updateTime
FROM google.aiplatform.endpoints
WHERE projectsId = '{{ projectsId }}' -- required
AND locationsId = '{{ locationsId }}' -- required
AND endpointsId = '{{ endpointsId }}' -- required
;

Lists Endpoints in a Location.

SELECT
name,
clientConnectionConfig,
createTime,
dedicatedEndpointDns,
dedicatedEndpointEnabled,
deployedModels,
description,
displayName,
enablePrivateServiceConnect,
encryptionSpec,
etag,
gdcConfig,
genAiAdvancedFeaturesConfig,
labels,
modelDeploymentMonitoringJob,
network,
predictRequestResponseLoggingConfig,
privateServiceConnectConfig,
satisfiesPzi,
satisfiesPzs,
trafficSplit,
updateTime
FROM google.aiplatform.endpoints
WHERE projectsId = '{{ projectsId }}' -- required
AND locationsId = '{{ locationsId }}' -- required
AND filter = '{{ filter }}'
AND pageSize = '{{ pageSize }}'
AND pageToken = '{{ pageToken }}'
AND readMask = '{{ readMask }}'
AND orderBy = '{{ orderBy }}'
AND gdcZone = '{{ gdcZone }}'
;
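
Only `projectsId` and `locationsId` are required for the list query, so the optional conditions can be left out when no value is available. The Python sketch below assembles the statement from whichever parameters are set. It is illustrative only: `build_list_query` and `sql_quote` are hypothetical helper names, every value is quoted the same way the template quotes it, and doubling single quotes is assumed to be an acceptable escape.

```python
OPTIONAL_LIST_PARAMS = ("filter", "pageSize", "pageToken", "readMask", "orderBy", "gdcZone")


def sql_quote(value) -> str:
    """Render a value as a single-quoted SQL string, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_list_query(projects_id, locations_id, columns=("name", "displayName"), **optional) -> str:
    """Build a list query that includes only the optional parameters that are set."""
    unknown = set(optional) - set(OPTIONAL_LIST_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    conditions = [
        f"projectsId = {sql_quote(projects_id)}",
        f"locationsId = {sql_quote(locations_id)}",
    ]
    for key in OPTIONAL_LIST_PARAMS:
        if optional.get(key) is not None:
            conditions.append(f"{key} = {sql_quote(optional[key])}")
    return (
        "SELECT " + ", ".join(columns)
        + "\nFROM google.aiplatform.endpoints"
        + "\nWHERE " + "\nAND ".join(conditions) + ";"
    )


# Hypothetical project and location values.
print(build_list_query("my-project", "us-central1", pageSize=10))
```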

`INSERT` examples

create
Manifest

Creates an Endpoint.

INSERT INTO google.aiplatform.endpoints (
data__displayName,
data__description,
data__trafficSplit,
data__etag,
data__labels,
data__encryptionSpec,
data__network,
data__enablePrivateServiceConnect,
data__privateServiceConnectConfig,
data__predictRequestResponseLoggingConfig,
data__dedicatedEndpointEnabled,
data__gdcConfig,
data__clientConnectionConfig,
data__genAiAdvancedFeaturesConfig,
projectsId,
locationsId,
endpointId
)
SELECT 
'{{ displayName }}',
'{{ description }}',
'{{ trafficSplit }}',
'{{ etag }}',
'{{ labels }}',
'{{ encryptionSpec }}',
'{{ network }}',
{{ enablePrivateServiceConnect }},
'{{ privateServiceConnectConfig }}',
'{{ predictRequestResponseLoggingConfig }}',
{{ dedicatedEndpointEnabled }},
'{{ gdcConfig }}',
'{{ clientConnectionConfig }}',
'{{ genAiAdvancedFeaturesConfig }}',
'{{ projectsId }}',
'{{ locationsId }}',
'{{ endpointId }}'
RETURNING
name,
done,
error,
metadata,
response
;
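
Object-typed columns such as `data__labels` appear in the template as quoted placeholders, so one plausible approach is to pass them as JSON strings after checking the documented label constraints (at most 64 characters, lowercase letters, digits, underscores and dashes, international characters allowed). The Python sketch below is an approximation under those assumptions: the helper names are hypothetical, and the character test is one reading of the rule rather than the service's own validation.

```python
import json


def is_valid_label_part(text: str) -> bool:
    """Approximate the documented rule for a label key or value."""
    if len(text) > 64:  # len() counts Unicode code points
        return False
    # Allow underscores, dashes, digits and letters that have no uppercase form applied.
    return all(ch in "_-" or (ch.isalnum() and ch == ch.lower()) for ch in text)


def labels_to_json(labels: dict) -> str:
    """Validate labels and serialize them for the '{{ labels }}' placeholder."""
    for key, value in labels.items():
        if not is_valid_label_part(key) or not is_valid_label_part(value):
            raise ValueError(f"invalid label: {key!r}={value!r}")
    return json.dumps(labels, ensure_ascii=False, sort_keys=True)


# Hypothetical labels, used only for illustration.
print(labels_to_json({"env": "prod", "team": "ml-ops"}))
```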

# Description fields are for documentation purposes
- name: endpoints
  props:
    - name: projectsId
      value: string
      description: Required parameter for the endpoints resource.
    - name: locationsId
      value: string
      description: Required parameter for the endpoints resource.
    - name: displayName
      value: string
      description: >
        Required. The display name of the Endpoint. The name can be up to 128 characters long and can consist of any UTF-8 characters.
        
    - name: description
      value: string
      description: >
        The description of the Endpoint.
        
    - name: trafficSplit
      value: object
      description: >
        A map from a DeployedModel's ID to the percentage of this Endpoint's traffic that should be forwarded to that DeployedModel. If a DeployedModel's ID is not listed in this map, then it receives no traffic. The traffic percentage values must add up to 100, or the map must be empty if the Endpoint is not to accept any traffic at the moment.
        
    - name: etag
      value: string
      description: >
        Used to perform consistent read-modify-write updates. If not set, a blind "overwrite" update happens.
        
    - name: labels
      value: object
      description: >
        The labels with user-defined metadata to organize your Endpoints. Label keys and values can be no longer than 64 characters (Unicode codepoints), can only contain lowercase letters, numeric characters, underscores and dashes. International characters are allowed. See https://goo.gl/xmQnxf for more information and examples of labels.
        
    - name: encryptionSpec
      value: object
      description: >
        Customer-managed encryption key spec for an Endpoint. If set, this Endpoint and all sub-resources of this Endpoint will be secured by this key.
        
    - name: network
      value: string
      description: >
        Optional. The full name of the Google Compute Engine [network](https://cloud.google.com//compute/docs/networks-and-firewalls#networks) to which the Endpoint should be peered. Private services access must already be configured for the network. If left unspecified, the Endpoint is not peered with any network. Only one of the fields, network or enable_private_service_connect, can be set. [Format](https://cloud.google.com/compute/docs/reference/rest/v1/networks/insert): `projects/{project}/global/networks/{network}`. Where `{project}` is a project number, as in `12345`, and `{network}` is network name.
        
    - name: enablePrivateServiceConnect
      value: boolean
      description: >
        Deprecated: If true, expose the Endpoint via private service connect. Only one of the fields, network or enable_private_service_connect, can be set.
        
    - name: privateServiceConnectConfig
      value: object
      description: >
        Optional. Configuration for private service connect. network and private_service_connect_config are mutually exclusive.
        
    - name: predictRequestResponseLoggingConfig
      value: object
      description: >
        Configures the request-response logging for online prediction.
        
    - name: dedicatedEndpointEnabled
      value: boolean
      description: >
        If true, the endpoint will be exposed through a dedicated DNS [Endpoint.dedicated_endpoint_dns]. Requests to the dedicated DNS are isolated from other users' traffic and have better performance and reliability. Note: Once you enable the dedicated endpoint, you can no longer send requests to the shared DNS {region}-aiplatform.googleapis.com. This limitation will be removed soon.
        
    - name: gdcConfig
      value: object
      description: >
        Configures the Google Distributed Cloud (GDC) environment for online prediction. Only set this field when the Endpoint is to be deployed in a GDC environment.
        
    - name: clientConnectionConfig
      value: object
      description: >
        Configurations that are applied to the endpoint for online prediction.
        
    - name: genAiAdvancedFeaturesConfig
      value: object
      description: >
        Optional. Configuration for GenAiAdvancedFeatures. If the endpoint is serving GenAI models, advanced features like native RAG integration can be configured. Currently, only Model Garden models are supported.
        
    - name: endpointId
      value: string

`UPDATE` examples

patch
update

Updates an Endpoint.

UPDATE google.aiplatform.endpoints
SET 
data__displayName = '{{ displayName }}',
data__description = '{{ description }}',
data__trafficSplit = '{{ trafficSplit }}',
data__etag = '{{ etag }}',
data__labels = '{{ labels }}',
data__encryptionSpec = '{{ encryptionSpec }}',
data__network = '{{ network }}',
data__enablePrivateServiceConnect = {{ enablePrivateServiceConnect }},
data__privateServiceConnectConfig = '{{ privateServiceConnectConfig }}',
data__predictRequestResponseLoggingConfig = '{{ predictRequestResponseLoggingConfig }}',
data__dedicatedEndpointEnabled = {{ dedicatedEndpointEnabled }},
data__gdcConfig = '{{ gdcConfig }}',
data__clientConnectionConfig = '{{ clientConnectionConfig }}',
data__genAiAdvancedFeaturesConfig = '{{ genAiAdvancedFeaturesConfig }}'
WHERE 
projectsId = '{{ projectsId }}' --required
AND locationsId = '{{ locationsId }}' --required
AND endpointsId = '{{ endpointsId }}' --required
AND updateMask = '{{ updateMask }}'
RETURNING
name,
clientConnectionConfig,
createTime,
dedicatedEndpointDns,
dedicatedEndpointEnabled,
deployedModels,
description,
displayName,
enablePrivateServiceConnect,
encryptionSpec,
etag,
gdcConfig,
genAiAdvancedFeaturesConfig,
labels,
modelDeploymentMonitoringJob,
network,
predictRequestResponseLoggingConfig,
privateServiceConnectConfig,
satisfiesPzi,
satisfiesPzs,
trafficSplit,
updateTime;
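
The `updateMask` parameter is a field mask, which is written as a comma-separated list of field names. One way to derive it is to compare the current Endpoint with the desired values and list only the fields that differ. The Python sketch below assumes both are available as plain dictionaries keyed by the field names used on this page; the helper names are hypothetical, and whether the service expects these exact spellings in the mask is an assumption.

```python
def changed_fields(current: dict, desired: dict) -> dict:
    """Return the entries of `desired` whose values differ from `current`."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


def build_update_mask(changes: dict) -> str:
    """Join the changed field names into a comma-separated field mask."""
    return ",".join(sorted(changes))


# Hypothetical before and after values.
current = {"displayName": "fraud-model", "description": "v1"}
desired = {"displayName": "fraud-model", "description": "v2", "labels": {"env": "prod"}}

changes = changed_fields(current, desired)
print(build_update_mask(changes))
```

Listing only the changed fields keeps the patch from touching values that were not meant to change.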

Updates an Endpoint with a long running operation.

UPDATE google.aiplatform.endpoints
SET 
data__endpoint = '{{ endpoint }}'
WHERE 
projectsId = '{{ projectsId }}' --required
AND locationsId = '{{ locationsId }}' --required
AND endpointsId = '{{ endpointsId }}' --required
RETURNING
name,
done,
error,
metadata,
response;
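
The `RETURNING` columns above describe a long-running operation: `done` indicates whether it has finished, and a finished operation carries either `error` or `response`. The Python sketch below classifies one returned row. It assumes the row has already been decoded into a dictionary with a boolean `done`; the function name is hypothetical.

```python
def operation_outcome(row: dict) -> str:
    """Classify a long-running operation row as pending, failed or succeeded."""
    if not row.get("done"):
        return "pending"    # still running; query the operation again later
    if row.get("error"):
        return "failed"     # finished with an error payload
    return "succeeded"      # finished; the result is in row["response"]


# Hypothetical rows, used only for illustration.
print(operation_outcome({"name": "operations/123", "done": False}))
print(operation_outcome({"name": "operations/123", "done": True, "response": {}}))
```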

`DELETE` examples

delete

Deletes an Endpoint.

DELETE FROM google.aiplatform.endpoints
WHERE projectsId = '{{ projectsId }}' --required
AND locationsId = '{{ locationsId }}' --required
AND endpointsId = '{{ endpointsId }}' --required
;

Lifecycle Methods

Deploys a Model into this Endpoint, creating a DeployedModel within it.

EXEC google.aiplatform.endpoints.deploy_model 
@projectsId='{{ projectsId }}', -- required
@locationsId='{{ locationsId }}', -- required
@endpointsId='{{ endpointsId }}' -- required
@@json=
'{
"deployedModel": "{{ deployedModel }}", 
"trafficSplit": "{{ trafficSplit }}"
}'
;

Undeploys a Model from an Endpoint, removing a DeployedModel from it, and freeing all resources it's using.

EXEC google.aiplatform.endpoints.undeploy_model 
@projectsId='{{ projectsId }}', -- required
@locationsId='{{ locationsId }}', -- required
@endpointsId='{{ endpointsId }}' -- required
@@json=
'{
"deployedModelId": "{{ deployedModelId }}", 
"trafficSplit": "{{ trafficSplit }}"
}'
;

Updates an existing deployed model. Updatable fields include min_replica_count, max_replica_count, required_replica_count, autoscaling_metric_specs, disable_container_logging (v1 only), and enable_container_logging (v1beta1 only).

EXEC google.aiplatform.endpoints.mutate_deployed_model 
@projectsId='{{ projectsId }}', -- required
@locationsId='{{ locationsId }}', -- required
@endpointsId='{{ endpointsId }}' -- required
@@json=
'{
"deployedModel": "{{ deployedModel }}", 
"updateMask": "{{ updateMask }}"
}'
;

Perform an online prediction.

EXEC google.aiplatform.endpoints.predict 
@endpointsId='{{ endpointsId }}' --required 
@@json=
'{
"instances": "{{ instances }}", 
"parameters": "{{ parameters }}"
}'
;

Perform an online prediction with an arbitrary HTTP payload. The response includes the following HTTP headers: * X-Vertex-AI-Endpoint-Id: ID of the Endpoint that served this prediction. * X-Vertex-AI-Deployed-Model-Id: ID of the Endpoint's DeployedModel that served this prediction.

EXEC google.aiplatform.endpoints.raw_predict 
@projectsId='{{ projectsId }}', -- required
@locationsId='{{ locationsId }}', -- required
@endpointsId='{{ endpointsId }}' -- required
@@json=
'{
"httpBody": "{{ httpBody }}"
}'
;

Perform a streaming online prediction with an arbitrary HTTP payload.

EXEC google.aiplatform.endpoints.stream_raw_predict 
@projectsId='{{ projectsId }}', -- required
@locationsId='{{ locationsId }}', -- required
@endpointsId='{{ endpointsId }}' -- required
@@json=
'{
"httpBody": "{{ httpBody }}"
}'
;

Perform a unary online prediction request to a gRPC model server for Vertex first-party products and frameworks.

EXEC google.aiplatform.endpoints.direct_predict 
@projectsId='{{ projectsId }}', -- required
@locationsId='{{ locationsId }}', -- required
@endpointsId='{{ endpointsId }}' -- required
@@json=
'{
"inputs": "{{ inputs }}", 
"parameters": "{{ parameters }}"
}'
;

Perform a unary online prediction request to a gRPC model server for custom containers.

EXEC google.aiplatform.endpoints.direct_raw_predict 
@projectsId='{{ projectsId }}', -- required
@locationsId='{{ locationsId }}', -- required
@endpointsId='{{ endpointsId }}' -- required
@@json=
'{
"methodName": "{{ methodName }}", 
"input": "{{ input }}"
}'
;

Perform a server-side streaming online prediction request for Vertex LLM streaming.

EXEC google.aiplatform.endpoints.server_streaming_predict 
@projectsId='{{ projectsId }}', -- required
@locationsId='{{ locationsId }}', -- required
@endpointsId='{{ endpointsId }}' -- required
@@json=
'{
"inputs": "{{ inputs }}", 
"parameters": "{{ parameters }}"
}'
;

Successful response

EXEC google.aiplatform.endpoints.predict_long_running 
@endpointsId='{{ endpointsId }}' --required 
@@json=
'{
"instances": "{{ instances }}", 
"parameters": "{{ parameters }}"
}'
;

Perform an online explanation. If deployed_model_id is specified, the corresponding DeployedModel must have explanation_spec populated. If deployed_model_id is not specified, all DeployedModels must have explanation_spec populated.

EXEC google.aiplatform.endpoints.explain 
@projectsId='{{ projectsId }}', -- required
@locationsId='{{ locationsId }}', -- required
@endpointsId='{{ endpointsId }}' -- required
@@json=
'{
"instances": "{{ instances }}", 
"parameters": "{{ parameters }}", 
"explanationSpecOverride": "{{ explanationSpecOverride }}", 
"deployedModelId": "{{ deployedModelId }}"
}'
;

Generate content with multimodal inputs.

EXEC google.aiplatform.endpoints.generate_content 
@endpointsId='{{ endpointsId }}' --required 
@@json=
'{
"contents": "{{ contents }}", 
"systemInstruction": "{{ systemInstruction }}", 
"cachedContent": "{{ cachedContent }}", 
"tools": "{{ tools }}", 
"toolConfig": "{{ toolConfig }}", 
"labels": "{{ labels }}", 
"safetySettings": "{{ safetySettings }}", 
"modelArmorConfig": "{{ modelArmorConfig }}", 
"generationConfig": "{{ generationConfig }}"
}'
;

Generate content with multimodal inputs with streaming support.

EXEC google.aiplatform.endpoints.stream_generate_content 
@endpointsId='{{ endpointsId }}' --required 
@@json=
'{
"contents": "{{ contents }}", 
"systemInstruction": "{{ systemInstruction }}", 
"cachedContent": "{{ cachedContent }}", 
"tools": "{{ tools }}", 
"toolConfig": "{{ toolConfig }}", 
"labels": "{{ labels }}", 
"safetySettings": "{{ safetySettings }}", 
"modelArmorConfig": "{{ modelArmorConfig }}", 
"generationConfig": "{{ generationConfig }}"
}'
;

Perform token counting.

EXEC google.aiplatform.endpoints.count_tokens 
@endpointsId='{{ endpointsId }}' --required 
@@json=
'{
"model": "{{ model }}", 
"instances": "{{ instances }}", 
"contents": "{{ contents }}", 
"systemInstruction": "{{ systemInstruction }}", 
"tools": "{{ tools }}", 
"generationConfig": "{{ generationConfig }}"
}'
;

Return a list of tokens based on the input text.

EXEC google.aiplatform.endpoints.compute_tokens 
@endpointsId='{{ endpointsId }}' --required 
@@json=
'{
"instances": "{{ instances }}", 
"model": "{{ model }}", 
"contents": "{{ contents }}"
}'
;

Fetch an asynchronous online prediction operation.

EXEC google.aiplatform.endpoints.fetch_predict_operation 
@endpointsId='{{ endpointsId }}' --required 
@@json=
'{
"operationName": "{{ operationName }}"
}'
;
