RayService Quickstart

Prerequisites

This guide mainly focuses on the behavior of KubeRay v1.3.0 and Ray 2.41.0.

What’s a RayService?

A RayService manages these components:

  • RayCluster: Manages resources in a Kubernetes cluster.

  • Ray Serve Applications: Manages users’ applications.

What does the RayService provide?

  • Kubernetes-native support for Ray clusters and Ray Serve applications: After using a Kubernetes configuration to define a Ray cluster and its Ray Serve applications, you can use kubectl to create the cluster and its applications.

  • In-place updating for Ray Serve applications: See RayService for more details.

  • Zero downtime upgrading for Ray clusters: See RayService for more details.

  • Highly available services: See RayService high availability for more details.

Example: Serve two simple Ray Serve applications using RayService

Step 1: Create a Kubernetes cluster with Kind

kind create cluster --image=kindest/node:v1.26.0

Step 2: Install the KubeRay operator

Follow this document to install the latest stable KubeRay operator from the Helm repository. Note that the YAML file in this example uses serveConfigV2 to specify a multi-application Serve configuration, available starting from KubeRay v0.6.0.
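To make the multi-application idea concrete, here is a minimal Python sketch of the shape such a configuration takes once parsed: a list of applications, each with its own name, import path, and route prefix. The field names `name`, `import_path`, and `route_prefix` follow the Ray Serve config schema. The application names and import paths below are illustrative placeholders, not the contents of the sample YAML; the `/fruit` and `/calc` routes are the ones this guide queries in Step 6. The `validate` helper is hypothetical and only checks that names and route prefixes are unique across applications.

```python
from collections import Counter

# Illustrative stand-in for a parsed multi-application Serve config.
# App names and import paths are placeholders (assumptions), not the
# values used by ray-service.sample.yaml.
serve_config = {
    "applications": [
        {"name": "fruit_app", "import_path": "fruit_module:app", "route_prefix": "/fruit"},
        {"name": "calc_app", "import_path": "calc_module:app", "route_prefix": "/calc"},
    ]
}


def find_duplicates(config: dict, field: str) -> list:
    """Return the values of `field` shared by more than one application."""
    counts = Counter(app[field] for app in config["applications"])
    return sorted(value for value, n in counts.items() if n > 1)


def validate(config: dict) -> None:
    """Raise ValueError if two applications share a name or a route prefix."""
    for field in ("name", "route_prefix"):
        dupes = find_duplicates(config, field)
        if dupes:
            raise ValueError(f"duplicate {field}: {dupes}")


validate(serve_config)  # passes: names and route prefixes are unique
```

Keeping one entry per application is what lets a single RayService host several independent Serve applications behind different routes.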

Step 3: Install a RayService

kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.3.0/ray-operator/config/samples/ray-service.sample.yaml

Step 4: Verify the Kubernetes cluster status

# List all RayService custom resources in the `default` namespace.
kubectl get rayservice
NAME                SERVICE STATUS   NUM SERVE ENDPOINTS
rayservice-sample   Running          2
# List all RayCluster custom resources in the `default` namespace.
kubectl get raycluster
NAME                                 DESIRED WORKERS   AVAILABLE WORKERS   CPUS    MEMORY   GPUS   STATUS   AGE
rayservice-sample-raycluster-czjtm   1                 1                   2500m   4Gi      0      ready    4m21s
# List all Ray Pods in the `default` namespace.
kubectl get pods -l=ray.io/is-ray-node=yes
NAME                                                          READY   STATUS    RESTARTS   AGE
rayservice-sample-raycluster-czjtm-head-ldxl7                 1/1     Running   0          4m21s
rayservice-sample-raycluster-czjtm-small-group-worker-pk88k   1/1     Running   0          4m21s
# Check the `Ready` condition of the RayService.
# The RayService is ready to serve requests when the condition is `True`.
# You can also run `kubectl describe rayservices.ray.io rayservice-sample` and check the Conditions section.
kubectl get rayservice rayservice-sample -o json | jq -r '.status.conditions[] | select(.type=="Ready") | to_entries[] | "\(.key): \(.value)"'
lastTransitionTime: 2025-04-11T16:17:01Z
message: Number of serve endpoints is greater than 0
observedGeneration: 1
reason: NonZeroServeEndpoints
status: True
type: Ready
# List services in the `default` namespace.
kubectl get services -o json | jq -r '.items[].metadata.name'
kuberay-operator
kubernetes
rayservice-sample-head-svc
rayservice-sample-raycluster-czjtm-head-svc
rayservice-sample-serve-svc

When the Ray Serve applications are healthy and ready, KubeRay creates a head service and a Ray Serve service for the RayService custom resource. In this example, they are rayservice-sample-head-svc and rayservice-sample-serve-svc.

What do these services do?

  • rayservice-sample-head-svc
    This service points to the head pod of the active RayCluster and is typically used to view the Ray Dashboard (port 8265).

  • rayservice-sample-serve-svc
    This service exposes the HTTP interface of Ray Serve, typically on port 8000.
    Use this service to send HTTP requests to your deployed Serve applications (for example, REST APIs or ML inference).
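The `Ready` check from Step 4 can also be scripted. The following is a minimal Python sketch, not part of KubeRay itself: `is_ready` inspects the JSON that `kubectl get rayservice rayservice-sample -o json` returns, and `service_names` assumes the `<name>-head-svc` / `<name>-serve-svc` naming pattern seen in this example. All three function names are hypothetical helpers introduced for illustration, and `fetch_rayservice` assumes `kubectl` is on the PATH and configured for the cluster from Step 1.

```python
import json
import subprocess


def service_names(rayservice_name: str) -> dict:
    """Service names for a RayService, following the pattern in this example."""
    return {
        "head": f"{rayservice_name}-head-svc",
        "serve": f"{rayservice_name}-serve-svc",
    }


def is_ready(rayservice: dict) -> bool:
    """True if the RayService has a `Ready` condition whose status is "True"."""
    conditions = rayservice.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def fetch_rayservice(name: str) -> dict:
    """Run `kubectl get rayservice <name> -o json` and parse its output.

    Assumes kubectl is installed and points at the target cluster.
    """
    result = subprocess.run(
        ["kubectl", "get", "rayservice", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

With a running cluster, `is_ready(fetch_rayservice("rayservice-sample"))` gives a boolean that a script can poll before sending traffic.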

Step 5: Verify the status of the Serve applications

# (1) Forward the dashboard port to localhost.
# (2) Check the Serve page in the Ray dashboard at http://localhost:8265/#/serve.
kubectl port-forward svc/rayservice-sample-head-svc 8265:8265 > /dev/null &
  • Refer to rayservice-troubleshooting.md for more details on RayService observability.

[Figure: Ray Serve Dashboard, a screenshot of the Serve page in the Ray dashboard]
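Application status can also be read programmatically while the port-forward is running. The sketch below is hedged: it assumes the dashboard port exposes the Ray Serve REST endpoint `GET /api/serve/applications/` and that the JSON response contains an `applications` mapping with a `status` field per application, so verify both against the Ray version in use. The helper names `summarize_apps` and `fetch_serve_details` are hypothetical.

```python
import json
import urllib.request


def summarize_apps(details: dict) -> dict:
    """Map each Serve application name to its reported status string."""
    apps = details.get("applications", {})
    return {name: app.get("status") for name, app in apps.items()}


def fetch_serve_details(base_url: str = "http://localhost:8265") -> dict:
    """GET the Serve REST API through the forwarded dashboard port.

    Assumes the `kubectl port-forward` from this step is still running.
    """
    url = f"{base_url}/api/serve/applications/"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling `summarize_apps(fetch_serve_details())` returns one status per application, which is easier to assert on in a script than the dashboard page.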

Step 6: Send requests to the Serve applications through the Kubernetes serve service

# Step 6.1: Run a curl Pod.
# If you already have a curl Pod, you can use `kubectl exec -it <curl-pod> -- sh` to access the Pod.
kubectl run curl --image=radial/busyboxplus:curl --command -- tail -f /dev/null
# Step 6.2: Send a request to the fruit stand app.
kubectl exec curl -- curl -sS -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc:8000/fruit/ -d '["MANGO", 2]'
6
# Step 6.3: Send a request to the calculator app.
kubectl exec curl -- curl -sS -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc:8000/calc/ -d '["MUL", 3]'
15 pizzas please!
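The same requests can be issued from Python instead of a curl Pod. This is a minimal sketch using only the standard library. It assumes the serve service is reachable at `http://localhost:8000`, for example through a separate `kubectl port-forward svc/rayservice-sample-serve-svc 8000:8000`, which this guide does not set up. `build_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(route: str, payload, base_url: str = "http://localhost:8000"):
    """Build a JSON POST request for a Serve route such as "/calc/"."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{route.strip('/')}/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `send(build_request("/fruit/", ["MANGO", 2]))` mirrors the curl call to the fruit stand app, and `send(build_request("/calc/", ["MUL", 3]))` mirrors the call to the calculator app.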

Step 7: Clean up the Kubernetes cluster

# Kill the `kubectl port-forward` background job started in Step 5.
killall kubectl
kind delete cluster

Next steps

  • See the RayService document for the full list of RayService features, including in-place updates, zero-downtime upgrades, and high availability.

  • See the RayService troubleshooting guide if you encounter any issues.

  • See Examples for more RayService examples. The MobileNet example is a good starting point because it doesn’t require GPUs and is easy to run on a local machine.