Run A RayJob
This page shows how to leverage Kueue’s scheduling and resource management capabilities when running KubeRay’s RayJob.
This guide is for batch users that have a basic understanding of Kueue. For more information, see Kueue’s overview.
Before you begin
- Make sure you are using Kueue v0.6.0 or newer and KubeRay v1.1.0 or newer.
- Check Administer cluster quotas for details on the initial Kueue setup.
- See KubeRay Installation for installation and configuration details of KubeRay.
Note
With Kueue versions prior to v0.8.1, you need to restart Kueue after the installation in order to use RayJob. You can do it by running: kubectl delete pods -l control-plane=controller-manager -n kueue-system.
RayJob definition
When running RayJobs on Kueue, take into consideration the following aspects:
a. Queue selection
The target local queue should be specified in the metadata.labels
section of the RayJob configuration.
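For example, assuming a LocalQueue named user-queue exists in the job's namespace (the queue name here is illustrative), the relevant part of the manifest would look like this:

```yaml
metadata:
  labels:
    kueue.x-k8s.io/queue-name: user-queue
```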
b. Configure the resource needs
The resource needs of the workload can be configured in the spec.rayClusterSpec.
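As a minimal sketch of where the requests go (image tags and resource sizes are illustrative), each head and worker group template carries its own resources section, and Kueue derives the workload's quota usage from these requests:

```yaml
spec:
  rayClusterSpec:
    headGroupSpec:
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0   # illustrative image tag
              resources:
                requests:
                  cpu: "1"
                  memory: "2Gi"
    workerGroupSpecs:
      - groupName: small-group
        replicas: 1
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.9.0
                resources:
                  requests:
                    cpu: "1"
                    memory: "1Gi"
```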
c. Limitations
- A Kueue managed RayJob cannot use an existing RayCluster.
- The RayCluster should be deleted at the end of the job execution, so spec.shutdownAfterJobFinishes should be true (see the sketch after this list).
- Because Kueue will reserve resources for the RayCluster, spec.rayClusterSpec.enableInTreeAutoscaling should be false.
- Because a Kueue workload can have a maximum of 8 PodSets, the maximum number of spec.rayClusterSpec.workerGroupSpecs is 7.
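Put together, a sketch of the spec-level fields implied by these limitations (everything else omitted) is:

```yaml
spec:
  shutdownAfterJobFinishes: true     # the RayCluster is torn down when the job finishes
  rayClusterSpec:
    enableInTreeAutoscaling: false   # Kueue reserves resources, so in-tree autoscaling stays off
    workerGroupSpecs: []             # at most 7 entries; a Kueue workload supports up to 8 PodSets
```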
Example RayJob
In this example, the code is provided to the Ray framework via a ConfigMap.
The RayJob looks like the following:
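The full manifest from the original page is not reproduced here; the following is a trimmed sketch of what a Kueue-managed RayJob can look like, assuming a LocalQueue named user-queue and a ConfigMap named ray-job-code-sample that carries a sample_code.py entrypoint script (queue name, ConfigMap name, image tag, and resource sizes are all illustrative):

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  generateName: rayjob-sample-
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  shutdownAfterJobFinishes: true
  entrypoint: python /home/ray/samples/sample_code.py
  rayClusterSpec:
    rayVersion: "2.9.0"
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0
              resources:
                requests:
                  cpu: "1"
                  memory: "2Gi"
              volumeMounts:
                - name: code-sample
                  mountPath: /home/ray/samples
          volumes:
            - name: code-sample
              configMap:
                name: ray-job-code-sample
                items:
                  - key: sample_code.py
                    path: sample_code.py
    workerGroupSpecs:
      - groupName: small-group
        replicas: 1
        minReplicas: 1
        maxReplicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.9.0
                resources:
                  requests:
                    cpu: "1"
                    memory: "1Gi"
```

The ConfigMap holding sample_code.py is created separately and mounted into the head pod, which is how the code reaches the Ray framework as described above.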
You can run this RayJob with the following commands:
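For instance, assuming the manifest above (together with the ConfigMap) is saved as ray-job-sample.yaml (an illustrative file name), the job can be created and observed with:

```shell
kubectl create -f ray-job-sample.yaml
kubectl get rayjobs
kubectl get workloads
```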
Note
The example above comes from the KubeRay samples and only has the queue-name label added and the resource requests updated.