-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Setup cgroup v2 in C++ #49416
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: dentiny <[email protected]>
src/ray/common/cgroup/cgroup_utils.h
Outdated
// There're two types of memory cgroup constraints: | ||
// 1. For those with limit capped, they will be created a dedicated cgroup; | ||
// 2. For those without limit specified, they will be added to the default cgroup. | ||
inline constexpr std::string_view kDefaultCgroupUuid = "default_cgroup_uuid"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this default_cgroup_uuid
the name for the default cgroup? the word uuid
sounds strange to me, since this is a name but not a uuid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I choose the word uuid
here since context contains a data field called uuid
.
Signed-off-by: dentiny <[email protected]>
a615cab
to
feb0282
Compare
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we need more designs on the API. For that, please describe how would you like to use this API, e.g. in WorkerPool or NodeManager. Write some pseudo code in worker / task lifetime to highlight usage of cgroup api.
namespace ray { | ||
|
||
// Context used to setup cgroupv2 for a task / actor. | ||
struct PhysicalModeExecutionContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: do we need this separate config class from CgroupV2Setup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These data fields are necessary to construct and destruct cgroup;
As of now the struct doesn't seem that necessary since it only contains 4 fields and we could directly pass them into the factory function, but we could have much more fields (i.e. cpu-related, resource min / high), better to have a struct.
// 1. For those with limit capped, they will be created a dedicated cgroup; | ||
// 2. For those without limit specified, they will be added to the default cgroup. | ||
static constexpr std::string_view kDefaultCgroupV2Uuid = "default_cgroup_uuid"; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a uuid, and we should not use a name like this visible in linux. We had the idea of getting a default name from cluster ID right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed as id
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had the idea of getting a default name from cluster ID right?
I'm not sure how cluster id is related here?
return std::unique_ptr<CgroupV2Setup>(new CgroupV2Setup(std::move(ctx))); | ||
} | ||
|
||
CgroupV2Setup::~CgroupV2Setup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before deleting the cgroup v2 we need to first move proc out of it, or the deletion would fail. for that you need to somehow record the prev cgroup if any, or we can blindly move all those procs to the default cgroup. this can come from a "global cgroup mgr" in raylet:
class CgroupV2Manager {
ctor(default_cgroup_name);
bool PutPidIntoDefaultCgroupRemovingAnyCgroupsIfAny(pid);
bool CreateCgroupV2ForPid(pid, cgroup_name);
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before deleting the cgroup v2 we need to first move proc out of it, or the deletion would fail.
Yes.. fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, I add a few items based on our offline discussion:
- I add a README for cgroup
- I add comments on local task manager on how I plan to integrate the cgroup RAII class with local task manager
- I add comments on how I plan to prepare cgroup basic setup in raylet
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dentiny let's have a meeting to discuss about design of this PR.
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Offline discussion: need to document the precondition
|
Possible map to place cgroup: ray/src/ray/core_worker/transport/normal_task_submitter.h Lines 366 to 367 in c79de07
|
Possible place to put cgroup setup class: ray/src/ray/raylet/local_task_manager.cc Lines 728 to 737 in c79de07
|
Discussed offline with @rynewang , we hope two other items are included for easier review and roadmap:
|
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
Signed-off-by: dentiny <[email protected]>
@@ -387,6 +387,16 @@ void LocalTaskManager::DispatchScheduledTasksToWorkers() { | |||
const std::shared_ptr<WorkerInterface> worker, | |||
PopWorkerStatus status, | |||
const std::string &runtime_env_setup_error_message) -> bool { | |||
// TODO(hjiang): After getting the ready-to-use worker and task id, we're |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait a minute... The running_tasks
is inserted with task ID, before we call PopWorker
. This means if you change running_tasks to a map { task ID -> cgroup raii }
, the value must be nullptr at first, and is only set until PopWorker callback is called. I'm OK with this, but, @jjyao do you feel putting the cgroup raii (to remove proc from cgroup and delete cgroup on dtor) in LocalTaskManager's running_tasks is the best place? Is there a even more precise data structure somewhere in raylet to model a running task attempt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the value must be nullptr at first, and is only set until PopWorker callback is called
Yeah that's my plan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nullptr is needed anyway I think, for example, we need to handle cases where cgroup prerequisite is not met.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jjyao Do you have any thoughts?
A rewrite of #48788