-
Notifications
You must be signed in to change notification settings - Fork 183
[8.0] feat: PoolCE takes care of RAM requirements #8232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: rel-v8r0
Are you sure you want to change the base?
Conversation
if availableMemory < requestedMemory: | ||
return None | ||
|
||
return requestedMemory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have overlooked something, do we specify a default RAM value for the jobs?
Because if I understand correctly, getRAMInUse()
depends on the value returned by _getMemoryForJobs()
, which would be 0 if the tag *GB
or *GB_MAX
is not specified (?).
So if we have n
jobs with no "RAM" tag, we will always have getRAMInUse = 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no obvious default value for RAM (for processors we specify 1 which is a slightly more obvious default value). So, yes, as it stands the default value for RAM is 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RAM and NumberOfProcessors should be treated in exactly the same way. These are resources with integer values, having default values and summing up for jobs on the same node. So, may be common methods (or class) can be introduced for those, may be other tags can be added eventually, e.g. disk space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having RAM in GBs isn't fine enough for an integer value, having less than 1GB per core is sensible on some systems and jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be made MB instead?
self.ceParameters["MaxTotalJobs"] = self.processors | ||
max_ram = int(self.ceParameters.get("MaxRAM", 0)) | ||
if max_ram > 0: | ||
self.ram = max_ram // 1024 # Convert from MB to GB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we specify MaxRAM
in GB directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specification comes from BDII2CSAgent. I guess just "historical". Maybe MB should be used everywhere...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see thanks!
Good question, may be it would be easier to have a single unit indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be better GB everywhere. BDII2CSAgent can be updated if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
I've done a quick review: Please let me know if you need anything else from me on this. (I didn't quite hear what you said about it in the most recent BiLD meeting: there was some local noise just as you were talking about it).
The other CEs (singularity/inprocess) should probably set the same memory setting based on the tags too if they've not already been set by PoolCE for cases where the PoolCE isn't in use.
Regards,
Simon
# only NumberOfProcessors for now, but RAM (or other stuff) can also be added | ||
jobID = int(kwargs.get("jobDesc", {}).get("jobID", 0)) | ||
cd.setOptionInCFG("/Resources/Computing/JobLimits/%d/NumberOfProcessors" % jobID, processorsForJob) | ||
cd.setOptionInCFG("/Resources/Computing/JobLimits/%d/MaxRAM" % jobID, memoryForJob) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention to read this value in the innerCE to pass it to the CG2Manager? (It should just be a case of setting MemoryLimitMB in (a copy of) the ceParameters dictionary just before it's given to systemCall).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the intention to read this value in the innerCE to pass it to the CG2Manager?
yes.
But I do not fully understand the rest of the phrase.
6c09d92
to
ee00569
Compare
|
||
self.processors = int(self.ceParameters.get("NumberOfProcessors", self.processors)) | ||
self.ceParameters["MaxTotalJobs"] = self.processors | ||
max_ram = int(self.ceParameters.get("MaxRAM", 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining this at the CE level isn't going to work very well (e.g. the HLTFarm has several different hardware configurations with wildly different RAM per core).
Can we inspect the host to have a sensible default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already at the "inner CE" level (this is a very common confusion). So, basically it is at the level of the Worker Node.
BEGINRELEASENOTES
*WMS
NEW: PoolComputingElement takes RAM requirements into consideration
ENDRELEASENOTES