Skip to content

Conversation

fstagni
Copy link
Contributor

@fstagni fstagni commented Jun 25, 2025

BEGINRELEASENOTES

*WMS
NEW: PoolComputingElement takes RAM requirements into consideration

ENDRELEASENOTES

@DIRACGridBot DIRACGridBot added the alsoTargeting:integration Cherry pick this PR to integration after merge label Jun 25, 2025
if availableMemory < requestedMemory:
return None

return requestedMemory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have overlooked something, do we specify a default RAM value for the jobs?

Because if I understand correctly, getRAMInUse() depends on the value returned by _getMemoryForJobs(), which would be 0 if the tag *GB or *GB_MAX is not specified (?).
So if we have n jobs with no "RAM" tag, we will always have getRAMInUse = 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no obvious default value for RAM (for processors we specify 1 which is a slightly more obvious default value). So, yes, as it stands the default value for RAM is 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAM and NumberOfProcessors should be treated in exactly the same way. These are resources with integer values, having default values and summing up for jobs on the same node. So, may be common methods (or class) can be introduced for those, may be other tags can be added eventually, e.g. disk space.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having RAM in GBs isn't fine enough for an integer value, having less than 1GB per core is sensible on some systems and jobs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be made MB instead?

self.ceParameters["MaxTotalJobs"] = self.processors
max_ram = int(self.ceParameters.get("MaxRAM", 0))
if max_ram > 0:
self.ram = max_ram // 1024 # Convert from MB to GB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we specify MaxRAM in GB directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specification comes from BDII2CSAgent. I guess just "historical". Maybe MB should be used everywhere...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see thanks!
Good question, may be it would be easier to have a single unit indeed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be better GB everywhere. BDII2CSAgent can be updated if necessary.

@fstagni fstagni requested a review from sfayer June 26, 2025 09:49
Copy link
Member

@sfayer sfayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

I've done a quick review: Please let me know if you need anything else from me on this. (I didn't quite hear what you said about it in the most recent BiLD meeting: there was some local noise just as you were talking about it).

The other CEs (singularity/inprocess) should probably set the same memory setting based on the tags too if they've not already been set by PoolCE for cases where the PoolCE isn't in use.

Regards,
Simon

# only NumberOfProcessors for now, but RAM (or other stuff) can also be added
jobID = int(kwargs.get("jobDesc", {}).get("jobID", 0))
cd.setOptionInCFG("/Resources/Computing/JobLimits/%d/NumberOfProcessors" % jobID, processorsForJob)
cd.setOptionInCFG("/Resources/Computing/JobLimits/%d/MaxRAM" % jobID, memoryForJob)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention to read this value in the innerCE to pass it to the CG2Manager? (It should just be a case of setting MemoryLimitMB in (a copy of) the ceParameters dictionary just before it's given to systemCall).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention to read this value in the innerCE to pass it to the CG2Manager?

yes.
But I do not fully understand the rest of the phrase.

@fstagni fstagni force-pushed the 80_Pool_RAM branch 3 times, most recently from 6c09d92 to ee00569 Compare July 2, 2025 12:21

self.processors = int(self.ceParameters.get("NumberOfProcessors", self.processors))
self.ceParameters["MaxTotalJobs"] = self.processors
max_ram = int(self.ceParameters.get("MaxRAM", 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining this at the CE level isn't going to work very well (e.g. the HLTFarm has several different hardware configurations with wildly different RAM per core).

Can we inspect the host to have a sensible default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already at the "inner CE" level (this is a very common confusion). So, basically it is at the level of the Worker Node.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
alsoTargeting:integration Cherry pick this PR to integration after merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants