Skip to content

fix cpu count#2211

Closed
NanoNabla wants to merge 1 commit intocoqui-ai:mainfrom
NanoNabla:fix_mp_cpucount
Closed

fix cpu count#2211
NanoNabla wants to merge 1 commit intocoqui-ai:mainfrom
NanoNabla:fix_mp_cpucount

Conversation

@NanoNabla
Copy link
Contributor

@NanoNabla NanoNabla commented May 12, 2022

Using os.cpu_count()) can result too high numbers when using cluster management software e.g. SLURM.
Therefore, more processes than available cpus are started which result in low performance.

Besides cluster management software environments with linux standard taskset are also affected.

I replaced os.cpu_count() with len(psutil.Process().cpu_affinity()) which sould return a real value of available processors.

An example is

taskset -c 0 ./main.py
# main.py
import os
import psutil
print(os.cpu_count())
print(len(os.sched_getaffinity(0)))
print(len(psutil.Process().cpu_affinity()))

More information about this issue can be found here

@CLAassistant
Copy link

CLAassistant commented May 12, 2022

CLA assistant check
All committers have signed the CLA.

@wasertech
Copy link
Collaborator

I have introduced this function with #2249

def available_cpu_count():
    """Number of available virtual or physical CPUs on this system, i.e.
    user/real as output by time(1) when called with an optimally scaling
    userspace-only program
    See this https://stackoverflow.com/a/1006301/13561390"""

    # cpuset
    # cpuset may restrict the number of *available* processors
    try:
        m = re.search(r"(?m)^Cpus_allowed:\s*(.*)$", open("/proc/self/status").read())
        if m:
            res = bin(int(m.group(1).replace(",", ""), 16)).count("1")
            if res > 0:
                return res
    except IOError:
        pass

    # Python 2.6+
    try:
        import multiprocessing

        return multiprocessing.cpu_count()
    except (ImportError, NotImplementedError):
        pass

    # https://github.com/giampaolo/psutil
    try:
        import psutil

        return psutil.cpu_count()  # psutil.NUM_CPUS on old versions
    except (ImportError, AttributeError):
        pass

    # POSIX
    try:
        res = int(os.sysconf("SC_NPROCESSORS_ONLN"))

        if res > 0:
            return res
    except (AttributeError, ValueError):
        pass

    # Windows
    try:
        res = int(os.environ["NUMBER_OF_PROCESSORS"])

        if res > 0:
            return res
    except (KeyError, ValueError):
        pass

    # jython
    try:
        from java.lang import Runtime

        runtime = Runtime.getRuntime()
        res = runtime.availableProcessors()
        if res > 0:
            return res
    except ImportError:
        pass

    # BSD
    try:
        sysctl = subprocess.Popen(["sysctl", "-n", "hw.ncpu"], stdout=subprocess.PIPE)
        scStdout = sysctl.communicate()[0]
        res = int(scStdout)

        if res > 0:
            return res
    except (OSError, ValueError):
        pass

    # Linux
    try:
        res = open("/proc/cpuinfo").read().count("processor\t:")

        if res > 0:
            return res
    except IOError:
        pass

    # Solaris
    try:
        pseudoDevices = os.listdir("/devices/pseudo/")
        res = 0
        for pd in pseudoDevices:
            if re.match(r"^cpuid@[0-9]+$", pd):
                res += 1

        if res > 0:
            return res
    except OSError:
        pass

    # Other UNIXes (heuristic)
    try:
        try:
            dmesg = open("/var/run/dmesg.boot").read()
        except IOError:
            dmesgProcess = subprocess.Popen(["dmesg"], stdout=subprocess.PIPE)
            dmesg = dmesgProcess.communicate()[0]

        res = 0
        while "\ncpu" + str(res) + ":" in dmesg:
            res += 1

        if res > 0:
            return res
    except OSError:
        pass

    raise Exception("Can not determine number of CPUs on this system")

Maybe we could use it stt-wide?

@NanoNabla
Copy link
Contributor Author

Since your code seems to work with taskset on Linux I'm fine with using it stt-wide. But since this PR seems to be of no interest I do not have time to take any effort into this anymore.

@wasertech
Copy link
Collaborator

wasertech commented Jul 5, 2022

since this PR seems to be of no interest I do not have time to take any effort into this anymore.

That's because your code didn't pass the lint pre-commit checks, my guess is that you didn't hooked pre-commit correctly like described in the Code Linting section of CONTRIBUTING.rst.

I'll try to take some time to make it happen then.

@wasertech
Copy link
Collaborator

#2253 should do nicely.

@wasertech wasertech mentioned this pull request Jul 5, 2022
@wasertech
Copy link
Collaborator

We can probably also close this PR in favor of #2253

@NanoNabla NanoNabla closed this Jul 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants