.. _resource_sched_profiler:

=============================================
Resource Scheduling and Profiling with Nipype
=============================================
The latest version of Nipype supports system resource scheduling and profiling.
These features allow users to ensure high throughput of their data processing
while also controlling the amount of computing resources a given workflow will
use.

Specifying Resources in the Node Interface
==========================================
Each ``Node`` instance interface has two parameters that specify its expected
thread and memory usage: ``num_threads`` and ``estimated_memory_gb``. If a
particular node is expected to use 8 threads and 2 GB of memory:

::

    import nipype.pipeline.engine as pe
    import nipype.interfaces.fsl as fsl  # example interface; any interface works here

    # a Node wraps an interface and needs a name; MCFLIRT is purely illustrative
    node = pe.Node(fsl.MCFLIRT(), name='motion_correct')
    node.interface.num_threads = 8
    node.interface.estimated_memory_gb = 2

If the resource parameters are never set, they default to 1 thread and 1 GB of
RAM.

Resource Scheduler
==================
The ``MultiProc`` workflow plugin schedules node execution based on the
resources used by the currently running nodes and the total resources available
to the workflow. The plugin uses the plugin arguments ``n_procs`` and
``memory_gb`` to set the maximum resources a workflow may consume. To limit a
workflow to using 4 cores and 6 GB of RAM:

::

    args_dict = {'n_procs': 4, 'memory_gb': 6}
    workflow.run(plugin='MultiProc', plugin_args=args_dict)

If these values are not explicitly set, the plugin will assume it can use all of
the processors and memory on the system. For example, if the machine has 16
cores and 12 GB of RAM, the workflow will internally assume those values for
``n_procs`` and ``memory_gb``, respectively.
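
Roughly speaking, these defaults correspond to querying the system for its total
resources, along the lines of the following sketch (the exact calls Nipype uses
internally may differ):

::

    import psutil

    # approximate the scheduler's defaults: all logical CPUs and all RAM (in GB)
    default_n_procs = psutil.cpu_count()
    default_memory_gb = psutil.virtual_memory().total / (1024.0 ** 3)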

The plugin then queues eligible nodes for execution based on their expected
usage, as given by the ``num_threads`` and ``estimated_memory_gb`` interface
parameters. If the plugin sees that only 3 of its 4 processors and 4 GB of its
6 GB of RAM are in use, it will attempt to execute the next available node as
long as that node has ``num_threads = 1`` and ``estimated_memory_gb <= 2``. If
this is not the case, it will continue to check every available node in the
queue until it finds one that meets these conditions, or it will wait for an
executing node to finish and free up the necessary resources. The queue gives
highest priority to the nodes with the largest ``estimated_memory_gb``, followed
by the nodes with the largest expected ``num_threads``.
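
The eligibility check amounts to a simple fit test. The snippet below is only an
illustrative sketch of that rule, not Nipype's actual scheduler code:

::

    def node_fits(node, free_procs, free_memory_gb):
        """Return True if a node's estimates fit within the free resources."""
        return (node.interface.num_threads <= free_procs and
                node.interface.estimated_memory_gb <= free_memory_gb)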

Runtime Profiler and Using the Callback Log
===========================================
It is not always easy to estimate the amount of resources a particular function
or command will use. To help with this, Nipype provides feedback about the
system resources used by every node during workflow execution via its built-in
runtime profiler. The runtime profiler is automatically enabled if the
``psutil`` Python package is installed and found on the system. If the package
is not found, the workflow will run normally without the runtime profiler.
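
A quick way to check whether this dependency is available is a plain import test
(a minimal sketch):

::

    # if this import fails, Nipype silently skips runtime profiling
    try:
        import psutil
        print('psutil found; runtime profiling will be enabled')
    except ImportError:
        print('psutil not found; the workflow will run without the runtime profiler')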

The runtime profiler records the number of threads and the amount of memory (in
GB) used as ``runtime_threads`` and ``runtime_memory_gb`` in the node's
``result.runtime`` attribute. Since the node object is pickled and written to
disk in its working directory, these values are available for analysis after
node or workflow execution by parsing the pickled file contents in Python.
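
For example, one way to inspect these values after a node has run is to load the
pickled result file from the node's working directory. This is only a sketch:
the ``result_<nodename>.pklz`` filename and the working-directory path are
assumptions that may vary with your setup and Nipype version:

::

    from nipype.utils.filemanip import loadpkl

    # hypothetical path to a node's pickled result inside its working directory
    result = loadpkl('/home/user/wf_dir/resample_node/result_resample_node.pklz')
    print(result.runtime.runtime_threads)
    print(result.runtime.runtime_memory_gb)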

Nipype also provides a logging mechanism for saving node runtime statistics to
a JSON-style log file via the ``log_nodes_cb`` logger function. This is enabled
by setting the ``status_callback`` parameter to point to this function in the
``plugin_args`` when using the ``MultiProc`` plugin.

::

    from nipype.pipeline.plugins.callback_log import log_nodes_cb
    args_dict = {'n_procs': 4, 'memory_gb': 6,
                 'status_callback': log_nodes_cb}

To set the filepath for the callback log, the ``'callback'`` logger must be
configured.

::

    # Set path to log file
    import logging

    callback_log_path = '/home/user/run_stats.log'
    logger = logging.getLogger('callback')
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(callback_log_path)
    logger.addHandler(handler)

Finally, the workflow can be run.

::

    workflow.run(plugin='MultiProc', plugin_args=args_dict)

After the workflow finishes executing, the log file at
``/home/user/run_stats.log`` can be parsed for the runtime statistics. Here is
an example of what the contents would look like:

::

    {"name":"resample_node","id":"resample_node",
    "start":"2016-03-11 21:43:41.682258",
    "estimated_memory_gb":2,"num_threads":1}
    {"name":"resample_node","id":"resample_node",
    "finish":"2016-03-11 21:44:28.357519",
    "estimated_memory_gb":"2","num_threads":"1",
    "runtime_threads":"3","runtime_memory_gb":"1.118469238281"}

Here it can be seen that the number of threads was underestimated, while the
amount of memory needed was overestimated. The next time this workflow is run,
the user can change the node's ``num_threads`` and ``estimated_memory_gb``
interface parameters to reflect this and achieve higher pipeline throughput.
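
For programmatic analysis, the log can also be read back into Python. The sketch
below assumes each record in the file is written as a single JSON object per
line (the wrapping shown above is only for display):

::

    import json

    with open('/home/user/run_stats.log') as log_file:
        records = [json.loads(line) for line in log_file if line.strip()]

    # records with a 'finish' key also carry the measured runtime statistics
    for record in records:
        if 'finish' in record:
            print(record['name'], record['runtime_threads'],
                  record['runtime_memory_gb'])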

Visualizing Pipeline Resources
==============================
Nipype provides the ability to visualize the workflow execution based on the
runtimes and system resources each node uses.
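
One way to do this, assuming the ``generate_gantt_chart`` helper in
``nipype.utils.draw_gantt_chart`` is available in your Nipype version, is to
point it at the callback log produced above:

::

    from nipype.utils.draw_gantt_chart import generate_gantt_chart

    # 'cores' should match the number of processors given to the scheduler
    generate_gantt_chart('/home/user/run_stats.log', cores=4)
    # an HTML Gantt chart (e.g. run_stats.log.html) is written alongside the log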