Multi-GPU systems and GPU affinity

    11th February 2020 at 6:06 pm #28951

    Maya 2019.2
    Redshift 3.0.13
    Muster 9.0.14

    I’ve been doing some testing on our render farm to optimize our rendering power.
    A little background: we are running multiple render nodes and have a centralized install of Redshift. We have a couple of render nodes with 4x 2080 Tis, 64 GB of RAM, and 8 cores.

    For the longest time we've been rendering with a config that uses all 4 GPUs on a single frame. But I started setting up GPU affinity in Muster so that each of the 4 GPUs renders its own frame. This essentially renders our jobs twice as fast. Awesome!

    I am pretty sure I set up the affinity correctly: in the config, I increased the instances on the host to 4, then set the GPU affinity mask to GPU1, GPU2, GPU3, and GPU4 for the respective instances. So, for one host, I have 4 instances, each able to use only a single GPU. Is this the correct setup process?

    I’ve been testing on a fairly heavy scene to make sure the render nodes can hold up to the stress.
    I have run into one issue, though. Often I am getting a Maya error code in the log:
    Maya exited with status -1073741819. I believe the machines might be running out of memory?
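
    (For what it's worth, that exit status works out to the Windows access-violation code rather than an explicit out-of-memory status: -1073741819 + 2^32 = 3221225477 = 0xC0000005, i.e. STATUS_ACCESS_VIOLATION. Running out of memory could still be what triggers the crash, though.)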

    How does Muster dispatch these jobs when the host has 4 separate instances? Does it send 4 command-line batches to the host (one for each instance)? That would effectively load the same scene 4 times, which may be too much for the machines to handle.
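
    If that is how it works, I assume the four batches on one of these hosts would look roughly like the following, differing only in the GPU ordinal (paths and frame-range flags trimmed; the ordinal-to-card mapping is my guess):

        Render.exe -r redshift -logLevel 1 -gpu {0} -proj "<project>" -rl particles "<scene>.mb"
        Render.exe -r redshift -logLevel 1 -gpu {1} -proj "<project>" -rl particles "<scene>.mb"
        Render.exe -r redshift -logLevel 1 -gpu {2} -proj "<project>" -rl particles "<scene>.mb"
        Render.exe -r redshift -logLevel 1 -gpu {3} -proj "<project>" -rl particles "<scene>.mb"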

    I think Maya may just be overwhelmed by the amount of processing going on. But I am wondering if there is anything else I should take into consideration, or anything I can try, to rectify this and gain more rendering power/optimization.

    What I am trying to replicate is the Deadline feature that allows you to set the number of GPUs Redshift uses per task. In a perfect world, Muster would kick off a render, load the scene once, and then let Redshift use one GPU per frame rather than 4 GPUs per frame. Is there any way to do this? That may not be the right solution…

    Any other thoughts or ideas that would help with this configuration in general?

    Thank you!
    Adam

    I will post the log files for Muster and Redshift below to keep the thread clean. FYI, I've removed some client info from the project paths.

    11th February 2020 at 6:07 pm #28952

    Muster Log File:

    [MUSTER]Spawning process C:\Program Files\Autodesk\Maya2019\bin\Render.exe inside C:\Program Files\Autodesk\Maya2019\bin using the following command line flags:
    [MUSTER]-r redshift -logLevel 1 -gpu {2} -proj “\\192.168.1.6\Xsan\_PROJECTS\[PROJECT_PATH_REMOVED]\3D\Maya” -s 41.000 -e 44.000 -b 1.000 -rfs 41 -rfb 1 -pad 1 -rl particles “\\192.168.1.6\Xsan\_PROJECTS\[PROJECT_PATH_REMOVED]\3D\Maya\scenes\sn05_Fingerprint\sn05_sh01_printHallway\sn05_sh02_PrintHallway_MusterTest_v02.mb”

    Starting “C:\Program Files\Autodesk\Maya2019\bin\mayabatch.exe”
    VP2 Error : Failed to initialize graphics device.

    Cannot open file ‘:/expression.svg’, because: No such file or directory

    Cannot open file ‘:/expression.svg’, because: No such file or directory

    Cannot open file ‘:/expression.svg’, because: No such file or directory

    Cannot open file ‘:/expression.svg’, because: No such file or directory

    [Redshift] Redshift for Maya 2019

    [Redshift] Version 3.0.13, Dec 20 2019

    Installing Redshift MEL overrides…

    Sourcing //192.168.1.6/Xsan/_INTERNAL/Utility/Redshift/Plugins/Maya/Common/scripts/override/2019/connectNodeToAttrOverride.mel

    Sourcing //192.168.1.6/Xsan/_INTERNAL/Utility/Redshift/Plugins/Maya/Common/scripts/override/2019/createMayaSoftwareCommonGlobalsTab (2).mel

    Sourcing //192.168.1.6/Xsan/_INTERNAL/Utility/Redshift/Plugins/Maya/Common/scripts/override/2019/createMayaSoftwareCommonGlobalsTab.mel

    Sourcing //192.168.1.6/Xsan/_INTERNAL/Utility/Redshift/Plugins/Maya/Common/scripts/override/2019/createRenderNode.mel

    Sourcing //192.168.1.6/Xsan/_INTERNAL/Utility/Redshift/Plugins/Maya/Common/scripts/override/2019/mayaBatchRender.mel

    Sourcing //192.168.1.6/Xsan/_INTERNAL/Utility/Redshift/Plugins/Maya/Common/scripts/override/2019/MLdeleteUnused.mel

    Sourcing //192.168.1.6/Xsan/_INTERNAL/Utility/Redshift/Plugins/Maya/Common/scripts/override/2019/relationshipEditor.mel

    Sourcing //192.168.1.6/Xsan/_INTERNAL/Utility/Redshift/Plugins/Maya/Common/scripts/override/2019/renderWindowPanel.mel

    Sourcing //192.168.1.6/Xsan/_INTERNAL/Utility/Redshift/Plugins/Maya/Common/scripts/override/2019/renderWithCurrentRenderer.mel

    File read in 4.7 seconds.

    Result: //192.168.1.6/Xsan/_PROJECTS/[PROJECT_PATH_REMOVED]/3D/Maya/scenes/sn05_Fingerprint/sn05_sh01_printHallway/sn05_sh02_PrintHallway_MusterTest_v02.mb

    [Redshift] Cache path: C:\Users\pixel\AppData\Local\Redshift\Cache

    [Redshift]

    [Redshift] Redshift Initialized

    [Redshift] Version: 3.0.13, Dec 20 2019

    [Redshift] Windows Platform (Windows 10 Pro)

    [Redshift] Release Build

    [Redshift] Number of CPU HW threads: 16

    [Redshift] CPU speed: 3.79 GHz

    [Redshift] Total system memory: 63.88 GB

    [Redshift] TDR delay: 60s

    [Redshift] Current working dir: C:\Program Files\Autodesk\Maya2019\bin

    [Redshift] redshift_LICENSE=5055@192.168.3.25

    [Redshift] Creating CUDA contexts

    [Redshift] CUDA init ok

    [Redshift] Ordinals: { 2 }

    [Redshift] Initializing GPUComputing module (CUDA). Ordinal 2

    [Redshift] CUDA Ver: 10020

    [Redshift] Device 3/4 : GeForce RTX 2080 Ti

    [Redshift] Compute capability: 7.5

    [Redshift] Num multiprocessors: 68

    [Redshift] PCI busID: 66, deviceID: 0, domainID: 0

    [Redshift] Theoretical memory bandwidth: 616.000000 GB/Sec

    [Redshift] Measured PCIe bandwidth (pinned CPU->GPU): 6.110065 GB/s

    [Redshift] Measured PCIe bandwidth (pinned GPU->CPU): 2.536454 GB/s

    [Redshift] Measured PCIe bandwidth (paged CPU->GPU): 3.767165 GB/s

    [Redshift] Measured PCIe bandwidth (paged GPU->CPU): 2.396808 GB/s

    [Redshift] Estimated GPU->CPU latency (0): 0.049223 ms

    [Redshift] Estimated GPU->CPU latency (1): 0.049464 ms

    [Redshift] Estimated GPU->CPU latency (2): 0.049904 ms

    [Redshift] Estimated GPU->CPU latency (3): 0.050430 ms

    [Redshift] New CUDA context created

    [Redshift] Available memory: 9248.9625 MB out of 11264.0000 MB

    [Redshift]

    [Redshift] Loading Redshift procedural extensions…

    [Redshift] Done!

    [Redshift] Redshift for Maya 2019

    [Redshift] Version 3.0.13, Dec 20 2019

    [Redshift] Rendering layer ‘rs_particles’, frame 41 (1/4)

    [Redshift] Scene translation time: 4.52s

    [Redshift] =================================================================================================

    [Redshift] Rendering frame 41…

    [Redshift] AMM enabled

    [Redshift] =================================================================================================

    [Redshift] License acquired

    [Redshift] License for redshift-core 2020.09 (permanent)

    [Redshift]

    [Redshift] Rendering time: 17.5s (1 GPU(s) used)

    [Redshift] Saved file ‘\\192.168.1.6\Xsan\_PROJECTS\[PROJECT_PATH_REMOVED]\3D\Maya\images\sn05_Fingerprint\sn05_sh02\sn05_sh02_PrintHallway_MusterTest_v02\particles.41.exr’ in 1.34s

    [Redshift] Frame done – total time for layer ‘rs_particles’, frame 41 (1/4): 25.11s

    [Redshift] License returned

    [Redshift] Rendering layer ‘rs_particles’, frame 42 (2/4)

    [Redshift] Scene translation time: 2.88s

    [Redshift] =================================================================================================

    [Redshift] Rendering frame 42…

    [Redshift] AMM enabled

    [Redshift] =================================================================================================

    [Redshift] License acquired

    [Redshift] License for redshift-core 2020.09 (permanent)

    [Redshift]

    [Redshift] Rendering time: 17.1s (1 GPU(s) used)

    [Redshift] Saved file ‘\\192.168.1.6\Xsan\_PROJECTS\[PROJECT_PATH_REMOVED]\3D\Maya\images\sn05_Fingerprint\sn05_sh02\sn05_sh02_PrintHallway_MusterTest_v02\particles.42.exr’ in 1.34s

    [Redshift] Frame done – total time for layer ‘rs_particles’, frame 42 (2/4): 26.02s

    Stack trace:

    DependEngine.dll!TplugChildren::find

    DependEngine.dll!Tplug::findInBranchingNet

    OpenMaya.dll!Autodesk::Maya::OpenMaya20190000::MPlug::isConnected

    OpenMaya.dll!MPyUtil::valueToObject<Autodesk::Maya::OpenMaya20190000::MPlug>

    python27.dll!PyComplex_AsCComplex

    python27.dll!_PyObject_GenericGetAttrWithDict

    python27.dll!PyEval_EvalFrameEx

    python27.dll!PyEval_EvalCodeEx

    python27.dll!PyFunction_SetClosure

    python27.dll!PyObject_Call

    python27.dll!PyMethod_New

    python27.dll!PyObject_Call

    python27.dll!PyEval_CallObjectWithKeywords

    CommandEngine.dll!TpythonInterpreter::callPythonFunction

    OpenMaya.dll!THclient::operator=

    OpenMaya.dll!THclient::callbackInScriptNewAPI

    OpenMaya.dll!THclient::userAttributeChangedCallback

    OpenMaya.dll!THclient::userAttributeChangedCallbackCB

    Foundation.dll!TclientServer::notifyClients

    DependEngine.dll!TdependNode::sendAttributeChangedMsgInternal

    DependEngine.dll!TdrawDbChangeTracker::isListeningToSource

    DependEngine.dll!TevaluationManager::executeAsWorker
    DependEngine.dll!TparallelExecution::sAddNodesWithinDepthToSet

    tbb.dll!tbb::interface7::internal::task_arena_base::internal_execute

    DependEngine.dll!TparallelExecution::evaluateImplementation

    DependEngine.dll!TevaluationGraph::evaluate

    DependEngine.dll!TevaluationManager::evaluate

    DependEngine.dll!TcipEvaluationManager::operator=

    DependEngine.dll!TdependGraph::setTime

    OpenMayaAnim.dll!Autodesk::Maya::OpenMaya20190000::MAnimControl::setCurrentTime

    redshift4maya.mll!uninitializePlugin

    redshift4maya.mll!uninitializePlugin

    redshift4maya.mll!uninitializePlugin

    redshift4maya.mll!uninitializePlugin

    redshift4maya.mll!uninitializePlugin

    redshift4maya.mll!uninitializePlugin

    redshift4maya.mll!uninitializePlugin

    OpenMaya.dll!THcommandObject::doIt

    CommandEngine.dll!TmetaCommand::doCommand

    CommandEngine.dll!TmetaCommandPtrArray::catenate

    Result: //192.168.1.6/Xsan/_PROJECTS/[PROJECT_PATH_REMOVED]/3D/Maya/scenes/sn05_Fingerprint/sn05_sh01_printHallway/sn05_sh02_PrintHallway_MusterTest_v02.ma

    Fatal Error. Attempting to save in C:/Users/pixel/AppData/Local/Temp/pixel.20200210.1521.ma

    // Maya exited with status -1073741819

    [MUSTER]Process terminated with exit code: -1073741819

    11th February 2020 at 6:12 pm #28954

    Sorry, I accidentally posted the Muster log twice. I'm unable to post the Redshift log (I think it may be too large), but I can send it if needed.

    11th February 2020 at 7:18 pm #28957

    Hi Adam,

    A few points to clarify here:

    1) Deadline workers are the same as Muster instances. While Deadline's workers run in separate threads due to their internal design, we run instances under the same process with different connections. In our opinion this gives better handling of pools with different instances. In the end, both options spawn multiple command lines, so there's no difference in your scenario.

    2) To split GPUs in Redshift with multi-instancing, you have two options. The first is to not set a GPU affinity mask and to set the number of GPUs per instance to 1; that way, Muster will automatically change the command line to use one GPU per instance. Using an instance mask fits better in combination with the borrow instances feature. That is a more complex topic: you can have 4 instances but send a job to only two of them and lock the others. That way you can assign which GPUs instance 1 gets and which instance 2 gets, and lock instances 3 and 4. That said, your setup works fine either way.
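
    As a rough sketch of that second scenario on a 4-GPU node (the GPU assignments here are just an example, not a prescription):

        Instance 1 -> affinity mask GPU1, GPU2 (renders the job)
        Instance 2 -> affinity mask GPU3, GPU4 (renders the job)
        Instances 3 and 4 -> locked by the job, so no other work lands on them while it runs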

    As for the crash, it may be RAM or GPU overhead. If you dig into the borrow instances option, you can send light jobs to 4 instances and heavy jobs to 2 instances with 2 GPUs each.

    Hope this helps!

    11th February 2020 at 8:36 pm #28960

    Thank you Leonardo! This is very helpful info!

    Could you explain more about point number 2?

    The first is to not set a GPU affinity mask and to set the number of GPUs per instance to 1

    Is this basically what my current setup is?

    I'd be interested to hear more about the borrow instances option. I think that might be a good option. How would this work in combination with other machines that only have two cards?

    Could you walk me through the setup for something like this?

    you can send light jobs to 4 instances and heavy jobs to 2 instances with 2 GPUs each

    This seems pretty straightforward and, in theory, it could also be set to 1 instance with 4 GPUs in case we want to use the “old” method, correct?

    Thanks again!

    11th February 2020 at 8:42 pm #28963

    Okay, let's make this simple. Disable the GPU mask and spawn 4 instances.

    – When you want to send light jobs, set the number of GPUs to 4; Muster will split them.

    – When you want to send medium jobs, set the number of GPUs to 2 and borrow instances to 1. That means the job will be sent to only 2 instances, and each one will borrow (lock) another instance; each rendering instance will grab one additional GPU.

    – When you want to send heavy jobs, set the number of GPUs to 4 and borrow instances to 3, and the job will use the entire GPU set.

    11th February 2020 at 8:52 pm #28964

    Alright cool.

    Just to clarify, when you say “set the number of GPUs to 4”, do you mean I should be setting “borrow instances” to 4 in my job settings?
    Or where do I set the number of GPUs in the job?

    11th February 2020 at 8:56 pm #28965

    Oh wait, I think I see.

    Is it the minimum physical GPUs setting?

    11th February 2020 at 8:57 pm #28966

    Sorry, I was not clear. If neither a GPU mask nor the template configuration settings specify a number of GPUs per process, GPUs are allocated automatically by dividing the available GPUs among the instances. If you spawn 4 instances, you get 1 GPU per instance.

    So the phrasing should have been: when you want to send light jobs with 4 GPUs, Muster will split them…
    When you want to send medium jobs, set borrow instances to 1 and Muster will set 2 GPUs per instance…
    For heavy jobs, set borrow instances to 3 and Muster will set 4 GPUs per instance…
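
    Summed up for one of your 4-GPU / 4-instance nodes with no GPU mask (my shorthand, assuming the available GPUs always divide evenly across the rendering instances):

        Borrow instances    Instances that render    GPUs per instance
        0                   4                        1
        1                   2                        2
        3                   1                        4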

    11th February 2020 at 8:59 pm #28967

    No, the minimum GPUs setting is a global filter. If you want to send jobs only to machines that have 4 cards, you set it to 4.

    11th February 2020 at 9:08 pm #28968

    Okay, I think this is all making sense. Let me run some tests on my end to see how it works.

    11th February 2020 at 10:27 pm #28971

    I ran some tests using different combinations of the borrow instances feature.

    I did run into one main issue: if I set borrow instances to 3, the job only renders on the machine with 4 GPUs/4 instances. It doesn't render on any machines with fewer than 4 instances, which makes sense, but I don't think that will work with our configuration.

    We might need to consider going back to the first option: splitting each GPU into a separate instance with the GPU mask. I'm going to run a couple of hardware tests, adding more RAM to the crashing machines, to see if that is what's causing the issue.
