/usr/local/lib/python3.10/dist-packages/horizon_plugin_pytorch/torch_patch.py:13: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _register_pytree_node(slice, _slice_flatten, _slice_unflatten)
`aidisdk` dependency is not available.
`aidisdk` dependency is not available.
INFO:hat.engine.ddp_trainer:Launch with rank: 7 world_size: None hostname: OE-J6-GPU-3-0-22 dist_url: tcp://localhost:10763 num_devices: 8 num_processes: 8
INFO:hat.engine.ddp_trainer:Launch with rank: 5 world_size: None hostname: OE-J6-GPU-3-0-22 dist_url: tcp://localhost:10763 num_devices: 8 num_processes: 8
INFO:hat.engine.ddp_trainer:Launch with rank: 0 world_size: None hostname: OE-J6-GPU-3-0-22 dist_url: tcp://localhost:10763 num_devices: 8 num_processes: 8
[W Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
W1213 15:51:45.703000 140551106629632 torch/multiprocessing/spawn.py:145] Terminating process 173 via signal SIGTERM
W1213 15:51:45.704000 140551106629632 torch/multiprocessing/spawn.py:145] Terminating process 174 via signal SIGTERM
W1213 15:51:45.704000 140551106629632 torch/multiprocessing/spawn.py:145] Terminating process 175 via signal SIGTERM
W1213 15:51:45.704000 140551106629632 torch/multiprocessing/spawn.py:145] Terminating process 176 via signal SIGTERM
W1213 15:51:45.704000 140551106629632 torch/multiprocessing/spawn.py:145] Terminating process 177 via signal SIGTERM
W1213 15:51:45.704000 140551106629632 torch/multiprocessing/spawn.py:145] Terminating process 179 via signal SIGTERM
W1213 15:51:45.704000 140551106629632 torch/multiprocessing/spawn.py:145] Terminating process 180 via signal SIGTERM
ERROR:__main__:train failed!
-- Process 5 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 75, in _wrap
    fn(i, *args)
  File "/usr/local/lib/python3.10/dist-packages/hat/engine/ddp_trainer.py", line 505, in _main_func
    torch.cuda.set_device(local_rank % num_devices)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 399, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/open_explorer/samples/ai_toolchain/horizon_model_train_sample/scripts/tools/train.py", line 287, in
    raise e
  File "/open_explorer/samples/ai_toolchain/horizon_model_train_sample/scripts/tools/train.py", line 273, in
    train(
  File "/open_explorer/samples/ai_toolchain/horizon_model_train_sample/scripts/tools/train.py", line 254, in train
    launch(
  File "/usr/local/lib/python3.10/dist-packages/hat/engine/ddp_trainer.py", line 426, in launch
    mp.spawn(
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 281, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 237, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 188, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 5 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 75, in _wrap
    fn(i, *args)
  File "/usr/local/lib/python3.10/dist-packages/hat/engine/ddp_trainer.py", line 505, in _main_func
    torch.cuda.set_device(local_rank % num_devices)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 399, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.