Our submission returns the following runtime error, which appears to be related to CUDA multiprocessing. Could you check whether the submission server is running correctly?
Traceback (most recent call last):
  File "/tmp/codalab/tmprWT7ov/run/program/evaluation.py", line 567, in <module>
    auroc1 = run_mntd_crossval(trojan_model_dir, clean_model_dir, num_folds=5)
  File "/tmp/codalab/tmprWT7ov/run/program/evaluation.py", line 466, in run_mntd_crossval
    train_meta_network(meta_network, train_loader)
  File "/tmp/codalab/tmprWT7ov/run/program/evaluation.py", line 396, in train_meta_network
    for i, (net, label) in enumerate(train_loader):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 434, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataset.py", line 363, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/tmp/codalab/tmprWT7ov/run/program/evaluation.py", line 363, in __getitem__
    return torch.load(os.path.join(self.model_paths[index], 'model.pt')), self.labels[index]
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 857, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 846, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 157, in _cuda_deserialize
    return obj.cuda(device)
  File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 71, in _cuda
    with torch.cuda.device(device):
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 272, in __enter__
    self.prev_idx = torch.cuda.current_device()
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 479, in current_device
    _lazy_init()
  File "/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py", line 205, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Hello,
Sorry for the late reply. The problem was that the saved networks were being loaded directly onto the GPU inside the DataLoader worker processes, which triggered the CUDA re-initialization error. This is fixed now, and you should be able to resubmit the same zip file successfully.
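For reference, the root cause is that calling torch.load on a checkpoint containing CUDA tensors forces CUDA to initialize inside the forked DataLoader workers. Below is a minimal sketch of the usual way to avoid this; the class and variable names are illustrative only, not the actual evaluation.py code. The idea is to load each checkpoint with map_location='cpu' and move the network to the GPU only in the main process.

import os
import torch
from torch.utils.data import Dataset

class CheckpointDataset(Dataset):
    # Illustrative dataset yielding (network, label) pairs from saved checkpoints.
    def __init__(self, model_paths, labels):
        self.model_paths = model_paths
        self.labels = labels

    def __len__(self):
        return len(self.model_paths)

    def __getitem__(self, index):
        # map_location='cpu' keeps forked workers from ever touching CUDA;
        # the training loop can call .cuda() on the network in the main process.
        net = torch.load(os.path.join(self.model_paths[index], 'model.pt'),
                         map_location='cpu')
        return net, self.labels[index]

Alternatives would be setting num_workers=0 on the DataLoader or using the 'spawn' start method that the error message mentions, but loading onto the CPU in the dataset is usually the simplest change.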
All the best,
Mantas (TDC co-organizer)