Artificial Neural Networks and Deep Learning 2023 - Homework 2 Forum


> Running failed

Hi,
Could you enlighten me on this error that I got while running my model?
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
2023-12-18 10:39:16.453391: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-18 10:39:16.489249: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-18 10:39:16.489282: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-18 10:39:16.489310: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-18 10:39:16.496236: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-18 10:39:18.911228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22987 MB memory: -> device: 0, name: Quadro RTX 6000, pci bus id: 0000:96:00.0, compute capability: 7.5
Traceback (most recent call last):
File "/multiverse/storage/lattari/Prj/postdoc/Courses/AN2DL_2023/Competition2_running_dir/worker_gpu5_dir/tmp/codalab/tmpNYIpnP/run/program/score.py", line 129, in
M = model(submission_dir)
^^^^^^^^^^^^^^^^^^^^^
File "/multiverse/storage/lattari/Prj/postdoc/Courses/AN2DL_2023/Competition2_running_dir/worker_gpu5_dir/tmp/codalab/tmpNYIpnP/run/input/res/model.py", line 6, in __init__
self.model = tf.keras.models.load_model(os.path.join(path, 'conv_10'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/keras/src/saving/saving_api.py", line 262, in load_model
return legacy_sm_saving_lib.load_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/training/py_checkpoint_reader.py", line 45, in error_translator
raise errors_impl.OpError(None, None, error_message, errors_impl.UNKNOWN)
tensorflow.python.framework.errors_impl.OpError: /multiverse/storage/lattari/Prj/postdoc/Courses/AN2DL_2023/Competition2_running_dir/worker_gpu5_dir/tmp/codalab/tmpNYIpnP/run/input/res/conv_10/variables/variables.data-00000-of-00001; No such file or directory

The file exists and is located exactly where the error message says it should be. Is there something wrong on your side with the swap memory?

Posted by: EliosCama @ Dec. 18, 2023, 10:44 a.m.

There is no problem with the swap memory. Double-check the folder name and its location.
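
One way to run that check locally before zipping the submission is sketched below. It assumes the model was saved in the TensorFlow SavedModel format with a single variables shard, which is what the file name in the error message suggests; the helper name `missing_savedmodel_files` and the constant `EXPECTED_FILES` are made up for this illustration and are not part of the competition code.

```python
from pathlib import Path

# Files a single-shard TensorFlow SavedModel directory normally contains.
# The shard name is taken from the error message in this thread; a model
# saved with several shards would have different data file names.
EXPECTED_FILES = (
    "saved_model.pb",
    "variables/variables.index",
    "variables/variables.data-00000-of-00001",
)


def missing_savedmodel_files(model_dir):
    """Return the expected SavedModel files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # 'conv_10' is the folder name that model.py loads in the traceback.
    # Run this from the folder whose contents will be zipped.
    missing = missing_savedmodel_files("conv_10")
    print("missing files:", missing or "none")
```

If the list is empty locally but the scoring run still fails, the folder may have been renamed or nested one level deeper inside the uploaded archive, so the archive contents are worth inspecting as well.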

Posted by: an2dl.competitions @ Dec. 19, 2023, 9:29 a.m.