This repository was archived by the owner on Aug 15, 2019. It is now read-only.

Error when train a model "tensorflow.python.eager.profiler.ProfilerAlreadyRunningError: Another profiler is running" #262

@VicGrygorchyk

Hi! When I run the command python3 faceswap.py train -A ./photo/fst -B ./photo/snd -m ./photo/models/ I get the error shown in the log below.
It might be worth mentioning that I compiled tensorflow myself, because when I used pip install tensorflow I got a "tensorflow not found" error.
The extract command works without problems.
Also, I had the error "can't find module named 'numpy.core._multiarray_umath'", as mentioned in issue #261, so I updated numpy via pip install (there is a comment that numpy 1.16 is broken, though; I have 1.16.2).
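
A quick way to confirm which versions of the packages mentioned above are installed for the interpreter that runs faceswap.py is a small script like the one below. This is only a sketch: the helper name `installed_versions` is made up for illustration, and the package names checked are the ones discussed in this report.

```python
# Report installed versions of selected packages for the current interpreter.
# `installed_versions` is a hypothetical helper, not part of faceswap.
try:
    # Python 3.8+ standard library
    from importlib import metadata as _md

    def _version(name):
        try:
            return _md.version(name)
        except _md.PackageNotFoundError:
            return None
except ImportError:
    # Fallback for older interpreters (e.g. Python 3.5) with setuptools
    import pkg_resources

    def _version(name):
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None


def installed_versions(names):
    """Return {package name: version string, or None if not installed}."""
    return {name: _version(name) for name in names}


if __name__ == "__main__":
    for pkg, ver in installed_versions(
        ["numpy", "tensorflow", "tensorflow-gpu", "tensorboard"]
    ).items():
        print("{}: {}".format(pkg, ver if ver is not None else "not installed"))
```

Running it inside the same virtualenv shows at a glance whether both tensorflow and tensorflow-gpu are present and which numpy release is actually picked up.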

crash_report.2019.03.03.222044908664.log 
03/03/2019 22:20:25 MainProcess     training_0      _base           load_generator            DEBUG    Loading generator: b
03/03/2019 22:20:25 MainProcess     training_0      _base           load_generator            DEBUG    input_size: 64, output_size: 64
03/03/2019 22:20:25 MainProcess     training_0      training_data   __init__                  DEBUG    Initializing TrainingDataGenerator: (model_input_size: 64, model_output_shape: 64, training_opts: {'alignments': {'a': '/home/faceswap/photo/fst/alignments.json', 'b': '/home/faceswap/photo/snd/alignments.json'}, 'preview_scaling': 1.0, 'no_flip': False, 'preview_images': 14, 'training_size': 256, 'coverage_ratio': 0.625, 'mask_type': None, 'warp_to_landmarks': False, 'no_logs': False}, landmarks: False)
03/03/2019 22:20:25 MainProcess     training_0      training_data   set_mask_function         DEBUG    Mask function: None
03/03/2019 22:20:25 MainProcess     training_0      training_data   __init__                  DEBUG    Initializing ImageManipulation: (input_size: 64, output_size: 64, coverage_ratio: 0.625)
03/03/2019 22:20:25 MainProcess     training_0      training_data   __init__                  DEBUG    Initialized ImageManipulation
03/03/2019 22:20:25 MainProcess     training_0      training_data   __init__                  DEBUG    Initialized TrainingDataGenerator
03/03/2019 22:20:25 MainProcess     training_0      training_data   minibatch_ab              DEBUG    Queue batches: (image_count: 960, batchsize: 64, side: 'b', do_shuffle: True, is_timelapse: False)
03/03/2019 22:20:25 MainProcess     training_0      queue_manager   add_queue                 DEBUG    QueueManager adding: (name: 'train_b', maxsize: 512)
03/03/2019 22:20:25 MainProcess     training_0      queue_manager   add_queue                 DEBUG    QueueManager added: (name: 'train_b')
03/03/2019 22:20:25 MainProcess     training_0      multithreading  __init__                  DEBUG    Initializing MultiThread: (target: 'load_batches', thread_count: 1)
03/03/2019 22:20:25 MainProcess     training_0      multithreading  __init__                  DEBUG    Initialized MultiThread: 'load_batches'
03/03/2019 22:20:25 MainProcess     training_0      multithreading  start                     DEBUG    Starting thread(s): 'load_batches'
03/03/2019 22:20:25 MainProcess     training_0      multithreading  start                     DEBUG    Starting th
  File "/home/faceswap/scripts/train.py", line 97, in process
    self.end_thread(thread, err)
  File "/home/faceswap/scripts/train.py", line 122, in end_thread
    thread.join()
  File "/home/faceswap/lib/multithreading.py", line 179, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "/home/faceswap/lib/multithreading.py", line 117, in run
    self._target(*self._args, **self._kwargs)
  File "/home/faceswap/scripts/train.py", line 148, in training
    raise err
  File "/home/faceswap/scripts/train.py", line 138, in training
    self.run_training_cycle(model, trainer)
  File "/home/faceswap/scripts/train.py", line 210, in run_training_cycle
    trainer.train_one_step(viewer, timelapse)
  File "/home/faceswap/plugins/train/trainer/_base.py", line 149, in train_one_step
    self.log_tensorboard(side, side_loss)
  File "/home/faceswap/plugins/train/trainer/_base.py", line 172, in log_tensorboard
    self.tensorboard[side].on_batch_end(self.model.state.iterations, logs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/callbacks_v1.py", line 362, in on_batch_end
    profiler.start()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/eager/profiler.py", line 70, in start
    raise ProfilerAlreadyRunningError('Another profiler is running.')
tensorflow.python.eager.profiler.ProfilerAlreadyRunningError: Another profiler is running.
PyWavelets==1.0.2
pyxdg==0.25
PyYAML==3.11
pyzmq==18.0.0
qtconsole==4.4.3
requests==2.9.1
scikit-image==0.14.2
scikit-learn==0.20.3
scipy==1.2.1
screen-resolution-extra==0.0.0
Send2Trash==1.5.0
six==1.12.0
ssh-import-id==5.5
sympy==1.3
system-service==0.3
tensorboard==1.13.0
tensorflow==1.13.1
tensorflow-estimator==1.13.0
tensorflow-gpu==1.13.1
termcolor==1.1.0
terminado==0.8.1
testpath==0.4.2
toolz==0.9.0
tornado==5.1.1
tqdm==4.31.1
traitlets==4.3.2
ufw==0.35
unattended-upgrades==0.1
urllib3==1.13.1
virtualenv==16.4.3
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.14.1
widgetsnbextension==3.4.2
xkit==0.0.0

Please point out what I'm doing wrong with my setup. Should I use a different tensorflow version here?
In case it is useful, these are the logs I see before the error:

03/03/2019 22:20:20 INFO     Log level set to: INFO
Using TensorFlow backend.
03/03/2019 22:20:22 INFO     Model A Directory: /home/faceswap/photo/solo
03/03/2019 22:20:22 INFO     Model B Directory: /home/faceswap/photo/ford
03/03/2019 22:20:22 INFO     Training data directory: /home/faceswap/photo/models
03/03/2019 22:20:22 INFO     ===============================================
03/03/2019 22:20:22 INFO     - Starting                                    -
03/03/2019 22:20:22 INFO     - Press 'ENTER' to save and quit              -
03/03/2019 22:20:22 INFO     - Press 'S' to save model weights immediately -
03/03/2019 22:20:22 INFO     ===============================================
03/03/2019 22:20:23 INFO     Loading data, this may take a while...
03/03/2019 22:20:23 INFO     Loading Model from Original plugin...
03/03/2019 22:20:24 INFO     Loading config: '/home/faceswap/config/train.ini'
03/03/2019 22:20:24 WARNING  No existing state file found. Generating.
03/03/2019 22:20:25 WARNING  Failed loading existing training data. Generating new models
03/03/2019 22:20:25 INFO     Loading Trainer from Original plugin...
03/03/2019 22:20:25 INFO     Enabled TensorBoard Logging
03/03/2019 22:20:44 CRITICAL Error caught! Exiting...
03/03/2019 22:20:44 ERROR    Caught exception in thread: 'training_0'
03/03/2019 22:20:46 ERROR    Got Exception on main handler:
Traceback (most recent call last):

Thanks to anyone trying to help!
