Replies: 3 comments 9 replies
-
Hi @jimsabo thanks for your message. All your comments make sense, and I've actually faced this same issue while generating a large collection for a client. What I did was write a custom script that first generates all 10k JSON metadata files, but only renders the first 1k gifs from those 10k. After uploading the files, I adjusted the editions so it would regenerate the next 1k-2k, then 2k-3k, and so on. That way it wouldn't regenerate the JSON. However, this script isn't generalized or production-ready; I might try to generalize it in a few weeks, but no promises. I like the idea of parallelizing step 3 while step 2 is running. The tricky part is that step 2 is in Node and step 3 is in Python, so we'd probably need a Python script that spawns multiple processes (multithreading doesn't really work well in Python due to the GIL). Can you create an issue so we can track it? https://github.com/jalagar/animated-art-engine/issues. Realistically I probably won't be able to implement a really good solution for a week or two while I am traveling, but it's good to have it documented as an issue.
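The multi-process idea could look roughly like this. This is a minimal sketch, not the engine's actual code: `convert_sheet` is a hypothetical stand-in for step 3's per-file work, and `ProcessPoolExecutor` uses separate processes, which sidesteps the GIL for CPU-bound image work:

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def convert_sheet(sheet_path: str) -> str:
    # Placeholder for step 3's real per-file work (turning one
    # sprite sheet into an animated gif); here it just returns
    # the file it would have converted.
    return sheet_path

def list_sheets(sheet_dir: str) -> list[str]:
    # Sort so editions are processed in a stable order.
    return [str(p) for p in sorted(Path(sheet_dir).glob("*.png"))]

def convert_all(sheet_dir: str, workers: int = 4) -> list[str]:
    # Separate OS processes (not threads) so the CPU-bound image
    # work actually runs in parallel despite the GIL.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_sheet, list_sheets(sheet_dir)))
```

Note that `pool.map` requires the worker function to be picklable (defined at module level), which is the usual constraint with Python multiprocessing.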
-
@jimsabo @reesepushkin huge update: I have productionized the code above and added documentation, so it should be straightforward now. The code generates all the JSONs at once, then regenerates from the JSON based on START_EDITION and END_EDITION. See the latest code and README:
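The range-based regeneration described above can be sketched like this. Assumptions for illustration only: metadata files are named `<edition>.json`, and `render_gif` is a hypothetical stand-in for the real per-edition renderer — neither is the engine's actual API:

```python
import json
from pathlib import Path

def render_gif(metadata: dict) -> None:
    # Hypothetical stand-in for the real per-edition gif renderer.
    pass

def regenerate_range(json_dir: str, start: int, end: int) -> list[int]:
    # The JSON metadata for the whole collection already exists;
    # only re-render the gifs for editions in [start, end].
    rendered = []
    for edition in range(start, end + 1):
        meta_path = Path(json_dir) / f"{edition}.json"
        if not meta_path.exists():
            continue  # no metadata for this edition; skip it
        render_gif(json.loads(meta_path.read_text()))
        rendered.append(edition)
    return rendered
```

Because the JSON pass is done once up front, adjusting the start/end range never touches the metadata — only the gifs in that slice get re-rendered.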
-
I'm testing a really large batch and ran out of disk space. Re-running step 3 cleared out the 6k files that had already been generated and started over from scratch. It would be great to have a flag that says "don't do that, just pick up where you left off."
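A resume flag like that could be as simple as skipping outputs that already exist. A minimal sketch, assuming gifs are written as `<edition>.gif` and using a hypothetical `SKIP_EXISTING` toggle (not an actual engine option):

```python
from pathlib import Path

SKIP_EXISTING = True  # hypothetical flag, not an actual engine option

def needs_render(output_dir: str, edition: int) -> bool:
    # Returns False when the gif already exists and is non-empty,
    # so a re-run picks up where the previous one left off instead
    # of clearing the output directory and starting over.
    out = Path(output_dir) / f"{edition}.gif"
    if SKIP_EXISTING and out.exists() and out.stat().st_size > 0:
        return False
    return True
```

The non-empty check matters for exactly the crash case described here: a run that died mid-write can leave a truncated file behind, and that edition should still be redone.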
I'm also noticing that step 2 doesn't come close to taxing my hardware. If step 3 were smarter about not redoing work, I could let step 2 run for a bit, then run step 3 in parallel on whatever was ready. This would dramatically cut down how long the total run takes to generate.
A more complex solution down the line would be to spawn step 3 as a separate thread that watches the generative sheet directory and processes whatever is ready. That would eliminate the need to do it manually, which would work better for overnight runs.