There have been published reports, including by the openbench team, that the predictive accuracy a learner can achieve on top of Morgan fingerprints changes with the number of bits (features) that are generated.
I believe it would be a valuable experiment to compare the performance of the learner on top of ECFP4 fingerprints at five different sizes: the size of the smallest embedding from the other models, 1024, 2048, 4096, and 8192 bits.
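One possible reason size matters is bit collisions. Fixed-length ECFP bit vectors are typically produced by folding hashed atom-environment identifiers into the chosen number of bits, so a shorter vector merges more distinct substructures into the same bit. The sketch below only illustrates that effect; it does not evaluate a learner. It assumes modulo folding, and it uses simulated 32-bit identifiers in place of real molecules. `sizes_to_compare` and the value 768 are hypothetical placeholders for "the smallest embedding from the other models".

```python
import random


def fold(feature_ids, n_bits):
    """Fold hashed substructure identifiers into a fixed-length vector (returns the set of on-bits).

    Assumes modulo folding, which is a common way to map hashed IDs onto n_bits positions.
    """
    return {fid % n_bits for fid in feature_ids}


def collisions(feature_ids, n_bits):
    """Count distinct identifiers that are lost because they share a bit after folding."""
    unique = set(feature_ids)
    return len(unique) - len(fold(unique, n_bits))


def sizes_to_compare(smallest_other_dim):
    # Hypothetical helper: the five sizes from the proposal.
    # smallest_other_dim stands in for the smallest embedding size among the other models.
    return [smallest_other_dim, 1024, 2048, 4096, 8192]


# Simulated 32-bit identifiers standing in for a dataset's atom environments (not real data).
rng = random.Random(0)
ids = [rng.getrandbits(32) for _ in range(3000)]

for n in sizes_to_compare(768):
    print(n, collisions(ids, n))
```

Because 1024, 2048, 4096 and 8192 each divide the next size up, the collision count cannot increase as the vector grows within that series. Whether fewer collisions actually translates into better predictive accuracy is what the proposed experiment would measure.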