CPU-only inference is becoming viable for small quantized models thanks to instruction-set extensions such as Intel AMX and VNNI.
It would be interesting to see how far an inference engine can be tuned to extract maximal CPU performance on small models, without requiring a GPU.
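As a starting point for such an experiment, one first needs to know whether the host CPU actually advertises these extensions. A minimal sketch, assuming Linux (where the kernel exposes feature flags in `/proc/cpuinfo`; the flag names `amx_tile`, `amx_int8`, `avx512_vnni`, and `avx_vnni` are the kernel's own):

```python
# Sketch: report whether the CPU advertises the ISA extensions that
# accelerate CPU-only quantized inference (Intel AMX tiles / int8,
# AVX-512 VNNI, AVX-VNNI). Linux-only: reads /proc/cpuinfo.

def isa_support(names=("amx_tile", "amx_int8", "avx512_vnni", "avx_vnni")):
    cpu_flags = set()
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    # "flags : fpu vme ... avx512_vnni ..." -> set of tokens
                    cpu_flags = set(line.split(":", 1)[1].split())
                    break
    except OSError:
        pass  # non-Linux host: report nothing rather than guess
    return {name: name in cpu_flags for name in names}

if __name__ == "__main__":
    for name, present in isa_support().items():
        print(f"{name}: {'yes' if present else 'no'}")
```

Inference engines such as llama.cpp and oneDNN-backed runtimes dispatch to different kernels based on exactly these flags, so this check tells you which code paths a tuning experiment could exercise on a given machine.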