Many operations could be made faster. 1. Use vectorized numpy operations instead of python lists 2. Process data in parallel (such as keyword generation)