Maybe there's a Python library that does what html-pipeline's sanitization filter does (allowlist-based HTML sanitization)? https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline/sanitization_filter.rb
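As far as I can tell, the linked Ruby filter hands the document to the Sanitize gem along with an allowlist of elements, attributes and URL protocols. If no Python library fits, the same allowlist idea can be sketched with only the standard library's `html.parser`. This is a rough sketch, not a port of the Ruby file. The tag, attribute and protocol sets (`ALLOWED_TAGS`, `ALLOWED_ATTRS`, `ALLOWED_PROTOCOLS`, `DROP_CONTENT_TAGS`) are hypothetical. The code is not hardened for production use either: for example, it does not balance stray closing tags. A maintained sanitizer such as nh3 (Python bindings for the Rust ammonia library) is the safer choice.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlists, only loosely inspired by html-pipeline's filter.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "p", "br", "ul", "ol", "li",
                "code", "pre", "img"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
ALLOWED_PROTOCOLS = {"http", "https", "mailto"}
DROP_CONTENT_TAGS = {"script", "style"}  # remove the element AND its text
VOID_TAGS = {"br", "img"}                # never emit a closing tag for these


def _safe_url(value):
    """Allow relative URLs and URLs whose scheme is in ALLOWED_PROTOCOLS."""
    try:
        scheme = urlsplit(value.strip()).scheme  # urlsplit lowercases the scheme
    except ValueError:  # e.g. malformed IPv6 host
        return False
    return scheme == "" or scheme in ALLOWED_PROTOCOLS


class _Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT_TAGS element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the disallowed tag itself but keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name in ("href", "src") and not _safe_url(value):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    # handle_startendtag is not overridden: the default calls
    # handle_starttag and then handle_endtag, which keeps skip_depth balanced.

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions use the default no-op
    # handlers, so they are dropped.


def sanitize(html_text):
    parser = _Sanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize('<p onclick="x()">Hi <script>alert(1)</script><b>there</b></p>')` should return `'<p>Hi <b>there</b></p>'`. The event handler attribute and the whole `<script>` element are removed.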