
Commit 5bea47f

Merge pull request #373 from gjtorikian/incorporate-md-emoji

Use emoji from commonmarker

2 parents: dcf38a2 + 549e19e

19 files changed: +515, -141 lines

Gemfile

Lines changed: 3 additions & 1 deletion
@@ -27,14 +27,16 @@ group :development do
 end
 
 group :test do
-  gem "commonmarker", "~> 1.0.0.pre4", require: false
+  gem "commonmarker", "~> 1.0.0.pre7", require: false
   gem "gemoji", "~> 3.0", require: false
   gem "gemojione", "~> 4.3", require: false
+
   gem "minitest"
 
   gem "minitest-bisect", "~> 1.6"
 
   gem "nokogiri", "~> 1.13"
 
   gem "minitest-focus", "~> 1.1"
+  gem "rouge", "~> 3.1", require: false
 end
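The test group now pins `commonmarker` with a pessimistic (`~>`) constraint on a prerelease. As a rough illustration of what `"~> 1.0.0.pre7"` admits, the snippet below evaluates that string with RubyGems' own `Gem::Requirement` and `Gem::Version`. The candidate version numbers are arbitrary probes chosen for the example, not releases this commit was tested against.

```ruby
require "rubygems"

# The constraint from the Gemfile's :test group.
requirement = Gem::Requirement.new("~> 1.0.0.pre7")

# Hypothetical candidate versions, chosen only to probe the boundaries.
candidates = ["1.0.0.pre4", "1.0.0.pre7", "1.0.0", "1.0.5", "1.1.0"]

candidates.each do |v|
  ok = requirement.satisfied_by?(Gem::Version.new(v))
  puts "#{v.ljust(12)} #{ok ? "allowed" : "rejected"}"
end
```

Under RubyGems' rules `~> 1.0.0.pre7` means "at least `1.0.0.pre7`, below `1.1`", so this reports `1.0.0.pre4` and `1.1.0` as rejected and the other three as allowed: the bump from `pre4` raises the floor while still accepting later 1.0.x releases.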

README.md

Lines changed: 19 additions & 10 deletions
@@ -230,19 +230,28 @@ end
 
 For more information on how to write effective `NodeFilter`s, refer to the provided filters, and see the underlying lib, [Selma](https://www.github.com/gjtorikian/selma) for more information.
 
-- `AbsoluteSourceFilter` - replace relative image urls with fully qualified versions
-- `EmojiFilter` - converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)!
-- `HttpsFilter` - Replacing http urls with https versions
-- `ImageMaxWidthFilter` - link to full size image for large images
-- `MentionFilter` - replace `@user` mentions with links
-- `SanitizationFilter` - allow sanitize user markup
-- `TableOfContentsFilter` - anchor headings with name attributes and generate Table of Contents html unordered list linking headings
-- `TeamMentionFilter` - replace `@org/team` mentions with links
+- `AbsoluteSourceFilter`: replace relative image urls with fully qualified versions
+- `EmojiFilter`: converts `:<emoji>:` to [emoji](http://www.emoji-cheat-sheet.com/)
+  - (Note: the included `MarkdownFilter` will already convert emoji)
+- `HttpsFilter`: Replacing http urls with https versions
+- `ImageMaxWidthFilter`: link to full size image for large images
+- `MentionFilter`: replace `@user` mentions with links
+- `SanitizationFilter`: allow sanitize user markup
+- `SyntaxHighlightFilter`: applies syntax highlighting to `pre` blocks
+  - (Note: the included `MarkdownFilter` will already apply highlighting)
+- `TableOfContentsFilter`: anchor headings with name attributes and generate Table of Contents html unordered list linking headings
+- `TeamMentionFilter`: replace `@org/team` mentions with links
 
 ## Dependencies
 
-Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem
-dependencies yourself.
+Since filters can be customized to your heart's content, gem dependencies are _not_ bundled; this project doesn't know which of the default filters you might use, and as such, you must bundle each filter's gem dependencies yourself.
+
+For example, `SyntaxHighlightFilter` uses [rouge](https://github.com/jneen/rouge)
+to detect and highlight languages; to use the `SyntaxHighlightFilter`, you must add the following to your Gemfile:
+
+```ruby
+gem "rouge"
+```
 
 > **Note**
 > See the [Gemfile](/Gemfile) `:test` group for any version requirements.
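The new Dependencies text says the application must bundle each filter's gem itself, and the new filter file opens with `HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")`. That helper's body is not part of this diff, so the following is only a hypothetical, self-contained sketch of an optional-dependency guard of that kind: `require_optional` and `MissingDependencyError` are invented names, and the point is simply to turn a bare `LoadError` into a message that names the filter needing the gem.

```ruby
# Hypothetical stand-in for an optional-dependency guard; this is NOT
# html-pipeline's actual HTMLPipeline.require_dependency implementation.
class MissingDependencyError < StandardError; end

# Loads gem_name, or raises a MissingDependencyError that names the
# filter which needed it and suggests the Gemfile line to add.
def require_optional(gem_name, filter_name)
  require gem_name
  true
rescue LoadError => e
  raise MissingDependencyError,
        "Missing dependency '#{gem_name}' for #{filter_name}. " \
        "Add `gem \"#{gem_name}\"` to your Gemfile. (#{e.message})"
end

# Usage: a stdlib library loads; an absent gem produces a targeted error.
require_optional("json", "SomeFilter")
begin
  require_optional("some_gem_that_is_not_installed", "SyntaxHighlightFilter")
rescue MissingDependencyError => e
  puts e.message
end
```

Failing at require time, rather than on first use, means a missing `rouge` surfaces as soon as the filter file is loaded.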

UPGRADING.md

Lines changed: 0 additions & 1 deletion
@@ -13,7 +13,6 @@ This project is now under a module called `HTMLPipeline`, not `HTML::Pipeline`.
 The following filters were removed:
 
 - `AutolinkFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
-- `SyntaxHighlightFilter`: this is handled by [Commonmarker](https://www.github.com/gjtorikian/commonmarker) and can be disabled/enabled through the `MarkdownFilter`'s `context` hash
 - `SanitizationFilter`: this is handled by [Selma](https://www.github.com/gjtorikian/selma); configuration can be done through the `sanitization_config` hash
 
 - `EmailReplyFilter`

lib/html_pipeline.rb

Lines changed: 15 additions & 6 deletions
@@ -145,8 +145,11 @@ def call(text, context: {}, result: {})
     context = context.freeze
     result ||= {}
 
-    payload = default_payload({ text_filters: @text_filters.map(&:name),
-      context: context, result: result, })
+    payload = default_payload({
+      text_filters: @text_filters.map(&:name),
+      context: context,
+      result: result,
+    })
     instrument("call_text_filters.html_pipeline", payload) do
       result[:output] =
         @text_filters.inject(text) do |doc, filter|
@@ -159,8 +162,11 @@ def call(text, context: {}, result: {})
     html = @convert_filter.call(text) unless @convert_filter.nil?
 
     unless @node_filters.empty?
-      payload = default_payload({ node_filters: @node_filters.map { |f| f.class.name },
-        context: context, result: result, })
+      payload = default_payload({
+        node_filters: @node_filters.map { |f| f.class.name },
+        context: context,
+        result: result,
+      })
       instrument("call_node_filters.html_pipeline", payload) do
         result[:output] = Selma::Rewriter.new(sanitizer: @sanitization_config, handlers: @node_filters).rewrite(html)
       end
@@ -178,8 +184,11 @@ def call(text, context: {}, result: {})
   #
   # Returns the result of the filter.
   def perform_filter(filter, doc, context: {}, result: {})
-    payload = default_payload({ filter: filter.name,
-      context: context, result: result, })
+    payload = default_payload({
+      filter: filter.name,
+      context: context,
+      result: result,
+    })
     instrument("call_filter.html_pipeline", payload) do
       filter.call(doc, context: context, result: result)
     end
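Each reformatted `default_payload({...})` call builds a hash that is then passed to `instrument` together with an event name such as `call_text_filters.html_pipeline`. Neither `instrument` nor `default_payload` appears in this diff, so what follows is only a sketch under the assumption that the instrumentation service has the `instrument(name, payload) { |payload| ... }` shape popularised by ActiveSupport::Notifications. `RecordingInstrumenter` is a made-up class that logs each event so the payload keys can be inspected; the filter name in the payload is likewise hypothetical.

```ruby
# Hypothetical instrumentation service with an
# ActiveSupport::Notifications-style interface; it records every event.
class RecordingInstrumenter
  attr_reader :events

  def initialize
    @events = []
  end

  # Runs the block with the payload, records name, payload and elapsed
  # time, and returns whatever the block returned.
  def instrument(name, payload = {})
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield(payload)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @events << { name: name, payload: payload, seconds: elapsed }
    result
  end
end

service = RecordingInstrumenter.new

# A payload with the same keys as the one built before the text filters run.
payload = {
  text_filters: ["ExampleTextFilter"], # hypothetical filter name
  context: {}.freeze,
  result: {},
}

output = service.instrument("call_text_filters.html_pipeline", payload) do |data|
  data[:result][:output] = "processed text"
end

puts service.events.first[:name]
puts service.events.first[:payload].keys.inspect
```

Because the payload carries the mutable `result` hash, a subscriber sees whatever the filters wrote into it once the block finishes.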
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# frozen_string_literal: true
+
+HTMLPipeline.require_dependency("rouge", "SyntaxHighlightFilter")
+
+class HTMLPipeline
+  class NodeFilter
+    # HTML Filter that syntax highlights text inside code blocks.
+    #
+    # Context options:
+    #
+    # :highlight => String represents the language to pick lexer. Defaults to empty string.
+    # :scope => String represents the class attribute adds to pre element after.
+    # Defaults to "highlight highlight-css" if highlights a css code block.
+    #
+    # This filter does not write any additional information to the context hash.
+    class SyntaxHighlightFilter < NodeFilter
+      def initialize(context: {}, result: {})
+        super(context: context, result: result)
+        # TODO: test the optionality of this
+        @formatter = context[:formatter] || Rouge::Formatters::HTML.new
+      end
+
+      SELECTOR = Selma::Selector.new(match_element: "pre", match_text_within: "pre")
+
+      def selector
+        SELECTOR
+      end
+
+      def handle_element(element)
+        default = context[:highlight]&.to_s
+        @lang = element["lang"] || default
+
+        scope = context.fetch(:scope, "highlight")
+
+        element["class"] = "#{scope} #{scope}-#{@lang}" if include_lang?
+      end
+
+      def handle_text_chunk(text)
+        return if @lang.nil?
+        return if (lexer = lexer_for(@lang)).nil?
+
+        content = text.to_s
+
+        text.replace(highlight_with_timeout_handling(content, lexer), as: :html)
+      end
+
+      def highlight_with_timeout_handling(text, lexer)
+        Rouge.highlight(text, lexer, @formatter)
+      rescue Timeout::Error => _e
+        text
+      end
+
+      def lexer_for(lang)
+        Rouge::Lexer.find(lang)
+      end
+
+      def include_lang?
+        !@lang.nil? && !@lang.empty?
+      end
+    end
+  end
+end
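The context options documented at the top of the new filter (`:highlight`, `:scope`) feed a short decision in `handle_element`, and `highlight_with_timeout_handling` falls back to the raw text on a timeout. To make both paths easier to follow, here is a standalone Ruby sketch of that logic. `pre_class_for` and `highlight_or_passthrough` are invented helper names; modelling the `<pre>` element as a plain Hash and the Rouge call as a block are assumptions made so the sketch runs without Selma or Rouge.

```ruby
require "timeout"

# Mirrors handle_element + include_lang?: returns the class string the
# filter would assign, or nil when no language is known (class untouched).
def pre_class_for(element, context = {})
  default = context[:highlight]&.to_s
  lang = element["lang"] || default
  scope = context.fetch(:scope, "highlight")
  return nil if lang.nil? || lang.empty?

  "#{scope} #{scope}-#{lang}"
end

# Mirrors highlight_with_timeout_handling: the block stands in for
# Rouge.highlight; on Timeout::Error the original text is returned.
def highlight_or_passthrough(text)
  yield(text)
rescue Timeout::Error
  text
end

puts pre_class_for({ "lang" => "ruby" })                    # highlight highlight-ruby
puts pre_class_for({}, { highlight: :css })                 # highlight highlight-css
puts pre_class_for({ "lang" => "go" }, { scope: "syntax" }) # syntax syntax-go
p pre_class_for({})                                         # nil

puts highlight_or_passthrough("x = 1") { |t| "<span>#{t}</span>" } # <span>x = 1</span>
puts highlight_or_passthrough("x = 1") { raise Timeout::Error }   # x = 1
```

Read against the diff, a `lang` attribute on the element takes precedence over the `:highlight` context default, and with neither present the element's class is left alone.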

lib/html_pipeline/node_filter/table_of_contents_filter.rb

Lines changed: 4 additions & 2 deletions
@@ -24,8 +24,10 @@ class NodeFilter
     # result[:output].to_s
     # # => "<h1>\n<a id=\"ice-cube\" class=\"anchor\" href=\"#ice-cube\">..."
     class TableOfContentsFilter < NodeFilter
-      SELECTOR = Selma::Selector.new(match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
-        match_text_within: "h1, h2, h3, h4, h5, h6")
+      SELECTOR = Selma::Selector.new(
+        match_element: "h1 a[href], h2 a[href], h3 a[href], h4 a[href], h5 a[href], h6 a[href]",
+        match_text_within: "h1, h2, h3, h4, h5, h6",
+      )
 
       def selector
         SELECTOR
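The two selector strings in `TableOfContentsFilter::SELECTOR` follow a simple per-heading-level pattern: `match_element` targets linked anchors inside any heading, while `match_text_within` targets the headings themselves. The snippet below rebuilds both strings purely to make that structure visible; the filter itself keeps the literals, and `HEADING_LEVELS` is an illustrative name, not part of the code base.

```ruby
# Illustrative only: rebuilds the literal selector strings from the diff.
HEADING_LEVELS = (1..6)

# Anchors with an href inside any heading level (the match_element value).
match_element = HEADING_LEVELS.map { |n| "h#{n} a[href]" }.join(", ")

# The heading elements themselves (the match_text_within value).
match_text_within = HEADING_LEVELS.map { |n| "h#{n}" }.join(", ")

puts match_element
puts match_text_within
```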

lib/html_pipeline/sanitization_filter.rb

Lines changed: 135 additions & 12 deletions
@@ -16,11 +16,70 @@ class SanitizationFilter
     # The main sanitization allowlist. Only these elements and attributes are
     # allowed through by default.
     DEFAULT_CONFIG = Selma::Sanitizer::Config.freeze_config({
-      elements: ["h1", "h2", "h3", "h4", "h5", "h6", "br", "b", "i", "strong", "em", "a", "pre", "code",
-        "img", "tt", "div", "ins", "del", "sup", "sub", "p", "picture", "ol", "ul", "table", "thead", "tbody", "tfoot",
-        "blockquote", "dl", "dt", "dd", "kbd", "q", "samp", "var", "hr", "ruby", "rt", "rp", "li", "tr", "td", "th",
-        "s", "strike", "summary", "details", "caption", "figure", "figcaption", "abbr", "bdo", "cite",
-        "dfn", "mark", "small", "source", "span", "time", "wbr",],
+      elements: [
+        "h1",
+        "h2",
+        "h3",
+        "h4",
+        "h5",
+        "h6",
+        "br",
+        "b",
+        "i",
+        "strong",
+        "em",
+        "a",
+        "pre",
+        "code",
+        "img",
+        "tt",
+        "div",
+        "ins",
+        "del",
+        "sup",
+        "sub",
+        "p",
+        "picture",
+        "ol",
+        "ul",
+        "table",
+        "thead",
+        "tbody",
+        "tfoot",
+        "blockquote",
+        "dl",
+        "dt",
+        "dd",
+        "kbd",
+        "q",
+        "samp",
+        "var",
+        "hr",
+        "ruby",
+        "rt",
+        "rp",
+        "li",
+        "tr",
+        "td",
+        "th",
+        "s",
+        "strike",
+        "summary",
+        "details",
+        "caption",
+        "figure",
+        "figcaption",
+        "abbr",
+        "bdo",
+        "cite",
+        "dfn",
+        "mark",
+        "small",
+        "source",
+        "span",
+        "time",
+        "wbr",
+      ],
 
       attributes: {
         "a" => ["href"],
@@ -31,13 +90,77 @@ class SanitizationFilter
         "ins" => ["cite"],
         "q" => ["cite"],
         "source" => ["srcset"],
-        all: ["abbr", "accept", "accept-charset", "accesskey", "action", "align", "alt", "aria-describedby",
-          "aria-hidden", "aria-label", "aria-labelledby", "axis", "border", "char",
-          "charoff", "charset", "checked", "clear", "cols", "colspan", "compact", "coords", "datetime", "dir",
-          "disabled", "enctype", "for", "frame", "headers", "height", "hreflang", "hspace", "id", "ismap", "label", "lang",
-          "maxlength", "media", "method", "multiple", "name", "nohref", "noshade", "nowrap", "open", "progress",
-          "prompt", "readonly", "rel", "rev", "role", "rows", "rowspan", "rules", "scope", "selected", "shape",
-          "size", "span", "start", "summary", "tabindex", "title", "type", "usemap", "valign", "value", "width", "itemprop",],
+        all: [
+          "abbr",
+          "accept",
+          "accept-charset",
+          "accesskey",
+          "action",
+          "align",
+          "alt",
+          "aria-describedby",
+          "aria-hidden",
+          "aria-label",
+          "aria-labelledby",
+          "axis",
+          "border",
+          "char",
+          "charoff",
+          "charset",
+          "checked",
+          "clear",
+          "cols",
+          "colspan",
+          "compact",
+          "coords",
+          "datetime",
+          "dir",
+          "disabled",
+          "enctype",
+          "for",
+          "frame",
+          "headers",
+          "height",
+          "hreflang",
+          "hspace",
+          "id",
+          "ismap",
+          "label",
+          "lang",
+          "maxlength",
+          "media",
+          "method",
+          "multiple",
+          "name",
+          "nohref",
+          "noshade",
+          "nowrap",
+          "open",
+          "progress",
+          "prompt",
+          "readonly",
+          "rel",
+          "rev",
+          "role",
+          "rows",
+          "rowspan",
+          "rules",
+          "scope",
+          "selected",
+          "shape",
+          "size",
+          "span",
+          "start",
+          "summary",
+          "tabindex",
+          "title",
+          "type",
+          "usemap",
+          "valign",
+          "value",
+          "width",
+          "itemprop",
+        ],
       },
       protocols: {
         "a" => { "href" => Selma::Sanitizer::Config::VALID_PROTOCOLS }.freeze,

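`DEFAULT_CONFIG` pairs a flat `elements` allowlist with an `attributes` hash whose string keys hold per-element lists and whose symbol key `:all` (written `all:` in the literal) lists attributes permitted on every allowed element. Selma performs the real filtering; the helper below is a hypothetical, pure-Ruby reading of that structure. It uses a trimmed config built only from entries visible in this diff, to show how an element/attribute pair would be judged under those rules.

```ruby
# Trimmed, illustrative copy of the DEFAULT_CONFIG shape (not the full lists).
CONFIG = {
  elements: ["a", "div", "pre", "ins", "q", "source"],
  attributes: {
    "a" => ["href"],
    "ins" => ["cite"],
    "q" => ["cite"],
    "source" => ["srcset"],
    all: ["id", "title", "lang"],
  },
}.freeze

# Hypothetical helper, not Selma's API: an attribute is kept only if its
# element is allowed and the attribute is listed for that element or under :all.
def allowed_attribute?(config, element, attribute)
  return false unless config[:elements].include?(element)

  per_element = config[:attributes].fetch(element, [])
  per_element.include?(attribute) || config[:attributes][:all].include?(attribute)
end

puts allowed_attribute?(CONFIG, "a", "href")      # true: listed for "a"
puts allowed_attribute?(CONFIG, "div", "id")      # true: listed under :all
puts allowed_attribute?(CONFIG, "div", "onclick") # false: not listed anywhere
puts allowed_attribute?(CONFIG, "script", "id")   # false: element not allowed
```

Note that `"a" => [...]` and `all: [...]` in the same literal produce a String key and a Symbol key respectively, which is why the sketch looks up `:all` separately.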
lib/html_pipeline/version.rb

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 class HTMLPipeline
-  VERSION = "3.0.0.pre1"
+  VERSION = "3.0.0.pre2"
 end
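The version constant moves from `3.0.0.pre1` to `3.0.0.pre2`. As a quick check of how RubyGems orders such strings (general RubyGems behaviour, not something this commit changes), the snippet compares them with `Gem::Version`, which treats any version containing a letter as a prerelease that sorts before the corresponding final release.

```ruby
require "rubygems"

old_version = Gem::Version.new("3.0.0.pre1")
new_version = Gem::Version.new("3.0.0.pre2")

puts new_version > old_version              # true: pre2 sorts after pre1
puts new_version.prerelease?                # true: the letters mark a prerelease
puts new_version < Gem::Version.new("3.0.0") # true: prereleases precede the final release

# A plain ">= 3.0" requirement is therefore not met by this prerelease.
puts Gem::Requirement.new(">= 3.0").satisfied_by?(new_version) # false
```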
