Commit 19ec326 (parent: 7f635a9)

SOLR-17979 Improve changes2html.py for authors and PR detection (#3831)

- Authors with url and github nick handled
- Plain PR ref `#123` detected as PR#123 with github link
- Correct a changelog yaml missing JIRA issue
- Fix links in dev-docs/changelog.adoc
- Describe logchangeArchive task
- Remove mention of Perl as a requirement for build

File tree: 5 files changed, +243 −33 lines

changelog/v9.10.0/SOLR-17619 Use logchange for changelog management.yml

Lines changed: 3 additions & 0 deletions

@@ -5,3 +5,6 @@ authors:
   - name: Jan Høydahl
     nick: janhoy
     url: https://home.apache.org/phonebook.html?uid=janhoy
+links:
+  - name: SOLR-17619
+    url: https://issues.apache.org/jira/browse/SOLR-17619
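The `links` entry added above is what lets downstream tooling associate the change with its JIRA issue. As a quick illustration, a stdlib-only sketch (a hypothetical helper, not part of this commit) of recovering the issue key from such a URL:

```python
import re

def issue_key_from_url(url):
    """Pull a JIRA issue key (e.g. SOLR-17619) out of a browse URL."""
    match = re.search(r'/browse/([A-Z]+-\d+)', url)
    return match.group(1) if match else None

print(issue_key_from_url("https://issues.apache.org/jira/browse/SOLR-17619"))  # SOLR-17619
```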

dev-docs/changelog.adoc

Lines changed: 8 additions & 3 deletions

@@ -37,7 +37,7 @@ solr/

 == 3. The YAML format

-Below is an example of a changelog yaml fragment. The full yaml format is xref:https://logchange.dev/tools/logchange/reference/#tasks[documented here], but we normally only need `title`, `type`, `authors` and `links`. For a change without a JIRA, you can add the PR number in `issues`:
+Below is an example of a changelog yaml fragment. The full yaml format is https://logchange.dev/tools/logchange/reference/#yaml-entry-format[documented here], but we normally only need `title`, `type`, `authors` and `links`. For a change without a JIRA, you can add the PR number in `issues`:

 [source, yaml]
 ----

@@ -120,8 +120,13 @@ The logchange gradle plugin offers some tasks, here are the two most important:

 | `logchangeRelease`
 | Creates a new changelog release by moving files from `changelog/unreleased/` directory to `changelog/vX.Y.Z` directory
+
+| `logchangeArchive`
+| Archives the list of released versions up to (and including) the specified version by transferring their summaries to the `archive.md` file, merging all existing archives, and deleting the corresponding version directories.
 |===

+The `logchangeRelease` and `logchangeGenerate` tasks are used by the Release Wizard. The `logchangeArchive` task can be run once for every major release, or when the number of versioned changelog folders grows too large.
+
 These are integrated in the Release Wizard.

 === 6.2 Migration tool

@@ -242,5 +247,5 @@ Example report output (Json or Markdown):

 == 7. Further Reading

-* xref:https://github.com/logchange/logchange[Logchange web page]
-* xref:https://keepachangelog.com/en/1.1.0/[keepachangelog.com website]
+* https://github.com/logchange/logchange[Logchange web page]
+* https://keepachangelog.com/en/1.1.0/[keepachangelog.com website]

dev-docs/how-to-contribute.adoc

Lines changed: 1 addition & 1 deletion

@@ -33,7 +33,7 @@ In order to make a new contribution to Solr you will use the fork you have creat

 1. Create a new Jira issue in the Solr project: https://issues.apache.org/jira/projects/SOLR/issues
 2. Create a new branch in your Solr fork to provide a PR for your contribution on the newly created issue. Make any necessary changes for the given bug/feature in that branch. You can use additional information in these dev-docs to build and test your code as well as ensure it passes code quality checks.
 3. Once you are satisfied with your changes, get your branch ready for a PR by running `./gradlew tidy updateLicenses check -x test`. This will format your source code, update licenses of any dependency version changes and run all pre-commit tests. Commit the changes.
-* Note: the `check` command requires `perl` and `python3` to be present on your `PATH` to validate documentation.
+* Note: the `check` command requires `python3` to be present on your `PATH` to validate documentation.
 4. Open a PR of your branch against the `main` branch of the apache/solr repository. When you open a PR on your fork, this should be the default option.
 * The title of your PR should include the Solr Jira issue that you opened, i.e. `SOLR-12345: New feature`.
 * The PR description will automatically populate with a pre-set template that you will need to fill out.

dev-docs/solr-source-code.adoc

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ To build the documentation, type `./gradlew -p solr documentation`.

 `./gradlew check` will assemble Solr and run all validation tasks unit tests.

-NOTE: the `check` command requires `perl` and `python3` to be present on your `PATH` to validate documentation.
+NOTE: the `check` command requires `python3` to be present on your `PATH` to validate documentation.

 To build the final Solr artifacts run `./gradlew assemble`.

gradle/documentation/changes-to-html/changes2html.py

Lines changed: 230 additions & 28 deletions

@@ -138,40 +138,232 @@ def __init__(self, title="Solr Changelog"):
              self.GITHUB_ISSUE_PREFIX, 'GITHUB#{0}')
         ]

-    def extract_issue_from_text(self, text):
+    def _format_issue_link(self, url_prefix, issue_id, label):
+        """Format a single issue reference as an HTML anchor tag"""
+        return f'<a href="{url_prefix}{issue_id}">{label}</a>'
+
+    def _extract_markdown_issue(self, text):
         """
-        Extract the first JIRA/GitHub issue from markdown text.
-        Returns (issue_link_html, text_without_issue)
+        Extract markdown-formatted JIRA/GitHub issues like [SOLR-123](url) or [PR#123](url).
+        Returns (issue_link_html, text_without_issue) or (None, text) if not found.
         """
         for pattern, url_prefix, label_fmt in self.issue_patterns:
             match = re.search(pattern, text)
             if match:
                 issue_id = match.group(1)
                 label = label_fmt.format(issue_id)
-                issue_html = f'<a href="{url_prefix}{issue_id}">{label}</a>'
+                issue_html = self._format_issue_link(url_prefix, issue_id, label)
                 text_without = (text[:match.start()] + text[match.end():]).strip()
                 return issue_html, text_without
+
         return None, text

+    def _extract_plain_pr_references(self, text):
+        """
+        Extract plain GitHub PR references like #123 or #123 #456.
+        Only matches PRs that appear before the author list (before opening paren or at end).
+        Returns (issue_link_html, text_without_issue) or (None, text) if not found.
+        """
+        # Pattern: #\d+ optionally followed by more #\d+ before opening paren or end of string
+        pattern = r'#(\d+)(?:\s+#(\d+))*\s*(?=\(|$)'
+        match = re.search(pattern, text)
+
+        if not match:
+            return None, text
+
+        # Extract all PR numbers from the matched text
+        pr_numbers = re.findall(r'#(\d+)', match.group(0))
+        if not pr_numbers:
+            return None, text
+
+        # Format each PR as an HTML link and join with commas
+        pr_links = [self._format_issue_link(self.GITHUB_PR_PREFIX, pr_num, f'PR#{pr_num}')
+                    for pr_num in pr_numbers]
+        issue_html = ', '.join(pr_links)
+
+        # Remove the PR references from the text
+        text_without = (text[:match.start()] + text[match.end():]).strip()
+        return issue_html, text_without
+
+    def extract_issue_from_text(self, text):
+        """
+        Extract the first issue reference from text.
+        Tries in order: markdown JIRA/GitHub issues, plain GitHub PR references.
+        Returns (issue_link_html, text_without_issue) or (None, text) if not found.
+        """
+        # Try markdown-formatted issues first
+        issue_html, text_without = self._extract_markdown_issue(text)
+        if issue_html:
+            return issue_html, text_without
+
+        # Fall back to plain GitHub PR references
+        return self._extract_plain_pr_references(text)
+
+    def _format_single_author(self, author_text):
+        """
+        Format a single author entry to HTML.
+        Supports:
+        - Plain name: "Jan Høydahl" -> "Jan Høydahl"
+        - Markdown link: "[Jan Høydahl](url)" -> "<a href=\"url\">Jan Høydahl</a>"
+        - Name with GitHub: "Jan Høydahl @janhoy" -> "<a href=\"https://github.com/janhoy\">Jan Høydahl</a>"
+        - Link with GitHub: "[Jan Høydahl](url) @janhoy" -> "<a href=\"url\">Jan Høydahl</a> <a href=\"https://github.com/janhoy\">@janhoy</a>"
+        """
+        author_text = author_text.strip()
+
+        # Extract markdown link: [text](url)
+        markdown_link_match = re.search(r'\[([^\]]+)\]\(([^)]+)\)', author_text)
+        # Extract GitHub handle: @username
+        github_match = re.search(r'@(\w+)', author_text)
+
+        if markdown_link_match:
+            # Has markdown link
+            link_text = markdown_link_match.group(1)
+            link_url = markdown_link_match.group(2)
+            html = f'<a href="{link_url}">{self.escape_html(link_text)}</a>'
+
+            if github_match:
+                # Has both markdown link and GitHub handle
+                github_handle = github_match.group(1)
+                html += f' <a href="https://github.com/{github_handle}">@{github_handle}</a>'
+
+            return html
+        elif github_match:
+            # Has GitHub handle but no markdown link - extract name and link it to GitHub
+            github_handle = github_match.group(1)
+            # Remove the @handle part to get just the name
+            name = author_text.replace(f'@{github_handle}', '').strip()
+            return f'<a href="https://github.com/{github_handle}">{self.escape_html(name)}</a>'
+        else:
+            # Plain name with no links
+            return self.escape_html(author_text)
+
+    def _extract_one_author_group(self, text, start_pos):
+        """
+        Extract one author group starting from start_pos (pointing to an opening paren).
+        Returns (author_content, end_pos) or (None, start_pos) if no valid group.
+        Handles markdown links [text](url) inside the group.
+        """
+        if start_pos >= len(text) or text[start_pos] != '(':
+            return None, start_pos
+
+        paren_depth = 0
+        bracket_depth = 0
+        content = []
+
+        for i in range(start_pos, len(text)):
+            char = text[i]
+
+            # Track brackets to know if we're inside [text]
+            if char == '[' and bracket_depth >= 0:
+                bracket_depth += 1
+            elif char == ']' and bracket_depth > 0:
+                bracket_depth -= 1
+            # Only track paren depth outside brackets
+            elif bracket_depth == 0:
+                if char == '(':
+                    paren_depth += 1
+                elif char == ')':
+                    paren_depth -= 1
+                    if paren_depth == 0:
+                        # Found matching closing paren
+                        return ''.join(content[1:]).strip(), i  # Skip opening paren
+
+            content.append(char)
+
+        return None, start_pos
+
     def extract_authors(self, text):
-        """Extract authors from trailing parentheses"""
-        # Match (author1) (author2) ... at the end
-        match = re.search(r'\s*(\([^)]+(?:\)\s*\([^)]+)*\))\s*$', text)
-        if match:
-            authors_text = match.group(1)
-            text_without_authors = text[:match.start()].strip()
-
-            # Parse individual authors
-            authors = re.findall(r'\(([^)]+)\)', authors_text)
-            authors_list = []
-            for author_group in authors:
-                # Split by comma or "and"
-                for author in re.split(r',\s*|\s+and\s+', author_group):
+        """Extract authors from trailing parentheses, handling markdown links [text](url)"""
+        authors_list = []
+
+        # Find all author groups at the end of the text
+        # Work backwards from the end to find opening parentheses
+        i = len(text) - 1
+
+        # Skip trailing whitespace
+        while i >= 0 and text[i] in ' \t\n\r':
+            i -= 1
+
+        if i < 0 or text[i] != ')':
+            return None, text
+
+        # Find all complete author groups by working backwards
+        author_positions = []  # List of (start, end) positions
+
+        while i >= 0:
+            if text[i] == ')':
+                # Find the matching opening paren for this closing paren
+                paren_depth = 1
+                bracket_depth = 0
+                j = i - 1
+
+                while j >= 0 and paren_depth > 0:
+                    char = text[j]
+
+                    # Track brackets
+                    if char == ']':
+                        bracket_depth += 1
+                    elif char == '[':
+                        bracket_depth -= 1
+                    # Track parens outside brackets
+                    elif bracket_depth == 0:
+                        if char == ')':
+                            paren_depth += 1
+                        elif char == '(':
+                            paren_depth -= 1
+
+                    j -= 1
+
+                if paren_depth == 0:
+                    # Found matching opening paren at j+1
+                    start_pos = j + 1
+
+                    # Check if this is part of a markdown link [text](url)
+                    # Markdown links have ] immediately before the (
+                    if start_pos > 0 and text[start_pos - 1] == ']':
+                        # This is a markdown link URL, not an author group
+                        # Continue searching backwards
+                        i = j
+                    else:
+                        # This is an author group
+                        author_positions.insert(0, (start_pos, i))
+
+                        # Move past this group
+                        i = j
+
+                        # Skip whitespace before next potential group
+                        while i >= 0 and text[i] in ' \t\n\r':
+                            i -= 1
+
+                        # Check if there's another author group right before
+                        if i >= 0 and text[i] != ')':
+                            # No more author groups
+                            break
+                else:
+                    break
+            else:
+                break
+
+        # Now process the found author groups
+        if author_positions:
+            # Extract text before first author group
+            first_start = author_positions[0][0]
+            text_without_authors = text[:first_start].strip()
+
+            # Extract and format each author group
+            for start_pos, end_pos in author_positions:
+                author_content = text[start_pos + 1:end_pos]
+
+                # Split by comma or "and" for multiple authors in one group
+                for author in re.split(r',\s*|\s+and\s+', author_content):
                     author = author.strip()
                     if author:
-                        authors_list.append(author)
+                        formatted_author = self._format_single_author(author)
+                        authors_list.append(formatted_author)
+
+            if authors_list:
+                return authors_list, text_without_authors

-            return authors_list, text_without_authors
         return None, text

     def format_changelog_item(self, item_text):

@@ -183,17 +375,27 @@ def format_changelog_item(self, item_text):
         # Extract the issue
         issue_html, text_after_issue = self.extract_issue_from_text(item_text)

-        if not issue_html:
-            return self.linkify_remaining_text(item_text)
+        # Always try to extract authors, whether or not we found an issue
+        authors_list, description = self.extract_authors(text_after_issue if issue_html else item_text)

-        # Extract authors and clean description
-        authors_list, description = self.extract_authors(text_after_issue)
-        description = re.sub(r'^[:\s]+', '', description).strip()
-
-        # Build HTML
-        html = f'{issue_html}: {self.escape_html(description)}'
+        if issue_html:
+            # We have an issue link
+            description = re.sub(r'^[:\s]+', '', description).strip()
+            html = f'{issue_html}: {self.escape_html(description)}'
+        else:
+            # No issue link found
+            if authors_list:
+                # We have authors but no issue - just use the description part
+                html = self.escape_html(description)
+            else:
+                # No issue and no authors - linkify the full text
+                return self.linkify_remaining_text(item_text)
+
+        # Add authors if we have them
         if authors_list:
-            html += f'<br /><span class="attrib">({self.escape_html(", ".join(authors_list))})</span>'
+            # Authors are already formatted as HTML, don't escape
+            html += f'<br /><span class="attrib">({", ".join(authors_list)})</span>'
+
         return html

     def linkify_remaining_text(self, text):
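The plain PR detection added in this commit hinges on one regex: a run of `#123` references is only treated as PRs when it sits directly before the trailing author parenthesis or at the end of the line. A standalone sketch using the same pattern (`find_pr_numbers` is a hypothetical wrapper, not a function in the script):

```python
import re

# Pattern from the diff above: one or more plain "#123" refs,
# matched only when followed by "(" (the author list) or end of string.
PR_PATTERN = r'#(\d+)(?:\s+#(\d+))*\s*(?=\(|$)'

def find_pr_numbers(text):
    """Return the PR numbers that would be linked, as strings."""
    match = re.search(PR_PATTERN, text)
    if not match:
        return []
    return re.findall(r'#(\d+)', match.group(0))

print(find_pr_numbers("Fix widget rendering #3831 (Jan Høydahl)"))   # ['3831']
print(find_pr_numbers("Fix widget rendering #123 #456"))             # ['123', '456']
print(find_pr_numbers("See issue #99 in the middle of a sentence"))  # []
```

The lookahead is why a `#99` buried mid-sentence is left alone while trailing references become `PR#99` links.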

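The other headline change is author handling ("Authors with url and github nick handled"). A trimmed-down, self-contained sketch of the `_format_single_author` logic from the diff above, substituting stdlib `html.escape` for the script's `escape_html` method (an assumption for the sake of a runnable example):

```python
import html
import re

def format_author(author_text):
    """Render one changelog author as HTML, linking a GitHub @handle if present."""
    author_text = author_text.strip()
    link = re.search(r'\[([^\]]+)\]\(([^)]+)\)', author_text)   # [name](url)
    handle = re.search(r'@(\w+)', author_text)                  # @nick
    if link:
        out = f'<a href="{link.group(2)}">{html.escape(link.group(1))}</a>'
        if handle:
            out += f' <a href="https://github.com/{handle.group(1)}">@{handle.group(1)}</a>'
        return out
    if handle:
        # Strip the @handle to recover the bare name, then link the name to GitHub
        name = author_text.replace(f'@{handle.group(1)}', '').strip()
        return f'<a href="https://github.com/{handle.group(1)}">{html.escape(name)}</a>'
    return html.escape(author_text)

print(format_author("Jan Høydahl @janhoy"))
# <a href="https://github.com/janhoy">Jan Høydahl</a>
```

Plain names fall through unchanged (escaped), which is why the new `format_changelog_item` no longer re-escapes the joined author list.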