Commit d572eeb

alamb, wesm, adriangb, akurmustafa, and zhuqi-lucas authored
Add explicit PMC/committers list to governance docs page (#17574)
* Add committers explicitly to governance page, with script
* add license header
* Update Wes McKinney's affiliation in governance.md
* Update adriangb's affiliation
* Update affiliation
* Andy Grove Affiliation
* Update Qi Zhu affiliation
* Updatd linwei's info
* Update docs/source/contributor-guide/governance.md
* Update docs/source/contributor-guide/governance.md
* Apply suggestions from code review

Co-authored-by: Oleks V
Co-authored-by: Liang-Chi Hsieh

* Apply suggestions from code review

Co-authored-by: Alex Huang
Co-authored-by: Yang Jiang
Co-authored-by: Yongting You

* Apply suggestions from code review

Co-authored-by: Yijie Shen

* Apply suggestions from code review
* Apply suggestions from code review

Co-authored-by: Brent Gardner
Co-authored-by: Dmitrii Blaginin
Co-authored-by: Jax Liu
Co-authored-by: Ifeanyi Ubah

* Apply suggestions from code review

Co-authored-by: Will Jones

* Clarify what is updated in the script
* Apply suggestions from code review

Co-authored-by: Paddy Horan
Co-authored-by: Dan Harris

* Update docs/source/contributor-guide/governance.md
* Update docs/source/contributor-guide/governance.md

Co-authored-by: Parth Chandra

* Update docs/source/contributor-guide/governance.md
* prettier

---------

Co-authored-by: Wes McKinney
Co-authored-by: Adrian Garcia Badaracco
Co-authored-by: Mustafa Akur
Co-authored-by: Qi Zhu
Co-authored-by: 张林伟
Co-authored-by: xudong.w
Co-authored-by: Oleks V
Co-authored-by: Liang-Chi Hsieh
Co-authored-by: Alex Huang
Co-authored-by: Yang Jiang
Co-authored-by: Yongting You
Co-authored-by: Yijie Shen
Co-authored-by: Brent Gardner
Co-authored-by: Dmitrii Blaginin
Co-authored-by: Jax Liu
Co-authored-by: Ifeanyi Ubah
Co-authored-by: Will Jones
Co-authored-by: Paddy Horan
Co-authored-by: Dan Harris
Co-authored-by: Ruihang Xia
Co-authored-by: Parth Chandra
1 parent b9517a1 commit d572eeb

2 files changed: +344 -4 lines changed

Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
#!/usr/bin/env python3

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.


"""
Utility for updating the committer list in the governance documentation
by reading from the Apache DataFusion phonebook and combining with existing data.
"""

import re
import requests
import sys
import os
from typing import Dict, List, NamedTuple, Set


class Committer(NamedTuple):
    name: str
    apache: str
    github: str
    affiliation: str
    role: str


# Return (pmcs, committers), each a dictionary like
# key: apache id
# value: Real name

def get_asf_roster():
    """Get the current roster from Apache phonebook."""
    # See https://home.apache.org/phonebook-about.html
    committers_url = "https://whimsy.apache.org/public/public_ldap_projects.json"

    # people https://whimsy.apache.org/public/public_ldap_people.json
    people_url = "https://whimsy.apache.org/public/public_ldap_people.json"

    try:
        r = requests.get(committers_url)
        r.raise_for_status()
        j = r.json()
        proj = j['projects']['datafusion']

        # Get PMC members and committers
        pmc_ids = set(proj['owners'])
        committer_ids = set(proj['members']) - pmc_ids

    except Exception as e:
        print(f"Error fetching ASF roster: {e}")
        # Re-raise rather than return empty sets: the caller merges the
        # results as dictionaries and must not continue with no roster
        raise

    # Fetch people to map each Apache ID to a real name
    #
    # The data looks like this:
    # {
    #   "lastCreateTimestamp": "20250913131506Z",
    #   "people_count": 9932,
    #   "people": {
    #     "a_budroni": {
    #       "name": "Alessandro Budroni",
    #       "createTimestamp": "20160720223917Z"
    #     },
    #     ...
    # }
    try:
        r = requests.get(people_url)
        r.raise_for_status()
        j = r.json()
        people = j['people']

        # make a dictionary with each pmc_id and value their real name
        pmcs = {p: people[p]['name'] for p in pmc_ids}
        committers = {c: people[c]['name'] for c in committer_ids}

    except Exception as e:
        print(f"Error fetching ASF people: {e}")
        # Re-raise: pmcs and committers may be unbound at this point
        raise

    return pmcs, committers


def parse_existing_table(content: str) -> List[Committer]:
    """Parse the existing committer table from the markdown content."""
    committers = []

    # Find the table between the markers
    start_marker = "<!-- Begin Auto-Generated Committer List -->"
    end_marker = "<!-- End Auto-Generated Committer List -->"

    start_idx = content.find(start_marker)
    end_idx = content.find(end_marker)

    if start_idx == -1 or end_idx == -1:
        return committers

    table_content = content[start_idx:end_idx]

    # Parse table rows (skip header and separator)
    lines = table_content.split('\n')
    for line in lines:
        line = line.strip()
        if line.startswith('|') and '---' not in line and line.count('|') >= 4:
            # Split by | and clean up
            parts = [part.strip() for part in line.split('|')]
            # parts[0] is the empty string before the leading '|', so the
            # five cells are parts[1] through parts[5]: require 6 parts
            if len(parts) >= 6:
                name = parts[1]
                apache = parts[2]
                github = parts[3]
                affiliation = parts[4]
                role = parts[5]

                if name and name != 'Name' and '-----' not in name:
                    committers.append(Committer(name, apache, github, affiliation, role))

    return committers


def generate_table_row(committer: Committer) -> str:
    """Generate a markdown table row for a committer."""
    # committer.github is emitted as-is: rows parsed from the existing table
    # already hold a markdown link, and new entries hold an empty placeholder
    return f"| {committer.name:<23} | {committer.apache:<39} | {committer.github:<39} | {committer.affiliation:<11} | {committer.role:<9} |"


def sort_committers(committers: List[Committer]) -> List[Committer]:
    """Sort committers by role ('PMC Chair', PMC, Committer) then by apache id."""
    role_order = {'PMC Chair': 0, 'PMC': 1, 'Committer': 2}

    return sorted(committers, key=lambda c: (role_order.get(c.role, 3), c.apache.lower()))


def update_governance_file(file_path: str):
    """Update the governance file with the latest committer information."""
    try:
        # The table contains non-ASCII names, so read as UTF-8 explicitly
        with open(file_path, 'r', encoding='utf-8') as f:
            content = f.read()
    except FileNotFoundError:
        print(f"Error: File {file_path} not found")
        return False

    # Parse existing committers
    existing_committers = parse_existing_table(content)
    print(f"Found {len(existing_committers)} existing committers")

    # Get ASF roster
    asf_pmcs, asf_committers = get_asf_roster()
    print(f"Found {len(asf_pmcs)} PMCs and {len(asf_committers)} committers in ASF roster")

    # Create a map of existing committers by apache id
    existing_by_apache = {c.apache: c for c in existing_committers}

    # Update the entries based on the ASF roster
    updated_committers = []
    for apache_id, name in {**asf_pmcs, **asf_committers}.items():
        role = 'PMC' if apache_id in asf_pmcs else 'Committer'
        if apache_id in existing_by_apache:
            existing = existing_by_apache[apache_id]
            # Preserve PMC Chair role if already set
            if existing.role == 'PMC Chair':
                role = 'PMC Chair'
            updated_committers.append(Committer(
                name=existing.name,
                apache=apache_id,
                github=existing.github,
                affiliation=existing.affiliation,
                role=role
            ))
        # add a new entry for new committers with placeholder values
        else:
            print(f"New entry found: {name} ({apache_id})")
            # Placeholder github and affiliation
            updated_committers.append(Committer(
                name=name,
                apache=apache_id,
                github="",  # user should update
                affiliation="",  # user should update
                role=role
            ))

    # Sort the committers
    sorted_committers = sort_committers(updated_committers)

    # Generate new table
    table_lines = [
        "| Name | Apache ID | github | Affiliation | Role |",
        "|-------------------------|-----------|----------------------------|-------------|-----------|"
    ]

    for committer in sorted_committers:
        table_lines.append(generate_table_row(committer))

    new_table = '\n'.join(table_lines)

    # Replace the table in the content
    start_marker = "<!-- Begin Auto-Generated Committer List -->"
    end_marker = "<!-- End Auto-Generated Committer List -->"

    start_idx = content.find(start_marker)
    end_idx = content.find(end_marker)

    if start_idx == -1 or end_idx == -1:
        print("Error: Could not find table markers in file")
        return False

    # Find the end of the start marker line
    start_line_end = content.find('\n', start_idx) + 1

    new_content = (
        content[:start_line_end] +
        new_table + '\n' +
        content[end_idx:]
    )

    # Write back to file
    try:
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(new_content)
        print(f"Successfully updated {file_path}")
        return True
    except Exception as e:
        print(f"Error writing file: {e}")
        return False


def main():
    """Main function."""
    # Default path to governance file
    script_dir = os.path.dirname(os.path.abspath(__file__))
    repo_root = os.path.dirname(script_dir)
    governance_file = os.path.join(repo_root, "source", "contributor-guide", "governance.md")

    if len(sys.argv) > 1:
        governance_file = sys.argv[1]

    if not os.path.exists(governance_file):
        print(f"Error: Governance file not found at {governance_file}")
        sys.exit(1)

    print(f"Updating committer list in {governance_file}")

    if update_governance_file(governance_file):
        print("Committer list updated successfully")
    else:
        print("Failed to update committer list")
        sys.exit(1)


if __name__ == "__main__":
    main()
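
The marker-based replacement that the script performs on the governance file can be tried in isolation. The following is a minimal sketch, not code from the commit: `splice_between_markers` and the sample `doc` string are hypothetical names introduced for illustration, and the sketch assumes, as the script does, that the begin marker is followed by a newline.

```python
BEGIN = "<!-- Begin Auto-Generated Committer List -->"
END = "<!-- End Auto-Generated Committer List -->"


def splice_between_markers(content: str, new_body: str) -> str:
    """Replace the text between the two markers with new_body.

    Hypothetical helper for illustration; it mirrors the slicing used in
    the script but raises ValueError when a marker is missing.
    """
    start_idx = content.find(BEGIN)
    end_idx = content.find(END)
    if start_idx == -1 or end_idx == -1:
        raise ValueError("table markers not found")
    # Keep everything up to and including the begin-marker line
    start_line_end = content.find("\n", start_idx) + 1
    return content[:start_line_end] + new_body + "\n" + content[end_idx:]


# Made-up document: only the text between the markers is replaced
doc = "intro\n" + BEGIN + "\nold table\n" + END + "\noutro\n"
print(splice_between_markers(doc, "new table"))
```

Everything outside the markers is left untouched, which is why the hand-written comment block above the table in governance.md survives a regeneration.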

docs/source/contributor-guide/governance.md

Lines changed: 78 additions & 4 deletions
@@ -19,10 +19,6 @@

 # Governance

-The current PMC and committers are listed in the [Apache Phonebook].
-
-[apache phonebook]: https://projects.apache.org/committee.html?datafusion
-
 ## Overview

 DataFusion is part of the [Apache Software Foundation] and is governed following
@@ -38,6 +34,84 @@ As much as practicable, we strive to make decisions by consensus, and anyone in
 the community is encouraged to propose ideas, start discussions, and contribute
 to the project.

+## People
+
+DataFusion is currently governed by the following individuals
+
+<!--
+
+The following table can be updated by running the following script:
+
+docs/scripts/update_committer_list.py
+
+Notes:
+
+* The script only updates the Name and Apache ID columns. The rest of the data
+is manually provided.
+
+-->
+
+<!-- Begin Auto-Generated Committer List -->
+
+| Name | Apache ID | github | Affiliation | Role |
+| ----------------------- | ---------------- | ------------------------------------------------------- | -------------- | --------- |
+| Andrew Lamb | alamb | [alamb](https://github.com/alamb) | InfluxData | PMC Chair |
+| Andrew Grove | agrove | [andygrove](https://github.com/andygrove) | Apple | PMC |
+| Mustafa Akur | akurmustafa | [akurmustafa](https://github.com/akurmustafa) | OHSU | PMC |
+| Berkay Şahin | berkay | [berkaysynnada](https://github.com/berkaysynnada) | Synnada | PMC |
+| Oleksandr Voievodin | comphead | [comphead](https://github.com/comphead) | Apple | PMC |
+| Daniël Heres | dheres | [Dandandan](https://github.com/Dandandan) | | PMC |
+| QP Hou | houqp | [houqp](https://github.com/houqp) | | PMC |
+| Jie Wen | jackwener | [jakevin](https://github.com/jackwener) | | PMC |
+| Jay Zhan | jayzhan | [jayzhan211](https://github.com/jayzhan211) | | PMC |
+| Jonah Gao | jonah | [jonahgao](https://github.com/jonahgao) | | PMC |
+| Kun Liu | liukun | [liukun4515](https://github.com/liukun4515) | | PMC |
+| Mehmet Ozan Kabak | ozankabak | [ozankabak](https://github.com/ozankabak) | Synnada, Inc | PMC |
+| Tim Saucer | timsaucer | [timsaucer](https://github.com/timsaucer) | | PMC |
+| L. C. Hsieh | viirya | [viirya](https://github.com/viirya) | Databricks | PMC |
+| Ruihang Xia | wayne | [waynexia](https://github.com/waynexia) | Greptime | PMC |
+| Wes McKinney | wesm | [wesm](https://github.com/wesm) | Posit | PMC |
+| Will Jones | wjones127 | [wjones127](https://github.com/wjones127) | LanceDB | PMC |
+| Xudong Wang | xudong963 | [xudong963](https://github.com/xudong963) | Polygon.io | PMC |
+| Adrian Garcia Badaracco | adriangb | [adriangb](https://github.com/adriangb) | Pydantic | Committer |
+| Brent Gardner | avantgardner | [avantgardnerio](https://github.com/avantgardnerio) | Coralogix | Committer |
+| Dmitrii Blaginin | blaginin | [blaginin](https://github.com/blaginin) | SpiralDB | Committer |
+| Piotr Findeisen | findepi | [findepi](https://github.com/findepi) | dbt Labs | Committer |
+| Jax Liu | goldmedal | [goldmedal](https://github.com/goldmedal) | Canner | Committer |
+| Huaxin Gao | huaxingao | [huaxingao](https://github.com/huaxingao) | | Committer |
+| Ifeanyi Ubah | iffyio | [iffyio](https://github.com/iffyio) | Validio | Committer |
+| Jeffrey Vo | jeffreyvo | [Jefffrey](https://github.com/Jefffrey) | | Committer |
+| Liu Jiayu | jiayuliu | [jimexist](https://github.com/jimexist) | | Committer |
+| Ruiqiu Cao | kamille | [Rachelint](https://github.com/Rachelint) | Tencent | Committer |
+| Kazuyuki Tanimura | kazuyukitanimura | [kazuyukitanimura](https://github.com/kazuyukitanimura) | | Committer |
+| Eduard Karacharov | korowa | [korowa](https://github.com/korowa) | | Committer |
+| Siew Kam Onn | kosiew | [kosiew](https://github.com/kosiew) | | Committer |
+| Lewis Zhang | linwei | [lewiszlw](https://github.com/lewiszlw) | diit.cn | Committer |
+| Matt Butrovich | mbutrovich | [mbutrovich](https://github.com/mbutrovich) | Apple | Committer |
+| Metehan Yildirim | mete | [metegenez](https://github.com/metegenez) | | Committer |
+| Marko Milenković | milenkovicm | [milenkovicm](https://github.com/milenkovicm) | | Committer |
+| Wang Mingming | mingmwang | [mingmwang](https://github.com/mingmwang) | | Committer |
+| Michael Ward | mjward | [Michael-J-Ward ](https://github.com/Michael-J-Ward) | | Committer |
+| Marco Neumann | mneumann | [crepererum](https://github.com/crepererum) | InfluxData | Committer |
+| Zhong Yanghong | nju_yaho | [yahoNanJing](https://github.com/yahoNanJing) | | Committer |
+| Paddy Horan | paddyhoran | [paddyhoran](https://github.com/paddyhoran) | Assured Allies | Committer |
+| Parth Chandra | parthc | [parthchandra](https://github.com/parthchandra) | Apple | Committer |
+| Rémi Dettai | rdettai | [rdettai](https://github.com/rdettai) | | Committer |
+| Chao Sun | sunchao | [sunchao](https://github.com/sunchao) | OpenAI | Committer |
+| Daniel Harris | thinkharderdev | [thinkharderdev](https://github.com/thinkharderdev) | Coralogix | Committer |
+| Raphael Taylor-Davies | tustvold | [tustvold](https://github.com/tustvold) | | Committer |
+| Weijun Huang | weijun | [Weijun-H](https://github.com/Weijun-H) | OrbDB | Committer |
+| Yang Jiang | yangjiang | [Ted-jiang](https://github.com/Ted-jiang) | Ebay | Committer |
+| Yijie Shen | yjshen | [yjshen](https://github.com/yjshen) | DataPelago | Committer |
+| Yongting You | ytyou | [2010YOUY01](https://github.com/2010YOUY01) | Independent | Committer |
+| Qi Zhu | zhuqi | [zhuqi-lucas](https://github.com/zhuqi-lucas) | Polygon.io | Committer |
+
+<!-- End Auto-Generated Committer List -->
+
+Note that the authoritative list of PMC and committers is the [Apache Phonebook]
+
+[apache phonebook]: https://projects.apache.org/committee.html?datafusion
+
 ## Roles

 - **Contributors**: Anyone who contributes to the project, whether it be code,
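
The Role column in the table above follows from how the script reads the ASF roster: IDs listed under `owners` are PMC members, and IDs under `members` that are not owners are committers. The following offline sketch shows that split. `split_roster` and the sample IDs and names are hypothetical, introduced only for illustration, and the input dictionaries imitate the shape of the Whimsy JSON that the script fetches.

```python
from typing import Dict, Tuple


def split_roster(project: dict, people: dict) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (pmcs, committers), each mapping Apache ID to real name.

    Hypothetical helper: the same set arithmetic as the script, but run on
    dictionaries that are passed in rather than fetched over HTTP.
    """
    pmc_ids = set(project["owners"])
    # Owners also appear under members, so subtract them to get committers
    committer_ids = set(project["members"]) - pmc_ids
    pmcs = {p: people[p]["name"] for p in pmc_ids}
    committers = {c: people[c]["name"] for c in committer_ids}
    return pmcs, committers


# Made-up sample data in the shape of the phonebook JSON
project = {"owners": ["alice"], "members": ["alice", "bob"]}
people = {"alice": {"name": "Alice Example"}, "bob": {"name": "Bob Example"}}

pmcs, committers = split_roster(project, people)
print("PMC:", pmcs)
print("Committers:", committers)
```

The PMC Chair role is not part of this data. The script keeps it only when the existing table row already carries it.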
