Commit c785fbb

docs: New Contributor Guide + fork-based contributor workflow guide (replaces #273) (#281)
- new contributor guide (docs/guides/contributing/new_contributor_guide.md) - reorganizes Table of Contents, lumps technical guides under one tab - updates the contributing changes to a fork-based workflow as per best practices (rather than requesting access)
1 parent 5cd2e7f commit c785fbb

8 files changed

+372
-179
lines changed

Lines changed: 73 additions & 37 deletions
@@ -1,54 +1,90 @@
1-
Before following this workflow please refer to our [**Getting Started**](./overview.md) page for instructions on installing dependencies and setting up your development environment.
1+
# GitHub Contribution Workflow (Fork-Based)
22

3-
# Contributor Workflow
3+
Before following this workflow please refer to our [**New Contributor Guide**](./new_contributor_guide.md) page for instructions on installing dependencies and setting up your development environment.
44

55
## Overview
66

7-
All changes should be made in a feature branch, merged into `develop`, and later merged into `main` for a new release.
7+
All changes should be made in a fork of the repository and submitted via pull request to the upstream `develop` branch. Maintainers later merge `develop` into `main` for a new release.
88

9-
## Contributing new changes
109

11-
1. **Create a Feature Branch**
12-
- Branch from `develop` using `feature/<name>` or `bugfix/<name>`.
13-
- Example:
10+
## Opening a Pull Request (PR)
1411

15-
```shell
16-
git checkout develop
17-
git pull origin develop
18-
git checkout -b feature/new-feature
19-
```
12+
1. **Fork the repository**
2013

21-
2. **Make Changes & Push**
22-
- Commit changes with clear messages.
23-
- Push the branch.
14+
- Navigate to the [main repository](https://www.github.com/civictechdc/cib-mango-tree) on GitHub.
15+
- Click the *Fork* button in the upper right corner.
16+
- This creates a copy of the repository under your GitHub account.
2417

25-
```shell
26-
git add .
27-
git commit -m "Description of changes"
28-
git push origin feature/new-feature
29-
```
18+
2. **Clone your fork**
3019

31-
3. **Create a Pull Request**
32-
- Open a PR to merge into `develop`.
33-
- Address any review feedback.
20+
Clone your forked repository to your local machine:
3421

35-
4. **Merge & Clean Up**
36-
- After approval, merge into `develop`.
37-
- Delete the feature branch.
22+
```shell
23+
git clone https://github.com/YOUR-USERNAME/REPOSITORY-NAME.git
24+
cd REPOSITORY-NAME
25+
```
3826

39-
5. **Release**
40-
- When develop is clean and ready for a new major release, we will merge `develop` into `main`.
27+
3. **Add upstream remote**
4128

42-
## Workflow Diagram
29+
Add the original repository as a remote named `upstream`, then verify the configured remotes:
4330

44-
```mermaid
45-
graph TD;
46-
A[Feature Branch] -->|Commit & Push| B[Pull Request];
47-
B -->|Review & Merge| C[Develop Branch];
48-
C -->|Release| D[Main Branch];
49-
```
31+
```shell
32+
git remote add upstream https://github.com/ORIGINAL-OWNER/REPOSITORY-NAME.git
33+
git remote -v
34+
```
5035

51-
# Next Steps
36+
4. **Create a feature branch**
5237

53-
Once you finish reading this it's recommended to check out the [architecture](./architecture.md) section.
38+
Branch from `develop` using `feature/<name>` or `bugfix/<name>`:
39+
40+
```shell
41+
git checkout develop
42+
git pull upstream develop
43+
git checkout -b feature/new-feature
44+
```
45+
46+
5. **Make changes & push to your fork**
47+
48+
- Commit changes with clear messages.
49+
- Push the branch to your forked repository.
50+
51+
```shell
52+
git add . # adds changes in all non-ignored files in current folder
53+
git commit -m "Description of changes"
54+
git push origin feature/new-feature
55+
```
56+
57+
6. **Create a pull request**
58+
59+
- Navigate to the original repository on GitHub.
60+
- Click *Pull requests* --> *New pull request*.
61+
- Click *Compare across forks*.
62+
- Set the base repository to `ORIGINAL-OWNER/REPOSITORY-NAME` and base branch to `develop`.
63+
- Set the head repository to `YOUR-USERNAME/REPOSITORY-NAME` and compare branch to `feature/new-feature`.
64+
- Click *Create pull request* and fill in the details.
65+
- Address any review feedback.
66+
67+
7. **Keep your fork updated**
68+
69+
Regularly sync your fork with the upstream repository:
70+
71+
```shell
72+
git checkout develop # switch to local develop branch
73+
git pull upstream develop # bring your local branch up to date with the upstream branch
74+
git push origin develop # update branch on your remote fork with synced local branch
75+
```
76+
77+
8. **After merge & clean up**
78+
79+
- After your PR is approved and merged into upstream `develop`, delete your feature branch:
80+
81+
```shell
82+
git checkout develop
83+
git branch -d feature/new-feature
84+
git push origin --delete feature/new-feature
85+
```
86+
87+
9. **Release**
88+
89+
- When `develop` is clean and ready for a new major release, maintainers will merge `develop` into `main` and create a new release (with your contributions included).
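
The sync step above can be rehearsed without touching GitHub. The following is a minimal sketch, assuming `git` is installed: it builds three throwaway local repositories that stand in for the original repository (`upstream`), your fork (`fork.git`), and your clone (`local`). The directory names and the demo identity are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Local rehearsal of "Keep your fork updated": no network or GitHub account needed.
set -eu
work="$(mktemp -d)"
cd "$work"

# Throwaway identity for the demo commits (hypothetical values)
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# "upstream" stands in for the original repository, with develop as its branch
git init --quiet upstream
git -C upstream symbolic-ref HEAD refs/heads/develop
git -C upstream commit --quiet --allow-empty -m "Initial commit"

# "fork.git" stands in for your fork on GitHub
git clone --quiet --bare upstream fork.git

# "local" is your clone of the fork, with the original added as upstream
git clone --quiet fork.git local
git -C local remote add upstream "$work/upstream"

# The original repository moves ahead by one commit
git -C upstream commit --quiet --allow-empty -m "Change merged upstream"

# Sync: update local develop from upstream, then push it to the fork
git -C local checkout --quiet develop
git -C local pull --quiet --ff-only upstream develop
git -C local push --quiet origin develop

upstream_head="$(git -C upstream rev-parse develop)"
fork_head="$(git -C fork.git rev-parse develop)"
if [ "$upstream_head" = "$fork_head" ]; then
  echo "fork is in sync with upstream"
fi
```

The sketch uses `--ff-only` so that the pull fails loudly instead of creating a merge commit if your local `develop` has diverged; the guide's plain `git pull upstream develop` behaves the same way when `develop` carries no local commits.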
5490

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
1+
The mission of the CIB Mango Tree community is to bring peer-reviewed research methods on the detection of coordinated inauthentic behavior (CIB) in social media to the fingertips of journalists, fact-checkers, watchdogs, and researchers. We do this by developing an interactive analysis and dashboard app. You can read our project overview on [cibmangotree.org](https://cibmangotree.org).
2+
3+
The purpose of this guide is to provide big-picture instructions on how to contribute a new test to CIB Mango Tree and a more detailed guide for each individual step. You are welcome to take part in any and all of the steps for contributing a new test that are described below. Before reading any of the specific subsections, we recommend reading the Overview section.
4+
5+
6+
!!! info
7+
This guide is intended for contributing a new test to the library. If this is not your primary area of interest, there are other ways to contribute to the CIB Mango Tree project, for example in engagement and outreach activities and project management. See our Engagement Guide for more information on contributing in these ways (or ask us on Slack).
8+
9+
# Overview
10+
11+
## A library of tests
12+
13+
There is no single test that can provide definitive evidence that a snapshot of activity on social media is coordinated and/or inauthentic or not. Therefore -- at its core -- the goal of CIB Mango Tree is to be a library of different, but complementary tests. Each test highlights specific aspects of social media activity.
14+
15+
!!! example
16+
**Test A** might focus on finding post content that is copied verbatim a large number of times by different accounts (signaling coordination). On the other hand, **Test B** ignores the post contents and instead analyzes trending hashtags. The two tests are complementary since they analyze different aspects (post content vs. hashtag usage) of the same data.
17+
18+
## Contribution cycle
19+
20+
Completing the entire test contribution cycle, from ideation to implementation, requires sustained commitment. We recognize that not every contributor will have that level of bandwidth available. Hence the contribution process is broken down into separate stages. Each of these stages has a concrete hand-off, such that you can focus on the concrete deliverable for the specific stage without worrying about the next steps.
21+
22+
!["A left-to-right box and arrow workflow diagram"](../../img/new_contributor_workflow.svg)
23+
/// caption
24+
Contributing a new test consists of several steps.
25+
///
26+
27+
!!! example
28+
If you really enjoy diving into a new domain, open-ended problems, and reading papers, you can help us out by doing research and proposing a new test. If instead, you prefer tasks with more clearly defined outputs, you are welcome to review the documentation or implement a currently proposed test.
29+
30+
# I am interested in...
31+
32+
Depending on your interests and availability, you might want to jump to the section relevant to your goals.
33+
34+
<div class="grid cards" markdown>
35+
36+
- __1. Learning more__
37+
38+
Getting to know the community.
39+
40+
---
41+
42+
[:octicons-arrow-right-24: Learning more & community resources](#not-sure-yet-i-want-to-learn-more)
43+
{ .card }
44+
45+
- __2. Research__
46+
47+
Discovering new tests.
48+
49+
---
50+
51+
[:octicons-arrow-right-24: Researching a new test](#doing-research)
52+
{ .card }
53+
54+
- __3. Design__
55+
56+
Thinking about the big picture.
57+
58+
---
59+
60+
[:octicons-arrow-right-24: Designing a test](#designing-a-test)
61+
{ .card }
62+
63+
- __4. Prototype__
64+
65+
Turning ideas into proof-of-concepts.
66+
67+
---
68+
69+
[:octicons-arrow-right-24: Prototyping a test](#prototyping-a-test)
70+
{ .card }
71+
72+
- __5. Implement__
73+
74+
From proof-of-concept to production.
75+
76+
---
77+
78+
[:octicons-arrow-right-24: Implementing and integrating](#implementing-a-test-into-the-library)
79+
{ .card }
80+
81+
- __6. General software engineering__
82+
83+
Helping improve the code base.
84+
85+
---
86+
87+
[:octicons-arrow-right-24: Improving the code base](#helping-with-general-software-engineering)
88+
{ .card }
89+
90+
</div>
91+
92+
## Not sure yet, I want to learn more.
93+
94+
That is absolutely fine, we've all been there. If you can, the best way to find your way around the project is to join a Civic Tech DC project night in Washington, DC. You can see upcoming events and [register on Luma](https://lu.ma/civic-tech-dc). If in-person attendance is not an option, you can join us virtually too. You'll need to first [join our Slack space](https://www.civictechdc.org/slack) (look for the `#cib-mango-tree-*` channels), where you can learn more.
95+
96+
!!! info "Community resources"
97+
98+
| Platform link | Come here for |
99+
| --- | --- |
100+
| [Joining Slack](https://www.civictechdc.org/slack) | Joining the community chat and learning about our weekly virtual calls |
101+
| [Civic Tech DC website](https://www.civictechdc.org) | Upcoming in-person events |
102+
| [Civic Tech DC Luma](https://lu.ma/civic-tech-dc) | Following upcoming in-person events |
103+
104+
105+
## Doing research
106+
107+
It all starts here. CIB Mango Tree wants to make pre-existing social media analysis methods, described in peer-reviewed research, broadly available. There are no set-in-stone processes here: the Internet, the literature, and the community are your friends! However, if computational social science and social media data analysis are new domains to you, below we list some recent review articles that we think are a good starting point, along with the relevant communities to keep an eye on.
108+
109+
!!! info
110+
Some of the references that can get you started:
111+
112+
- Mannocci et al. 2024. Detection and Characterization of Coordinated Online Behavior: A Survey. [https://doi.org/10.48550/arXiv.2408.01257](https://doi.org/10.48550/arXiv.2408.01257)
113+
- EU DisinfoLab. 2024. [CIB Detection Tree](https://www.disinfo.eu/publications/coordinated-inauthentic-behaviour-detection-tree/)
114+
115+
**Hand-off**: You should fill out the test outline [template document](https://docs.google.com/document/d/1eO2pbMfBZNznnCo4s3E9eINaR-2qxCLedo7EbaRmHsA/edit?tab=t.0#heading=h.35g3nbbzlngs) on our Google Drive to propose a new test for our library. This will help us have a record of what was done, the key references, and it will provide a good starting point for a new volunteer.
116+
117+
**Getting help**: You can ask around on the `#cib-mango-tree-product` channel. For example, you can share any papers you find interesting to get the conversation going. It is also always great to share short updates in the weekly calls or at the in-person events.
118+
119+
## Designing a test
120+
121+
If a test already has a test outline document that has been completed, the best next step is to give a very short presentation of the test and discuss it with the rest of the volunteers in one of the virtual calls (or at in-person events). The goal here is to take the outline and think about how:
122+
123+
1. the analysis method in the outline could be made useful for the test library, and
124+
1. what the possible implementations are.
125+
126+
While it is often possible to take an outline and start prototyping immediately (see next step), we find it is always worthwhile to have a discussion before diving into number crunching. This is a good step if you want to hone your data science and presentation skills!
127+
128+
**Hand-off**: A short presentation (preferably in Google Slides) that explains the methodology, and speaks to why the test is useful – what it’s good for, and what it may have uniquely that other currently implemented tests don’t measure well.
129+
130+
!!! tip
131+
We recommend keeping the presentation quite brief (e.g. 3-5 slides, 10-15 minutes), so that we can have a rich discussion. We recommend including the following elements:
132+
133+
1. The main rationale for including the test.
134+
1. What type of data is required (e.g. user id, post content, timestamp)
135+
1. The main output/metric of the test (e.g. a list of identified posts, a list of accounts etc.).
136+
137+
**Getting help:** You are welcome to ask around on Slack for more information as you are preparing this and it’s absolutely fine if not everything is clear – the purpose is to discuss the test together as a community.
138+
139+
140+
## Prototyping a test
141+
142+
Once a test has been discussed and we have a good idea of what insights it could provide, we recommend that you start by creating a "quick and dirty" prototype of the test. In practice this means creating a local script (or a notebook etc., whatever works for you) that implements the main analysis and provides some basic outputs. This will allow you to quickly get a feel for what works and what does not without having to deal with the details of the application codebase.
143+
144+
There are no specific requirements here, other than that the exploration should be lightweight, focus only on the main ideas and give us a more concrete idea regarding how we can move forward with integrating into the main codebase.
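
As an illustration of how small such a prototype can be, here is a minimal sketch of the "Test A" idea from the Overview (post content copied verbatim by many different accounts). Everything in it is an assumption made for illustration: the two-column tab-separated input (`account_id`, `post_text`), the sample rows, and the `min_accounts` threshold are hypothetical and not taken from the project's data or code.

```shell
#!/bin/sh
# Toy prototype: flag post texts repeated verbatim by several distinct accounts.
set -eu
data="$(mktemp)"

# Hypothetical sample data: account_id <TAB> post_text
printf 'u1\tVote now!\nu2\tVote now!\nu3\tVote now!\n' > "$data"
printf 'u1\tGood morning\nu1\tGood morning\nu4\tHello\n' >> "$data"

min_accounts=3 # assumed threshold; tune it for a real dataset

# Drop repeated (account, text) pairs so each account counts once per text,
# then count how many distinct accounts posted each text.
result="$(sort -u "$data" | awk -F '\t' -v min="$min_accounts" '
  { accounts[$2]++ }
  END {
    for (text in accounts)
      if (accounts[text] >= min) printf "%d\t%s\n", accounts[text], text
  }')"

echo "$result"
```

Note that `Good morning` is not flagged even though it appears twice, because both copies come from the same account. A real prototype would more likely be a Python script or notebook, but the shape is the same: a small input, one main computation, and a basic output to discuss.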
145+
146+
**Hand-off**: Ideally, the final hand-off for this part is a GitHub issue on our [main repository](https://github.com/civictechdc/cib-mango-tree) describing the prototyped test as a feature request. The goal here is to have a log of what was done (what worked well, what did not) such that in the future you or someone else can read the issue and proceed with implementing the test (see next step).
147+
148+
!!! tip
149+
When opening the issue, follow the template (you will be able to select it after you click the "New Issue" button). We recommend including the following information if possible. This would make it much easier to proceed:
150+
151+
- The dataset used to prototype the test (and providing any public link to the dataset if we don't have it in our Drive yet)
152+
- The required inputs for the prototype
153+
- If possible, a publicly accessible link to a notebook (e.g. Google Colab) or analysis scripts
154+
- Describe the dependencies used in the analysis (if applicable)
155+
- Concise description of the analysis, one or two main figures, and any of the outstanding issues
156+
157+
## Implementing a test into the library
158+
159+
This step involves implementing the new test using the prototyped test as a reference. The idea here is to take the code that was used to prototype (see the [Prototyping a test](#prototyping-a-test) section) and integrate it into our main repository code base. In our parlance, that means implementing _an analyzer_.
160+
161+
This is possibly the most technically involved part of the entire process as it requires understanding the data science logic behind the test as well as getting to know the CIB Mango Tree code base and its requirements. We have dedicated technical guides to help you along the way.
162+
163+
!!! tip
164+
You can start by following these more specific guides:
165+
166+
1. [Setting up development environment](../get-started/installation.md)
167+
1. [Contributing new changes](../contributing/contributing.md) (opening a pull request)
168+
1. [Implementing Analyzers](../contributing/analyzers.md)
169+
170+
171+
**Hand-off**: The hand-off here is a pull request (PR) to our [main repository](https://github.com/civictechdc/cib-mango-tree) which introduces the code for the new analyzer.
172+
173+
174+
## Helping with general software engineering
175+
176+
You might not be interested in any specific parts of new test development, but are instead driven by general software engineering problems. We always welcome help in any of the following areas:
177+
178+
- Reviewing pull requests
179+
- Improving our existing [test suite](./testing.md)
180+
- Testing the general performance of analyzers (speed, memory, data science code)
181+
- Helping with the interactive side (terminal, GUI, other frontend)
182+
- Improving the application's storage system (backend, databases)
Lines changed: 33 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,35 @@
1-
## Testing
1+
## Running the Tests
2+
3+
You can run all tests by invoking the `pytest` command from the project root:
4+
```bash
5+
# Run all tests
6+
pytest
7+
```
8+
9+
To run a specific test file, pass its path. This is useful when you're developing a specific feature and don't want to run the whole test suite:
10+
```
11+
pytest analyzers/hashtags/test_hashtags_analyzer.py
12+
```
13+
14+
To run a specific test function:
15+
```
16+
pytest analyzers/hashtags/test_hashtags_analyzer.py::test_gini
17+
```
18+
19+
To get more information, run with verbose output:
20+
```
21+
pytest -v
22+
```
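
Tests can also be selected by name rather than by path, using pytest's `-k` option. The sketch below is self-contained and assumes only that `pytest` is installed for `python3`: it writes a throwaway test file whose test names are hypothetical placeholders (they are not the repository's tests) and runs only the one whose name contains `gini`.

```shell
#!/bin/sh
# Select tests by name substring with -k, using a throwaway test file.
set -eu
demo="$(mktemp -d)"

# Hypothetical placeholder tests, written only for this demonstration
cat > "$demo/test_demo.py" <<'EOF'
def test_gini_placeholder():
    assert 1 + 1 == 2


def test_other_placeholder():
    assert "a".upper() == "A"
EOF

# -q keeps the output short; -k gini runs only tests whose name contains "gini"
summary="$(python3 -m pytest -q "$demo/test_demo.py" -k gini | tail -n 1)"
echo "$summary"
```

Here `python3 -m pytest` is used instead of the bare `pytest` command so the sketch does not depend on the `pytest` executable being on your `PATH`; inside the project environment the two are interchangeable.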
23+
24+
25+
## Implementing tests
226

327
The `testing` module provides testers for the primary and
4-
secondary analyzer modules. See the [example](https://github.com/civictechdc/mango-tango-cli/blob/develop/analyzers/example/README.md) for further references.
28+
secondary analyzer modules. See the [example](https://github.com/civictechdc/mango-tango-cli/blob/develop/analyzers/example/README.md) for further reference.
29+
30+
31+
### Test Data
32+
33+
- Test data is co-located with each analyzer, in a `test_data/` directory inside the analyzer's folder
34+
- Each analyzer should include its own test files
35+
- Tests use sample data to verify functionality

docs/guides/design-philosophy/architecture.md

Lines changed: 11 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -25,22 +25,22 @@ The application has three "domains":
2525

2626
```mermaid
2727
flowchart TD
28-
terminal["Terminal (core)"]
29-
application["Application (core)"]
30-
storage["Storage (core)"]
28+
terminal["Terminal (core)"]
29+
application["Application (core)"]
30+
storage["Storage (core)"]
3131
32-
importers["Importers (edge)"]
33-
semantic["Semantic Preprocessor (edge)"]
32+
importers["Importers (edge)"]
33+
semantic["Semantic Preprocessor (edge)"]
3434
35-
content["Analyzers/Web Presenters (content)"]
35+
content["Analyzers/Web Presenters (content)"]
3636
37-
terminal --> application
38-
application --> storage
37+
terminal --> application
38+
application --> storage
3939
40-
application --> importers
41-
application --> semantic
40+
application --> importers
41+
application --> semantic
4242
43-
application --> content
43+
application --> content
4444
```
4545

4646
# Questions, Comments, and Feedback
