Commit 34b8e57

committed
[docs] Add code review instructions file
1 parent c61ed95 commit 34b8e57


3 files changed

+321
-5
lines changed

.github/copilot-instructions.md

Lines changed: 269 additions & 0 deletions
@@ -0,0 +1,269 @@
1+
# OpenCTI Connectors Repository - Copilot Instructions
2+
3+
## Repository Overview
4+
5+
This is the **OpenCTI connectors** monorepo, containing 200+ Python-based connectors that integrate the OpenCTI threat intelligence platform with external tools and data sources. The repository uses a multi-connector architecture with shared utilities and strict validation pipelines.
6+
7+
**Key Statistics:**
8+
- **Language:** Python 3.11-3.12 (Alpine-based Docker images)
9+
- **Connector Types:** 128 external-import, 53 internal-enrichment, 28 stream, 6 internal-export-file, 6 internal-import-file
10+
- **Build System:** CircleCI with dynamic pipeline generation
11+
- **Testing:** pytest with isolated virtual environments per connector
12+
13+
## Critical Build & Validation Requirements
14+
15+
### Code Formatting (ALWAYS REQUIRED)
16+
17+
**Before committing any Python code changes, you MUST run both formatters:**
18+
19+
```bash
20+
# Install formatting tools (if not already installed)
21+
pip install isort==7.0.0 black==25.12.0 --user
22+
23+
# Run isort first
24+
isort --profile black --line-length 88 .
25+
26+
# Then run black
27+
black .
28+
```
29+
30+
**Note:** The CI will fail if code is not properly formatted. These commands MUST be run before pushing code.
31+
32+
### Linting Requirements
33+
34+
**1. Flake8 (Basic Linting)**
35+
```bash
36+
pip install flake8 --user
37+
flake8 --ignore=E,W .
38+
```
39+
40+
**2. Custom Pylint Plugin (STIX ID Validation - CRITICAL)**
41+
42+
This custom checker ensures STIX2 objects use deterministic IDs. **ALWAYS run this before committing connector code:**
43+
44+
```bash
45+
cd shared/pylint_plugins/check_stix_plugin
46+
pip install -r requirements.txt
47+
48+
# Run on your connector directory (example for external-import/myconnector)
49+
PYTHONPATH=. python -m pylint ../../../external-import/myconnector \
50+
--disable=all \
51+
--enable=no_generated_id_stix,no-value-for-parameter,unused-import \
52+
--load-plugins linter_stix_id_generator
53+
```
54+
55+
**Common Issue:** If you create STIX2 objects, you MUST use deterministic ID generation via pycti or the new connectors-sdk models. Never let the stix2 library auto-generate IDs.
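
To illustrate what "deterministic" means here, the following is a minimal standard-library sketch, not pycti's or the SDK's actual implementation. It derives a UUIDv5 from an object's identifying properties using the namespace that the STIX 2.1 specification defines for deterministic identifiers. The helper name `deterministic_stix_id` and the simplified JSON serialization are assumptions made for this example; in a real connector, rely on pycti's `generate_id()` helpers or the connectors-sdk models.

```python
import json
import uuid

# Namespace UUID that the STIX 2.1 specification defines for deterministic (UUIDv5) identifiers.
STIX_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")


def deterministic_stix_id(object_type: str, properties: dict) -> str:
    """Hypothetical helper: build a reproducible STIX ID from identifying properties."""
    # Sorted keys and compact separators give a stable serialization; this is a
    # simplified stand-in for full JSON canonicalization.
    data = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return f"{object_type}--{uuid.uuid5(STIX_NAMESPACE, data)}"


pattern = "[ipv4-addr:value = '127.0.0.1']"
first = deterministic_stix_id("indicator", {"pattern": pattern})
second = deterministic_stix_id("indicator", {"pattern": pattern})

# A random UUIDv4 is what auto-generation amounts to: a new ID on every run.
random_id = f"indicator--{uuid.uuid4()}"
```

Because `first` and `second` are identical, re-importing the same data updates the existing object instead of creating a duplicate, which is the behavior the pylint plugin is meant to protect.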
56+
57+
### Running Tests
58+
59+
**Test Structure:** Each connector with tests has a `tests/test-requirements.txt` file. Tests run in isolated virtual environments.
60+
61+
```bash
62+
bash run_test.sh ./external-import/myconnector/tests/test-requirements.txt
63+
```
64+
65+
**Notes:**
66+
- Test script checks for changes from `master` branch
67+
- Tests only run if connector or connectors-sdk changed (non-master branches)
68+
- Installs latest pycti from GitHub master branch
69+
- If connector depends on connectors-sdk, installs local version
70+
- Output goes to `test_outputs/` directory
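
The change-detection rule in the notes above can be sketched as a small pure function. This is an illustrative approximation and not the logic of `run_test.sh` itself; the function name `should_run_tests` and its parameters are hypothetical, and the list of changed files is assumed to come from a diff against `master`.

```python
from pathlib import PurePosixPath


def should_run_tests(requirements_path: str, changed_files: list[str], branch: str) -> bool:
    """Hypothetical helper: decide whether a connector's tests need to run."""
    if branch == "master":
        return True  # on master, every connector's tests run

    # <connector>/tests/test-requirements.txt -> the connector root is two levels up
    connector_dir = PurePosixPath(requirements_path).parent.parent

    for changed in changed_files:
        path = PurePosixPath(changed)
        # Run when the connector itself or the shared SDK was touched
        if path.is_relative_to(connector_dir) or path.is_relative_to("connectors-sdk"):
            return True
    return False


requirements = "./external-import/myconnector/tests/test-requirements.txt"
connector_changed = should_run_tests(requirements, ["external-import/myconnector/src/main.py"], "feature")
unrelated_changed = should_run_tests(requirements, ["stream/other/src/main.py"], "feature")
sdk_changed = should_run_tests(requirements, ["connectors-sdk/pyproject.toml"], "feature")
```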
71+
72+
**Test Dependencies Pattern:**
73+
```
74+
pytest
75+
pycti
76+
connectors-sdk @ git+https://github.com/OpenCTI-Platform/connectors.git@master#subdirectory=connectors-sdk
77+
```
78+
79+
## Repository Structure
80+
81+
### Top-Level Directories
82+
83+
```
84+
├── .circleci/ # CI/CD configuration
85+
│ ├── config.yml # Main CircleCI workflow
86+
│ ├── scripts/ # Dynamic pipeline generation (generate_ci.py)
87+
│ └── vars.yml # Connector-specific build configurations
88+
├── .github/ # GitHub workflows & templates
89+
├── connectors-sdk/ # Shared SDK for connector development (Python 3.11+)
90+
├── external-import/ # 128 connectors for importing external threat intel
91+
├── internal-enrichment/ # 53 connectors for enriching existing data
92+
├── internal-export-file/ # 6 connectors for exporting data files
93+
├── internal-import-file/ # 6 connectors for importing data files
94+
├── stream/ # 28 connectors for streaming data
95+
├── shared/ # Shared utilities
96+
│ ├── pylint_plugins/ # Custom pylint plugins (STIX ID checker)
97+
│ └── tools/ # Manifest/schema generation scripts
98+
├── templates/ # Connector templates for each type
99+
└── tests/ # Repository-level tests
100+
```
101+
102+
### Standard Connector Structure
103+
104+
Every connector follows this pattern:
105+
106+
```
107+
external-import/myconnector/
108+
├── __metadata__/
109+
│ ├── connector_manifest.json # Connector metadata (title, description, etc.)
110+
│ ├── connector_config_schema.json # Config JSON schema (auto-generated)
111+
│ └── logo.png # Connector logo
112+
├── src/
113+
│ ├── connector/ # Main logic
114+
│ │ ├── connector.py # Core connector class
115+
│ │ ├── converter_to_stix.py # STIX conversion logic
116+
│ │ └── settings.py # Config validation
117+
│ ├── main.py # Entry point
118+
│ └── requirements.txt # Python dependencies
119+
├── tests/
120+
│ ├── tests_connector/ # Test modules
121+
│ ├── conftest.py # pytest configuration
122+
│ └── test-requirements.txt # Test dependencies
123+
├── .env.sample # Environment variable template
124+
├── docker-compose.yml # Docker deployment config
125+
├── Dockerfile # Container build
126+
├── entrypoint.sh # Container entrypoint
127+
└── README.md # Connector documentation
128+
```
129+
130+
## Creating New Connectors
131+
132+
```bash
133+
cd templates
134+
sh create_connector_dir.sh -t <TYPE> -n <NAME>
135+
```
136+
137+
**Types:** `external-import`, `internal-enrichment`, `stream`, `internal-import-file`, `internal-export-file`
138+
139+
**After creating:** Replace `Template`/`template` references, update `__metadata__/connector_manifest.json`, configure `.env.sample`, implement logic in `src/connector/connector.py`.
140+
141+
## Key Configuration Files
142+
143+
### Key Files
144+
145+
- **`.flake8`** - Ignores: E203, E266, E501, W503, F403, F401
146+
- **`.pre-commit-config.yaml`** - Pre-commit hooks (black, flake8, isort, GPG signing)
147+
- **`ci-requirements.txt`** - CI deps: isort==7.0.0, black==25.12.0, pytest==8.4.2
148+
- **`Makefile`** - Manifest/schema generation commands
149+
- **`run_test.sh`** - Test execution (checks changes, runs pytest)
150+
151+
### CircleCI Pipeline
152+
153+
**Workflow Steps:**
154+
1. **ensure_formatting** - isort and black checks (Python 3.12)
155+
2. **base_linter** - flake8 with `--ignore=E,W`
156+
3. **linter** - Custom pylint plugin for STIX ID validation
157+
4. **test** - pytest for changed connectors (parallelism: 4, Python 3.11)
158+
5. **build_manifest** - Generates manifest.json and config schemas
159+
6. **build** - Builds Docker images for changed connectors
160+
161+
Tests and builds only run for connectors with changes (unless on master or connectors-sdk changed).
162+
163+
## Connectors SDK
164+
165+
**Location:** `connectors-sdk/` - Provides STIX2.1 models with deterministic IDs, pre-built classes for IOCs/Authors/Markings/Relationships, exception handling, Pydantic config validation.
166+
167+
**Install:**
168+
```bash
169+
pip install "connectors-sdk @ git+https://github.com/OpenCTI-Platform/connectors.git@master#subdirectory=connectors-sdk"
170+
```
171+
172+
**Example:**
173+
```python
174+
from connectors_sdk.models import IPV4Address, OrganizationAuthor, TLPMarking
175+
from connectors_sdk.models.octi import related_to
176+
177+
author = OrganizationAuthor(name="Example Author")
178+
ip = IPV4Address(value="127.0.0.1", author=author, markings=[TLPMarking(level="amber+strict")])
179+
stix_object = ip.to_stix2_object() # Deterministic ID
180+
```
181+
182+
## Dockerfile Pattern
183+
184+
All connectors use Alpine-based Python images:
185+
186+
```dockerfile
187+
FROM python:3.12-alpine
188+
ENV CONNECTOR_TYPE=EXTERNAL_IMPORT
189+
190+
COPY src /opt/opencti-connector-name
191+
192+
RUN apk --no-cache add git build-base libmagic libffi-dev && \
193+
cd /opt/opencti-connector-name && \
194+
pip3 install --no-cache-dir -r requirements.txt && \
195+
apk del git build-base
196+
197+
COPY entrypoint.sh /
198+
RUN chmod +x /entrypoint.sh
199+
ENTRYPOINT ["/entrypoint.sh"]
200+
```
201+
202+
**Note:** Some connectors use Python 3.11 (see `.circleci/vars.yml` for exceptions).
203+
204+
## Common Issues & Workarounds
205+
206+
- **Test script fails with `fatal: Not a valid object name origin/master`** - The script expects an `origin/master` ref; in shallow clones, fetch the master branch first
207+
- **Tests not running** - Tests run only for changed code; set `CIRCLE_BRANCH=master` to force all tests
208+
- **Pylint plugin fails** - Use pycti's `generate_id()` or connectors-sdk models for deterministic STIX IDs
209+
- **Import/dependency errors in tests** - Tests install latest pycti from GitHub; pin version if needed
210+
211+
## Manifest & Schema Generation
212+
213+
```bash
214+
make connector_manifest # Single connector
215+
make connectors_manifests # All connectors
216+
make connector_config_schema # Single schema
217+
make connectors_config_schemas # All schemas
218+
make global_manifest # Global manifest.json
219+
```
220+
221+
Scripts scan `__metadata__/connector_manifest.json` files and consolidate them.
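
As a rough illustration of the scan-and-consolidate step, the sketch below collects every `__metadata__/connector_manifest.json` under a repository root into one list. It is not the repository's actual script: the function name `collect_manifests`, the two-level glob pattern, and the list-shaped output are assumptions for this example, and the demo manifests are made up.

```python
import json
import tempfile
from pathlib import Path


def collect_manifests(repo_root: Path) -> list[dict]:
    """Hypothetical helper: load all connector manifests under <type>/<connector>/__metadata__/."""
    manifests = []
    # Sorting keeps the consolidated output stable between runs
    for manifest_path in sorted(repo_root.glob("*/*/__metadata__/connector_manifest.json")):
        with manifest_path.open(encoding="utf-8") as handle:
            manifests.append(json.load(handle))
    return manifests


# Demo on a throwaway directory tree with two made-up connectors
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("alpha", "beta"):
        metadata_dir = root / "external-import" / name / "__metadata__"
        metadata_dir.mkdir(parents=True)
        (metadata_dir / "connector_manifest.json").write_text(
            json.dumps({"title": name}), encoding="utf-8"
        )
    consolidated = collect_manifests(root)
```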
222+
223+
## Python Version Requirements
224+
225+
- **Connectors SDK:** Python >=3.11, <3.13
226+
- **Most Connectors:** Python 3.12 (Alpine)
227+
- **Some Stream Connectors:** Python 3.11 (see `.circleci/vars.yml`)
228+
- **CI Environment:** Python 3.11 for tests, Python 3.12 for linting
229+
230+
## Important Notes
231+
232+
1. **ALWAYS format code** with black and isort before committing
233+
2. **ALWAYS run custom pylint plugin** when changing connector code that creates STIX objects
234+
3. **NEVER auto-generate STIX IDs** - use deterministic generation via pycti or connectors-sdk
235+
4. **Test isolation** - Each connector's tests run in separate virtual environments
236+
5. **Commit signing** - GPG signed commits required (pre-commit hook enforced)
237+
6. **Docker networking** - Locally, connectors expect `docker_default` network
238+
7. **Environment variables** - Use `.env.sample` as template, never commit secrets
239+
240+
## PR Submission
241+
242+
**Before submitting a PR:**
243+
244+
1. **Run all validation checks** (formatting, linting, custom pylint, tests)
245+
2. **Sign commits with GPG** (required by pre-commit hook)
246+
3. **Update metadata** (`__metadata__/connector_manifest.json`) if adding/modifying connectors
247+
4. **Update documentation** (README.md, add examples)
248+
5. **Test functionality** with different use cases
249+
6. **No secrets** in code or config files
250+
251+
**PR Checklist:**
252+
- [ ] Code formatted: `black .` and `isort --profile black .`
253+
- [ ] Linting passes: `flake8 --ignore=E,W .`
254+
- [ ] Custom pylint passes (for STIX objects)
255+
- [ ] Tests pass: `bash run_test.sh <path>`
256+
- [ ] Docker builds: `docker build -t test .`
257+
- [ ] Commits signed with GPG
258+
- [ ] Metadata and docs updated
259+
260+
**CircleCI pipeline:** formatting checks → linting → custom pylint → tests (parallel) → manifest build → Docker builds (only for changed connectors).
261+
262+
## Trust These Instructions
263+
264+
These instructions are comprehensive and tested. Only search for additional information if:
265+
- Instructions are incomplete for your specific use case
266+
- You encounter an error not documented here
267+
- You need connector-specific implementation details
268+
269+
For connector development patterns, refer to existing connectors as examples. The codebase is consistent, so patterns from one connector generally apply to others of the same type.
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
---
2+
applyTo: "**/*"
3+
---
4+
5+
When reviewing code, focus on:
6+
7+
## Security Critical Issues
8+
- Check for hardcoded secrets, API keys, or credentials
9+
- Look for SQL injection and XSS vulnerabilities
10+
- Verify proper input validation and sanitization
11+
- Review authentication and authorization logic
12+
13+
## Performance Red Flags
14+
- Identify N+1 database query problems
15+
- Spot inefficient loops and algorithmic issues
16+
- Check for memory leaks and resource cleanup
17+
- Review caching opportunities for expensive operations
18+
19+
## Code Quality Essentials
20+
- Functions should be focused and appropriately sized
21+
- Use clear, descriptive naming conventions
22+
- Ensure proper error handling throughout
23+
24+
## Review Style
25+
- Be specific and actionable in feedback
26+
- Explain the "why" behind recommendations
27+
- Acknowledge good patterns when you see them
28+
- Ask clarifying questions when code intent is unclear
29+
30+
Always prioritize security vulnerabilities and performance issues that could impact users.
31+
32+
Always suggest changes that improve readability. For example, the suggestion below makes the code easier to read and makes the validation logic reusable and testable.
33+
34+
// Instead of:
35+
if (user.email && user.email.includes('@') && user.email.length > 5) {
36+
submitButton.enabled = true;
37+
} else {
38+
submitButton.enabled = false;
39+
}
40+
41+
// Consider:
42+
function isValidEmail(email) {
43+
return email && email.includes('@') && email.length > 5;
44+
}
45+
46+
submitButton.enabled = isValidEmail(user.email);

external-import/threatfox/src/__main__.py

Lines changed: 6 additions & 5 deletions
@@ -164,7 +164,7 @@ def import_data(self, state: Dict, now_dt: datetime, now_ts: int) -> None:
         last_processed_entry_running_max = last_processed_entry
 
         for i, row in enumerate(csv_reader):
-            if len(row) > 14:
+            if len(row) > 15:
                 self.helper.log_info(
                     f"The csv line is badly formatted and will be ignored.(line: {i}, data: {row})"
                 )
@@ -532,18 +532,19 @@ def __init__(self, row: Tuple) -> None:
         self.last_seen = None
 
         self.confidence_level = int(row[9])
-        self.reference = row[10]
+        self.is_compromised = str(row[10]).lower() == "true"
+        self.reference = row[11]
 
         if self.reference == "None":
             self.reference = ""
 
-        self.tags = list(filter(None, row[11].split(",")))
+        self.tags = list(filter(None, row[12].split(",")))
 
         if self.threat_type:
             self.tags.insert(0, self.threat_type)
 
-        self.anonymous = bool(int(row[12]))
-        self.reporter = row[13]
+        self.anonymous = bool(int(row[13]))
+        self.reporter = row[14]
 
 
 if __name__ == "__main__":
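
The column shift in this hunk can be checked with a small standalone sketch. It reproduces only the fields touched by the diff (row indices 9 through 14) and leaves out the `threat_type` tag insertion; the `RowTail` class, the `parse_row_tail` function, and the sample values are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class RowTail:
    """Hypothetical container for the columns affected by the new layout."""

    confidence_level: int
    is_compromised: bool
    reference: str
    tags: list[str]
    anonymous: bool
    reporter: str


def parse_row_tail(row: list[str]) -> RowTail:
    """Parse columns 9-14 using the indices introduced by the diff."""
    reference = row[11]
    if reference == "None":
        reference = ""  # the feed uses the literal string "None" for a missing reference
    return RowTail(
        confidence_level=int(row[9]),
        is_compromised=str(row[10]).lower() == "true",  # new column at index 10
        reference=reference,
        tags=list(filter(None, row[12].split(","))),  # drop empty tag entries
        anonymous=bool(int(row[13])),
        reporter=row[14],
    )


# Made-up 15-column row: nine placeholder columns followed by the parsed tail
sample = ["-"] * 9 + ["75", "True", "None", "botnet,,emotet", "0", "some_reporter"]
tail = parse_row_tail(sample)
```

With 15 columns per row, the last valid index is 14, which matches the updated `len(row) > 15` guard rejecting only rows longer than that.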
