Commit a16c7e3

author
codegen-bot
committed
Add basic docs
1 parent 04da3fa commit a16c7e3

1 file changed

+80
-7
lines changed

docs/building-with-codegen/parsing-codebases.mdx

Lines changed: 80 additions & 7 deletions
@@ -9,21 +9,29 @@ The primary entrypoint to programs leveraging Codegen is the [Codebase](/api-ref
99

1010
## Local Codebases
1111

12-
Construct a Codebase by passing in a path to a local `git` repository.
12+
Construct a Codebase by passing in a path to a local `git` repository or any subfolder within it. The path must be inside a git repository (i.e., the path itself or one of its parent directories must contain a `.git` folder).
1313

1414
```python
1515
from codegen import Codebase
16+
from codegen.sdk.enums import ProgrammingLanguage
1617

17-
# Parse from a local directory
18+
# Parse from a git repository root
1819
codebase = Codebase("path/to/repository")
1920

20-
# Parse from current directory
21+
# Parse from a subfolder within a git repository
22+
codebase = Codebase("path/to/repository/src/subfolder")
23+
24+
# Parse from current directory (must be within a git repo)
2125
codebase = Codebase("./")
26+
27+
# Specify programming language (instead of inferring from file extensions)
28+
codebase = Codebase("./", programming_language=ProgrammingLanguage.TYPESCRIPT)
2229
```
2330

2431
<Note>
25-
This will automatically infer the programming language of the codebase and
26-
parse all files in the codebase.
32+
By default, Codegen will automatically infer the programming language of the codebase and
33+
parse all files in the codebase. You can override this by passing the `programming_language` parameter
34+
with a value from the `ProgrammingLanguage` enum.
2735
</Note>
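The requirement that the path sit inside a git repository can be checked before constructing a `Codebase`. The following is a minimal sketch using only the standard library; `find_git_root` is a hypothetical helper written for illustration and is not part of Codegen, and it does not claim to mirror how Codegen itself resolves the repository.

```python
from pathlib import Path
from typing import Optional


def find_git_root(path: str) -> Optional[Path]:
    """Return the nearest directory at or above `path` that contains `.git`.

    Hypothetical helper (not a Codegen API). Returns None when no parent
    directory contains a `.git` entry.
    """
    start = Path(path).resolve()
    # Check the path itself first, then walk up through every parent.
    for candidate in (start, *start.parents):
        # `.git` is usually a directory, but can be a file (e.g. worktrees),
        # so test for existence rather than for a directory.
        if (candidate / ".git").exists():
            return candidate
    return None
```

If the helper returns `None`, the path is not inside a git repository, and under the rule stated above it would not be a valid argument to `Codebase(...)`.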
2836

2937
<Tip>
@@ -38,16 +46,18 @@ To fetch and parse a repository directly from GitHub, use the `from_repo` functi
3846

3947
```python
4048
import codegen
49+
from codegen.sdk.enums import ProgrammingLanguage
4150

4251
# Fetch and parse a repository (defaults to /tmp/codegen/{repo_name})
4352
codebase = codegen.from_repo('fastapi/fastapi')
4453

45-
# Customize temp directory, clone depth, or specific commit
54+
# Customize temp directory, clone depth, specific commit, or programming language
4655
codebase = codegen.from_repo(
4756
'fastapi/fastapi',
4857
tmp_dir='/custom/temp/dir', # Optional: custom temp directory
49-
commit='786a8ada7ed0c7f9d8b04d49f24596865e4b7901',
58+
commit='786a8ada7ed0c7f9d8b04d49f24596865e4b7901', # Optional: specific commit
5059
shallow=False, # Optional: full clone instead of shallow
60+
programming_language=ProgrammingLanguage.PYTHON # Optional: override language detection
5161
)
5262
```
5363

@@ -56,6 +66,69 @@ codebase = codegen.from_repo(
5666
default. The clone is shallow by default for better performance.
5767
</Note>
5868

69+
## Configuration Options
70+
71+
You can customize the behavior of your Codebase instance by passing a `CodebaseConfig` object. This allows you to configure secrets (like API keys) and toggle specific features:
72+
73+
```python
74+
from codegen import Codebase
75+
from codegen.sdk.codebase.config import CodebaseConfig, GSFeatureFlags, Secrets
76+
77+
codebase = Codebase(
78+
"path/to/repository",
79+
config=CodebaseConfig(
80+
secrets=Secrets(
81+
openai_key="your-openai-key" # For AI-powered features
82+
),
83+
feature_flags=GSFeatureFlags(
84+
sync_enabled=True, # Enable graph synchronization
85+
# ... add other feature flags as needed
86+
)
87+
)
88+
)
89+
```
90+
91+
The `CodebaseConfig` allows you to configure:
92+
- `secrets`: API keys and other sensitive information needed by the codebase
93+
- `feature_flags`: Toggle specific features like language engines, dependency management, and graph synchronization
94+
95+
For a complete list of available feature flags and configuration options, see the [source code on GitHub](https://github.com/codegen-sh/codegen-sdk/blob/develop/src/codegen/sdk/codebase/config.py).
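The example passes the OpenAI key as a string literal. One common alternative is to read it from the environment so it stays out of source control. This is a sketch under assumptions: the `OPENAI_API_KEY` variable name and the `load_openai_key` helper are illustrative choices, not something Codegen defines or reads on its own.

```python
import os


def load_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment (hypothetical helper, not a Codegen API)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of passing an empty key along.
        raise RuntimeError(f"Set {env_var} before constructing the Codebase")
    return key
```

The returned string could then be supplied as `Secrets(openai_key=load_openai_key())` in the configuration shown earlier.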
96+
97+
## Advanced Initialization
98+
99+
For more complex scenarios, Codegen supports an advanced initialization mode using `ProjectConfig`. This allows for fine-grained control over:
100+
101+
- Repository configuration
102+
- Base path and subdirectory filtering
103+
- Multiple project configurations
104+
105+
Here's an example:
106+
107+
```python
108+
from codegen import Codebase
109+
from codegen.git.repo_operator.local_repo_operator import LocalRepoOperator
110+
from codegen.git.schemas.repo_config import BaseRepoConfig
111+
from codegen.sdk.codebase.config import ProjectConfig
112+
from codegen.sdk.enums import ProgrammingLanguage
113+
114+
codebase = Codebase(
115+
projects=[
116+
ProjectConfig(
117+
repo_operator=LocalRepoOperator(
118+
repo_path="/tmp/codegen-sdk",
119+
repo_config=BaseRepoConfig(),
120+
bot_commit=True
121+
),
122+
programming_language=ProgrammingLanguage.TYPESCRIPT,
123+
base_path="src/codegen/sdk/typescript",
124+
subdirectories=["src/codegen/sdk/typescript"]
125+
)
126+
]
127+
)
128+
```
129+
130+
For more details on advanced configuration options, see the [source code on GitHub](https://github.com/codegen-sh/codegen-sdk/blob/develop/src/codegen/sdk/core/codebase.py).
131+
59132
## Supported Languages
60133

61134
Codegen currently supports:
