125 changes: 80 additions & 45 deletions docs/building-with-codegen/files-and-directories.mdx
Original file line number Diff line number Diff line change
@@ -5,12 +5,15 @@ icon: "folder-tree"
iconType: "solid"
---

Codegen provides two primary abstractions for working with your codebase's file structure:
Codegen provides three primary abstractions for working with your codebase's file structure:

- [File](/api-reference/core/File)
- [Directory](/api-reference/core/Directory)
- [File](/api-reference/core/File) - Represents a file in the codebase (e.g. README.md, package.json, etc.)
- [SourceFile](/api-reference/core/SourceFile) - Represents a source code file (e.g. Python, TypeScript, React, etc.)
- [Directory](/api-reference/core/Directory) - Represents a directory in the codebase

Both of these expose a rich API for accessing and manipulating their contents.
<Info>
[SourceFile](/api-reference/core/SourceFile) is a subclass of [File](/api-reference/core/File) that provides additional functionality for source code files.
</Info>

This guide explains how to effectively use these classes to manage your codebase.

@@ -31,8 +34,10 @@ for file in codebase.files:

# Check if a file exists
exists = codebase.has_file("path/to/file.py")

```


[Directory](/api-reference/core/Directory) provides similar methods for accessing its files and subdirectories.

```python
@@ -50,61 +55,45 @@ dir = file.directory
exists = codebase.has_directory("path/to/dir")
```

## Working with Non-Code Files (README, JSON, etc.)
## Differences between File and SourceFile

By default, Codegen focuses on source code files (Python, TypeScript, etc.). However, you can access all files in your codebase, including documentation, configuration, and other non-code files such as README.md, package.json, or .env:
- [File](/api-reference/core/File) - a general-purpose class that represents any file in the codebase, including non-code files such as README.md, .env, .json, and image files.
- [SourceFile](/api-reference/core/SourceFile) - a subclass of [File](/api-reference/core/File) that provides additional functionality for source code files written in languages supported by the [codegen-sdk](/introduction/overview) (Python, TypeScript, JavaScript, React).

```python
# Get all files in the codebase (including README, docs, config files)
files = codebase.files(extensions="*")

# Print files that are not source code (documentation, config, etc)
for file in files:
    # .jsx and .tsx are also treated as source code extensions
    if not file.filepath.endswith(('.py', '.ts', '.js', '.jsx', '.tsx')):
        print(f"📄 Non-code file: {file.filepath}")
```
When you get a file with `codebase.get_file`, files ending in `.py`, `.js`, `.ts`, `.jsx`, or `.tsx` are returned as [SourceFile](/api-reference/core/SourceFile) objects, while all other files are returned as [File](/api-reference/core/File) objects.

You can also filter for specific file types:
You can use the built-in `isinstance` function to check whether a file is a [SourceFile](/api-reference/core/SourceFile):

```python
# Get only markdown documentation files
docs = codebase.files(extensions=[".md", ".mdx"])
py_file = codebase.get_file("path/to/file.py")
if isinstance(py_file, SourceFile):
    print(f"File {py_file.filepath} is a source file")

# Get configuration files
config_files = codebase.files(extensions=[".json", ".yaml", ".toml"])
```
# prints: `File path/to/file.py is a source file`

[Directory](/api-reference/core/Directory) provides similar methods for accessing its files and subdirectories.
mdx_file = codebase.get_file("path/to/file.mdx")
if isinstance(mdx_file, File):
    print(f"File {mdx_file.filepath} is a non-code file")

## Raw Content and Metadata

```python
# Grab raw file string content
content = file.content # For text files
print('Length:', len(content))
print('# of functions:', len(file.functions))

# Access file metadata
name = file.name # Base name without extension
extension = file.extension # File extension with dot
filepath = file.filepath # Full relative path
dir = file.directory # Parent directory

# Access directory metadata
name = dir.name # Directory name
path = dir.path # Full relative path from repository root
parent = dir.parent # Parent directory
# prints: `File path/to/file.mdx is a non-code file`
```
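
The extension-based split described above can be pictured with a minimal, self-contained sketch. The `File` and `SourceFile` classes and the `make_file` helper below are hypothetical stand-ins written for illustration, not the Codegen implementation; only the list of extensions comes from this guide.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Extensions this guide lists as being returned as SourceFile objects.
SOURCE_EXTENSIONS = {".py", ".js", ".ts", ".jsx", ".tsx"}


@dataclass
class File:
    """Hypothetical stand-in for any file, code or not."""
    filepath: str


@dataclass
class SourceFile(File):
    """Hypothetical stand-in for a parsed source code file."""


def make_file(filepath: str) -> File:
    """Return a SourceFile for source extensions, a plain File otherwise."""
    suffix = PurePosixPath(filepath).suffix
    cls = SourceFile if suffix in SOURCE_EXTENSIONS else File
    return cls(filepath)


py_file = make_file("path/to/file.py")
mdx_file = make_file("path/to/file.mdx")

print(isinstance(py_file, SourceFile))   # True
print(isinstance(py_file, File))         # True: SourceFile is a subclass of File
print(isinstance(mdx_file, SourceFile))  # False
```

Because `SourceFile` subclasses `File`, an `isinstance(x, File)` check is true for both kinds of object. To tell them apart, test for `SourceFile` first.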

<Note>
Currently, the codebase object can only parse source code files of one language at a time. This means that if you want to work with both Python and TypeScript files, you will need to create two separate codebase objects.
</Note>

## Accessing Code

Files and Directories provide several APIs for accessing and iterating over their code.
[SourceFiles](/api-reference/core/SourceFile) and [Directories](/api-reference/core/Directory) provide several APIs for accessing and iterating over their code.

See, for example:

- `.functions` ([File](/api-reference/core/File#functions) / [Directory](/api-reference/core/Directory#functions)) - All [Functions](../api-reference/core/Function) in the file/directory
- `.classes` ([File](/api-reference/core/File#classes) / [Directory](/api-reference/core/Directory#classes)) - All [Classes](../api-reference/core/Class) in the file/directory
- `.imports` ([File](/api-reference/core/File#imports) / [Directory](/api-reference/core/Directory#imports)) - All [Imports](../api-reference/core/Import) in the file/directory
- `.functions` ([SourceFile](/api-reference/core/SourceFile#functions) / [Directory](/api-reference/core/Directory#functions)) - All [Functions](../api-reference/core/Function) in the file/directory
- `.classes` ([SourceFile](/api-reference/core/SourceFile#classes) / [Directory](/api-reference/core/Directory#classes)) - All [Classes](../api-reference/core/Class) in the file/directory
- `.imports` ([SourceFile](/api-reference/core/SourceFile#imports) / [Directory](/api-reference/core/Directory#imports)) - All [Imports](../api-reference/core/Import) in the file/directory
- `.get_function(...)` ([SourceFile](/api-reference/core/SourceFile#get-function) / [Directory](/api-reference/core/Directory#get-function)) - Get a specific function by name
- `.get_class(...)` ([SourceFile](/api-reference/core/SourceFile#get-class) / [Directory](/api-reference/core/Directory#get-class)) - Get a specific class by name
- `.get_global_var(...)` ([SourceFile](/api-reference/core/SourceFile#get-global-var) / [Directory](/api-reference/core/Directory#get-global-var)) - Get a specific global variable by name


```python
@@ -142,6 +131,52 @@ if main_function:
print(f"Local var: {var.name} = {var.value}")
```

## Working with Non-Code Files (README, JSON, etc.)

By default, Codegen focuses on source code files (Python, TypeScript, etc.). However, you can access all files in your codebase, including documentation, configuration, and other non-code [files](/api-reference/core/File) such as README.md, package.json, or .env:

```python
# Get all files in the codebase (including README, docs, config files)
files = codebase.files(extensions="*")

# Print files that are not source code (documentation, config, etc)
for file in files:
    # .jsx and .tsx are also treated as source code extensions
    if not file.filepath.endswith(('.py', '.ts', '.js', '.jsx', '.tsx')):
        print(f"📄 Non-code file: {file.filepath}")
```

You can also filter for specific file types:

```python
# Get only markdown documentation files
docs = codebase.files(extensions=[".md", ".mdx"])

# Get configuration files
config_files = codebase.files(extensions=[".json", ".yaml", ".toml"])
```
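
As a rough mental model, the `extensions` argument behaves like a suffix filter over file paths. The sketch below uses plain strings and a hypothetical `filter_by_extensions` helper. It assumes `"*"` means "match everything" and that a list is matched against each file's final extension. It does not reproduce how Codegen implements the filter.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Union


def filter_by_extensions(paths: Iterable[str], extensions: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: keep paths whose extension is in `extensions`."""
    if extensions == "*":
        # Wildcard: keep every path, code or not.
        return list(paths)
    wanted = set(extensions)
    return [p for p in paths if PurePosixPath(p).suffix in wanted]


paths = ["README.md", "docs/guide.mdx", "package.json", "src/app.py", ".env"]

docs = filter_by_extensions(paths, [".md", ".mdx"])
config_files = filter_by_extensions(paths, [".json", ".yaml", ".toml"])
everything = filter_by_extensions(paths, "*")

print(docs)          # ['README.md', 'docs/guide.mdx']
print(config_files)  # ['package.json']
print(len(everything))  # 5
```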

[Directory](/api-reference/core/Directory) provides similar methods for accessing its files and subdirectories.

## Raw Content and Metadata

```python
# Grab raw file string content
content = file.content # For text files
print('Length:', len(content))
print('# of functions:', len(file.functions)) # SourceFile only

# Access file metadata
name = file.name # Base name without extension
extension = file.extension # File extension with dot
filepath = file.filepath # Full relative path
dir = file.directory # Parent directory

# Access directory metadata
name = dir.name # Directory name
path = dir.path # Full relative path from repository root
parent = dir.parent # Parent directory
```
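
If you think in `pathlib` terms, the file metadata fields above correspond roughly to the following. This mapping is inferred from the comments in the snippet and is an illustrative assumption, not a description of how Codegen computes these values. The example path is made up.

```python
from pathlib import PurePosixPath

# Hypothetical path, relative to the repository root.
path = PurePosixPath("src/utils/helpers.py")

name = path.stem              # base name without extension
extension = path.suffix       # file extension, including the dot
filepath = str(path)          # full relative path
directory = str(path.parent)  # parent directory

print(name, extension, filepath, directory)
# helpers .py src/utils/helpers.py src/utils
```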

## Editing Files Directly

Files themselves are [`Editable`](../api-reference/core/Editable.mdx) objects, just like Functions and Classes.
@@ -153,7 +188,7 @@ Files themselves are [`Editable`](../api-reference/core/Editable.mdx) objects, j
This means they expose many useful operations, including:

- [`File.search`](../api-reference/core/File#search) - Search for all functions named "main"
- [`File.edit`](../api-reference/core/Editable#edit) - Edit the file
- [`File.edit`](../api-reference/core/File#edit) - Edit the file
- [`File.replace`](../api-reference/core/File#replace) - Replace all instances of a string with another string
- [`File.insert_before`](../api-reference/core/File#insert-before) - Insert text before a specific string
- [`File.insert_after`](../api-reference/core/File#insert-after) - Insert text after a specific string
2 changes: 1 addition & 1 deletion docs/building-with-codegen/the-editable-api.mdx
@@ -17,7 +17,7 @@ Every Editable provides:
- [source](../api-reference/core/Editable#source) - the text content of the Editable
- [extended_source](../api-reference/core/Editable#extended_source) - includes relevant content like decorators, comments, etc.
- Information about the file that contains the Editable:
- [file](../api-reference/core/Editable#file) - the [File](../api-reference/core/File) that contains this Editable
- [file](../api-reference/core/Editable#file) - the [SourceFile](../api-reference/core/SourceFile) that contains this Editable
- Relationship tracking
- [parent_class](../api-reference/core/Editable#parent-class) - the [Class](../api-reference/core/Class) that contains this Editable
- [parent_function](../api-reference/core/Editable#parent-function) - the [Function](../api-reference/core/Function) that contains this Editable
2 changes: 1 addition & 1 deletion docs/tutorials/flask-to-fastapi.mdx
@@ -190,7 +190,7 @@ python run.py

The script will:

1. Process all Python files in your codebase
1. Process all Python [files](/api-reference/python/PyFile) in your codebase
2. Apply the transformations in the correct order
3. Maintain your code's functionality while updating to FastAPI patterns

2 changes: 1 addition & 1 deletion docs/tutorials/python2-to-python3.mdx
@@ -227,7 +227,7 @@ python run.py

The script will:

1. Process all Python files in your codebase
1. Process all Python [files](/api-reference/python/PyFile) in your codebase
2. Apply the transformations in the correct order
3. Maintain your code's functionality while updating to Python 3 syntax

2 changes: 1 addition & 1 deletion docs/tutorials/training-data.mdx
@@ -171,7 +171,7 @@ This will:

<Tip>
You can use any Git repository as your source codebase by passing the repo URL
to [Codebase.from_repo(...)](/api-reference/core/codebase#from-repo).
to [Codebase.from_repo(...)](/api-reference/core/Codebase#from-repo).
</Tip>

## Using the Training Data