diff --git a/docs/building-with-codegen/files-and-directories.mdx b/docs/building-with-codegen/files-and-directories.mdx
index 632b5138b..c03eb1674 100644
--- a/docs/building-with-codegen/files-and-directories.mdx
+++ b/docs/building-with-codegen/files-and-directories.mdx
@@ -5,12 +5,15 @@ icon: "folder-tree"
iconType: "solid"
---
-Codegen provides two primary abstractions for working with your codebase's file structure:
+Codegen provides three primary abstractions for working with your codebase's file structure:
-- [File](/api-reference/core/File)
-- [Directory](/api-reference/core/Directory)
+- [File](/api-reference/core/File) - Represents a file in the codebase (e.g. README.md, package.json, etc.)
+- [SourceFile](/api-reference/core/SourceFile) - Represents a source code file (e.g. Python, TypeScript, React, etc.)
+- [Directory](/api-reference/core/Directory) - Represents a directory in the codebase
-Both of these expose a rich API for accessing and manipulating their contents.
+
+ [SourceFile](/api-reference/core/SourceFile) is a subclass of [File](/api-reference/core/File) that provides additional functionality for source code files.
+
This guide explains how to effectively use these classes to manage your codebase.
@@ -31,8 +34,10 @@ for file in codebase.files:
# Check if a file exists
exists = codebase.has_file("path/to/file.py")
+
```
+
These APIs are similar for [Directory](/api-reference/core/Directory), which provides similar methods for accessing files and subdirectories.
```python
@@ -50,61 +55,58 @@ dir = file.directory
exists = codebase.has_directory("path/to/dir")
```
-## Working with Non-Code Files (README, JSON, etc.)
+## Differences between SourceFile and File
-By default, Codegen focuses on source code files (Python, TypeScript, etc). However, you can access all files in your codebase, including documentation, configuration, and other non-code files like README.md, package.json, or .env:
+- [File](/api-reference/core/File) - a general purpose class that represents any file in the codebase including non-code files like README.md, .env, .json, image files, etc.
+- [SourceFile](/api-reference/core/SourceFile) - a subclass of [File](/api-reference/core/File) that provides additional functionality for source code files written in languages supported by the [codegen-sdk](/introduction/overview) (Python, TypeScript, JavaScript, React).
-```python
-# Get all files in the codebase (including README, docs, config files)
-files = codebase.files(extensions="*")
+Most use cases involve only [SourceFile](/api-reference/core/SourceFile) objects, since these contain code that the [codegen-sdk](/introduction/overview) can parse and manipulate. When you need to work with non-code files, use the [File](/api-reference/core/File) class instead.
-# Print files that are not source code (documentation, config, etc)
-for file in files:
- if not file.filepath.endswith(('.py', '.ts', '.js')):
- print(f"📄 Non-code file: {file.filepath}")
-```
-
-You can also filter for specific file types:
+By default, `codebase.files` returns only [SourceFile](/api-reference/core/SourceFile) objects. To include non-code files, pass the `extensions="*"` argument.
```python
-# Get only markdown documentation files
-docs = codebase.files(extensions=[".md", ".mdx"])
+# Get all source files in the codebase
+source_files = codebase.files
-# Get configuration files
-config_files = codebase.files(extensions=[".json", ".yaml", ".toml"])
+# Get all files in the codebase (including non-code files)
+all_files = codebase.files(extensions="*")
```
-These APIs are similar for [Directory](/api-reference/core/Directory), which provides similar methods for accessing files and subdirectories.
-## Raw Content and Metadata
+When getting a file with `codebase.get_file`, files ending in `.py`, `.js`, `.ts`, `.jsx`, or `.tsx` are returned as [SourceFile](/api-reference/core/SourceFile) objects, while all other files are returned as [File](/api-reference/core/File) objects.
+
+You can also use the built-in `isinstance` function to check whether a file is a [SourceFile](/api-reference/core/SourceFile) (the example assumes `SourceFile` has already been imported):
```python
-# Grab raw file string content
-content = file.content # For text files
-print('Length:', len(content))
-print('# of functions:', len(file.functions))
+py_file = codebase.get_file("path/to/file.py")
+if isinstance(py_file, SourceFile):
+ print(f"File {py_file.filepath} is a source file")
-# Access file metadata
-name = file.name # Base name without extension
-extension = file.extension # File extension with dot
-filepath = file.filepath # Full relative path
-dir = file.directory # Parent directory
+# prints: `File path/to/file.py is a source file`
-# Access directory metadata
-name = dir.name # Base name without extension
-path = dir.path # Full relative path from repository root
-parent = dir.parent # Parent directory
+mdx_file = codebase.get_file("path/to/file.mdx")
+if not isinstance(mdx_file, SourceFile):
+ print(f"File {mdx_file.filepath} is a non-code file")
+
+# prints: `File path/to/file.mdx is a non-code file`
```
+
+ Currently, the codebase object can only parse source code files of one language at a time. This means that if you want to work with both Python and TypeScript files, you will need to create two separate codebase objects.
+
+
## Accessing Code
-Files and Directories provide several APIs for accessing and iterating over their code.
+[SourceFiles](/api-reference/core/SourceFile) and [Directories](/api-reference/core/Directory) provide several APIs for accessing and iterating over their code.
See, for example:
-- `.functions` ([File](/api-reference/core/File#functions) / [Directory](/api-reference/core/Directory#functions)) - All [Functions](../api-reference/core/Function) in the file/directory
-- `.classes` ([File](/api-reference/core/File#classes) / [Directory](/api-reference/core/Directory#classes)) - All [Classes](../api-reference/core/Class) in the file/directory
-- `.imports` ([File](/api-reference/core/File#imports) / [Directory](/api-reference/core/Directory#imports)) - All [Imports](../api-reference/core/Import) in the file/directory
+- `.functions` ([SourceFile](/api-reference/core/SourceFile#functions) / [Directory](/api-reference/core/Directory#functions)) - All [Functions](/api-reference/core/Function) in the file/directory
+- `.classes` ([SourceFile](/api-reference/core/SourceFile#classes) / [Directory](/api-reference/core/Directory#classes)) - All [Classes](/api-reference/core/Class) in the file/directory
+- `.imports` ([SourceFile](/api-reference/core/SourceFile#imports) / [Directory](/api-reference/core/Directory#imports)) - All [Imports](/api-reference/core/Import) in the file/directory
+- `.get_function(...)` ([SourceFile](/api-reference/core/SourceFile#get-function) / [Directory](/api-reference/core/Directory#get-function)) - Get a specific function by name
+- `.get_class(...)` ([SourceFile](/api-reference/core/SourceFile#get-class) / [Directory](/api-reference/core/Directory#get-class)) - Get a specific class by name
+- `.get_global_var(...)` ([SourceFile](/api-reference/core/SourceFile#get-global-var) / [Directory](/api-reference/core/Directory#get-global-var)) - Get a specific global variable by name
```python
@@ -142,9 +144,55 @@ if main_function:
print(f"Local var: {var.name} = {var.value}")
```
+## Working with Non-Code Files (README, JSON, etc.)
+
+By default, Codegen focuses on source code files (Python, TypeScript, etc.). However, you can access all files in your codebase, including documentation, configuration, and other non-code [files](/api-reference/core/File) like README.md, package.json, or .env:
+
+```python
+# Get all files in the codebase (including README, docs, config files)
+files = codebase.files(extensions="*")
+
+# Print files that are not source code (documentation, config, etc)
+for file in files:
+ if not file.filepath.endswith(('.py', '.ts', '.js')):
+ print(f"📄 Non-code file: {file.filepath}")
+```
+
+You can also filter for specific file types:
+
+```python
+# Get only markdown documentation files
+docs = codebase.files(extensions=[".md", ".mdx"])
+
+# Get configuration files
+config_files = codebase.files(extensions=[".json", ".yaml", ".toml"])
+```
+
+[Directory](/api-reference/core/Directory) exposes similar methods for accessing files and subdirectories.
+
+## Raw Content and Metadata
+
+```python
+# Grab raw file string content
+content = file.content # For text files
+print('Length:', len(content))
+print('# of functions:', len(file.functions))
+
+# Access file metadata
+name = file.name # Base name without extension
+extension = file.extension # File extension with dot
+filepath = file.filepath # Full relative path
+dir = file.directory # Parent directory
+
+# Access directory metadata
+name = dir.name           # Directory name
+path = dir.path # Full relative path from repository root
+parent = dir.parent # Parent directory
+```
+
## Editing Files Directly
-Files themselves are [`Editable`](../api-reference/core/Editable.mdx) objects, just like Functions and Classes.
+Files themselves are [`Editable`](/api-reference/core/Editable.mdx) objects, just like Functions and Classes.
Learn more about the [Editable API](/building-with-codegen/the-editable-api).
@@ -152,12 +200,12 @@ Files themselves are [`Editable`](../api-reference/core/Editable.mdx) objects, j
This means they expose many useful operations, including:
-- [`File.search`](../api-reference/core/File#search) - Search for all functions named "main"
-- [`File.edit`](../api-reference/core/Editable#edit) - Edit the file
-- [`File.replace`](../api-reference/core/File#replace) - Replace all instances of a string with another string
-- [`File.insert_before`](../api-reference/core/File#insert-before) - Insert text before a specific string
-- [`File.insert_after`](../api-reference/core/File#insert-after) - Insert text after a specific string
-- [`File.remove`](../api-reference/core/File#remove) - Remove a specific string
+- [`File.search`](/api-reference/core/File#search) - Search for all functions named "main"
+- [`File.edit`](/api-reference/core/File#edit) - Edit the file
+- [`File.replace`](/api-reference/core/File#replace) - Replace all instances of a string with another string
+- [`File.insert_before`](/api-reference/core/File#insert-before) - Insert text before a specific string
+- [`File.insert_after`](/api-reference/core/File#insert-after) - Insert text after a specific string
+- [`File.remove`](/api-reference/core/File#remove) - Remove a specific string
```python
# Get a file
@@ -183,7 +231,7 @@ file.insert_after("def end():\npass")
file.remove()
```
-You can frequently do bulk modifictions via the [`.edit(...)`](../api-reference/core/Editable#edit) method or [`.replace(...)`](../api-reference/core/File#replace) method.
+You can frequently do bulk modifications via the [`.edit(...)`](/api-reference/core/Editable#edit) method or [`.replace(...)`](/api-reference/core/File#replace) method.
Most useful operations will have bespoke APIs that handle edge cases, update
@@ -192,7 +240,7 @@ You can frequently do bulk modifictions via the [`.edit(...)`](../api-reference/
## Moving and Renaming Files
-Files can be manipulated through methods like [`File.update_filepath()`](../api-reference/core/File#update-filepath), [`File.rename()`](../api-reference/core/File#rename), and [`File.remove()`](../api-reference/core/File#remove):
+Files can be manipulated through methods like [`File.update_filepath()`](/api-reference/core/File#update-filepath), [`File.rename()`](/api-reference/core/File#rename), and [`File.remove()`](/api-reference/core/File#remove):
```python
# Move/rename a file
@@ -216,7 +264,7 @@ for file in codebase.files:
## Directories
-[`Directories`](/api-reference/core/Directory) expose a similar API to the [File](../api-reference/core/File.mdx) class, with the addition of the `subdirectories` property.
+[`Directories`](/api-reference/core/Directory) expose a similar API to the [File](/api-reference/core/File.mdx) class, with the addition of the `subdirectories` property.
```python
# Get a directory
diff --git a/docs/building-with-codegen/the-editable-api.mdx b/docs/building-with-codegen/the-editable-api.mdx
index 78124b692..37236c430 100644
--- a/docs/building-with-codegen/the-editable-api.mdx
+++ b/docs/building-with-codegen/the-editable-api.mdx
@@ -17,7 +17,7 @@ Every Editable provides:
- [source](../api-reference/core/Editable#source) - the text content of the Editable
- [extended_source](../api-reference/core/Editable#extended_source) - includes relevant content like decorators, comments, etc.
- Information about the file that contains the Editable:
- - [file](../api-reference/core/Editable#file) - the [File](../api-reference/core/File) that contains this Editable
+ - [file](../api-reference/core/Editable#file) - the [SourceFile](../api-reference/core/SourceFile) that contains this Editable
- Relationship tracking
- [parent_class](../api-reference/core/Editable#parent-class) - the [Class](../api-reference/core/Class) that contains this Editable
- [parent_function](../api-reference/core/Editable#parent-function) - the [Function](../api-reference/core/Function) that contains this Editable
diff --git a/docs/tutorials/flask-to-fastapi.mdx b/docs/tutorials/flask-to-fastapi.mdx
index 314c92314..ba4fb3fce 100644
--- a/docs/tutorials/flask-to-fastapi.mdx
+++ b/docs/tutorials/flask-to-fastapi.mdx
@@ -190,7 +190,7 @@ python run.py
The script will:
-1. Process all Python files in your codebase
+1. Process all Python [files](/api-reference/python/PyFile) in your codebase
2. Apply the transformations in the correct order
3. Maintain your code's functionality while updating to FastAPI patterns
diff --git a/docs/tutorials/python2-to-python3.mdx b/docs/tutorials/python2-to-python3.mdx
index 50abd5361..c72227d14 100644
--- a/docs/tutorials/python2-to-python3.mdx
+++ b/docs/tutorials/python2-to-python3.mdx
@@ -229,7 +229,7 @@ python run.py
The script will:
-1. Process all Python files in your codebase
+1. Process all Python [files](/api-reference/python/PyFile) in your codebase
2. Apply the transformations in the correct order
3. Maintain your code's functionality while updating to Python 3 syntax
diff --git a/docs/tutorials/training-data.mdx b/docs/tutorials/training-data.mdx
index 0f4608693..ced20867f 100644
--- a/docs/tutorials/training-data.mdx
+++ b/docs/tutorials/training-data.mdx
@@ -171,7 +171,7 @@ This will:
You can use any Git repository as your source codebase by passing the repo URL
- to [Codebase.from_repo(...)](/api-reference/core/codebase#from-repo).
+ to [Codebase.from_repo(...)](/api-reference/core/Codebase#from-repo).
## Using the Training Data