Commit c5d73f8

Merge branch 'main' of https://github.com/codeflash-ai/codeflash into part-1-windows-fixes
2 parents 8e5d03c + 553a192 commit c5d73f8

49 files changed: 2895 additions, 1976 deletions

.github/workflows/codeflash-optimize.yaml

Lines changed: 0 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -20,7 +20,6 @@ jobs:
2020
CODEFLASH_AIS_SERVER: prod
2121
POSTHOG_API_KEY: ${{ secrets.POSTHOG_API_KEY }}
2222
CODEFLASH_API_KEY: ${{ secrets.CODEFLASH_API_KEY }}
23-
CODEFLASH_PR_NUMBER: ${{ github.event.number }}
2423
COLUMNS: 110
2524
steps:
2625
- name: 🛎️ Checkout

README.md

Lines changed: 7 additions & 3 deletions
@@ -17,11 +17,11 @@ It uses advanced LLMs to generate multiple optimization ideas for your code, tes
1717
How to use Codeflash -
1818
- Optimize an entire existing codebase by running `codeflash --all`
1919
- Automate optimizing all __future__ code you will write by installing Codeflash as a GitHub action.
20-
- Optimize a Python workflow `python myscript.py` end-to-end by running `python -m codeflash.tracer -o benchmark.trace myscript.py`
20+
- Optimize a Python workflow `python myscript.py` end-to-end by running `codeflash optimize myscript.py`
2121

22-
Codeflash is used by top engineering teams at [Pydantic](https://github.com/pydantic/pydantic/pulls?q=is%3Apr+author%3Amisrasaurabh1+is%3Amerged), [Langflow](https://github.com/langflow-ai/langflow/issues?q=state%3Aclosed%20is%3Apr%20author%3Amisrasaurabh1), [Albumentations](https://github.com/albumentations-team/albumentations/issues?q=state%3Amerged%20is%3Apr%20author%3Akrrt7%20OR%20state%3Amerged%20is%3Apr%20author%3Aaseembits93%20) and many others to ship performant, expert level code.
22+
Codeflash is used by top engineering teams at [Pydantic](https://github.com/pydantic/pydantic/pulls?q=is%3Apr+author%3Amisrasaurabh1+is%3Amerged), [Langflow](https://github.com/langflow-ai/langflow/issues?q=state%3Aclosed%20is%3Apr%20author%3Amisrasaurabh1), [Roboflow](https://github.com/roboflow/inference/pulls?q=is%3Apr+is%3Amerged+codeflash+sort%3Acreated-asc), [Albumentations](https://github.com/albumentations-team/albumentations/issues?q=state%3Amerged%20is%3Apr%20author%3Akrrt7%20OR%20state%3Amerged%20is%3Apr%20author%3Aaseembits93%20) and many others to ship performant, expert level code.
2323

24-
Codeflash is great at optimizing AI Agents, Computer Vision algorithms, numerical code, backend code or anything else you might write with Python.
24+
Codeflash is great at optimizing AI Agents, Computer Vision algorithms, PyTorch code, numerical code, backend code or anything else you might write with Python.
2525

2626

2727
## Installation
@@ -50,6 +50,10 @@ Add codeflash as a development time dependency if you are using package managers
5050
codeflash --all
5151
```
5252
This can take a while to run for a large codebase, but it will keep opening PRs as it finds optimizations.
53+
3. Optimize a script:
54+
```
55+
codeflash optimize myscript.py
56+
```
5357

5458
## Documentation
5559
For detailed installation and usage instructions, visit our documentation at [docs.codeflash.ai](https://docs.codeflash.ai)
Binary file not shown.

code_to_optimize/code_directories/simple_tracer_e2e/workload.py

Lines changed: 10 additions & 1 deletion
@@ -1,4 +1,5 @@
11
from concurrent.futures import ThreadPoolExecutor
2+
from time import sleep
23

34

45
def funcA(number):
@@ -46,12 +47,20 @@ def _classify(self, features):
4647
class SimpleModel:
4748
@staticmethod
4849
def predict(data):
49-
return [x * 2 for x in data]
50+
result = []
51+
sleep(0.1) # can be optimized away
52+
for i in range(500):
53+
for x in data:
54+
computation = 0
55+
computation += x * i ** 2
56+
result.append(computation)
57+
return result
5058

5159
@classmethod
5260
def create_default(cls):
5361
return cls()
5462

63+
5564
def test_models():
5665
model = AlexNet(num_classes=10)
5766
input_data = [1, 2, 3, 4, 5]

codeflash/LICENSE

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ Business Source License 1.1
33
Parameters
44

55
Licensor: CodeFlash Inc.
6-
Licensed Work: Codeflash Client version 0.14.x
6+
Licensed Work: Codeflash Client version 0.15.x
77
The Licensed Work is (c) 2024 CodeFlash Inc.
88

99
Additional Use Grant: None. Production use of the Licensed Work is only permitted
@@ -13,7 +13,7 @@ Additional Use Grant: None. Production use of the Licensed Work is only permitte
1313
Platform. Please visit codeflash.ai for further
1414
information.
1515

16-
Change Date: 2029-06-09
16+
Change Date: 2029-07-03
1717

1818
Change License: MIT
1919

codeflash/api/cfapi.py

Lines changed: 3 additions & 2 deletions
@@ -205,9 +205,10 @@ def get_blocklisted_functions() -> dict[str, set[str]] | dict[str, Any]:
205205
if pr_number is None:
206206
return {}
207207

208-
owner, repo = get_repo_owner_and_name()
209-
information = {"pr_number": pr_number, "repo_owner": owner, "repo_name": repo}
210208
try:
209+
owner, repo = get_repo_owner_and_name()
210+
information = {"pr_number": pr_number, "repo_owner": owner, "repo_name": repo}
211+
211212
req = make_cfapi_request(endpoint="/verify-existing-optimizations", method="POST", payload=information)
212213
req.raise_for_status()
213214
content: dict[str, list[str]] = req.json()
Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
1+
from __future__ import annotations
2+
3+
from typing import TYPE_CHECKING
4+
5+
from codeflash.cli_cmds.console import console, logger
6+
from codeflash.code_utils.config_consts import DEFAULT_IMPORTANCE_THRESHOLD
7+
from codeflash.discovery.functions_to_optimize import FunctionToOptimize
8+
from codeflash.tracing.profile_stats import ProfileStats
9+
10+
if TYPE_CHECKING:
11+
from pathlib import Path
12+
13+
from codeflash.discovery.functions_to_optimize import FunctionToOptimize
14+
15+
16+
class FunctionRanker:
17+
"""Ranks and filters functions based on a ttX score derived from profiling data.
18+
19+
The ttX score is calculated as:
20+
ttX = own_time + (time_spent_in_callees / call_count)
21+
22+
This score prioritizes functions that are computationally heavy themselves (high `own_time`)
23+
or that make expensive calls to other functions (high average `time_spent_in_callees`).
24+
25+
Functions are first filtered by an importance threshold based on their `own_time` as a
26+
fraction of the total runtime. The remaining functions are then ranked by their ttX score
27+
to identify the best candidates for optimization.
28+
"""
29+
30+
def __init__(self, trace_file_path: Path) -> None:
31+
self.trace_file_path = trace_file_path
32+
self._profile_stats = ProfileStats(trace_file_path.as_posix())
33+
self._function_stats: dict[str, dict] = {}
34+
self.load_function_stats()
35+
36+
def load_function_stats(self) -> None:
37+
try:
38+
for (filename, line_number, func_name), (
39+
call_count,
40+
_num_callers,
41+
total_time_ns,
42+
cumulative_time_ns,
43+
_callers,
44+
) in self._profile_stats.stats.items():
45+
if call_count <= 0:
46+
continue
47+
48+
# Parse function name to handle methods within classes
49+
class_name, qualified_name, base_function_name = (None, func_name, func_name)
50+
if "." in func_name and not func_name.startswith("<"):
51+
parts = func_name.split(".", 1)
52+
if len(parts) == 2:
53+
class_name, base_function_name = parts
54+
55+
# Calculate own time (total time - time spent in subcalls)
56+
own_time_ns = total_time_ns
57+
time_in_callees_ns = cumulative_time_ns - total_time_ns
58+
59+
# Calculate ttX score
60+
ttx_score = own_time_ns + (time_in_callees_ns / call_count)
61+
62+
function_key = f"{filename}:{qualified_name}"
63+
self._function_stats[function_key] = {
64+
"filename": filename,
65+
"function_name": base_function_name,
66+
"qualified_name": qualified_name,
67+
"class_name": class_name,
68+
"line_number": line_number,
69+
"call_count": call_count,
70+
"own_time_ns": own_time_ns,
71+
"cumulative_time_ns": cumulative_time_ns,
72+
"time_in_callees_ns": time_in_callees_ns,
73+
"ttx_score": ttx_score,
74+
}
75+
76+
logger.debug(f"Loaded timing stats for {len(self._function_stats)} functions from trace using ProfileStats")
77+
78+
except Exception as e:
79+
logger.warning(f"Failed to process function stats from trace file {self.trace_file_path}: {e}")
80+
self._function_stats = {}
81+
82+
def _get_function_stats(self, function_to_optimize: FunctionToOptimize) -> dict | None:
83+
target_filename = function_to_optimize.file_path.name
84+
for key, stats in self._function_stats.items():
85+
if stats.get("function_name") == function_to_optimize.function_name and (
86+
key.endswith(f"/{target_filename}") or target_filename in key
87+
):
88+
return stats
89+
90+
logger.debug(
91+
f"Could not find stats for function {function_to_optimize.function_name} in file {target_filename}"
92+
)
93+
return None
94+
95+
def get_function_ttx_score(self, function_to_optimize: FunctionToOptimize) -> float:
96+
stats = self._get_function_stats(function_to_optimize)
97+
return stats["ttx_score"] if stats else 0.0
98+
99+
def rank_functions(self, functions_to_optimize: list[FunctionToOptimize]) -> list[FunctionToOptimize]:
100+
ranked = sorted(functions_to_optimize, key=self.get_function_ttx_score, reverse=True)
101+
logger.debug(
102+
f"Function ranking order: {[f'{func.function_name} (ttX={self.get_function_ttx_score(func):.2f})' for func in ranked]}"
103+
)
104+
return ranked
105+
106+
def get_function_stats_summary(self, function_to_optimize: FunctionToOptimize) -> dict | None:
107+
return self._get_function_stats(function_to_optimize)
108+
109+
def rerank_functions(self, functions_to_optimize: list[FunctionToOptimize]) -> list[FunctionToOptimize]:
110+
"""Ranks functions based on their ttX score.
111+
112+
This method calculates the ttX score for each function and returns
113+
the functions sorted in descending order of their ttX score.
114+
"""
115+
if not self._function_stats:
116+
logger.warning("No function stats available to rank functions.")
117+
return []
118+
119+
return self.rank_functions(functions_to_optimize)
120+
121+
def rerank_and_filter_functions(self, functions_to_optimize: list[FunctionToOptimize]) -> list[FunctionToOptimize]:
122+
"""Reranks and filters functions based on their impact on total runtime.
123+
124+
This method first calculates the total runtime of all profiled functions.
125+
It then filters out functions whose own_time is less than a specified
126+
percentage of the total runtime (importance_threshold).
127+
128+
The remaining 'important' functions are then ranked by their ttX score.
129+
"""
130+
stats_map = self._function_stats
131+
if not stats_map:
132+
return []
133+
134+
total_program_time = sum(s["own_time_ns"] for s in stats_map.values() if s.get("own_time_ns", 0) > 0)
135+
136+
if total_program_time == 0:
137+
logger.warning("Total program time is zero, cannot determine function importance.")
138+
return self.rank_functions(functions_to_optimize)
139+
140+
important_functions = []
141+
for func in functions_to_optimize:
142+
func_stats = self._get_function_stats(func)
143+
if func_stats and func_stats.get("own_time_ns", 0) > 0:
144+
importance = func_stats["own_time_ns"] / total_program_time
145+
if importance >= DEFAULT_IMPORTANCE_THRESHOLD:
146+
important_functions.append(func)
147+
else:
148+
logger.debug(
149+
f"Filtering out function {func.qualified_name} with importance "
150+
f"{importance:.2%} (below threshold {DEFAULT_IMPORTANCE_THRESHOLD:.2%})"
151+
)
152+
153+
logger.info(
154+
f"Filtered down to {len(important_functions)} important functions from {len(functions_to_optimize)} total functions"
155+
)
156+
console.rule()
157+
158+
return self.rank_functions(important_functions)
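
The ranking described in the `FunctionRanker` docstring can be illustrated outside the class. The following is a minimal sketch, not the committed implementation: the function names `ttx_score` and `rank_and_filter`, the sample timings, and the threshold value `0.01` are all assumptions made for illustration (the real threshold is `DEFAULT_IMPORTANCE_THRESHOLD`, whose value is not shown in this diff).

```python
from __future__ import annotations

# Hypothetical threshold for illustration only; the committed code imports
# DEFAULT_IMPORTANCE_THRESHOLD, whose actual value does not appear in this diff.
IMPORTANCE_THRESHOLD = 0.01


def ttx_score(own_time_ns: int, cumulative_time_ns: int, call_count: int) -> float:
    """ttX = own_time + (time_spent_in_callees / call_count)."""
    time_in_callees_ns = cumulative_time_ns - own_time_ns
    return own_time_ns + time_in_callees_ns / call_count


def rank_and_filter(stats: dict[str, tuple[int, int, int]]) -> list[str]:
    """Filter by share of total own time, then sort by ttX in descending order.

    `stats` maps a function name to (call_count, own_time_ns, cumulative_time_ns).
    """
    total_own = sum(own for _, own, _ in stats.values() if own > 0)
    if total_own == 0:
        # Importance cannot be computed, so rank everything.
        candidates = list(stats)
    else:
        candidates = [
            name
            for name, (_, own, _) in stats.items()
            if own > 0 and own / total_own >= IMPORTANCE_THRESHOLD
        ]
    return sorted(
        candidates,
        key=lambda name: ttx_score(stats[name][1], stats[name][2], stats[name][0]),
        reverse=True,
    )


# Made-up timings: "log" holds about 0.55% of total own time and is filtered out.
sample = {
    "parse": (10, 500, 2_500),  # ttX = 500 + 2000 / 10 = 700
    "render": (1, 400, 400),    # ttX = 400
    "log": (100, 5, 5),
}
print(rank_and_filter(sample))  # ['parse', 'render']
```

Note that the filter looks only at own time, while the sort also credits time spent in callees averaged per call, so a function can pass the filter on a small own-time share and still rank first.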

codeflash/cli_cmds/cli.py

Lines changed: 35 additions & 2 deletions
@@ -22,6 +22,36 @@ def parse_args() -> Namespace:
2222

2323
init_actions_parser = subparsers.add_parser("init-actions", help="Initialize GitHub Actions workflow")
2424
init_actions_parser.set_defaults(func=install_github_actions)
25+
26+
trace_optimize = subparsers.add_parser("optimize", help="Trace and optimize a Python project.")
27+
28+
from codeflash.tracer import main as tracer_main
29+
30+
trace_optimize.set_defaults(func=tracer_main)
31+
32+
trace_optimize.add_argument(
33+
"--max-function-count",
34+
type=int,
35+
default=100,
36+
help="The maximum number of times to trace a single function. More calls to a function will not be traced. Default is 100.",
37+
)
38+
trace_optimize.add_argument(
39+
"--timeout",
40+
type=int,
41+
help="The maximum time in seconds to trace the entire workflow. Default is indefinite. This is useful while tracing really long workflows, to not wait indefinitely.",
42+
)
43+
trace_optimize.add_argument(
44+
"--output",
45+
type=str,
46+
default="codeflash.trace",
47+
help="The file to save the trace to. Default is codeflash.trace.",
48+
)
49+
trace_optimize.add_argument(
50+
"--config-file-path",
51+
type=str,
52+
help="The path to the pyproject.toml file which stores the Codeflash config. This is auto-discovered by default.",
53+
)
54+
2555
parser.add_argument("--file", help="Try to optimize only this file")
2656
parser.add_argument("--function", help="Try to optimize only this function within the given file path")
2757
parser.add_argument(
@@ -64,7 +94,8 @@ def parse_args() -> Namespace:
6494
)
6595
parser.add_argument("--no-draft", default=False, action="store_true", help="Skip optimization for draft PRs")
6696

67-
args: Namespace = parser.parse_args()
97+
args, unknown_args = parser.parse_known_args()
98+
sys.argv[:] = [sys.argv[0], *unknown_args]
6899
return process_and_validate_cmd_args(args)
69100

70101

@@ -102,6 +133,8 @@ def process_and_validate_cmd_args(args: Namespace) -> Namespace:
102133
if not Path(test_path).is_file():
103134
exit_with_message(f"Replay test file {test_path} does not exist", error_on_exit=True)
104135
args.replay_test = [Path(replay_test).resolve() for replay_test in args.replay_test]
136+
if env_utils.is_ci():
137+
args.no_pr = True
105138

106139
return args
107140

@@ -201,7 +234,7 @@ def handle_optimize_all_arg_parsing(args: Namespace) -> Namespace:
201234
"I need a git repository to run --all and open PRs for optimizations. Exiting..."
202235
)
203236
apologize_and_exit()
204-
if not args.no_pr and not check_and_push_branch(git_repo):
237+
if not args.no_pr and not check_and_push_branch(git_repo, git_remote=args.git_remote):
205238
exit_with_message("Branch is not pushed...", error_on_exit=True)
206239
owner, repo = get_repo_owner_and_name(git_repo)
207240
if not args.no_pr:
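
The switch from `parse_args()` to `parse_known_args()` in this file lets arguments that the Codeflash parser does not recognize pass through to a later consumer, here presumably the tracer and the traced script. A minimal, self-contained sketch of that pattern follows; the parser, the helper name `split_args`, and the sample command line are hypothetical and only `--output` is borrowed from the diff.

```python
import argparse
import sys


def split_args(argv: list[str]) -> tuple[argparse.Namespace, list[str]]:
    """Parse the options this tool owns and return the rest untouched."""
    parser = argparse.ArgumentParser(prog="tool")  # hypothetical parser
    parser.add_argument("--output", default="codeflash.trace")
    # Unlike parse_args(), parse_known_args() does not exit on unrecognized
    # arguments; it returns them, in order, as a second value.
    return parser.parse_known_args(argv)


args, rest = split_args(["--output", "run.trace", "myscript.py", "--script-flag", "3"])

# Rebuild an argv for the downstream consumer, keeping the program name first.
# The committed code assigns this to sys.argv[:] in place; the sketch keeps it
# in a local variable to avoid mutating global state.
forwarded = [sys.argv[0], *rest]

print(args.output)  # run.trace
print(rest)         # ['myscript.py', '--script-flag', '3']
```

One consequence of this design is that a mistyped option is no longer rejected by the top-level parser; it is forwarded along with the script's own arguments.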

codeflash/cli_cmds/cmd_init.py

Lines changed: 7 additions & 14 deletions
@@ -105,10 +105,8 @@ def init_codeflash() -> None:
105105
usage_table.add_row(
106106
"codeflash --file <path-to-file> --function <function-name>", "Optimize a specific function within a file"
107107
)
108-
usage_table.add_row("codeflash --file <path-to-file>", "Optimize all functions in a file")
109-
usage_table.add_row(
110-
f"codeflash --all{module_string if module_string else ''}", "Optimize all functions in all files"
111-
)
108+
usage_table.add_row("codeflash optimize <myscript.py>", "Trace and find the best optimizations for a script")
109+
usage_table.add_row("codeflash --all", "Optimize all functions in all files")
112110
usage_table.add_row("codeflash --help", "See all available options")
113111

114112
completion_message = "⚡️ Codeflash is now set up!\n\nYou can now run any of these commands:"
@@ -239,7 +237,6 @@ def collect_setup_info() -> SetupInfo:
239237
)
240238
console.print(info_panel)
241239
console.print()
242-
243240
questions = [
244241
inquirer.List(
245242
"module_root",
@@ -274,7 +271,6 @@ def collect_setup_info() -> SetupInfo:
274271
message="Enter the path to your module directory",
275272
path_type=inquirer.Path.DIRECTORY,
276273
exists=True,
277-
normalize_to_absolute_path=False,
278274
)
279275
]
280276

@@ -331,7 +327,7 @@ def collect_setup_info() -> SetupInfo:
331327
elif tests_root_answer == custom_dir_option:
332328
custom_tests_panel = Panel(
333329
Text(
334-
"🧪 Enter a custom test directory path.\n\nPlease provide the path to your test directory.",
330+
"🧪 Enter a custom test directory path.\n\nPlease provide the path to your test directory, relative to the current directory.",
335331
style="yellow",
336332
),
337333
title="🧪 Custom Test Directory",
@@ -342,11 +338,7 @@ def collect_setup_info() -> SetupInfo:
342338

343339
custom_tests_questions = [
344340
inquirer.Path(
345-
"custom_tests_path",
346-
message="Enter the path to your tests directory",
347-
path_type=inquirer.Path.DIRECTORY,
348-
exists=False, # Allow creating new directories
349-
normalize_to_absolute_path=False,
341+
"custom_tests_path", message="Enter the path to your tests directory", path_type=inquirer.Path.DIRECTORY
350342
)
351343
]
352344

@@ -936,7 +928,8 @@ def configure_pyproject_toml(setup_info: SetupInfo) -> None:
936928
codeflash_section["tests-root"] = setup_info.tests_root
937929
codeflash_section["test-framework"] = setup_info.test_framework
938930
codeflash_section["ignore-paths"] = setup_info.ignore_paths
939-
codeflash_section["disable-telemetry"] = not enable_telemetry
931+
if not enable_telemetry:
932+
codeflash_section["disable-telemetry"] = not enable_telemetry
940933
if setup_info.git_remote not in ["", "origin"]:
941934
codeflash_section["git-remote"] = setup_info.git_remote
942935
formatter = setup_info.formatter
@@ -988,7 +981,7 @@ def install_github_app() -> None:
988981
)
989982
click.launch("https://github.com/apps/codeflash-ai/installations/select_target")
990983
click.prompt(
991-
f"Press Enter once you've finished installing the github app from https://github.com/apps/codeflash-ai/installations/select_target{LF}",
984+
f"Press Enter once you've finished installing the github app from https://github.com/apps/codeflash-ai/installations/select_target{LF}",
992985
default="",
993986
type=click.STRING,
994987
prompt_suffix="",

codeflash/cli_cmds/workflows/codeflash-optimize.yaml

Lines changed: 0 additions & 1 deletion
@@ -21,7 +21,6 @@ jobs:
2121
runs-on: ubuntu-latest
2222
env:
2323
CODEFLASH_API_KEY: ${{ secrets.CODEFLASH_API_KEY }}
24-
CODEFLASH_PR_NUMBER: ${{ github.event.number }}
2524
{{ working_directory }}
2625
steps:
2726
- name: 🛎️ Checkout
