add prefect to organize into flows and tasks #17

groovecoder · 2025-04-01T14:25:05Z

* Adds prefect and new `flows` and `tasks` modules
* Adds boto3 and `src/aws.py` module to enable uploading data artifacts to S3
* Moves `duration_to_minutes` from `notebooks/meetings.ipynb` to `src/meetings.py`
* Adds more files to `.gitignore`

groovecoder · 2025-04-01T15:00:38Z

README.md

+- `data/`: local data artifacts
+- `flows/`: prefect flows
+- `notebooks/`: Jupyter notebooks for analysis and exploration
+- `scripts/`: one off scripts for downloading, conversions, etc


@jdungan : be sure to remove this line from the README.md too.

kaizengrowth · 2025-04-02T06:32:17Z

README.md

 - `src/`: Source code for the scraper
  - `models/`: Pydantic models for data representation
- 'scripts`: one off scripts for downloading, conversions, etc
+- `tasks/`: prefect tasks


Is the next task to convert all scripts to prefect tasks?

Not necessarily.

The only thing I'm sure of is that we should move as much of the core logic (fetching, parsing, transforming, invoking models, writing outputs, etc.) as possible into either "src" or "functions" modules that do not import or depend on any orchestration library.

That way, we can keep our core logic as de-coupled as possible from prefect, or airflow, or langchain, or whatever other orchestration tool we want to try.

So, I think it would be better to convert or refactor code from scripts into code in "src" or "functions" modules.
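
A minimal sketch of that split, under stated assumptions: the body of `duration_to_minutes` below is a guess at an "H:MM" / "H:MM:SS" format (the real function in `src/meetings.py` may differ), and `duration_to_minutes_task` is a hypothetical wrapper name. The core function imports nothing from prefect; only the thin wrapper does.

```python
# src/meetings.py (sketch): core logic, no orchestration imports.
def duration_to_minutes(duration: str) -> int:
    """Convert "H:MM" or "H:MM:SS" to whole minutes (assumed input format)."""
    parts = [int(p) for p in duration.strip().split(":")]
    if len(parts) == 2:
        hours, minutes = parts
    elif len(parts) == 3:
        hours, minutes, _seconds = parts  # seconds are dropped, not rounded
    else:
        raise ValueError(f"Unrecognized duration: {duration!r}")
    return hours * 60 + minutes


# tasks/meetings.py (sketch): the only layer that knows about prefect.
try:
    from prefect import task
except ImportError:
    # Fallback so this sketch still runs where prefect is not installed.
    def task(fn):
        return fn


@task
def duration_to_minutes_task(duration: str) -> int:
    # Hypothetical wrapper: delegates straight to the core function.
    return duration_to_minutes(duration)
```

Swapping prefect for another orchestrator would then only touch the wrapper layer, and the core function can be unit-tested without any orchestration library installed.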

kaizengrowth

Nice! Really liking the structured way that tasks and flows are organized in Prefect, and its neat dashboard!

NIT - a few small comments.

Will try tackling one of the #TODO's to get a sense of converting a script to a task, and add to the translate_meetings() flow.

kaizengrowth · 2025-04-02T06:43:41Z

tasks/meetings.py

+@task
+async def create_meetings_csv():
+    meetings = await get_meetings()
+    print(f"Got meetings: {meetings}")


Instead of print statements, do we want to consider importing logging for debug messages?

We might then connect logs to cloudwatch for db monitoring and alerts?

Yeah that's a good idea. I was using lazy print statements while developing, and prefect has the handy log_prints=True argument for flows which converts prints into logs that show up on the flow and task runs.

But yeah - we should do some "real" logging in our code without relying on prefect. It looks like we can also configure prefect to capture logging from our own code too, so we aren't depending on prefect for logging, but we ARE able to see our logging in prefect.

And then I assume we can have cloudwatch also watch our logging?
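
A rough sketch of what "real" logging could look like using only the standard library. The logger name `src.meetings` and the helper `summarize_meetings` are hypothetical. If I recall correctly, prefect can be told to capture such loggers through its `PREFECT_LOGGING_EXTRA_LOGGERS` setting, but that should be checked against the prefect docs for the version in use.

```python
import logging

# Module-level logger; the name is an assumption chosen for illustration.
logger = logging.getLogger("src.meetings")


def summarize_meetings(meetings: list) -> int:
    """Log a short summary at INFO and the full payload only at DEBUG."""
    logger.info("Got %d meetings", len(meetings))
    logger.debug("Meetings payload: %r", meetings)
    return len(meetings)


if __name__ == "__main__":
    # Handlers are configured once at the entry point, not inside library code.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    summarize_meetings([{"id": 1}, {"id": 2}])
```

Because the code only talks to `logging`, any handler (console, file, or a CloudWatch-backed one) can be attached at the entry point without changing the core modules.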

kaizengrowth · 2025-04-02T06:47:40Z

tasks/meetings.py

+
+@task
+async def create_meetings_csv():
+    meetings = await get_meetings()


We might want to add error handling here in a try/except block.
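
One possible shape for that, as a sketch only: `get_meetings` here is a stand-in that always fails, the `fetch` parameter is an assumption added for testability, and the `@task` decorator is left off so the snippet runs without prefect. The error is logged and re-raised rather than swallowed, so an orchestrator would still see the run as failed.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def get_meetings():
    """Hypothetical stand-in for the real fetcher; always fails here."""
    raise ConnectionError("meetings source unreachable")


async def create_meetings_csv(fetch=get_meetings):
    try:
        meetings = await fetch()
    except Exception:
        # Record the traceback, then re-raise so the failure stays visible.
        logger.exception("Failed to fetch meetings")
        raise
    return meetings
```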

kaizengrowth · 2025-04-02T06:52:10Z

src/aws.py

+from botocore.exceptions import ClientError, NoCredentialsError, PartialCredentialsError
+
+def is_aws_configured():
+    required_vars = ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY']


Do we also want to return the AWS_REGION?
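
A sketch of how a region lookup could sit next to the credentials check. `get_aws_region` and the `env` parameter are hypothetical additions, not part of the PR. As far as I know boto3 itself reads `AWS_DEFAULT_REGION` rather than `AWS_REGION`, so the sketch accepts either name.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
# Assumption: accept both spellings, preferring AWS_REGION when both are set.
REGION_VARS = ("AWS_REGION", "AWS_DEFAULT_REGION")


def is_aws_configured(env=None):
    """Return True when every required credential variable is non-empty."""
    env = os.environ if env is None else env
    return all(env.get(name) for name in REQUIRED_VARS)


def get_aws_region(env=None, default=None):
    """Return the first region variable that is set, else `default`."""
    env = os.environ if env is None else env
    for name in REGION_VARS:
        if env.get(name):
            return env[name]
    return default
```

Keeping the boolean check and the region lookup separate means existing callers of `is_aws_configured()` keep working unchanged.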

kaizengrowth · 2025-04-02T06:57:04Z

src/aws.py

+
+
+def create_bucket_if_not_exists(bucket_name):
+    s3 = boto3.client('s3')


Maybe adding reusable get_s3_client() and get_aws_config() methods could be helpful for reusing in other files
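
A possible sketch of those two helpers. The return shape of `get_aws_config` and the caching of the client are assumptions, not something the PR defines. `boto3` is imported lazily so that code which only needs the config does not pay for the import.

```python
import os
from functools import lru_cache


def get_aws_config(env=None):
    """Collect AWS settings as keyword arguments accepted by boto3.client()."""
    env = os.environ if env is None else env
    return {
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY"),
        "region_name": env.get("AWS_DEFAULT_REGION"),
    }


@lru_cache(maxsize=1)
def get_s3_client():
    """Create the S3 client once and reuse it across calls."""
    import boto3  # lazy import: only needed when a client is requested

    # Values that are None fall back to boto3's default credential chain.
    return boto3.client("s3", **get_aws_config())
```

Functions such as `create_bucket_if_not_exists` could then call `get_s3_client()` instead of building their own client each time.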

kaizengrowth · 2025-04-02T06:58:12Z

src/aws.py

+            print(f"Client error: {e}")
+
+
+def upload_to_s3(file_path, bucket_name, s3_path):


Do we also want to add a download_from_s3() method?
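
A hedged sketch of what `download_from_s3` might look like. The argument order and the optional `s3_client` parameter are assumptions; `download_file(Bucket, Key, Filename)` is the boto3 S3 client call it relies on.

```python
from pathlib import Path


def download_from_s3(bucket_name, s3_path, file_path, s3_client=None):
    """Download s3://bucket_name/s3_path to file_path and return the local Path."""
    if s3_client is None:
        import boto3  # lazy import so a fake client can be injected in tests

        s3_client = boto3.client("s3")
    target = Path(file_path)
    # Make sure the destination directory exists before writing into it.
    target.parent.mkdir(parents=True, exist_ok=True)
    s3_client.download_file(bucket_name, s3_path, str(target))
    return target
```

Errors from boto3 are left to propagate here; whether to catch `ClientError` the way `upload_to_s3` does is a separate design choice.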

groovecoder · 2025-04-02T15:26:25Z

Merging this now so we can build on top of it. We'll make some of the improvements @kaizengrowth suggested as we go.

jdungan and others added 4 commits March 31, 2025 12:34

clean up scripts, tweak subtitles

1085bc0

Merge branch 'prefect-prep' into try-prefect-for-orchestration

24c1569

remove scripts directory

893c755

groovecoder commented Apr 1, 2025

kaizengrowth reviewed Apr 2, 2025

kaizengrowth approved these changes Apr 2, 2025

groovecoder requested a review from jdungan April 2, 2025 11:54

groovecoder force-pushed the try-prefect-for-orchestration branch 3 times, most recently from 8378ff9 to 9687479 Compare April 2, 2025 15:31

update pyproject to require python<3.13; prefect has a bug with python 3.13

2355e6a

groovecoder force-pushed the try-prefect-for-orchestration branch from 9687479 to 2355e6a Compare April 2, 2025 15:36

groovecoder merged commit b91af41 into main Apr 2, 2025
1 check passed

groovecoder deleted the try-prefect-for-orchestration branch April 2, 2025 15:37
