Skip to content

jtsw1990/glimpse-gpt-pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Glimpse

Glimpse is a data engineering project designed to:

  • Read in latest news snippets from source API
  • Combined and process text, metadata into a single input
  • Call LLM API to generate a single paragraph prompt
  • Use prompt to feed into image generation API
  • Use social platform API to automatically generate content daily

System Components

External APIs

AWS Ecosystem

Social Media

Project Architecture

Project Architecture

Components that are not IaC:

  • Updating of AWS credentials/IAM user
  • SNS topic set up + subscriptions
  • Pandas lambda layer
  • OpenAI lambda layer
  • Creation of environment variables for content create lambda
  • Posting of content

Useful Links

Development Workflow

  1. Create and branch off new issue
  2. Install all local dependencies in requirements.txt as well as setting up serverless locally
  3. Use #%% magic from vscode jupyter extension to run isolated lambda functions
  4. Replicate variables locally using sample files
  5. To test, upload a sample raw_feed.json from local into glimpse-landing-dev through the AWS console. This should kick off the pipeline automatically
  6. If everything runs correctly, an email should be sent to jtsw1990@gmail.com with the content feed
  7. If not, review the logs, check each lambda's latest timestamp to identify error messages
  8. Delete the raw_feed.json from glimpse-landing-dev and feature.json from glimpse-feature-store if applicable to keep things clean
  9. Repeat steps 3-8 until tests run as expected
  10. Run ruff check . --fix to highlight any linting issues
  11. Run sls deploy to push latest adjustments to AWS (Note the components not included in IAC above and apply accordingly)
  12. Run git workflow to push to feature branch
  13. Merge back into main

Project Goals

  • Become a wizard in building infrastructure
  • To know the right practices and tools to avoid running notebooks manually in datascience projects
  • To be able to weigh options for different solutions given a specific stack and situation
  • Have fun learning and hopefully build something cool along the way

Optional Goals

  • Get used to the standard git development process (TBD) which will help with work
  • Create a personal project template that can be reused
  • Add an element of content creation to this

About

Repository to build an end to end automated pipeline

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages