refactor(scraper): fix, optimize, refactor #207
Merged
mini-bomba merged 10 commits into main on Feb 15, 2025
qamarq (Member) requested changes on Feb 15, 2025:
the lecturers are getting out of sync with the courses
- replaced export {} syntax with marking each function as export at
declaration
- exported all interfaces
- refactored all scrap*() arrow functions to proper functions with
proper return typing
- added some new interfaces for return types of scrap*() functions
- made all scrap*() functions throw improved errors instead of returning
undefined
- split the main `run()` function into multiple task functions
- task functions may share data between them using properties of the command object
- tasks should batch updates and run them all in a few queries by using the `*Many()` method variants of Lucid models
- implemented simple async semaphores for ratelimiting - the number of running parallel fetch and DB tasks is limited and can be adjusted using commandline flags
- it actually works now (on my machine)
results in a ~40% speedup in that task (~25s -> ~15s)
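The "simple async semaphores" mentioned above could look roughly like the sketch below. This is an assumed shape, not the PR's actual code; `Semaphore` and `run` are illustrative names.

```typescript
// Minimal async semaphore sketch: caps how many tasks run concurrently.
class Semaphore {
  private queue: (() => void)[] = []
  private available: number

  constructor(limit: number) {
    this.available = limit
  }

  private async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--
      return
    }
    // No permit free: park until a running task releases one.
    await new Promise<void>((resolve) => this.queue.push(resolve))
  }

  private release(): void {
    const next = this.queue.shift()
    // Hand the permit directly to a waiter, or return it to the pool.
    if (next) next()
    else this.available++
  }

  // Run a task under the semaphore, always releasing the permit.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire()
    try {
      return await task()
    } finally {
      this.release()
    }
  }
}
```

Wrapping each fetch/DB call in `sem.run(...)` bounds parallelism without restructuring the task code, which matches the "adjustable via commandline flags" design: the flag value just becomes the constructor's `limit`.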
i need that Set.difference in my scraper
896bfa2 to
4505792
Compare
today's session of pointless debugging was brought to you by today's sponsor, adonis! do you want your code to absolutely explode every time you attempt to do a bulk SQL action? do you despise the common-sense assumptions, such as the bulk fetch function returning items in the same order as in the list you provided? do you like wasting hours sitting in the debugger, inventing new debugging techniques, such as setting a conditional breakpoint on `Math.random() < 0.001`? then adonis is perfect for you! rewrite your web project in adonis today! use promo code `mini_bomba` to get 50% more pointless debugging for your first rewrite and a free database implosion on your first tests in production!
Member:
Looks good to me. Can't find any issues.
simon-the-shark (Member) left a comment:
I didn't read the scraping logic itself very closely, but I assume that if it works, it works.
Very nice code, with a touch of craft elements :P
I left a few nitpicks, but it can just as well be merged whenever you want.
- replace () => {return {...};} with () => ({...})
- remove commented-out code
- move utils to their own files in /app/utils
- create '#utils/' subpath imports for /app/utils
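The first nitpick above in code form: an arrow function whose body only returns an object literal can wrap the literal in parentheses instead of using a block body. `makePoint` is an illustrative example, not code from this PR:

```typescript
interface Point {
  x: number
  y: number
}

// before: block body with an explicit return
const makePointVerbose = (x: number, y: number): Point => {
  return { x, y }
}

// after: concise body; parentheses disambiguate the object literal
// from a block statement
const makePoint = (x: number, y: number): Point => ({ x, y })

console.log(makePoint(1, 2)) // { x: 1, y: 2 }
```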
this pr does the following to the scraper command:
- split the main `run()` function into multiple task functions
- batch updates using the `*Many()` method variants of Lucid models (except it kinda doesnt) (now it does)

todo (required before merge):
- figure out why courses are always being duplicated on each scrape (done)

further optimization possibilities:
- the archive task could probably be rewritten in raw SQL for better performance (done)