Asset downloads in spaCy model config file #11938
Replies: 1 comment
-
Thanks for the note, that sounds like a really interesting application. To be clear, you'd like to do something like this, right?
That isn't possible. While it's not hard to imagine how it would work, because pipelines do need to be local to source from them, any implementation would just be downloading the package in the background. So while it would save a step, it ultimately wouldn't be that different from using a local path. (As a note, you don't have to use a path to refer to models - if you package them, you can install them and refer to them by name, just like pretrained pipelines. Not a big difference but may help avoid relative path weirdness.)
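To illustrate the "download in the background, then use a local path" idea, here is a minimal sketch using only the Python standard library. The helper name `fetch_to_local` and the cache layout are hypothetical, not part of spaCy. It also assumes the URL points at a single file that `urllib` can open (`http(s)://` or `file://`); an `s3://` URL would need a different client, and unpacking or installing the downloaded package is left out.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def fetch_to_local(url: str, cache_dir: Path) -> Path:
    """Download `url` into `cache_dir` once and return the local path.

    Hypothetical helper for illustration; not a spaCy API.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cached file on a hash of the URL so different sources never collide.
    name = hashlib.sha256(url.encode("utf8")).hexdigest()[:16]
    dest = cache_dir / name
    if not dest.exists():
        # urllib handles http(s):// and file://; s3:// would need another client.
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest
```

The returned path is just an ordinary local file, which is why this ends up close to pointing at a local path in the first place.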
Maybe I'm misunderstanding, but there's no requirement that a project file be tied to a git repo. Project files can be used on their own, as long as each has its own directory. In this case it would be a little more complicated than the usual use of projects, but you could have a simple script that generates a project file (since it's just YAML) with the assets section covering your S3 resources and vars covering the components to source. The vars could then be passed as overrides to
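A rough sketch of such a generator script might look like the following. The function names, the `<component>_source` naming for vars, and the example URL are all assumptions for illustration; only the `vars` and `assets` sections with `dest` and `url` keys come from the project file format. The file is written as JSON, which YAML 1.2 parsers are expected to accept, purely to avoid a YAML dependency here; a real script would probably use a YAML library.

```python
import json
from pathlib import Path


def build_project(assets: dict, sources: dict) -> dict:
    """Build the contents of a project file.

    `assets` maps a local destination path to a remote URL.
    `sources` maps a component name to the pipeline it should be sourced from.
    Both the function and the `<component>_source` naming are hypothetical.
    """
    return {
        "title": "Generated project",
        "vars": {f"{name}_source": src for name, src in sources.items()},
        "assets": [{"dest": dest, "url": url} for dest, url in assets.items()],
    }


def write_project(project: dict, directory: Path) -> Path:
    """Write the project file into its own directory and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "project.yml"
    # JSON is used as a YAML-compatible serialization (assumption noted above).
    path.write_text(json.dumps(project, indent=2), encoding="utf8")
    return path
```

For example, `build_project({"assets/base_model.tar.gz": "s3://my-bucket/base_model.tar.gz"}, {"ner": "assets/base_model"})` would give one asset entry and a single `ner_source` var, with the bucket name being a placeholder.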
More generally, we are planning to break the projects functionality out into its own library next year, so we're very interested in any potential use patterns like yours. I've also been working on automating changes to pipelines, like swapping out base models / tok2vecs, in this PR, though I'm still working out the most useful way to make that available.
-
I'm managing an NLP platform for our organisation that we've based on spaCy (thanks for an awesome framework!!), and we store trained models in MLflow. I frequently want to source components from a previously trained model or grab a different base model/vectors. I haven't gone for spaCy Projects as it seems to me it would add complexity to the platform (we're not managing each project as a git repo), but I would really like the "assets" capability from Projects. For example, I would like to be able to just specify the URL pointing to a trained model (e.g. in S3) when I want to source a component from it, rather than having to point to a local path.
Am I right in saying that this isn't currently possible? And that it could be achieved by creating a registered function that heavily copies from the download_file function here, which uses smart_open to download file URLs for the Projects asset functionality?