Refactor existing caching mechanism into pluggable system#766
Refactor existing caching mechanism into pluggable system#766unjello wants to merge 2 commits intogocolly:masterfrom
Conversation
Create an interface `Cache`, similiar to `storage.Storage` that will allow for pluggable caching extensions. Caching subsystem is created with full access to Request and Response objects, to potentialy make more complex caching decisions based on both request and response headers. Because of the need to access those objects, `Cache` is not created in its own package like `storage`, but in root `colly`. Closes gocolly#103
|
@asciimoo would you happen to have a second to review this? |
| // Init initializes the caching backend | ||
| Init() error |
There was a problem hiding this comment.
Do we really need Init() as a part of the interface? This is not typical in Go. Can't we just assume that whateve Cache implementation is passed to colly, it's already initialized?
There was a problem hiding this comment.
The main reason I've decided to go with Init is consistency - it may not be idiomatic for Go, but it is used by colly all over the place... Collector, httpBackend, etc. My other reason was the way I was thinking about backwards compatibility - i.e. passing SetCollector via NewCollector, but after some thinking, I may call NewFileSystemCache just as well. If you'd like me to remove Init just let me know.
There was a problem hiding this comment.
I may call NewFileSystemCache just as well. If you'd like me to remove Init just let me know.
Yeah, that sounds much better. Please do.
| // Close finalizes caching backend | ||
| Close() error |
There was a problem hiding this comment.
What's the expected lifecycle of the Cache implementation? Is it tightly coupled to the Collector's one, or it can be shared between several collectors, and something else must close it?
There was a problem hiding this comment.
That's a very good question. The backwards-compatible FileSystemCache is tightly coupled with Collector, but the way to use other caches would be via Collector.SetCache only. Unfortunately in current form, where it's .SetCache who's calling cache.Init it kind of defeats the purpose. I will remove .Init from the .SetCache and document that this method should pass an initialised object, and a caller is responsible for closing. For the backwards-compatibility scenario, it should not matter too much. Collector does not have a .Close method, and FileSystemCache did not do any closing either.
There was a problem hiding this comment.
I thought about this a bit.
I believe if cache is set through the environment variable, collector's Close should call the cache's Close as well. If cache is initialized through SetCache, then it should not be closed. Moreover, if there's a cache previously initialized through the environment variable when SetCache is called, it should be closed before replacing it with the new one.
Do you agree?
| if request.Method != "GET" || request.Header.Get("Cache-Control") == "no-cache" { | ||
| return nil, ErrRequestNoCache | ||
| } |
There was a problem hiding this comment.
Does this check has to be inside Cache implementation rather than somewhere in its caller? I think cacheability of a HTTP request doesn't really depend on where you store its response.
There was a problem hiding this comment.
The main reason I put that inside a Cache is for each caching to make their own decisions - i.e. override Cache-Control and cache everything if they want. It did not feel right to have a decision made in two places. OTOH, maybe it'd be good to refactor this to a method like Cache.Filter or maybe to make a CacheFilter class with a default implementation, such that other Cache implementations do not have to reinvent the wheel if they just want to take a default behaviour.
There was a problem hiding this comment.
Okay, at least let's move this check into unexported function for the time being.
|
I certainly approve the efforts do decouple caching from the guts of colly. |
Create an interface
Cache, similiar tostorage.Storagethat will allow for pluggable caching extensions. Caching subsystem is created with full access to Request and Response objects, to potentialy make more complex caching decisions based on both request and response headers. Because of the need to access those objects,Cacheis not created in its own package likestorage, but in rootcolly.Closes #103