@@ -349,6 +349,111 @@ error conditions:
349 349 should not use bundle URIs for fetch unless the server has explicitly
350 350 recommended it through a `bundle.heuristic` value.
351 351
352+ Example Bundle Provider organization
353+ ------------------------------------
354+
355+ The bundle URI feature is intentionally designed to be flexible to
356+ different ways a bundle provider wants to organize the object data.
357+ However, it can be helpful to have a complete organization model described
358+ here so providers can start from that base.
359+
360+ This example organization is a simplified model of what is used by the
361+ GVFS Cache Servers (see section near the end of this document), which have
362+ been beneficial in speeding up clones and fetches for very large
363+ repositories, although they rely on extra software outside of Git.
364+
365+ The bundle provider deploys servers across multiple geographies. Each
366+ server manages its own bundle set. The server can track a number of Git
367+ repositories, but provides a bundle list for each based on a pattern. For
368+ example, when mirroring a repository at `https://<domain>/<org>/<repo>`
369+ the bundle server could have its bundle list available at
370+ `https://<server-url>/<domain>/<org>/<repo>`. The origin Git server can
371+ list all of these servers under the "any" mode:
372+
373+ [bundle]
374+ version = 1
375+ mode = any
376+
377+ [bundle "eastus"]
378+ uri = https://eastus.example.com/<domain>/<org>/<repo>
379+
380+ [bundle "europe"]
381+ uri = https://europe.example.com/<domain>/<org>/<repo>
382+
383+ [bundle "apac"]
384+ uri = https://apac.example.com/<domain>/<org>/<repo>
385+
386+ This "list of lists" is static and only changes if a bundle server is
387+ added or removed.
388+
389+ Each bundle server manages its own set of bundles. The initial bundle list
390+ contains only a single bundle, containing all of the objects received from
391+ cloning the repository from the origin server. The list uses the
392+ `creationToken` heuristic and a `creationToken` is made for the bundle
393+ based on the server's timestamp.
394+
395+ The bundle server runs regularly-scheduled updates for the bundle list,
396+ such as once a day. During this task, the server fetches the latest
397+ contents from the origin server and generates a bundle containing the
398+ objects reachable from the latest origin refs, but not contained in a
399+ previously-computed bundle. This bundle is added to the list, with care
400+ that the `creationToken` is strictly greater than the previous maximum
401+ `creationToken`.
402+
403+ When the bundle list grows too large, say to _N_ bundles with _N_ greater
404+ than 30, the oldest "_N_ minus 30" bundles are combined into a single
405+ bundle. This bundle's `creationToken` is equal to the maximum
406+ `creationToken` among the merged bundles.
407+
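The update and merge rules above can be sketched in a few lines of Python. This is only an illustrative model, not code from Git or from any bundle server: the `Bundle` class, the helper names, and the `merged-base` identifier are hypothetical, and the sketch reads the merge rule as keeping the 30 newest bundles plus one merged bundle. It tracks tokens only and does not create any bundle files.

```python
from dataclasses import dataclass

MAX_RECENT = 30  # threshold used in the text; a provider may pick another


@dataclass
class Bundle:
    name: str            # hypothetical id, used as bundle.<id>
    creation_token: int


def next_token(bundles, now):
    """Return a token strictly greater than every existing creationToken."""
    highest = max((b.creation_token for b in bundles), default=0)
    return max(now, highest + 1)


def add_bundle(bundles, name, now):
    """Append a new incremental bundle, stamped from the server clock."""
    bundle = Bundle(name, next_token(bundles, now))
    bundles.append(bundle)
    return bundle


def merge_oldest(bundles, max_recent=MAX_RECENT):
    """Collapse everything older than the newest max_recent bundles."""
    ordered = sorted(bundles, key=lambda b: b.creation_token)
    old, recent = ordered[:-max_recent], ordered[-max_recent:]
    if len(old) < 2:
        return ordered  # nothing to combine yet
    # The merged bundle inherits the largest token among its inputs.
    merged = Bundle("merged-base", max(b.creation_token for b in old))
    return [merged] + recent
```

Taking `max(now, highest + 1)` keeps tokens strictly increasing even if the server clock moves backwards between runs, which is one simple way to honor the "strictly greater" requirement.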
408+ An example bundle list is provided here, although it only has two daily
409+ bundles and not a full list of 30:
410+
411+ [bundle]
412+ version = 1
413+ mode = all
414+ heuristic = creationToken
415+
416+ [bundle "2022-02-13-1644770820-daily"]
417+ uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-13-1644770820-daily.bundle
418+ creationToken = 1644770820
419+
420+ [bundle "2022-02-09-1644442601-daily"]
421+ uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-09-1644442601-daily.bundle
422+ creationToken = 1644442601
423+
424+ [bundle "2022-02-02-1643842562"]
425+ uri = https://eastus.example.com/<domain>/<org>/<repo>/2022-02-02-1643842562.bundle
426+ creationToken = 1643842562
427+
428+ To avoid storing and serving object data in perpetuity after it becomes
429+ unreachable in the origin server, this bundle merge can be more careful.
430+ Instead of taking an absolute union of the old bundles, the bundle
431+ can be created by looking at the newer bundles and ensuring that their
432+ necessary commits are all available in this merged bundle (or in another
433+ one of the newer bundles). This allows "expiring" object data that is not
434+ being used by new commits in this window of time. That data could be
435+ reintroduced by a later push.
436+
437+ This data organization has two main goals. First, initial
438+ clones of the repository become faster by downloading precomputed object
439+ data from a closer source. Second, `git fetch` commands can be faster,
440+ especially if the client has not fetched for a few days. However, if a
441+ client does not fetch for 30 days, then the bundle list organization would
442+ cause redownloading a large amount of object data.
443+
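The effect on a fetching client can be illustrated with a simplified model. The function below is a hypothetical sketch, not Git's implementation: it assumes the client remembers the largest `creationToken` it has already applied and only needs bundles with a larger token, and it ignores the prerequisite checks a real client performs. The tokens are the ones from the example list above.

```python
def bundles_to_download(advertised, last_seen_token):
    """Pick advertised (name, creationToken) pairs newer than the client.

    Results are ordered from newest to oldest token.
    """
    newer = [b for b in advertised if b[1] > last_seen_token]
    return sorted(newer, key=lambda b: b[1], reverse=True)


advertised = [
    ("2022-02-13-1644770820-daily", 1644770820),
    ("2022-02-09-1644442601-daily", 1644442601),
    ("2022-02-02-1643842562", 1643842562),
]

# A client that last fetched on 2022-02-09 needs only the newest daily bundle.
recent_client = bundles_to_download(advertised, 1644442601)

# A client whose token predates the merged bundle must download everything,
# including the large merged bundle.
stale_client = bundles_to_download(advertised, 1643000000)
```

Because a merged bundle carries the maximum token of its inputs, a client whose last token falls anywhere inside the merged range is treated as needing the whole merged bundle, which is the redownload cost described above.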
444+ One way to make this organization more useful to users who fetch frequently
445+ is to have more frequent bundle creation. For example, bundles could be
446+ created every hour, and then once a day those "hourly" bundles could be
447+ merged into a "daily" bundle. The daily bundles are merged into the
448+ oldest bundle after 30 days.
449+
450+ It is recommended that this bundle strategy be repeated with the `blob:none`
451+ filter if clients of this repository are expecting to use blobless partial
452+ clones. This list of blobless bundles stays in the same list as the full
453+ bundles, but uses the `bundle.<id>.filter` key to separate the two groups.
454+ For very large repositories, the bundle provider may want to _only_ provide
455+ blobless bundles.
456+
352 457 Implementation Plan
353 458 -------------------
354 459