
Commit f685f66

fix(firestore-bigquery-export): disable backfill temporarily (#2005)

* fix(firestore-bigquery-export): disable onInstall backfill
* chore(firestore-bigquery-export): increment CHANGELOG and version

1 parent 8c0ce7c commit f685f66

File tree

7 files changed: +3203 -2073 lines changed


firestore-bigquery-export/CHANGELOG.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -1,3 +1,7 @@
+## Version 0.1.47
+
+fix - temporarily disable backfill feature
+
 ## Version 0.1.46
 
 feature - add the ability to select Firestore database instance
```

firestore-bigquery-export/README.md

Lines changed: 1 addition & 7 deletions
```diff
@@ -158,14 +158,8 @@ essential for the script to insert data into an already partitioned table.)
 
 * Exclude old data payloads: If enabled, table rows will never contain old data (document snapshot before the update), which should be more performant, and avoid potential resource limitations.
 
-* Import existing Firestore documents into BigQuery?: Do you want to import existing documents from your Firestore collection into BigQuery? These documents will have each have a special changelog with the operation of `IMPORT` and the timestamp of epoch. This ensures that any operation on an imported document supersedes the import record.
-
-* Existing Documents Collection: Specify the path of the Cloud Firestore Collection you would like to import from. This may or may not be the same Collection for which you plan to mirror changes. If you want to use a collectionGroup query, provide the collection name value here, and set 'Use Collection Group query' to true. You may use `{wildcard}` notation with an enabled collectionGroup query to match a subcollection of all documents in a collection (e.g., `chatrooms/{chatid}/posts`).
-
 * Use Collection Group query: Do you want to use a [collection group](https://firebase.google.com/docs/firestore/query-data/queries#collection-group-query) query for importing existing documents? You have to enable collectionGroup query if your import path contains subcollections. Warning: A collectionGroup query will target every collection in your Firestore project that matches the 'Existing documents collection'. For example, if you have 10,000 documents with a subcollection named: landmarks, this will query every document in 10,000 landmarks collections.
 
-* Docs per backfill: When importing existing documents, how many should be imported at once? The default value of 200 should be ok for most users. If you are using a transform function or have very large documents, you may need to set this to a lower number. If the lifecycle event function times out, lower this value.
-
 * Cloud KMS key name: Instead of Google managing the key encryption keys that protect your data, you control and manage key encryption keys in Cloud KMS. If this parameter is set, the extension will specify the KMS key name when creating the BQ table. See the PREINSTALL.md for more details.
 
 
@@ -174,7 +168,7 @@ essential for the script to insert data into an already partitioned table.)
 
 * **fsexportbigquery:** Listens for document changes in your specified Cloud Firestore collection, then exports the changes into BigQuery.
 
-* **fsimportexistingdocs:** Imports exisitng documents from the specified collection into BigQuery. Imported documents will have a special changelog with the operation of `IMPORT` and the timestamp of epoch.
+* **fsimportexistingdocs:** Imports existing documents from the specified collection into BigQuery. Imported documents will have a special changelog with the operation of `IMPORT` and the timestamp of epoch.
 
 * **syncBigQuery:** A task-triggered function that gets called on BigQuery sync
 
```
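Note: the removed README text above contrasts a plain collection import path with a collectionGroup query using `{wildcard}` notation. A minimal sketch of the two query shapes, assuming an initialized firebase-admin app and a hypothetical `chatrooms/{chatid}/posts` layout:

```ts
import { getFirestore } from "firebase-admin/firestore";

async function compareQueryShapes() {
  const db = getFirestore();

  // Plain collection query: one specific subcollection, addressed by a full
  // odd-segment path (collection/doc/collection).
  const oneRoom = await db.collection("chatrooms/room1/posts").get();

  // Collection group query: every subcollection named `posts` in the project,
  // which is what the `chatrooms/{chatid}/posts` wildcard case resolves to.
  const allRooms = await db.collectionGroup("posts").get();

  console.log(`one room: ${oneRoom.size}, all rooms: ${allRooms.size}`);
}
```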

firestore-bigquery-export/extension.yaml

Lines changed: 50 additions & 51 deletions
```diff
@@ -13,7 +13,7 @@
 # limitations under the License.
 
 name: firestore-bigquery-export
-version: 0.1.46
+version: 0.1.47
 specVersion: v1beta
 
 displayName: Stream Firestore to BigQuery
@@ -63,7 +63,7 @@ resources:
   - name: fsimportexistingdocs
     type: firebaseextensions.v1beta.function
     description:
-      Imports exisitng documents from the specified collection into BigQuery.
+      Imports existing documents from the specified collection into BigQuery.
       Imported documents will have a special changelog with the operation of
       `IMPORT` and the timestamp of epoch.
     properties:
@@ -405,39 +405,39 @@ params:
       - label: No
        value: no
 
-  - param: DO_BACKFILL
-    label: Import existing Firestore documents into BigQuery?
-    description: >-
-      Do you want to import existing documents from your Firestore collection
-      into BigQuery? These documents will have each have a special changelog
-      with the operation of `IMPORT` and the timestamp of epoch. This ensures
-      that any operation on an imported document supersedes the import record.
-    type: select
-    required: true
-    default: no
-    options:
-      - label: Yes
-        value: yes
-      - label: No
-        value: no
-
-  - param: IMPORT_COLLECTION_PATH
-    label: Existing Documents Collection
-    description: >-
-      Specify the path of the Cloud Firestore Collection you would like to
-      import from. This may or may not be the same Collection for which you plan
-      to mirror changes. If you want to use a collectionGroup query, provide the
-      collection name value here, and set 'Use Collection Group query' to true.
-      You may use `{wildcard}` notation with an enabled collectionGroup query to
-      match a subcollection of all documents in a collection (e.g.,
-      `chatrooms/{chatid}/posts`).
-    type: string
-    validationRegex: "^[^/]+(/[^/]+/[^/]+)*$"
-    validationErrorMessage:
-      Firestore collection paths must be an odd number of segments separated by
-      slashes, e.g. "path/to/collection".
-    example: posts
-    required: false
+  # - param: DO_BACKFILL
+  #   label: Import existing Firestore documents into BigQuery?
+  #   description: >-
+  #     Do you want to import existing documents from your Firestore collection
+  #     into BigQuery? These documents will have each have a special changelog
+  #     with the operation of `IMPORT` and the timestamp of epoch. This ensures
+  #     that any operation on an imported document supersedes the import record.
+  #   type: select
+  #   required: true
+  #   default: no
+  #   options:
+  #     - label: Yes
+  #       value: yes
+  #     - label: No
+  #       value: no
+
+  # - param: IMPORT_COLLECTION_PATH
+  #   label: Existing Documents Collection
+  #   description: >-
+  #     Specify the path of the Cloud Firestore Collection you would like to
+  #     import from. This may or may not be the same Collection for which you plan
+  #     to mirror changes. If you want to use a collectionGroup query, provide the
+  #     collection name value here, and set 'Use Collection Group query' to true.
+  #     You may use `{wildcard}` notation with an enabled collectionGroup query to
+  #     match a subcollection of all documents in a collection (e.g.,
+  #     `chatrooms/{chatid}/posts`).
+  #   type: string
+  #   validationRegex: "^[^/]+(/[^/]+/[^/]+)*$"
+  #   validationErrorMessage:
+  #     Firestore collection paths must be an odd number of segments separated by
+  #     slashes, e.g. "path/to/collection".
+  #   example: posts
+  #   required: false
 
   - param: USE_COLLECTION_GROUP_QUERY
     label: Use Collection Group query
@@ -458,20 +458,20 @@ params:
       - label: No
        value: no
 
-  - param: DOCS_PER_BACKFILL
-    label: Docs per backfill
-    description: >-
-      When importing existing documents, how many should be imported at once?
-      The default value of 200 should be ok for most users. If you are using a
-      transform function or have very large documents, you may need to set this
-      to a lower number. If the lifecycle event function times out, lower this
-      value.
-    type: string
-    example: 200
-    validationRegex: "^[1-9][0-9]*$"
-    validationErrorMessage: Must be a postive integer.
-    default: 200
-    required: true
+  # - param: DOCS_PER_BACKFILL
+  #   label: Docs per backfill
+  #   description: >-
+  #     When importing existing documents, how many should be imported at once?
+  #     The default value of 200 should be ok for most users. If you are using a
+  #     transform function or have very large documents, you may need to set this
+  #     to a lower number. If the lifecycle event function times out, lower this
+  #     value.
+  #   type: string
+  #   example: 200
+  #   validationRegex: "^[1-9][0-9]*$"
+  #   validationErrorMessage: Must be a postive integer.
+  #   default: 200
+  #   required: true
 
   - param: KMS_KEY_NAME
     label: Cloud KMS key name
@@ -513,8 +513,7 @@ events:
 lifecycleEvents:
   onInstall:
     function: initBigQuerySync
-    processingMessage:
-      Configuring BigQuery Sync and running import if configured.
+    processingMessage: Configuring BigQuery Sync.
   onUpdate:
     function: setupBigQuerySync
     processingMessage: Configuring BigQuery Sync
```
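For reference, the `validationRegex` in the commented-out `IMPORT_COLLECTION_PATH` param enforces an odd number of slash-separated segments (a collection path, not a document path). A quick illustrative check in TypeScript, with hypothetical sample paths:

```ts
// Same pattern as the param's validationRegex, as a TypeScript regex literal.
const collectionPathRegex = /^[^/]+(\/[^/]+\/[^/]+)*$/;

for (const path of ["posts", "chatrooms/room1/posts", "chatrooms/room1"]) {
  console.log(`${path} -> ${collectionPathRegex.test(path)}`);
}
// posts -> true (1 segment), chatrooms/room1/posts -> true (3 segments),
// chatrooms/room1 -> false (2 segments: a document path, not a collection).
```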

firestore-bigquery-export/firestore-bigquery-change-tracker/package-lock.json

Lines changed: 3 additions & 3 deletions
Some generated files are not rendered by default.

firestore-bigquery-export/functions/src/index.ts

Lines changed: 67 additions & 67 deletions
```diff
@@ -190,7 +190,7 @@ export const initBigQuerySync = functions.tasks
     await eventTracker.initialize();
 
     /** Run Backfill */
-    if (config.doBackfill) {
+    if (false) {
       await getFunctions()
         .taskQueue(
           `locations/${config.location}/functions/fsimportexistingdocs`,
@@ -207,69 +207,69 @@ export const initBigQuerySync = functions.tasks
     return;
   });
 
-exports.fsimportexistingdocs = functions.tasks
-  .taskQueue()
-  .onDispatch(async (data, context) => {
-    const runtime = getExtensions().runtime();
-    if (!config.doBackfill || !config.importCollectionPath) {
-      await runtime.setProcessingState(
-        "PROCESSING_COMPLETE",
-        "Completed. No existing documents imported into BigQuery."
-      );
-      return;
-    }
-
-    const offset = (data["offset"] as number) ?? 0;
-    const docsCount = (data["docsCount"] as number) ?? 0;
-
-    const query = config.useCollectionGroupQuery
-      ? getFirestore(config.databaseId).collectionGroup(
-          config.importCollectionPath.split("/")[
-            config.importCollectionPath.split("/").length - 1
-          ]
-        )
-      : getFirestore(config.databaseId).collection(config.importCollectionPath);
-
-    const snapshot = await query
-      .offset(offset)
-      .limit(config.docsPerBackfill)
-      .get();
-
-    const rows = snapshot.docs.map((d) => {
-      return {
-        timestamp: new Date().toISOString(),
-        operation: ChangeType.IMPORT,
-        documentName: `projects/${config.bqProjectId}/databases/(default)/documents/${d.ref.path}`,
-        documentId: d.id,
-        eventId: "",
-        pathParams: resolveWildcardIds(config.importCollectionPath, d.ref.path),
-        data: eventTracker.serializeData(d.data()),
-      };
-    });
-    try {
-      await eventTracker.record(rows);
-    } catch (err: any) {
-      /** If configured, event tracker wil handle failed rows in a backup collection */
-      functions.logger.log(err);
-    }
-    if (rows.length == config.docsPerBackfill) {
-      // There are more documents to import - enqueue another task to continue the backfill.
-      const queue = getFunctions().taskQueue(
-        `locations/${config.location}/functions/fsimportexistingdocs`,
-        config.instanceId
-      );
-      await queue.enqueue({
-        offset: offset + config.docsPerBackfill,
-        docsCount: docsCount + rows.length,
-      });
-    } else {
-      // We are finished, set the processing state to report back how many docs were imported.
-      runtime.setProcessingState(
-        "PROCESSING_COMPLETE",
-        `Successfully imported ${
-          docsCount + rows.length
-        } documents into BigQuery`
-      );
-    }
-    await events.recordCompletionEvent({ context });
-  });
+// exports.fsimportexistingdocs = functions.tasks
+//   .taskQueue()
+//   .onDispatch(async (data, context) => {
+//     const runtime = getExtensions().runtime();
+//     if (!config.doBackfill || !config.importCollectionPath) {
+//       await runtime.setProcessingState(
+//         "PROCESSING_COMPLETE",
+//         "Completed. No existing documents imported into BigQuery."
+//       );
+//       return;
+//     }
+
+//     const offset = (data["offset"] as number) ?? 0;
+//     const docsCount = (data["docsCount"] as number) ?? 0;
+
+//     const query = config.useCollectionGroupQuery
+//       ? getFirestore(config.databaseId).collectionGroup(
+//           config.importCollectionPath.split("/")[
+//             config.importCollectionPath.split("/").length - 1
+//           ]
+//         )
+//       : getFirestore(config.databaseId).collection(config.importCollectionPath);
+
+//     const snapshot = await query
+//       .offset(offset)
+//       .limit(config.docsPerBackfill)
+//       .get();
+
+//     const rows = snapshot.docs.map((d) => {
+//       return {
+//         timestamp: new Date().toISOString(),
+//         operation: ChangeType.IMPORT,
+//         documentName: `projects/${config.bqProjectId}/databases/(default)/documents/${d.ref.path}`,
+//         documentId: d.id,
+//         eventId: "",
+//         pathParams: resolveWildcardIds(config.importCollectionPath, d.ref.path),
+//         data: eventTracker.serializeData(d.data()),
+//       };
+//     });
+//     try {
+//       await eventTracker.record(rows);
+//     } catch (err: any) {
+//       /** If configured, event tracker wil handle failed rows in a backup collection */
+//       functions.logger.log(err);
+//     }
+//     if (rows.length == config.docsPerBackfill) {
+//       // There are more documents to import - enqueue another task to continue the backfill.
+//       const queue = getFunctions().taskQueue(
+//         `locations/${config.location}/functions/fsimportexistingdocs`,
+//         config.instanceId
+//       );
+//       await queue.enqueue({
+//         offset: offset + config.docsPerBackfill,
+//         docsCount: docsCount + rows.length,
+//       });
+//     } else {
+//       // We are finished, set the processing state to report back how many docs were imported.
+//       runtime.setProcessingState(
+//         "PROCESSING_COMPLETE",
+//         `Successfully imported ${
+//           docsCount + rows.length
+//         } documents into BigQuery`
+//       );
+//     }
+//     await events.recordCompletionEvent({ context });
+//   });
```
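The disabled `fsimportexistingdocs` function above implements offset-based pagination over a Cloud Tasks queue: each dispatch reads one page, records it, and re-enqueues itself with an advanced offset until a short page signals completion. A stripped-down sketch of that control flow (the `readPage` and `enqueueNext` callbacks are illustrative stand-ins, not part of the extension):

```ts
const PAGE_SIZE = 200; // mirrors the DOCS_PER_BACKFILL default

async function runBackfillStep(
  offset: number,
  readPage: (offset: number, limit: number) => Promise<unknown[]>,
  enqueueNext: (nextOffset: number) => Promise<void>
): Promise<void> {
  const page = await readPage(offset, PAGE_SIZE);
  // ...record `page` to BigQuery here...

  if (page.length === PAGE_SIZE) {
    // A full page means more documents may remain: hand off to a new task so
    // each invocation stays well under the function timeout.
    await enqueueNext(offset + PAGE_SIZE);
  }
  // A short (or empty) page means the backfill is finished.
}
```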
