diff --git a/bin/index.js b/bin/index.js
index fadd780..ef5cd0f 100755
--- a/bin/index.js
+++ b/bin/index.js
@@ -70,6 +70,11 @@ new Command()
     '-t, --turn-off-link-shortening',
     "Use the full puml server link instead of the tiny url, if your diagrams are too big this won't work",
   )
+  .addOption(new Option(
+    '-l, --polr-url ',
+    "Use the Polr shortener at this URL instead of tinyurl (optionally set the Polr API key in the POLR_APIKEY env var)",
+    )
+  )
   .action(opts => {
     const useDefaultGitignorePath = !opts.gitignorePath
 
@@ -81,6 +86,11 @@ new Command()
     opts.respectGitignore = !opts.ignoreGitignore
     opts.imageFormats = opts.imageFormats === 'both' ? ['png', 'svg'] : [opts.imageFormats]
 
+    if (opts.shouldShortenLinks == false && opts.polrUrl != null) {
+      console.error("Cannot use a Polr shortener URL when link shortening is turned off; please specify only one of --polr-url and --turn-off-link-shortening")
+      return
+    }
+
     // If a gitignore path wasn't specified, don't try and parse it
     if (useDefaultGitignorePath && !fs.existsSync(opts.gitignorePath)) {
       opts.respectGitignore = false
diff --git a/dist_puml/puml/level_1_system_view.png b/dist_puml/puml/level_1_system_view.png
index 77fc40d..9fb4fd3 100644
Binary files a/dist_puml/puml/level_1_system_view.png and b/dist_puml/puml/level_1_system_view.png differ
diff --git a/dist_puml/puml/level_1_system_view.svg b/dist_puml/puml/level_1_system_view.svg
new file mode 100644
index 0000000..6e7a99a
--- /dev/null
+++ b/dist_puml/puml/level_1_system_view.svg
@@ -0,0 +1,2019 @@
[SVG text content omitted: "My Project Name - System View" diagram]
\ No newline at end of file
diff --git a/dist_puml/puml/level_2_container_view.png b/dist_puml/puml/level_2_container_view.png
index 1497002..c9f9437 100644
Binary files a/dist_puml/puml/level_2_container_view.png and b/dist_puml/puml/level_2_container_view.png differ
diff --git a/dist_puml/puml/level_2_container_view.svg b/dist_puml/puml/level_2_container_view.svg
index 02aac65..9feb959 100644
--- a/dist_puml/puml/level_2_container_view.svg
+++ b/dist_puml/puml/level_2_container_view.svg
@@ -1,32 +1,32 @@
[SVG text content omitted: "My Project Name - Container View" diagram]
\ No newline at end of file
diff --git a/dist_puml/puml/level_3_component_view_pipeline.png b/dist_puml/puml/level_3_component_view_pipeline.png
index 502cd20..953d12c 100644
Binary files a/dist_puml/puml/level_3_component_view_pipeline.png and b/dist_puml/puml/level_3_component_view_pipeline.png differ
diff --git a/dist_puml/puml/level_3_component_view_pipeline.svg b/dist_puml/puml/level_3_component_view_pipeline.svg
new file mode 100644
index 0000000..de058fb
--- /dev/null
+++ b/dist_puml/puml/level_3_component_view_pipeline.svg
@@ -0,0 +1,2269 @@
[SVG text content omitted: "The Pipeline - Component View" diagram]
\ No newline at end of file
diff --git a/dist_puml/puml/level_4_activity_diagram_export_bq_snapshot_job.png b/dist_puml/puml/level_4_activity_diagram_export_bq_snapshot_job.png
new file mode 100644
index 0000000..501b8dd
Binary files /dev/null and b/dist_puml/puml/level_4_activity_diagram_export_bq_snapshot_job.png differ
diff --git a/dist_puml/puml/level_4_activity_diagram_export_bq_snapshot_job.svg b/dist_puml/puml/level_4_activity_diagram_export_bq_snapshot_job.svg
new file mode 100644
index 0000000..f6114d0
--- /dev/null
+++ b/dist_puml/puml/level_4_activity_diagram_export_bq_snapshot_job.svg
@@ -0,0 +1,2111 @@
[SVG text content omitted: diagram reading "Under construction..."]
\ No newline at end of file
diff --git a/dist_puml/puml/level_4_activity_diagram_sampler_a.png b/dist_puml/puml/level_4_activity_diagram_sampler_a.png
index 2d877d1..1db1765 100644
Binary files a/dist_puml/puml/level_4_activity_diagram_sampler_a.png and b/dist_puml/puml/level_4_activity_diagram_sampler_a.png differ
diff --git a/dist_puml/puml/level_4_activity_diagram_sampler_a.svg b/dist_puml/puml/level_4_activity_diagram_sampler_a.svg
index b93b0e2..307e879 100644
--- a/dist_puml/puml/level_4_activity_diagram_sampler_a.svg
+++ b/dist_puml/puml/level_4_activity_diagram_sampler_a.svg
@@ -1,4 +1,4 @@
[SVG text content omitted: "Sampler A - Activity Diagram"]
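The `bin/index.js` change registers `--polr-url` and rejects it when link shortening is turned off; the request to Polr itself is not shown in this diff. As a minimal sketch, assuming Node 18+ (for the global `fetch`) and a Polr 2.x instance that exposes `GET /api/v2/action/shorten` with `url`, `key` and `response_type` query parameters and returns the short link in a JSON `result` field, the option value could be consumed like this. `buildPolrShortenUrl` and `shortenWithPolr` are hypothetical names, not functions from this repository; check the endpoint and parameters against your own Polr instance.

```javascript
// Sketch only: these helper names are hypothetical and not part of bin/index.js.
// Assumes a Polr 2.x instance exposing GET /api/v2/action/shorten and Node 18+ (global fetch).

/**
 * Build the Polr "shorten" request URL for a long (e.g. PlantUML server) link.
 * @param {string} polrUrl Base URL of the Polr instance, as passed to --polr-url
 * @param {string} longUrl The link to shorten
 * @param {string|null} [apiKey] Optional API key; defaults to the POLR_APIKEY env var
 */
function buildPolrShortenUrl(polrUrl, longUrl, apiKey = process.env.POLR_APIKEY) {
  // Trim trailing slashes so an instance hosted under a sub-path keeps that path
  const endpoint = new URL(`${polrUrl.replace(/\/+$/, '')}/api/v2/action/shorten`)
  // URLSearchParams percent-encodes the long link
  endpoint.searchParams.set('url', longUrl)
  endpoint.searchParams.set('response_type', 'json')
  // Only send a key when one is configured, since the help text says it is optional
  if (apiKey) endpoint.searchParams.set('key', apiKey)
  return endpoint.toString()
}

/** Request a short link; assumes Polr's JSON reply carries it in `result`. */
async function shortenWithPolr(polrUrl, longUrl, apiKey) {
  const res = await fetch(buildPolrShortenUrl(polrUrl, longUrl, apiKey))
  if (!res.ok) {
    throw new Error(`Polr shorten request failed with HTTP ${res.status}`)
  }
  const body = await res.json()
  return body.result
}
```

For example, `await shortenWithPolr(opts.polrUrl, longUrl)` would fall back to `POLR_APIKEY` when no key is passed. Appending the API path by string concatenation, rather than `new URL('/api/v2/action/shorten', polrUrl)`, is deliberate: an absolute path in that second form would discard any sub-path in the configured base URL.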