Commit 8862727

feat(api): dataset fields statistics (#1360)
Solves apify/apify-core#18807 - proposal for new API endpoint `/v2/datasets/{datasetId}/field-statistics` which should return [dataset field statistics](https://docs.apify.com/platform/actors/development/actor-definition/dataset-schema/validation#dataset-field-statistics)
1 parent 9c49b84 commit 8862727

7 files changed: +83 -1 lines changed

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+title: DatasetFieldStatistics
+type: object
+properties:
+  min:
+    type: number
+    description: 'Minimum value of the field. For numbers, this is calculated directly. For strings, this is the length of the shortest string. For arrays, this is the length of the shortest array. For objects, this is the number of keys in the smallest object.'
+    nullable: true
+  max:
+    type: number
+    description: 'Maximum value of the field. For numbers, this is calculated directly. For strings, this is the length of the longest string. For arrays, this is the length of the longest array. For objects, this is the number of keys in the largest object.'
+    nullable: true
+  nullCount:
+    type: number
+    description: 'How many items in the dataset have a null value for this field.'
+    nullable: true
+  emptyCount:
+    type: number
+    description: 'How many items in the dataset have an `undefined` value for this field. Only `undefined` counts as empty; for example, an empty string does not.'
+    nullable: true
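
The four statistics described in this schema can be illustrated with a small Python sketch. It is not the platform's implementation: the helper names `measure` and `field_statistics` are hypothetical, a missing key stands in for JavaScript's `undefined`, and skipping booleans for `min`/`max` is an assumption that the schema does not state.

```python
from typing import Any, Optional


def measure(value: Any) -> Optional[float]:
    """Return the magnitude used for min/max, or None if the value has none."""
    if isinstance(value, bool):
        return None  # assumption: booleans do not contribute to min/max
    if isinstance(value, (int, float)):
        return value  # numbers are compared directly
    if isinstance(value, (str, list, dict)):
        return len(value)  # string length, array length, or number of keys
    return None


def field_statistics(items: list, field: str) -> dict:
    """Compute min, max, nullCount and emptyCount for one field."""
    stats = {"min": None, "max": None, "nullCount": 0, "emptyCount": 0}
    for item in items:
        if field not in item:
            stats["emptyCount"] += 1  # missing key plays the role of `undefined`
            continue
        value = item[field]
        if value is None:
            stats["nullCount"] += 1
            continue
        size = measure(value)
        if size is None:
            continue
        stats["min"] = size if stats["min"] is None else min(stats["min"], size)
        stats["max"] = size if stats["max"] is None else max(stats["max"], size)
    return stats


# Hypothetical sample items, invented for illustration only.
items = [{"price": 59}, {"price": 89}, {"price": None}, {"name": ""}]
price_stats = field_statistics(items, "price")
name_stats = field_statistics(items, "name")
```

In this sample, the empty string in `name` contributes a length of 0 to `min` and `max` but is not counted in `emptyCount`, which matches the schema's note that an empty string is not considered empty.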
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+title: GetDatasetStatisticsResponse
+required:
+  - data
+type: object
+properties:
+  data:
+    type: object
+    properties:
+      fieldStatistics:
+        type: object
+        nullable: true
+        additionalProperties:
+          $ref: ./DatasetFieldStatistics.yaml
+        description: 'When you configure the dataset [fields schema](https://docs.apify.com/platform/actors/development/actor-definition/dataset-schema/validation), we measure the statistics such as `min`, `max`, `nullCount` and `emptyCount` for each field.
+          This property provides statistics for each field from dataset fields schema.
+          <br/><br/>See dataset field statistics [documentation](https://docs.apify.com/platform/actors/development/actor-definition/dataset-schema/validation#dataset-field-statistics) for more information.'

apify-api/openapi/components/tags.yaml

Lines changed: 4 additions & 0 deletions
@@ -762,6 +762,10 @@
   x-legacy-doc-urls:
     - '#/reference/datasets/item-collection'
   x-trait: 'true'
+- name: Datasets/Statistics
+  x-displayName: Statistics
+  x-parent-tag-name: Datasets
+  x-trait: 'true'
 - name: Request queues
   x-displayName: Request queues
   x-legacy-doc-urls:

apify-api/openapi/components/x-tag-groups.yaml

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@
     - Datasets/Dataset collection
     - Datasets/Dataset
     - Datasets/Item collection
+    - Datasets/Statistics
 - name: Request queues
   tags:
     - Request queues

apify-api/openapi/openapi.yaml

Lines changed: 2 additions & 0 deletions
@@ -568,6 +568,8 @@ paths:
     $ref: 'paths/datasets/datasets@{datasetId}.yaml'
   '/v2/datasets/{datasetId}/items':
     $ref: 'paths/datasets/datasets@{datasetId}@items.yaml'
+  '/v2/datasets/{datasetId}/statistics':
+    $ref: 'paths/datasets/datasets@{datasetId}@statistics.yaml'
   /v2/request-queues:
     $ref: paths/request-queues/request-queues.yaml
   '/v2/request-queues/{queueId}':

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+get:
+  tags:
+    - Datasets/Statistics
+  summary: Get dataset statistics
+  description: |
+    Returns statistics for the given dataset.
+    Currently provides only [field statistics](https://docs.apify.com/platform/actors/development/actor-definition/dataset-schema/validation#dataset-field-statistics).
+
+  operationId: dataset_statistics_get
+  parameters:
+    - name: datasetId
+      in: path
+      description: Dataset ID or `username~dataset-name`.
+      required: true
+      style: simple
+      schema:
+        type: string
+        example: WkzbQMuFYuamGv3YF
+  responses:
+    '200':
+      description: ''
+      content:
+        application/json:
+          schema:
+            $ref: "../../components/schemas/datasets/GetDatasetStatisticsResponse.yaml"
+          example:
+            data:
+              fieldStatistics:
+                name:
+                  nullCount: 122
+                price:
+                  min: 59
+                  max: 89
+  # TODO: add client methods
+  # x-js-parent: DatasetClient
+  # x-js-name: statistics
+  # x-js-doc-url: https://docs.apify.com/api/client/js/reference/class/DatasetClient#statistics
+  # x-py-parent: DatasetClientAsync
+  # x-py-name: statistics
+  # x-py-doc-url: https://docs.apify.com/api/client/python/reference/class/DatasetClientAsync#statistics
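
Since the client methods are still a TODO in this commit, a caller would have to hit the endpoint over plain HTTP. The following is a minimal sketch of that, using only the Python standard library. The base URL `https://api.apify.com`, the optional bearer token header, and the helper names `statistics_url`, `parse_field_statistics` and `get_field_statistics` are assumptions for illustration; none of them come from this diff.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the API host is not stated anywhere in this commit.
API_BASE = "https://api.apify.com"


def statistics_url(dataset_id: str) -> str:
    """Build the URL for GET /v2/datasets/{datasetId}/statistics."""
    encoded = urllib.parse.quote(dataset_id, safe="")
    return f"{API_BASE}/v2/datasets/{encoded}/statistics"


def parse_field_statistics(body: str) -> dict:
    """Extract data.fieldStatistics; the schema marks it nullable, so fall back to {}."""
    payload = json.loads(body)
    return payload["data"].get("fieldStatistics") or {}


def get_field_statistics(dataset_id: str, token: Optional[str] = None) -> dict:
    """Fetch and parse field statistics (performs a real network request)."""
    request = urllib.request.Request(statistics_url(dataset_id))
    if token:
        # Assumption: authentication via a bearer token header.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return parse_field_statistics(response.read().decode("utf-8"))


# Offline check against the example response body from the path definition.
example_body = json.dumps(
    {"data": {"fieldStatistics": {"name": {"nullCount": 122},
                                  "price": {"min": 59, "max": 89}}}}
)
example_stats = parse_field_statistics(example_body)
```

Parsing is kept separate from the network call so the response handling can be checked against the documented example without contacting the API.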

package-lock.json

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.