
Commit d664184

mohamedelabbas1996, mihow, and annavik
authored
Framework for exporting data (#725)
* feat: added celery export occurrence task
* feat: added export & export_status endpoints
* added migration files
* fixed migration conflict
* fix: disabled pagination for export action
* fix: merged migrations
* feat: added DataExport Job Type
* Implemented JSON export for occurrence data
* feat: Added support for csv file format
* chore: Moved export actions to a separate view under the exports app
* chore: ignore unresolvable type errors
* chore: remove dependencies for darwincore export in this PR
* fix: use mixin for get_active_project
* feat: register export views in api router
* feat: Implemented Data Export Framework & Occurrence Exports
  - Designed a structured framework for data exports.
  - Integrated export registry for modular and extensible export formats.
  - Implemented occurrences export for simple JSON and CSV formats.
* feat: Added more fields to the OccurrenceTabularSerializer
* Refactor DataExport Model and API & Admin Integration
  - Added 'project' relationship to DataExport model.
  - Refactored 'status' and 'file_url' in DataExport to be computed properties from the associated Job.
  - Updated DataExportViewSet.get_queryset() to filter by active project and optimize queries with select_related('job').
  - Updated DataExportSerializer to include nested Job details instead of separate job-related fields.
  - Removed redundant retrieve and list methods from DataExportViewSet, relying on DRF's default behavior.
  - Added DataExportAdmin with list display, filtering, and a new action to manually trigger an export job.
  - Squashed migrations to consolidate schema changes and reduce migration overhead.
* Removed DataExport status field
* chore: Raise NotImplemented for abstract methods
* Brought back DataExport file_url field
* Refactor Data Export: Improve Filtering, Naming, and JSON Validity
  - Removed filtering options (date start, date end, taxon)
  - Added filtering option for collection
  - Simplified job naming to match other job formats
  - Assigned all export jobs a shared type
  - Enhanced job stage information to provide more useful details
  - Updated job stage labels to 'Sentence case'
  - Embedded export object details in job responses
  - Allowed API users to specify all export settings in the request body
  - Implemented automatic file deletion when an export is deleted
  - Included 'created_at' and 'updated_at' fields in exports list response
  - Enabled sorting by ID, format, created_at, and updated_at in the export list endpoint
  - Included job name in export responses for better context
  - Fixed JSON export issues:
    - Ensured JSON is valid (single root element, no multiple arrays)
    - Flattened JSON structure
    - Fixed serialization error ('set' object is not JSON serializable)
  - Added verification information to the csv format
* fix: Added missing migration file
* fix: Added missing migration file
* fix: tweak labels to be sentence case
* fix: update CSV export field from verification -> verification_status
* Improve DataExport handling, filtering, and cleanup logic
  - Added support for `project` as a writeable field in DataExport serializer
  - Made `collection` filter optional and included collection name in responses
  - Connected related SourceImageCollection to job if collection filter is applied
  - Included unit test to verify file deletion on export removal
  - Switched to using serializer for DataExport creation with validation
  - Fixed sorting in exports endpoint
* test: multiple methods of nesting related obj data for exports
* feat: return absolute urls for export files
* Refactor Export Logic and Add Export Stats
  - Moved export logic to run_export() for better encapsulation.
  - Added file_size and record_count fields to DataExport for tracking export statistics.
  - Added unit tests to ensure the number of exported records matches the number of occurrences in the collection for both CSV and JSON formats.
* Enhance Export Details
  - Added 'Number of records exported' as a stage param to track the number of records during export.
  - Introduced filters_display field in the DataExport model to precompute and optimize display-friendly filters, reducing unnecessary queries.
  - Returned the raw file_size value in the API response to enable sorting, and used Django's filesizeformat to provide a more readable file size format.
* fix: make summary count consistent with exports
* feat: update and return total record count before starting export
* feat: update total record count before exporting first batch
* feat: lower batch size for exports to increase update frequency
* chore: reset all migrations to main
* chore: recreate migrations
* chore: moved export format validation logic to the serializer
* chore: changed collection filter param name to collection_id
* chore: fix type hints

---------

Co-authored-by: Michael Bunsen <notbot@gmail.com>
Co-authored-by: Anna Viklund <annamariaviklund@gmail.com>
1 parent 78a0cb2 commit d664184

File tree

24 files changed: +1052 additions, -9 deletions

ami/base/permissions.py

Lines changed: 2 additions & 2 deletions

@@ -66,7 +66,7 @@ def add_object_level_permissions(
     # Do not return create, view permissions at object-level
     filtered_permissions -= {"create", "view"}
     permissions.update(filtered_permissions)
-    response_data["user_permissions"] = permissions
+    response_data["user_permissions"] = list(permissions)
     return response_data


@@ -86,7 +86,7 @@ def add_collection_level_permissions(user: User | None, response_data: dict, mod
     if user and project and f"create_{model.__name__.lower()}" in get_perms(user, project):
         permissions.add("create")
-    response_data["user_permissions"] = permissions
+    response_data["user_permissions"] = list(permissions)
     return response_data
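Both hunks above swap the raw `set` for a `list` before it lands in the response payload. A quick standalone illustration of why this matters (pure Python, no DRF required; the `permissions` values here are made up for the example): the stdlib `json` encoder rejects sets outright, which is the serialization error the commit message mentions.

```python
import json

permissions = {"update", "delete"}

# A set is not JSON serializable, so rendering it in a response fails:
try:
    json.dumps({"user_permissions": permissions})
except TypeError as e:
    print(e)  # Object of type set is not JSON serializable

# Converting to a list first makes the payload serializable:
payload = json.dumps({"user_permissions": sorted(permissions)})
print(payload)  # {"user_permissions": ["delete", "update"]}
```

`sorted()` is used here only to make the output deterministic; the patched code uses a plain `list()` conversion.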

ami/exports/__init__.py

Whitespace-only changes.

ami/exports/admin.py

Lines changed: 64 additions & 0 deletions

from django.contrib import admin
from django.http import HttpRequest

from .models import DataExport


@admin.register(DataExport)
class DataExportAdmin(admin.ModelAdmin):
    """
    Admin panel for managing DataExport objects.
    """

    list_display = ("id", "user", "format", "status_display", "project", "created_at", "get_job")
    list_filter = ("format", "project")
    search_fields = ("user__username", "format", "project__name")
    readonly_fields = ("status_display", "file_url_display")

    fieldsets = (
        (
            None,
            {
                "fields": ("user", "format", "project", "filters"),
            },
        ),
        (
            "Job Info",
            {
                "fields": ("status_display", "file_url_display"),
                "classes": ("collapse",),  # This makes job-related fields collapsible in the admin panel
            },
        ),
    )

    def get_queryset(self, request: HttpRequest):
        """
        Optimize queryset by selecting related project and job data.
        """
        return super().get_queryset(request).select_related("project", "job")

    @admin.display(description="Status")
    def status_display(self, obj):
        return obj.status  # Calls the @property from the model

    @admin.display(description="File URL")
    def file_url_display(self, obj):
        return obj.file_url  # Calls the @property from the model

    @admin.display(description="Job ID")
    def get_job(self, obj):
        """Displays the related job ID or 'No Job' if none exists."""
        return obj.job.id if obj.job else "No Job"

    @admin.action(description="Run export job")
    def run_export_job(self, request: HttpRequest, queryset):
        """
        Admin action to trigger the export job manually.
        """
        for export in queryset:
            if export.job:
                export.job.enqueue()

        self.message_user(request, f"Started export job for {queryset.count()} export(s).")

    actions = [run_export_job]

ami/exports/apps.py

Lines changed: 9 additions & 0 deletions

from django.apps import AppConfig


class ExportsConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "ami.exports"

    def ready(self):
        import ami.exports.signals  # noqa: F401

ami/exports/base.py

Lines changed: 73 additions & 0 deletions

import logging
import os
from abc import ABC, abstractmethod

from ami.exports.utils import apply_filters

logger = logging.getLogger(__name__)


class BaseExporter(ABC):
    """Base class for all data export handlers."""

    file_format = ""  # To be defined in child classes
    serializer_class = None
    filter_backends = []

    def __init__(self, data_export):
        self.data_export = data_export
        self.job = data_export.job if hasattr(data_export, "job") else None
        self.project = data_export.project
        self.queryset = apply_filters(
            queryset=self.get_queryset(), filters=data_export.filters, filter_backends=self.get_filter_backends()
        )
        self.total_records = self.queryset.count()
        if self.job:
            self.job.progress.add_stage_param(self.job.job_type_key, "Number of records exported", 0)
            self.job.progress.add_stage_param(self.job.job_type_key, "Total records to export", self.total_records)
            self.job.save()

    @abstractmethod
    def export(self):
        """Perform the export process."""
        raise NotImplementedError()

    @abstractmethod
    def get_queryset(self):
        raise NotImplementedError()

    def get_serializer_class(self):
        return self.serializer_class

    def get_filter_backends(self):
        from ami.main.api.views import OccurrenceCollectionFilter

        return [OccurrenceCollectionFilter]

    def update_export_stats(self, file_temp_path=None):
        """
        Updates record_count based on queryset and file size after export.
        """
        # Set record count from queryset
        self.data_export.record_count = self.queryset.count()

        # Check if temp file path is provided and update file size
        if file_temp_path and os.path.exists(file_temp_path):
            self.data_export.file_size = os.path.getsize(file_temp_path)

        # Save the updated values
        self.data_export.save()

    def update_job_progress(self, records_exported):
        """
        Updates job progress and record count.
        """
        if self.job:
            self.job.progress.update_stage(
                self.job.job_type_key, progress=round(records_exported / self.total_records, 2)
            )
            self.job.progress.add_or_update_stage_param(
                self.job.job_type_key, "Number of records exported", records_exported
            )
            self.job.save()
ami/exports/format_types.py

Lines changed: 159 additions & 0 deletions

import csv
import json
import logging
import tempfile

from django.core.serializers.json import DjangoJSONEncoder
from rest_framework import serializers

from ami.exports.base import BaseExporter
from ami.exports.utils import get_data_in_batches
from ami.main.models import Occurrence

logger = logging.getLogger(__name__)


def get_export_serializer():
    from ami.main.api.serializers import OccurrenceSerializer

    class OccurrenceExportSerializer(OccurrenceSerializer):
        detection_images = serializers.SerializerMethodField()

        def get_detection_images(self, obj: Occurrence):
            """Convert the generator field to a list before serialization"""
            if hasattr(obj, "detection_images") and callable(obj.detection_images):
                return list(obj.detection_images())  # Convert generator to list
            return []

        def get_permissions(self, instance_data):
            return instance_data

        def to_representation(self, instance):
            return serializers.HyperlinkedModelSerializer.to_representation(self, instance)

    return OccurrenceExportSerializer


class JSONExporter(BaseExporter):
    """Handles JSON export of occurrences."""

    file_format = "json"

    def get_serializer_class(self):
        return get_export_serializer()

    def get_queryset(self):
        return (
            Occurrence.objects.filter(project=self.project)
            .select_related(
                "determination",
                "deployment",
                "event",
            )
            .with_timestamps()  # type: ignore[union-attr] Custom queryset method
            .with_detections_count()
            .with_identifications()
        )

    def export(self):
        """Exports occurrences to JSON format."""
        temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".json", mode="w", encoding="utf-8")
        with open(temp_file.name, "w", encoding="utf-8") as f:
            first = True
            f.write("[")
            records_exported = 0
            for i, batch in enumerate(get_data_in_batches(self.queryset, self.get_serializer_class())):
                json_data = json.dumps(batch, cls=DjangoJSONEncoder)
                json_data = json_data[1:-1]  # remove [ and ] from json string
                f.write(",\n" if not first else "")
                f.write(json_data)
                first = False
                records_exported += len(batch)
                self.update_job_progress(records_exported)
            f.write("]")

        self.update_export_stats(file_temp_path=temp_file.name)
        return temp_file.name  # Return file path


class OccurrenceTabularSerializer(serializers.ModelSerializer):
    """Serializer to format occurrences for tabular data export."""

    event_id = serializers.IntegerField(source="event.id", allow_null=True)
    event_name = serializers.CharField(source="event.name", allow_null=True)
    deployment_id = serializers.IntegerField(source="deployment.id", allow_null=True)
    deployment_name = serializers.CharField(source="deployment.name", allow_null=True)
    project_id = serializers.IntegerField(source="project.id", allow_null=True)
    project_name = serializers.CharField(source="project.name", allow_null=True)

    determination_id = serializers.IntegerField(source="determination.id", allow_null=True)
    determination_name = serializers.CharField(source="determination.name", allow_null=True)
    determination_score = serializers.FloatField(allow_null=True)
    verification_status = serializers.SerializerMethodField()

    class Meta:
        model = Occurrence
        fields = [
            "id",
            "event_id",
            "event_name",
            "deployment_id",
            "deployment_name",
            "project_id",
            "project_name",
            "determination_id",
            "determination_name",
            "determination_score",
            "verification_status",
            "detections_count",
            "first_appearance_timestamp",
            "last_appearance_timestamp",
            "duration",
        ]

    def get_verification_status(self, obj):
        """
        Returns 'Verified' if the occurrence has identifications, otherwise 'Not verified'.
        """
        return "Verified" if obj.identifications.exists() else "Not verified"


class CSVExporter(BaseExporter):
    """Handles CSV export of occurrences."""

    file_format = "csv"

    serializer_class = OccurrenceTabularSerializer

    def get_queryset(self):
        return (
            Occurrence.objects.filter(project=self.project)
            .select_related(
                "determination",
                "deployment",
                "event",
            )
            .with_timestamps()  # type: ignore[union-attr] Custom queryset method
            .with_detections_count()
            .with_identifications()
        )

    def export(self):
        """Exports occurrences to CSV format."""

        temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".csv", mode="w", newline="", encoding="utf-8")

        # Extract field names dynamically from the serializer
        serializer = self.serializer_class()
        field_names = list(serializer.fields.keys())
        records_exported = 0
        with open(temp_file.name, "w", newline="", encoding="utf-8") as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=field_names)
            writer.writeheader()

            for i, batch in enumerate(get_data_in_batches(self.queryset, self.serializer_class)):
                writer.writerows(batch)
                records_exported += len(batch)
                self.update_job_progress(records_exported)
        self.update_export_stats(file_temp_path=temp_file.name)
        return temp_file.name  # Return the file path
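`JSONExporter.export()` builds one valid JSON array from many batches by serializing each batch, stripping its own `[` and `]`, and joining the chunks with commas — the fix for the "multiple root arrays" problem called out in the commit message. The technique in isolation (an `io.StringIO` stands in for the temp file; `write_json_array` is a name invented for this sketch):

```python
import io
import json


def write_json_array(batches, out):
    """Stream batches of records into a single valid JSON array,
    mirroring the strip-the-brackets technique used by JSONExporter."""
    first = True
    out.write("[")
    for batch in batches:
        chunk = json.dumps(batch)[1:-1]  # drop the batch's own [ and ]
        if not first:
            out.write(",\n")
        out.write(chunk)
        first = False
    out.write("]")


buf = io.StringIO()
write_json_array([[{"id": 1}, {"id": 2}], [{"id": 3}]], buf)
print(json.loads(buf.getvalue()))  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Writing `[` up front and `]` at the end guarantees a single root element no matter how many batches arrive, and the output parses even for zero batches.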
Lines changed: 57 additions & 0 deletions

# Generated by Django 4.2.10 on 2025-04-02 20:12

from django.conf import settings
from django.db import migrations, models
import django.db.models.deletion


class Migration(migrations.Migration):
    initial = True

    dependencies = [
        ("main", "0058_alter_project_options"),
        migrations.swappable_dependency(settings.AUTH_USER_MODEL),
    ]

    operations = [
        migrations.CreateModel(
            name="DataExport",
            fields=[
                ("id", models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name="ID")),
                ("created_at", models.DateTimeField(auto_now_add=True)),
                ("updated_at", models.DateTimeField(auto_now=True)),
                (
                    "format",
                    models.CharField(
                        choices=[
                            ("occurrences_simple_json", "occurrences_simple_json"),
                            ("occurrences_simple_csv", "occurrences_simple_csv"),
                        ],
                        max_length=255,
                    ),
                ),
                ("filters", models.JSONField(blank=True, null=True)),
                ("filters_display", models.JSONField(blank=True, null=True)),
                ("file_url", models.URLField(blank=True, null=True)),
                ("record_count", models.PositiveIntegerField(default=0)),
                ("file_size", models.PositiveBigIntegerField(default=0)),
                (
                    "project",
                    models.ForeignKey(
                        on_delete=django.db.models.deletion.CASCADE, related_name="exports", to="main.project"
                    ),
                ),
                (
                    "user",
                    models.ForeignKey(
                        on_delete=django.db.models.deletion.CASCADE,
                        related_name="exports",
                        to=settings.AUTH_USER_MODEL,
                    ),
                ),
            ],
            options={
                "abstract": False,
            },
        ),
    ]

ami/exports/migrations/__init__.py

Whitespace-only changes.
