EM-6870 add mimetype update task #2920
Open
mdayican wants to merge 26 commits into master from EM-6870-add-mimetask
26 commits
193df78 EM-6870 add mimetype update task (mdayican)
f98fcab add test code (mdayican)
457e0ca run every 3 sec (mdayican)
5657542 fix typo (mdayican)
d78d65f fix writing (mdayican)
34125ea store docversion to db (mdayican)
bcaba3d update mimetype (mdayican)
a248e93 enable logging (mdayican)
27fc77a make @Transactional (mdayican)
5c38535 make @Transactional (mdayican)
030cda5 use transactional save (mdayican)
3e6137c refactor (mdayican)
99a0b55 remove exceptions (mdayican)
e903e69 remove exceptions (mdayican)
bb6dd21 use @Transactional(propagation = Propagation.REQUIRES_NEW) (mdayican)
711307c use @Transactional(propagation = Propagation.REQUIRES_NEW) (mdayican)
6e7392e remove testing code (mdayican)
b1174fe remove testing code (mdayican)
4bf93ee Merge branch 'master' into EM-6870-add-mimetask (mdayican)
c2c01c1 Merge branch 'master' into EM-6870-add-mimetask (yogesh-hullatti)
3e63e7f fix Pr comment (mdayican)
4c88bda Merge branch 'master' into EM-6870-add-mimetask (mdayican)
c56d1f6 fix Pr comments (mdayican)
3f88613 add values (mdayican)
8fdb101 Merge branch 'master' into EM-6870-add-mimetask (mdayican)
5dca62f fix pr comments (mdayican)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,7 +2,7 @@ description: Helm chart for the HMCTS CDM Document Management APO | |
| apiVersion: v2 | ||
| name: dm-store | ||
| home: https://github.com/hmcts/document-management-store-app | ||
| -version: 2.3.5 | ||
| +version: 2.3.6 | ||
| maintainers: | ||
| - name: HMCTS Evidence Management Team | ||
| email: [email protected] | ||
|
|
||
106 changes: 106 additions & 0 deletions
src/main/java/uk/gov/hmcts/dm/config/batch/MimeTypeUpdateTask.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,106 @@ | ||
| package uk.gov.hmcts.dm.config.batch; | ||
|
|
||
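| // Note: this is Guava's Lists as repackaged inside the Application Insights SDK, not a direct Guava dependency. | ||
| import com.microsoft.applicationinsights.core.dependencies.google.common.collect.Lists; | ||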
| import org.apache.commons.collections4.CollectionUtils; | ||
| import org.apache.commons.lang3.time.StopWatch; | ||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
| import org.springframework.beans.factory.annotation.Value; | ||
| import org.springframework.data.domain.PageRequest; | ||
| import org.springframework.data.domain.Pageable; | ||
| import org.springframework.stereotype.Service; | ||
| import uk.gov.hmcts.dm.repository.DocumentContentVersionRepository; | ||
| import uk.gov.hmcts.dm.service.DocumentContentVersionService; | ||
|
|
||
| import java.util.List; | ||
| import java.util.UUID; | ||
| import java.util.concurrent.ExecutorService; | ||
| import java.util.concurrent.Executors; | ||
|
|
||
| /** | ||
| * This task periodically checks for Document Content Versions where the mimeTypeUpdated flag is false. | ||
| * It will then read the blob from storage, detect the correct MIME type, and update the database record. | ||
| */ | ||
| @Service | ||
| public class MimeTypeUpdateTask implements Runnable { | ||
|
|
||
| private static final Logger log = LoggerFactory.getLogger(MimeTypeUpdateTask.class); | ||
|
|
||
| private final DocumentContentVersionService documentContentVersionService; | ||
| private final DocumentContentVersionRepository documentContentVersionRepository; | ||
|
|
||
| @Value("${spring.batch.mimeTypeUpdate.batchSize}") | ||
| private int batchSize; | ||
|
|
||
| @Value("${spring.batch.mimeTypeUpdate.noOfIterations}") | ||
| private int noOfIterations; | ||
|
|
||
| @Value("${spring.batch.mimeTypeUpdate.threadLimit}") | ||
| private int threadLimit; | ||
|
|
||
| public MimeTypeUpdateTask(DocumentContentVersionService documentContentVersionService, | ||
| DocumentContentVersionRepository documentContentVersionRepository) { | ||
| this.documentContentVersionService = documentContentVersionService; | ||
| this.documentContentVersionRepository = documentContentVersionRepository; | ||
| } | ||
|
|
||
| @Override | ||
| public void run() { | ||
| log.info("Started MIME Type Update job."); | ||
| StopWatch stopWatch = new StopWatch(); | ||
| stopWatch.start(); | ||
|
|
||
| try { | ||
| log.info("threadLimit: {}, noOfIterations: {}, batchSize: {}", threadLimit, noOfIterations, batchSize); | ||
|
|
||
| for (int i = 0; i < noOfIterations; i++) { | ||
| if (!getAndUpdateMimeTypes(i)) { | ||
| // Stop iterating if a run finds no records to process | ||
| log.info("No records found in iteration {}. Stopping job.", i); | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| } catch (Exception e) { | ||
| log.error("MIME Type Update job failed with Error message: {}", e.getMessage(), e); | ||
| } finally { | ||
| stopWatch.stop(); | ||
| log.info("MIME Type Update job finished and took {} ms", stopWatch.getDuration().toMillis()); | ||
| } | ||
| } | ||
|
|
||
| private boolean getAndUpdateMimeTypes(int iteration) { | ||
| StopWatch iterationStopWatch = new StopWatch(); | ||
| iterationStopWatch.start(); | ||
|
|
||
| Pageable pageable = PageRequest.of(0, batchSize); | ||
|
|
||
| List<UUID> documentIds = documentContentVersionRepository | ||
| .findDocumentContentVersionIdsForMimeTypeUpdate(pageable); | ||
|
|
||
| if (CollectionUtils.isEmpty(documentIds)) { | ||
| iterationStopWatch.stop(); | ||
| log.info("Iteration {}: No records found for MIME type update. Total time: {} ms", | ||
| iteration, iterationStopWatch.getDuration().toMillis()); | ||
| return false; // Indicates no records were found | ||
| } | ||
|
|
||
| log.info("Iteration {}: Found {} records to process for MIME type update.", iteration, documentIds.size()); | ||
|
|
||
| int batchCommitSize = 500; // Define the batch size for committing to the DB | ||
| List<List<UUID>> batches = Lists.partition(documentIds, batchCommitSize); | ||
|
|
||
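| // ExecutorService is AutoCloseable from Java 19; close() waits for all submitted batches to finish. | ||
| // The returned Futures are discarded, so an exception thrown inside a batch is not surfaced here. | ||
| try (ExecutorService executorService = Executors.newFixedThreadPool(threadLimit)) { | ||
| batches.forEach( | ||
| batch -> executorService.submit(() -> | ||
| documentContentVersionService.updateMimeType(batch)) | ||
| ); | ||
| } | ||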
|
|
||
|
|
||
| iterationStopWatch.stop(); | ||
| log.info("Iteration {}: completed in {} ms", iteration, | ||
| iterationStopWatch.getDuration().toMillis()); | ||
| return true; // Indicates records were processed | ||
| } | ||
| } |
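The fan-out step in `getAndUpdateMimeTypes` (partition the IDs, then hand each batch to a fixed thread pool) can be sketched in plain Java as below. This is a minimal illustration, not the code in this PR: `BatchRunner`, `partition` and `runBatches` are hypothetical names, and it swaps `submit` for `invokeAll` so that a batch which throws is counted instead of being silently dropped. It also shuts the pool down explicitly, so it does not depend on the Java 19+ `AutoCloseable` behaviour of `ExecutorService`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper (not part of this PR): runs batches on a fixed pool and counts failures.
final class BatchRunner {

    private BatchRunner() {
    }

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Runs worker on every batch with threadLimit threads; returns the number of batches that threw.
    static <T> int runBatches(List<List<T>> batches, int threadLimit, Consumer<List<T>> worker) {
        ExecutorService pool = Executors.newFixedThreadPool(threadLimit);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (List<T> batch : batches) {
                tasks.add(() -> {
                    worker.accept(batch);
                    return null;
                });
            }
            int failed = 0;
            // invokeAll blocks until every task has either completed or thrown.
            for (Future<Void> future : pool.invokeAll(tasks)) {
                try {
                    future.get();
                } catch (ExecutionException e) {
                    failed++; // a real job would log e.getCause() here
                }
            }
            return failed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag for the caller
            throw new IllegalStateException("Interrupted while waiting for batches", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the caller could log the failure count per iteration, for example `BatchRunner.runBatches(BatchRunner.partition(ids, 500), threadLimit, service::updateMimeType)`, where `ids`, `threadLimit` and `service` stand for the values used in the task above.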
46 changes: 41 additions & 5 deletions
src/main/java/uk/gov/hmcts/dm/service/DocumentContentVersionService.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,39 +1,75 @@ | ||
| package uk.gov.hmcts.dm.service; | ||
|
|
||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
| import org.springframework.beans.factory.annotation.Autowired; | ||
| import org.springframework.stereotype.Service; | ||
| import org.springframework.transaction.annotation.Propagation; | ||
| import org.springframework.transaction.annotation.Transactional; | ||
| import uk.gov.hmcts.dm.domain.DocumentContentVersion; | ||
| import uk.gov.hmcts.dm.domain.StoredDocument; | ||
| import uk.gov.hmcts.dm.repository.DocumentContentVersionRepository; | ||
| import uk.gov.hmcts.dm.repository.StoredDocumentRepository; | ||
|
|
||
| import java.util.List; | ||
| import java.util.Optional; | ||
| import java.util.UUID; | ||
|
|
||
| @Transactional | ||
| @Service | ||
| public class DocumentContentVersionService { | ||
|
|
||
| -private final DocumentContentVersionRepository documentContentVersionRepository; | ||
| private static final Logger log = LoggerFactory.getLogger(DocumentContentVersionService.class); | ||
|
|
||
| +private final DocumentContentVersionRepository documentContentVersionRepository; | ||
| private final StoredDocumentRepository storedDocumentRepository; | ||
| private final MimeTypeDetectionService mimeTypeDetectionService; // New dependency | ||
|
|
||
| @Autowired | ||
| public DocumentContentVersionService(DocumentContentVersionRepository documentContentVersionRepository, | ||
| -StoredDocumentRepository storedDocumentRepository) { | ||
| +StoredDocumentRepository storedDocumentRepository, | ||
| +MimeTypeDetectionService mimeTypeDetectionService) { // Injected here | ||
| this.documentContentVersionRepository = documentContentVersionRepository; | ||
| this.storedDocumentRepository = storedDocumentRepository; | ||
| this.mimeTypeDetectionService = mimeTypeDetectionService; | ||
| } | ||
|
|
||
| public Optional<DocumentContentVersion> findById(UUID id) { | ||
| return documentContentVersionRepository.findById(id); | ||
| } | ||
|
|
||
| @Transactional | ||
| public Optional<DocumentContentVersion> findMostRecentDocumentContentVersionByStoredDocumentId(UUID id) { | ||
| return storedDocumentRepository | ||
| -.findByIdAndDeleted(id, false) | ||
| -.map(StoredDocument::getMostRecentDocumentContentVersion); | ||
| +.findByIdAndDeleted(id, false) | ||
| +.map(StoredDocument::getMostRecentDocumentContentVersion); | ||
| } | ||
|
|
||
| @Transactional(propagation = Propagation.REQUIRES_NEW) | ||
| public void updateMimeType(List<UUID> documentVersionIdList) { | ||
|
|
||
| for (UUID documentVersionId : documentVersionIdList) { | ||
| log.info("Processing MIME type update for ID: {}", documentVersionId); | ||
|
|
||
| String detectedMimeType = mimeTypeDetectionService.detectMimeType(documentVersionId); | ||
|
|
||
| if (detectedMimeType == null) { | ||
| log.warn( | ||
| "Could not detect MIME type for {}. Marking as processed to prevent retries.", | ||
| documentVersionId | ||
| ); | ||
| documentContentVersionRepository.markMimeTypeUpdated(documentVersionId); | ||
| continue; | ||
| } | ||
| log.info("Updating MIME type for document {}. New: [{}].", | ||
| documentVersionId, detectedMimeType); | ||
|
|
||
| documentContentVersionRepository.updateMimeType(documentVersionId, detectedMimeType); | ||
|
|
||
| log.info("Updated documentVersion id:{}, mimeType:{}", | ||
| documentVersionId, | ||
| detectedMimeType | ||
| ); | ||
| } | ||
| } | ||
| } | ||
|
|
||
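The per-ID decision in `updateMimeType` (write the detected type, or only flag the record as processed when detection returns `null`) can be exercised without Spring or a database using a sketch like the one below. `InMemoryVersionStore` and `MimeTypeUpdateFlow` are hypothetical stand-ins introduced for illustration. The sketch also assumes the repository's `updateMimeType` query sets the `mimeTypeUpdated` flag, which this diff does not show.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical in-memory stand-in for DocumentContentVersionRepository (illustration only).
final class InMemoryVersionStore {
    final Map<UUID, String> mimeTypes = new HashMap<>();
    final Set<UUID> processed = new HashSet<>();

    // Flags the record as handled without touching its MIME type.
    void markMimeTypeUpdated(UUID id) {
        processed.add(id);
    }

    // Assumption: the real update query also sets the mimeTypeUpdated flag.
    void updateMimeType(UUID id, String mimeType) {
        mimeTypes.put(id, mimeType);
        processed.add(id);
    }
}

final class MimeTypeUpdateFlow {

    private MimeTypeUpdateFlow() {
    }

    // Applies the detect-then-update decision to each ID; returns how many MIME types were written.
    static int updateAll(List<UUID> ids, Function<UUID, String> detector, InMemoryVersionStore store) {
        int updated = 0;
        for (UUID id : ids) {
            String detected = detector.apply(id);
            if (detected == null) {
                // Detection failed: flag the record so the job does not pick it up again.
                store.markMimeTypeUpdated(id);
                continue;
            }
            store.updateMimeType(id, detected);
            updated++;
        }
        return updated;
    }
}
```

Flagging undetectable records keeps the job from retrying them forever, at the cost of never revisiting blobs that failed for a transient reason such as a storage timeout.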
59 changes: 59 additions & 0 deletions
src/main/java/uk/gov/hmcts/dm/service/MimeTypeDetectionService.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| package uk.gov.hmcts.dm.service; | ||
|
|
||
| import org.apache.commons.io.input.BoundedInputStream; | ||
| import org.apache.tika.Tika; | ||
| import org.apache.tika.metadata.Metadata; | ||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
| import org.springframework.stereotype.Service; | ||
|
|
||
| import java.io.IOException; | ||
| import java.io.InputStream; | ||
| import java.util.UUID; | ||
|
|
||
| /** | ||
| * Service to detect the MIME type of a document stored in blob storage. | ||
| */ | ||
| @Service | ||
| public class MimeTypeDetectionService { | ||
|
|
||
| private static final Logger log = LoggerFactory.getLogger(MimeTypeDetectionService.class); | ||
| private static final int MAX_BYTES_TO_READ = 2 * 1024 * 1024; // 2 MB is sufficient for Tika to detect type | ||
|
|
||
| private final BlobStorageReadService blobStorageReadService; | ||
|
|
||
| public MimeTypeDetectionService(BlobStorageReadService blobStorageReadService) { | ||
| this.blobStorageReadService = blobStorageReadService; | ||
| } | ||
|
|
||
| /** | ||
| * Detects the MIME type of a document version by reading at most the first 2 MB (MAX_BYTES_TO_READ) of its blob. | ||
| * | ||
| * @param documentVersionId The UUID of the document version. | ||
| * @return The detected MIME type as a String, or null if detection fails. | ||
| */ | ||
| public String detectMimeType(UUID documentVersionId) { | ||
| log.debug("Attempting to detect MIME type for document version ID: {}", documentVersionId); | ||
| try (InputStream inputStream = blobStorageReadService.getInputStream(documentVersionId); | ||
| BoundedInputStream limitedStream = BoundedInputStream.builder() | ||
| .setInputStream(inputStream) | ||
| .setMaxCount(MAX_BYTES_TO_READ) | ||
| .get()) { | ||
|
|
||
| Tika tika = new Tika(); | ||
| Metadata metadata = new Metadata(); | ||
| String mimeType = tika.detect(limitedStream, metadata); | ||
| log.info("Detected MIME type for {} as: {}", documentVersionId, mimeType); | ||
| return mimeType; | ||
|
|
||
| } catch (IOException e) { | ||
| log.error("Failed to read blob stream for MIME type detection on document version {}", | ||
| documentVersionId, e); | ||
| return null; | ||
| } catch (Exception e) { | ||
| log.error("An unexpected error occurred during MIME type detection for document version {}", | ||
| documentVersionId, e); | ||
| return null; | ||
| } | ||
| } | ||
| } | ||
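The idea behind `detectMimeType` is to read only a bounded prefix of the blob and classify it by content. The sketch below shows that idea with the standard library alone. It is not Tika and covers only two signatures (PDF and PNG), so it is a deliberately simplified stand-in; `PrefixSniffer` and its methods are hypothetical names. It returns `Optional.empty()` where the service above returns `null`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical stand-in for content sniffing; Tika recognises far more formats than this.
final class PrefixSniffer {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG_MAGIC = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    private PrefixSniffer() {
    }

    // Reads at most maxBytes from the stream and classifies the prefix; empty on I/O failure.
    static Optional<String> detect(InputStream in, int maxBytes) {
        try {
            // readNBytes stops at maxBytes or end of stream, whichever comes first.
            return sniff(in.readNBytes(maxBytes));
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    // Matches the prefix against known file signatures.
    static Optional<String> sniff(byte[] prefix) {
        if (startsWith(prefix, PDF_MAGIC)) {
            return Optional.of("application/pdf");
        }
        if (startsWith(prefix, PNG_MAGIC)) {
            return Optional.of("image/png");
        }
        return Optional.empty();
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        return data.length >= magic.length
            && Arrays.equals(data, 0, magic.length, magic, 0, magic.length);
    }
}
```

Capping the read keeps memory and blob-storage traffic bounded per document, which matters when the job walks batches of hundreds of records across several threads.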