Skip to content

Conversation

@fractal3000
Copy link
Contributor

@fractal3000 fractal3000 commented Sep 7, 2024

@fractal3000 fractal3000 marked this pull request as draft September 9, 2024 09:09
@fractal3000
Copy link
Contributor Author

fractal3000 commented Sep 9, 2024

I moved it to "draft" to add specific logging checks in tests.

@fractal3000 fractal3000 marked this pull request as ready for review September 9, 2024 13:57
testImplementation 'org.junit.vintage:junit-vintage-engine'
testImplementation 'org.spockframework:spock-core'
testImplementation 'org.mockito:mockito-core'
testImplementation 'org.spockframework:spock-core'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate dependency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spock, still, added twice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


package io.jmix.search.exception;

public abstract class ParserResolvingException extends Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

public class EmptyFileExtensionException extends ParserResolvingException {

public class UnsupportedFileFormatException extends Exception {
public static final String MESSAGE = "Extension of the file %s is empty";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add a public API if it is not intended to be used by users. You don't event use this constant in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


public class UnsupportedFileExtensionException extends ParserResolvingException {

public static final String MESSAGE = "The file %s with '%s' extension is not supported. " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add a public API if it is not intended to be used by users. You don't event use this constant in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

public class UnsupportedFileExtensionException extends ParserResolvingException {

public static final String MESSAGE = "The file %s with '%s' extension is not supported. " +
"Only following file extensions are supported %s.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we decided not to specify which extensions are allowed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it so because the problem with "fogotten extensions" had been solved.

this.parserSupplier = parserSupplier;
}

public String getSymbols() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

These symbols are called extension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did so to make this field name be different from the class name that countains a "extension" word.

import java.util.Optional;

@Component
@Component("search_FileProcessor")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't change public API. Bean name is public API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. But this package is marked as "internal"
image

private static final Logger log = LoggerFactory.getLogger(FileProcessor.class);

protected FileStorageLocator fileStorageLocator;
private final FileParserResolver fileParserResolver;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

return stringWriter.toString();
}

protected Parser getParser(FileRef fileRef) throws UnsupportedFileFormatException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert both methods and call fileParserResolver from them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. But this class is in the package is annotated with an "Internal" annotation.


import java.util.function.Supplier;

public enum SupportedFileExtensions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What is the point of this enum? It still cannot be extended if needed + you've created FileParserResolver bean to iterate over available constants which can be easily done in the enum itself (see any enum for entity attribute).

  2. Remember that the issue that you fix is a bug, not a feature. But all these changes to much for a bug fix.

Copy link
Contributor Author

@fractal3000 fractal3000 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The main goal of using enums is much more safe using them as string constansts in the "switch" expressions. If the new item is added to a "enum" and the item is not added to the correspondant "swith" constructions the compile time error will occured. Another point is to incapsulate the string constants and the parsers into the "enum" constants. This makes code more clean and reliable.

The FileParserResolver have been created to icapsulate getting a parser into a "enum" counstant. If we should add any new extention we will have to point the correspondant parser in the same time. And there are no an ability to do forget to do it. Such incupsulating makes string link between the string constant of the extension(such as "txt" or "doc") and the parser. If we do so a proggamer can't add any new extension and not add a parser for it. As a result of adding such "enum" item a programmer doesn't need to add any changes to algoritms. The case of using the new file format will be supported automatically.

  1. Except the small change in the io.jmix.search.utils.FileProcessor#getParser method signature there are no any other breaking changes in the public API. Additionaly the changing type of the exception of this method was approved by Ivan Gavrilov and this change is already commited to the release 2.3 and to the "master" branches. The other changes are not breaking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the io.jmix.search.utils.FileProcessor class is in the "internal" marked package. And with "The boy scout rule".

Copy link
Contributor Author

@fractal3000 fractal3000 Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally I found at least two places in code with incorrect behaviour(excepting the main problem).

@fractal3000 fractal3000 linked an issue Sep 11, 2024 that may be closed by this pull request
@fractal3000 fractal3000 requested a review from glebfox September 13, 2024 14:14

@Component("search_PDFParserResolver")
@Order(100)
public class PDFParserResolver implements FileParserResolver {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name and extension do not match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and covered by the unit test


@Component("search_RTFParserResolver")
@Order(100)
public class RTFParserResolver implements FileParserResolver {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name and extension do not match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

testImplementation 'org.junit.vintage:junit-vintage-engine'
testImplementation 'org.spockframework:spock-core'
testImplementation 'org.mockito:mockito-core'
testImplementation 'org.spockframework:spock-core'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spock, still, added twice

* Returns a collection of supported extensions of the supported file type. E.g. ["xlsx", "XLSX", "DOCX", "DOCX"].
* @return collection of supported extensions
*/
List<String> getExtension();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. getSupportedExtensions()
  2. Usually, when we have a collection of beans that may or may not be applied, they either have a method that returns true/false whether this bean can be used, e.g. boolean supports(...) (ConditionGenerator, AuthenticationLocaleResolver, etc. just try to search for such method). Another option is to have a @Nullable method and the first bean which return a value is used, e.g. ComponentGenerationStrategy.

The point is that the calling code should not check applicability itself. And one of the reasons is that the supports method can be implemented differently for each bean.

In this case, the option with the supports method is better then @Nullable return value.

Ok, you can leave getExtension() method to collect all available extensions for an exception, just rename it to getSupportedExtensions()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more flexible engine was implemented. This method was moved to the abstract class. The more general method "getCriteriaDescription()" was described instead of it.

public interface FileParserResolver {

/**
* Returns a collection of supported extensions of the supported file type. E.g. ["xlsx", "XLSX", "DOCX", "DOCX"].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of FileParserResolver support uppercase extensions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not an implementation. The interface gives the ability to provide both lowercase and uppercase extensions.

def parserResolver = new FileParserResolverManager(List.of(resolver, resolver2))

expect:
parserResolver.getParser(createFileRefMock("docx")) == parser1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make tests for uppercase extensions

@fractal3000 fractal3000 changed the title "Search" add-on adds unnecessary stacktrace to log if an unsupported file format is provided #3660 Ability to extend supported extension - file parser mapping #3690 Sep 18, 2024
@fractal3000 fractal3000 requested a review from glebfox September 18, 2024 21:00
"rtf" | RTFParser
"odt" | OpenDocumentParser
"ods" | OpenDocumentParser
"doc" | OfficeParser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still no tests for UPPER CASE extensions which are not supported

Screenshot 2024-09-23 at 16 17 40

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

this.fileParserResolvers = fileParserResolvers;
}

public Parser getParser(FileRef fileRef) throws UnsupportedFileTypeException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc

Copy link
Contributor Author

@fractal3000 fractal3000 Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class is in the internal package. Is it neccessary to add JavaDoc for all its methods?


public Parser getParser(FileRef fileRef) throws UnsupportedFileTypeException {
if (fileParserResolvers.isEmpty()) {
throw new EmptyFileParserResolversList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Exception classes must be named in form of FooException
  2. You don't need a custom exception class until you explicitly handle it. Throw IllegalStateException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Fixed

* An exception that is thrown when a user added some file of the type that is not supported
* and there are no any known parser for.
*/
public class UnsupportedFileTypeException extends Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 2.3 you have a different exception class name. You cannot rename public API, so either revert prev name or rename it in 2.3 until we published a next bug fix release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

package io.jmix.search.exception;

import org.apache.commons.io.FilenameUtils;
public class EmptyFileParserResolversList extends RuntimeException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Exception class must be named in form FooException
  2. You don't need a custom exception class until you explicitly handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception was deleted. Fixed.

@Component("search_OldOfficeDocumentsParserResolver")
@Order(100)
public class OldMSOfficeDocumentsParserResolver extends AbstractExtensionBasedFileParserResolver {
@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Java

Empty line after class or interface definition

P.S. fix all classes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

import java.util.List;

/**
* Is a part of the extendable engine the gives an ability to implement custom file parser resolvers and to support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc is written for API users, not for developers who will implement it. For example:

Interface to be implemented by a custom file parser resolver

Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

public interface FileParserResolver {

/**
* This method should return the description that describes the constraints or the constraint for the files
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc is written for API users, not for developers who will implement it. Writing style for a methods: starting with the verb defining the main action.

Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Parser getParser();

/**
* This method should implement the logic for checking
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JavaDoc is written for API users, not for developers who will implement it. Writing style for a methods: starting with the verb defining the main action.

Check https://youtrack.haulmont.com/articles/PLAT-A-1158/Code-Style#Javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

"Only the following file parsing criteria are supported:\n -%s";

/**
* @param fileName - the name of the file which type is not supported
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hyphen is redundant.

Screenshot 2024-09-23 at 16 50 23

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

public abstract class AbstractExtensionBasedFileParserResolver implements FileParserResolver {

/**
* Returns a collection of supported extensions of the supported file type. E.g. ["xlsx", "XLSX", "docx", "DOCX"].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to explicitly specify both lowercase and uppercase extensions now?
Lets require (and mention it in javadoc) lowercase only and convert the actual extension to lowercase within support method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed with Gleb such option but we desided not to implement it within this issue. It will be implemented in a separate issue. #3696

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JavaDoc was corrected

*
* @return collection of supported extensions
*/
public abstract List<String> getSupportedExtensions();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set is better for contains operation than List.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

public interface FileParserResolver {

/**
* This method should return the description that describes the constraints or the constraint for the files
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method name is getCriteriaDescription but javadoc tells about "constraints".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

*
* @return criteria description
*/
String getCriteriaDescription();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this public API method?
Isn't it better to delegate this logic to the final consumer? The only purpose of this is to generate message like 'The file extension should be one of the following: ...'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The consumer doesn't know anything about the file checking criteria that are implemented in the resolvers. The aim was to give to the user ability to get comprehensive information what is going wrong. If we remove this method we just could say that "A resolver(and parser) for the file couldn't be found".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message of the AbstractExtensionBasedFileParserResolver was corrected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method was left without changes as it was discussed.

* A search principle is based on the sequential applying FileParserResolver objects' checks for the given file.
*/
@Component("search_FileParserResolverManager")
public class FileParserResolverManager {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class consumes existing FileParserResolvers internally, but doesn't provide outside anything about them.
It provides the final Parsers.
So maybe this class should be named as FileParserProvider?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, fixed

if (resolver.supports(fileRef)) {
return resolver.getParser();
}
messages.add(resolver.getCriteriaDescription());
Copy link
Collaborator

@Gavrilov-Ivan Gavrilov-Ivan Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we try to process completely unsupported file it will collect "criteria descriptions" from all existing resolvers.
As a result the final exception message will contain multiple "The file extension should be one of the following: ..." expressions.

That is the point in favor of the FileParserResolver#getCriteriaDescription method irrelevance.

This loop should collect extensions supported by unsuitable resolvers and provide them to exception if non of existing resolvers is suitable. And exception generates the final message based on provided extensions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed. Left without changes.

* @return an instance of a file parser
*/
Parser getParser();
FileParsingBundle getParsingBundle();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe FileParserKit?


public record FileParsingBundle(
@NotNull Parser parser,
@NotNull Function<StringWriter, BodyContentHandler> bodyContentHandlerGenerator,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ContentHandler.
BodyContentHandler is a specific implementation.

@fractal3000 fractal3000 marked this pull request as draft September 27, 2024 16:01
@glebfox glebfox removed their request for review September 30, 2024 10:57
@fractal3000 fractal3000 force-pushed the feature/3660-2-search-add-on-adds-unnecessary-stacktrace-to-log-if-an-unsupported-file-format-is-provided branch from 57e81d9 to 59c6a88 Compare September 18, 2025 14:13
@fractal3000 fractal3000 marked this pull request as ready for review September 19, 2025 16:38
@glebfox glebfox removed the request for review from Gavrilov-Ivan October 10, 2025 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ability to extend supported extension - file parser mapping

4 participants