feat: add action attachment.storeDecryptedPdf#381

Merged
ahochsteger merged 9 commits intoahochsteger:mainfrom
MikeDabrowski:pdf-decrypt
Mar 2, 2025

@MikeDabrowski (Contributor) commented Jul 16, 2024

Description

This PR adds a decryptPDF action. It takes attached PDFs and stores both the original and the decrypted version in the chosen location.

It also adds a new dependency, the @cantoo/pdf-lib fork, which supports decrypting PDFs. This library uses promises, which makes decrypting the PDF async as well. The rest of the code is synchronous.

Fixes #355

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this been tested?

TODOs

  • Hide async complexity in the lib itself and provide a built-in action to decrypt PDFs (if possible somehow)
  • Review the direct exposure of PDFDocument using GmailProcessorLib and maybe encapsulate it in ctx.env.pdfDocument.
  • Investigate the possibility to move the (huge) pdf lib to a separate Google Apps Script Library to make it an optional dependency.
  • Update the documentation with good examples

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes

@MikeDabrowski
Contributor Author

@ahochsteger I started adding the feature to the gmail-processor and would like to ask for some guidance.
Because pdf-lib uses promises, I decided to create a separate action, but in theory they could be merged together. In my initial test, promise usage had no visible side effects, but I was not relying on anything returned from the function. Async here could impact many other places.

@ahochsteger
Owner

@MikeDabrowski thanks for the PR. So far it looks good to me for a start.
I'll add some comments to the code to let you know how I usually do things in Gmail Processor, especially to enable test automation (both locally during the build and via end-to-end tests directly on Google Apps Script).

For local testing using Jest tests I usually mock services provided by GAS like Utilities and make them available via the environment context like ctx.env.utilities. See the EnvProvider.ts to see what GAS services can be accessed through the environment context and are automatically provided as mocks for local Jest tests.
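As a rough illustration of the pattern described above, the sketch below routes a GAS service call through the environment context so that a test can swap in a fake. Only `ctx.env.utilities` and `base64Encode` come from this thread; `encodeAttachment` and `mockUtilities` are hypothetical names, and the mock is hand-rolled for Node.js rather than taken from EnvProvider.ts.

```javascript
// Hypothetical action helper: it only reaches GAS services through ctx.env,
// never through the global `Utilities` object.
function encodeAttachment(ctx, bytes) {
  return ctx.env.utilities.base64Encode(bytes)
}

// Hand-rolled stand-in for the GAS `Utilities` service (assumption: runs on Node.js).
const mockUtilities = {
  base64Encode: (bytes) => Buffer.from(bytes).toString("base64"),
}

// Minimal fake context carrying the mocked service.
const ctx = { env: { utilities: mockUtilities } }

// The bytes 72 and 105 spell "Hi".
const encoded = encodeAttachment(ctx, [72, 105])
console.log(encoded)
```

Because the helper never touches the global `Utilities` object, the same code can run unchanged on Google Apps Script (with the real service in `ctx.env`) and in a local test.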

Give me some time to try it out myself and I'll give you some more guidance or maybe directly change some things myself in case I feel that it may be a bit tricky to solve.

@ahochsteger (Owner) left a comment

I've reviewed the changes and put some comments into the code.
But I still have to think about how to support async functions in Gmail Processor actions though ...
In case you've got some ideas I'm all ears ;-).

},
)
});
const actionMeta = context.proc.gdriveAdapter.getActionMeta(
Owner

I guess this is the tricky part, since getDecryptedPdf is an async function that has been set up but has not yet finished when this line is reached, so we cannot tell whether the decryption was successful and cannot return the correct state.
This is something we have to check and test in depth to find a good way to allow async actions in general, since this currently really limits what can be used in Gmail Processor actions.
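The problem can be reproduced in isolation. The sketch below is not Gmail Processor code: `fakeDecrypt`, `syncAction` and `asyncAction` are hypothetical stand-ins that only show why a synchronous caller cannot report the outcome of a promise it has not awaited.

```javascript
// Stand-in for an async decryption step (hypothetical; no real PDF work happens here).
async function fakeDecrypt(password) {
  if (password !== "test") throw new Error("wrong password")
  return "decrypted-bytes"
}

// Synchronous action: the promise exists but has not settled when we return,
// so the reported status cannot reflect the real outcome.
function syncAction(password) {
  let status = "unknown"
  fakeDecrypt(password).then(
    () => { status = "ok" },
    () => { status = "error" },
  )
  return status // the handlers above have not run yet
}

// Async action: awaiting lets the caller observe success or failure.
async function asyncAction(password) {
  try {
    await fakeDecrypt(password)
    return "ok"
  } catch (e) {
    return "error"
  }
}
```

The synchronous variant returns before either handler runs, regardless of the password, which is the situation described in the comment above.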

Contributor Author

I am sorry, I do not have any idea how to solve that problem. I know that GAS has at least partial support for top-level await, but using it would probably turn most of this lib into async/await functions, which would be a lot of work.


/** Store and decrypt an attachment to a Google Drive location. */
@writingAction<AttachmentContext>()
public static storeAndDecrypt(
Owner

Certain changes affect the documentation and the configuration schema, which are auto-generated.
That's why it is recommended to run npm run pre-commit, which takes care of that and also runs all the local tests to verify that everything is still OK.
Also have a look at the Development Guide for more information.

Contributor Author

Many scripts are not cross-platform, I am afraid.

I switched to Linux, but running the pre-commit script throws lots of errors about missing dependencies.

[docs] scripts/update-docs.sh: line 33: gojq: command not found
[docs] npm run update:docs exited with code 127
[examples] scripts/update-examples.sh: line 20: gojq: command not found
..and more

*/
decryptedPdfDescription?: string
}

Owner

Just some ideas about the features and configurability:

  • We may let the user decide whether both files should be kept or just the decrypted version (in which case setting just location would be enough).
  • Is there really a use case for providing a different description for the decrypted file instead of just using description?

Contributor Author

I agree with the first point. Does saveOriginal sound good?

And I don't have any use case for a description of the decrypted file. But I also do not have any use case for descriptions in general; I just followed the existing pattern here.

export async function getDecryptedPdf(processedFileObject: GoogleAppsScript.Gmail.GmailAttachment, password: string) {
const bytes = processedFileObject.getBytes();
const fileBase64 = Utilities.base64Encode(bytes);
const pdfDoc = await PDFDocument.load(fileBase64, { password, ignoreEncryption: true});
Owner

I'd support Pattern substitution for the password to be able to set it as variable in the global settings (or maybe provide a way in the future to securely provide secrets) without having to hard-code them inside the action configuration.

Contributor Author

I thought having it defined in the action args is the right way - you may have different passwords for different actions/emails/pdfs. How would you marry them if they were placed in the global settings?
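One way the two positions could be reconciled is sketched below: passwords are defined once under distinct names and each action references the one it needs through a placeholder. This is only an illustration of the idea, not Gmail Processor's substitution engine; the `substitute` helper, the `{{variables.…}}` namespace and all names and values are assumptions made for the example.

```javascript
// Hypothetical helper: replaces {{variables.name}} placeholders with values from a map.
// Unknown names are left untouched so that a typo stays visible.
function substitute(template, variables) {
  return template.replace(/\{\{variables\.(\w+)\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name)
      ? String(variables[name])
      : match,
  )
}

// Passwords defined once in a central place (illustrative names and values) ...
const variables = {
  bankPdfPassword: "secret-1",
  insurancePdfPassword: "secret-2",
}

// ... and referenced per action, so each action can still use a different one.
const bankArgs = { password: "{{variables.bankPdfPassword}}" }
const insuranceArgs = { password: "{{variables.insurancePdfPassword}}" }

const bankPassword = substitute(bankArgs.password, variables)
const insurancePassword = substitute(insuranceArgs.password, variables)
```

With this arrangement the action arguments stay specific to each rule, while the secret values themselves live in one place.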

@MikeDabrowski
Contributor Author

But I still have to think about how to support async functions in Gmail Processor actions though ... In case you've got some ideas I'm all ears ;-).

Just off the top of my head: the GAS web IDE told me at some point that top-level await is available, so GAS has at least some async support built in already. When I was hacking on the decryption without gmail-processor, I just made the outer function async and went with it.

I assume that doing the same here might not be so easy; this lib is far more complex than I initially thought. However, even if you had to make every single function async, I suppose it would still be usable, at least in the basic way described in the docs. The way I use it, I just have an outer function processMails which calls the gmail-processor. Nothing more, nothing less. I don't know if there are any other usages that would break if run became async.

The other idea, the 'kinda works' idea, is to put the async stuff into a separate action, just add a then handler and leave it be. Just make sure that whatever it does, and whenever it does it, it won't impact any other process. But let's leave that as a last-resort option; it is not really suitable for production code :/

Thanks for the review, I'll try to carve some time to address it in the next few days

@coveralls
Collaborator

coveralls commented Feb 23, 2025

Pull Request Test Coverage Report for Build 13615723321

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 47 of 103 (45.63%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.4%) to 89.922%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
src/lib/e2e/E2E.ts | 0 | 4 | 0.0%
src/lib/actions/AttachmentActions.ts | 15 | 67 | 22.39%

Totals (Coverage Status)
Change from base Build 13322142560: -0.4%
Covered Lines: 8712
Relevant Lines: 9537

💛 - Coveralls

@MikeDabrowski
Contributor Author

Have you figured out a way to handle async functions?

@ahochsteger
Owner

ahochsteger commented Feb 24, 2025

@MikeDabrowski I was able to update this PR and it is now in a functional state using an async custom action.
Moving the async support into the library did not (yet) work, but I'm still investigating how to reduce the complexity on the usage side.
It would be great if you could give it a try and give feedback using the beta testing script id 1yhOQyl_xWtnGJn_bzlL7oA4d_q5KoMyZyWIqXDJX1SY7bi22_lpjMiQK with version HEAD.
This is a working example that uses a simple encrypted PDF that is included in this PR at src/e2e-test/files/encrypted.pdf (it uses "dry-run" mode, which you might want to change):

function decryptPdfRun() {
  const config = {
    description:
      "The action `custom.decryptAndStorePdf` decrypts and stores a PDF file.",
    settings: {
      markProcessedMethod: "mark-read",
    },
    global: {
      thread: {
        match: {
          query:
            "has:attachment -in:trash -in:drafts -in:spam after:{{date.now|formatDate('yyyy-MM-dd')}} is:unread subject:\"[GmailProcessor-Test] decryptPdf\"",
        },
      },
    },
    threads: [
      {
        match: {
          query: "subject:([GmailProcessor-Test] decryptPdf)",
        },
        attachments: [
          {
            description: "Process all attachments named 'encrypted*.pdf'",
            match: {
              name: "(?<basename>encrypted.*)\\.pdf$",
            },
            actions: [
              {
                name: "custom.decryptAndStorePdf",
                args: {
                  location:
                    "/GmailProcessor-Tests/e2e/advanced/{{message.date|formatDate('yyyy-MM-dd')}}/decrypted.pdf",
                  conflictStrategy: "replace",
                  password: "test",
                },
              },
            ],
          },
        ],
      },
    ],
  }

  const customActions = [
    {
      name: "decryptAndStorePdf",
      action: async (ctx, args) => {
        const location = args.location
        try {
          ctx.log.info(`decryptAndStorePdf(): location=${location}`)
          const attachment = ctx.attachment.object
          const base64Content = ctx.env.utilities.base64Encode(
            attachment.getBytes(),
          )
          ctx.log.info(`decryptAndStorePdf(): Loading PDF document ...`)
          const pdfDoc = await GmailProcessorLib.PDFDocument.load(
            base64Content,
            {
              password: args.password,
              ignoreEncryption: true,
            },
          )
          ctx.log.info(`decryptAndStorePdf(): Decrypt PDF content ...`)
          const decryptedContent = await pdfDoc.save()
          ctx.log.info(`decryptAndStorePdf(): Create new PDF blob ...`)
          const decryptedPdf = ctx.env.utilities.newBlob(
            decryptedContent,
            attachment.getContentType(),
            attachment.getName(),
          )
          ctx.log.info(
            `decryptAndStorePdf(): Store PDF file to '${location}' ...`,
          )
          return ctx.proc.gdriveAdapter.createFileFromAction(
            ctx,
            args.location,
            decryptedPdf,
            args.conflictStrategy,
            args.description,
            "decrypted PDF",
            "custom",
            "custom.decryptAndStorePdf",
          )
        } catch (e) {
          ctx.log.error(
            `Error while saving decrypted pdf to ${location}: ${e}`,
          )
          throw e
        }
      },
    },
  ]
  return GmailProcessorLib.run(config, "dry-run", customActions)
}

@ahochsteger
Owner

ahochsteger commented Feb 24, 2025

To summarize, these are the topics I'd like to address before releasing it (added to the description as well):

  • Hide async complexity in the lib itself and provide a built-in action to decrypt PDFs (if possible somehow)
  • Review the direct exposure of PDFDocument using GmailProcessorLib and maybe encapsulate it in ctx.env.pdfDocument.
  • Investigate the possibility to move the (huge) pdf lib to a separate Google Apps Script Library to make it an optional dependency.
  • Update the documentation with good examples

@ahochsteger
Owner

@MikeDabrowski I was now able to fully integrate it as an async action attachment.storeDecryptedPdf that can be used this way:

  {
    "name": "attachment.storeDecryptedPdf",
    "args": {
      "location": "decrypted.pdf",
      "conflictStrategy": "replace",
      "password": "..."
    }
  }

I intentionally left out all additional properties for storing both the original and the decrypted version, to keep the implementation as simple as possible. The original version may be stored by an additional attachment.store action anyway.
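Keeping both files could then look roughly like the following `actions` list for one attachment rule. This is a sketch: the arguments of attachment.storeDecryptedPdf are taken from the snippet above, while the arguments shown for attachment.store are assumed to mirror them, and the file names are placeholders.

```javascript
// Sketch of an `actions` list that keeps the original and the decrypted PDF.
const actions = [
  {
    // Stores the attachment as received (still encrypted).
    // Assumption: attachment.store accepts the same location/conflictStrategy args.
    name: "attachment.store",
    args: {
      location: "original.pdf",
      conflictStrategy: "replace",
    },
  },
  {
    // Stores the decrypted copy next to it.
    name: "attachment.storeDecryptedPdf",
    args: {
      location: "decrypted.pdf",
      conflictStrategy: "replace",
      password: "...",
    },
  },
]
```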

@ahochsteger ahochsteger marked this pull request as ready for review March 2, 2025 13:23
@ahochsteger ahochsteger changed the title wip: add decrypt pdf action feat: add action attachment.storeDecryptedPdf Mar 2, 2025
@ahochsteger ahochsteger merged commit 077e5d2 into ahochsteger:main Mar 2, 2025
2 checks passed
@MikeDabrowski
Contributor Author

@ahochsteger Brilliant work! Thank you for completing this.

I can confirm that version 35 is working for me in my 'production' case!


Development

Successfully merging this pull request may close these issues.

PDF password removal

3 participants