Skip to content

Conversation

@eirikbakke
Copy link
Contributor

Try to work around clipboard issues on Windows by always calling Clipboard.setContents from the Event Dispatch Thread.

This attempted workaround was motivated by @xxDark's investigation here.

The problem mentioned was that IsGettingOwnership would be returning true when awt_toolkit.cpp in the JDK tries to handle the WM_DESTROYCLIPBOARD event from the Win32, leading to the event being left unhandled and the cached clipboard content in Clipboard.contents remaining stale.

Grepping through the OpenJDK codebase, the IsGettingOwnership flag seems only to be modified when Java_sun_awt_windows_WClipboard_openClipboard is called with a non-null argument for newOwner. This, in turn, happens only when Clipboard.setContents is called.

Since race condition is related to a flag touched only by Clipboard.setContents, I am inclined to test this workaround where we call setContents only from the Event Dispatch Thread.

Note that IsGettingOwnership is called from the Toolkit thread, not the Event Dispatch Thread; from @xxDark's explanation I understand that these are different. But there's a high chance that the Toolkit thread and the EDT are "better" synchronized than the Toolkit thread and a random RequestProcessor thread.

Unfortunately I have never been able to reproduce the clipboard bug reliably on my own machine, and I spent my time mostly on MacOS these days. So I'm leaving this PR here for others to try if they have the chance.

…board.setContents from the Event Dispatch Thread.
@eirikbakke eirikbakke added do not merge Don't merge this PR, it is not ready or just demonstration purposes. Platform [ci] enable platform tests (platform/*) UI User Interface ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) labels Feb 24, 2025
@xxDark
Copy link

xxDark commented Feb 24, 2025

Hi. This is not the issue and won't really help. You can call setContents on any thread. See the change in this PR openjdk/jdk#23736 to see what needs to be changed in the OpenJDK. There is a race condition in an upcall from native code. The only way I see this can be fixed temporary in NetBeans is to write an agent that will transform handleContentsChanged to schedule the execution on the EDT when needed and synchronize on clipboard access.

@eirikbakke
Copy link
Contributor Author

@xxDark Just to confirm; did the change in that OpenJDK PR fix the original problem in NetBeans, rather than just avoiding the assertion failures?

Did you figure out how calling handleContentsChanged from the wrong thread could end up causing IsGettingOwnership to return true when it shouldn't?

Our challenge is that we can't easily change the JDK, but we can easily make changes to NetBeans, if a workaround can be found there.

@xxDark
Copy link

xxDark commented Feb 24, 2025

@xxDark Just to confirm; did the change in that OpenJDK PR fix the original problem in NetBeans, rather than just avoiding the assertion failures?

Did you figure out how calling handleContentsChanged from the wrong thread could end up causing IsGettingOwnership to return true when it shouldn't?

Our challenge is that we can't easily change the JDK, but we can easily make changes to NetBeans, if a workaround can be found there.

Yes, I've verified that this fixes the issue.
However, I've not got to figuring out how this fixes the issue. I may take a look tomorrow at what OpenClipboard does on Windows under the hood/take a deeper look into AWT innerworkings with the clipboard.
I made a java agent that patches WClipboard that can be used in NetBeans and can confirm that with it as well, I can't reproduce the issue. (Without it, I can reproduce it immediately)

@xxDark
Copy link

xxDark commented Feb 24, 2025

I've published the agent to https://github.com/xxDark/clipboard-agent. Someone from NetBeans team can adapt it as I've not figured out how to set the project up properly in IntelliJ, unfortunately.

@eirikbakke
Copy link
Contributor Author

@xxDark Whoa, I didn't know you could do that! Thanks for posting and documenting the agent code. I think we could adapt that, if nobody discovers a simpler workaround.

@neilcsmith-net
Copy link
Member

@eirikbakke well, we did just decide against adding an agent from #3386 as part of #7928 That also has clipboard implications to address in NB26.

@eirikbakke
Copy link
Contributor Author

@neilcsmith-net Yeah, it's a bit of a heavy-handed approach. I'd prefer if we could find a simpler workaround.

I still think it would be worth testing this PR, in case there are bugs that make setContents not fully thread-safe.

@xxDark
Copy link

xxDark commented Feb 24, 2025

I'd prefer if we could find a simpler workaround.

I can probably make a workaround that does not need a java agent specifically for NetBeans, but that will require working with JDK internals regardless. If that is OK to do so, then I can probably make a PR

@mbien
Copy link
Member

mbien commented Feb 24, 2025

if it is indeed a JDK bug, getting it in should have priority IMO. Workarounds are just distractions which would have to be removed again anyway most likely.

This bug exists since pre-apache times its not something what has to be rushed. Some JDK bugs are also backported, this is something what we can't do in NB since the major versions go only forward with no hotfixes - JDK update would cover that too.

JDK PRs won't even reach the mailing list before the paperwork is done. Would be good to get in touch with someone from the JDK client team but I don't know anyone (anymore) unfortunately. And just-to-be-sure it would be good to check that this is indeed fixing the issues of those who are affected.

If it would be linux I would have hosted a JDK dev build by now but I don't want to setup a win image in vbox right now.

@xxDark
Copy link

xxDark commented Feb 24, 2025

And just-to-be-sure it would be good to check that this is indeed fixing the issues of those who are affected.

I think that it would be great if some users could test the agent (edit: or, I could implement some hack in NetBeans to address this issue without the agent, if it is OK to reach into JDK internals, effectively achieving the same fix) to see if it fixes the issue. Would be awkward if it ends up not fixing the bug (although assert failures in the OpenJDK already show that something is wrong..).

@mbien
Copy link
Member

mbien commented Feb 24, 2025

I think that it would be great if some users could test the agent

yep. I am a bit busy right now, but any other PMC could take a look through the agent code and if its ok create a dev build of NB (via PR) for testing purposes which injects it on startup, then post a mail to dev/users to get feedback etc.

Thanks for putting so much effort into this btw @xxDark

@mbien
Copy link
Member

mbien commented Feb 24, 2025

(i mean. It doesn't even need a dev build. the instructions in the agent readme should be enough.)

@xxDark
Copy link

xxDark commented Feb 24, 2025

I think I've figured out what happens (this is just an educated guess because I didn't really follow what pfnWowEmptyClipBoard calls, but the symptoms of lost `WM_DESTROYCLIPBOARD` are here):
BOOL __stdcall OpenClipboard(HWND hWndNewOwner)
{
  DWORD emptyClipboard;
  DWORD result = NtUserOpenClipboard(hWndNewOwner, &emptyClipboard);
  if ( emptyClipboard )
    ClientEmptyClipboard();
  return result;
}

Call to OpenClipboard fails in the JDK, because both toolkit & EDT are trying to access it.
emptyClipboard never gets set to true (?), so ClientEmptyClipboard is never called. I think that ClientEmptyClipboard is a user-mode version of EmptyClipboard which calls NtUserEmptyClipboard. In MS documentation, there is a reference to WM_DESTROYCLIPBOARD (keep this code in mind).
ClientEmptyClipboard looks like this (and not called):

void ClientEmptyClipboard(void)
{
  struct _HANDLENODE *v0; // rbx
  struct _HANDLENODE *v1; // rdi

  RtlEnterCriticalSection(&gcsClipboard);
  v0 = (struct _HANDLENODE *)gphn;
  if ( gphn )
  {
    do
    {
      v1 = *(struct _HANDLENODE **)v0;
      if ( *((_QWORD *)v0 + 3) )
        DeleteClientClipboardHandle(v0);
      RtlFreeHeap(pUserHeap, 0, v0);
      v0 = v1;
    }
    while ( v1 );
  }
  gphn = 0LL;
  if ( pfnWowEmptyClipBoard )
    pfnWowEmptyClipBoard(); // Supposed to send WM_DESTROYCLIPBOARD here, but because OpenClipboard did not succeed, the message is silently discarded (?)
  RtlLeaveCriticalSection(&gcsClipboard);
}

So TL;DR LostOwnership never gets a chance to be called because WM_DESTROYCLIPBOARD does not get delivered. If we follow what native code does: supposed to clear current cached contents, but never gets a chance to do so.

Edit: There also could be a possibility that due to how the code is structured, JDK code calls CloseClipboard all the time as well, which again, introduces another function into the mix that is constantly called from multiple threads, regardless of whether the previous OpenClipboard call succeeded on another thread. Either way, this is 99% a race condition that somehow results in the window manager not sending WM_DESTROYCLIPBOARD~ or JDK ignoring the sent message due to the bogus check on line 1039 in awt_toolkit.cpp that at the right moment gets flipped to true (most likely!), discarding the message.

@neilcsmith-net
Copy link
Member

@xxDark curious on your thoughts on the patch at JetBrains/JetBrainsRuntime@6b2c6ff Mind you we know JBR doesn't fix the issue for some reporters here, and I also noticed people are reporting the issue isn't fixed there either, and are even referring to the bug report and replicator in NetBeans.

@xxDark
Copy link

xxDark commented Feb 25, 2025

@xxDark curious on your thoughts on the patch at JetBrains/JetBrainsRuntime@6b2c6ff Mind you we know JBR doesn't fix the issue for some reporters here, and I also noticed people are reporting the issue isn't fixed there either, and are even referring to the bug report and replicator in NetBeans.

Yep, sounds exactly like the issue in JDK we pinpointed.
JetBrains/JetBrainsRuntime@6b2c6ff#diff-ac6d3b12e337195a310db3fc283231f5b7e4e7c555eb0955c49fede89ed6046bR102 is the same I did with breakpoint trick "If one puts a breakpoint here https://github.com/openjdk/jdk/blob/a891630817844c8c42994da3b3110925ca4595a0/src/java.desktop/share/classes/sun/awt/datatransfer/SunClipboard.java#L135 and forcefully sets contents to null, magically, everything will work again."
Call to JetBrains/JetBrainsRuntime@6b2c6ff#diff-ac6d3b12e337195a310db3fc283231f5b7e4e7c555eb0955c49fede89ed6046bR122 on window activation if ownership is lost. "AWT is supposed to call lostSelectionOwnershipImpl from the toolkit event loop, but this never happens, leaving cached "contents" in SunClipboard poisoned."

@xxDark
Copy link

xxDark commented Feb 25, 2025

I think it doesn't fix the issue because to me it seems like a hacky attempt to fix the main issue with the race condition that was found. The toolkit is still calling OpenClipboard without proper synchronization on Java side.

@xxDark
Copy link

xxDark commented Feb 25, 2025

After more digging, to answer your question @eirikbakke: I can't say for sure the cause of this.
There are multiple factors:

  • Windows is closed source, so tracing the interaction between user mode and the kernel is difficult. I don't have enough knowledge to do this. Does pfnWowEmptyClipBoard post the related window message? Is it done in the kernel mode?
  • JDK on its own has issues with concurrent clipboard access, as was figured out
  • There is a lot of code without comments in AWT making investigation harder. For example, my current fix ensures that clipboard interactions from the troolkit thread are enqueued to EDT if there is one running. However, is this the correct fix? There is this comment https://github.com/openjdk/jdk/blob/cfeb7d6c964f63184c939f6f0625c6e7f1afdc31/src/java.desktop/windows/classes/sun/awt/windows/WClipboard.java#L67. While it talks about not calling the code on the toolkit thread, what if the opposite is also true and there are scenarios when some code should never be called on the EDT? There is no further elaboration on what "security holes" authors of this class talks about.
  • Further extending the second point, native code is also missing a lot of comments. Take the problematic AwtClipboard::IsGettingOwnership check (which I now think is probably the culprit). I don't have the time at this moment to do further diagnostics on this. From the sources:
    /* This flag is set while we call EmptyClipboard to indicate to
    WM_DESTROYCLIPBOARD handler that we are not losing ownership */

    But why is this flag needed? If you follow along, you would assume, that if the window messages are processed on the toolkit thread, this seem to make no sense. The only point at which this flag is set to true is here. Why? This method never blocks the caller thread and never waits for the WM_DESTROYCLIPBOARD to arrive. The WM_DESTROYCLIPBOARD arrives on the toolkit thread, but the caller can be any thread. So this is just a race condition and a check here seem bogus to me. Thats why a lot of users probably see the bug manifest only after some time. But for me it happens immediately. I did further experiments yesterday, and noticed that if I just spam CTRL+C on a non-empty line, which apparently copies the current line, this is enough to trigger the bug if you quickly swap to the secondary window.

One thing for certain is that there is a flaw at how JDK handles clipboard access IMO. Logic like this for example is equivalent to this:

try {
    lock.lock();
    ....
} finally {
    lock.unlock();
}

What if Lock::lock throws? finally closure will just blindly unlock the lock if another thread is currently holding it? Absolutely, if there is no protection against that. According to ReactOS source code, there is some protection in CloseClipboard. But I can't for sure that there is in Windows. Maybe the current logic is correct and intended to be this way. But this seems very wrong to me

@johntor
Copy link
Contributor

johntor commented Feb 27, 2025

I can confirm that clipboard-agent is working fine!

Nice job!
J!

@eirikbakke
Copy link
Contributor Author

eirikbakke commented Feb 28, 2025

@xxDark Thanks for the further investigation! That's a very useful writeup for the future.

@johntor Thanks for testing! Did you observe copy/paste bugs in the past, which were eliminated by the agent? EDIT: And in particular, I noticed in another thread that you have also been experimenting with changes to the SecurityManager settings. It would be useful to know how sure you are that the bug is fixed by the xxDark's "agent" workaround, as opposed to other changes you may have experimented with. E.g. disable the agent, run NetBeans until you see the cut/paste bugs, then enable the agent without making any other configuration changes, and see how long you can go without seeing the cut/paste bugs.

@johntor
Copy link
Contributor

johntor commented Mar 9, 2025

@johntor Thanks for testing! Did you observe copy/paste bugs in the past, which were eliminated by the agent? EDIT: And in particular, I noticed in another thread that you have also been experimenting with changes to the SecurityManager settings. It would be useful to know how sure you are that the bug is fixed by the xxDark's "agent" workaround, as opposed to other changes you may have experimented with. E.g. disable the agent, run NetBeans until you see the cut/paste bugs, then enable the agent without making any other configuration changes, and see how long you can go without seeing the cut/paste bugs.

Sorry for my late reply but I am sick the last 7 days.

Using no additional parameters, after 5-10 minutes clipboard is working only internally and I was not able to find a way to make it work again. Not cut / double copy / copy-paste inside restore the normal operation. I always have to restart NB to make it work.

I'm willing to help (at least with testing, because I don't understand the code)!
Take care!
J!

@johntor
Copy link
Contributor

johntor commented Mar 9, 2025

With the default options

CopyPasteTest results

I updated the old CopyPasteTest because some idiots in Microsoft killed the Alt+Tab functionality after 40 years.

Now I am ready for testing...

@johntor
Copy link
Contributor

johntor commented Mar 9, 2025

Using clipboard-agent 100% success

CopyPasteTest

@mbien
Copy link
Member

mbien commented Mar 9, 2025

@johntor thanks for the feedback. Good to hear that the JDK patch is working for you. I think you are the first of the affected who tested it and also reported the results.

@eirikbakke
Copy link
Contributor Author

@johntor Thanks a lot for testing again. This puts some more confidence @xxDark's JDK patch.

@eirikbakke
Copy link
Contributor Author

@mbien Silly question: When I tag a PR like this with ci:dev-build, where does the dev build go? Is it deleted after a certain number of days?

@mbien
Copy link
Member

mbien commented Mar 10, 2025

@eirikbakke check label tooltip

@eirikbakke
Copy link
Contributor Author

@mbien Ah, thanks! Will do the lock-unlock trick to make a new build...

@apache apache locked and limited conversation to collaborators Mar 10, 2025
@apache apache unlocked this conversation Mar 10, 2025
@eirikbakke
Copy link
Contributor Author

OK, I have had a chance to do my own experiments now, on my old Windows 11 laptop on the latest master (debf00a).

The clipboard bug is actually super-easy to reproduce for me now:

  1. Copy/Paste some text from NetBeans to Notepad.
  2. Copy/Paste some text from Notepad to NetBeans.
  3. Copy/Paste some textfrom NetBeans to Notepad. The last paste will paste nothing. Nothing I do after that will make it possible to copy something from NetBeans to Notepad again.

Conclusions:

  1. This PR does not help, as @xxDark predicted. So I'm closing it.
  2. The -J-DTopSecurityManager.disable=false -J-Djava.security.manager=allow -J-Dorg.netbeans.NbClipboard.level=FINEST configuration does not help on my machine. I verified that log entries was being printed at FINEST, so the settings were definitively applied.
  3. @xxDark's agent patch (version d2c21935f195) worked. I turned it on, and off, and on, and off, and on again; each time when the agent was enabled, cut and paste worked fine. Each time when the agent was disabled, cut and paste was broken. Completely reliably.

TL;DR: I have a high confidence that @xxDark's patch works.

I'll continue the discussion over at #7051 ... thanks again, everyone!

@eirikbakke eirikbakke closed this Mar 11, 2025
@eirikbakke
Copy link
Contributor Author

Actually, @xxDark , for my own experiments, is there an easy way I can "sabotage" your clipboard-agent bytecode so that it does everything exactly the same except presumably fixing the bug? I just want to make 100% sure that fixing the JDK bug is what fixes the NetBeans clipboard issues, and not some slight change in timing caused by the presence of the agent.

@xxDark
Copy link

xxDark commented Mar 11, 2025

Actually, @xxDark , for my own experiments, is there an easy way I can "sabotage" your clipboard-agent bytecode so that it does everything exactly the same except presumably fixing the bug? I just want to make 100% sure that fixing the JDK bug is what fixes the NetBeans clipboard issues, and not some slight change in timing caused by the presence of the agent.

You can compile OpenJDK with asserts in awt enabled. Nothing else I can think of
There may be some build here by Aleksey but I can't be 100% certain (or if there are even Windows builds here somewhere). Or here. But again, I don't know if any of the builds here have AWT assertions on

@eirikbakke
Copy link
Contributor Author

@xxDark Oh, yeah, there's 100% a JDK bug that was fixed by the patch, given the assertion failures. Was just trying to make the smallest possible change that would prove that it is the same bug as we are seeing in NetBeans. But I think I'm already convinced... turning the agent on and off very reliably fixed and un-fixed the bug several times when I tested it.

Thanks a bunch!

@xxDark
Copy link

xxDark commented Mar 11, 2025

Building JDK on Windows is straightforward if one reads and follows all instructions here (Requirements for cygwin, quirks building on Windows) (Replace jdk repo with one of your choice, for example jdk17u-dev, but I would recommend always looking at the instructions from the mainline). The most painful thing is getting Microsoft tooling installed correctly. To build the JDK with AWT assertions on, I ran this bash configure --with-extra-cflags='-DDEBUG=1' --with-extra-cxxflags='-DDEBUG=1' --with-tools-dir=/cygdrive/d/MSV2019/VC/Auxiliary --with-boot-jdk=/cygdrive/d/jdks/23.0.1
After that make images should be the only command needed to build a working JDK image.
For Visual Studio 2019, I have these components installed
image

@eirikbakke
Copy link
Contributor Author

Thanks for typing up instructions! My main challenge is that my only Windows laptop is a very old and slow one...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci:dev-build [ci] produce a dev-build zip artifact (7 days expiration, see link on workflow summary page) do not merge Don't merge this PR, it is not ready or just demonstration purposes. Platform [ci] enable platform tests (platform/*) UI User Interface

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants