@tsunoyu (Contributor) commented Mar 5, 2025

Enhance Privacy Sandbox Related Website Sets and Apple App Site Association Tracking

This pull request enhances the tracking of Privacy Sandbox Related Website Sets (RWS) features and Apple App Site Association by improving data collection and parsing for:

  • Privacy Sandbox:

    • Related Website Sets: Parses the /.well-known/related-website-set.json file to extract detailed information about website relationships, including primary domain, associated sites, service sites, ccTLDs, and rationale.
  • Apple App Site Association: Parses the /.well-known/apple-app-site-association file to detect the presence of applinks, webcredentials, activitycontinuation, and appclips services.
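The Related Website Sets summary described above can be sketched as follows. This sketch assumes the file uses the JSON shape quoted later in this thread (`primary`, `associatedSites`, `serviceSites`, `rationaleBySite`), plus the `ccTLDs` map named in the description. `summarizeRelatedWebsiteSet` is a hypothetical helper and is not the code in this PR.

```javascript
// Hypothetical helper (not the PR's code): summarize a
// related-website-set.json body, keeping only the fields named in the
// PR description. Fields that are missing or malformed are omitted.
function summarizeRelatedWebsiteSet(text) {
  let json;
  try {
    json = JSON.parse(text);
  } catch (e) {
    return null; // Not valid JSON, e.g. an HTML error page.
  }
  if (!json || typeof json !== "object" || Array.isArray(json)) {
    return null;
  }
  const summary = {};
  if (typeof json.primary === "string") {
    summary.primary = json.primary;
  }
  for (const key of ["associatedSites", "serviceSites"]) {
    if (Array.isArray(json[key])) {
      summary[key] = json[key];
    }
  }
  // ccTLDs and rationaleBySite are objects keyed by site URL.
  for (const key of ["ccTLDs", "rationaleBySite"]) {
    if (json[key] && typeof json[key] === "object" && !Array.isArray(json[key])) {
      summary[key] = json[key];
    }
  }
  return summary;
}

// Usage: fields such as "contact" are dropped from the summary.
const body = '{"primary":"https://mercadolibre.com","associatedSites":["https://mercadolivre.com"]}';
console.log(summarizeRelatedWebsiteSet(body));
```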

Changes:

  • Updated the parseResponse function to handle and parse the following files:
    • /.well-known/related-website-set.json
    • /.well-known/apple-app-site-association
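For the apple-app-site-association file, detecting which services are present can be as simple as checking for the top-level keys listed in the description. The following is a minimal sketch under that assumption. `detectAasaServices` is a hypothetical name, and the output keys here are the raw AASA key names, not the field names the merged custom metric reports.

```javascript
// Hypothetical helper (not the PR's parseResponse): report which Apple App
// Site Association services a file declares, based only on the presence
// of the top-level keys named in the PR description.
const AASA_SERVICES = ["applinks", "webcredentials", "activitycontinuation", "appclips"];

function detectAasaServices(text) {
  let json;
  try {
    json = JSON.parse(text);
  } catch (e) {
    return null; // Unparseable, e.g. an HTML page served with status 200.
  }
  if (!json || typeof json !== "object" || Array.isArray(json)) {
    return null;
  }
  const result = {};
  for (const service of AASA_SERVICES) {
    result[service] = Object.prototype.hasOwnProperty.call(json, service);
  }
  return result;
}

// Usage
console.log(detectAasaServices('{"applinks":{"details":[]},"webcredentials":{"apps":[]}}'));
```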

Rationale:

  • Privacy Sandbox:

    • Related Website Sets: Collecting detailed information about website relationships provides valuable insights into the usage and adoption of this feature.
  • Apple App Site Association: Detecting the presence of different services in the apple-app-site-association file helps us understand how websites integrate with Apple features like universal links, shared web credentials, and App Clips.


Test websites:

@tunetheweb (Member) left a comment

LGTM. I'm hopefully fixing the linting error in #754, so I will merge this after that one.

@tsunoyu (Contributor, Author) commented Mar 6, 2025

@tunetheweb After reviewing the test results, I noticed that the implementation was collecting more data than intended. I’ve made a new commit to refine the logic and ensure it aligns with the expected behavior. Is the Lint Code Base error something I need to take further action on?

@tunetheweb (Member) commented

> Is the Lint Code Base error something I need to take further action on?

Just committed the fix and updated this branch with the latest code. Fingers crossed that fixes it!!

@tsunoyu (Contributor, Author) commented Mar 6, 2025

@tunetheweb Thank you for your review and support. Would it be possible to run one more test with the following additional URLs to check the collected data?

@tunetheweb (Member) commented

> Would it be possible to run one more test with the following additional URLs to check the collected data?

That last commit to remove the trailing blank space will trigger the re-test. You can always make a dummy commit (e.g., alter a comment) if you need to do this again.

https://google.com/ (instead of https://www.google.com/)

Let's see how this goes in the latest test run, but I suspect the auto-redirect will ruin this here. Maybe the custom metric should also look at the domain as well as the origin?

@tsunoyu (Contributor, Author) commented Mar 6, 2025

@tunetheweb is this something you could help with? I was reviewing the code but could not figure out what is causing this issue.

I am seeing an incorrect "found: false" response when trying to crawl /.well-known/related-website-set.json, particularly for https://google.com/.
When testing https://google.com/ with this new logic, it should be able to access the https://google.com/.well-known/related-website-set.json path, which contains the following information:

{
   "contact": "[email protected]",
   "primary": "https://google.com",
   "associatedSites": [
      "https://youtube.com",
      "https://android.com"
   ],
   "serviceSites": [
      "https://googleusercontent.com"
   ],
   "rationaleBySite": {
      "https://youtube.com": "Clearly presents an affiliation with Google.com via shared branding, account login, and privacy policy. Used for preserving user journeys related to sign-in / authentication.",
      "https://android.com": "Clearly presents an affiliation with Google.com via shared branding, account login, and privacy policy. Used for preserving user journeys related to sign-in / authentication.",
      "https://googleusercontent.com": "Secure content serving"
   }
}

However, the test result shows:

{
  "/.well-known/related-website-set.json": {
    "found": false
  }
}

For https://mercadolibre.com, it works as intended, returning the response below:

    "/.well-known/related-website-set.json": {
      "found": true,
      "data": {
        "primary": "https://mercadolibre.com",
        "associatedSites": [
          "https://mercadolivre.com",
          "https://mercadopago.com",
          "https://mercadoshops.com",
          "https://portalinmobiliario.com",
          "https://tucarro.com"
        ]
      }
    }

@tunetheweb (Member) commented Mar 6, 2025

Yes, this is what I thought would happen, as mentioned above: your test is for https://google.com/, which then redirects to https://www.google.com/. Then you load /.well-known/related-website-set.json. Since that's a relative path, it loads relative to the current URL, not the original test URL.

You could try changing it to load $WPT_TEST_URL/.well-known/related-website-set.json (I can't remember if $WPT_TEST_URL is the original test URL, but I think it is).
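The redirect problem described above comes down to which base URL a root-relative path is resolved against. The sketch below uses the standard `URL` constructor to resolve the same path against the post-redirect page URL and against the submitted test URL. `testUrl` stands in for the value of $WPT_TEST_URL, and `wellKnownUrl` is a hypothetical helper.

```javascript
// Resolve the well-known path against two different bases to show why a
// redirect changes which file gets fetched. `testUrl` stands in for the
// value of $WPT_TEST_URL; `wellKnownUrl` is a hypothetical helper.
const WELL_KNOWN_PATH = "/.well-known/related-website-set.json";

function wellKnownUrl(base) {
  // A path starting with "/" replaces the base URL's path and query,
  // keeping only its scheme, host and port.
  return new URL(WELL_KNOWN_PATH, base).href;
}

const testUrl = "https://google.com/";          // URL submitted for the test
const finalPageUrl = "https://www.google.com/"; // URL after the redirect

console.log(wellKnownUrl(finalPageUrl)); // https://www.google.com/.well-known/related-website-set.json
console.log(wellKnownUrl(testUrl));      // https://google.com/.well-known/related-website-set.json
```

A relative `fetch()` inside the page behaves like the first call, because it resolves against the page that is currently loaded.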

However, if testing www.google.com, shouldn't it look at both /.well-known/related-website-set.json and google.com/.well-known/related-website-set.json? So should your custom metric test both URLs?
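Testing both URLs could look something like the following sketch. The helper `candidateRwsUrls` is hypothetical. It also takes a deliberate shortcut: it only strips a leading `www.` to find the bare host, whereas finding the real registrable domain would require the Public Suffix List. The bare-host URL is a different origin from the page.

```javascript
// Hypothetical helper: build candidate related-website-set.json URLs for a
// tested page, one on the page's own origin and, if the host starts with
// "www.", one on the bare host. Stripping "www." is a simplification; the
// real registrable domain needs the Public Suffix List.
function candidateRwsUrls(pageUrl) {
  const page = new URL(pageUrl);
  const path = "/.well-known/related-website-set.json";
  const urls = [new URL(path, page.origin).href];
  if (page.hostname.startsWith("www.")) {
    const bare = new URL(page.origin);
    bare.hostname = page.hostname.slice("www.".length);
    urls.push(new URL(path, bare.origin).href);
  }
  return urls;
}

// Usage
console.log(candidateRwsUrls("https://www.google.com/search?q=test"));
```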

@tsunoyu force-pushed the update-rws-apple branch from 924550b to 1c5ca06 March 6, 2025 12:47
@tunetheweb marked this pull request as draft March 6, 2025 14:14
@pmeenan (Member) commented Mar 6, 2025

FWIW, you should probably parse $WPT_TEST_URL and use its parts to construct the .well-known URL, because the test URL won't always be the top level of the site; it will be whatever URL was tested (and yes, it is the URL that was submitted for the test, not whatever page finally ended up loading).
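Parsing the test URL matters because plain string concatenation keeps the tested page's path and query string. The sketch below compares the two approaches. `submittedUrl` stands in for the value of $WPT_TEST_URL, and the deep-link URL is a made-up example.

```javascript
// Why $WPT_TEST_URL should be parsed rather than concatenated.
// `submittedUrl` stands in for its value; this deep link is made up.
const submittedUrl = "https://example.com/products/shoes?ref=home";
const path = "/.well-known/related-website-set.json";

// Naive concatenation keeps the page path and query string.
const naive = submittedUrl + path;
// Parsing keeps only the origin (scheme, host, port), then adds the path.
const parsed = new URL(path, new URL(submittedUrl).origin).href;

console.log(naive);  // https://example.com/products/shoes?ref=home/.well-known/related-website-set.json
console.log(parsed); // https://example.com/.well-known/related-website-set.json
```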

@tsunoyu marked this pull request as ready for review March 6, 2025 17:01

@tsunoyu (Contributor, Author) commented Mar 7, 2025

CORS Issue Preventing Data Fetch

I tried running the following code in the browser console:

fetch("https://google.com/.well-known/related-website-set.json")
  .then(response => response.json())
  .then(console.log)
  .catch(console.error);

However, the request was blocked due to CORS policy restrictions:

Access to fetch at 'https://google.com/.well-known/related-website-set.json' 
from origin 'https://www.google.com' has been blocked by CORS policy: 
No 'Access-Control-Allow-Origin' header is present on the requested resource.

Could it be that, because of this restriction, we are unable to retrieve the required data?


Since this issue prevents us from fetching the data, it might be fine to just go with the previous commit:

🔗 d1f41d45721607abe7549efc13a8be59742e711b

and only collect the data when the request is not blocked. Or is there a way to get the data even when the request is blocked by CORS policy?

@tunetheweb (Member) commented

Ah, CORS. Damn you. Yeah, I think you're right, so we're stuck with only measuring same-origin. Can you revert to that version, and then let's merge this.
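In practice, measuring only same-origin files means filtering candidate URLs by origin before fetching. When a cross-origin `fetch()` fails the CORS check, it rejects with a `TypeError`, and the script never sees the response body. The following is a minimal sketch. `sameOriginOnly` is a hypothetical helper, and in a page context `pageUrl` would be `location.href`.

```javascript
// Hypothetical helper: keep only candidate URLs that share the page's
// origin, since cross-origin reads of these files fail without an
// Access-Control-Allow-Origin header.
function sameOriginOnly(candidateUrls, pageUrl) {
  const pageOrigin = new URL(pageUrl).origin;
  return candidateUrls.filter((u) => new URL(u).origin === pageOrigin);
}

// Usage: after the google.com -> www.google.com redirect, only the
// www.google.com file can be read by the page.
const kept = sameOriginOnly(
  [
    "https://www.google.com/.well-known/related-website-set.json",
    "https://google.com/.well-known/related-website-set.json",
  ],
  "https://www.google.com/"
);
console.log(kept);
```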

github-actions bot commented Mar 7, 2025

https://almanac.httparchive.org/en/2022/

WPT result details

Changed custom metrics values:

{
  "_well-known": {
    "/.well-known/assetlinks.json": {
      "found": false
    },
    "/.well-known/apple-app-site-association": {
      "found": false
    },
    "/.well-known/related-website-set.json": {
      "found": false
    },
    "/.well-known/privacy-sandbox-attestations.json": {
      "found": false
    },
    "/.well-known/gpc.json": {
      "found": false
    },
    "/.well-known/web-identity": {
      "found": false
    },
    "/.well-known/passkey-endpoints": {
      "found": false
    },
    "/.well-known/webauthn": {
      "found": false
    },
    "/robots.txt": {
      "found": true,
      "data": {
        "matched_disallows": {}
      }
    },
    "/.well-known/security.txt": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": true,
        "url": "https://almanac.httparchive.org/.well-known/security.txt/",
        "content_type": "text/html; charset=utf-8"
      }
    },
    "/.well-known/change-password": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": true,
        "url": "https://almanac.httparchive.org/.well-known/change-password/"
      }
    },
    "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://almanac.httparchive.org/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
      }
    }
  }
}

https://timesinternet.in

WPT result details

Changed custom metrics values:

{
  "_well-known": {
    "/.well-known/assetlinks.json": {
      "found": false
    },
    "/.well-known/apple-app-site-association": {
      "found": false
    },
    "/.well-known/related-website-set.json": {
      "found": false
    },
    "/.well-known/privacy-sandbox-attestations.json": {
      "found": false
    },
    "/.well-known/gpc.json": {
      "found": false
    },
    "/.well-known/web-identity": {
      "found": false
    },
    "/.well-known/passkey-endpoints": {
      "found": false
    },
    "/.well-known/webauthn": {
      "found": false
    },
    "/robots.txt": {
      "found": true,
      "data": {
        "matched_disallows": {}
      }
    },
    "/.well-known/security.txt": {
      "found": false,
      "data": {
        "status": 403,
        "redirected": false,
        "url": "https://timesinternet.in/.well-known/security.txt",
        "content_type": "text/html"
      }
    },
    "/.well-known/change-password": {
      "found": false,
      "data": {
        "status": 403,
        "redirected": false,
        "url": "https://timesinternet.in/.well-known/change-password"
      }
    },
    "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
      "found": false,
      "data": {
        "status": 403,
        "redirected": false,
        "url": "https://timesinternet.in/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
      }
    }
  }
}

https://mercadolibre.com

WPT result details

Changed custom metrics values:

{
  "_well-known": {
    "/.well-known/assetlinks.json": {
      "found": true,
      "data": {
        "deep_linking": true,
        "credential_sharing": true
      }
    },
    "/.well-known/apple-app-site-association": {
      "found": true,
      "data": {
        "app_links": true,
        "web_credentials": false
      }
    },
    "/.well-known/related-website-set.json": {
      "found": true,
      "data": {
        "primary": "https://mercadolibre.com",
        "associatedSites": [
          "https://mercadolivre.com",
          "https://mercadopago.com",
          "https://mercadoshops.com",
          "https://portalinmobiliario.com",
          "https://tucarro.com"
        ]
      }
    },
    "/.well-known/privacy-sandbox-attestations.json": {
      "found": false
    },
    "/.well-known/gpc.json": {
      "found": false
    },
    "/.well-known/web-identity": {
      "found": false
    },
    "/.well-known/passkey-endpoints": {
      "found": false
    },
    "/.well-known/webauthn": {
      "found": false
    },
    "/robots.txt": {
      "found": true,
      "data": {
        "matched_disallows": {}
      }
    },
    "/.well-known/security.txt": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://mercadolibre.com/.well-known/security.txt",
        "content_type": "text/html"
      }
    },
    "/.well-known/change-password": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://mercadolibre.com/.well-known/change-password"
      }
    },
    "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://mercadolibre.com/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
      }
    }
  }
}

https://on.com

WPT result details

Changed custom metrics values:

{
  "_well-known": {
    "/.well-known/assetlinks.json": {
      "found": true,
      "data": {
        "deep_linking": true,
        "credential_sharing": true
      }
    },
    "/.well-known/apple-app-site-association": {
      "found": true,
      "data": {
        "app_links": true,
        "web_credentials": true
      }
    },
    "/.well-known/related-website-set.json": {
      "found": false
    },
    "/.well-known/privacy-sandbox-attestations.json": {
      "found": false
    },
    "/.well-known/gpc.json": {
      "found": false
    },
    "/.well-known/web-identity": {
      "found": false
    },
    "/.well-known/passkey-endpoints": {
      "found": false
    },
    "/.well-known/webauthn": {
      "found": false
    },
    "/robots.txt": {
      "found": true,
      "data": {
        "matched_disallows": {
          "*": [
            "/account",
            "/authentication",
            "/account-confirmation"
          ]
        }
      }
    },
    "/.well-known/security.txt": {
      "found": true,
      "data": {
        "status": 200,
        "redirected": true,
        "url": "https://www.on.com/en-us/.well-known/security.txt",
        "content_type": "text/html;charset=utf-8",
        "signed": false,
        "all_required_exist": false,
        "only_one_requirement_broken": false,
        "valid": false
      }
    },
    "/.well-known/change-password": {
      "found": true,
      "data": {
        "status": 200,
        "redirected": true,
        "url": "https://www.on.com/en-us/.well-known/change-password"
      }
    },
    "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
      "found": true,
      "data": {
        "status": 200,
        "redirected": true,
        "url": "https://www.on.com/en-us/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
      }
    }
  }
}

https://google.com

WPT result details

Changed custom metrics values:

{
  "_well-known": {
    "/.well-known/assetlinks.json": {
      "found": true,
      "data": {
        "deep_linking": true,
        "credential_sharing": true
      }
    },
    "/.well-known/apple-app-site-association": {
      "found": false
    },
    "/.well-known/related-website-set.json": {
      "found": false
    },
    "/.well-known/privacy-sandbox-attestations.json": {
      "found": false
    },
    "/.well-known/gpc.json": {
      "found": false
    },
    "/.well-known/web-identity": {
      "found": true,
      "data": {
        "provider_urls": [
          "https://accounts.google.com/gsi/fedcm.json"
        ],
        "accounts_endpoint": "https://accounts.google.com/gsi/fedcm/listaccounts",
        "login_url": "https://accounts.google.com/gsi/fedcm/signin"
      }
    },
    "/.well-known/passkey-endpoints": {
      "found": false
    },
    "/.well-known/webauthn": {
      "found": false
    },
    "/robots.txt": {
      "found": true,
      "data": {
        "matched_disallows": {
          "*": [
            "/accounts/ClientLogin",
            "/accounts/ClientAuth",
            "/accounts/o8",
            "/shopping/ratings/account/metrics",
            "/nonprofits/account/"
          ]
        }
      }
    },
    "/.well-known/security.txt": {
      "found": true,
      "data": {
        "status": 200,
        "redirected": false,
        "url": "https://www.google.com/.well-known/security.txt",
        "content_type": "text/plain",
        "signed": false,
        "contact": [
          "https://g.co/vulnz",
          "mailto:[email protected]"
        ],
        "expires": [
          "2025-04-01T00:00:00z"
        ],
        "encryption": [
          "https://services.google.com/corporate/publickey.txt"
        ],
        "acknowledgments": [
          "https://bughunters.google.com/"
        ],
        "policy": [
          "https://g.co/vrp"
        ],
        "hiring": [
          "https://g.co/SecurityPrivacyEngJobs"
        ],
        "all_required_exist": true,
        "only_one_requirement_broken": false,
        "valid": true
      }
    },
    "/.well-known/change-password": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://www.google.com/.well-known/change-password"
      }
    },
    "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://www.google.com/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
      }
    }
  }
}

https://www.google.com

WPT result details

Changed custom metrics values:

{
  "_well-known": {
    "/.well-known/assetlinks.json": {
      "found": true,
      "data": {
        "deep_linking": true,
        "credential_sharing": true
      }
    },
    "/.well-known/apple-app-site-association": {
      "found": false
    },
    "/.well-known/related-website-set.json": {
      "found": false
    },
    "/.well-known/privacy-sandbox-attestations.json": {
      "found": false
    },
    "/.well-known/gpc.json": {
      "found": false
    },
    "/.well-known/web-identity": {
      "found": true,
      "data": {
        "provider_urls": [
          "https://accounts.google.com/gsi/fedcm.json"
        ],
        "accounts_endpoint": "https://accounts.google.com/gsi/fedcm/listaccounts",
        "login_url": "https://accounts.google.com/gsi/fedcm/signin"
      }
    },
    "/.well-known/passkey-endpoints": {
      "found": false
    },
    "/.well-known/webauthn": {
      "found": false
    },
    "/robots.txt": {
      "found": true,
      "data": {
        "matched_disallows": {
          "*": [
            "/accounts/ClientLogin",
            "/accounts/ClientAuth",
            "/accounts/o8",
            "/shopping/ratings/account/metrics",
            "/nonprofits/account/"
          ]
        }
      }
    },
    "/.well-known/security.txt": {
      "found": true,
      "data": {
        "status": 200,
        "redirected": false,
        "url": "https://www.google.com/.well-known/security.txt",
        "content_type": "text/plain",
        "signed": false,
        "contact": [
          "https://g.co/vulnz",
          "mailto:[email protected]"
        ],
        "expires": [
          "2025-04-01T00:00:00z"
        ],
        "encryption": [
          "https://services.google.com/corporate/publickey.txt"
        ],
        "acknowledgments": [
          "https://bughunters.google.com/"
        ],
        "policy": [
          "https://g.co/vrp"
        ],
        "hiring": [
          "https://g.co/SecurityPrivacyEngJobs"
        ],
        "all_required_exist": true,
        "only_one_requirement_broken": false,
        "valid": true
      }
    },
    "/.well-known/change-password": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://www.google.com/.well-known/change-password"
      }
    },
    "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/": {
      "found": false,
      "data": {
        "status": 404,
        "redirected": false,
        "url": "https://www.google.com/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200/"
      }
    }
  }
}

@tunetheweb merged commit 3f73319 into HTTPArchive:main Mar 7, 2025
2 checks passed

3 participants