-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Add support for language exclusion in sitemap.xml #7778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Add support for language exclusion in sitemap.xml #7778
Conversation
This PR adds the ability to exclude specific languages from sitemap.xml generation. ### What’s new: - New property `DisallowLanguages` in `SitemapXmlSettings` - `SitemapModelFactory` and alternate URL logic now respects this setting - Default behavior remains unchanged (no exclusions) ### Why: In some scenarios, site owners may want to: - Prevent certain language versions from being listed in sitemaps (e.g. test/staging content) - Avoid conflicting SEO signals when the same languages are excluded in `robots.txt` This setting provides fine-grained control and separates concerns between `robots.txt` and sitemap configuration. Closes nopSolutions#7777
5609f6f
to
aae0eed
Compare
aae0eed
to
052034e
Compare
@@ -912,16 +878,26 @@ var name when name.Equals(nameof(ProductTag), StringComparison.InvariantCultureI | |||
_ => GetUrlHelper().RouteUrl(routeName, values, protocol) | |||
}; | |||
|
|||
//url for current language | |||
var url = await routeUrlAsync(routeName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code does not need to be removed. By default, the main address is the address without localization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! However, I'd like to clarify why we replaced this part:
var url = await routeUrlAsync(routeName,
getRouteParamsAwait != null ? await getRouteParamsAwait(null) : null,
await GetHttpProtocolAsync());
with this one:
var languages = _localizationSettings.SeoFriendlyUrlsForLanguagesEnabled
? (await _languageService.GetAllLanguagesAsync(storeId: store.Id))
.Where(lang => !_sitemapXmlSettings.DisallowLanguages.Contains(lang.Id))
.ToList()
: null;
var workingLanguage = await _workContext.GetWorkingLanguageAsync();
var language = languages?.FirstOrDefault(lang => lang.Id == store.DefaultLanguageId)
?? languages?.FirstOrDefault()
?? workingLanguage;
var url = await routeUrlAsync(
routeName,
getRouteParamsAwait != null ? await getRouteParamsAwait(language.Id) : null,
await GetHttpProtocolAsync());
if (language.Id != workingLanguage.Id)
url = GetLocalizedUrl(url, language);
Why this change?
Even though the original version uses getRouteParamsAwait(null)
, that null
does not eliminate language context. In fact:
- For some entities (like News),
LanguageId
is passed explicitly (e.g.,news.LanguageId
). - For others, like in
GetSeoRouteParamsAwait(...)
, theLanguageId
is not provided, and the system falls back to the current language fromIWorkContext
, resolved in_urlRecordService.GetSeNameAsync(...)
.
This makes the result dependent on the active user/session, which is undesirable for sitemap generation.
In addition to that, the default IOutboundParameterTransformer
implementation in LanguageParameterTransformer.cs
also:
- First checks for the language in
RouteValues
- Then falls back to
IWorkContext
This behavior is not fully deterministic and leads to inconsistencies in the resulting URLs.
That’s why we explicitly select a language
for sitemap URL generation:
- Prefer the default store language (if allowed)
- Fallback to the first allowed language
- Fallback to the current one (last resort)
And finally, we use this condition:
if (language.Id != workingLanguage.Id)
url = GetLocalizedUrl(url, language);
to ensure the generated URL reflects the explicitly chosen language, overriding whatever may have been implicitly applied via IOutboundParameterTransformer
.
This guarantees that all sitemap entries are generated consistently and intentionally, regardless of the current user context or route state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I understand your idea and the problem you are solving, but it is not related to the ticket itself. Let's create a separate ticket and a separate pull request for this task. I will review everything again. As for the task #7777 itself, we decided not to implement it in the core for now
I think this logic is unnecessary in this context, it will be enough to simply filter languages using the DisallowLanguages setting |
/// <summary> | ||
/// Disallow languages | ||
/// </summary> | ||
public List<int> DisallowLanguages { get; set; } = new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To configure it, you will have to add UI similar to RobotsTxtSettings.DisallowLanguages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now there is no UI for administrating the SitemapXmlSettings so I will add new section for all settings, not just for this new property if you are ok with that?
Hi @bambuca. Thank you very much for your help. I left some comments on your code. |
Thanks! Just to clarify — this PR actually addresses two related aspects:
This way, the sitemap is generated in a deterministic and controlled manner, aligned with store and SEO settings. (Reference: earlier comment) |
This PR adds the ability to exclude specific languages from sitemap.xml generation.
What’s new:
DisallowLanguages
inSitemapXmlSettings
SitemapModelFactory
and alternate URL logic now respects this settingWhy:
In some scenarios, site owners may want to:
robots.txt
This setting provides fine-grained control and separates concerns between
robots.txt
and sitemap configuration.Closes #7777