@bambuca bambuca commented Jul 16, 2025

This PR adds the ability to exclude specific languages from sitemap.xml generation.

What’s new:

  • New property DisallowLanguages in SitemapXmlSettings
  • SitemapModelFactory and alternate URL logic now respect this setting
  • Default behavior remains unchanged (no exclusions)

Why:

In some scenarios, site owners may want to:

  • Prevent certain language versions from being listed in sitemaps (e.g. test/staging content)
  • Avoid conflicting SEO signals when the same languages are excluded in robots.txt

This setting provides fine-grained control and separates concerns between robots.txt and sitemap configuration.
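
A minimal sketch of the filtering this setting implies, assuming `DisallowLanguages` holds language IDs (it is declared as `List<int>` in this PR). The `Language` record and the `SitemapLanguageFilter` class are hypothetical stand-ins for illustration, not nopCommerce types.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a store language; not the nopCommerce entity.
public record Language(int Id, string UniqueSeoCode);

public static class SitemapLanguageFilter
{
    // Returns the languages that may appear in the sitemap.
    // An empty disallow list keeps every language, matching the default behavior.
    public static List<Language> Allowed(
        IEnumerable<Language> all,
        ICollection<int> disallowLanguages) =>
        all.Where(lang => !disallowLanguages.Contains(lang.Id)).ToList();
}
```

With languages 1, 2 and 3 and a disallow list containing only 2, `Allowed` returns languages 1 and 3; with an empty list it returns all three.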

Closes nopSolutions#7777
@bambuca bambuca force-pushed the feature/7777-sitemap-exclude-languages branch from 5609f6f to aae0eed Compare July 17, 2025 16:16
@bambuca bambuca force-pushed the feature/7777-sitemap-exclude-languages branch from aae0eed to 052034e Compare July 17, 2025 16:17
@@ -912,16 +878,26 @@ var name when name.Equals(nameof(ProductTag), StringComparison.InvariantCultureI
_ => GetUrlHelper().RouteUrl(routeName, values, protocol)
};

//url for current language
var url = await routeUrlAsync(routeName,
Contributor

This code does not need to be removed. By default, the main address is the address without localization.

Contributor Author

Thanks for the feedback! However, I'd like to clarify why we replaced this part:

var url = await routeUrlAsync(routeName,
    getRouteParamsAwait != null ? await getRouteParamsAwait(null) : null,
    await GetHttpProtocolAsync());

with this one:

var languages = _localizationSettings.SeoFriendlyUrlsForLanguagesEnabled
    ? (await _languageService.GetAllLanguagesAsync(storeId: store.Id))
        .Where(lang => !_sitemapXmlSettings.DisallowLanguages.Contains(lang.Id))
        .ToList()
    : null;

var workingLanguage = await _workContext.GetWorkingLanguageAsync();
var language = languages?.FirstOrDefault(lang => lang.Id == store.DefaultLanguageId)
    ?? languages?.FirstOrDefault()
    ?? workingLanguage;

var url = await routeUrlAsync(
    routeName,
    getRouteParamsAwait != null ? await getRouteParamsAwait(language.Id) : null,
    await GetHttpProtocolAsync());

if (language.Id != workingLanguage.Id)
    url = GetLocalizedUrl(url, language);

Why this change?

Even though the original version uses getRouteParamsAwait(null), that null does not eliminate language context. In fact:

  • For some entities (like News), LanguageId is passed explicitly (e.g., news.LanguageId).
  • For others, like in GetSeoRouteParamsAwait(...), the LanguageId is not provided, and the system falls back to the current language from IWorkContext, resolved in _urlRecordService.GetSeNameAsync(...).

This makes the result dependent on the active user/session, which is undesirable for sitemap generation.

In addition to that, the default IOutboundParameterTransformer implementation in LanguageParameterTransformer.cs also:

  • First checks for the language in RouteValues
  • Then falls back to IWorkContext

This behavior is not fully deterministic and leads to inconsistencies in the resulting URLs.
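
The fallback order described above can be sketched as follows. This is an illustrative model only, not the actual `LanguageParameterTransformer` code: `ResolveLanguageCode`, the `"language"` route key, and the `workingLanguageCode` parameter are assumptions made for the example.

```csharp
using System.Collections.Generic;

public static class LanguageFallbackSketch
{
    // Models the two-step lookup: an explicit route value wins,
    // otherwise the ambient (per-request) working language is used.
    public static string ResolveLanguageCode(
        IReadOnlyDictionary<string, object> routeValues,
        string workingLanguageCode)
    {
        if (routeValues.TryGetValue("language", out var value)
            && value is string code
            && code.Length > 0)
        {
            return code;
        }

        // No explicit language: the result now depends on who is making the request.
        return workingLanguageCode;
    }
}
```

With no language in the route values, two requests whose working languages differ resolve to different codes, which is the inconsistency in question.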

That’s why we explicitly select a language for sitemap URL generation:

  • Prefer the default store language (if allowed)
  • Fall back to the first allowed language
  • Fall back to the current one (last resort)

And finally, we use this condition:

if (language.Id != workingLanguage.Id)
    url = GetLocalizedUrl(url, language);

to ensure the generated URL reflects the explicitly chosen language, overriding whatever may have been implicitly applied via IOutboundParameterTransformer.

This guarantees that all sitemap entries are generated consistently and intentionally, regardless of the current user context or route state.
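
The selection order can also be isolated as a pure function, which makes it easy to check independently of the request. A minimal sketch, assuming languages are identified by integer IDs; `SitemapLanguagePicker.Pick` and the `SitemapLanguage` record are hypothetical names used for illustration, not part of the PR.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a language entity.
public record SitemapLanguage(int Id, string Name);

public static class SitemapLanguagePicker
{
    // Order: store default (if allowed), then first allowed, then the working language.
    // "allowed" may be null, mirroring the case where SEO-friendly language URLs are disabled.
    public static SitemapLanguage Pick(
        IReadOnlyList<SitemapLanguage> allowed,
        int storeDefaultLanguageId,
        SitemapLanguage workingLanguage) =>
        allowed?.FirstOrDefault(lang => lang.Id == storeDefaultLanguageId)
            ?? allowed?.FirstOrDefault()
            ?? workingLanguage;
}
```

The working language only matters when the allowed list is null or empty, so the result no longer varies with the session in the common case.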

Contributor

Okay, I understand your idea and the problem you are solving, but it is not related to the ticket itself. Let's create a separate ticket and a separate pull request for this task. I will review everything again. As for task #7777 itself, we decided not to implement it in the core for now.

@skoshelev
Contributor

I think this logic is unnecessary in this context; it will be enough to simply filter languages using the DisallowLanguages setting.

/// <summary>
/// Disallow languages
/// </summary>
public List<int> DisallowLanguages { get; set; } = new();
Contributor

To configure it, you will have to add a UI similar to the one for RobotsTxtSettings.DisallowLanguages.

Contributor Author

@bambuca bambuca Jul 24, 2025

OK. There is currently no UI for administering SitemapXmlSettings, so I would add a new section covering all of its settings, not just this new property, if you are OK with that?

@skoshelev
Contributor

Hi @bambuca.

Thank you very much for your help. I left some comments on your code.

@bambuca
Contributor Author

bambuca commented Jul 24, 2025

I think this logic is unnecessary in this context; it will be enough to simply filter languages using the DisallowLanguages setting.

Thanks! To clarify, this PR actually addresses two related aspects:

  1. It ensures that alternate URLs are generated only for allowed languages (those not listed in DisallowLanguages), which is especially important when SeoFriendlyUrlsForLanguagesEnabled is true.

  2. It also makes sure that the main language of the sitemap is consistent by using store.DefaultLanguageId, instead of relying on the current session/context (IWorkContext), which could vary per request.
    This avoids situations where, for example, the admin user has selected a language that is not allowed in the sitemap (e.g. one listed in DisallowLanguages), and yet the whole sitemap gets generated in that disallowed language.

This way, the sitemap is generated in a deterministic and controlled manner, aligned with store and SEO settings.

(Reference: earlier comment)

Successfully merging this pull request may close these issues.

Support per-language exclusions in sitemap.xml generation