NReed.Dev

Sitecore | Optimizely | .NET

Adding a Sitemap with Custom URLs

In the realm of web development and content management systems, every project comes with its unique set of challenges. Recently, I had the opportunity to work on a fascinating project involving a client’s existing Sitecore 10.3 instance without SXA (Sitecore Experience Accelerator) installed. The task at hand? To support multiple country sites under the same domain. This meant overcoming technical hurdles to ensure seamless navigation and accessibility across different regions while maintaining a unified online presence.

Overview of the Project

Our client’s requirement was clear: they needed to expand their online presence to cater to various international markets, all under the umbrella of a single domain. However, their existing Sitecore setup lacked the necessary framework, such as SXA, to facilitate this expansion seamlessly. To address this, we embarked on a journey to implement custom solutions that would not only support multiple country sites but also ensure a cohesive user experience.

Customizations Implemented

One of the primary challenges we encountered was enabling URLs to share the same domain while remaining distinct for each country site. To achieve this, we opted to install SXA and leverage its capabilities. However, integrating SXA into an existing non-SXA Sitecore instance posed its own set of challenges. We needed to override the default Sitemap functionality to ensure compatibility and functionality across the board. This involved customizing the SxaSitemapHandler to accommodate the specific requirements of our project.

Additionally, we needed to address the issue of custom URLs for each country site. The out-of-the-box functionality of Sitecore’s ItemCrawler in Sitecore.XA.Foundation.SiteMetadata.Sitemap didn’t quite meet our needs. We tackled this by patching the ItemCrawler to support custom URLs, enabling us to generate sitemaps tailored to each country site’s unique URL structure.

Technical Details

For those interested in the nitty-gritty technical aspects, here’s a breakdown of the customizations we implemented:

XML
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
	<sitecore>
		<pipelines>
			<httpRequestBegin>
				<processor 
					patch:instead="*[@type='Sitecore.XA.Foundation.SiteMetadata.Pipelines.HttpRequestBegin.SxaSitemapHandler, Sitecore.XA.Foundation.SiteMetadata']"
        	type="Sitecore.Project.International.Sitemap.CustomSitemapHandler, Sitecore.Project" resolve="true" CacheExpiration="30">
				</processor>
			</httpRequestBegin>
		</pipelines>
		<services>
			<register serviceType="Sitecore.XA.Foundation.SiteMetadata.Services.ISitemapManager, Sitecore.XA.Foundation.SiteMetadata" 
					  implementationType="Sitecore.Project.Sitemap.CustomSitemapManager, Sitecore.Project" lifetime="Singleton"
					  patch:instead="register[@implementationType='Sitecore.XA.Foundation.SiteMetadata.Services.SitemapManager, Sitecore.XA.Foundation.SiteMetadata']"/>
		</services>
		<experienceAccelerator>
			<siteMetadata>
				<sitemapItemCrawler>
					<add name="itemCrawler" patch:instead="*[@type='Sitecore.XA.Foundation.SiteMetadata.Sitemap.ItemCrawler, Sitecore.XA.Foundation.SiteMetadata']"
										 	type="Sitecore.Project.Sitemap.CustomSitemapItemCrawler,Sitecore.Project"/>
				</sitemapItemCrawler>
			</siteMetadata>
		</experienceAccelerator>
	</sitecore>
</configuration>

Our first goal was to make sure we could access our sitemap on every domain we’ve got live on our site. The SxaSitemapHandler handles this job, directing requests to ‘/sitemap.xml’ to the correct site and pulling in the necessary settings to generate the right sitemap.

But with multiple sites all using the same domain, things got a bit trickier.

I customized the handler so that it finds the right sitemap regardless of domain. In addition, when a request does not match one of the new sites (the IsCustomUrl check near the top of Process), the handler returns early so that the pre-existing sitemap.xml file for the original site is served.
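
The IsCustomUrl check is left site-specific in the handler below. As a minimal sketch of what it could look like, assuming each country site is identified by a region code in the first URL segment: the class name CustomUrlCheck and the KnownRegions set are hypothetical, and a real solution would more likely read the regions from Sitecore than hard-code them.

```csharp
using System;
using System.Collections.Generic;

public static class CustomUrlCheck
{
    // Hypothetical region prefixes, hard-coded only for illustration.
    private static readonly HashSet<string> KnownRegions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "us", "ca", "uk" };

    // Expects the same input the handler builds with url.Segments.Skip(1),
    // e.g. ["ca/", "sitemap.xml"] for https://example.com/ca/sitemap.xml.
    public static bool IsCustomUrl(string[] parameters)
    {
        if (parameters == null || parameters.Length < 2)
        {
            // A bare "/sitemap.xml" yields a single segment: the original site.
            return false;
        }

        // Uri.Segments keeps the trailing slash on directory segments.
        string region = parameters[0].Trim('/');
        return KnownRegions.Contains(region);
    }
}
```

With this shape, a request for /ca/sitemap.xml is treated as a country site, while /sitemap.xml and /blog/sitemap.xml fall through to the original site's file.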

C#
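using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.IO;
using System.Linq;
using System.Web;
using Microsoft.Extensions.DependencyInjection;
using Sitecore;
using Sitecore.Configuration;
using Sitecore.Data;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using Sitecore.DependencyInjection;
using Sitecore.Diagnostics;
using Sitecore.Globalization;
using Sitecore.Pipelines.HttpRequest;
using Sitecore.Sites;
using Sitecore.XA.Foundation.SiteMetadata.Pipelines.HttpRequestBegin;
using Sitecore.XA.Foundation.SiteMetadata.Services;
// The remaining SXA types used below (SitemapSettings, SitemapContent, and so on) need
// their own using directives; the exact namespaces depend on your SXA version.

public class CustomSitemapHandler : SxaSitemapHandler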
    {
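        // Resolve through ISitemapManager: that is the service type the config above registers
        // CustomSitemapManager under, so asking for the concrete type would return null.
        protected readonly ISitemapManager CustomSitemapManager = ServiceLocator.ServiceProvider.GetService<ISitemapManager>();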
        public override void Process(HttpRequestArgs args)
        {
             Uri url = HttpContext.Current.Request.Url;
            bool flag = url.PathAndQuery.EndsWith("/sitemap.xml", StringComparison.OrdinalIgnoreCase);
            string fileName = Path.GetFileName(url.PathAndQuery);
            bool flag2 = fileName != null && 
                fileName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase) && 
                fileName.StartsWith("sitemap-", StringComparison.OrdinalIgnoreCase);
            if (!(flag || flag2))
            {
                return;
            }
            //If it is not any of the new sites (known by lack of additional parameters) then default to serving the original 
            //sitemap.xml file for the original site
            var parameters = url.Segments.Skip(1).Where(x => x != string.Empty).ToArray();
            if (!parameters.Any() || !IsCustomUrl(parameters)) 
            { 
                return;
            }
            Context.Site = GetSiteByUrl(url);
            
            SitemapSettings sitemapSettings = GetSitemapSettings();
            if (sitemapSettings == null)
            {
                Log.Info("SitemapHandler (sitemap.xml) : missing sitemap settings item", this);
                return;
            }
            if (sitemapSettings.CacheType == SitemapStatus.Inactive)
            {
                Log.Info("SitemapHandler (sitemap.xml) : " + $"sitemap is off (status : {sitemapSettings.CacheType})", this);
                return;
            }
            SitemapContent sitemap = ((CustomSitemapManager)SitemapManager).GetCustomSitemap(Context.Site);
            if (sitemap == null)
            {
                return;
            }
            if (flag)
            {
                Item settingsItem = GetSettingsItem();
                CheckboxField checkboxField = 
                		settingsItem.Fields[Sitecore.XA.Foundation.SiteMetadata.Templates.Sitemap._SitemapSettings.Fields.SitemapIndex];
                if (sitemap.Values.Count == 1 && !checkboxField.Checked)
                {
                    SetResponse(args.HttpContext.Response, sitemap.Values.FirstOrDefault());
                }
                else
                {
                    NameValueCollection nameValueCollection = new NameValueCollection();
                    if (checkboxField.Checked)
                    {
                        nameValueCollection.Merge(GetExternalSitemaps(settingsItem));
                    }
                    int num = 1;
                    string indexUrlPrefix = GetIndexUrlPrefix(settingsItem);
                    // Add one index entry per generated sitemap page (sitemap-1.xml, sitemap-2.xml, ...).
                    for (int i = 0; i < sitemap.Values.Count; i++)
                    {
                        string value = $"{indexUrlPrefix}/sitemap-{num++}.xml";
                        nameValueCollection.Add($"{Guid.NewGuid()}", value);
                    }
                    ISitemapGenerator service = ServiceLocator.ServiceProvider.GetService<ISitemapGenerator>();
                    SetResponse(args.HttpContext.Response, service.BuildSitemapIndex(nameValueCollection));
                }
            }
            else
            {
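                // Reject anything that is not sitemap-N.xml with N between 1 and the number of pages;
                // without the lower bound, sitemap-0.xml would index Values[-1] and throw.
                if (!int.TryParse(Path.GetFileNameWithoutExtension(url.PathAndQuery).Replace("sitemap-", string.Empty), out var result) ||
                	 result < 1 || sitemap.Values.Count < result)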
                {
                    return;
                }
                SetResponse(args.HttpContext.Response, sitemap.Values[result - 1]);
            }
            args.AbortPipeline();
        }
        protected virtual bool IsSiteMapRequest(Uri url)
        {
            if (!url.PathAndQuery.EndsWith("/sitemap.xml", StringComparison.OrdinalIgnoreCase) && 
            		!url.PathAndQuery.EndsWith("/local-sitemap.xml", StringComparison.OrdinalIgnoreCase))
                return false;
            return true;
        }
        protected bool IsUrlValidForSitemapFiles(Uri url)
        {
            string vurl = HttpContext.Current.Request.Url.PathAndQuery;
            int lastIndex = vurl.LastIndexOf("/");
            if (lastIndex < 0)
                return false;
            if (vurl.Length > vurl.LastIndexOf("/") + 1)
            {
                string sitemapFileName = vurl.Substring(vurl.LastIndexOf("/") + 1);
                return UrlUtils.IsUrlValidForFile(url, this.CurrentSite, $"/{sitemapFileName}");
            }
            return false;
        }
        protected SiteContext GetSiteByUrl(Uri url)
        {
            var language = Sitecore.Context.Data.FilePathLanguage;
            // Get possible sites for the current language
            var possibleSites = GetSitesForLanguage(language);
            if (possibleSites == null || !possibleSites.Any())
            {
                var defaultSite = GetDefaultSite();
                if (defaultSite != null)
                {
                    return defaultSite; // Return default site if no sites found for the language
                }
            }
            var database = Sitecore.Context.Database;
            var parameters = url.Segments.Skip(1).Where(x => x != string.Empty).ToArray();
            // Loop through possible sites to find the matching region
            foreach (var possibleSite in possibleSites)
            {
                // Construct the path of the site definition item
                string siteDefinitionItemPath = $"{possibleSite.RootPath}/Settings/Site Grouping/{possibleSite.Name}";
                // Get the Sitecore item representing the site definition item
                Item siteDefinitionItem = database.GetItem(siteDefinitionItemPath);
                if (siteDefinitionItem != null)
                {
                    var siteRegionField = siteDefinitionItem["Site Region"];
                    if (siteRegionField == null || !ID.IsID(siteRegionField)) continue;
                    var siteRegionId = new ID(siteRegionField);
                    
                    if (siteRegionId != ID.Null)
                    {
                        var siteRegion = database.GetItem(siteRegionId);
                        if (siteRegion != null && siteRegion["Site Region"] == parameters.First().Replace("/", string.Empty))
                        {
                            return possibleSite;
                        }
                    }
                }
            }
            var site = GetDefaultSite();
            return site;
        }
        private List<SiteContext> GetSitesForLanguage(Language language)
        {
            return Factory.GetSiteInfoList().Where(siteInfo => siteInfo.Language.Contains(language.CultureInfo.Name))
            			.Select(x => Factory.GetSite(x.Name)).ToList();
        }
        private SiteContext GetDefaultSite()
        {
            var database = Sitecore.Context.Database;
            if (database.Name == "core") database = Factory.GetDatabase("master");
            foreach (var site in Sitecore.Configuration.Factory.GetSiteInfoList())
            {
                string siteDefinitionItemPath = $"{site.RootPath}/Settings/Site Grouping/{site.Name}";
                // Get the Sitecore item representing the site definition item
                var siteDefinitionItem = database.GetItem(siteDefinitionItemPath);
                if (siteDefinitionItem != null)
                {
                    var isDefaultSiteField = siteDefinitionItem["IsDefaultSite"];
                    if (isDefaultSiteField == null || isDefaultSiteField.ValueOrEmpty() == "0") continue;
                    return Factory.GetSite(site.Name);
                }
            }
            return null;
        }
        private bool IsCustomUrl(string[] parameters)
        {
           // Site-specific: return true when the URL matches the parameters of a site that is
           // managed by SXA and shares the domain with other sites.
           throw new NotImplementedException("Implement the URL test for your own site structure.");
        }
    }
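
The numbered sitemap-N.xml branch of the handler above is easy to get subtly wrong at the boundaries, so here is a small standalone sketch of that parsing step. The SitemapFileName class and TryGetPageIndex method are hypothetical names for illustration and are not part of SXA.

```csharp
using System;
using System.IO;

public static class SitemapFileName
{
    // Returns true and a zero-based index when fileName is "sitemap-N.xml"
    // with 1 <= N <= pageCount; returns false for everything else.
    public static bool TryGetPageIndex(string fileName, int pageCount, out int index)
    {
        index = -1;
        const string prefix = "sitemap-";

        if (string.IsNullOrEmpty(fileName) ||
            !fileName.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) ||
            !fileName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        // Strip the extension, then cut the prefix by length so casing does not matter.
        string number = Path.GetFileNameWithoutExtension(fileName).Substring(prefix.Length);
        if (!int.TryParse(number, out int page) || page < 1 || page > pageCount)
        {
            return false;
        }

        index = page - 1;
        return true;
    }
}
```

Checking both bounds before indexing means a request such as sitemap-0.xml or sitemap-99.xml simply falls through to the rest of the pipeline instead of throwing.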

Next on the agenda was tweaking the ItemCrawler. Initially, we attempted to address this by configuring the custom link provider in the sitemap. However, we hit a snag. Our project had a specific requirement where each site was designated to serve content in a single language. Unfortunately, content authors occasionally slipped up, inadvertently publishing content in unintended languages. This led to a headache as SXA, out of the box, would include these unintended language versions as alternate links in our sitemap – definitely not what we wanted.

Enter the CustomSitemapItemCrawler. This nifty class allowed us to fine-tune our crawling process, ensuring that we only picked up the right language versions of items for each site. Additionally, it provided the flexibility to implement other custom functionality, such as incorporating a “Do Not Track” field for specific pages.

C#
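using System.Collections.Generic;
using System.Linq;
using Sitecore.Data.Items;
using Sitecore.SecurityModel;

public class CustomSitemapItemCrawler : Sitecore.XA.Foundation.SiteMetadata.Sitemap.ItemCrawler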
    {
        public override IList<Item> GetItems(Item homeItem)
        {
            List<Item> list = new List<Item>();
            Queue<Item> queue = new Queue<Item>();
            using (new SecurityDisabler())
            {
                if (IsValid(homeItem))
                {
                    list.Add(homeItem);
                }
                if (HasValidChildren(homeItem))
                {
                    queue.Enqueue(homeItem);
                }
                while (queue.Count != 0)
                {
                    foreach (Item child in queue.Dequeue().Axes.GetDescendants())
                    {
                        if (!list.Contains(child))
                        {
                            if (IsValid(child))
                            {
                                list.Add(child);
                            }
                            if (HasValidChildren(child))
                            {
                                queue.Enqueue(child);
                            }
                        }
                    }
                }
            }
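            // Drop anything the current user cannot read.
            list = list.Where((Item i) => i.Security.CanRead(Sitecore.Context.User)).ToList();
            
            // Keep only items that have at least one version in the language they were loaded in.
            return list.Where((Item i) => i.Versions.Count > 0).ToList();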
        }
    }

Finally, the glue that holds it all together!

The CustomSitemapManager makes sure that our CustomSitemapItemCrawler is used. I understand that this may look a bit unusual. After all, GetCustomSitemap does exactly the same thing as GetSitemap! However, due to how the SXA sitemap is built, none of the custom code seen above was reachable without it. Perhaps it is a Sitecore-ism.

C#
public class CustomSitemapManager : SitemapManager
    {
        public new SitemapContent GetSitemap(SiteContext site)
        {
            SitemapContent fromCache = this.SitemapCacheManager.GetFromCache(site);
            if (fromCache != null && fromCache.Values.Any<string>())
                return fromCache;
            SitemapContent sitemap = this.GenerateSitemap(site);
            this.SitemapCacheManager.SetCache(site, sitemap);
            return sitemap;
        }
        public SitemapContent GetCustomSitemap(SiteContext site)
        {
            SitemapContent fromCache = this.SitemapCacheManager.GetFromCache(site);
            if (fromCache != null && fromCache.Values.Any<string>())
                return fromCache;
            SitemapContent sitemap = this.GenerateSitemap(site);
            this.SitemapCacheManager.SetCache(site, sitemap);
            return sitemap;
        }
    }
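
One possible explanation for why the handler casts to CustomSitemapManager, offered as an assumption rather than something the SXA source confirms: in C#, a method declared with new hides the base method instead of overriding it, so callers that hold an interface or base-class reference still run the base implementation. The sketch below shows that language behavior with plain, hypothetical types (IGreeter, BaseGreeter, CustomGreeter) standing in for ISitemapManager, SitemapManager and CustomSitemapManager.

```csharp
using System;

public interface IGreeter
{
    string Greet();
}

public class BaseGreeter : IGreeter
{
    // Non-virtual, so derived classes cannot override it.
    public string Greet() => "base";
}

public class CustomGreeter : BaseGreeter
{
    // "new" hides BaseGreeter.Greet; the IGreeter mapping still points at the base method.
    public new string Greet() => "custom";

    // A separately named method is reachable only through the concrete type.
    public string GreetCustom() => "custom";
}
```

Given IGreeter g = new CustomGreeter(), calling g.Greet() runs the base method, while ((CustomGreeter)g).Greet() and ((CustomGreeter)g).GreetCustom() both run the derived code. That mirrors the cast to CustomSitemapManager in the handler.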

That’s it!

Now each custom URL serves its own sitemap AND the original non-SXA site can still reach its sitemap!

Happy coding!