One of my clients had a simple request:
"Please provide a list in Excel of every single valid URL on the live global site, please."
After running ScreamingFrog and getting back a report with missing URLs (likely because the software can't reach links that are only available through AJAX-rendered components), we had a couple of options on the table:
- Build a simple ASPX page, similar to a Sitemap component, that loops through the content, filters out everything but the global English version, generates the URLs, triggers a web request to check each URL's status, and displays the results on the page (or offers a Download button to get the list). Code this, deploy it, etc.
- Do all of the above, but with PowerShell, which happened to already be installed on the CMS.
GUESS which I opted for? :)
Yeah!...you guessed it!
Let's get right into it.
Using the Get-ChildItem command and targeting a specific part of the content tree (explicitly using the web database), we get the initial list of English-versioned items.
$itemsWithMatchingCondition = Get-ChildItem -Path web:'/sitecore/content/WebsiteName/Home' -Language 'en' -Version * -Recurse
With this specific implementation, I was lucky enough to have a stable template naming convention where all items using a template that ended with "Page" were always going to be...well...pages.
(Without this luck, I might have had to check whether each item had at least a main layout among its renderings.)
To filter the list, we'll use a simple if statement with the -like operator against each item's template name:
if ($item.Template.Name -like $script:pageString)
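To see how the -like wildcard behaves before pointing it at the content tree, here is a small standalone sketch. The template names are made up for illustration; the pattern "* Page" is the one assigned to $script:pageString in the full script, and it only matches names that end with a space followed by "Page".

```powershell
# Hypothetical template names, for illustration only.
$templateNames = @('Landing Page', 'Article Page', 'Site Settings', 'Pages Folder', 'Homepage')

# Same pattern as $script:pageString in the full script.
$pageString = '* Page'

# -like does wildcard matching and is case-insensitive by default.
$pageTemplates = $templateNames | Where-Object { $_ -like $pageString }

$pageTemplates
```

With these sample names, only "Landing Page" and "Article Page" pass the filter. "Homepage" is skipped because there is no space before "page", which is worth keeping in mind if the naming convention is looser on your site.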
Now that we have a list of page items we want to process, we need to generate the item's URL.
This handy function that sets the site context, configures the UrlOptions, and gets the URL via the LinkManager does just that:
function Get-ItemUrl($itemToProcess){
    [Sitecore.Context]::SetActiveSite("website")
    $urlop = New-Object ([Sitecore.Links.UrlOptions]::DefaultOptions)
    $urlop.AddAspxExtension = $false
    $urlop.AlwaysIncludeServerUrl = $true
    $linkUrl = [Sitecore.Links.LinkManager]::GetItemUrl($itemToProcess,$urlop)
    $linkUrl
}
Here's the fun part!
Per the requirement, we need to validate that the URLs Sitecore generates actually work. Any non-functioning URLs shouldn't be included in the final report (only status code 200).
PowerShell lets us make web requests, which we can then check the status of.
All we need to do here is pass in the URL we generated and expect a true or false value in return:
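function IsValidPageStatus($urStr){
    $return = $false
    $HTTP_Response = $null
    try {
        $HTTP_Request = [System.Net.WebRequest]::Create($urStr)
        $HTTP_Response = $HTTP_Request.GetResponse()
        $HTTP_Status = [int]$HTTP_Response.StatusCode
        if ($HTTP_Status -eq 200) {
            $return = $true
        }
        else {
            Write-Host $urStr
            Write-Host "Response: " $HTTP_Status
        }
    }
    catch {
        # GetResponse() throws for 4xx/5xx responses and connection failures
        Write-Host $urStr
        Write-Host "Request failed: " $_.Exception.Message
    }
    finally {
        if ($HTTP_Response) { $HTTP_Response.Close() }
    }
    return $return
}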
(Note: Any page URL that fails will be listed in the console after the script completes.)
Each item whose URL passes this check is added to the array list:
if ($isValidUrl) {
    $script:itemIDsWithPassedCriteria.Add($item) > $null
}
Finally, build out the report, which can then be exported from the PowerShell ISE in CSV/Excel format:
if ($script:itemIDsWithPassedCriteria.Count -eq 0) {
    Write-Warning "No page items found."
}
else {
    $props = @{
        InfoTitle = "Live Page Urls"
        InfoDescription = "Provides a list of all valid page URLs "
        PageSize = 100
    }
    $script:itemIDsWithPassedCriteria |
        Show-ListView @props -Property @{ Label = "Url"; Expression = { Get-ItemUrl ($_) } }
    Close-Window
}
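If the list needs to land in a file directly rather than through the list view's export option, plain PowerShell can write the CSV itself. This is a minimal sketch, not part of the original script: the sample URLs and the output path are placeholders, and in practice the URLs would come from Get-ItemUrl.

```powershell
# Hypothetical URLs standing in for the values Get-ItemUrl would return.
$urls = @(
    'https://www.example.com/',
    'https://www.example.com/about-us',
    'https://www.example.com/products'
)

# Wrap each URL in an object so Export-Csv writes a "Url" column header.
$rows = $urls | ForEach-Object { [pscustomobject]@{ Url = $_ } }

# Hypothetical output location; adjust to a folder the server can write to.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'live-page-urls.csv'

# -NoTypeInformation keeps the "#TYPE" header line out of the file.
$rows | Export-Csv -Path $csvPath -NoTypeInformation
```

The resulting file has a single "Url" column and opens in Excel as-is.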
Here's the full script:
<#
    .SYNOPSIS
        Provides a list report of all valid page URLs
    .AUTHOR
        Written by Gabe Streza
#>

# Variables
$script:pageString = "* Page" #page string

function GetItemsWhichUsePageTemplate() {
    $itemsWithMatchingCondition = Get-ChildItem -Path web:'/sitecore/content/WebsiteName/Home' -Language 'en' -Version * -Recurse
    foreach ($item in $itemsWithMatchingCondition) {
        if ($item.Template.Name -like $script:pageString) {
            $linkUrl = Get-ItemUrl $item
            $isValidUrl = IsValidPageStatus $linkUrl
            if ($isValidUrl) {
                $script:itemIDsWithPassedCriteria.Add($item) > $null # The output of the Add is ignored
            }
        }
    }
}

function Get-ItemUrl($itemToProcess){
    [Sitecore.Context]::SetActiveSite("website")
    $urlop = New-Object ([Sitecore.Links.UrlOptions]::DefaultOptions)
    $urlop.AddAspxExtension = $false
    $urlop.AlwaysIncludeServerUrl = $true
    $linkUrl = [Sitecore.Links.LinkManager]::GetItemUrl($itemToProcess,$urlop)
    $linkUrl
}

function IsValidPageStatus($urStr){
    $return = $false
    $HTTP_Response = $null
    try {
        $HTTP_Request = [System.Net.WebRequest]::Create($urStr)
        $HTTP_Response = $HTTP_Request.GetResponse()
        $HTTP_Status = [int]$HTTP_Response.StatusCode
        if ($HTTP_Status -eq 200) {
            $return = $true
        }
        else {
            Write-Host $urStr
            Write-Host "Response: " $HTTP_Status
        }
    }
    catch {
        # GetResponse() throws for 4xx/5xx responses and connection failures
        Write-Host $urStr
        Write-Host "Request failed: " $_.Exception.Message
    }
    finally {
        if ($HTTP_Response) { $HTTP_Response.Close() }
    }
    return $return
}

$script:itemIDsWithPassedCriteria = New-Object System.Collections.ArrayList
GetItemsWhichUsePageTemplate

if ($script:itemIDsWithPassedCriteria.Count -eq 0) {
    Write-Warning "No page items found."
}
else {
    $props = @{
        InfoTitle = "Live Page Urls"
        InfoDescription = "Provides a list of all valid page URLs "
        PageSize = 100
    }
    $script:itemIDsWithPassedCriteria |
        Show-ListView @props -Property @{ Label = "Url"; Expression = { Get-ItemUrl ($_) } }
    Close-Window
}
Write-Host "Done."
This took about 8 minutes to process a 2,000-page site, which is fine for a one-time run. If this were a report the client would use repeatedly, there are certainly some optimizations we should make to speed it up. For this purpose, we're all set!
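One possible optimization, offered as a sketch rather than something I measured on this site: send a HEAD request with a timeout, so each check asks for headers only instead of downloading the whole page. The function name Test-UrlHead and the 10-second default timeout are my own assumptions, and some servers answer HEAD differently than GET, so spot-check the results before relying on it.

```powershell
function Test-UrlHead {
    param(
        [Parameter(Mandatory)] [string] $Url,
        [int] $TimeoutMs = 10000   # assumed default; tune for the site
    )

    $response = $null
    try {
        $request = [System.Net.WebRequest]::Create($Url)
        $request.Method  = 'HEAD'      # headers only, no response body
        $request.Timeout = $TimeoutMs  # give up on slow pages instead of hanging
        $response = $request.GetResponse()
        return ([int]$response.StatusCode -eq 200)
    }
    catch {
        # Malformed URLs, 4xx/5xx responses, timeouts and connection failures
        # all end up here and count as "not valid".
        return $false
    }
    finally {
        if ($response) { $response.Close() }
    }
}
```

It returns true or false just like IsValidPageStatus, so it could be swapped in where that function is called, for example: $isValidUrl = Test-UrlHead -Url $linkUrl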
Feel free to grab this, tinker with it, and make it your own!
Let me know in the comments if this has helped - or if you have any additional recommendations.