Loading MS Connect To A DB–Part 2: Parsing & Loading RSS Data
Tuesday, January 11, 2011 at 6:56AM In Part 1 of the series we set up our database tables ready to load the data, next up is pulling the data from the RSS feed. I’m using PowerShellISE for all this work, it’s a simple and quick tool for building scripts.
Subscribing To The RSS Using PSRSS
The PSRSS module makes it extremely easy to add a new RSS subscription, we just need to call New-Feed with a name and the URL to the XML file.
The URL for the new SQL Server Connect items feed is https://connect.microsoft.com/rss/68/RecentFeedbackForConnection.xml. I decided to give this feed a name of "Connect: SQL Server New" giving us the Powershell command
New-Feed -Name "Connect: SQL Server New" -Url https://connect.microsoft.com/rss/68/RecentFeedbackForConnection.xml
We can confirm that the subscription was successfully created using Get-Feed.
get-feed | select Name | where-object {$_.Name -match "Connect: SQL Server New"}
Reading The RSS
Pulling articles from the RSS subscription is as easy as calling Get-Article along with the feed name.
In order to start working with the data I wanted to easily identify the member parts of the articles. To do this I decided to just pull the first article and look through it’s contents
Get-Article "Connect: SQL Server New" | select -First 1
Looking through the results we see a couple of key items that we are going to want to load into the database:
- Title (Title column in ConnectItems)
- Link (URL column)
- Guid (ID column)
- Description (ItemDescription column)
- Author (Author column)
- Modified (Modified column)
The PubDate would be very useful to have as an article create date however this appears to just be a default value that doesn’t bear any actual relation to the date the Connect Item was added.
It’s great that we have direct mappings for much of the data we want to load, our challenge comes with trying to get the Status, Up-Votes and Down-Votes from the Description.
Parsing The Description Data
Let’s take a closer look at the Description data
Report Builder reports errors with controls i.e. textbox22. But there is no way to find that control.
In BIDS there is a drop down to select a control by. There needs to be something similar in RB.<br><br>Status: Active, 4 Up-Votes, 0 Down-Votes, 0 workarounds, 0 comments, feedback id: 634019
Having looked through a few other articles there was always the <br><br> prior to the information that I wanted to pull. This allowed me to use the substring function of Powershell along with LastIndexOf of to pull just the data highlighted above.
The Substring function works the same way as in SQL, pass in the string along with the starting position you want to read from and the number of characters you want to read.
LastIndexOf will return the position of the final entry of a given string in the data you pass it.
If we use this item as an example the string <br><br> is last seen at position 198
$FeedItem = Get-Article "Connect: SQL Server New" | select -First 1
$FeedItem.Description.LastIndexOf("<br><br>")
This value can be used within the Substring function as a starting position. A simple math function adding 8 will eliminate <br><br> being in the results, and by not passing a second numerical value to Substring we actually get all the data from the position 206 to the end of the Description.
$FeedItem = Get-Article "Connect: SQL Server New" | select -First 1
$FeedItem.Description.Substring($FeedItem.Description.LastIndexOf("<br><br>")+8)
This has given us a new string to work with that we can parse down to get our information.
Status: Active, 4 Up-Votes, 0 Down-Votes, 0 workarounds, 0 comments, feedback id: 634019
Grabbing The Vote Counts
So how to get the counts out of the data? Enter the awesomeness of Regular Expressions.
Using our LastIndexOf and Substring functions again we can now pass this into the regex class and get do some pattern matching to pull data.
In particular we’re only looking for numbers in the string. Using the expression \d{1,5} allows us to look for numerical value between 1 and 5 characters long (download a regular expressions cheat sheet to help understand this).
$FeedItem = Get-Article "Connect: SQL Server New" | select -First 1
$Substr = $FeedItem.Description.Substring($FeedItem.Description.LastIndexOf("<br><br>")+8)
$UpCount = [regex]::Matches($Substr.Substring($Substr.LastIndexOf("Up-Vote")-5, 5), "\d{1,5}") | Select Value
[int]$UpCount.Value
Running this gives us a result of 4. Confirming this against the full string we know we’re getting the right data.
We repeat this changing Up-Vote for Down-Vote to get the Down-Votes from the description.
Another use of substring provides us with the status.
All of these values are loaded into variables for use later.
Loading The Data To The Database
Our Status information as well as Up-Vote and Down-Vote counts are sitting in variables. We also load the full Description, Link, Title, Guid, Author and Modified data into variables as well. We use these to build a SQL insert statement:
$FeedItem = Get-Article "Connect: SQL Server New" | select -First 1
$Substr = $FeedItem.Description.Substring($FeedItem.Description.LastIndexOf("<br><br>")+8) -replace "'", "" -replace "`"", ""
$UpCount = [regex]::Matches($Substr.Substring($Substr.LastIndexOf("Up-Vote")-5, 5), "\d{1,5}") | Select Value
$ItemUpVote = [int]$UpCount.Value
$DownCount = [regex]::Matches($Substr.Substring($Substr.LastIndexOf("Down-Vote")-5, 5), "\d{1,5}") | Select Value
$ItemDownVote = [int]$DownCount.Value
$val = $Substr.Substring(0, $Substr.LastIndexOf("Up-Vote"))
$ItemStatus = $val.Substring(0, $val.LastIndexOf(",")) -replace "Status: ",""
$ItemDescription = $FeedItem.Description.Substring(0, $FeedItem.Description.LastIndexOf("<br><br>")) -replace "'", "" -replace "`"", ""
$ItemURL = $FeedItem.Link
$ItemTitle = $FeedItem.Title -replace "'", "" -replace "`"", ""
[int]$ItemID = $FeedItem.Guid
$ItemAuthor = $FeedItem.Author
$ItemModified = $FeedItem.Modified
$SQLInsert = @"
INSERT INTO [dbo].[ConnectItems_Staging]
([ID],[Title],[URL],[ItemStatus],[UpVoteCount],[DownVoteCount],[ItemDescription],[Author],[Modified])
VALUES
($ItemID,'$ItemTitle','$ItemURL','$ItemStatus',$ItemUpVote,$ItemDownVote,'$ItemDescription','$ItemAuthor','$ItemModified')
"@
$SQLInsert
The produced insert statement is
INSERT INTO [dbo].[ConnectItems_Staging]
([ID],[Title],[URL],[ItemStatus],[UpVoteCount],[DownVoteCount],[ItemDescription],[Author],[Modified])
VALUES
(634019,'Enable controls to be found by name in Report Builder','http://connect.microsoft.com/SQLServer/feedback/details/634019/enable-controls-to-be-found-by-name-in-report-builder', 'Active', 4,0,'Report Builder reports errors with controls i.e. textbox22. But there is no way to find that control.In BIDS there is a drop down to select a control by. There needs to be something similar in RB.','Simon Sabin','12/30/2010 14:50:13')
Key within the script is the multi-lined statement
$SQLInsert = @"
INSERT INTO [dbo].[ConnectItems_Staging]
([ID],[Title],[URL],[ItemStatus],[UpVoteCount],[DownVoteCount],[ItemDescription],[Author],[Modified])
VALUES
($ItemID,'$ItemTitle','$ItemURL','$ItemStatus',$ItemUpVote,$ItemDownVote,'$ItemDescription','$ItemAuthor','$ItemModified')
"@
This doesn’t need to be multi-lined however that does make it a lot easier to read. Creating this is as simple as encapsulating the data between @” and “@.
Note (I’m not sure if this is a bug or deliberate) is that the “@ have to be the first two characters on the line ending the statement.
Hopefully you’re loading the SQL cmdlets with your Powershell profile, if not you’ll need to load those manually in order to work with the database
Add-PSSnapin SqlServerCmdletSnapin100 | out-null
Add-PSSnapin SqlServerProviderSnapin100 | out-null
With these loaded inserting the data is as simple as running invoke-sql.
invoke-sqlcmd -ServerInstance "localhost" -Database "DenaliConnect" -Query "$SQLInsert"
The data is now loaded into the ConnectItems_Staging table. We then called the MERGE proc in order to populate the main table (this helps prevent attempts to load duplicate data).
invoke-sqlcmd -ServerInstance "localhost" -Database "DenaliConnect" -Query "exec dbo.[MergeItems]"
A quick query in SSMS will validates the data is loaded correctly.
Final Steps
Not content with just pulling in the new items RSS feed I also decided to pull the recently modified one as well. This helped keep data consistent and add items that might have been missed.
To simplify management of this I turned all the logic for parsing the data into a function called Get-DenaliConnect and just called the function for each RSS feed.
Get-DenaliConnect "Connect: SQL Server New" https://connect.microsoft.com/rss/68/RecentFeedbackForConnection.xml
Get-DenaliConnect "Connect: SQL Server Modified" https://connect.microsoft.com/rss/68/RecentlyModifiedFeedbackForConnection.xml
You can download the full script here.
Next up we’ll cover keeping the data current.
- Intro
- Part 1 – Database & Powershell setup
- Part 2 – Parsing the RSS & loading the data
- Part 3 – Keeping the data current
- Part 4 – Loading old data


Reader Comments