Continuing with my PowerShell Series, today, I decided to play around with PowerShell to see if it could simplify working with XML content.
It took a little playing around with PowerShell to figure things out, but once you understand the basics, it become quite evident that, once again, PowerShell is definitely a most capable XML parser.
This article focuses on simply reading and extracting information from an XML file, but PowerShell also offers a vast array of XML manipulation method. So although not covered in this article, the option definitely exists. Perhaps that will be the subject of another post in the future.
Web Content vs File Content
The first thing to understand is that working with web XML is initiated differently that when working with a XML file.
Web XML
To load XML coming from the web you would use a command like
$doc = (New-Object System.Net.WebClient).DownloadString("YourURL")
File XML
To load XML coming from a file on your PC/server you would use a command like
$file = 'YourFilePathAndFilename' $doc = New-Object System.XML.XMLDocument $doc.Load( $file )
Once that part is done, working with the XML content is done in the same manner.
Extracting Information From XML
For the following example I am using the W3School plant catalog sample xml.
Extract a List of Plants
So, let’s say we wanted to extract a list of Common plant names from the XML, we could use a PowerShell command such as:
$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml')
$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "}
$result = $combined -join ','
Write-Host $result
Thus, we can create the following function that will return a delimiter separated string of value and then break the string into an array of values as shown below.
From the Web
Function ReadXML(sNode As String) As Variant
Dim sCmd As String
Dim sItems As String
sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
"$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "}" & vbCrLf & _
"$result = $combined -join '~~' "
sItems = PS_GetOutput(sCmd)
ReadXML = Split(sItems, "~~")
End Function
From a Local File
Function ReadXML(sNode As String) As Variant
Dim sCmd As String
Dim sItems As String
sCmd = "$file = 'C:\Users\Dev\Documents\plant_catalog.xml' " & vbCrLf & _
"$doc = New-Object System.XML.XMLDocument " & vbCrLf & _
"$doc.Load( $file ) " & vbCrLf & _
"$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "} " & vbCrLf & _
"$result = $combined -join '~~'"
sItems = PS_GetOutput(sCmd)
ReadXML = Split(sItems, "~~")
End Function
Regardless of whether you are reading XML from a web server or a file, we can test both versions out by simply doing:
Public Sub ListItems()
'Output the list of returned values to VBA immediate window
Dim aItems() As String
Dim vItem As Variant
Dim i As Long
aItems() = ReadXML("Common")
For Each vItem In aItems
i = i + 1
Debug.Print i, vItem
Next
End Sub
which will then return a list like
1 Bloodroot 2 Columbine 3 Marsh Marigold 4 Cowslip 5 Dutchman's-Breeches 6 Ginger, Wild 7 Hepatica 8 Liverleaf 9 Jack-In-The-Pulpit 10 Mayapple 11 Phlox, Woodland 12 Phlox, Blue 13 Spring-Beauty 14 Trillium 15 Wake Robin 16 Violet, Dog-Tooth 17 Trout Lily 18 Adder's-Tongue 19 Anemone 20 Grecian Windflower 21 Bee Balm 22 Bergamot 23 Black-Eyed Susan 24 Buttercup 25 Crowfoot 26 Butterfly Weed 27 Cinquefoil 28 Primrose 29 Gentian 30 Blue Gentian 31 Jacob's Ladder 32 Greek Valerian 33 California Poppy 34 Shooting Star 35 Snakeroot 36 Cardinal Flower
Extract a List of Zones
As you can see, the function has a single argument, the element to return, so we can extract the Zone, instead of the Common name by simply doing
Public Sub ListItems()
'Output the list of returned values to VBA immediate window
Dim aItems() As String
Dim vItem As Variant
Dim i As Long
aItems() = ReadXML("Zone")
For Each vItem In aItems
i = i + 1
Debug.Print i, vItem
Next
End Sub
It’s that simple, change one word, and the same code gets you what you’re after.
Unique Values
If you use the above function to get a Zone listing back from the XML, you’ll see a long list of duplicate values. So what is one to do if you simply want a list of unique value?
Yes, you could use VBA and iterate through the array to produce a new filtered array, or some other VBA manipulation. That’s all fine, but why not simply let PowerShell do all the lifting for you!
PowerShell has a Select-object -unique command for just such a need.
So, with a simple addition to the intial PowerShell command we can produce a unique listing of values. Thus, the function then becomes
Function ReadXML(sNode As String) As Variant
Dim sCmd As String
Dim sItems As String
sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
"$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "} | Select-object -unique" & vbCrLf & _
"$result = $combined -join ',' "
sItems = PS_GetOutput(sCmd)
ReadXML = Split(sItems, ",")
End Function
and if you now execute a procedure like
Public Sub ListItems()
'Output the list of returned values to VBA immediate window
Dim aItems() As String
Dim vItem As Variant
Dim i As Long
aItems() = ReadXML("Zone")
For Each vItem In aItems
i = i + 1
Debug.Print i, vItem
Next
End Sub
you now get an output of
1 4 2 3 3 7 4 5 5 6 6 Annual 7 3 - 5 8 2
Sorting Values
Now the above is looking good, as we got rid of all the duplicates, but it still appears a little messy and can be hard to review as it is in whatever order it was input in the XML. Wouldn’t it be nice if we could order it nicely to simplify reviewing.
Once again, yes we could get into using some bubble sort routine, or some other array code, but why do so and complicate your life when PowerShell already has everything already available to use to perform such sorting.
PowerShell has a Sort-Object command for just such a need.
So, with another simple addition to our PowerShell command we can produce a unique and sorted listing of values. Thus, the function then becomes:
Function ReadXML(sNode As String) As Variant
Dim sCmd As String
Dim sItems As String
sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
"$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "} | Select-object -unique | Sort-Object" & vbCrLf & _
"$result = $combined -join ',' "
sItems = PS_GetOutput(sCmd)
ReadXML = Split(sItems, ",")
End Function
and now when we run the ListItems sub we get:
1 2 2 3 3 3 - 5 4 4 5 5 6 6 7 7 8 Annual
What if I want a Descending ordered list you may ask. Well, nothing could be easier we simple add the -Descending parameter to Sort-Object, thus the function becomes:
Function ReadXML(sNode As String) As Variant
Dim sCmd As String
Dim sItems As String
sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
"$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "} | Select-object -unique | Sort-Object -Descending" & vbCrLf & _
"$result = $combined -join ',' "
Debug.Print sCmd
sItems = PS_GetOutput(sCmd)
ReadXML = Split(sItems, ",")
End Function
which results in a Descending Unique listing of values
1 Annual 2 7 3 6 4 5 5 4 6 3 - 5 7 3 8 2
Extract XML From a WordPress Feed
As another example, I figured I’d demonstrate extracting a listing of the titles of posts from my website RSS feed. The concept is always the same. In the example below I’ve combined everything into a single procedure (just for the sake of simplicity)
Function ReadRSSXML() As Variant
Dim sCmd As String
Dim sItems As String
Dim aItems() As String
Dim vItem As Variant
Dim i As Long
sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.devhut.net/feed') " & vbCrLf & _
"$combined = $doc.SelectNodes('//rss/channel/item') | ForEach-Object {$_.title} " & vbCrLf & _
"$result = $combined -join '~~' "
sItems = PS_GetOutput(sCmd)
aItems() = Split(sItems, "~~")
For Each vItem In aItems
i = i + 1
Debug.Print i, vItem
Next
End Function
and this outputs the list of my last 10 posts on this website to the VBA immediate window.
As with all things in programming I thought I’d also illustrate that the line
$combined = $doc.SelectNodes('//rss/channel/item') | ForEach-Object {$_.title}
could also be
$combined = $doc.GetElementsByTagName('item') | ForEach-Object {$_.title}
or
$combined = $doc.rss.channel.item | ForEach-Object {$_.title}
so there are a number of ways to bind to the proper element(s).
Filtering XML Data
It’s nice to get the full content of the XML file, as shown above, but what about filtering the data and only returning what I actually am looking for!
Once again, PowerShell already has everything we need under the hood. No need to get into VBA array manipulations… like in SQL, we can simply build and apply a Where clause to our PowerShell command.
Single Criterion
So let say we want to return a list of Common plant names for Zone 4, and for good measure let us also sort it, we would then do something like:
Function ReadXML(sNode As String) As Variant
Dim sCmd As String
Dim sItems As String
sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
"$combined = $doc.SelectNodes('CATALOG/PLANT')| Where{$_.Zone -eq '4'} | ForEach-Object {$_." & sNode & "} | Sort-Object" & vbCrLf & _
"$result = $combined -join '~~' "
sItems = PS_GetOutput(sCmd)
ReadXML = Split(sItems, "~~")
End Function
running the ListItems procedure will then produce a result like:
1 Adder's-Tongue 2 Bee Balm 3 Bergamot 4 Bloodroot 5 Blue Gentian 6 Buttercup 7 Cowslip 8 Crowfoot 9 Gentian 10 Hepatica 11 Jack-In-The-Pulpit 12 Liverleaf 13 Marsh Marigold 14 Trout Lily 15 Violet, Dog-Tooth
Multiple Criteria
Example 1 – AND
Taking things a little further to illustrate adding more than one criterion to the Where clause, let’s say we want a listing of all the Zone 4 plants that have a Light of ‘Mostly Sunny’, then the would do something like:
Function ReadXML(sNode As String) As Variant
Dim sCmd As String
Dim sItems As String
sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
"$combined = $doc.SelectNodes('CATALOG/PLANT')| Where{$_.Zone -eq '4' -and $_.Light -eq 'Mostly Sunny'} | ForEach-Object {$_." & sNode & "} | Sort-Object" & vbCrLf & _
"$result = $combined -join '~~' "
sItems = PS_GetOutput(sCmd)
ReadXML = Split(sItems, "~~")
End Function
running the ListItems procedure will then produce a result like:
1 Marsh Marigold
Example 2 – Or, In, Match
Another example would be if we wanted to pull a list of plants that are for Zones 5 Or 7, the function for that could look like
Function ReadXML(sNode As String) As Variant
Dim sCmd As String
Dim sItems As String
sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
"$combined = $doc.SelectNodes('CATALOG/PLANT')| Where{$_.Zone -eq '5' -or $_.Zone -eq '7'} | ForEach-Object {$_." & sNode & "} | Sort-Object" & vbCrLf & _
"$result = $combined -join '~~' "
sItems = PS_GetOutput(sCmd)
ReadXML = Split(sItems, "~~")
End Function
Or we could have done
Function ReadXML(sNode As String) As Variant
Dim sCmd As String
Dim sItems As String
sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
"$combined = $doc.SelectNodes('CATALOG/PLANT')| Where{$_.Zone -in '5', '7'} | ForEach-Object {$_." & sNode & "} | Sort-Object" & vbCrLf & _
"$result = $combined -join '~~' "
sItems = PS_GetOutput(sCmd)
ReadXML = Split(sItems, "~~")
End Function
resulting in results like
1 Spring-Beauty 2 Trillium 3 Wake Robin
Notice how by filter using the equality (-eq or -in) the results are both omitting the Primrose whose zone is 3-5. If we wanted to include that in our resultset, we could instead do something like
Function ReadXML(sNode As String) As Variant
Dim sCmd As String
Dim sItems As String
sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
"$combined = $doc.SelectNodes('CATALOG/PLANT')| Where{$_.Zone -match '[5,7]'} | ForEach-Object {$_." & sNode & "} | Sort-Object" & vbCrLf & _
"$result = $combined -join '~~' "
sItems = PS_GetOutput(sCmd)
ReadXML = Split(sItems, "~~")
End Function
resulting in results like
1 Primrose 2 Spring-Beauty 3 Trillium 4 Wake Robin
So when creating your Where filter you need to know what type of data you are working with and be sure to validate by testing.
Simplicity of PowerShell in VBA
I hope you are starting to see how simple this all is once you understand the basics. The basic VBA function is always the same, the only thing we are actually changing is the PowerShell command. So by adding, removing, modifying those 3 little lines that constitute the PowerShell command we effectively control what is returned. That is the power of PowerShell.
A Special Word of Caution
I’m sure you’ve noticed that the way most of the commands work is that I use PowerShell to build and return a list of values and then convert the list to an array in VBA for further use. Be very careful what separator you choose to use in PowerShell. I have found a comma to be problematic and sometimes commas can naturally be found in the content being returned. As such, I’d advise you to use a character or sequence of characters that will never naturally occur, things like: ~-~, /%$/ and as always, test, test and test some more.
A Few Resources on the Subject
- VBA – Run PowerShell Command
- ForEach-Object (Microsoft.PowerShell.Core) – PowerShell | Microsoft Docs
- Select-Object (Microsoft.PowerShell.Utility) – PowerShell | Microsoft Docs
- Sort-Object (Microsoft.PowerShell.Utility) – PowerShell | Microsoft Docs
- about Comparison Operators – PowerShell | Microsoft Docs
- W3Schools Plant Catalog Sample XML
VBA will live forever!