Read XML with VBA and PowerShell

XML

Continuing with my PowerShell Series, today, I decided to play around with PowerShell to see if it could simplify working with XML content.

It took a little playing around with PowerShell to figure things out, but once you understand the basics, it become quite evident that, once again, PowerShell is definitely a most capable XML parser.

This article focuses on simply reading and extracting information from an XML file, but PowerShell also offers a vast array of XML manipulation method. So although not covered in this article, the option definitely exists. Perhaps that will be the subject of another post in the future.

Web Content vs File Content

The first thing to understand is that working with web XML is initiated differently that when working with a XML file.

Web XML

To load XML coming from the web you would use a command like

$doc = (New-Object System.Net.WebClient).DownloadString("YourURL")

File XML

To load XML coming from a file on your PC/server you would use a command like

$file = 'YourFilePathAndFilename' 
$doc = New-Object System.XML.XMLDocument 
$doc.Load( $file )

Once that part is done, working with the XML content is done in the same manner.

Extracting Information From XML

For the following example I am using the W3School plant catalog sample xml.

Extract a List of Plants

So, let’s say we wanted to extract a list of Common plant names from the XML, we could use a PowerShell command such as:

$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml')
$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "}
$result = $combined -join ','
Write-Host $result

Thus, we can create the following function that will return a delimiter separated string of value and then break the string into an array of values as shown below.

From the Web

Function ReadXML(sNode As String) As Variant
    Dim sCmd                  As String
    Dim sItems                As String

    sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
           "$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "}" & vbCrLf & _
           "$result = $combined -join '~~' "
    sItems = PS_GetOutput(sCmd)
    ReadXML = Split(sItems, "~~")
End Function

From a Local File

Function ReadXML(sNode As String) As Variant
    Dim sCmd                  As String
    Dim sItems                As String

    sCmd = "$file = 'C:\Users\Dev\Documents\plant_catalog.xml' " & vbCrLf & _
           "$doc = New-Object System.XML.XMLDocument " & vbCrLf & _
           "$doc.Load( $file ) " & vbCrLf & _
           "$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "} " & vbCrLf & _
           "$result = $combined -join '~~'"
    sItems = PS_GetOutput(sCmd)
    ReadXML = Split(sItems, "~~")
End Function

Regardless of whether you are reading XML from a web server or a file, we can test both versions out by simply doing:

Public Sub ListItems()
'Output the list of returned values to VBA immediate window
    Dim aItems()              As String
    Dim vItem                 As Variant
    Dim i                     As Long

    aItems() = ReadXML("Common")
    For Each vItem In aItems
        i = i + 1
        Debug.Print i, vItem
    Next
End Sub

which will then return a list like

 1            Bloodroot
 2            Columbine
 3            Marsh Marigold
 4            Cowslip
 5            Dutchman's-Breeches
 6            Ginger, Wild
 7            Hepatica
 8            Liverleaf
 9            Jack-In-The-Pulpit
 10           Mayapple
 11           Phlox, Woodland
 12           Phlox, Blue
 13           Spring-Beauty
 14           Trillium
 15           Wake Robin
 16           Violet, Dog-Tooth
 17           Trout Lily
 18           Adder's-Tongue
 19           Anemone
 20           Grecian Windflower
 21           Bee Balm
 22           Bergamot
 23           Black-Eyed Susan
 24           Buttercup
 25           Crowfoot
 26           Butterfly Weed
 27           Cinquefoil
 28           Primrose
 29           Gentian
 30           Blue Gentian
 31           Jacob's Ladder
 32           Greek Valerian
 33           California Poppy
 34           Shooting Star
 35           Snakeroot
 36           Cardinal Flower

Extract a List of Zones

As you can see, the function has a single argument, the element to return, so we can extract the Zone, instead of the Common name by simply doing

Public Sub ListItems()
'Output the list of returned values to VBA immediate window
    Dim aItems()              As String
    Dim vItem                 As Variant
    Dim i                     As Long

    aItems() = ReadXML("Zone")
    For Each vItem In aItems
        i = i + 1
        Debug.Print i, vItem
    Next
End Sub

It’s that simple, change one word, and the same code gets you what you’re after.

Unique Values

If you use the above function to get a Zone listing back from the XML, you’ll see a long list of duplicate values. So what is one to do if you simply want a list of unique value?

Yes, you could use VBA and iterate through the array to produce a new filtered array, or some other VBA manipulation. That’s all fine, but why not simply let PowerShell do all the lifting for you!

PowerShell has a Select-object -unique command for just such a need.

So, with a simple addition to the intial PowerShell command we can produce a unique listing of values. Thus, the function then becomes

Function ReadXML(sNode As String) As Variant
    Dim sCmd                  As String
    Dim sItems                As String

    sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
           "$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "} | Select-object -unique" & vbCrLf & _
           "$result = $combined -join ',' "
    sItems = PS_GetOutput(sCmd)
    ReadXML = Split(sItems, ",")
End Function

and if you now execute a procedure like

Public Sub ListItems()
'Output the list of returned values to VBA immediate window
    Dim aItems()              As String
    Dim vItem                 As Variant
    Dim i                     As Long

    aItems() = ReadXML("Zone")
    For Each vItem In aItems
        i = i + 1
        Debug.Print i, vItem
    Next
End Sub

you now get an output of

 1            4
 2            3
 3            7
 4            5
 5            6
 6            Annual
 7            3 - 5
 8            2

Sorting Values

Now the above is looking good, as we got rid of all the duplicates, but it still appears a little messy and can be hard to review as it is in whatever order it was input in the XML. Wouldn’t it be nice if we could order it nicely to simplify reviewing.

Once again, yes we could get into using some bubble sort routine, or some other array code, but why do so and complicate your life when PowerShell already has everything already available to use to perform such sorting.

PowerShell has a Sort-Object command for just such a need.

So, with another simple addition to our PowerShell command we can produce a unique and sorted listing of values. Thus, the function then becomes:

Function ReadXML(sNode As String) As Variant
    Dim sCmd                  As String
    Dim sItems                As String

    sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
           "$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "} | Select-object -unique | Sort-Object" & vbCrLf & _
           "$result = $combined -join ',' "
    sItems = PS_GetOutput(sCmd)
    ReadXML = Split(sItems, ",")
End Function

and now when we run the ListItems sub we get:

 1            2
 2            3
 3            3 - 5
 4            4
 5            5
 6            6
 7            7
 8            Annual

What if I want a Descending ordered list you may ask. Well, nothing could be easier we simple add the -Descending parameter to Sort-Object, thus the function becomes:

Function ReadXML(sNode As String) As Variant
    Dim sCmd                  As String
    Dim sItems                As String

    sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
           "$combined = $doc.SelectNodes('CATALOG/PLANT') | ForEach-Object {$_." & sNode & "} | Select-object -unique | Sort-Object -Descending" & vbCrLf & _
           "$result = $combined -join ',' "
    Debug.Print sCmd
    sItems = PS_GetOutput(sCmd)
    ReadXML = Split(sItems, ",")
End Function

which results in a Descending Unique listing of values

 1            Annual
 2            7
 3            6
 4            5
 5            4
 6            3 - 5
 7            3
 8            2

Extract XML From a WordPress Feed

As another example, I figured I’d demonstrate extracting a listing of the titles of posts from my website RSS feed. The concept is always the same. In the example below I’ve combined everything into a single procedure (just for the sake of simplicity)

Function ReadRSSXML() As Variant
    Dim sCmd                  As String
    Dim sItems                As String
    Dim aItems()              As String
    Dim vItem                 As Variant
    Dim i                     As Long

    sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.devhut.net/feed') " & vbCrLf & _
           "$combined = $doc.SelectNodes('//rss/channel/item') | ForEach-Object {$_.title}  " & vbCrLf & _
           "$result = $combined -join '~~' "
    sItems = PS_GetOutput(sCmd)
    aItems() = Split(sItems, "~~")
    For Each vItem In aItems
        i = i + 1
        Debug.Print i, vItem
    Next
End Function

and this outputs the list of my last 10 posts on this website to the VBA immediate window.

As with all things in programming I thought I’d also illustrate that the line

$combined = $doc.SelectNodes('//rss/channel/item') | ForEach-Object {$_.title}

could also be

$combined = $doc.GetElementsByTagName('item') | ForEach-Object {$_.title} 

or

$combined = $doc.rss.channel.item | ForEach-Object {$_.title}

so there are a number of ways to bind to the proper element(s).

Filtering XML Data

It’s nice to get the full content of the XML file, as shown above, but what about filtering the data and only returning what I actually am looking for!

Once again, PowerShell already has everything we need under the hood. No need to get into VBA array manipulations… like in SQL, we can simply build and apply a Where clause to our PowerShell command.

Single Criterion

So let say we want to return a list of Common plant names for Zone 4, and for good measure let us also sort it, we would then do something like:

Function ReadXML(sNode As String) As Variant
    Dim sCmd                  As String
    Dim sItems                As String

    sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
           "$combined = $doc.SelectNodes('CATALOG/PLANT')| Where{$_.Zone -eq '4'}  | ForEach-Object {$_." & sNode & "} | Sort-Object" & vbCrLf & _
           "$result = $combined -join '~~' "
    sItems = PS_GetOutput(sCmd)
    ReadXML = Split(sItems, "~~")
End Function

running the ListItems procedure will then produce a result like:

 1            Adder's-Tongue
 2            Bee Balm
 3            Bergamot
 4            Bloodroot
 5            Blue Gentian
 6            Buttercup
 7            Cowslip
 8            Crowfoot
 9            Gentian
 10           Hepatica
 11           Jack-In-The-Pulpit
 12           Liverleaf
 13           Marsh Marigold
 14           Trout Lily
 15           Violet, Dog-Tooth

Multiple Criteria

Example 1 – AND

Taking things a little further to illustrate adding more than one criterion to the Where clause, let’s say we want a listing of all the Zone 4 plants that have a Light of ‘Mostly Sunny’, then the would do something like:

Function ReadXML(sNode As String) As Variant
    Dim sCmd                  As String
    Dim sItems                As String

    sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
           "$combined = $doc.SelectNodes('CATALOG/PLANT')| Where{$_.Zone -eq '4' -and $_.Light -eq 'Mostly Sunny'}  | ForEach-Object {$_." & sNode & "} | Sort-Object" & vbCrLf & _
           "$result = $combined -join '~~' "
    sItems = PS_GetOutput(sCmd)
    ReadXML = Split(sItems, "~~")
End Function

running the ListItems procedure will then produce a result like:

 1            Marsh Marigold

Example 2 – Or, In, Match

Another example would be if we wanted to pull a list of plants that are for Zones 5 Or 7, the function for that could look like

Function ReadXML(sNode As String) As Variant
    Dim sCmd                  As String
    Dim sItems                As String

    sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
           "$combined = $doc.SelectNodes('CATALOG/PLANT')| Where{$_.Zone -eq '5' -or $_.Zone -eq '7'}  | ForEach-Object {$_." & sNode & "} | Sort-Object" & vbCrLf & _
           "$result = $combined -join '~~' "
    sItems = PS_GetOutput(sCmd)
    ReadXML = Split(sItems, "~~")
End Function

Or we could have done

Function ReadXML(sNode As String) As Variant
    Dim sCmd                  As String
    Dim sItems                As String

    sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
           "$combined = $doc.SelectNodes('CATALOG/PLANT')| Where{$_.Zone -in '5', '7'}  | ForEach-Object {$_." & sNode & "} | Sort-Object" & vbCrLf & _
           "$result = $combined -join '~~' "
    sItems = PS_GetOutput(sCmd)
    ReadXML = Split(sItems, "~~")
End Function

resulting in results like

 1            Spring-Beauty
 2            Trillium
 3            Wake Robin

Notice how by filter using the equality (-eq or -in) the results are both omitting the Primrose whose zone is 3-5. If we wanted to include that in our resultset, we could instead do something like

Function ReadXML(sNode As String) As Variant
    Dim sCmd                  As String
    Dim sItems                As String

    sCmd = "$doc = (New-Object System.Net.WebClient).DownloadString('https://www.w3schools.com/xml/plant_catalog.xml') " & vbCrLf & _
           "$combined = $doc.SelectNodes('CATALOG/PLANT')| Where{$_.Zone -match '[5,7]'}  | ForEach-Object {$_." & sNode & "} | Sort-Object" & vbCrLf & _
           "$result = $combined -join '~~' "
    sItems = PS_GetOutput(sCmd)
    ReadXML = Split(sItems, "~~")
End Function

resulting in results like

 1            Primrose
 2            Spring-Beauty
 3            Trillium
 4            Wake Robin

So when creating your Where filter you need to know what type of data you are working with and be sure to validate by testing.

Simplicity of PowerShell in VBA

I hope you are starting to see how simple this all is once you understand the basics. The basic VBA function is always the same, the only thing we are actually changing is the PowerShell command. So by adding, removing, modifying those 3 little lines that constitute the PowerShell command we effectively control what is returned. That is the power of PowerShell.

A Special Word of Caution

I’m sure you’ve noticed that the way most of the commands work is that I use PowerShell to build and return a list of values and then convert the list to an array in VBA for further use. Be very careful what separator you choose to use in PowerShell. I have found a comma to be problematic and sometimes commas can naturally be found in the content being returned. As such, I’d advise you to use a character or sequence of characters that will never naturally occur, things like: ~-~, /%$/ and as always, test, test and test some more.

A Few Resources on the Subject

One response on “Read XML with VBA and PowerShell