VBA – RegEx – Retrieve HTML Img Src MIME Type

I was working on some HTML automation recently and needed to extract the MIME type from an HTML img src attribute value.

I could have done this with Left, Right, Mid, InStr, Len, etc., but I thought I’d stretch my VBA RegEx legs a little and create a simple, reusable function to do the job.

In case it can help anyone else out there, here it is!

'---------------------------------------------------------------------------------------
' Procedure : RegEx_HTML_Img_GetSrcMIMEType
' Author    : Daniel Pineault, CARDA Consultants Inc.
' Website   : http://www.cardaconsultants.com
' Purpose   : Extract the image MIME type from the img src
' Copyright : The following is released under the Attribution-ShareAlike 4.0 International
'             (CC BY-SA 4.0) - https://creativecommons.org/licenses/by-sa/4.0/
' Req'd Refs: Early Binding -> Microsoft VBScript Regular Expressions X.X
'             Late Binding  -> None required
' References:
'
' Input Variables:
' ~~~~~~~~~~~~~~~~
' sImgSrc   : The HTML img src attribute value
'
' Usage:
' ~~~~~~
' ? RegEx_HTML_Img_GetSrcMIMEType("data:image/png;base64,iVBORw0KGgo")
'   Returns -> png
'
' Revision History:
' Rev       Date(yyyy-mm-dd)        Description
' **************************************************************************************
' 1         2023-02-25              Initial Release
'---------------------------------------------------------------------------------------
Function RegEx_HTML_Img_GetSrcMIMEType(ByVal sImgSrc As String) As String
    On Error GoTo Error_Handler
    #Const RegEx_EarlyBind = False   'True => Early Binding / False => Late Binding
    #If RegEx_EarlyBind = True Then
        Dim oRegEx            As VBScript_RegExp_55.RegExp
        Dim oMatches          As VBScript_RegExp_55.MatchCollection

        Set oRegEx = New VBScript_RegExp_55.RegExp
    #Else
        Dim oRegEx            As Object
        Dim oMatches          As Object

        Set oRegEx = CreateObject("VBScript.RegExp")
    #End If

    With oRegEx
        .Pattern = "data:image\/(.*?);" 'Extract src -> src\s*=\s*"([^"]+)"
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
        Set oMatches = .Execute(sImgSrc)
    End With
    If oMatches.Count <> 0 Then _
       RegEx_HTML_Img_GetSrcMIMEType = oMatches(0).SubMatches(0)

Error_Handler_Exit:
    On Error Resume Next
    Set oMatches = Nothing
    Set oRegEx = Nothing
    Exit Function

Error_Handler:
    MsgBox "The following error has occurred" & vbCrLf & vbCrLf & _
           "Error Number: " & Err.Number & vbCrLf & _
           "Error Source: RegEx_HTML_Img_GetSrcMIMEType" & vbCrLf & _
           "Error Description: " & Err.Description & _
           Switch(Erl = 0, "", Erl <> 0, vbCrLf & "Line No: " & Erl) _
           , vbOKOnly + vbCritical, "An Error has Occurred!"
    Resume Error_Handler_Exit
End Function

It is very straightforward to use: simply pass the src attribute value to the function and it returns the image's MIME subtype (for example, png for image/png).

Usage Example

? RegEx_HTML_Img_GetSrcMIMEType("data:image/png;base64,iVBORw0KGgo...")

will return a value of:

png
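The function expects the src value itself, not the whole img tag. The comment next to the .Pattern line hints at a pattern for pulling the src out of a tag first. Below is a minimal sketch of such a helper, assuming late binding and double-quoted attributes. The name RegEx_HTML_Img_GetSrc is hypothetical and not part of the original code. Be aware that the pattern also matches attributes ending in "src" (such as data-src), and it ignores single-quoted values.

```vba
'Hypothetical helper (not part of the original article): extracts the src attribute
'value from a full <img> tag so it can be passed to RegEx_HTML_Img_GetSrcMIMEType.
'Late binding, so no reference is required. Only double-quoted values are handled.
Function RegEx_HTML_Img_GetSrc(ByVal sImgTag As String) As String
    Dim oRegEx                As Object
    Dim oMatches              As Object

    Set oRegEx = CreateObject("VBScript.RegExp")
    With oRegEx
        .Pattern = "src\s*=\s*""([^""]+)"""   'i.e. src\s*=\s*"([^"]+)"
        .Global = False                       'Only the first src is needed
        .IgnoreCase = True
        Set oMatches = .Execute(sImgTag)
    End With
    'SubMatches(0) holds the text captured between the quotes
    If oMatches.Count <> 0 Then RegEx_HTML_Img_GetSrc = oMatches(0).SubMatches(0)

    Set oMatches = Nothing
    Set oRegEx = Nothing
End Function
```

Chained together in the Immediate Window:

? RegEx_HTML_Img_GetSrcMIMEType(RegEx_HTML_Img_GetSrc("<img alt=""logo"" src=""data:image/png;base64,iVBORw0KGgo"">"))

should return png.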


Using Plain Vanilla VBA

As I said earlier, we can obviously extract this information using standard string manipulation functions. Below is one way to accomplish it:

'---------------------------------------------------------------------------------------
' Procedure : HTML_Img_GetSrcMIMEType
' Author    : Daniel Pineault, CARDA Consultants Inc.
' Website   : http://www.cardaconsultants.com
' Purpose   : Extract the image MIME type from the img src
' Copyright : The following is released under the Attribution-ShareAlike 4.0 International
'             (CC BY-SA 4.0) - https://creativecommons.org/licenses/by-sa/4.0/
' Req'd Refs: None required
'
' Input Variables:
' ~~~~~~~~~~~~~~~~
' sImgSrc   : The HTML img src attribute value
'
' Usage:
' ~~~~~~
' ? HTML_Img_GetSrcMIMEType("data:image/png;base64,iVBORw0KGgo")
'   Returns -> png
'
' Revision History:
' Rev       Date(yyyy-mm-dd)        Description
' **************************************************************************************
' 1         2023-02-25              Initial Release
'---------------------------------------------------------------------------------------
Function HTML_Img_GetSrcMIMEType(ByVal sImgSrc As String) As String
    On Error GoTo Error_Handler

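    Dim lPos                  As Long

    'Only process image data URIs, e.g. data:image/png;base64,...
    If LCase(Left(sImgSrc, 11)) <> "data:image/" Then GoTo Error_Handler_Exit
    sImgSrc = Mid(sImgSrc, 12)                'Drop the leading "data:image/"
    lPos = InStr(sImgSrc, ";")                'The subtype ends at the first ";"
    If lPos > 1 Then HTML_Img_GetSrcMIMEType = Left(sImgSrc, lPos - 1)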

Error_Handler_Exit:
    On Error Resume Next
    Exit Function

Error_Handler:
    MsgBox "The following error has occurred" & vbCrLf & vbCrLf & _
           "Error Number: " & Err.Number & vbCrLf & _
           "Error Source: HTML_Img_GetSrcMIMEType" & vbCrLf & _
           "Error Description: " & Err.Description & _
           Switch(Erl = 0, "", Erl <> 0, vbCrLf & "Line No: " & Erl) _
           , vbOKOnly + vbCritical, "An Error has Occurred!"
    Resume Error_Handler_Exit
End Function

Usage Example

? HTML_Img_GetSrcMIMEType("data:image/png;base64,iVBORw0KGgo...")

will also return a value of:

png

My goal here was to expand my knowledge of RegEx and have a little fun. That said, if performance is the only consideration, the plain vanilla VBA approach is faster than the RegEx one, most likely because it does not have to create a RegExp object and execute a pattern on every call. Ultimately, the choice is yours!
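If you want to check the speed difference on your own machine, the following is a rough timing sketch. It assumes both functions above are in the same project. The procedure name Compare_GetSrcMIMEType_Speed and the default iteration count are arbitrary choices, not from the original article. VBA's Timer has limited resolution and resets at midnight, so use a large iteration count and treat the numbers as indicative only.

```vba
'Hypothetical timing harness (not part of the original article).
'Assumes RegEx_HTML_Img_GetSrcMIMEType and HTML_Img_GetSrcMIMEType both exist.
'Timer resets at midnight, so avoid running this across midnight.
Sub Compare_GetSrcMIMEType_Speed(Optional ByVal lIterations As Long = 10000)
    Const sTEST_SRC           As String = "data:image/png;base64,iVBORw0KGgo"
    Dim lCounter              As Long
    Dim sResult               As String
    Dim dStart                As Double

    'Time the RegEx version
    dStart = Timer
    For lCounter = 1 To lIterations
        sResult = RegEx_HTML_Img_GetSrcMIMEType(sTEST_SRC)
    Next lCounter
    Debug.Print "RegEx     : " & Format(Timer - dStart, "0.000") & " s (" & sResult & ")"

    'Time the plain string-function version
    dStart = Timer
    For lCounter = 1 To lIterations
        sResult = HTML_Img_GetSrcMIMEType(sTEST_SRC)
    Next lCounter
    Debug.Print "Plain VBA : " & Format(Timer - dStart, "0.000") & " s (" & sResult & ")"
End Sub
```

Run it from the Immediate Window with, for example, Compare_GetSrcMIMEType_Speed 50000 and compare the two elapsed times printed.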