VBA Convert Foreign/Accent Characters to Plain English Characters

In many cases in can be useful to be able to “sanitize” foreign characters such as accented characters with their equivalent non-accented characters.

For instance, when trying to create a zip file in Windows (SendTo -> Compressed (zipped) folder) as it will complain if files include such characters.

Another use is when creating/naming Folders, such characters have no place (this also includes specials characters and spaces).

For example, I was looking to convert French accented characters into their English counterpart. I didn’t want to use any of the extended ascii characters since they can be the source of potential problems. So I wanted to easily switch

ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðòóôõöùúûüýÿ

to

AAAAAACEEEEIIIIOOOOOUUUUYaaaaaaceeeeiiiioooooouuuuyy

The API Approach

So, how exact could we convert a string?  Well, many years ago (I think back in 2012), I came across the following that I’ve only tweak a little (added error handling and x64 declaration).  Sadly, I don’t remember where I came across it, but I know it can be useful to some.

#If Win64 Then
    'x64 Declarations
    Private Declare PtrSafe Function FoldString Lib "kernel32" Alias "FoldStringA" _
                                                (ByVal dwMapFlags As Byte, _
                                                 ByVal lpSrcStr As LongPtr, _
                                                 ByVal cchSrc As Long, _
                                                 ByVal lpDestStr As LongPtr, _
                                                 ByVal cchdest As Long) As Long
#Else
    'x32 Declaration
    Private Declare Function FoldString Lib "kernel32" Alias "FoldStringA" _
                                        (ByVal dwMapFlags As Byte, _
                                         ByVal lpSrcStr As Long, _
                                         ByVal cchSrc As Long, _
                                         ByVal lpDestStr As Long, _
                                         ByVal cchdest As Long) As Long
#End If

'---------------------------------------------------------------------------------------
' Procedure : SanitizeForeignCharacters
' Author    : Unknown - Found only years ago, contact me should you know who created
'               this code originally
' Modified by:Daniel Pineault, CARDA Consultants Inc.
' Website   : http://www.cardaconsultants.com
' Purpose   : Convert a string by replacing foreign/accented characted with standard
'             letters.
' Copyright : The following is release as Attribution-ShareAlike 4.0 International
'             (CC BY-SA 4.0) - https://creativecommons.org/licenses/by-sa/4.0/
' Req'd Refs: None req'd
'
' Input Variables:
' ~~~~~~~~~~~~~~~~
' sInputString  The string to convert
'
' Usage:
' ~~~~~~
' SanitizeForeignCharacters("LémuçelÇë")    -> LemucelCe
'
' Revision History:
' Rev       Date(yyyy/mm/dd)        Description
' **************************************************************************************
' 1         Unknown                 Initial Release
' 2         2012-08-28              Code cleanup and error handling added
' 3         2017-08-18              Added x64 bit support
' 4         2017-08-19              Corrected the i declaration as Len returns a long
'                                   *** if used in VS then Len returns an integer ***
'---------------------------------------------------------------------------------------
Function SanitizeForeignCharacters(sInputString As String) As String
    On Error GoTo Error_Handler
    Dim i                     As Long 'Integer -> for VS

    SanitizeForeignCharacters = Space(Len(sInputString))
    For i = 0 To Len(sInputString) * 2 - 2 Step 2
        FoldString &H40, StrPtr(sInputString) + i, 1, _
                   StrPtr(SanitizeForeignCharacters) + i, 1
    Next i

Error_Handler_Exit:
    On Error Resume Next
    Exit Function

Error_Handler:
    MsgBox "The following error has occurred" & vbCrLf & vbCrLf & _
           "Error Number: " & Err.Number & vbCrLf & _
           "Error Source: SanitizeForeignCharacters" & vbCrLf & _
           "Error Description: " & Err.Description & _
           Switch(Erl = 0, "", Erl <> 0, vbCrLf & "Line No: " & Erl) _
           , vbOKOnly + vbCritical, "An Error has Occurred!"
    Resume Error_Handler_Exit
End Function

As such it will convert a string such as:

Par exemple, vous souhaiterez peut-être ouvrir un document qui contient une macro qui le centre de gestion de la confidentialité désactive, car le centre de gestion de la confidentialité estime la macro potentiellement dangereux.

would return

Par exemple, vous souhaiterez peut-etre ouvrir un document qui contient une macro qui le centre de gestion de la confidentialite desactive, car le centre de gestion de la confidentialite estime la macro potentiellement dangereux.

The 100% Pure VBA Approach

I’ve used the above for years and it works beautifully. That said, I also know as a developer, it is best to minimize as much as possible the use of things like ActiveX controls, References, and APIs and wanted to create a 100% VBA function to perform the same function. This is what I created for this purpose.

'---------------------------------------------------------------------------------------
' Procedure : RemoveAccents
' Author    : Daniel Pineault, CARDA Consultants Inc.
' Website   : http://www.cardaconsultants.com
' Purpose   : Replace foreign/accented charactes with standard letters within a string
' Copyright : The following is release as Attribution-ShareAlike 4.0 International
'             (CC BY-SA 4.0) - https://creativecommons.org/licenses/by-sa/4.0/
' Req'd Refs: None req'd
'
' Input Variables:
' ~~~~~~~~~~~~~~~~
' sInputString  The string to convert
'
' Usage:
' ~~~~~~
' RemoveAccents("LémuçelÇë")    -> LemucelCe
'
' Revision History:
' Rev       Date(yyyy/mm/dd)        Description
' **************************************************************************************
' 1         2017-08-18              Initial Release
'---------------------------------------------------------------------------------------
Public Function RemoveAccents(sInputString As String) As String
    On Error GoTo Error_Handler
    Dim aNotAllowed()         As String
    Dim aReplacements()       As String
    Dim i                     As Long

    aNotAllowed = Split("À,Á,Â,Ã,Ä,Å,Ç,È,É,Ê,Ë,Ì,Í,Î,Ï,Ò,Ó,Ô,Õ,Ö,Ù,Ú,Û,Ü,Ý,à,á,â,ã" & _
                        ",ä,å,ç,è,é,ê,ë,ì,í,î,ï,ð,ò,ó,ô,õ,ö,ù,ú,û,ü,ý,ÿ", ",")
    aReplacements = Split("A,A,A,A,A,A,C,E,E,E,E,I,I,I,I,O,O,O,O,O,U,U,U,U,Y,a,a,a" & _
                          ",a,a,a,c,e,e,e,e,i,i,i,i,o,o,o,o,o,o,u,u,u,u,y,y", ",")

    If UBound(aNotAllowed) <> UBound(aReplacements) Then
        MsgBox "The number of 'Not allowed characters' does not match the number of" & _
               " 'Replacements'.", vbCritical Or vbOKOnly, "Operation Terminated"
        GoTo Error_Handler_Exit
    End If

    RemoveAccents = sInputString
    For i = 0 To UBound(aNotAllowed)
        RemoveAccents = Replace(RemoveAccents, aNotAllowed(i), aReplacements(i))
    Next i

Error_Handler_Exit:
    On Error Resume Next
    Exit Function

Error_Handler:
    MsgBox "The following error has occurred" & vbCrLf & vbCrLf & _
           "Error Number: " & Err.Number & vbCrLf & _
           "Error Source: RemoveAccents" & vbCrLf & _
           "Error Description: " & Err.Description & _
           Switch(Erl = 0, "", Erl <> 0, vbCrLf & "Line No: " & Erl) _
           , vbOKOnly + vbCritical, "An Error has Occurred!"
    Resume Error_Handler_Exit
End Function

Pay particular attention to the note with states

This requires a Option Compare Binary declaration otherwise string case is ignored.

In reality this function is very versatile because it can be used to replace any characters/terms by simply updating the 2 arrays.

Also, performance testing did not reveal any difference in execution time between the 2 techniques making the choice of using the 100% pure VBA function even more appealing!

So, at the end of the day, we clearly see that there is, yet again, more than one way to perform the same task. Which you choose to use is entirely up to you, but at least now you have some options to explore.

8 responses on “VBA Convert Foreign/Accent Characters to Plain English Characters

  1. azer

    After testing in my French Access 2010 , I have an issue :
    the original string is : “andrééà LémuçelÇë”
    and the result output is : “andrEEA LEmuCelCE”
    note that I added for test the exemple given in the function code : (“LémuçelÇë”) -> LemucelCe.
    I don’t get why all the caracters with accents are replaced by capital letter
    does it have a link with the “the note with states” that I don”t understand ?
    Your function would help me build a search formula that would not require to know the accents in the name or firstname.

    1. Daniel Pineault Post author

      I don’t get the results you describe. I get andreea LemucelCe as the converted string? Not sure exactly what is going on.

      Further testing also shows that I don’t need the declaration after all. Not sure what led me to think I did but, it is working fine without it.

  2. azer

    I reply to my question
    With some research I found a proposal made on the utter access forum that does not have the same CAPITAL issue, it might bright some light ?

    Public Function RemoveAccents2(ByVal inputString As String) As String
    Const accentString As String = “ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÏÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóóôõöùúûüýÿ”
    Const nonAccentStr As String = “AAAAAACEEEEIIIIINOOOOOUUUUYaaaaaaceeeeiiiionoooooouuuuyy”
    Dim i As Integer
    For i = 1 To Len(accentString)
    inputString = Replace(inputString, Mid(accentString, i, 1), Mid(nonAccentStr, i, 1), , , vbBinaryCompare)
    Next i
    RemoveAccents2 = inputString
    End Function

  3. Alex Saavedra

    Useful thread. Here goes yet another version for extended ISO Latin charset:

    Public Function SanitizeText(ByVal inputString As String) As String
        Dim diacriticCode() As String, sanitizedCode() As String, i As Integer
        diacriticCode = Split("192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383", ",")
        sanitizedCode = Split("97,97,97,97,97,97,97,99,101,101,101,101,105,105,105,105,101,110,111,111,111,111,111,111,117,117,117,117,121,112,115,97,97,97,97,97,97,97,99,101,101,101,101,105,105,105,105,101,110,111,111,111,111,111,111,117,117,117,117,121,112,121,97,97,97,97,97,97,99,99,99,99,99,99,99,99,100,100,100,100,101,101,101,101,101,101,101,101,101,101,103,103,103,103,103,103,103,103,104,104,104,104,105,105,105,105,105,105,105,105,105,105,105,105,106,106,107,107,107,108,108,108,108,108,108,108,108,108,108,110,110,110,110,110,110,110,110,110,111,111,111,111,111,111,111,111,114,114,114,114,114,114,115,115,115,115,115,115,115,115,116,116,116,116,116,116,117,117,117,117,117,117,117,117,117,117,117,117,119,119,121,121,121,122,122,122,122,122,122,115", ",")
        For i = LBound(diacriticCode) To UBound(sanitizedCode)
            inputString = Replace(inputString, ChrW(diacriticCode(i)), Chr(sanitizedCode(i)), , , vbBinaryCompare)
        Next i
        SanitizeText = Trim(inputString)
    End Function

    I used the following table for equivalences:
    https://docs.oracle.com/cd/E29584_01/webhelp/mdex_basicDev/src/rbdv_chars_mapping.html

  4. RBD

    you need “OPTION COMPARE BINARY” statement at the top of your function. Microsoft access considers the following two letters as equivalent if you do not have Option Compare Binary declaration:

    A = a evaluates to TRUE

    With Option Compare Binary, A = a evaluates to FALSE

  5. Dennis

    Thanks Daniel,
    could you explain the need of the last comma within brakets in the arrays?
    Is it just a placeholder to add new characters in the array?

    aNotAllowed = Split(“À,…,ÿ”, “,”)
    aReplacements = Split(“A,…,y”, “,”)

    Bye