Here is a script that removes diacritic marks. It appears to work, but I'm not entirely sure what is happening. Writing it involved a fair amount of trial and error, so I would like someone with more experience to look it over. The technique feels clumsy to me. Diacritics should be removed from both extended ANSI and Unicode characters. I couldn't find a way to do this without calling _WinAPI_MultiByteToWideChar twice.
I'm sure I don't need all this code.
[ autoit ]
#include <WinAPI.au3>

#region - Example
Local $sTestString = ""
;For $i = 192 To 255
;    $sTestString &= Chr($i) ; extended latin from Windows 1252 code page
;Next
For $i = 256 To 382
    $sTestString &= ChrW($i) ; Extended latin alpha characters
Next
Local $newString = _StripDiacriticMarks($sTestString)
MsgBox(0, "", $sTestString & @LF & @LF & $newString)
#endregion

Func _StripDiacriticMarks($sText)
    If Not IsString($sText) Then Return SetError(1, 0, $sText)
    Local $sCurrChar, $sSplitChar, $sElement, $sNewString = ""
    For $i = 1 To StringLen($sText)
        $sCurrChar = StringMid($sText, $i, 1)
        $sSplitChar = _WinAPI_MultiByteToWideChar($sCurrChar, 3, $MB_COMPOSITE)
        $sElement = DllStructGetData($sSplitChar, 1)
        If StringIsAlpha($sElement) Then
            $sCurrChar = $sElement
        ElseIf DllStructGetSize($sSplitChar) > 4 Then
            $sSplitChar = _WinAPI_MultiByteToWideChar($sCurrChar, 3, $MB_COMPOSITE, True)
            For $j = 1 To StringLen($sSplitChar)
                $sElement = StringMid($sSplitChar, $j, 1)
                If StringIsAlpha($sElement) Then
                    $sCurrChar = $sElement
                    ExitLoop
                EndIf
            Next
        EndIf
        $sNewString &= $sCurrChar
    Next
    Return $sNewString
EndFunc