I just stumbled over an issue with Unicode characters in passwords.
I extracted my machine's account password with the Windows LSARetrievePrivateData API, called through the Win32 Python Extensions. The result was a Unicode string with one catch: it contained the character '\ude09', a low surrogate with no high surrogate in front of it. I don't know whether this is a Python issue, an issue with the auto-generated password, or something else. Either way, the password is not a valid Unicode string.
With this string, pre-authentication in JGSS fails because the UTF-16LE encoder in sun.security.krb5.internal.crypto.dk.DkCrypto#charToUtf16 rejects the unpaired surrogate and substitutes the "error" bytes FD FF, which is the replacement character U+FFFD in little-endian byte order.
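For anyone who wants to reproduce the substitution outside of JGSS, here is a minimal sketch. It assumes that DkCrypto#charToUtf16 goes through the JDK's regular UTF-16LE charset encoder, which I have not verified against the source; the class and helper names (LoneSurrogateDemo, encodeWithCharset, toHex) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class LoneSurrogateDemo {
    // Assumption: this mirrors what DkCrypto#charToUtf16 does internally,
    // i.e. it encodes through the JDK's standard UTF-16LE charset encoder.
    static byte[] encodeWithCharset(char[] chars) {
        // String.getBytes(Charset) replaces malformed input with the
        // charset's default replacement bytes instead of throwing.
        return new String(chars).getBytes(StandardCharsets.UTF_16LE);
    }

    // Hypothetical helper: renders bytes as upper-case hex for inspection.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a', an unpaired low surrogate, 'b'
        char[] password = {'a', '\ude09', 'b'};
        System.out.println(toHex(encodeWithCharset(password)));
    }
}
```

If the assumption holds, the two bytes in the middle of the output are not 09 DE (the surrogate itself) but the FD FF described above, so the key derived from the password no longer matches the one on the server.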
If, however, I use the following encoding instead, authentication against our PDC works fine:
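    // needs java.nio.ByteBuffer and java.nio.ByteOrder
    static byte[] charToUtf16(char[] chars) {
        // Two bytes per char, low byte first; surrogates are not validated.
        ByteBuffer buffer = ByteBuffer.allocate(2 * chars.length).order(ByteOrder.LITTLE_ENDIAN);
        buffer.asCharBuffer().put(chars);
        return buffer.array();
    }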
This copies the 16-bit code units as they are, without regard to surrogates, and may be closer to what the RFC describes:
"Each Windows UNICODE character is encoded in little-endian format of 2 octets each."
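As a sanity check that the raw copy does not change anything for ordinary passwords, the sketch below compares it with the JDK's UTF-16LE encoder on a well-formed string (including a proper surrogate pair) and then prints what it produces for the lone surrogate. The class name RawUtf16Check and the sample strings are only illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RawUtf16Check {
    // Same raw code-unit copy as proposed in this post.
    static byte[] charToUtf16(char[] chars) {
        ByteBuffer buffer = ByteBuffer.allocate(2 * chars.length).order(ByteOrder.LITTLE_ENDIAN);
        buffer.asCharBuffer().put(chars);
        return buffer.array();
    }

    public static void main(String[] args) {
        // Well-formed input: ASCII, a Latin-1 letter and a valid surrogate pair.
        char[] wellFormed = "p\u00E4\uD83D\uDE00".toCharArray();
        byte[] raw = charToUtf16(wellFormed);
        byte[] viaCharset = new String(wellFormed).getBytes(StandardCharsets.UTF_16LE);
        System.out.println("identical for well-formed input: " + Arrays.equals(raw, viaCharset));

        // Unpaired low surrogate: the raw copy keeps the code unit as 09 DE.
        char[] lone = {'\ude09'};
        System.out.println(Arrays.toString(charToUtf16(lone)));
    }
}
```

So the two approaches should only differ for input that is not valid UTF-16 in the first place, which is exactly the case I ran into.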
Maybe someone who is deeper into this than I am can judge whether DkCrypto should be changed.
Thanks
Matthias