
Java Database Connectivity (JDBC)


How to trim String to utf-8 length and avoid GC

ppantera, Mar 24 2011 (edited Mar 24 2011)
Before I write Strings into the database I need to trim them so they won't be wider than the column. But Java Strings are sequences of UTF-16 chars and the column width is measured in UTF-8 bytes, so I need to trim each String to the length it will have once encoded in UTF-8. I've written the following utility to do this:


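// Needs java.io.UnsupportedEncodingException; StringUtils is presumably Apache Commons Lang.
/**
 * Trims str so that its UTF-8 encoding fits in byteLimit bytes.
 * Note that byteLimit == 0 means no limit. Only use this if you're sure the data being written will
 * never be too long.
 */
public static String shortenStringToFitInOracleColumnBytes(String str, int byteLimit) {
    if (byteLimit != 0) {
        if (!StringUtils.isEmpty(str)) {
            try {
                byte[] utf8Bytes = str.getBytes("UTF-8");
                if (utf8Bytes.length > byteLimit) {
                    int newLength = 0;      // prefix length in Java chars (UTF-16 code units)
                    int nextCharOffset = 0; // offset of the next character's lead byte
                    int charBytes;
                    boolean done = false;
                    do {
                        charBytes = utf8CharBytes(utf8Bytes[nextCharOffset]);
                        if (nextCharOffset + charBytes <= byteLimit) {
                            // A 4-byte sequence is a supplementary character, which is a
                            // surrogate pair (2 chars) in the String, so count it twice.
                            newLength += (charBytes == 4) ? 2 : 1;
                            nextCharOffset += charBytes;
                        } else {
                            done = true;
                        }
                    } while (!done);
                    str = str.substring(0, newLength);
                }
            } catch (UnsupportedEncodingException e) {
                // Cannot happen: every Java platform is required to support UTF-8.
                throw new AssertionError(e);
            }
        } else {
            // When a limit applies, an empty or null input is returned as null.
            str = null;
        }
    }
    return str;
}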

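/** Length in bytes of the UTF-8 sequence that starts with the given lead byte. */
private static int utf8CharBytes(int utf8Byte) {
    if ((utf8Byte & 0x80) == 0) {           // 0xxxxxxx: ASCII
        return 1;
    } else if ((utf8Byte & 0xE0) == 0xC0) { // 110xxxxx
        return 2;
    } else if ((utf8Byte & 0xF0) == 0xE0) { // 1110xxxx
        return 3;
    } else {                                // 11110xxx; only valid when called on a lead byte
        return 4;
    }
}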

This works well, but I'd like to do the same thing without calling String.getBytes("UTF-8"). That call encodes the whole String into a new UTF-8 byte array, which I discard right away, so every call leaves garbage for the collector.

There is no version of String.getBytes("UTF-8") where you can re-use the same byte array over and over. I wish there were.
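One possible workaround, offered only as a sketch: java.nio.charset.CharsetEncoder can encode into a caller-supplied ByteBuffer, so the same scratch buffer can be reused across calls. The class name ReusableBufferTrimmer and its maxByteLimit parameter are made up for illustration. It does not reproduce the "byteLimit == 0 means no limit" or empty-to-null behaviour of the utility above, and CharBuffer.wrap still allocates a small wrapper object per call, so it reduces garbage rather than eliminating it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

/** Hypothetical helper: trims by encoding into one reusable scratch buffer. Not thread-safe. */
final class ReusableBufferTrimmer {
    // REPLACE mirrors String.getBytes, which substitutes '?' for unpaired surrogates.
    private final CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    private final ByteBuffer scratch;

    /** maxByteLimit is the largest byteLimit that will ever be passed to trim. */
    ReusableBufferTrimmer(int maxByteLimit) {
        scratch = ByteBuffer.allocate(maxByteLimit);
    }

    /** Returns the longest prefix of str whose UTF-8 encoding fits in byteLimit bytes. */
    String trim(String str, int byteLimit) {
        scratch.clear();
        scratch.limit(byteLimit); // throws IllegalArgumentException if byteLimit > maxByteLimit
        encoder.reset();
        CharBuffer in = CharBuffer.wrap(str);
        // The encoder stops before the first character that does not fit entirely,
        // leaving in.position() at the number of chars it consumed.
        encoder.encode(in, scratch, true);
        return in.hasRemaining() ? str.substring(0, in.position()) : str;
    }
}
```

Because the encoder and buffer hold state, one instance per thread (or per writer object) would be needed.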

What I need is a utility where I can send it a char and it tells me how many bytes that char would require in its UTF-8 representation.
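A minimal sketch of such a utility, assuming it is acceptable to work on code points rather than single chars (a supplementary character occupies two chars in the String but one 4-byte sequence in UTF-8). The names Utf8Lengths, utf8Length and truncateToUtf8Bytes are invented for this example, and the "byteLimit == 0 means no limit" and empty-to-null handling of the original utility are left out.

```java
/** Hypothetical helper: trims a String to a UTF-8 byte budget without encoding it. */
final class Utf8Lengths {

    private Utf8Lengths() {}

    /** Number of bytes the given Unicode code point occupies in UTF-8. */
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;      // ASCII
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;   // rest of the Basic Multilingual Plane
        return 4;                            // supplementary planes
    }

    /** Longest prefix of str whose UTF-8 encoding fits in byteLimit bytes. */
    static String truncateToUtf8Bytes(String str, int byteLimit) {
        int bytes = 0;
        int i = 0;
        while (i < str.length()) {
            // codePointAt combines a valid surrogate pair into one code point.
            // An unpaired surrogate is counted as 3 bytes here, although
            // String.getBytes would write a 1-byte '?', so the count errs on the safe side.
            int cp = str.codePointAt(i);
            int len = utf8Length(cp);
            if (bytes + len > byteLimit) {
                return str.substring(0, i);  // cut before the character that does not fit
            }
            bytes += len;
            i += Character.charCount(cp);    // 1 char, or 2 for a surrogate pair
        }
        return str;                          // already fits; nothing allocated
    }
}
```

With this approach the only allocation is the substring itself, and only when the String actually has to be shortened.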
Post Details
Locked on Apr 21 2011
Added on Mar 24 2011