Base40 encode and decode
962952 Sep 18 2012 (edited Sep 18 2012)
Hi,
I'm looking to compress (and decompress) some already short strings (think about the same amount of text as a search engine might show for a result).
One of the most effective ways of compressing short strings seems to be base 40 encoding as found in the accepted answer here: http://stackoverflow.com/questions/7389252/shorten-an-already-short-string-in-java
At least against my data, it seems to outperform LZF and Smaz.
However, I can't for the life of me figure out how to decode it. I've even found encode and decode implementations in C, but my C isn't good enough to port them to Java: http://www.drdobbs.com/embedded-systems/slimming-strings-with-custom-base-40-pac/229400732
To show willing, here's one of many attempts at writing a method to decode it:
public String unpack(byte[] input) { //FIXME: No workie.
    ByteArrayInputStream bois = new ByteArrayInputStream(input);
    DataInputStream dis = new DataInputStream(bois);
    StringBuilder sb = new StringBuilder();
    char a,b,c;
    try {
        while ((a = dis.readChar()) != '\0' && (b = dis.readChar()) != '\0' && (c = dis.readChar()) != '\0') {
            sb.append(chars.charAt(a % 40));
            sb.append(chars.charAt(b / 40 % 40));
            sb.append(chars.charAt(c / 40 / 40));
        }
    } catch (IOException e) {
        throw new AssertionError(e);
    }
    return sb.toString();
}
Could anyone help me out? Thanks in advance.
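One likely problem with the attempt above: each readChar() call consumes a full 16-bit value, so the loop reads three packed values per pass and takes only one digit from each. Reaching the end of the stream also raises an EOFException, which ends up as the AssertionError. Below is a minimal sketch of a matching pack/unpack pair. It assumes three characters are packed into one unsigned big-endian 16-bit value as c0 + 40*c1 + 1600*c2, which is what the arithmetic in the attempt suggests but has not been checked against the linked answer. The class name Base40Sketch, the CHARS alphabet, and the space-padding rule are all hypothetical choices for illustration; the alphabet and digit order must match whatever the real encoder uses.

```java
import java.nio.ByteBuffer;

final class Base40Sketch {
    // Assumed 40-symbol alphabet (space at index 0); must match the encoder's table.
    static final String CHARS = " abcdefghijklmnopqrstuvwxyz0123456789.,-";

    // Packs three symbols into each 16-bit value: c0 + 40*c1 + 1600*c2 (max 63999).
    static byte[] pack(String s) {
        int groups = (s.length() + 2) / 3;
        ByteBuffer out = ByteBuffer.allocate(groups * 2); // big-endian by default
        for (int i = 0; i < groups * 3; i += 3) {
            int v = index(s, i) + 40 * index(s, i + 1) + 1600 * index(s, i + 2);
            out.putShort((short) v);
        }
        return out.array();
    }

    // Alphabet index of s[i]; positions past the end are padded with index 0 (space).
    private static int index(String s, int i) {
        if (i >= s.length()) {
            return 0;
        }
        int idx = CHARS.indexOf(s.charAt(i));
        if (idx < 0) {
            throw new IllegalArgumentException("Not in alphabet: " + s.charAt(i));
        }
        return idx;
    }

    // Reads ONE 16-bit value per group and splits it into three base-40 digits.
    static String unpack(byte[] input) {
        if (input.length % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of bytes");
        }
        ByteBuffer in = ByteBuffer.wrap(input);
        StringBuilder sb = new StringBuilder(input.length / 2 * 3);
        while (in.remaining() >= 2) {
            int v = in.getShort() & 0xFFFF; // treat the short as unsigned
            if (v >= 64000) {
                throw new IllegalArgumentException("Not a valid base-40 triple: " + v);
            }
            sb.append(CHARS.charAt(v % 40));
            sb.append(CHARS.charAt(v / 40 % 40));
            sb.append(CHARS.charAt(v / 1600));
        }
        // Drop padding. Assumption: genuine trailing spaces are not preserved.
        int end = sb.length();
        while (end > 0 && sb.charAt(end - 1) == ' ') {
            end--;
        }
        return sb.substring(0, end);
    }
}
```

With these assumptions, Base40Sketch.pack("hello world") produces 8 bytes for the 11 input characters, and Base40Sketch.unpack(...) on that array gives back "hello world". If the real encoder stores the first character in the high digit instead, the three append lines would need to be reversed.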