Custom collation with RuleBasedCollator and apostrophe combinations

807580, Mar 22 2010 (edited Mar 22 2010)
Wondering if anyone can help me with a RuleBasedCollator problem (refer to: http://java.sun.com/j2se/1.4.2/docs/api/java/text/RuleBasedCollator.html).

I am working with an indigenous language that has the following dictionary sort:

DESIRED SORT ORDER:
a < 'a < b < c < d < dz < e < 'e < ee < g < gw < h < i < 'i < ii < k < k' < kw < ky < k'w < k'y < etc. < t < t' < ts' < etc.

As the order above shows, the apostrophe sometimes comes before the letter it combines with (as in 'a, 'e, 'i) and sometimes after it (as in k', k'w, t', ts').

There are additional Unicode letters that are also part of this sort order. The following class is used by an XSLT processor (Saxon), and it works well except for words that begin with an apostrophe.
package com.lhtrees.xslt;

import java.text.RuleBasedCollator;
import java.text.ParseException;

public class LangXCollation extends RuleBasedCollator
{
  public LangXCollation() throws ParseException
  {
    super(traditionalLangXRules);
  }

  // Named characters. The first five are not referenced by the rules below.
  private static final String asterisk = "\u002A";            // U+002A ASTERISK
  private static final String equalSign = "\u003D";           // U+003D EQUALS SIGN
  private static final String hyphen = "\u002D";              // U+002D HYPHEN-MINUS
  private static final String diacriticApostrophe = "\u0027"; // U+0027 APOSTROPHE
  private static final String diacriticUnderscore = "\u0331"; // combining macron below
  private static final String smallBarK = "\u1E35";
  private static final String capitalBarK = "\u1E34";
  private static final String smallBarL = "\u0142";
  private static final String capitalBarL = "\u0141";
  private static final String smallTildeL = "\u026B";
  private static final String diacriticTilde = "\u0334";
  private static final String smallUmlautU = "\u00FC";
  private static final String capitalUmlautU = "\u00DC";
  private static final String smallUmlautW = "\u1E85";
  private static final String capitalUmlautW = "\u1E84";

  // Leading entries (before the first '<') are ignorable: hyphen, equals sign, asterisk.
  // Each "< x,X" entry is one letter of the alphabet, lower case then upper case.
  private static final String traditionalLangXRules =
   ("='-';'=';'*' < a,A " +
    "< 'a,'A " +
    "< aa,Aa " +
    "< a" + diacriticUnderscore + " " +
    "< 'a" + diacriticUnderscore + " " +
    "< b,B " +
    "< c,C " +
    "< d,D " +
    "< dz,Dz " +
    "< e,E " +
    "< 'e,'E " +
    "< ee,Ee " +
    "< f,F " +
    "< g,G " +
    "< gw,Gw " +
    "< gy,Gy " +
    "< g" + diacriticUnderscore + ",G" + diacriticUnderscore + " " +
    "< h,H " +
    "< i,I " +
    "< 'i,'I " +
    "< ii,Ii " +
    "< k,K " +
    "< k',K' " +
    "< kw,Kw " +
    "< ky,Ky " +
    "< k'w,K'w " +
    "< k'y,K'y " +
    "< " + smallBarK + "," + capitalBarK + " " +
    "< " + smallBarK + "'" + "," + capitalBarK + "' " +
    "< l,L " +
    "< 'l,'L " +
    "< " + smallBarL + "," + capitalBarL + "," + smallTildeL + "," + "L" + diacriticTilde + " " +
    "< m,M " +
    "< 'm,'M " +
    "< n,N " +
    "< 'n,'N " +
    "< o,O " +
    "< 'o,'O " +
    "< oo,Oo " +
    "< p,P " +
    "< p',P' " +
    "< s,S " +
    "< t,T " +
    "< t',T' " +
    "< ts,Ts " +
    "< ts',Ts' " +
    "< u,U " +
    "< uu,Uu " +
    "< " + smallUmlautU + "," + capitalUmlautU + " " +
    "< " + smallUmlautU + smallUmlautU + "," + capitalUmlautU + capitalUmlautU + " " +
    "< w,W " +
    "< 'w,'W " +
    "< " + smallUmlautW + "," + capitalUmlautW + " " +
    "< '" + smallUmlautW + ",'" + capitalUmlautW + " " +
    "< x,X " +
    "< y,Y " +
    "< 'y,'Y "
   );
}
With these rules, words that begin with an apostrophe all sort at the end of the output instead of being interspersed according to the desired order.

I have tried writing the apostrophe as a separately concatenated string ("'"), and it makes no difference. I have also tried using the diacriticApostrophe variable in its place, and that makes no difference either.
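One thing that may be worth checking here: in java.text.RuleBasedCollator rule strings the single quote is itself the quoting character, so a bare apostrophe in a rule opens a quoted span rather than standing for itself. The sketch below assumes that a literal apostrophe can be written as three single quotes (open quote, the apostrophe, close quote). That is my reading of the JDK rule parser, not something the Javadoc spells out. The class name ApostropheRuleSketch and the cut-down alphabet are made up for illustration only.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

class ApostropheRuleSketch {
    // Assumption: ''' (open quote, apostrophe, close quote) yields one literal
    // apostrophe, so '''a is the two-character contraction apostrophe + a.
    static final String RULES =
        "< a,A < '''a,'''A < b,B < k,K < k''',K''' < kw,Kw";

    // Builds the collator, turning the checked ParseException into an unchecked one.
    static RuleBasedCollator build() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalStateException("rules did not parse", e);
        }
    }

    // Returns a sorted copy of the given words using the custom rules.
    static String[] sorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, build());
        return copy;
    }

    public static void main(String[] args) {
        // Words with the apostrophe before the letter ('ab) and after it (k'a).
        System.out.println(Arrays.toString(
            sorted("kwa", "b", "'ab", "k'a", "ab", "ka")));
    }
}
```

If the assumption holds, the apostrophe-initial word should land between the plain "a" words and the "b" words instead of at either end of the list.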

If I add apostrophe to the beginning of the sort order as in:
private static String traditionalTsimshianRules =
   ("='-';'=';'*' < " + "'" + " < a,A " +
    "< 'a,'A " +
    "< aa,Aa " +
    etc.
then all the apostrophe initial data sorts at the beginning of the file.
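A way to see what the collator actually does with an apostrophe-plus-letter sequence is to count the collation elements it produces for a string. If 'a is recognised as a single letter of the alphabet (a contraction), it should come back as one element. The sketch below uses the standard CollationElementIterator API; the class name CollationElementProbe and the tiny rule string are hypothetical, and the three-quote form for a literal apostrophe is an assumption about the JDK rule parser.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class CollationElementProbe {
    // Counts the collation elements that the given rules produce for the text.
    static int countElements(String rules, String text) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalStateException("rules did not parse", e);
        }
        CollationElementIterator it = collator.getCollationElementIterator(text);
        int count = 0;
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: ''' stands for one literal apostrophe inside the rules.
        String rules = "< a < '''a < b";
        System.out.println("'a -> " + countElements(rules, "'a") + " element(s)");
        System.out.println("ab -> " + countElements(rules, "ab") + " element(s)");
    }
}
```

Running the same probe against the full rule string from the class above would show whether the apostrophe combinations are being treated as single letters or split apart.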

No matter what I have tried, I have not been able to get the DESIRED SORT ORDER.

Any help would be much appreciated.

Thanks,

Larry

Note that -, =, and * are set up as ignorable characters at the start of the rules above.
Post Details
Locked on Apr 19 2010
Added on Mar 22 2010