Java Programming

Regex - how to remove duplicated substrings?

800308Nov 27 2007 — edited Nov 28 2007
Folks,

Is Doc Jam in da' house?

I'm trying to remove duplicate points (i.e. zero-length lines) from the Well Known Text representation of various geometries.

I think this code is "fairly close" to what I'm after... but no bananas yet:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class KrcHarness
{
    /**
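     * Removes consecutive duplicate points (zero-length segments) from a WKT string.
     *
     * @param wkt the Well Known Text to clean
     * @return the WKT with adjacent repeated points collapsed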
     */
    private static String removeDuplicatePointsFromWkt(String wkt) {
        Pattern pattern = Pattern.compile("(\\d+(\\.\\d+)? -\\d+(\\.\\d+)?( \\d+(\\.\\d+)?)?,)\\1");
        while(true) {
            Matcher matcher = pattern.matcher(wkt);
            if(!matcher.find()) break;
            String group = matcher.group();
            String replacement = group.substring(0,group.indexOf(",")+1);
            wkt = matcher.replaceAll(replacement);
            System.out.println("DEBUG: group=\""+group+"\", replacement=\""+replacement+"\", wkt=\""+wkt+"\"");
        }
        return wkt;
    }

    public static void main(String[] args) {
        String actual = "MULTIPOLYGON (((150.4 -15,150.4 -15.45,150 -15.05,150 -15.05,150 -15,150 -15,150.35 -15.35,150.35 -15,150.35 -15,150.4 -15)))";
        System.out.println("actual   : "+actual);
        System.out.println("clean    : "+removeDuplicatePointsFromWkt(actual));
    }

}
I get this output:
actual   : MULTIPOLYGON (((150.4 -15,150.4 -15.45,150 -15.05,150 -15.05,150 -15,150 -15,150.35 -15.35,150.35 -15,150.35 -15,150.4 -15)))
DEBUG: group="150 -15.05,150 -15.05,", replacement="150 -15.05,", wkt="MULTIPOLYGON (((150.4 -15,150.4 -15.45,150 -15.05,150 -15.05,150.35 -15.35,150 -15.05,150.4 -15)))"
DEBUG: group="150 -15.05,150 -15.05,", replacement="150 -15.05,", wkt="MULTIPOLYGON (((150.4 -15,150.4 -15.45,150 -15.05,150.35 -15.35,150 -15.05,150.4 -15)))"
clean    : MULTIPOLYGON (((150.4 -15,150.4 -15.45,150 -15.05,150.35 -15.35,150 -15.05,150.4 -15)))
I need this output (the === markers underline the duplicated points):
Actual : MULTIPOLYGON (((150.4 -15,150.4 -15.45,150 -15.05,150 -15.05,150 -15,150 -15,150.35 -15.35,150.35 -15,150.35 -15,150.4 -15)))
                                                ===================== ===============               =====================
manual : MULTIPOLYGON (((150.4 -15,150.4 -15.45,150 -15.05,150 -15,150.35 -15.35,150.35 -15,150.4 -15)))


I don't understand why it finds the same string twice, and why it doesn't find the second and third repeated groups. I must be missing something basic.
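
One likely explanation, offered as a sketch rather than a definitive diagnosis: `matcher.replaceAll(replacement)` substitutes the same literal string for every match in the input, so the later pairs (`150 -15,150 -15,` and `150.35 -15,150.35 -15,`) are overwritten with the first pair's text, `150 -15.05,`. That recreates the first duplicate, which would be why the loop finds it a second time. Passing the group reference `$1` instead makes each match collapse to its own first copy. The class name `DedupFixSketch` and the leading lookbehind are additions for illustration, not part of the original code.

```java
import java.util.regex.Pattern;

class DedupFixSketch {
    // Same point pattern as in the post, plus a lookbehind (an addition) so a
    // match cannot begin in the middle of a longer number such as "1150".
    private static final Pattern DUPLICATE_PAIR = Pattern.compile(
        "(?<![\\d.])(\\d+(\\.\\d+)? -\\d+(\\.\\d+)?( \\d+(\\.\\d+)?)?,)\\1");

    static String removeDuplicatePointsFromWkt(String wkt) {
        String previous;
        do {
            previous = wkt;
            // "$1" is resolved per match, so every pair keeps its own point.
            wkt = DUPLICATE_PAIR.matcher(wkt).replaceAll("$1");
            // Repeat until stable so runs of three or more copies also collapse.
        } while (!wkt.equals(previous));
        return wkt;
    }

    public static void main(String[] args) {
        String actual = "MULTIPOLYGON (((150.4 -15,150.4 -15.45,150 -15.05,150 -15.05,"
            + "150 -15,150 -15,150.35 -15.35,150.35 -15,150.35 -15,150.4 -15)))";
        System.out.println("clean    : " + removeDuplicatePointsFromWkt(actual));
    }
}
```

Like the original pattern, this sketch still requires a trailing comma after each point and a negative second coordinate, so a duplicate that ends a ring (followed by `)`) is left alone.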

Please, does anyone know how to remove a series of repetitions from a string?
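
For the general case of a series of repetitions, one possible approach is a single pass in which the repeated part is written as `(?:,\1)+`, so a run of any length collapses to one copy. The sketch below assumes points are separated by a bare comma with no following space, and it compares points as text only, so `150.0 -15` and `150 -15` would count as different. The names `WktRunDedupSketch`, `NUM` and `POINT` are hypothetical, and allowing an optional minus sign on every coordinate is an assumption beyond the original pattern.

```java
import java.util.regex.Pattern;

class WktRunDedupSketch {
    // One coordinate: optional sign, digits, optional fraction.
    private static final String NUM = "-?\\d+(?:\\.\\d+)?";
    // One point: two or three space-separated coordinates.
    private static final String POINT = NUM + "(?: " + NUM + "){1,2}";

    // A point that starts right after "(" or ",", is followed by one or more
    // identical copies, and ends right before "," or ")". The lookarounds keep
    // the match from starting or stopping in the middle of a number.
    private static final Pattern DUPLICATE_RUN = Pattern.compile(
        "(?<=[(,])(" + POINT + ")(?:,\\1)+(?=[,)])");

    static String removeDuplicatePoints(String wkt) {
        // Each run of identical points is replaced by its first copy.
        return DUPLICATE_RUN.matcher(wkt).replaceAll("$1");
    }

    public static void main(String[] args) {
        String actual = "MULTIPOLYGON (((150.4 -15,150.4 -15.45,150 -15.05,150 -15.05,"
            + "150 -15,150 -15,150.35 -15.35,150.35 -15,150.35 -15,150.4 -15)))";
        System.out.println("clean    : " + removeDuplicatePoints(actual));
    }
}
```

Because the comma sits inside the repeated part rather than inside the captured group, this version also catches a duplicate that is the last point before a closing parenthesis.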

Thanx for any help... Keith.
Post Details
Locked on Dec 26 2007
Added on Nov 27 2007