Confused about Tokenizing with Regex

807601 · Jun 24 2008 (edited Jun 25 2008)
I'm using a regex pattern to tokenize a String.
The code runs fine, but I'm curious about the output.
Here's the code:
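public class Test2 {
    public static void main(String[] args) {
        // args[0] is the regex used as the delimiter, args[1] is the String to tokenize
        String[] tokens = args[1].split(args[0]);
        for (String s : tokens) {
            // Angle brackets make empty and whitespace-only tokens visible
            System.out.println("Token: >" + s + "<");
        }
    }
}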
My code prints angle brackets around each token so that empty and whitespace-only tokens are visible.
Here is my command-line invocation, where args[0] is the regex pattern to use and args[1] is the source String:
java Test2 "\d*" "cY 39r k"
The output was:
Token: ><
Token: >c<
Token: >Y<
Token: > <
Token: ><
Token: >r<
Token: > <
Token: >k<
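To reproduce the run without the shell, the same split call can be made with the arguments hard-coded. This is a minimal sketch; SplitDemo and tokenize() are illustrative names, not part of the original program. In Java source the regex \d* has to be written as "\\d*".

```java
import java.util.Arrays;

public class SplitDemo {
    // Illustrative helper: the same call Test2 makes, with the arguments passed directly
    static String[] tokenize(String regex, String input) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // "\\d*" in source is the regex \d* (zero or more digits)
        String[] tokens = tokenize("\\d*", "cY 39r k");
        for (String s : tokens) {
            System.out.println("Token: >" + s + "<");
        }
        System.out.println(tokens.length + " tokens: " + Arrays.toString(tokens));
    }
}
```

On JDK 8 and later this prints seven tokens, without the leading >< shown above: JDK 8 changed String.split so that a zero-width match at the very beginning of the input no longer produces an empty leading token. On JDK 7 and earlier the output matches the listing above.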
Am I right in saying that index 0 holds 'c', which is a delimiter because it is not a digit, so an empty String >< is printed? Index 1 holds 'Y', which is also a delimiter because it is not a digit, so >c< is printed. Index 2 holds a space, which is not a digit either, so it should count as a delimiter too. But then why isn't >cY< printed? Instead it prints a space > <, which is the delimiter itself. I would have expected >cY< here.
I read the Java tutorial on searching with regex, and if this were a search, I can see (off the top of my head) that the output would be:
"" @ start 0, end 0
"" @ start 1, end 1
"" @ start 2, end 2
"39" @ start 3, end 5
"" @ start 5, end 5
"" @ start 6, end 6
"" @ start 7, end 7
"" @ start 8, end 8

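The search listing above can be checked with java.util.regex.Matcher, whose find(), group(), start() and end() methods report each match in turn. A minimal sketch, with FindDemo and matches() as illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDemo {
    // Collects every match of regex in input as "text"@start-end
    static List<String> matches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add("\"" + m.group() + "\"@" + m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : matches("\\d*", "cY 39r k")) {
            System.out.println(s);
        }
    }
}
```

For "cY 39r k" this reports the same eight matches as the listing: empty matches at 0, 1 and 2, "39" from 3 to 5, then empty matches at 5, 6, 7 and 8. After an empty match, find() resumes one position later, which is why \d* finds an empty match at almost every index.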
I just don't understand what's going on when the above regex is used as the delimiter for tokenizing.
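Tokenizing can be viewed in terms of those same matches: split(regex) treats each match as a delimiter, returns the text between the end of one match and the start of the next, and drops trailing empty strings. The following is a simplified sketch of that idea, not the JDK's actual source; ManualSplit is an illustrative name. With \d*, an empty match sits before 'c', before 'Y' and before the space, so each of those characters ends up as its own token instead of acting as a delimiter, and the >< token comes from the gap between the "39" match (ending at 5) and the empty match at 5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ManualSplit {
    // Simplified sketch of split(regex) with limit 0: a token is the text
    // between the end of one match and the start of the next match.
    static List<String> split(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int index = 0; // end of the previous match
        while (m.find()) {
            tokens.add(input.substring(index, m.start()));
            index = m.end();
        }
        tokens.add(input.substring(index)); // text after the last match
        // Like split(regex), drop trailing empty tokens
        while (!tokens.isEmpty() && tokens.get(tokens.size() - 1).isEmpty()) {
            tokens.remove(tokens.size() - 1);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String s : split("\\d*", "cY 39r k")) {
            System.out.println("Token: >" + s + "<");
        }
    }
}
```

This sketch keeps the leading empty token, reproducing the JDK 7-era output in the question; String.split on JDK 8 and later omits a leading empty token produced by a zero-width match at index 0.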
Please help!
Thank you
Post Details
Locked on Jul 23 2008
Added on Jun 24 2008