
Java Programming


Domains and regex

807603, Dec 27 2007 (edited Dec 28 2007)
Hi,
I need to extract different parts of a URL. In particular, I want the pair "second-level domain" + "top-level domain".
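I've used the following pattern to extract the domain (subdomain + second-level domain + top-level domain) along with other information such as the port and the query parameters:
Pattern pattern = Pattern.compile("\\b((https?)://([-a-zA-Z0-9.]+)(:[0-9]*)?(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?)", Pattern.CASE_INSENSITIVE); // the path and query classes list only A-Z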
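A minimal runnable sketch that applies this pattern to the example URL and prints every group, which makes it easy to see that group 3 holds the full host name. The class name UrlGroups and the helper host are illustrative, not from the post. The pattern is compiled with Pattern.CASE_INSENSITIVE because the path and query character classes list only uppercase letters; without the flag, group 6 would match just "?" for this URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlGroups {
    // The pattern from the post. CASE_INSENSITIVE is needed because the
    // path and query character classes only list A-Z, not a-z.
    static final Pattern URL = Pattern.compile(
            "\\b((https?)://([-a-zA-Z0-9.]+)(:[0-9]*)?"
            + "(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?)",
            Pattern.CASE_INSENSITIVE);

    /** Hypothetical helper: returns group 3 (the full host) of the first URL found, or null. */
    static String host(String text) {
        Matcher m = URL.matcher(text);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        Matcher m = URL.matcher("http://www.google.com:8080/?a=b&c=d");
        if (m.find()) {
            // Print each capturing group with its number.
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println(i + ": " + m.group(i));
            }
        }
    }
}
```

For the example URL this prints group 3 as www.google.com, group 4 as :8080, group 5 as / and group 6 as ?a=b&c=d, which confirms the behaviour described below.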

For example, the third group extracts "www.google.com" from a URL like "http://www.google.com:8080/?a=b&c=d", but what I actually want is "google.com".
Does anybody have advice on how to do this? (As you can tell, I'm not a regex expert.)
Thanks a lot

ny
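One possible approach, sketched under the assumption that "second-level + top-level domain" simply means the last two dot-separated labels of the host. The class and method names are hypothetical. byRegex splits the host group of the original pattern so that a new nested group (group 4) captures only the last two labels; byUri skips regex for the host and lets java.net.URI parse it, then keeps the last two labels. Both return "google.com" for the example URL.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondLevelDomain {
    // Variant of the original pattern. Groups: 1 = whole URL, 2 = scheme,
    // 3 = full host, 4 = last two labels only, 5 = port, 6 = path, 7 = query.
    // (?:label\.)* consumes leading labels; backtracking leaves exactly two for group 4.
    static final Pattern URL = Pattern.compile(
            "\\b((https?)://((?:[-a-zA-Z0-9]+\\.)*([-a-zA-Z0-9]+\\.[-a-zA-Z0-9]+))"
            + "(:[0-9]*)?(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?)",
            Pattern.CASE_INSENSITIVE);

    /** Regex approach: returns the last two host labels, or null if nothing matches. */
    static String byRegex(String url) {
        Matcher m = URL.matcher(url);
        return m.find() ? m.group(4) : null;
    }

    /** Non-regex approach: java.net.URI extracts the host, then we keep its last two labels. */
    static String byUri(String url) {
        String host = URI.create(url).getHost(); // throws IllegalArgumentException on bad syntax
        if (host == null) {
            return null;
        }
        String[] labels = host.split("\\.");
        if (labels.length < 2) {
            return host; // e.g. "localhost": nothing to strip
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        String url = "http://www.google.com:8080/?a=b&c=d";
        System.out.println(byRegex(url));
        System.out.println(byUri(url));
    }
}
```

Taking the last two labels is wrong for public suffixes such as co.uk: both methods return "co.uk" for "http://mail.example.co.uk/". Handling those correctly requires the Public Suffix List, for example through Guava's InternetDomainName class.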
Post Details
Locked on Jan 25 2008
Added on Dec 27 2007