Server returned HTTP response code: 403 for URL
843834, Jun 25 2002 (edited May 3 2007)
Hi everyone, this is a question concerning both Java (specifically the URL class) and Google. I've written a web-bot program, but it has trouble with the results pages that www.google.com (or, in my case, www.google.ca) returns after someone submits a search.
I think it has something to do with hidden input fields, because my program works fine with many other search engines on the web (for example, www.altavista.com). That is, my program can load the results page that AltaVista returns when a user searches through its search engine, then parse that page and traverse the links on it.
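The load, parse, and traverse loop described above depends on pulling the links out of each results page. Below is a minimal sketch of just the link-extraction step, assuming the page HTML is already held in a String. The class name LinkExtractor and its regex are hypothetical (not taken from the original program), and a regex is only a rough stand-in for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Naive pattern: matches <a ...> tags and captures a single- or
    // double-quoted href value. It misses unquoted attributes and can be
    // fooled by markup inside comments or scripts.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns every href value in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<p><a href=\"http://example.com/a\">A</a> "
                + "<A class='x' HREF='/b'>B</A></p>";
        System.out.println(extractLinks(page)); // prints [http://example.com/a, /b]
    }
}
```

Relative links such as /b would still need resolving against the page's own URL (for example with java.net.URI.resolve) before a bot could follow them.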
Can someone suggest why the Java URL class returns the following when I pass the URL of a Google search results page to the URL constructor and then call a method on the resulting URL object (for example, getContent())?
Server returned HTTP response code: 403 for URL
I realize that 403 means:
Forbidden. Access to this URL is forbidden.
The call throws an IOException, and the stack trace includes:
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:691)
at java.net.URLConnection.getContent(URLConnection.java:582)
at java.net.URL.getContent(URL.java:969)
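One way to narrow this down is to skip URL.getContent() and use an HttpURLConnection directly: getResponseCode() reports the status without throwing, and getErrorStream() gives access to the body of the 403 page. The sketch below does that. It also sets a User-Agent header, on the assumption (not confirmed in this thread) that the server may be refusing requests that carry Java's default User-Agent. The class name FetchDiagnostics and the agent string are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class FetchDiagnostics {
    // Hypothetical browser-like User-Agent; an assumption, not from the post.
    static final String BROWSER_AGENT = "Mozilla/5.0 (compatible; MyWebBot/0.1)";

    // openConnection() only creates the connection object; no request is
    // sent yet, so headers can still be set here.
    static HttpURLConnection prepare(String address, String userAgent) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(address).toURL().openConnection();
            conn.setRequestProperty("User-Agent", userAgent);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends the request and returns the status code instead of letting
    // getInputStream() throw an IOException on a 4xx response.
    static int fetchStatus(HttpURLConnection conn) throws IOException {
        int code = conn.getResponseCode();
        if (code == HttpURLConnection.HTTP_FORBIDDEN) {
            // The error page body (if any) may explain the refusal.
            InputStream err = conn.getErrorStream();
            if (err != null) {
                try (BufferedReader r = new BufferedReader(
                        new InputStreamReader(err, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = r.readLine()) != null) {
                        System.err.println(line);
                    }
                }
            }
        }
        return code;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FetchDiagnostics <url>");
            return;
        }
        HttpURLConnection conn = prepare(args[0], BROWSER_AGENT);
        System.out.println("HTTP " + fetchStatus(conn));
    }
}
```

Running it once with BROWSER_AGENT and once with a plain agent string against the same search URL would show whether the header alone changes the response.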
Again, I have to emphasize that this problem doesn't occur when I use the search results pages from other search engines (like AltaVista).
Also, here is one of the URLs that I tried for the Google search results:
http://www.google.ca/search?q=dog&hl=en&ie=UTF-8&oe=UTF8
This search results page was generated by entering "dog" as the search text and pressing the "Google Search" button.
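A bot that issues its own searches would need to build URLs like the one above from a search term. Below is a minimal sketch, assuming the hl, ie, and oe parameters can simply be copied from that URL (their meaning is not explained in this post); the class SearchUrlBuilder is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SearchUrlBuilder {
    // Rebuilds the results-page URL from the post for an arbitrary query.
    // The extra parameters are copied verbatim, not derived.
    static String googleSearchUrl(String query) {
        try {
            // URLEncoder applies HTML form encoding: spaces become '+',
            // reserved characters such as '+' become %2B.
            String q = URLEncoder.encode(query, "UTF-8");
            return "http://www.google.ca/search?q=" + q + "&hl=en&ie=UTF-8&oe=UTF8";
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this cannot happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://www.google.ca/search?q=dog&hl=en&ie=UTF-8&oe=UTF8
        System.out.println(googleSearchUrl("dog"));
    }
}
```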
And here is the page generated by AltaVista for the same search:
http://www.altavista.com/sites/search/web?q=dog&pg=q&avkw=tgz&kl=XX
which works fine.
Thanks for any and all suggestions,
Tim