How to click on all links in a page with "cURL" or "wget"?

hack3rcon
Posts: 746
Joined: 2015-02-16 09:54
Has thanked: 48 times

How to click on all links in a page with "cURL" or "wget"?

#1 Post by hack3rcon »

Hello,
I want to click on all links in a page with the "cURL" or "wget" tool. I found a curl command, but it shows me the error below:

Code: Select all

$ curl -r -l 2 https://www.TARGET.com/
Warning: Invalid character is found in given range. A specified range MUST 
Warning: have only digits in 'start'-'stop'. The server's response to this 
Warning: request is uncertain.
curl: (7) Couldn't connect to server
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request - Invalid Header</h2>
<hr><p>HTTP Error 400. The request has an invalid header name.</p>
</BODY></HTML>
How can I fix it?

Thank you.

Head_on_a_Stick
Posts: 14114
Joined: 2014-06-01 17:46
Location: London, England
Has thanked: 81 times
Been thanked: 132 times

Re: How to click on all links in a page with "cURL" or "wget"?

#2 Post by Head_on_a_Stick »

Read the man page. And what does "click on all links" mean, exactly? That doesn't seem to make any sense at all in the context of curl or wget. What are you actually trying to do?
deadbang

hack3rcon
Posts: 746
Joined: 2015-02-16 09:54
Has thanked: 48 times

Re: How to click on all links in a page with "cURL" or "wget"?

#3 Post by hack3rcon »

Head_on_a_Stick wrote:Read the man page. And what does "click on all links" mean, exactly? That doesn't seem to make any sense at all in the context of curl or wget. What are you actually trying to do?
Thanks.
Consider a web page that has some links in it. I would like to use wget or cURL to send requests to that page as if I were clicking on all of the links.
I saw https://askubuntu.com/questions/639069/ ... e-webpages, but:

Code: Select all

$ curl -r -l 2 https://www.URL.com/
Warning: Invalid character is found in given range. A specified range MUST 
Warning: have only digits in 'start'-'stop'. The server's response to this 
Warning: request is uncertain.
curl: (7) Couldn't connect to server
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<HTML><HEAD><TITLE>Bad Request</TITLE>
<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>
<BODY><h2>Bad Request - Invalid Header</h2>
<hr><p>HTTP Error 400. The request has an invalid header name.</p>
</BODY></HTML>

reinob
Posts: 1189
Joined: 2014-06-30 11:42
Has thanked: 97 times
Been thanked: 45 times

Re: How to click on all links in a page with "cURL" or "wget"?

#4 Post by reinob »

That's what happens when you blindly type something somebody wrote in some forum.
Read the manual.
-r in wget is not the same as -r in curl.
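In the failing command above, curl apparently took -l as the argument to its -r/--range option, hence the "Invalid character is found in given range" warning. A minimal sketch of the difference between the two flags (example.com is a stand-in for the real site):

Code: Select all

# curl: -r/--range asks for a byte range of ONE resource
$ curl -r 0-499 https://www.example.com/    # fetch only the first 500 bytes

# wget: -r/--recursive follows links; -l/--level limits the depth
$ wget -r -l 2 https://www.example.com/     # fetch the page plus everything it links to, two levels deep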

hack3rcon
Posts: 746
Joined: 2015-02-16 09:54
Has thanked: 48 times

Re: How to click on all links in a page with "cURL" or "wget"?

#5 Post by hack3rcon »

reinob wrote:That's what happens when you blindly type something somebody wrote in some forum.
Read the manual.
-r in wget is not the same as -r in curl.
In cURL:
-r, --range <range> Retrieve only the bytes within RANGE
--raw Do HTTP "raw"; no transfer decoding
I know a wget command like the one below can do it, but it downloads the whole website:

Code: Select all

$ wget -r -p -k http://website
I just want wget to send a request to each link, as if clicking on all of them.

reinob
Posts: 1189
Joined: 2014-06-30 11:42
Has thanked: 97 times
Been thanked: 45 times

Re: How to click on all links in a page with "cURL" or "wget"?

#6 Post by reinob »

hack3rcon wrote:
reinob wrote:That's what happens when you blindly type something somebody wrote in some forum.
Read the manual.
-r in wget is not the same as -r in curl.
In cURL:
-r, --range <range> Retrieve only the bytes within RANGE
--raw Do HTTP "raw"; no transfer decoding
I know a wget command like the one below can do it, but it downloads the whole website:

Code: Select all

$ wget -r -p -k http://website
I just want wget to send a request to each link, as if clicking on all of them.
You're going to have to define "click".
If you mean that every link on a webpage should be requested (GET /.../ HTTP/1.0, etc.) then recursive wget is what you want.

If you have a problem with wget actually storing the downloaded page, then you have another problem to deal with, which is easy enough (you can just wipe the folder when you're done). Or use "-O /dev/null".

If your "click" can also be a HEAD request (instead of a GET request), then you can use "wget --recursive --spider", which will "click" (HEAD) every link without downloading anything.

hack3rcon
Posts: 746
Joined: 2015-02-16 09:54
Has thanked: 48 times

Re: How to click on all links in a page with "cURL" or "wget"?

#7 Post by hack3rcon »

reinob wrote:
hack3rcon wrote:
reinob wrote:That's what happens when you blindly type something somebody wrote in some forum.
Read the manual.
-r in wget is not the same as -r in curl.
In cURL:
-r, --range <range> Retrieve only the bytes within RANGE
--raw Do HTTP "raw"; no transfer decoding
I know a wget command like the one below can do it, but it downloads the whole website:

Code: Select all

$ wget -r -p -k http://website
I just want wget to send a request to each link, as if clicking on all of them.
You're going to have to define "click".
If you mean that every link on a webpage should be requested (GET /.../ HTTP/1.0, etc.) then recursive wget is what you want.

If you have a problem with wget actually storing the downloaded page, then you have another problem to deal with, which is easy enough (you can just wipe the folder when you're done). Or use "-O /dev/null".

If your "click" can also be a HEAD request (instead of a GET request), then you can use "wget --recursive --spider", which will "click" (HEAD) every link without downloading anything.
Thank you.
Consider the https://www.amazon.com/s?k=debian&i=str ... nb_sb_noss URL: you can see a list of books on that page, and I want to use cURL or wget to click on each book on that page. Is that clear?
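For what it's worth, a rough sketch of "clicking" every link on one page: extract the hrefs and send one GET per link, discarding the bodies. This assumes a plain static HTML page (example.com is a placeholder; a heavily scripted site such as Amazon may not expose its product links this way):

Code: Select all

# pull the page, collect every absolute href, request each one once
$ curl -s https://www.example.com/ \
    | grep -o 'href="http[^"]*"' \
    | sed 's/^href="//;s/"$//' \
    | sort -u \
    | while read -r link; do
          curl -s -o /dev/null "$link"
      done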
