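Scraping images from wallhaven with a Java crawler
Required jar: jsoup
The wallhaven site refuses access from Java programs, so the User-Agent request header must be spoofed to look like a browser's.
When sending the request: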
Connection con = Jsoup.connect(imgSrc)
.header("User-agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36")
.timeout(10*1000); // spoofed User-Agent header above; 10-second timeout
When opening the download connection:
connection.setRequestProperty("User-agent", "Mozilla/4.0"); // spoof the User-Agent header
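The download step can be sketched roughly as follows, using only the standard library. This is a minimal sketch rather than the code from the listing in this post: the class name `UserAgentDownload`, the helper names `openWithUserAgent` and `download`, and the 10-second timeouts are all assumptions made for illustration. The User-Agent string is the same one used in the Jsoup request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not part of the original source listing.
class UserAgentDownload {
    // Browser-like User-Agent; any current browser string should do.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36";

    // Prepares a connection with the spoofed header.
    // No network traffic happens until the stream is requested.
    static URLConnection openWithUserAgent(String url) {
        try {
            URLConnection connection = new URL(url).openConnection();
            connection.setRequestProperty("User-Agent", USER_AGENT);
            connection.setConnectTimeout(10 * 1000); // assumed timeout
            connection.setReadTimeout(10 * 1000);    // assumed timeout
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streams the response body into a local file, overwriting it if present.
    static void download(String url, Path target) throws IOException {
        URLConnection connection = openWithUserAgent(url);
        try (InputStream in = connection.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: UserAgentDownload <image-url> <target-file>");
            return;
        }
        download(args[0], Paths.get(args[1]));
    }
}
```

The header has to be set before the stream is opened, because `setRequestProperty` throws `IllegalStateException` once the connection is established.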
Viewing the request headers (using Google Chrome as an example):
F12 → Network → XHR → click any entry under Name → Request Headers
Source code 👇
package spider;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.regex.*;
public class Robot {
public static void