明薄淡.说传.
Has anyone used curl to scrape Baidu Zhidao (百度知道)?
$url = "https://zhidao.baidu.com/search?word=%CA%C7%D5%E6%B5%C4%C2%F0";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;exit;
Why can't I fetch anything?
szzl0z0z.槽吐.
What about setting a longer timeout? @淡薄明
明薄淡.说传.
That doesn't work either.
It's probably not a timing problem; Baidu Zhidao doesn't load slowly in a browser.
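One way to narrow this down is to look at what cURL itself reports rather than only echoing the body. The following is a minimal sketch, not the approach anyone in this thread used: `fetchWithDiagnostics` and `describeResult` are hypothetical helper names, and following redirects is an assumption about what the page may need.

```php
<?php
// Hypothetical helper: run the request and collect what cURL observed.
function fetchWithDiagnostics(string $url, int $timeout = 5): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // assumption: the site may redirect
    $body = curl_exec($ch);

    // The handle is released when $ch goes out of scope.
    return [
        'ok'        => $body !== false,
        'errno'     => curl_errno($ch),
        'error'     => curl_error($ch),
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'length'    => $body === false ? 0 : strlen($body),
    ];
}

// Hypothetical helper: turn the collected values into a one-line summary.
function describeResult(array $r): string
{
    if (!$r['ok']) {
        return sprintf('cURL error %d: %s', $r['errno'], $r['error']);
    }
    return sprintf('HTTP %d, %d bytes, final URL %s', $r['status'], $r['length'], $r['final_url']);
}
```

Calling `echo describeResult(fetchWithDiagnostics($url));` would show whether the request failed at the transport level, was redirected, or returned an unexpected status code.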
szzl0z0z.槽吐.
Try adding this header information.
明薄淡.说传.
Does this header have to be the one from my own browser?
山景李.槽吐.
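Guoxin'an (国信安) in Chengdu is hiring a PHP instructor. If you're interested, look it up online. I'm passing this along for a friend, thanks!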
明薄淡.说传.
$url = "tieba.baidu.com";
$ch = curl_init();
$timeout = 60;
$headers = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36';
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// Add the following two lines for pages that require user authentication
//curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
//curl_setopt($ch, CURLOPT_USERPWD, US_NAME.":".US_PWD);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
$contents = curl_exec($ch);
curl_close($ch);
var_dump($contents);exit;
Still doesn't work.
szzl0z0z.槽吐.
What kind of usage is this?
明薄淡.说传.
Changing it to an array doesn't help either.
$headers = array(
    // Each entry must be a complete "Name: value" header line
    'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36',
    'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding:gzip, deflate, sdch, br',
    'Accept-Language:zh-CN,zh;q=0.8',
    'Cache-Control:max-age=0',
    'Connection:keep-alive',
    'Cookie:BAIDUID=52D9D3AE97D2DF649C5E55B85986C6FD:FG=1; Hm_lvt_6859ce5aaf00fb00387e6434e4fcc925=1473401376; Hm_lpvt_6859ce5aaf00fb00387e6434e4fcc925=1473401389',
    'Host:zhidao.baidu.com',
    'Upgrade-Insecure-Requests:1',
);
I added all of the header information and it still doesn't work.
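Two details may matter when copying browser headers into cURL. `CURLOPT_HTTPHEADER` expects an array of complete `Name: value` lines. A hand-written `Accept-Encoding` header also asks the server for compressed data, which cURL does not decompress unless `CURLOPT_ENCODING` is set. The sketch below illustrates both points under those assumptions. `formatHeaders` and `fetchAsBrowser` are hypothetical names, and nothing here is confirmed to get past whatever check Baidu applies.

```php
<?php
// Hypothetical helper: turn ['Name' => 'value'] into the "Name: value" lines cURL expects.
function formatHeaders(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

// Hypothetical helper: fetch a page with browser-like headers.
// Returns the body as a string, or false if the transfer failed.
function fetchAsBrowser(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    // The User-Agent has its own option, so it does not need to go into the header array.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36');
    curl_setopt($ch, CURLOPT_HTTPHEADER, formatHeaders([
        'Accept'          => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language' => 'zh-CN,zh;q=0.8',
    ]));
    // Empty string: advertise every encoding this cURL build supports
    // and decode the response automatically.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    return curl_exec($ch);
}
```

Letting cURL manage `Accept-Encoding` itself avoids requesting an encoding such as `br` or `sdch` that the local build may not be able to decode.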
��♓阅� �.跃活.
Disguise the request as the Baidu crawler.
明薄淡.说传.
Uh, how do you know the Baidu spider's IP?
��♓阅� �.跃活.
I used this to scrape data from Alibaba Cloud (阿里云) successfully.
��♓阅� �.跃活.
class Curl
{
    /**
     * Fetch a page with cURL while posing as the Baidu spider
     * @author huliangming<215628355@qq.com> (big brother is turning into a Baidu spider)
     * @param string $url Address to fetch
     * @return string Page body on success, or an error message on failure
     */
    public static function GetContent( $url )
    {
        $ch = curl_init();
        $ip = '220.181.108.91'; // Baidu spider
        $timeout = 15;
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_TIMEOUT, 0); // 0 = no limit on the total transfer time
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // do not verify the certificate chain
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE); // do not check that the certificate matches the host name
        // Forge the Baidu spider IP
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('X-FORWARDED-FOR:'.$ip, 'CLIENT-IP:'.$ip));
        // Forge the Baidu spider User-Agent
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)");
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        $content = curl_exec($ch);
        if ($content === false)
        { // Return the error message
            $no = curl_errno($ch);
            switch ($no)
            {
                case 28 : $error = 'Timed out while accessing the target address'; break;
                default : $error = curl_error($ch); break;
            }
            return $error;
        }
        else
        {
            return $content;
        }
    }
}
��♓阅� �.跃活.
$url = "https://zhidao.baidu.com/search?word=b" ;
$data = Curl::GetContent($url);
$data = iconv("gb2312",'utf-8',$data);
echo $data;die;
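The `iconv` call above converts the response to UTF-8, and the same conversion in reverse applies to the search keyword: the `word` parameter in the first post looks like percent-encoded GBK bytes. A minimal sketch of both directions follows. It assumes the page and the query both use GBK (a superset of GB2312) and that the source file is saved as UTF-8. `buildZhidaoSearchUrl` and `gbkToUtf8` are hypothetical helper names.

```php
<?php
// Hypothetical helper: build a search URL whose keyword is percent-encoded as GBK bytes.
function buildZhidaoSearchUrl(string $utf8Keyword): string
{
    $gbk = iconv('UTF-8', 'GBK', $utf8Keyword);
    if ($gbk === false) {
        throw new InvalidArgumentException('Keyword cannot be represented in GBK');
    }
    return 'https://zhidao.baidu.com/search?word=' . rawurlencode($gbk);
}

// Hypothetical helper: convert a GBK-encoded page to UTF-8.
// Returns false if the input contains bytes that are not valid GBK.
function gbkToUtf8(string $html)
{
    return iconv('GBK', 'UTF-8', $html);
}
```

With these helpers, `buildZhidaoSearchUrl('是真的吗')` should reproduce the URL used in the first post, and `gbkToUtf8(Curl::GetContent($url))` would take the place of the direct `iconv` call.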
明薄淡.说传.
Thank you!