How to Get Only the Text Content of an Article in PHP
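To get only the text content of an article in PHP: 1. create a sample PHP file; 2. define a function "function curl_request ( $url , $post = '' , $cookie = '' , $returnCookie = 0 ) {...}" that fetches the page, keeps only its text, and filters out the HTML tags.
Environment used for this tutorial: Windows 7, PHP 8.1, Dell G3 computer.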
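How do you get only the text content of an article in PHP?
Fetching only the body text of a web page with PHP and stripping the tags
The goal is to download a page with PHP, keep the text inside its body, and filter out the HTML tags. Let's get started.
The code is as follows: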
<?php
/**
 * Fetch a URL with cURL and return the page's plain text with the tags stripped.
 * When $returnCookie is truthy, return an array with the first cookie and the raw body instead.
 * On a cURL failure, the cURL error message is returned as a string.
 */
function curl_request($url, $post = '', $cookie = '', $returnCookie = 0) {
    // Reuse the visitor's User-Agent when there is one (there is none on the CLI), else use a fixed one.
    $ua = $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)';
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, $ua);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($curl, CURLOPT_AUTOREFERER, 1);
    curl_setopt($curl, CURLOPT_REFERER, "https://www.baidu.com");
    if ($post) {
        curl_setopt($curl, CURLOPT_POST, 1);
        // Accept either an array of fields or an already encoded string.
        curl_setopt($curl, CURLOPT_POSTFIELDS, is_array($post) ? http_build_query($post) : $post);
    }
    if ($cookie) {
        curl_setopt($curl, CURLOPT_COOKIE, $cookie);
    }
    // Certificate checks are switched off here; turn them back on for anything beyond a quick test.
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($curl, CURLOPT_HEADER, (bool) $returnCookie);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    if (curl_errno($curl)) {
        $error = curl_error($curl);
        curl_close($curl);
        return $error;
    }
    curl_close($curl);
    if ($returnCookie) {
        // Split the response into headers and body; pad so a missing separator cannot raise a warning.
        list($header, $body) = array_pad(explode("\r\n\r\n", $data, 2), 2, '');
        preg_match_all("/Set\-Cookie:([^;]*);/i", $header, $matches);
        $info['cookie'] = isset($matches[1][0]) ? substr($matches[1][0], 1) : '';
        $info['content'] = $body;
        return $info;
    }
    // Normalise the page to UTF-8, trying the listed source encodings in order.
    $data = mb_convert_encoding($data, 'UTF-8', 'UTF-8,GBK,GB2312,BIG5');
    // Use the contents of <body> when the page has one, otherwise the whole document.
    $str = preg_match("/<body.*?>(.*?)<\/body>/is", $data, $match) ? trim($match[1]) : $data;
    $html = strip_tags($str);
    // Remove spaces, &nbsp; entities, line breaks, and tabs left behind by the markup.
    $search = array(" ", "&nbsp;", "\n", "\r", "\t");
    return str_replace($search, '', $html);
}

$url = 'https://www.example.com/'; // placeholder: replace with the URL of the article to fetch
echo curl_request($url);
?>
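The tag-filtering step can also be tried on its own, without any network request. The following is only a minimal sketch under a few assumptions: html_to_text is a hypothetical helper name that does not appear in the code above, and it collapses whitespace into single spaces instead of deleting every space, which tends to suit text in languages that separate words with spaces. It also removes script and style blocks first, because strip_tags() removes the tags themselves but leaves the JavaScript or CSS between them in the output.

```php
<?php
// Hypothetical helper (not part of the tutorial's code): converts an HTML
// string to plain text with no network access, so it is easy to test.
function html_to_text(string $html): string
{
    // Keep only the <body> contents when a body element is present.
    if (preg_match('/<body\b[^>]*>(.*?)<\/body>/is', $html, $m)) {
        $html = $m[1];
    }
    // Drop script and style blocks entirely, including the code inside them.
    $html = preg_replace('/<(script|style)\b[^>]*>.*?<\/\1>/is', '', $html);
    // Put a space before every tag so adjacent elements do not run together,
    // then remove the tags and decode entities such as &nbsp; and &amp;.
    $text = strip_tags(str_replace('<', ' <', $html));
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace (including non-breaking spaces) to one space.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);
    return trim($text);
}

$sample = '<html><head><title>t</title></head><body><h1>Hello</h1>'
        . '<script>var x = 1;</script><p>PHP&nbsp;&amp; cURL</p></body></html>';
echo html_to_text($sample), PHP_EOL; // prints "Hello PHP & cURL"
```

To match the behaviour of curl_request above, where all whitespace is removed, the replacement string in the last preg_replace call could be changed from ' ' to ''.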