Scraping infant formula data from the State Food and Drug Administration (SFDA) website fails
Example: http://app1.sfda.gov.cn/datasearch/face3/c...
use Goutte\Client;
use GuzzleHttp\Client as GuzzleClient;

// Goutte client backed by a Guzzle client with a 60-second timeout
$client = new Client();
$guzzleClient = new GuzzleClient(array(
    'timeout' => 60,
));
$client->setClient($guzzleClient);
// Request headers imitating a mobile Safari browser
$client->setHeader('Host', 'app1.sfda.gov.cn');
$client->setHeader('Referer', 'http://app1.sfda.gov.cn/');
$client->setHeader('User-Agent', 'Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_2 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D257 Safari/9537.53');
$crawler = $client->request('GET', 'http://app1.sfda.gov.cn/datasearch/face3/content.jsp?tableId=124&tableName=TABLE124&Id=260');
Request URL: http://localhost:8081/call/datasearch/face3/content.jsp?tableId=124&tableName=TABLE124&Id=260
Request Method: GET
Status Code: 202 Accepted
Remote Address: 127.0.0.1:8081
Referrer Policy: no-referrer-when-downgrade
Crawler {#346 ▼
#uri: "http://app1.sfda.gov.cn/datasearch/face3/content.jsp?tableId=124&tableName=TABLE124&Id=260"
-defaultNamespacePrefix: "default"
-namespaces: []
-baseHref: "http://app1.sfda.gov.cn/datasearch/face3/content.jsp?tableId=124&tableName=TABLE124&Id=260"
-document: DOMDocument {#352 ▼
+nodeName: "#document"
+nodeValue: null
+nodeType: XML_HTML_DOCUMENT_NODE
+parentNode: null
+childNodes: DOMNodeList {#358 ▶}
+firstChild: DOMDocumentType {#360 …}
+lastChild: DOMElement {#362 …}
+previousSibling: null
+nextSibling: null
+attributes: null
+ownerDocument: null
+namespaceURI: null
+prefix: ""
+localName: null
+baseURI: null
+textContent: ""
+doctype: DOMDocumentType {#360 …}
+implementation: DOMImplementation {#367 ▶}
+documentElement: DOMElement {#336 …}
+actualEncoding: "utf-8"
+encoding: "utf-8"
+xmlEncoding: "utf-8"
+standalone: true
+xmlStandalone: true
+version: null
+xmlVersion: null
+strictErrorChecking: true
+documentURI: null
+config: null
+formatOutput: false
+validateOnParse: true
+resolveExternals: false
+preserveWhiteSpace: true
+recover: false
+substituteEntities: false
xml: """
<?xml version="1.0" encoding="utf-8" standalone="yes"?>\n
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n
<html xmlns="http://www.w3.org/1999/xhtml">\n
<head>\n
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n
</head>\n
<body>\n
<meta id="9DhefwqGPrzGxEp9hPaoag" content="LZC:.<Z.3:.5p.2H.@}.<@.0@.1i.Ev.C;.GD&Qh},//2wx00,9wVww{w12z+,-[/\84(na)_X0H3mpblJi]dh7O`^}gfPIy6ce*EN: ▶
<!--[if lt IE 9]><script r='m'>document.createElement("section")</script><![endif]-->\n
<script type="text/javascript" src="/4QbVtADbnLVIc/c.FxJzG50F.js?D9PVtGL=9a1adc" r="m"></script>\n
<script type="text/javascript" r="m">\n
<![CDATA[function _$AT(_$Db,_$Aq){var _$iB=_$aV(_$Db),_$DF=new _$Gz(_$Aa(_$iB/_$Aq)),_$i6=0,_$DI=0;for (;_$DI<_$iB;_$DI+=_$Aq,_$i6++ )_$DF[_$i6]=_$GM.call(_$Db, ▶
</script>\n
<script type="text/javascript" r="m">\n
<![CDATA[\n
_$pq('doGT');\n
]]>\n
</script>\n
<a href="/stream_4f7ec2a26362a/admin/" style="display:none">admin</a>\n
<a href="/stream_4f7ec2a26362a/wp-admin/" style="display:none">wp-admin</a>\n
<a href="/stream_4f7ec2a26362a/backend/" style="display:none">backend</a>\n
</body>\n
</html>\n
<html xmlns="http://www.w3.org/1999/xhtml">\n
<script type="text/javascript" r="m">\n
<![CDATA[_$Cd();]]>\n
</script>\n
</html>\n
"""
}
-nodes: array:1 [▼
0 => DOMElement {#336 …}
]
-isHtml: true
}
I have tried Goutte, axios, and others; all of them return 202 Accepted.
Could anyone offer some ideas?
///// No way around it, the baby needs formula.
The content of this page is generated by JavaScript... (crying)
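One rough way to confirm this from the PHP side is to check whether the body you got back is only a script shell. The sketch below is a heuristic, not something taken from the site or from Goutte: the function name `looksLikeJsChallenge` and the `//table` XPath for "the data I expected" are assumptions made for illustration, and the sample HTML is a shortened stand-in for the dump above.

```php
<?php
/**
 * Heuristic check: the response looks like a JavaScript challenge page
 * when the expected content is missing and either the status is 202
 * or the body contains <script> elements.
 *
 * $expectedXPath is a hypothetical selector for the data you wanted.
 */
function looksLikeJsChallenge(int $statusCode, string $html, string $expectedXPath): bool
{
    $hasExpected = false;
    $hasScripts = false;

    // DOMDocument::loadHTML() rejects an empty string, so skip parsing then.
    if (trim($html) !== '') {
        $doc = new DOMDocument();
        // Collect parser warnings internally instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xpath = new DOMXPath($doc);
        $expected = $xpath->query($expectedXPath);
        $hasExpected = $expected !== false && $expected->length > 0;
        $hasScripts = $xpath->query('//script')->length > 0;
    }

    return !$hasExpected && ($statusCode === 202 || $hasScripts);
}

// Shortened stand-in for the body shown in the dump above.
$sample = '<html><body><script>_$pq("doGT");</script>'
    . '<a href="/x/admin/" style="display:none">admin</a></body></html>';

var_dump(looksLikeJsChallenge(202, $sample, '//table')); // bool(true)
```

If this returns true, a plain HTTP client such as Goutte or axios will keep getting the same shell, because nothing executes the script that produces the real page.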