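[TOC]

We use the National Enterprise Credit Information Publicity System (www.gsxt.gov.cn) as a worked integration example.

# 1. Capture traffic to obtain the parameters

Open a browser, start capturing network traffic, visit www.gsxt.gov.cn, and wait for the page to finish loading. Then find the request whose response data contains `gt` and `challenge`.

![](https://box.kancloud.cn/6cf05b3f56a095dcd7d37ed7c89ab7ab_775x577.png)

As the screenshot shows, http://www.gsxt.gov.cn/SearchItemCaptcha is the configuration endpoint, that is, the source of the captcha parameters. Requesting this endpoint returns:

```
{
    "success": 1,
    "gt": "1d2c042096e050f07cb35ff3df5afd92",
    "challenge": "9ddac8ad20c828c99210fcf410ca6ba2"
}
```

Extract `gt` and `challenge` by deserializing the JSON or with a regular expression.

# 2. Call the recognition API

Following the [recognition API documentation](识别接口.md), assign values to the parameters the API requires.

http://jiyanapi.c2567.com/shibie?gt=1d2c042096e050f07cb35ff3df5afd92&challenge=9ddac8ad20c828c99210fcf410ca6ba2&referer=http://www.gsxt.gov.cn&user=test&pass=test&return=json

Assign values to the five parameters `gt`, `challenge`, `referer`, `user`, and `pass`. `referer` is the address of the site's page with all query parameters and the path removed. For `user` and `pass`, [register an account here](http://jiyan.c2567.com/index.php/login/index.html).

# 3. Get the recognition result

The recognition API returns:

```
{
    "status": "ok",
    "challenge": "3d033f099597f5ae63e2e2c902301d183z",
    "validate": "8f6ebd56291ed6569ac40c1d74780985"
}
```

If the returned `status` is `ok`, recognition succeeded. If it is `no`, recognition failed; look up the returned error code to find the specific cause.

If recognition fails, start again from step 1 and request the configuration endpoint for a fresh `gt` and `challenge`.

# 4. Submit the form

Extract `validate` and `challenge` by deserializing the JSON or with a regular expression, then assemble the request parameters.

Continue capturing traffic. Enter 阿里巴巴 (Alibaba) in the search box and click the search button; the slider dialog pops up.

Drag the slider to the matching position. A request to ajax.php now appears in the capture window; this is the behavior-verification request. When verification passes, it returns a `validate` value. The recognition API we provide simulates this whole interaction and obtains that `validate` for you.

The first request after verification passes submits data to http://www.gsxt.gov.cn/corp-query-search-1.html . This is the form submission we need to reproduce.

![](https://box.kancloud.cn/142f7a9daf45512852a29a42998a434d_816x639.png)

Parameters submitted by POST:

```
tab:ent_tab
token:34287916
searchword:阿里巴巴
geetest_challenge:3d033f099597f5ae63e2e2c902301d183z
geetest_validate:8f6ebd56291ed6569ac40c1d74780985
geetest_seccode:8f6ebd56291ed6569ac40c1d74780985|jordan
```

The three parameters `geetest_challenge`, `geetest_validate`, and `geetest_seccode` are extra parameters injected into the form. All other parameters belong to the website itself; where they come from has to be worked out by studying the site and is unrelated to our recognition API.

The goal of recognition is to obtain the values of these three parameters:

`geetest_challenge` is the `challenge` returned by the API.
`geetest_validate` is the `validate` returned by the API.
`geetest_seccode` is the `validate` value followed by the fixed string "|jordan".

At this point the whole recognition flow is complete.

# Pseudocode implementation

```
var user = "your user here";
var pass = "your pass here";

// Request the configuration endpoint to get gt and challenge
var configJsonText = Http.Get("http://www.gsxt.gov.cn/SearchItemCaptcha");

// Deserialize the JSON string into the object jsonObject with a JSON library
object jsonObject = Json.Parse(configJsonText);

// Extract gt and challenge
var gt = jsonObject.gt;
var challenge = jsonObject.challenge;

// Set the HTTP timeout to 60 seconds (60,000 milliseconds)
Http.Timeout = 60000;

// Request the recognition API
var result = Http.Get("http://jiyanapi.c2567.com/shibie?gt=" + gt + "&challenge=" + challenge + "&referer=http://www.gsxt.gov.cn&user=" + user + "&pass=" + pass + "&return=json");

// Deserialize the JSON string into the object apiResultObject with a JSON library
object apiResultObject = Json.Parse(result);

// Check the recognition result
if (apiResultObject.status == "ok")
{
    // Recognition succeeded
    // Assemble the request parameters for the next step
    // (searchword is the URL-encoded form of 阿里巴巴)
    var postParams = "tab=ent_tab&token=34287916&searchword=%E9%98%BF%E9%87%8C%E5%B7%B4%E5%B7%B4&geetest_challenge=" + apiResultObject.challenge + "&geetest_validate=" + apiResultObject.validate + "&geetest_seccode=" + apiResultObject.validate + "%7Cjordan";

    // Submit the form
    var searchResult = Http.Post("http://www.gsxt.gov.cn/corp-query-search-1.html", postParams);
}
else
{
    // Recognition failed
    // Note: on failure, request the captcha parameter endpoint again to get a new gt and challenge
}
```

# Another code reference

The JavaScript is evaluated with Java's built-in JavaScript engine, and Python calls the Java object through jpype. I tried several Python JavaScript libraries first: PyV8 was too troublesome to install and did not support JavaScript's `eval` function well, so I switched to Java's JS engine.

Package the Java code as a jar or as class files. Java 1.8 was used.

Python code:

```
# coding: UTF-8
import json
import re
import threading
import time

import jpype
import requests
from bs4 import BeautifulSoup

# Start the JVM with the compiled GovTest class on the classpath
jpype.startJVM(jpype.getDefaultJVMPath(), "-ea",
               "-Djava.class.path=/code/java/forpython/target/classes/")


class SearchItem(threading.Thread):
    session = requests.session()
    keyword = ""
    proxy = ""
    semaphore = None

    def getGTChallenge(self):
        print("getGTChallenge start")
        loginurl = "http://www.gsxt.gov.cn/SearchItemCaptcha"
        result = self.session.get(loginurl)
        # The first response is expected to contain the JavaScript challenge
        if "y.replace(" not in result.text:
            raise Exception("blocked")
        mycookies = result.cookies
        jpype.attachThreadToJVM()
        jpype.isThreadAttachedToJVM()
        A = jpype.JClass("com.GovTest")
        self.Aobj = A()
        # Evaluate the challenge script in Java, then split the result on "="
        # and keep the second part as the cookie value
        fu = self.Aobj.challenge(result.text)
        print("fu=" + fu)
        jslarr = fu.split("=")
        jsl_clearance = jslarr[1]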
        self.session.cookies['__jsl_clearance'] = jsl_clearance
        result = self.session.get(loginurl)
        challengeJson = json.loads(result.text)
        return challengeJson

    def getImageGif(self):
        print("getImageGif start")
        url = "http://www.gsxt.gov.cn/corp-query-custom-geetest-image.gif?v="
        localTime = time.localtime(time.time())
        url = url + str(localTime.tm_min + localTime.tm_sec)
        resp = self.session.get(url)
        # The response is decoded to text by the Java helper
        aaa = self.Aobj.getImageGif(resp.text)
        matchObj = re.search(r'location_info = (\d+);', aaa)
        if matchObj:
            return matchObj.group(1)
        else:
            raise Exception("location_info not found")

    def getValidateInput(self, location_info):
        print("getValidateInput start")
        url = "http://www.gsxt.gov.cn/corp-query-geetest-validate-input.html?token=" + location_info
        resp = self.session.get(url)
        aaa = self.Aobj.getImageGif(resp.text)
        matchObj = re.search(r'value: (\d+)}', aaa)
        if matchObj:
            location_info = matchObj.group(1)
            # The form token is the extracted value XOR 536870911
            token = int(location_info) ^ 536870911
            print("token=%s" % token)
            return str(token)
        else:
            raise Exception("location_info not found")

    def searchTest(self, keyword):
        print("searchTest start")
        url = "http://www.gsxt.gov.cn/corp-query-search-test.html?searchword=" + keyword
        resp = self.session.get(url)
        print("searchTest " + resp.text)

    def jianYan(self, challengeJson):
        print("jianYan start")
        # Replace YOUR_USER and YOUR_PASS with your own account and password
        url = ("http://jiyanapi.c2567.com/shibie?user=YOUR_USER&pass=YOUR_PASS"
               "&gt=" + challengeJson["gt"]
               + "&challenge=" + challengeJson["challenge"]
               + "&referer=http://www.gsxt.gov.cn&return=json&format=utf8")
        sess = requests.session()
        resp = sess.get(url)
        jiyanJson = json.loads(resp.text)
        print(resp.text)
        return jiyanJson

    def querySearch(self, jiYanJson, token, keyword):
        print("querySearch start")
        url = "http://www.gsxt.gov.cn/corp-query-search-1.html"
        postData = {
            'tab': 'ent_tab',
            'province': '',
            'geetest_challenge': jiYanJson['challenge'],
            'geetest_validate': jiYanJson['validate'],
            'geetest_seccode': jiYanJson['validate'] + '|jordan',
            'token': token,
            'searchword': keyword
        }
        resp = self.session.post(url, postData)
        return resp.text, postData

    def dealPageUrl(self, html):
        print("dealPageUrl start")
        soup = BeautifulSoup(html, "html.parser")
        urlsItem = soup.find_all("a", class_="search_list_item db")
        pageNums = 0
        for urlItem in urlsItem:
            print("urlItem['href']=%s" % urlItem['href'])
        # Count the numbered links in the pager to get the number of pages
        if len(urlsItem) > 1:
            pageForm = soup.find_all(id="pageForm")
            tabAs = pageForm[0].find_all("a", text=re.compile(r"\d+"))
            pageNums = len(tabAs)
        return pageNums

    def dealPageUrlNum(self, pageNums, postData):
        print("dealPageUrlNum start")
        url = "http://www.gsxt.gov.cn/corp-query-search-advancetest.html"
        for i in range(pageNums):
            postData['page'] = i + 1
            resp = self.session.get(url, params=postData)
            soup = BeautifulSoup(resp.text, "html.parser")
            urlsItem = soup.find_all("a", class_="search_list_item db")
            for urlItem in urlsItem:
                print("urlItem['href']=%s" % urlItem['href'])

    def getCorpUrl(self):
        # Note: requests does not apply a session-wide timeout attribute;
        # pass timeout= to each request if a timeout is needed
        self.session.timeout = 1
        self.session.max_redirects = 1
        if self.proxy:
            self.session.proxies = {
                "http": "http://" + self.proxy,
                "https": "http://" + self.proxy,
            }
        headers = {
            'Host': 'www.gsxt.gov.cn',
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Accept-Encoding': 'gzip, deflate',
            'Referer': 'http://www.gsxt.gov.cn/SearchItemCaptcha',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
            'Cache-Control': 'max-age=0, no-cache'
        }
        self.session.headers = headers
        challengeJson = self.getGTChallenge()
        location_info = self.getImageGif()
        token = self.getValidateInput(location_info)
        self.searchTest(self.keyword)
        jiyanJson = self.jianYan(challengeJson)
        html, postData = self.querySearch(jiyanJson, token, self.keyword)
        pageNums = self.dealPageUrl(html)
        print('pageNums=%s' % pageNums)
        self.dealPageUrlNum(pageNums, postData)
        return 1

    def run(self):
        try:
            self.getCorpUrl()
        except Exception as e:
            print("run exception %s" % e)
        self.session.close()
        self.semaphore.release()
        print("search Item run finish")

    def __init__(self, keyword, proxy, semaphore):
        threading.Thread.__init__(self)
        self.keyword = keyword
        self.proxy = proxy
        self.semaphore = semaphore


# Run one search thread at a time; each thread releases the semaphore when done
semaphore = threading.Semaphore(1)
while 1:
    try:
        semaphore.acquire()
        t1 = SearchItem("百度", None, semaphore)
        t1.start()
    except Exception as e:
        print('main e.message:\t%s' % e)
    time.sleep(1)
```

Java code:

```
package com;

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class GovTest {
    private ScriptEngine scriptEngine;

    public GovTest() {
        ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
        this.scriptEngine = scriptEngineManager.getEngineByName("JavaScript");
    }

    public String challenge(String resp) {
        // Drop the leading "<script>" (8 characters) and everything from "</script" onward
        resp = resp.substring(8);
        String tmp[] = resp.split("</script");
        resp = tmp[0];
        // Keep the decoded script in the variable aaa instead of eval-ing it directly
        resp = resp.replace("eval(y.replace", "var aaa=(y.replace");
        // Remove the DOM-dependent statements and the phantom check, cut the code at
        // setTimeout, and wrap the rest in an object whose fa() returns dc
        resp = resp
                + "aaa=aaa.replace(\"h=document.createElement('div');\",\"\");"
                + "aaa=aaa.replace(\"h.innerHTML='<a href=\\\'/\\\'\",\"\");\n"
                + "aaa=aaa.replace(\">x</a>';\",\"\");"
                + "aaa=aaa.replace(\"h=h.firstChild.href;\",\"h='http://www.gsxt.gov.cn/';\");"
                + "aaa=aaa.replace(\"while(window._phantom||window.__phantomas){};\",\"\");"
                + "bbb=aaa.split(\"setTimeout\");\n"
                + " aaa=bbb[0]+\"return dc;}}\";\n"
                + " aaa=aaa.replace(\"var l=\",\"{fa:\");\n"
                + " var ffa=eval(\"(\"+aaa+\")\");\n"
                + " var fffa=ffa.fa();";
        System.out.println(resp);
        String script = resp;
        try {
            scriptEngine.eval(script);
        } catch (ScriptException e) {
            return e.getMessage();
        }
        String bbb = (String) scriptEngine.get("fffa");
        System.out.println(bbb);
        return bbb;
    }

    public String getImageGif(String resp) {
        // Treat the response as an array of character codes and join them into a string
        String script = "function dd(){var json=" + resp
                + ";return json.map( function(item){ return String.fromCharCode(item);}).join('');}"
                + "var ggg=dd();";
        try {
            scriptEngine.eval(script);
        } catch (ScriptException e) {
            return e.getMessage();
        }
        String bbb = (String) scriptEngine.get("ggg");
        return bbb;
    }

    public static void main(String[] s) {
        new GovTest().challenge("<script>var 
x="while@div@substr@setTimeout@26@window@9@5@03@location@cookie@String@l@dc@37@GMT@a@href@if@0@length@__jsl_clearance@4@toLowerCase@_phantom@var@f@challenge@1517192797@reverse@match@085@join@3@cd@Mon@catch@fromCharCode@charAt@firstChild@Path@createElement@document@Expires@29@innerHTML@addEventListener@__phantomas@i@eval@captcha@h@replace@x@https@Jan@for@try@r@2@function@18@return@1500@e@false@DOMContentLoaded@else@attachEvent@onreadystatechange".replace(/@*$/,"").split("@"),y="1a d=3d(){1(6.19||6.30){};1a 23,e='16=1d.20|14|';1a 1b=[3d(36){3f 32('c.26('+36+')')},(3d(){1a 34=2b.2a('2');34.2e='<11 12=\\\'/\\\'>36</11>';34=34.28.12;1a 3b=34.1f(/37?:\\\\/\\\\//)[14];34=34.3(3b.15).18();3f 3d(36){39(1a 31=14;31<36.15;31++){36[31]=34.27(36[31])};3f 36.21('')}})()];23=[[[3c+8]+[-~-~(+[])],(22+[]+[[]][~~[]])+[3c+8],(8+[[]][~~[]])+[(+[])],[3c+8]+[(+[])],[-~-~(+[])-~[([(-~(+[])<<-~(+[]))]+~~![]>>(-~(+[])<<-~(+[])))]]+[(+[])],(8+[[]][~~[]])+[-~[(-~(+[])<<-~(+[]))]-~[(-~(+[])<<-~(+[]))]],(8+[[]][~~[]])+[3c+8],[3c+8]+[-~[(-~(+[])<<-~(+[]))]-~[(-~(+[])<<-~(+[]))]],(22+[]+[[]][~~[]])+[3c+8],(8+[[]][~~[]])+[(+[])],[-~[(-~(+[])<<-~(+[]))]-~[(-~(+[])<<-~(+[]))]]+[-~[(-~(+[])<<-~(+[]))]-~[(-~(+[])<<-~(+[]))]],[3c+8]+[(+[])],[-~[(-~(+[])<<-~(+[]))]-~[(-~(+[])<<-~(+[]))]]+(8+[[]][~~[]]),(-~(+[])+[[]][~~[]])+[(+[])]+[17],[-~-~(+[])-~[([(-~(+[])<<-~(+[]))]+~~![]>>(-~(+[])<<-~(+[])))]]+(8+[[]][~~[]]),(7+[])+[(+[])]],[[(+[])]],[(-~(+[])+[[]][~~[]])+[(+[])]+[-~-~(+[])],(8+[[]][~~[]])+[(+[])],[-~-~(+[])-~[([(-~(+[])<<-~(+[]))]+~~![]>>(-~(+[])<<-~(+[])))]]+(22+[]+[[]][~~[]])],[[3c+8]],[[-~-~(+[])-~[([(-~(+[])<<-~(+[]))]+~~![]>>(-~(+[])<<-~(+[])))]]+[-~-~(+[])]],[[(+[])],[17],[17]],[[-~[(-~(+[])<<-~(+[]))]-~[(-~(+[])<<-~(+[]))]]+[-~-~(+[])-~[([(-~(+[])<<-~(+[]))]+~~![]>>(-~(+[])<<-~(+[])))]],[-~-~(+[])-~[([(-~(+[])<<-~(+[]))]+~~![]>>(-~(+[])<<-~(+[])))]]+[17],(-~(+[])+[[]][~~[]])+(-~(+[])+[[]][~~[]])+[17],(-~(+[])+[[]][~~[]])+[(+[])]+(7+[]),[-~-~(+[])-~[([(-~(+[])<<-~(+[]))]+~~![]>>(-~(+[])<<-~(
+[])))]]+[-~-~(+[])-~[([(-~(+[])<<-~(+[]))]+~~![]>>(-~(+[])<<-~(+[])))]],[-~[(-~(+[])<<-~(+[]))]-~[(-~(+[])<<-~(+[]))]]+(8+[[]][~~[]]),(22+[]+[[]][~~[]])+[3c+8],(8+[[]][~~[]])+(-~(+[])+[[]][~~[]]),[-~[(-~(+[])<<-~(+[]))]-~[(-~(+[])<<-~(+[]))]]+[-~-~(+[])-~[([(-~(+[])<<-~(+[]))]+~~![]>>(-~(+[])<<-~(+[])))]]]];39(1a 31=14;31<23.15;31++){23[31]=1b.1e()[(-~(+[])+[[]][~~[]])](23[31])};23=23.21('');e+=23;4('a.12=a.12.35(/[\\\\?|&]33-1c/,\\\'\\\')',40);2b.b=(e+';2c=24, 2d-38-3e 9:5:f 10;29=/;');};13((3d(){3a{3f !!6.2f;}25(41){3f 42;}})()){2b.2f('43',d,42);}44{2b.45('46',d);}",z=0,f=function(x,y){var a=0,b=0,c=0;x=x.split("");y=y||99;while((a=x.shift())&&(b=a.charCodeAt(0)-77.5))c=(Math.abs(b)<13?(b+48.5):parseInt(a,36))+y*c;return c},g=y.match(/\\b\\w+\\b/g).sort(function(x,y){return f(x)-f(y)}).pop();while(f(g,++z)-x.length){};eval(y.replace(/\\b\\w+\\b/g, function(y){return x[f(y,z)-1]}));</script>"); } } ```
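
As a small illustration of steps 1 and 4 above, the sketch below parses the configuration JSON and assembles the POST fields from a recognition result. The function names `extract_captcha_params` and `build_search_form` are hypothetical helpers introduced here for illustration and are not part of the recognition API. The field names and the `|jordan` suffix follow the captured form shown in step 4, and the sketch makes no network requests.

```python
import json


def extract_captcha_params(config_text):
    """Return (gt, challenge) from the configuration endpoint's JSON text.

    Hypothetical helper: it only deserializes the JSON shown in step 1.
    """
    config = json.loads(config_text)
    return config["gt"], config["challenge"]


def build_search_form(api_result, token, searchword):
    """Assemble the POST fields described in step 4.

    api_result is the deserialized recognition response from step 3.
    token and searchword are the website's own parameters and must be
    obtained from the site itself.
    """
    if api_result.get("status") != "ok":
        # On failure, the document says to restart from step 1
        raise ValueError("recognition failed; fetch a new gt/challenge and retry")
    validate = api_result["validate"]
    return {
        "tab": "ent_tab",
        "token": token,
        "searchword": searchword,
        "geetest_challenge": api_result["challenge"],
        "geetest_validate": validate,
        # seccode is the validate value plus the fixed suffix "|jordan"
        "geetest_seccode": validate + "|jordan",
    }
```

With the sample values from steps 1 and 3, `build_search_form` yields a `geetest_seccode` of `8f6ebd56291ed6569ac40c1d74780985|jordan`, matching the captured form. Passing the resulting dictionary to an HTTP client lets the client handle URL encoding of the keyword and the `|` character.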