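Lately a lot of people have asked about the details of one-click teaching evaluation, so here are a few technical notes on writing a crawler in Java. First, what is teaching evaluation? In a stroke of genius, the academic affairs system added a teaching evaluation system: each course has about six or seven rating items plus a written comment, and there are roughly ten courses per semester. The strangest part is that until you finish the evaluations, requests to any other service are automatically redirected to the evaluation page. So the chore costs real time, and whether it does anything for teaching is a matter of opinion.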
So the latest version adds an automatic evaluation feature. This post mainly covers some details of crawling with OkHttp.
How do you keep the Session?
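To keep page state across requests that require Session authentication, you could use an OkHttp interceptor. OkHttp also provides a CookieJar interface that makes this job easy. The example here keeps cookies in memory only and does not persist them.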
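    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;

    import okhttp3.Cookie;
    import okhttp3.CookieJar;
    import okhttp3.HttpUrl;

    public class EPCookieJar implements CookieJar {
        // In-memory only: cookies are keyed by host and lost when the process exits.
        private final HashMap<String, List<Cookie>> cookieStore = new HashMap<>();

        @Override
        public void saveFromResponse(HttpUrl httpUrl, List<Cookie> list) {
            // Replaces whatever was stored for this host before.
            cookieStore.put(httpUrl.host(), list);
        }

        @Override
        public List<Cookie> loadForRequest(HttpUrl httpUrl) {
            List<Cookie> cookies = cookieStore.get(httpUrl.host());
            // OkHttp expects a non-null list, so fall back to an empty one.
            return cookies != null ? cookies : new ArrayList<Cookie>();
        }
    }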
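One thing to watch in the in-memory jar: `saveFromResponse` replaces the whole list stored for a host, so a later response that sets only some cookies could drop the earlier ones (the session cookie, for example). Below is a minimal sketch of merging by cookie name instead. To stay free of dependencies it models cookies as plain name/value strings; `MergingCookieStore`, `save`, and `cookieHeader` are hypothetical names and not part of OkHttp, and the same idea would have to be ported to `List<Cookie>` inside a real `CookieJar`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, dependency-free model of a cookie store that merges by name.
final class MergingCookieStore {
    // host -> (cookie name -> cookie value), insertion-ordered for stable output
    private final Map<String, Map<String, String>> store = new LinkedHashMap<>();

    // Adds or overwrites cookies for a host without discarding the others.
    void save(String host, Map<String, String> cookies) {
        store.computeIfAbsent(host, h -> new LinkedHashMap<>()).putAll(cookies);
    }

    // Builds the value of a Cookie request header for the host ("" if none).
    String cookieHeader(String host) {
        Map<String, String> cookies = store.get(host);
        if (cookies == null) {
            return "";
        }
        StringBuilder header = new StringBuilder();
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(cookie.getKey()).append('=').append(cookie.getValue());
        }
        return header.toString();
    }
}
```

With this shape, a second response that sets only a language cookie leaves the session cookie from the login response in place.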
Building the request
Set your CookieJar implementation when you construct the client Builder.
    public OKHttpJar login(String username, String password) {
        OKHttpJar jar = new OKHttpJar();
        // Every request made through this client shares one EPCookieJar, and with it the login session.
        OkHttpClient client = new OkHttpClient.Builder().cookieJar(new EPCookieJar()).build();
        jar.setClient(client);
        // The timestamp is sent as "sign" and is also mixed into the password hash.
        String sign = String.valueOf(System.currentTimeMillis());
        FormBody formBody = new FormBody.Builder()
                .add("Action", "Login")
                .add("userName", username)
                .add("pwd", CommonUtils.getMD5String(username + sign + CommonUtils.getMD5String(password.trim())))
                .add("sign", sign)
                .build();
        Request request = new Request.Builder()
                .url(Constant.AAO_HOST + "/Common/Handler/UserLogin.ashx")
                .post(formBody)
                .build();
        JSONObject object = new JSONObject();
        jar.setJsonObject(object);
        // try-with-resources makes sure the response is closed.
        try (Response response = client.newCall(request).execute()) {
            // The server answers with a bare numeric result code.
            Integer resultCode = Integer.valueOf(response.body().string().trim());
            jar.setResultCode(resultCode);
            switch (resultCode) {
                case 0: // no error message is set for code 0
                    break;
                case 2:
                    object.put("result", false);
                    object.put("message", "账号已被封停!"); // "The account has been suspended!"
                    break;
                case 4:
                    object.put("result", false);
                    object.put("message", "账号或者密码错误!"); // "Wrong account or password!"
                    break;
                default:
                    break;
            }
        } catch (IOException | NumberFormatException e) {
            // Network failure, or a body that is not a number.
            jar.setResultCode(-1);
            object.put("result", false);
            object.put("message", "server error!");
            e.printStackTrace();
        }
        return jar;
    }
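The `pwd` field above is built as MD5(username + sign + MD5(password)). `CommonUtils.getMD5String` is not shown in this post, so the following is only a sketch of what such a helper might look like, assuming it returns the lowercase hex digest of the UTF-8 bytes; `LoginSigner`, `md5Hex`, and `signPassword` are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for CommonUtils.getMD5String; the real helper may differ.
final class LoginSigner {
    // Lowercase hex MD5 of the UTF-8 bytes of the input (an assumption about the format).
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every standard JDK ships MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Mirrors the expression used for the "pwd" form field in login().
    static String signPassword(String username, String password, String sign) {
        return md5Hex(username + sign + md5Hex(password.trim()));
    }

    public static void main(String[] args) {
        String sign = String.valueOf(System.currentTimeMillis());
        System.out.println(signPassword("student", "secret", sign));
    }
}
```

Because `sign` changes on every attempt, the value sent as `pwd` is different for each login even when the password stays the same.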
Parsing the page
Build a Document object with Jsoup, and you can then work with the page much as you would manipulate the DOM in JavaScript.
    public List<ClassInfo> getEvaluationList(OKHttpJar jar) {
        Request request = new Request.Builder()
                .url(Constant.AAO_HOST + "/TeachingEvaluation/List.aspx")
                .get()
                .build();
        List<ClassInfo> classInfos = new ArrayList<>();
        OkHttpClient client = jar.getClient();
        try (Response response = client.newCall(request).execute()) {
            Document parse = Jsoup.parse(response.body().string());
            // Every link to Eval.aspx carries the id of one evaluation entry.
            Elements links = parse.getElementsByTag("a");
            for (Element link : links) {
                String linkHref = link.attr("href");
                if (linkHref.contains("Eval.aspx?id=")) {
                    classInfos.add(new ClassInfo(linkHref.replace("Eval.aspx?id=", "")));
                }
            }
            // The cells below are matched to the links by position, so this relies on
            // the page listing them in the same order; the bounds checks guard against extras.
            Elements teacherElements = parse.getElementsByAttributeValueContaining("style", "width:200px;");
            for (int i = 0; i < teacherElements.size() && i < classInfos.size(); i++) {
                classInfos.get(i).setTeacher(teacherElements.get(i).text());
            }
            Elements classNameElements = parse.getElementsByAttributeValueContaining("style", "width: 300px;");
            for (int i = 0; i < classNameElements.size() && i < classInfos.size(); i++) {
                classInfos.get(i).setClassName(classNameElements.get(i).text());
            }
            Elements statusElements = parse.getElementsByClass("btn_conn1");
            for (int i = 0; i < statusElements.size() && i < classInfos.size(); i++) {
                // A button labelled "查看" ("View") marks a course that is already evaluated.
                if (statusElements.get(i).text().equals("查看")) {
                    classInfos.get(i).setEvaluated(true);
                }
            }
            // One extra request per course to read its teachclassid.
            for (ClassInfo info : classInfos) {
                info.setClassId(getClassID(client, info));
            }
        } catch (IOException e) {
            jar.setResultCode(-1);
            e.printStackTrace();
        }
        return classInfos;
    }
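The list method gets the id by deleting the `Eval.aspx?id=` text from the href, which only gives a clean id when the href contains nothing else. As a slightly more defensive sketch, the helper below cuts out just the value after `id=`; `EvalLinks` and `extractId` are hypothetical names, and the sample hrefs in `main` are made up, since the real page markup is not shown here.

```java
// Hypothetical helper: pulls the id out of an evaluation link's href.
final class EvalLinks {
    private static final String MARKER = "Eval.aspx?id=";

    // Returns the id, or null when the href is not an evaluation link or the id is empty.
    static String extractId(String href) {
        if (href == null) {
            return null;
        }
        int start = href.indexOf(MARKER);
        if (start < 0) {
            return null;
        }
        start += MARKER.length();
        // The id ends at the next query parameter, at a fragment, or at the end of the string.
        int end = start;
        while (end < href.length() && href.charAt(end) != '&' && href.charAt(end) != '#') {
            end++;
        }
        String id = href.substring(start, end);
        return id.isEmpty() ? null : id;
    }

    public static void main(String[] args) {
        System.out.println(extractId("Eval.aspx?id=123"));
        System.out.println(extractId("/TeachingEvaluation/Eval.aspx?id=abc&from=list"));
        System.out.println(extractId("List.aspx"));
    }
}
```

Returning null for anything that is not an evaluation link lets the caller skip the entry instead of storing a malformed id.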
    public String getClassID(OkHttpClient client, ClassInfo info) {
        Request request = new Request.Builder()
                .url(Constant.AAO_HOST + "/TeachingEvaluation/Eval.aspx?id=" + info.getId())
                .get()
                .build();
        String result = null;
        try (Response response = client.newCall(request).execute()) {
            Document parse = Jsoup.parse(response.body().string());
            // The hidden input named "teachclassid" holds the class id; the last match wins.
            Elements elements = parse.getElementsByAttributeValue("name", "teachclassid");
            for (Element element : elements) {
                result = element.attr("value");
            }
            // ASP.NET hidden fields that must be sent back when the form is submitted.
            // __VIEWSTATEGENERATOR and __VIEWSTATE are fields of the enclosing class (not shown).
            __VIEWSTATEGENERATOR = parse.getElementById("__VIEWSTATEGENERATOR").attr("value");
            __VIEWSTATE = parse.getElementById("__VIEWSTATE").attr("value");
        } catch (IOException e) {
            e.printStackTrace();
        }
        return result;
    }
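The values collected here (`teachclassid`, `__VIEWSTATE`, `__VIEWSTATEGENERATOR`) go back to the server as form fields. `__VIEWSTATE` is typically Base64 text, which can contain `+`, `/`, and `=`, so it has to be form-encoded. OkHttp's `FormBody.Builder.add` does that encoding for you; the dependency-free sketch below only illustrates what the encoded body looks like. `FormEncoder` is a hypothetical name, the sample values are made up, and the full set of fields the evaluation form expects is not covered in this post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of application/x-www-form-urlencoded encoding.
final class FormEncoder {
    // Joins the fields as name=value pairs separated by '&', encoding both sides.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(urlEncode(field.getKey())).append('=').append(urlEncode(field.getValue()));
        }
        return body.toString();
    }

    private static String urlEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__VIEWSTATE", "/wE+a=");   // made-up sample value
        fields.put("teachclassid", "42");      // made-up sample value
        System.out.println(encode(fields));
    }
}
```

Sending such a value unencoded would let the server read `+` as a space and corrupt the view state, which is why the encoding step matters.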
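At this point we have every parameter needed to complete the request, and the rest should need no explanation. This little toy has been folded into the SequariusToys_AAOClient project.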