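Lately a lot of people have asked how one-click course evaluation works, so here are a few technical notes on writing a crawler in Java. First, what is course evaluation? The academic affairs system, in a stroke of genius, added a teaching evaluation module: each course has about six or seven rating items plus a written comment, and there are roughly ten courses per semester. The oddest part is that until you finish evaluating, requests to any other service are automatically redirected to the evaluation page. In other words, the chore eats time, and whether it actually helps teaching is a matter of opinion.
So the latest release adds an automatic evaluation feature. This post covers some details of crawling with OkHttp.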

How do you keep the Session?

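To keep page state across requests and reach pages that require Session authentication, you could use an OkHttp interceptor. OkHttp also provides a CookieJar interface that makes this easy. The example here keeps cookies in memory only, with no persistent storage.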

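import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import okhttp3.Cookie;
import okhttp3.CookieJar;
import okhttp3.HttpUrl;

public class EPCookieJar implements CookieJar {
    // In-memory only: cookies are keyed by host and lost when the process exits.
    private final HashMap<String, List<Cookie>> cookieStore = new HashMap<>();

    @Override
    public void saveFromResponse(HttpUrl httpUrl, List<Cookie> list) {
        // Replaces whatever was stored for this host with the cookies from this response.
        cookieStore.put(httpUrl.host(), list);
    }

    @Override
    public List<Cookie> loadForRequest(HttpUrl httpUrl) {
        List<Cookie> cookies = cookieStore.get(httpUrl.host());
        return cookies != null ? cookies : new ArrayList<Cookie>();
    }
}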
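
One thing to keep in mind with the jar above: `saveFromResponse` replaces the host's whole cookie list with whatever the latest response set, so a session cookie issued at login could be dropped if a later response sets a different cookie. Below is a minimal sketch of merging by cookie name instead. It uses plain JDK collections so it runs without OkHttp; `MergingCookieStore`, `save` and `load` are hypothetical names, and cookies are reduced to name/value strings purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: not part of OkHttp or of the original project.
final class MergingCookieStore {
    // host -> (cookie name -> cookie value)
    private final Map<String, Map<String, String>> store = new LinkedHashMap<>();

    // Merge new cookies into the host's existing set; same-named cookies are overwritten.
    void save(String host, Map<String, String> cookies) {
        store.computeIfAbsent(host, h -> new LinkedHashMap<>()).putAll(cookies);
    }

    // Return a copy so callers cannot mutate the stored state.
    Map<String, String> load(String host) {
        Map<String, String> cookies = store.get(host);
        return cookies != null ? new LinkedHashMap<>(cookies) : new LinkedHashMap<>();
    }
}
```

The same idea could be carried over to `EPCookieJar` by keying an inner map on `Cookie.name()`, if the site turns out to set cookies across several responses.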

Building the request

When building the client with its Builder, set the CookieJar implemented above.

public OKHttpJar login(String username, String password) {
    // OKHttpJar is this project's holder for the client, the result code and the JSON result.
    OKHttpJar jar = new OKHttpJar();
    // Attach the CookieJar so the session cookie set at login is reused by later calls.
    OkHttpClient client = new OkHttpClient.Builder().cookieJar(new EPCookieJar()).build();
    jar.setClient(client);
    // The current timestamp is sent as "sign" and mixed into the password digest.
    String sign = String.valueOf(System.currentTimeMillis());
    FormBody formBody = new FormBody.Builder()
            .add("Action", "Login")
            .add("userName", username)
            // pwd = MD5(username + sign + MD5(password))
            .add("pwd", CommonUtils.getMD5String(username + sign + CommonUtils.getMD5String(password.trim())))
            .add("sign", sign)
            .build();
    Request request = new Request.Builder()
            .url(Constant.AAO_HOST + "/Common/Handler/UserLogin.ashx")
            .post(formBody)
            .build();
    JSONObject object = new JSONObject();
    jar.setJsonObject(object);
    // try-with-resources closes the response body once it has been read.
    try (Response response = client.newCall(request).execute()) {
        // The handler answers with a bare integer status code.
        Integer resultCode = Integer.valueOf(response.body().string().trim());
        jar.setResultCode(resultCode);
        switch (resultCode) {
            case 0:
                // No error message is set for code 0.
                break;
            case 2:
                object.put("result", false);
                object.put("message", "This account has been suspended!");
                break;
            case 4:
                object.put("result", false);
                object.put("message", "Wrong account or password!");
                break;
            default:
                break;
        }
    } catch (IOException e) {
        jar.setResultCode(-1);
        object.put("result", false);
        object.put("message", "server error!");
        e.printStackTrace();
    }
    return jar;
}
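
The `pwd` field above is a salted double digest: MD5 over the username, the `sign` timestamp and the MD5 of the password. `CommonUtils.getMD5String` is not shown in this post, so the following is only a sketch of what such a helper might look like, assuming it returns lowercase hex over UTF-8 bytes; `LoginDigest`, `md5Hex` and `passwordField` are names made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for CommonUtils.getMD5String (assumed: lowercase hex, UTF-8).
final class LoginDigest {
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes still format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard JDKs, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Mirrors the expression in login(): MD5(username + sign + MD5(password)).
    static String passwordField(String username, String password, String sign) {
        return md5Hex(username + sign + md5Hex(password.trim()));
    }
}
```

Because `sign` changes on every call, the value sent as `pwd` differs per login attempt even for the same password. If the server expects uppercase hex or a different charset, the helper would need to match that.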

Parsing the page

Build a Document object with Jsoup, and then you can work with the DOM much as you would in JavaScript.

public List<ClassInfo> getEvaluationList(OKHttpJar jar) {
    Request request = new Request.Builder()
            .url(Constant.AAO_HOST + "/TeachingEvaluation/List.aspx")
            .get()
            .build();
    List<ClassInfo> classInfos = new ArrayList<>();
    OkHttpClient client = jar.getClient();
    try (Response response = client.newCall(request).execute()) {
        Document doc = Jsoup.parse(response.body().string());
        // Every link to Eval.aspx carries one evaluation id.
        Elements links = doc.getElementsByTag("a");
        for (Element link : links) {
            String linkHref = link.attr("href");
            if (linkHref.contains("Eval.aspx?id=")) {
                classInfos.add(new ClassInfo(linkHref.replace("Eval.aspx?id=", "")));
            }
        }
        // Teacher and course name cells are located by their inline style;
        // the i-th match is assumed to belong to the i-th course.
        Elements teacherElements = doc.getElementsByAttributeValueContaining("style", "width:200px;");
        for (int i = 0; i < teacherElements.size() && i < classInfos.size(); i++) {
            classInfos.get(i).setTeacher(teacherElements.get(i).text());
        }
        Elements classNameElements = doc.getElementsByAttributeValueContaining("style", "width: 300px;");
        for (int i = 0; i < classNameElements.size() && i < classInfos.size(); i++) {
            classInfos.get(i).setClassName(classNameElements.get(i).text());
        }
        // A button labelled "查看" ("View") marks a course that is already evaluated.
        Elements statusElements = doc.getElementsByClass("btn_conn1");
        for (int i = 0; i < statusElements.size() && i < classInfos.size(); i++) {
            if (statusElements.get(i).text().equals("查看")) {
                classInfos.get(i).setEvaluated(true);
            }
        }
        // One extra request per course to read its teachclassid.
        for (ClassInfo info : classInfos) {
            info.setClassId(getClassID(client, info));
        }
    } catch (IOException e) {
        jar.setResultCode(-1);
        e.printStackTrace();
    }
    return classInfos;
}

public String getClassID(OkHttpClient client, ClassInfo info) {
    Request request = new Request.Builder()
            .url(Constant.AAO_HOST + "/TeachingEvaluation/Eval.aspx?id=" + info.getId())
            .get()
            .build();
    String result = null;
    try (Response response = client.newCall(request).execute()) {
        Document doc = Jsoup.parse(response.body().string());
        // If several inputs are named teachclassid, the last one wins.
        Elements elements = doc.getElementsByAttributeValue("name", "teachclassid");
        for (Element element : elements) {
            result = element.attr("value");
        }
        // __VIEWSTATEGENERATOR and __VIEWSTATE are fields of the enclosing class
        // (their declarations are not shown here); they hold the page's hidden form values.
        __VIEWSTATEGENERATOR = doc.getElementById("__VIEWSTATEGENERATOR").attr("value");
        __VIEWSTATE = doc.getElementById("__VIEWSTATE").attr("value");
    } catch (IOException e) {
        e.printStackTrace();
    }
    return result;
}
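
A note on the id extraction in `getEvaluationList`: `linkHref.replace("Eval.aspx?id=", "")` works when the href is exactly `Eval.aspx?id=...`, but it would leave a path prefix or extra query parameters in the result if the page ever emitted them. Whether the real page does is not something this post establishes. As a hedge, here is a small regex-based sketch; `EvalLinks` and `extractId` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the id out of an Eval.aspx link, ignoring prefix and trailing params.
final class EvalLinks {
    // Capture everything after "id=" up to the next '&' or '#'.
    private static final Pattern EVAL_ID = Pattern.compile("Eval\\.aspx\\?id=([^&#]+)");

    static Optional<String> extractId(String href) {
        Matcher m = EVAL_ID.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Returning an `Optional` also makes the "not an evaluation link" case explicit, so the `contains` check and the extraction collapse into one call.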

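At this point we have every parameter needed to complete the request, so the rest hardly needs explaining. This little toy has been filed into the SequariusToys_AAOClient project.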
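
For readers who do want the last step spelled out a little: an ASP.NET WebForms page generally expects the hidden `__VIEWSTATE` and `__VIEWSTATEGENERATOR` values to be posted back with the form. The sketch below only shows how the three values collected above could be assembled into a URL-encoded body using the JDK alone. `EvalForm`, `baseFields` and `encode` are hypothetical names, and the rating and comment fields of the real form are not covered in this post, so they are left out rather than guessed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: assembles the values scraped above into a form body.
final class EvalForm {
    // Only the three fields named in the post; the actual rating fields are unknown here.
    static Map<String, String> baseFields(String viewState, String viewStateGenerator, String teachClassId) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__VIEWSTATE", viewState);
        fields.put("__VIEWSTATEGENERATOR", viewStateGenerator);
        fields.put("teachclassid", teachClassId);
        return fields;
    }

    // application/x-www-form-urlencoded: key=value pairs joined by '&'.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 not supported", ex);
        }
    }
}
```

With OkHttp itself you would not encode by hand: adding the same pairs to a `FormBody.Builder`, as in `login`, takes care of it. The point of the sketch is that `__VIEWSTATE` is Base64 text containing `+`, `/` and `=`, which must be escaped on the wire.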