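Java Web Scraping, Part 2
Parsing page elements with jsoup
Introduction: jsoup is a Java-based HTML parser that makes it easy to fetch and parse data from web pages. Its main purpose is to help developers process HTML documents and extract the data they need.
Common methods
- Selector API: selects HTML elements using CSS selector syntax.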
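    // doc is a Document that has already been parsed
    Elements elements = doc.select("li");               // every <li>
    Elements divs = doc.select("div.news");             // <div> elements with class "news"
    Element header = doc.select("div#header").first();  // first <div id="header">, or null if none matches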
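- Attribute API: gets, sets, and removes attributes on HTML elements.
    String href = element.attr("href");               // read an attribute
    element.attr("href", "http://example.com");       // set an attribute
    element.removeAttr("href");                       // remove an attribute
    Attributes attributes = element.attributes();     // all attributes of the element
    Elements links = doc.select("a[href^=http]");     // links whose href starts with "http"
    Elements divs = doc.select("div[class~=news]");   // divs whose class value matches the regex "news"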
- Traversal API: walks between elements in an HTML document.
    Element parentElement = element.parent();
    Elements childrenElements = element.children();
    Element nextSiblingElement = element.nextElementSibling();
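- Manipulation API: modifies elements and attributes in an HTML document.
    String html = element.html();   // inner HTML of the element
    String text = element.text();   // combined text of the element and its children
    element.append("<p>This is a new paragraph</p>");
    element.appendText("This is some new text");

    // Selector combinators: a space matches any descendant, ">" matches direct children only
    Elements paragraphs = doc.select("div p");
    Elements directParagraphs = doc.select("div > p");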
- Reading a file
    File input = new File("D:\\works\\out_codes\\javas\\learn\\untitled\\src\\spider\\dom.html");
    // Jsoup.parse(File, String) throws IOException if the file cannot be read
    Document doc = Jsoup.parse(input, "UTF-8");
- Code demo
    package test1;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    import java.io.IOException;

    class Doms {
        public static void test() {
            String html = "<div><p> hello world </p></div>";
            Document dcc = Jsoup.parse(html);
            // Prints the parsed document, including the <html>, <head>, and <body> that jsoup adds
            System.out.println(dcc);
        }
    }

    public class one {
        public static void main(String[] args) throws IOException {
            Doms.test();
        }
    }
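The selector, attribute, traversal, and manipulation calls listed above can be tried together on a small in-memory page. The following is a minimal sketch rather than the author's code: the class name `JsoupTour`, its helper methods, and the sample HTML are all made up for illustration, and it assumes a jsoup release recent enough to provide `selectFirst`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.List;

public class JsoupTour {
    // Hypothetical sample page, invented for this sketch.
    static final String HTML =
            "<div id=\"header\"><h1>Site</h1></div>"
          + "<div class=\"news\">"
          + "<p>First <a href=\"http://example.com/a\">A</a></p>"
          + "<p>Second <a href=\"/local\">B</a></p>"
          + "</div>";

    // Selector + attribute API: href values of links whose href starts with "http".
    static List<String> httpLinks(Document doc) {
        return doc.select("a[href^=http]").eachAttr("href");
    }

    // Child combinator: count <p> elements that are direct children of div.news.
    static int directParagraphs(Document doc) {
        return doc.select("div.news > p").size();
    }

    // Traversal API: tag name of the parent of the first link.
    static String parentTagOfFirstLink(Document doc) {
        Element link = doc.selectFirst("a");
        return link.parent().tagName();
    }

    // Manipulation API: append a paragraph and return the text of the new last child.
    static String appendParagraph(Document doc) {
        Element news = doc.selectFirst("div.news");
        news.append("<p>Third</p>");
        return news.children().last().text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(httpLinks(doc));            // [http://example.com/a]
        System.out.println(directParagraphs(doc));     // 2
        System.out.println(parentTagOfFirstLink(doc)); // p
        System.out.println(appendParagraph(doc));      // Third
    }
}
```

Only the first link is returned by `httpLinks` because the second `href` is a relative path and does not start with `http`.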
Connecting to MySQL
JAR download: https://mvnrepository.com/artifact/mysql/mysql-connector-java
Download the JAR that matches your MySQL version.
- Demo
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    class connect_mysql {
        public static void conn() {
            String url = "jdbc:mysql://10.1.1.124:3306/test";
            String username = "root1";
            String password = "1234Abcd+";
            String sql = "INSERT INTO test_table (id, name) VALUES (?, ?)";
            // try-with-resources closes the statement and the connection even if the insert fails
            try (Connection connection = DriverManager.getConnection(url, username, password);
                 PreparedStatement preparedStatement = connection.prepareStatement(sql)) {
                preparedStatement.setInt(1, 1);           // id is an INT column
                preparedStatement.setString(2, "张三");
                preparedStatement.executeUpdate();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
- A Python helper for creating the database and table, and for dropping the database, to make testing easier
    import pymysql


    class test:
        def __init__(self):
            self.conn = pymysql.connect(
                host='10.1.1.124',
                port=3306,
                user='root1',
                password='1234Abcd+',
            )

        def create_database_and_table(self):
            try:
                if self.conn.open:
                    print("Connected to the database.")
                    cursor = self.conn.cursor()
                    cursor.execute("CREATE DATABASE IF NOT EXISTS test;")
                    print("Database 'test' created.")
                    cursor.execute("USE test;")
                    cursor.execute("CREATE TABLE IF NOT EXISTS test_table ("
                                   "id INT AUTO_INCREMENT PRIMARY KEY,"
                                   "name VARCHAR(255) NOT NULL"
                                   ");")
                    print("Table 'test_table' created.")
                    self.conn.commit()
                    cursor.close()
                    self.conn.close()
                    print("MySQL connection closed.")
            except pymysql.MySQLError as e:
                print(f"Database error: {e}")

        def delete_database(self, database_name):
            try:
                if self.conn.open:
                    print("Connected to the database.")
                    cursor = self.conn.cursor()
                    # Identifiers cannot be bound as query parameters, so only pass trusted names here
                    cursor.execute(f"DROP DATABASE IF EXISTS {database_name};")
                    print(f"Database '{database_name}' deleted successfully.")
                    self.conn.commit()
                    cursor.close()
                    self.conn.close()
                    print("MySQL connection closed.")
            except pymysql.MySQLError as e:
                print(f"Database error: {e}")


    # Each method closes the connection when it finishes, so use a fresh instance per call
    test().create_database_and_table()
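Once `test_table` exists, the two halves of this post can be combined: parse names out of a page with jsoup, then write them to MySQL over JDBC. The sketch below is one possible way to do that, not code from the original post. The class `ScrapeToMysql`, its two helper methods, the `li.name` selector, and the sample HTML are hypothetical; the connection settings are copied from the demo above, and the insert omits `id` on the assumption that the column is `AUTO_INCREMENT`, as in the Python script.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class ScrapeToMysql {
    // Return the text of every <li class="name"> in the given HTML.
    static List<String> extractNames(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("li.name").eachText();
    }

    // Insert all names as one batch inside a single transaction.
    static void saveNames(Connection connection, List<String> names) throws SQLException {
        String sql = "INSERT INTO test_table (name) VALUES (?)";
        connection.setAutoCommit(false);
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            for (String name : names) {
                statement.setString(1, name);
                statement.addBatch();
            }
            statement.executeBatch();
            connection.commit();
        } catch (SQLException e) {
            connection.rollback(); // undo the partial batch before rethrowing
            throw e;
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical page content; a real scraper would fetch this from a site.
        String html = "<ul><li class=\"name\">Alice</li>"
                    + "<li class=\"name\">Bob</li><li>skip</li></ul>";
        List<String> names = extractNames(html);

        // Same connection settings as the demo above.
        String url = "jdbc:mysql://10.1.1.124:3306/test";
        try (Connection connection = DriverManager.getConnection(url, "root1", "1234Abcd+")) {
            saveNames(connection, names);
        }
    }
}
```

Keeping the parsing step separate from the database step means `extractNames` can be checked without a running MySQL server.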
