- Mac + Eclipse
- Before building the program, we need to prepare the following open-source libraries:
- Apache HttpComponents 4.5 (provides the HTTP interface: it submits HTTP requests to the target URL to obtain the page content)
- HTML Parser 2.0 (parses web pages and extracts URL links from the DOM nodes)
- MySQL Connector/J 5.1.38 (connects Java to MySQL, so that Java code can operate on the database)
Apart from these jars, we only need to install XAMPP, which supplies the local MySQL server and the port used to access the database.
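As a rough illustration of how Java code can reach the MySQL server through Connector/J, here is a minimal sketch. The class name `CrawlerDb`, the database `crawler`, the table `record`, and the `root` user with an empty password are all assumptions made for this example, not details taken from the actual project. Running `main` needs the Connector/J jar on the classpath and a running MySQL server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper: class, database, table, and column names are illustrative.
class CrawlerDb {
    // Builds a MySQL JDBC URL such as jdbc:mysql://localhost:3306/crawler.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Stores one URL. Assumes a table created beforehand, for example:
    //   CREATE TABLE record (url VARCHAR(255) PRIMARY KEY);
    // INSERT IGNORE skips rows whose URL is already stored.
    static void saveUrl(Connection conn, String url) throws SQLException {
        String sql = "INSERT IGNORE INTO record (url) VALUES (?)";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, url);
            stmt.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        // Assumed settings: MySQL on port 3306, user "root", empty password.
        String url = jdbcUrl("localhost", 3306, "crawler");
        try (Connection conn = DriverManager.getConnection(url, "root", "")) {
            saveUrl(conn, "http://example.com/");
        }
    }
}
```

A prepared statement is used here so that crawled URLs containing quotes cannot break the SQL.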
All the code is contained in three files: CrawlerMain.java, httpGet.java, and parsePage.java.
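To give a feel for how the three roles (main loop, page fetching, link parsing) might fit together, here is a simplified sketch; it is not the code from the repository. To stay self-contained, it swaps in a regular expression in place of HTML Parser and an in-memory map in place of real HTTP requests. The names `CrawlSketch`, `extractLinks`, and `crawl` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrawlSketch {
    // Simplification: matches double-quoted href attributes only.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Parsing role: collect the link targets found in a page.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Main-loop role: breadth-first crawl starting from the seed.
    // `fetch` plays the fetching role and returns null when a page is unavailable.
    static Set<String> crawl(String seed, Function<String, String> fetch, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already crawled
            }
            String html = fetch.apply(url);
            if (html == null) {
                continue;
            }
            for (String link : extractLinks(html)) {
                if (!visited.contains(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A tiny fake web standing in for real HTTP responses.
        Map<String, String> fakeWeb = new HashMap<>();
        fakeWeb.put("http://a.test/", "<a href=\"http://b.test/\">b</a> <a href=\"http://c.test/\">c</a>");
        fakeWeb.put("http://b.test/", "<a href=\"http://a.test/\">a</a>");
        fakeWeb.put("http://c.test/", "no links here");
        // prints [http://a.test/, http://b.test/, http://c.test/]
        System.out.println(crawl("http://a.test/", fakeWeb::get, 10));
    }
}
```

Passing the fetcher in as a function keeps the loop independent of the HTTP library, so a real HttpComponents-based fetcher could be plugged in without changing the loop.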
The blog crawler code, written in Java with MySQL, is uploaded to my GitHub.
I then display the results using Node.js.