Java Web Scraper using JSoup – Part III

In this tutorial, I will show you how to read data from tables. Sometimes you have to develop a program that reads data from a table within an HTML page, for example reading jokes and their authors from a site. Below is a sample HTML page named sample3.html.


<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>Sample 3</title>
</head>
<body>
<table>
<tr>
<td>Author1</td>
<td>Joke1</td>
</tr>
<tr>
<td>Author2</td>
<td>Joke2</td>
</tr>
</table>
</body>
</html>

After you have saved that HTML file, create a new class in Eclipse, name it GrabJokes, and type in the code below.


package org.soup.examples;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;

public class GrabJokes {

    public static void main(String[] args) throws IOException {
        // Change this path to wherever you saved sample3.html
        File input = new File("C:\\Users\\allmankind\\Documents\\sample3.html");
        Document doc = Jsoup.parse(input, "UTF-8");

        // First cell of every row holds the author, last cell holds the joke
        Elements authors = doc.select("tr td:first-child");
        Elements jokes = doc.select("tr td:last-child");

        // Both lists are in document order, so index i refers to the same row
        for (int i = 0; i < authors.size(); i++) {
            System.out.println(authors.get(i).text() + " - " + jokes.get(i).text());
        }
    }
}

As in the other tutorials, we initialize the Document and Elements objects and give them names. Then we run a loop and print the results to the console. The difference lies in doc.select("tr td:first-child") and doc.select("tr td:last-child"). The first selects every td that is the first child of its table row, which in our case is the author. The second selects every td that is the last child of its row. Since each row here has only two cells, that is the second td, the joke itself.
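The two selectors above produce two separate lists that are matched up by index, which only works if every row has both cells. As a rough sketch of an alternative, the code below walks the table row by row and reads the cells of each row together. It also shows td:nth-child(n), a selector JSoup supports for picking a cell by position (first-child is the same as nth-child(1)). The class name GrabJokesByRow, the helper methods authorJokePairs and column, and the inline SAMPLE_HTML string are my own names for illustration and are not part of the original program.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class GrabJokesByRow {

    // Inline copy of the table from sample3.html, so no file path is needed (illustrative).
    static final String SAMPLE_HTML =
            "<table>"
            + "<tr><td>Author1</td><td>Joke1</td></tr>"
            + "<tr><td>Author2</td><td>Joke2</td></tr>"
            + "</table>";

    // Hypothetical helper: builds one "author - joke" line per table row.
    static List<String> authorJokePairs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element row : doc.select("tr")) {
            Elements cells = row.select("td");
            if (cells.size() < 2) {
                continue; // skip rows that do not have both an author and a joke
            }
            lines.add(cells.first().text() + " - " + cells.last().text());
        }
        return lines;
    }

    // Hypothetical helper: returns the text of the n-th cell (1-based) of every row.
    static List<String> column(String html, int n) {
        Document doc = Jsoup.parse(html);
        List<String> values = new ArrayList<>();
        for (Element cell : doc.select("tr td:nth-child(" + n + ")")) {
            values.add(cell.text());
        }
        return values;
    }

    public static void main(String[] args) {
        for (String line : authorJokePairs(SAMPLE_HTML)) {
            System.out.println(line); // prints "Author1 - Joke1", then "Author2 - Joke2"
        }
        System.out.println(column(SAMPLE_HTML, 2)); // prints [Joke1, Joke2]
    }
}
```

Reading each row as a unit keeps an author and a joke together even if some other row is missing a cell, at the cost of a slightly longer loop.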

Thank you for reading. Don’t forget to share and leave a comment.

7 thoughts on “Java Web Scraper using JSoup – Part III”

  2. I always spend half an hour reading this webpage’s articles along with a cup of coffee.

  3. What is the purpose of making the HTML page? You are not using it anywhere?

  4. I have scraped some data and want to display it on another HTML page. How could I do this?

  5. What if there are 5 <td>’s in a <tr> and I just need to fetch data from the second and 4th? Is there an easy way similar to last-child/first-child?
