VB.Net Web Scraper Using HTMLAgilityPack – Part III

In this tutorial, I will show you how to read data from tables. Sometimes you have to develop a program that reads data from a table within an HTML page, for example, reading jokes and their authors from a site. Below is a sample HTML page named sample3.html.


<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>Sample 3</title>
</head>
<body>
<table>
<tr>
<td>Author1</td>
<td>Joke1</td>
</tr>
<tr>
<td>Author2</td>
<td>Joke2</td>
</tr>
</table>
</body>
</html>

When you are finished with the HTML page, create a new VB.Net project in Visual Studio. Select Console Application, name it GrabJokes, and enter the code below.


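Imports System.Linq
Imports HtmlAgilityPack

Module Module1

    Sub Main()
        ' Load the sample page from disk. VB.Net strings do not use backslash
        ' escapes, so the path is written with single backslashes.
        Dim doc As New HtmlDocument
        doc.Load("C:\Users\allmankind\Documents\sample3.html")

        ' Visit every table row and print its first and last cells.
        For Each row As HtmlNode In doc.DocumentNode.SelectNodes("//tr")
            Console.Write(row.Elements("td").First().InnerText)
            Console.Write(" - ")
            Console.WriteLine(row.Elements("td").Last().InnerText)
        Next

        ' Keep the console window open until a key is pressed.
        Console.ReadKey()
    End Sub

End Module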

The code is similar to the other tutorials in this series. The difference is inside the For Each loop: it selects every tr element, prints the text of the row's first td child, prints a hyphen, and then prints the text of the row's last td child.
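
The loop assumes that the page contains at least one tr and that every row has td cells. Here is a minimal sketch of a more defensive version. The module name TableSketch and the helper ExtractRows are made up for illustration, and the HTML is passed in as a string so the example does not depend on a file path. SelectNodes returns Nothing rather than an empty collection when no node matches, so the sketch checks for that before looping.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module TableSketch

    ' Hypothetical helper: returns one "first - last" string per table row.
    Function ExtractRows(html As String) As List(Of String)
        Dim results As New List(Of String)
        Dim doc As New HtmlDocument
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim rows = doc.DocumentNode.SelectNodes("//tr")
        If rows Is Nothing Then Return results

        For Each row As HtmlNode In rows
            Dim cells = row.Elements("td").ToList()
            ' Skip rows with no td children, such as header rows built from th.
            If cells.Count = 0 Then Continue For
            results.Add(cells.First().InnerText & " - " & cells.Last().InnerText)
        Next
        Return results
    End Function

    Sub Main()
        ' Same table rows as sample3.html, inlined for the sketch.
        Dim html As String = "<table><tr><td>Author1</td><td>Joke1</td></tr>" &
                             "<tr><td>Author2</td><td>Joke2</td></tr></table>"
        For Each entry As String In ExtractRows(html)
            Console.WriteLine(entry)
        Next
    End Sub

End Module
```

With the two rows from sample3.html this should print the same two lines as the program above. For a page without any tr elements it returns an empty list instead of throwing an exception.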

Thank you for reading. Don’t forget to share and leave a comment.

4 thoughts on “VB.Net Web Scraper Using HTMLAgilityPack – Part III”

  2. I'm trying to extract data with an XPath string and HtmlAgilityPack. If I use Chrome to inspect the element I want and then copy its XPath to use as a string, it gives me //*[@id="ext-gen271"]/div[1]/table/tbody/tr/td[2]/div/a
    but VB rejects it as a valid string, and running it produces errors. Can you help?

    Here is the code:

    Dim URL As String = "http://www.pse.com.ph/stockMarket/marketInfo-marketActivity.html?tab=3"
    Dim SecurityName As String
    Dim SecurityNameXPath As String = "//*[@id="ext-gen271"]/div[1]/table/tbody/tr/td[2]/div/a"
    Dim Web As New HtmlAgilityPack.HtmlWeb
    Dim Doc As New HtmlAgilityPack.HtmlDocument
    Doc = Web.Load(URL)
    For Each SecurityName As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes(SecurityNameXPath)
    SecurityName = SecurityNameResult.InnerText
    MsgBox(SecurityName)
    Next

  3. I tried this, but it does not pull the data from the table.

    Imports HtmlAgilityPack
    Module Module1
    Sub Main()
    Dim web As New HtmlWeb
    Dim doc As New HtmlDocument
    doc = web.Load("http://www.pse.com.ph/stockMarket/marketInfo-marketActivity.html?tab=3")
    For Each row As HtmlNode In doc.DocumentNode.SelectNodes("//tr")
    Console.Write(row.Elements("td class").First().InnerText)
    Console.Write(" - ")
    Console.WriteLine(row.Elements("td class").Last().InnerText)
    Next
    Console.ReadKey()
    End Sub
    End Module

    • I’m also having problems with that kind of page. The main issue is that the data in the table is loaded dynamically, so the HTML our program downloads does not contain the table data. The query returns nothing, which leads to the error.
      I can help you, but it will take time.
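
On the string error in the second comment: the XPath copied from Chrome contains double quotes, and in VB.Net a double quote inside a string literal has to be written twice, or the XPath can use single quotes instead. The sketch below shows both spellings. The markup, the module name XPathQuoteSketch, and the helper FirstText are made up for illustration and do not reproduce the real page. Note also that browsers often add a tbody element that is missing from the downloaded HTML, so a copied XPath containing tbody may not match.

```vbnet
Imports System
Imports HtmlAgilityPack

Module XPathQuoteSketch

    ' Hypothetical helper: returns the text of the first node matching the
    ' XPath, or an empty string when nothing matches.
    Function FirstText(html As String, xpath As String) As String
        Dim doc As New HtmlDocument
        doc.LoadHtml(html)
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)
        If node Is Nothing Then Return String.Empty
        Return node.InnerText
    End Function

    Sub Main()
        ' Made-up markup for illustration; it is not the page from the comment.
        Dim html As String = "<div id='ext-gen271'><div><a>ABC Corp</a></div></div>"

        ' Option 1: use single quotes inside the XPath.
        Dim xpathSingle As String = "//*[@id='ext-gen271']/div[1]/a"
        ' Option 2: double each embedded double quote.
        Dim xpathDoubled As String = "//*[@id=""ext-gen271""]/div[1]/a"

        Console.WriteLine(FirstText(html, xpathSingle))
        Console.WriteLine(FirstText(html, xpathDoubled))
    End Sub

End Module
```

Both spellings produce the same XPath query, so either one can be passed to SelectNodes or SelectSingleNode.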
