Finding Broken Links in a Website using Selenium WebDriver

Selenium WebDriver is one of the most widely used open-source APIs for testing the functionality of a website. To use the Selenium WebDriver API, we first have to install the JDK (1.7 or later), Eclipse and the Selenium Java Client Driver. For installing and configuring Eclipse with WebDriver, go through the blog https://webkul.com/blog/getting-started-selenium/.

After configuration, we have to create a class in our Java project to find the broken links in a website. Here we are testing for broken links on the website https://webkul.com/.

Code :-

We have created a Links.java class in a project named Webkul, under the automationFramework package, with the following code :-

package automationFramework;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Iterator;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Links {
    
    private static WebDriver driver = null;

    public static void main(String[] args) {
        System.setProperty("webdriver.gecko.driver", "path of geckodriver");
        
        // Use the same scheme and host as the site's own links, otherwise startsWith(home) skips every link
        String home = "https://webkul.com";
        String url = "";
        HttpURLConnection huc = null;
        int respCode = 200;
        
        driver = new FirefoxDriver();
        
        driver.manage().window().maximize();
        
        driver.get(home);
        
        List<WebElement> link = driver.findElements(By.tagName("a"));
        System.out.println("Total no. of links are " + link.size());
       
        Iterator<WebElement> it = link.iterator();
        
        while(it.hasNext()){
            
            url = it.next().getAttribute("href");
            
            System.out.println(url);
        
            if(url == null || url.isEmpty()){
                System.out.println("URL is either not configured for anchor tag or it is empty");
                continue;
            }
            
            if(!url.startsWith(home)){
                System.out.println("URL belongs to another domain, skipping it.");
                continue;
            }
            
            try {
                huc = (HttpURLConnection)(new URL(url).openConnection());
                
                huc.setRequestMethod("HEAD");
                
                huc.connect();
                
                respCode = huc.getResponseCode();
                
                if(respCode >= 400){
                    System.out.println(url+" is a broken link");
                }
                else{
                    System.out.println(url+" is a valid link");
                }
                    
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        
        driver.quit();

    }
}

Here we have used the FirefoxDriver class to check the links in the Firefox browser. To check them in the Chrome browser instead, we just have to use chromedriver in place of geckodriver (set through the webdriver.chrome.driver system property) and the ChromeDriver class in place of the FirefoxDriver class.

Step by step execution of Code:-

We will now walk through how the code finds and checks the links.

First, we import all the required classes. The key import here is :-

import java.net.HttpURLConnection;

HttpURLConnection is a class in the java.net package. Its methods are used to send HTTP requests programmatically from Java.

In the main function, we first set the path of geckodriver, which is needed when using the FirefoxDriver class. Below is the code for giving the path of the driver :-

System.setProperty("webdriver.gecko.driver", "path of geckodriver");

Now we will instantiate the FirefoxDriver :-

driver = new FirefoxDriver();

To maximize the Firefox window, we use the following code :-

driver.manage().window().maximize();

After that, we collect all the anchor (a) elements on the loaded page, store them in a list, and get an iterator to traverse the items of the list with the code below :-

List<WebElement> link = driver.findElements(By.tagName("a"));

Iterator<WebElement> it = link.iterator();

Now, for each element in the list, we read its URL and validate it with the following code snippet :-

// Get the href of the anchor tag and store it in the variable url
url = it.next().getAttribute("href");

System.out.println(url);

// Check if url is null or empty
if(url == null || url.isEmpty()){
    System.out.println("URL is either not configured for anchor tag or it is empty");
    continue;
}

// Check if url belongs to the main domain or a third-party domain
if(!url.startsWith(home)){
    System.out.println("URL belongs to another domain, skipping it.");
    continue;
}
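The startsWith(home) check above is sensitive to the scheme and to a leading "www.", so a link to the same site written as http://www.webkul.com/... would be skipped when home is https://webkul.com. As a minimal sketch of one alternative, the hypothetical helper below (LinkFilter and its method names are not part of the original Links class) compares host names using java.net.URI. It assumes that a leading "www." should be ignored and that subdomains count as other sites.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the original Links class.
final class LinkFilter {

    // Returns the lower-cased host without a leading "www.",
    // or null when the URL is empty, unparsable or has no host (e.g. mailto: links).
    static String normalizedHost(String url) {
        if (url == null || url.trim().isEmpty()) {
            return null;
        }
        try {
            String host = new URI(url.trim()).getHost();
            if (host == null) {
                return null;
            }
            host = host.toLowerCase(Locale.ROOT);
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // True when both URLs point at the same host, ignoring scheme and "www.".
    static boolean sameSite(String url, String home) {
        String linkHost = normalizedHost(url);
        String homeHost = normalizedHost(home);
        return linkHost != null && linkHost.equals(homeHost);
    }

    public static void main(String[] args) {
        String home = "https://webkul.com";
        System.out.println(sameSite("https://webkul.com/blog/", home));     // true
        System.out.println(sameSite("http://www.webkul.com/about", home));  // true
        System.out.println(sameSite("https://example.org/", home));         // false
        System.out.println(sameSite("mailto:someone@webkul.com", home));    // false
    }
}
```

In the loop, the condition !url.startsWith(home) could then be written as !LinkFilter.sameSite(url, home). Whether subdomains should be treated as part of the site depends on what you want the test to cover.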

After that, we send an HTTP request. HttpURLConnection is the class whose methods send an HTTP request and return the HTTP response. Here we have set the request method to “HEAD” because we want only the headers, not the full body of the document. The connect() method establishes the connection with the URL.

huc = (HttpURLConnection)(new URL(url).openConnection());

huc.setRequestMethod("HEAD");

huc.connect();
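The request above has no timeout and the connection is never released, so one slow link can stall the whole run. The sketch below wraps the same HEAD request in a hypothetical helper (HeadRequest and headStatus are illustrative names, not part of the original code) that sets connect and read timeouts and always calls disconnect(). It assumes the URL uses http or https, as the domain check in the loop already ensures.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for illustration; not part of the original Links class.
final class HeadRequest {

    // Sends a HEAD request and returns the HTTP status code.
    // Throws MalformedURLException (an IOException) for an invalid URL,
    // and IOException when the connection fails or times out.
    static int headStatus(String url, int timeoutMillis) throws IOException {
        HttpURLConnection huc = (HttpURLConnection) new URL(url).openConnection();
        try {
            huc.setRequestMethod("HEAD");          // headers only, no response body
            huc.setConnectTimeout(timeoutMillis);  // limit time spent opening the connection
            huc.setReadTimeout(timeoutMillis);     // limit time spent waiting for the response
            return huc.getResponseCode();          // connects implicitly if not yet connected
        } finally {
            huc.disconnect();                      // release the connection in every case
        }
    }
}
```

Inside the try block of the loop, the call would look like respCode = HeadRequest.headStatus(url, 5000); where the 5000 ms timeout is an arbitrary value chosen for the example.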

Finally, we call the getResponseCode() method to get the response code for the request. On the basis of that code we decide the link status: a code of 400 or above is reported as a broken link, anything else as a valid link.

respCode = huc.getResponseCode();

if(respCode >= 400){
    System.out.println(url+" is a broken link");
}
else{
    System.out.println(url+" is a valid link");
}
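The check above puts every code below 400 in the "valid" branch. That includes -1, which getResponseCode() returns when the response is not valid HTTP. As a rough sketch of a finer-grained report, the hypothetical helper below (LinkStatus and its method names are illustrative) keeps the original "400 or above is broken" rule and adds a description per status range.

```java
// Hypothetical helper for illustration; not part of the original Links class.
final class LinkStatus {

    // Same rule as the original code: 400 and above means broken.
    static boolean isBroken(int respCode) {
        return respCode >= 400;
    }

    // Describes the status range the response code falls into.
    static String describe(int respCode) {
        if (respCode >= 500) {
            return "broken (server error)";
        }
        if (respCode >= 400) {
            return "broken (client error)";
        }
        if (respCode >= 300) {
            return "redirect";
        }
        if (respCode >= 200) {
            return "valid";
        }
        // Covers 1xx codes and -1 (no valid HTTP status could be read).
        return "unknown";
    }

    public static void main(String[] args) {
        int[] samples = {200, 301, 404, 503, -1};
        for (int code : samples) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```

Note that HttpURLConnection follows redirects by default, so the "redirect" branch is mostly reached when redirect following is turned off or when a redirect crosses protocols.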

Thus, we can find all the links on a page that belong to the site and print whether each one is valid or broken.

Thanks for reading this blog.

Garima Pathak

16 Jul, 2021
Updated by - Webkul
12 Oct, 2018
Updated by - Garima Pathak

8 comments

tayyab 5 years ago

hi
this code is for just one complete page or complete website..?
And very useful information for free.
really appreciate that!!
thanks

Shikha Sharma 5 years ago
This code is for one page only but similarly, we can check for the whole website, and thanks for your appreciation.

dipti dilpak 6 years ago

such a good blog on broken link finding in selenium it is very informative thanks for sharing keep posting

Praveen Pal (Moderator) 6 years ago
Thanks for your feedback.

priya 7 years ago

Very Good Post! Thank you so much for sharing this good post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.

Garima Pathak (Moderator) 7 years ago
Thanks for your feedback 🙂

Sandeep 8 years ago

resp variable is not declared so its giving error

Garima Pathak (Moderator) 8 years ago
Thanks for your comment. The issue with the variable has been fixed. You can check now.
