Finding Broken Links in a Website using Selenium Webdriver

Updated 16 July 2021

Selenium WebDriver is one of the most widely used open-source APIs for testing the functionality of a website. To use the Selenium WebDriver API, first install the JDK (1.7 or later), Eclipse, and the Selenium Java Client Driver. For installing and configuring Eclipse with WebDriver, go through the blog https://webkul.com/blog/getting-started-selenium/.

After configuration, we create a class in our Java project to find the broken links in a website. Here we check for broken links on the website https://webkul.com/.

Code :-

We have created a Links.java class in a project named Webkul, under the automationFramework package, with the following code :-

package automationFramework;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Iterator;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Links {
    
    private static WebDriver driver = null;

    public static void main(String[] args) {
        System.setProperty("webdriver.gecko.driver", "path of geckodriver");
        
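        // Use the same scheme and host as the links on the page (https://webkul.com/),
        // otherwise the startsWith(home) check below skips every link.
        String home = "https://webkul.com";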
        String url = "";
        HttpURLConnection huc = null;
        int respCode = 200;
        
        driver = new FirefoxDriver();
        
        driver.manage().window().maximize();
        
        driver.get(home);
        
        List<WebElement> link = driver.findElements(By.tagName("a"));
        System.out.println("Total no. of links are " + link.size());
       
        Iterator<WebElement> it = link.iterator();
        
        while(it.hasNext()){
            
            url = it.next().getAttribute("href");
            
            System.out.println(url);
        
            if(url == null || url.isEmpty()){
                System.out.println("URL is either not configured for anchor tag or it is empty");
                continue;
            }
            
            if(!url.startsWith(home)){
                System.out.println("URL belongs to another domain, skipping it.");
                continue;
            }
            
            try {
                huc = (HttpURLConnection)(new URL(url).openConnection());
                
                huc.setRequestMethod("HEAD");
                
                huc.connect();
                
                respCode = huc.getResponseCode();
                
                if(respCode >= 400){
                    System.out.println(url+" is a broken link");
                }
                else{
                    System.out.println(url+" is a valid link");
                }
                    
            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        
        driver.quit();

    }
}

Here we have used the FirefoxDriver class to check the links in the Firefox browser. To check them in Chrome instead, use chromedriver in place of geckodriver and the ChromeDriver class in place of the FirefoxDriver class.
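
As a rough sketch of that swap, the helper below maps a browser name to the system property that points at its driver executable. The property webdriver.gecko.driver comes from the code above, and webdriver.chrome.driver is the matching property for chromedriver. The class name DriverProperty and the method propertyFor are hypothetical names used only for illustration; they are not part of Selenium.

```java
import java.util.Locale;

// Hypothetical helper, not part of Selenium: picks the system property
// that tells WebDriver where the driver executable lives.
final class DriverProperty {

    static String propertyFor(String browser) {
        switch (browser.toLowerCase(Locale.ROOT)) {
            case "firefox":
                return "webdriver.gecko.driver";   // used with FirefoxDriver
            case "chrome":
                return "webdriver.chrome.driver";  // used with ChromeDriver
            default:
                throw new IllegalArgumentException("Unsupported browser: " + browser);
        }
    }
}
```

With such a helper, System.setProperty(DriverProperty.propertyFor("chrome"), "path of chromedriver") followed by driver = new ChromeDriver() would take the place of the two Firefox-specific lines.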

Step by step execution of Code:-

We will now walk through how the code finds and checks the links.

First we import all the required classes. The key import here is :-

import java.net.HttpURLConnection;

HttpURLConnection is a class in the java.net package. Its methods are used to send HTTP requests programmatically from Java.

Inside the main method, we first set the path of geckodriver, which the FirefoxDriver class needs. Below is the code for giving the path of the driver :-

System.setProperty("webdriver.gecko.driver", "path of geckodriver");

Now we will instantiate the FirefoxDriver :-

driver = new FirefoxDriver();

To maximize the Firefox window we use the following code :-

driver.manage().window().maximize();

After that we collect all the anchor elements on the loaded page, store them in a list, and traverse the items of the list with the code below :-

List<WebElement> link = driver.findElements(By.tagName("a"));

Iterator<WebElement> it = link.iterator();
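
A page often repeats the same link in its header, footer and body, so checking every anchor separately can send the same request several times. The sketch below is one possible way to remove repeats before checking. It works on plain href strings (for example, values already read with getAttribute("href")) so that it runs without a browser. The class UniqueLinks and the method uniqueHrefs are hypothetical names, not part of the original code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: removes repeated and missing href values while
// keeping the order in which the links appeared on the page.
final class UniqueLinks {

    static List<String> uniqueHrefs(List<String> hrefs) {
        Set<String> seen = new LinkedHashSet<>();
        for (String href : hrefs) {
            // Skip anchors without an href and blank values.
            if (href != null && !href.trim().isEmpty()) {
                seen.add(href.trim());
            }
        }
        return new ArrayList<>(seen);
    }
}
```

A LinkedHashSet is used here so that the output order matches the page order, which keeps the printed report easy to compare with the page itself.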

Now, for each anchor in the list, we read its URL and validate it with the following code snippet :-

url = it.next().getAttribute("href");  // get the href of the anchor tag and store it in the variable url

System.out.println(url);

// Check if url is null or empty
if(url == null || url.isEmpty()){
    System.out.println("URL is either not configured for anchor tag or it is empty");
    continue;
}

// Check if url belongs to the main domain or to a third-party domain
if(!url.startsWith(home)){
    System.out.println("URL belongs to another domain, skipping it.");
    continue;
}
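
A plain startsWith(home) check depends on the exact scheme and www. prefix, so http://www.webkul.com and https://webkul.com would not match each other. The sketch below shows one alternative, assuming it is acceptable to compare host names only. It uses java.net.URI from the standard library. The class LinkFilter and its methods are hypothetical names, and treating a sub-domain as a different site is a choice made for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper: decides whether a link points at the same host
// as the home page, ignoring scheme, case and a leading "www.".
final class LinkFilter {

    // Returns the lower-cased host without "www.", or null when the
    // URL has no host (e.g. mailto: links) or cannot be parsed.
    static String normalizedHost(String url) {
        try {
            String host = new URI(url.trim()).getHost();
            if (host == null) {
                return null;
            }
            host = host.toLowerCase(Locale.ROOT);
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // True when url is non-empty and has the same host as home.
    static boolean isSameSite(String home, String url) {
        if (url == null || url.trim().isEmpty()) {
            return false;
        }
        String homeHost = normalizedHost(home);
        String urlHost = normalizedHost(url);
        return homeHost != null && homeHost.equals(urlHost);
    }
}
```

In the loop, if(!LinkFilter.isSameSite(home, url)) could stand in for both the empty check and the startsWith check, at the cost of no longer printing a separate message for each case.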

After that we send the HTTP request. HttpURLConnection is the class whose methods send an HTTP request and return the HTTP response. Here we set the request method to "HEAD" because we want only the response headers, not the full body of the document. The connect() method then opens the connection to the URL.

huc = (HttpURLConnection)(new URL(url).openConnection());

huc.setRequestMethod("HEAD");

huc.connect();
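
The request above has no timeout and the connection is never released, so one slow server can stall the whole run. The sketch below is a possible variant, not the original author's code: it adds connect and read timeouts and calls disconnect() when done. The class HeadRequest, the method statusOf and the use of -1 as a failure value are assumptions made for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: sends a HEAD request with timeouts.
final class HeadRequest {

    // Returns the HTTP status code, or -1 when the URL is malformed,
    // is not an HTTP(S) URL, or the server cannot be reached in time.
    static int statusOf(String url, int timeoutMillis) {
        HttpURLConnection huc = null;
        try {
            huc = (HttpURLConnection) new URL(url).openConnection();
            huc.setRequestMethod("HEAD");
            huc.setConnectTimeout(timeoutMillis);
            huc.setReadTimeout(timeoutMillis);
            // getResponseCode() connects implicitly if not yet connected.
            return huc.getResponseCode();
        } catch (IOException | ClassCastException e) {
            // MalformedURLException is an IOException; a non-HTTP URL
            // makes the cast above fail with ClassCastException.
            return -1;
        } finally {
            if (huc != null) {
                huc.disconnect();
            }
        }
    }
}
```

A call such as HeadRequest.statusOf(url, 5000) would wait at most about five seconds for each phase; the right timeout depends on the site being tested.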

Finally we call the getResponseCode() method to get the response code for the request, and use that code to decide the link status: a code of 400 or above means the link is broken.

respCode = huc.getResponseCode();

if(respCode >= 400){
    System.out.println(url+" is a broken link");
}
else{
    System.out.println(url+" is a valid link");
}
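
The check above treats every code below 400 as valid. That includes redirects and the value -1, which getResponseCode() returns when no status code can be read from the response. A slightly finer report could be sketched as follows; the class LinkStatus, the method describe and the label strings are hypothetical and chosen only for this example.

```java
// Hypothetical helper: turns a response code into a short label.
final class LinkStatus {

    static String describe(int respCode) {
        if (respCode >= 500) {
            return "broken (server error)";
        }
        if (respCode >= 400) {
            return "broken (client error)";
        }
        if (respCode >= 300) {
            return "redirect";
        }
        if (respCode >= 200) {
            return "valid";
        }
        return "unknown"; // includes -1 (no valid HTTP response)
    }
}
```

HttpURLConnection follows redirects by default, but not from http to https, so a "redirect" label would mostly show up for links that switch protocol.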

Thus, we can find all the links on a page and print whether each link is valid or broken.

Thanks for reading this blog.
