PHP: Get links on a page with DOMDocument

By | August 4, 2012

Scraper scripts often need to extract all links on a given page. This can be done in several ways, for example with regular expressions or with DOMDocument.

Here is a simple code snippet that does this using DOMDocument.

A function that returns all links on a given URL using DOMDocument:

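```php
<?php
function get_links($link)
{
	//return array
	$ret = array();
	/*** fetch the page; give up if the request fails ***/
	$html = file_get_contents($link);
	if ($html === false || $html === '') {
		return $ret;
	}
	/*** a new dom object ***/
	$dom = new DOMDocument;
	/*** remove silly white space (must be set before loading) ***/
	$dom->preserveWhiteSpace = false;
	/*** load the HTML (suppress warnings about malformed markup) ***/
	@$dom->loadHTML($html);
	/*** get the links from the HTML ***/
	$links = $dom->getElementsByTagName('a');
	/*** loop over the links (textContent also copes with empty or nested anchors) ***/
	foreach ($links as $tag) {
		$ret[$tag->getAttribute('href')] = trim($tag->textContent);
	}
	return $ret;
}
```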
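Because `get_links()` both downloads and parses the page, it is awkward to try out without a network connection. The sketch below is a hypothetical helper, `get_links_from_html()` (not part of the original post), that runs the same extraction on an HTML string. It also swaps the `@` operator for PHP's `libxml_use_internal_errors()`, so parser warnings are collected and cleared rather than silenced.

```php
<?php
// Hypothetical helper: same extraction as get_links(), but it takes an
// HTML string instead of a URL, so it can be tried without a network call.
function get_links_from_html($html)
{
	$ret = array();
	if ($html === '') {
		return $ret; // loadHTML() rejects an empty string
	}
	$dom = new DOMDocument;
	$dom->preserveWhiteSpace = false;
	// Collect libxml parse warnings internally instead of printing them
	$previous = libxml_use_internal_errors(true);
	$dom->loadHTML($html);
	libxml_clear_errors();
	libxml_use_internal_errors($previous);
	foreach ($dom->getElementsByTagName('a') as $tag) {
		// textContent joins the text of all child nodes, including nested tags
		$ret[$tag->getAttribute('href')] = trim($tag->textContent);
	}
	return $ret;
}

$html = '<p><a href="/about">About <b>us</b></a> <a href="https://example.com/">Example</a></p>';
print_r(get_links_from_html($html));
```

For this sample the array maps `/about` to `About us` and `https://example.com/` to `Example`. Reading only the first child node (`childNodes->item(0)`) would return just `About ` for the first link, because `us` sits inside a nested `<b>` element.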

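```php
//Link to open and search for links (set this to the page URL)
$link = "";

/*** get the links ***/
$urls = get_links($link);

/*** check for results ***/
if (count($urls) > 0) {
	foreach ($urls as $key => $value) {
		// escape both parts before echoing them into an HTML page
		echo htmlspecialchars($key) . ' - ' . htmlspecialchars($value) . '<br>';
	}
} else {
	echo "No links found at " . htmlspecialchars($link);
}
```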
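Keying the result by `href` means that two links to the same address collapse into a single entry, and `<a>` elements without an `href` (such as old-style `<a name="top">` anchors) end up under an empty key. If either matters for your scraper, one option is to query with `DOMXPath` and return a list of pairs instead. The function name `get_link_list()` is hypothetical; this is a sketch of an alternative, not the original author's code.

```php
<?php
// Hypothetical variant: returns a list of ['href' => ..., 'text' => ...]
// pairs so repeated hrefs are all kept, and uses XPath to skip <a>
// elements that have no href attribute.
function get_link_list($html)
{
	$ret = array();
	if ($html === '') {
		return $ret; // loadHTML() rejects an empty string
	}
	$dom = new DOMDocument;
	$previous = libxml_use_internal_errors(true);
	$dom->loadHTML($html);
	libxml_clear_errors();
	libxml_use_internal_errors($previous);

	$xpath = new DOMXPath($dom);
	// Only anchors that actually carry an href attribute
	foreach ($xpath->query('//a[@href]') as $tag) {
		$ret[] = array(
			'href' => $tag->getAttribute('href'),
			'text' => trim($tag->textContent),
		);
	}
	return $ret;
}

$html = '<a name="top"></a><a href="/a">One</a><a href="/a">Two</a>';
foreach (get_link_list($html) as $item) {
	echo $item['href'] . ' - ' . $item['text'] . "\n";
}
```

Here both links to `/a` are reported, and the named anchor is skipped instead of appearing with an empty href.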

Last Updated On : 4th August 2012


One thought on “Php : Get links on a page with DomDocument”

  1. Eugene Gudkov

    Hello, I have a problem extracting links from a website. I understand how to pull links from the HTML, but I do not understand how to pull links that are loaded dynamically. Please tell me how to get the link from a page that redirects through Google advertising.
