WP-Mix

A fresh mix of code snippets and tutorials

Find All URLs in a String with PHP

While developing my WordPress chat plugin, SAC Pro, I needed a way to get all URLs from a string. This enabled me to find any URLs that were included in chat messages, so I could apply HTML formatting and convert the raw URLs into actual clickable hyperlinks.

Here is the magic PHP code, along with some useful comments.

// define the URL that we want to find
$url = 'https://wp-mix.com/wp-mix-celebrates-11-years/';

// escape special characters including slash
$url = preg_quote($url, '/');

// define the regex to find urls
$regex = '/\s?('. $url .')\s?/i'; 

// find all urls in this string
$haystack = 'WP-Mix Celebrates 11 Years! https://wp-mix.com/wp-mix-celebrates-11-years/ Thank you to all of our readers!';

// using preg_match_all returns all instances of the url
preg_match_all($regex, $haystack, $matches);

// $matches contains all matched urls
var_dump($matches);

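The code is fairly self-explanatory and uses only built-in PHP functions, but there are two things to keep in mind. First, escaping the $url with preg_quote() puts a backslash in front of every character that has special meaning in a regex, plus the slash delimiter passed as the second argument. For the current URL, the result looks like this: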

https\:\/\/wp\-mix\.com\/wp\-mix\-celebrates\-11\-years\/
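
To see what the second argument does, here is a small sketch that calls preg_quote() with and without the delimiter. The variable names $without_delimiter and $with_delimiter are just placeholders for this illustration.

```php
<?php
// Sketch: compare preg_quote() with and without the delimiter argument.
$url = 'https://wp-mix.com/wp-mix-celebrates-11-years/';

// Without a delimiter, forward slashes are left unescaped,
// which would break a regex that is wrapped in /.../
$without_delimiter = preg_quote($url);

// With '/' as the delimiter, every forward slash is escaped too.
$with_delimiter = preg_quote($url, '/');

echo $without_delimiter . "\n";
echo $with_delimiter . "\n";
```

If you wrap the regex in a different delimiter, such as #, pass that character as the second argument instead.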

Second, when you var_dump() the $matches variable, the above code will output the following:

array(2) {
	
	[0]=> 
		array(1) {
			[0]=> string(48) " https://wp-mix.com/wp-mix-celebrates-11-years/ "
		}
	
	[1]=>
		array(1) {
			[0]=> string(46) "https://wp-mix.com/wp-mix-celebrates-11-years/"
		}
	
}

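So $matches[0] holds the full match, which is the URL including any surrounding whitespace, while $matches[1] holds the captured group, which is the URL without any whitespace. With a single instance of the URL, those values are $matches[0][0] and $matches[1][0]. The two arrays correspond to how the regular expression (regex) is written: index 0 is the entire pattern, and index 1 is the part inside the parentheses.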
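
If you only ever need the clean URLs, the steps can be wrapped in a small helper. This is a minimal sketch, not the code used in SAC Pro; the function name find_url_instances() is hypothetical, and it assumes PHP 7 or later for the type declarations.

```php
<?php
/**
 * Hypothetical helper: return every instance of $url found in $haystack,
 * without any surrounding whitespace.
 */
function find_url_instances(string $url, string $haystack): array
{
    // Escape regex metacharacters and the "/" delimiter.
    $quoted = preg_quote($url, '/');

    // Same pattern as in the tutorial: optional whitespace around a captured URL.
    $regex = '/\s?(' . $quoted . ')\s?/i';

    // preg_match_all() returns the number of matches, or false on error.
    if (!preg_match_all($regex, $haystack, $matches)) {
        return [];
    }

    // Capture group 1 holds the URLs without surrounding whitespace.
    return $matches[1];
}

$haystack = 'WP-Mix Celebrates 11 Years! https://wp-mix.com/wp-mix-celebrates-11-years/ Thank you to all of our readers!';
$found    = find_url_instances('https://wp-mix.com/wp-mix-celebrates-11-years/', $haystack);

print_r($found);
```

Returning an empty array when nothing matches (or when the regex fails) means the caller can loop over the result without extra checks.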

Multiple URLs in a string

The above example includes only one instance of the target URL in the string. Here is what happens when there are multiple instances of the URL. Say the string $haystack looks like this, with two instances of the same URL:

$haystack = 'WP-Mix Celebrates 11 Years! https://wp-mix.com/wp-mix-celebrates-11-years/ Thank you to all of our readers! Here again is the URL: https://wp-mix.com/wp-mix-celebrates-11-years/';

If we apply our code to that string and then dump the results, the array looks like this:

array(2) {
	
	[0]=> 
		array(2) {
			[0]=> string(48) " https://wp-mix.com/wp-mix-celebrates-11-years/ "
			[1]=> string(47) " https://wp-mix.com/wp-mix-celebrates-11-years/"
		}
	
	[1]=>
		array(2) {
			[0]=> string(46) "https://wp-mix.com/wp-mix-celebrates-11-years/"
			[1]=> string(46) "https://wp-mix.com/wp-mix-celebrates-11-years/"
		}
	
}

The first array, $matches[0], contains both URLs as matched by the complete regex, so each one includes whatever whitespace surrounded it. The second array, $matches[1], also contains both URLs, but as matched by the more specific inner group, where only the URL (and no whitespace) is captured. So in your app, reading from the second array, like $matches[1][0] (the first instance of the URL) and $matches[1][1] (the second instance of the URL), gives you the URLs without any whitespace. Of course, your results may vary depending on the URL and string contents.
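
If you would rather have each match grouped together with its own captured URL, preg_match_all() accepts the PREG_SET_ORDER flag as a fourth argument. The following is a rough sketch using the same regex and two-URL string; the variable names $sets and $count are only for illustration.

```php
<?php
$url   = preg_quote('https://wp-mix.com/wp-mix-celebrates-11-years/', '/');
$regex = '/\s?(' . $url . ')\s?/i';

$haystack = 'WP-Mix Celebrates 11 Years! https://wp-mix.com/wp-mix-celebrates-11-years/ Thank you to all of our readers! Here again is the URL: https://wp-mix.com/wp-mix-celebrates-11-years/';

// PREG_SET_ORDER groups the results by match instead of by capture group.
$count = preg_match_all($regex, $haystack, $sets, PREG_SET_ORDER);

foreach ($sets as $i => $set) {
    // $set[0] is the full match; $set[1] is the URL without whitespace.
    echo ($i + 1) . ': ' . $set[1] . "\n";
}
```

With this flag, $sets[0] describes the first instance and $sets[1] the second, which can be easier to loop over than the default layout.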
