Truncate Text Preserving HTML Tags With PHP

03.30.2009
Truncate/limit text while preserving the HTML tags (the code auto-closes any tags left open). From http://jsfromhell.com


Example

echo String::truncate('jo<i><b>n</b>as</i>', 3, '...'); //jo<...
echo String::truncate('jo<i><b>n</b>as</i>', 3, '...', true); //jo<i><b>n</b></i>...
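echo String::truncate('jo<i><b>n</b>as</i>', 3, '...', true, false); //jo<i><b>n... (needs a fifth parameter that turns off auto-closing; the code below accepts only four, so PHP ignores the extra argument and the output matches the previous line)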


Code

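//+ Jonas Raoni Soares Silva
//@ http://jsfromhell.com

// Note: "String" cannot be used as a class name since PHP 7.0; rename the class to run this on current PHP.
class String{
	// $s: text, $l: maximum length in bytes (tags excluded when $isHTML is true),
	// $e: suffix appended when the text is cut, $isHTML: skip tags while counting and close any left open
	public static function truncate($s, $l, $e = '...', $isHTML = false){
		$i = 0; // bytes taken up by the tags seen so far
		$tags = array(); // stack of tags that are still open
		if($isHTML){
			// each match is a tag plus the text that follows it, with byte offsets
			preg_match_all('/<[^>]+>([^<]*)/', $s, $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
			foreach($m as $o){
				// stop once the text before this tag already reaches the limit
				if($o[0][1] - $i >= $l)
					break;
				// tag name without "<", attributes or ">"
				$t = substr(strtok($o[0][0], " \t\n\r\0\x0B>"), 1);
				if($t[0] != '/')
					$tags[] = $t;
				elseif(end($tags) == substr($t, 1))
					array_pop($tags);
				// the length of the tag itself does not count toward the limit
				$i += $o[1][1] - $o[0][1];
			}
		}
		$l = min(strlen($s), $l + $i); // cut position in the raw string
		$tags = array_reverse($tags); // the innermost open tag is closed first
		return substr($s, 0, $l) . (count($tags) ? '</' . implode('></', $tags) . '>' : '') . (strlen($s) > $l ? $e : '');
	}
}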
    

Comments

Snippets Manager replied on Mon, 2010/03/22 - 8:53am

Sorry for spamming, but this last version truncates on whole words instead of just on the number of characters, so I thought I'd share it as well. If some mod out there feels like deleting my two previous posts, please do ;)

class String {
	public static function truncate($text, $length, $suffix = '…', $isHTML = true){
		$i = 0;
		$tags = array();
		if($isHTML){
			preg_match_all('/<[^>]+>([^<]*)/', $text, $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
			foreach($m as $o){
				if($o[0][1] - $i >= $length)
					break;
				$t = substr(strtok($o[0][0], " \t\n\r\0\x0B>"), 1);
				if($t[0] != '/')
					$tags[] = $t;
				elseif(end($tags) == substr($t, 1))
					array_pop($tags);
				$i += $o[1][1] - $o[0][1];
			}
		}

		$output = substr($text, 0, $length = min(strlen($text), $length + $i)) . (count($tags = array_reverse($tags)) ? '</' . implode('></', $tags) . '>' : '');

		// Get everything until last space
		$one = substr($output, 0, strrpos($output, " "));
		// Get the rest
		$two = substr($output, strrpos($output, " "), (strlen($output) - strrpos($output, " ")));
		// Extract all tags from the last bit
		preg_match_all('/<(.*?)>/s', $two, $tags);
		// Add suffix if needed
		if (strlen($text) > $length) { $one .= $suffix; }
		// Re-attach tags
		$output = $one . implode($tags[0]);
		return $output;
	}
}

Also, forgot to mention, changed '...' to '…' character.
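The whole-word idea can be sketched on plain text alone, which makes the edge cases easier to see. This is a minimal sketch, not the commenter's code: the function name truncate_at_word is hypothetical, it ignores HTML entirely, and it keeps the cut as-is when the text contains no space before the limit.

```php
<?php
// Hypothetical helper for illustration; not part of the original snippet.
function truncate_at_word($text, $length, $suffix = '...') {
    // Nothing to cut: return the text untouched, without a suffix
    if (strlen($text) <= $length) {
        return $text;
    }
    $cut = substr($text, 0, $length);
    // If the cut lands in the middle of a word, back up to the last space
    if ($text[$length] !== ' ') {
        $space = strrpos($cut, ' ');
        if ($space !== false) { // no space at all: keep the hard cut
            $cut = substr($cut, 0, $space);
        }
    }
    return rtrim($cut) . $suffix;
}

echo truncate_at_word('The quick brown fox', 12), "\n"; // The quick...
echo truncate_at_word('Supercalifragilistic', 5), "\n"; // Super...
```

Checking strrpos() against false matters here: when no space is found, passing false as a length to substr() yields an empty string.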

Snippets Manager replied on Sat, 2010/08/07 - 4:16pm

Great piece of code and it was instrumental in saving me much time. However, I quickly found a small bug: if the incoming HTML string contained an HTML comment, the class was trying to close it with a "</!-->". Rather than debug the code and regexes, I decided to simply look for and replace any instances of the unneeded closure attempt by adding one line just prior to returning the new value.

//+ Jonas Raoni Soares Silva
//@ http://jsfromhell.com

class String {
	public static function truncate($text, $length, $suffix = '…', $isHTML = true){
		$i = 0;
		$simpleTags = array('br'=>true,'hr'=>true,'input'=>true,'image'=>true,'link'=>true,'meta'=>true);
		$tags = array();
		if($isHTML){
			preg_match_all('/<[^>]+>([^<]*)/', $text, $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
			foreach($m as $o){
				if($o[0][1] - $i >= $length)
					break;
				$t = substr(strtok($o[0][0], " \t\n\r\0\x0B>"), 1);
				// test if the tag is unpaired, then we mustn't save them
				if($t[0] != '/' && (!isset($simpleTags[$t])))
					$tags[] = $t;
				elseif(end($tags) == substr($t, 1))
					array_pop($tags);
				$i += $o[1][1] - $o[0][1];
			}
		}

		// output without closing tags
		$output = substr($text, 0, $length = min(strlen($text), $length + $i));
		// closing tags
		$output2 = (count($tags = array_reverse($tags)) ? '</' . implode('></', $tags) . '>' : '');
		// Find last space or HTML tag (solving problem with last space in HTML tag)
		$pos = (int)end(end(preg_split('/<.*>| /', $output, -1, PREG_SPLIT_OFFSET_CAPTURE)));
		// Append closing tags to output
		$output .= $output2;
		// Get everything until last space
		$one = substr($output, 0, $pos);
		// Get the rest
		$two = substr($output, $pos, (strlen($output) - $pos));
		// Extract all tags from the last bit
		preg_match_all('/<(.*?)>/s', $two, $tags);
		// Add suffix if needed
		if (strlen($text) > $length) { $one .= $suffix; }
		// Re-attach tags
		$output = $one . implode($tags[0]);
		//added to remove unnecessary closure
		$output = str_replace('</!-->','',$output);
		return $output;
	}
}
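The stray closure appears because the tag scan treats an HTML comment like any other tag and pushes "!--" onto the stack of open tags. One way to avoid it at the source, rather than patching the output, is to consume comments whole while scanning. The following is a rough sketch under stated assumptions: the function name unclosed_tags and the list of void elements are hypothetical, and it only reports which tags remain open without doing any truncation.

```php
<?php
// Hypothetical helper for illustration; the name and the void-tag list are assumptions.
function unclosed_tags($html) {
    $void = array('br' => true, 'hr' => true, 'img' => true, 'input' => true, 'link' => true, 'meta' => true);
    $stack = array();
    // The comment alternative comes first, so comments are consumed whole
    // and their contents are never mistaken for tags
    preg_match_all('/<!--.*?-->|<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/s', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        if (!isset($m[2])) {
            continue; // a comment matched: nothing to open or close
        }
        $name = strtolower($m[2]);
        if ($m[1] === '/') {
            // closing tag: pop only if it matches the innermost open tag
            if (end($stack) === $name) {
                array_pop($stack);
            }
        } elseif (!isset($void[$name])) {
            $stack[] = $name; // opening tag that needs a closing tag later
        }
    }
    return $stack;
}

$open = unclosed_tags('<p>a<!-- note --><b>c<br>d');
echo '</' . implode('></', array_reverse($open)) . '>', "\n"; // </b></p>
```

When the returned array is empty there is nothing to close, so the closing string should be skipped in that case.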

Snippets Manager replied on Mon, 2010/03/22 - 8:53am

Slightly improved code:

class String {
	public static function truncate($text, $length, $suffix = '…', $isHTML = true){
		$i = 0;
		$tags = array();
		if($isHTML){
			preg_match_all('/<[^>]+>([^<]*)/', $text, $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
			foreach($m as $o){
				if($o[0][1] - $i >= $length)
					break;
				$t = substr(strtok($o[0][0], " \t\n\r\0\x0B>"), 1);
				if($t[0] != '/')
					$tags[] = $t;
				elseif(end($tags) == substr($t, 1))
					array_pop($tags);
				$i += $o[1][1] - $o[0][1];
			}
		}

		$output = substr($text, 0, $length = min(strlen($text), $length + $i)) . (count($tags = array_reverse($tags)) ? '</' . implode('></', $tags) . '>' : '');

		if (strlen($text) > $length) {
			$output = substr($output,-4,4)=='</p>' ? $output=substr($output,0,(strlen($output)-4)).$suffix.'</p>' : $output.=$suffix;
		}
		return $output;
	}
}

There are a few extra lines at the end to make sure the suffix falls within a P tag if that is the last tag being closed. Also I've expanded some of the variable names ;)

Snippets Manager replied on Mon, 2010/03/22 - 8:53am

Sorry for the late reply, my opinion is you rock! This is my last day at my current job and the issue with spaces in this function was the last thing on my todo list, thanks!!! :P I don't exactly understand what you're asking about the word splitter and the multi-byte functions though, this function was already a bit above my level. If you still need help, please clarify the question and I'll try to help if I can!

Snippets Manager replied on Wed, 2010/04/21 - 6:41am

Hi, I tried to use your function and found some problems with space characters inside HTML tags and with unpaired HTML tags (e.g. br and the others listed in $simpleTags below). So I modified your function and would like your opinion. I also need to use multi-byte functions for strings in UTF-8. I think an HTML tag can act as a word splitter too, for example in the string "First Line<br>NextLine".

class String {
	public static function truncate($text, $length, $suffix = '…', $isHTML = true){
		$i = 0;
		$simpleTags = array('br'=>true,'hr'=>true,'input'=>true,'image'=>true,'link'=>true,'meta'=>true);
		$tags = array();
		if($isHTML){
			preg_match_all('/<[^>]+>([^<]*)/', $text, $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
			foreach($m as $o){
				if($o[0][1] - $i >= $length)
					break;
				$t = substr(strtok($o[0][0], " \t\n\r\0\x0B>"), 1);
				// test if the tag is unpaired, then we mustn't save them
				if($t[0] != '/' && (!isset($simpleTags[$t])))
					$tags[] = $t;
				elseif(end($tags) == substr($t, 1))
					array_pop($tags);
				$i += $o[1][1] - $o[0][1];
			}
		}

		// output without closing tags
		$output = substr($text, 0, $length = min(strlen($text), $length + $i));
		// closing tags
		$output2 = (count($tags = array_reverse($tags)) ? '</' . implode('></', $tags) . '>' : '');
		// Find last space or HTML tag (solving problem with last space in HTML tag)
		$pos = (int)end(end(preg_split('/<.*>| /', $output, -1, PREG_SPLIT_OFFSET_CAPTURE)));
		// Append closing tags to output
		$output .= $output2;
		// Get everything until last space
		$one = substr($output, 0, $pos);
		// Get the rest
		$two = substr($output, $pos, (strlen($output) - $pos));
		// Extract all tags from the last bit
		preg_match_all('/<(.*?)>/s', $two, $tags);
		// Add suffix if needed
		if (strlen($text) > $length) { $one .= $suffix; }
		// Re-attach tags
		$output = $one . implode($tags[0]);
		return $output;
	}
}
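On the multi-byte point: substr() and strlen() count bytes, so a UTF-8 character can be cut in half at the limit. A minimal sketch of the character-aware cut, assuming the mbstring extension is installed and the text is plain UTF-8 without markup; the function name truncate_utf8 is hypothetical.

```php
<?php
// Hypothetical helper for illustration; requires the mbstring extension.
function truncate_utf8($text, $length, $suffix = '…') {
    // Compare and cut in characters, not bytes
    if (mb_strlen($text, 'UTF-8') <= $length) {
        return $text;
    }
    return mb_substr($text, 0, $length, 'UTF-8') . $suffix;
}

echo truncate_utf8('žluťoučký kůň', 4), "\n"; // žluť…
```

Applying the same idea to the HTML-aware versions would also mean replacing the byte offsets returned by PREG_OFFSET_CAPTURE, since those are byte positions too.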

Snippets Manager replied on Mon, 2012/05/07 - 3:09pm

I enjoy small variable names, for me it's boring to type long names :) You're free to modify it and change the variable names :p

Michael Pearson replied on Sat, 2013/02/16 - 1:35pm in response to:

I know I am necroing on this, but I had to comment. If you plan on being a paid developer, your preference for short variable names, such as single letters, needs to change. Your variables need to be descriptive of what they pertain to. It doesn't take long to type $username instead of $u, but it does take other developers a significant amount of time figuring out what $u is referencing 300 lines of code later.

You are making a classic mistake, found mostly in beginners. If you write code all day, every day, in 3 months you will not be able to read your own code, let alone have other developers work with your code.

Snippets Manager replied on Sun, 2009/04/12 - 9:56am

Have you considered cleaning up your variable names? All single letters? You do know that you can put in clumps of letters we call words for variable names, right? I was going to use it but I don't want to take the time to trace it and figure out what s,l,e mean.