I needed a function that extracts the first hundred words from a given input while retaining all formatting, such as line breaks and double spaces. Most of the regexp-based functions posted above count out a hundred words correctly, but they rebuild the paragraph by imploding an array back into a string, which throws away the original line breaks. So I devised a crude but accurate function that does what I ask of it:
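<?php
function Truncate($input, $numWords)
{
    // Only truncate when the input has more than $numWords words.
    if (str_word_count($input, 0) > $numWords)
    {
        $WordKey = str_word_count($input, 1); // list of words, indexed from 0
        $PosKey = str_word_count($input, 2);  // same words, keyed by character position
        reset($PosKey);
        // Overwrite each word with the character position at which it starts.
        foreach ($WordKey as $key => &$value)
        {
            $value = key($PosKey);
            next($PosKey);
        }
        unset($value); // drop the reference left over from the foreach
        // Cut the input just before word number $numWords + 1 begins.
        return substr($input, 0, $WordKey[$numWords]);
    }
    else
    {
        return $input;
    }
}
?>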
The idea behind it? Walk through the arrays returned by str_word_count and associate the index of each word with its character position in the input. Then use substr to return everything up to the position where word number $numWords + 1 starts, so the whitespace between the words that are kept is left exactly as it was. I have tested this function on rather large entries and it seems to be efficient enough that it does not bog down at all.
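For anyone who wants to see the index-to-position idea in isolation, here is a minimal sketch of a more compact variant. It is not the function above: the name truncateWords is hypothetical, it assumes PHP 7 or later for the type declarations, and it assumes $numWords is not negative. It reads the word start positions directly with array_keys instead of looping.

```php
<?php
// Hypothetical helper: a compact variant of the same idea.
// Assumes PHP 7+ and a non-negative $numWords.
function truncateWords(string $input, int $numWords): string
{
    // Format 2 returns [character position => word];
    // the keys are the positions at which each word starts.
    $positions = array_keys(str_word_count($input, 2));

    // Nothing to cut when the input has $numWords words or fewer.
    if (count($positions) <= $numWords) {
        return $input;
    }

    // Cut just before word number $numWords + 1 begins.
    return substr($input, 0, $positions[$numWords]);
}

// The double space and the line break survive the truncation.
$text = "One  two\nthree four";
echo json_encode(truncateWords($text, 2)); // prints "One  two\n"
```

Note that the result keeps the whitespace that follows the last retained word; wrap the call in rtrim() if that is not wanted. Also keep in mind that str_word_count only treats alphabetic characters (plus ' and - inside a word) as word characters, so a token made purely of digits is not counted as a word.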
Cheers!
Josh