The Problem of Truncating Chinese Strings in PHP

Date: 2008-5-31    Author: Deri    Category: Sharing


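   <p>  The code below applies to GB2312-encoded text. Truncating Chinese strings is a recurring headache in PHP. The usual fix is to check whether a byte's value is greater than or equal to 128 to decide whether it belongs to a double-byte character, so that the cut never lands in the middle of a character and produces garbled output. Mixed Chinese and English text and special symbols still cause trouble, though, so here is a more thorough version, offered for reference only:</p><p>  Notes on the function:</p><p>  1. The len parameter is measured in Chinese characters: 1 len equals 2 English characters, so that truncated strings come out visually even.</p><p>  2. If the magic parameter is set to false, Chinese and English characters are treated equally and len is the absolute number of characters.</p><p>  3. It is especially suited to strings that have already been encoded with htmlspecialchars().</p><p>  4. It correctly handles numeric character entities (such as &#93232;) in GB2312 text.</p><p>  Code:</p><code>function FSubstr($title,$start,$len="",$magic=true)<br />
{<br />
/**<br />
 * powered by Smartpig<br />
 * mailto:d.einstein@263.net<br />
 */<br />
 <br />
$length = 0;<br />
if($len == "") $len = strlen($title);<br />
// Check whether $start falls in the middle of a double-byte character<br />
if($start > 0)<br />
{<br />
 $cnum = 0;<br />
 for($i=0;$i<$start;$i++)<br />
 {<br />
  if(ord(substr($title,$i,1)) >= 128) $cnum ++;<br />
 }<br />
 // An odd count of high bytes means $start splits a character, so step back one byte<br />
 if($cnum%2 != 0) $start--;<br />
 <br />
 unset($cnum);<br />
}<br />
if(strlen($title)<=$len) return substr($title,$start,$len);<br />
$alen  = 0; // single-width characters taken so far<br />
$blen = 0; // double-width characters taken so far<br />
$realnum = 0; // characters taken so far, regardless of width<br />
for($i=$start;$i<strlen($title);$i++)<br />
{<br />
 $ctype = 0;<br />
 $cstep = 0;<br />
 $cur = substr($title,$i,1);<br />
 // Entities produced by htmlspecialchars() count as one single-width character each<br />
 if($cur == "&")<br />
 {<br />
  if(substr($title,$i,4) == "&lt;")<br />
  {<br />
  $cstep = 4;<br />
  $length += 4;<br />
  $i += 3;<br />
  $realnum ++;<br />
  if($magic)<br />
  {<br />
   $alen ++;<br />
  }<br />
  }<br />
  else if(substr($title,$i,4) == "&gt;")<br />
  {<br />
  $cstep = 4;<br />
  $length += 4;<br />
  $i += 3;<br />
  $realnum ++;<br />
  if($magic)<br />
  {<br />
   $alen ++;<br />
  }<br />
  }<br />
  else if(substr($title,$i,5) == "&amp;")<br />
  {<br />
  $cstep = 5;<br />
  $length += 5;<br />
  $i += 4;<br />
  $realnum ++;<br />
  if($magic)<br />
  {<br />
   $alen ++;<br />
  }<br />
  }<br />
  else if(substr($title,$i,6) == "&quot;")<br />
  {<br />
  $cstep = 6;<br />
  $length += 6;<br />
  $i += 5;<br />
  $realnum ++;<br />
  if($magic)<br />
  {<br />
   $alen ++;<br />
  }<br />
  }<br />
  else if(substr($title,$i,6) == "&#039;")<br />
  {<br />
  $cstep = 6;<br />
  $length += 6;<br />
  $i += 5;<br />
  $realnum ++;<br />
  if($magic)<br />
  {<br />
   $alen ++;<br />
  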
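}<br />
  }<br />
  // Numeric entity such as &#12345;, counted as one double-width character.<br />
  // The pattern needs \d (digits) and a ^ anchor so it only matches at the current position.<br />
  else if(preg_match("/^&#(\d+);/",substr($title,$i,8),$match))<br />
  {<br />
  $cstep = strlen($match[0]);<br />
  $length += strlen($match[0]);<br />
  $i += strlen($match[0])-1;<br />
  $realnum ++;<br />
  if($magic)<br />
  {<br />
   $blen ++;<br />
   $ctype = 1;<br />
  }<br />
  }<br />
  else<br />
  {<br />
  // A bare "&" that is not part of a recognized entity counts as one single-width character<br />
  $cstep = 1;<br />
  $length += 1;<br />
  $realnum ++;<br />
  if($magic)<br />
  {<br />
   $alen ++;<br />
  }<br />
  }<br />
 }else{<br />
  if(ord($cur)>=128)<br />
  {<br />
  // High byte: first half of a double-byte GB2312 character, so consume two bytes<br />
  $cstep = 2;<br />
  $length += 2;<br />
  $i += 1;<br />
  $realnum ++;<br />
  if($magic)<br />
  {<br />
   $blen ++;<br />
   $ctype = 1;<br />
  }<br />
  }else{<br />
  $cstep = 1;<br />
  $length +=1;<br />
  $realnum ++;<br />
  if($magic)<br />
  {<br />
   $alen++;<br />
  }<br />
  }<br />
 }<br />
 <br />
 if($magic)<br />
 {<br />
  // Stop once the display width reaches $len Chinese characters (2 width units each)<br />
  if(($blen*2+$alen) == ($len*2)) break;<br />
  if(($blen*2+$alen) == ($len*2+1))<br />
  {<br />
  // The last double-width character overshot the limit by one unit, so drop it<br />
  if($ctype == 1)<br />
  {<br />
   $length -= $cstep;<br />
   break;<br />
  }else{<br />
   break;<br />
  }<br />
  }<br />
 }else{<br />
  if($realnum == $len) break;<br />
 }<br />
}<br />
unset($cur);<br />
unset($alen);<br />
unset($blen);<br />
unset($realnum);<br />
unset($ctype);<br />
unset($cstep);<br />
return substr($title,$start,$length);<br />
}</code></p>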
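<p>  The core rule the function relies on (a byte value of 128 or more marks a double-byte GB2312 character) can be shown in isolation. The following is only a minimal sketch, not part of the original function: the helper name gb_safe_cut and its $maxBytes parameter are hypothetical, it counts bytes rather than display width, and it ignores HTML entities entirely. The sample string is written with hex escapes so the result does not depend on the encoding of the source file.</p>

```php
<?php
// Hypothetical helper (an illustration, not from the original post):
// cut a GB2312 byte string to at most $maxBytes bytes without
// splitting a double-byte character in half.
function gb_safe_cut($str, $maxBytes)
{
    $total = strlen($str);
    $i = 0;
    while ($i < $total) {
        // A byte >= 128 starts a two-byte GB2312 character; ASCII takes one byte.
        $step = (ord($str[$i]) >= 128) ? 2 : 1;
        if ($i + $step > $maxBytes) {
            break; // the next character would not fit completely
        }
        $i += $step;
    }
    return substr($str, 0, $i);
}

// "a" + two Chinese characters (GB2312 bytes D6 D0 and CE C4) + "b".
$s = "a\xD6\xD0\xCE\xC4b";

// A plain substr() cuts through the second Chinese character.
echo bin2hex(substr($s, 0, 4)), "\n";      // 61d6d0ce
// The boundary-aware cut stops before it instead.
echo bin2hex(gb_safe_cut($s, 4)), "\n";    // 61d6d0
```

<p>  The plain substr() call leaves a dangling lead byte (CE), which is what shows up as garbled text; the sketch stops one character earlier. FSubstr applies the same byte test, then adds display-width counting and entity handling on top of it.</p>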