A PHP function for converting GBK to UTF-8, extracted from phpcms v9

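Author: enenba | Posted: 2012-02-05 10:46 | Category: PHP source code
Last time I extracted the UTF-8 to GBK conversion. The source below is also taken from phpcms v9: I stripped out the other conversion code and kept only the GBK to UTF-8 part, which is especially handy for content scraping.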

There are two files: a PHP source file with the functions, and an encoding table file.
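
The loader in the listing below reads each table line at fixed offsets (`substr($l, 0, 6)` and `substr($l, 7, 6)`), which suggests that a line holds a 6-character GB code, one separator character, and a 6-character Unicode code point. Here is a minimal sketch of that lookup under this assumption. The sample line, the `0x` prefixes, and the helper names `parse_table_line` and `gb_table_key` are hypothetical and not taken from phpcms.

```php
<?php
// Hypothetical sample line, assuming the layout "0xGGGG<separator>0xUUUU".
// 0x3021 is the GBK/GB2312 code of "啊" (0xB0A1) minus 0x8080; U+554A is its code point.
$sampleLine = "0x3021\t0x554A\n";

// Split one table line into (GB key, Unicode code point).
// The offsets skip the assumed "0x" prefixes and read four hex digits each.
function parse_table_line($line) {
	$gb      = hexdec(substr($line, 2, 4));
	$unicode = hexdec(substr($line, 9, 4));
	return array($gb, $unicode);
}

// Turn a 2-byte GBK sequence into the table key:
// its big-endian integer value minus 0x8080.
function gb_table_key($twoBytes) {
	return hexdec(bin2hex($twoBytes)) - 0x8080;
}

list($gb, $unicode) = parse_table_line($sampleLine);
$key = gb_table_key("\xB0\xA1"); // GBK bytes of "啊"
printf("key=0x%04X table=0x%04X unicode=U+%04X\n", $key, $gb, $unicode);
```

If the key computed from the input bytes matches the first column of a table line, the second column is the code point that gets encoded as UTF-8.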

 

<?php
/**
 * Convert a GBK string to UTF-8.
 * @param string $gbstr GBK-encoded input string
 */
function gbk_to_utf8($gbstr) {
	global $CODETABLE;
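	// Load the GB-to-Unicode table once and cache it in a global.
	if(empty($CODETABLE)) {
		$filename = 'encoding/gb-unicode.table';
		$fp = fopen($filename, 'rb');
		if($fp === false) return false; // table file missing or unreadable
		// Each line: 6-character GB code, 1 separator, 6-character Unicode code point (hex).
		while(($l = fgets($fp, 15)) !== false) {
			$CODETABLE[hexdec(substr($l, 0, 6))] = substr($l, 7, 6);
		}
		fclose($fp);
	}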
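	$ret = '';
	$gbstr = (string)$gbstr;
	// Compare with '' explicitly: a plain while($gbstr) stops early on a remaining "0".
	while($gbstr !== '' && $gbstr !== false) {
		if(ord($gbstr[0]) > 0x80) {
			// Lead byte of a double-byte character: take two bytes and look them up.
			$thisW = substr($gbstr, 0, 2);
			$gbstr = substr($gbstr, 2);
			$key = hexdec(bin2hex($thisW)) - 0x8080;
			// Skip characters missing from the table instead of emitting a NUL byte.
			if(isset($CODETABLE[$key])) {
				// unicode_to_utf8() returns the byte values as decimal digits, three per byte.
				$utf8 = unicode_to_utf8(hexdec($CODETABLE[$key]));
				for($i = 0; $i < strlen($utf8); $i += 3) $ret .= chr((int)substr($utf8, $i, 3));
			}
		} else {
			// Single-byte (ASCII) character: copy it through unchanged.
			$ret .= $gbstr[0];
			$gbstr = substr($gbstr, 1);
		}
	}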
	return $ret;
}
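/**
 * Convert a Unicode code point to UTF-8.
 * @param int $c Unicode code point
 * @return string the UTF-8 byte values as concatenated decimal numbers (the caller applies chr())
 */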
function unicode_to_utf8($c) {
	$str = '';
	if($c < 0x80) {
		$str .= $c;
	} elseif($c < 0x800) {
		$str .= (0xC0 | $c >> 6);
		$str .= (0x80 | $c & 0x3F);
	} elseif($c < 0x10000) {
		$str .= (0xE0 | $c >> 12);
		$str .= (0x80 | $c >> 6 & 0x3F);
		$str .= (0x80 | $c & 0x3F);
	} elseif($c < 0x200000) {
		$str .= (0xF0 | $c >> 18);
		$str .= (0x80 | $c >> 12 & 0x3F);
		$str .= (0x80 | $c >> 6 & 0x3F);
		$str .= (0x80 | $c & 0x3F);
	}
	return $str;
}

// Test code follows.
// test.txt is a GBK-encoded document.
header('Content-type: text/html; charset=utf-8');
$str = file_get_contents('test.txt');
echo gbk_to_utf8($str);

?>

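The `unicode_to_utf8()` function above returns byte values as decimal digits, and `gbk_to_utf8()` then turns them into characters with `chr()`. As a sketch of the same bit layout, the hypothetical variant below returns the raw UTF-8 bytes directly, so no second pass is needed. `codepoint_to_utf8` is not part of phpcms.

```php
<?php
// Hypothetical variant: encode one Unicode code point as raw UTF-8 bytes.
function codepoint_to_utf8($c) {
	if ($c < 0 || $c > 0x10FFFF) {
		return ''; // outside the Unicode range
	}
	if ($c < 0x80) {
		// 1 byte: 0xxxxxxx
		return chr($c);
	}
	if ($c < 0x800) {
		// 2 bytes: 110xxxxx 10xxxxxx
		return chr(0xC0 | ($c >> 6))
		     . chr(0x80 | ($c & 0x3F));
	}
	if ($c < 0x10000) {
		// 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
		return chr(0xE0 | ($c >> 12))
		     . chr(0x80 | (($c >> 6) & 0x3F))
		     . chr(0x80 | ($c & 0x3F));
	}
	// 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
	return chr(0xF0 | ($c >> 18))
	     . chr(0x80 | (($c >> 12) & 0x3F))
	     . chr(0x80 | (($c >> 6) & 0x3F))
	     . chr(0x80 | ($c & 0x3F));
}

// U+554A is "啊"; its UTF-8 bytes are printed in hex.
echo bin2hex(codepoint_to_utf8(0x554A)), "\n";
```

With a function like this, the caller could append the return value to the result string as is, without the three-digit loop.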
 

Here is another conversion routine: http://enenba.com/?post=198
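
For comparison, PHP's iconv extension can do the same conversion without a table file, which also makes it a convenient way to spot-check the table-based function. This is a sketch of that alternative technique, not code from phpcms. It assumes the iconv extension is loaded and that the underlying library recognizes the `GBK` charset name. The wrapper name `gbk_to_utf8_iconv` is hypothetical.

```php
<?php
// Hypothetical wrapper around PHP's built-in iconv() function.
function gbk_to_utf8_iconv($gbstr) {
	// iconv() returns false (and raises a notice) when the input is not valid GBK.
	return iconv('GBK', 'UTF-8', $gbstr);
}

$gbk = "\xD6\xD0\xCE\xC4"; // "中文" encoded as GBK
$utf8 = gbk_to_utf8_iconv($gbk);
echo $utf8 === false ? "conversion failed\n" : bin2hex($utf8) . "\n";
```

The table-based version is still useful on hosts where neither iconv nor mbstring is available.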

 

 


Attachment download / demo source:
gbk_to_utf8.rar (35.38 KB)
