A Simple Implementation of a "Related Articles" Feature in PHP

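Content sites usually need to show a list of related articles on each article page. The common approach is roughly this: build a keyword list, work out which keywords each article contains, and then use those keywords to find the articles most related to a given one. For a site with varied content, settling on that keyword list is clearly a chore.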

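Later I looked through the PHP function reference and found that similar_text (PHP 4, PHP 5) does what I need quite conveniently. The idea is: fetch all the article titles from the article list, compare each one against the current article's title with similar_text, collect the results in an array, and sort the titles by similarity from highest to lowest. The result is a list of articles similar to the current one.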

The key function behind this approach is:

int similar_text ( string $first, string $second [, float &$percent ] )

It returns the number of bytes the two strings have in common.
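To make the return value and the optional third argument concrete, here is a minimal sketch. The sample strings are illustrative and do not come from the article. Because similar_text works on bytes, the count for multibyte text is in bytes rather than characters; the last call assumes the file is saved as UTF-8.

```php
<?php
// Illustrative strings only; not taken from the article.
$percent = 0.0;
$common = similar_text("World", "Word", $percent); // $percent is filled by reference

echo $common, "\n";             // 4 (matching bytes)
echo round($percent, 2), "\n";  // 88.89 (similarity as a percentage)

// similar_text() counts bytes, not characters. Assuming UTF-8 encoding,
// each of these two CJK characters occupies 3 bytes.
$same = similar_text("魔法", "魔法");
echo $same, "\n";               // 6 under the UTF-8 assumption
```

For titles in a multibyte encoding, the byte counts are therefore larger than the visible character counts, although the relative ordering is often still useful.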

Following this idea, we write the function below, which reorders the $arr_title array by similarity to $title, most similar first.

<?php
$demo_title = "幫客之家";
$demo_arr_title = array("簡單易懂的現代魔法","簡單明了的現代魔法","簡明扼要的古代魔法","不簡單的現代魔法","很難懂的現代魔法");
$new_array = getSimilar($demo_title,$demo_arr_title);
//print_r($new_array);
echo "The three articles most related to [$demo_title] are:<br/>";
$limit = min(3, count($new_array));	// avoid reading past the end of a short list
for($j=0; $j<$limit; $j++)
{
	echo ($j+1).":".$new_array[$j]."<br/>";
}

// $title is the current title; $arr_title is the array of titles to search
function getSimilar($title,$arr_title)
{
	$arr_title = array_values($arr_title);	// make sure the keys run 0..n-1
	$arr_similar = array();
	$new_title_array = array();
	$arr_len = count($arr_title);
	for($i=0; $i<$arr_len; $i++)
	{
		// number of bytes the two strings have in common
		$arr_similar[$i] = similar_text($arr_title[$i],$title);
	}
	arsort($arr_similar);	// sort by matching byte count, highest first, keeping keys
	foreach($arr_similar as $old_index=>$similar)
	{
		$new_title_array[] = $arr_title[$old_index];
	}
	return $new_title_array;
}
?>

Program output:

The three articles most related to [幫客之家] are:
1:簡單明了的現代魔法
2:簡單易懂的現代魔法
3:簡明扼要的古代魔法
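getSimilar ranks by the raw matching byte count, which can favor longer titles. One possible variation is to rank by the percentage that similar_text writes into its third argument and to trim the result with array_slice. This is a sketch under assumptions, not the article's method: the function name getSimilarByPercent, the $limit parameter, and the ASCII sample titles are all hypothetical.

```php
<?php
// Hypothetical helper: rank titles by similarity percentage, keep the top $limit.
function getSimilarByPercent($title, array $arr_title, $limit = 3)
{
    $scores = array();
    foreach ($arr_title as $index => $candidate) {
        $percent = 0.0;
        similar_text($candidate, $title, $percent); // $percent is filled by reference
        $scores[$index] = $percent;
    }
    arsort($scores); // highest percentage first, keys preserved

    $ranked = array();
    foreach ($scores as $index => $percent) {
        $ranked[] = $arr_title[$index];
    }
    return array_slice($ranked, 0, $limit);
}

// Illustrative ASCII data, not from the article.
$titles = array("PHP arrays", "PHP strings", "Modern magic", "PHP string functions");
print_r(getSimilarByPercent("PHP string", $titles, 2));
```

With this sample data the two entries kept are "PHP strings" and "PHP string functions", in that order. The percentage normalizes by the combined length of both strings, so a long title that merely contains the query scores lower than a near-identical short one.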

A few things to note:

As for the speed of similar_text, someone ran a test and reported the following:

The speed issues for similar_text seem to be only an issue for long sections of text (>20000 chars).

I found a huge performance improvement in my application by just testing if the string to be tested was less than 20000 chars before calling similar_text.

20000+ took 3-5 secs to process, anything else (10000 and below) took a fraction of a second. Fortunately for me, there was only a handful of instances with >20000 chars which I couldn't get a comparison % for.

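So comparing full article bodies directly is likely to be slow. The PHP manual also notes that the algorithm's complexity is O(N**3), where N is the length of the longest string, which is why comparing short titles is the practical choice here.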
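The length check described in the quoted comment could look roughly like the sketch below. The function name safeSimilarity is hypothetical, and the 20000-byte default simply reuses the number from that comment; it is not a measured threshold. strlen is used because similar_text operates on bytes.

```php
<?php
// Hypothetical wrapper: skip similar_text() for very long inputs.
// Returns the similarity percentage, or null when either string is too long.
function safeSimilarity($a, $b, $maxBytes = 20000)
{
    if (strlen($a) >= $maxBytes || strlen($b) >= $maxBytes) {
        return null; // too long to compare cheaply; the caller decides what to do
    }
    $percent = 0.0;
    similar_text($a, $b, $percent);
    return $percent;
}

var_dump(safeSimilarity("World", "Word"));              // a float percentage
var_dump(safeSimilarity(str_repeat("a", 20000), "a"));  // NULL, comparison skipped
```

A caller can treat null as "no score available" and fall back to comparing titles only.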