Hello, all.

I'm working on a simple news-type site, and I'm storing keywords for each piece of content.  I'm trying to implement a feature that we're all familiar with: related content.  This is my first attempt at it, so I wasn't sure how to approach it.  I think I've come up with a pretty good algorithm that compares two CSV strings of keywords and calculates a relevancy factor: it counts the keyword pairs that similar_text() rates above 90% similar, then divides that count by the average number of keywords in the two strings.  I'm sure someone in the community has done this before, so I figured I'd let you guys tear apart my code.  This is just a testing snippet, which is why it compares every string to every other string (including itself).  Code:


<?php

$keyword_strings = array(
    "coffee, java, cream, caffeine",
    "coffee",
    "coffee, java, cream, caffeine, other beverages, I don't know, even more stuff",
    "guns, other stuff, sights, coffee",
    "guns, bullets, sights",
    "other, more stuff, idk"
);

// Compare every keyword string against every other one (including itself).
foreach ($keyword_strings as $keywords_string1) {
    // Split once per outer iteration instead of once per pair.
    $keywords1 = explode(', ', $keywords_string1);

    foreach ($keyword_strings as $keywords_string2) {
        $keywords2 = explode(', ', $keywords_string2);

        $total_matches = 0;
        $percent = 0.0; // filled in by reference by similar_text()

        // Count keyword pairs that are more than 90% similar.
        foreach ($keywords1 as $keyword1) {
            foreach ($keywords2 as $keyword2) {
                similar_text($keyword1, $keyword2, $percent);
                if ($percent > 90) {
                    $total_matches++;
                }
            }
        }

        // Relevancy: matches divided by the average keyword count of the two lists.
        $percent = 100 * $total_matches / ((count($keywords1) + count($keywords2)) / 2);

        print_r($keywords1);
        echo " vs. ";
        print_r($keywords2);
        echo "<br>";
        echo "Percent: ".$percent."%<br>";

        // For comparison: similar_text() applied to the raw keyword strings.
        similar_text($keywords_string1, $keywords_string2, $percent);
        echo "Similar Text: ".$percent."%<br>";

        echo "Total Matches: ".$total_matches."<br><br>";
    }
}
?>


And my output is:

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) 
Percent: 100%
Similar Text: 100%
Total Matches: 4

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) vs. Array ( [0] => coffee ) 
Percent: 40%
Similar Text: 34.285714285714%
Total Matches: 1

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) 
Percent: 72.727272727273%
Similar Text: 54.716981132075%
Total Matches: 4

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) vs. Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) 
Percent: 25%
Similar Text: 19.354838709677%
Total Matches: 1

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) vs. Array ( [0] => guns [1] => bullets [2] => sights ) 
Percent: 0%
Similar Text: 20%
Total Matches: 0

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) vs. Array ( [0] => other [1] => more stuff [2] => idk ) 
Percent: 0%
Similar Text: 23.529411764706%
Total Matches: 0

Array ( [0] => coffee ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) 
Percent: 40%
Similar Text: 34.285714285714%
Total Matches: 1

Array ( [0] => coffee ) vs. Array ( [0] => coffee ) 
Percent: 100%
Similar Text: 100%
Total Matches: 1

Array ( [0] => coffee ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) 
Percent: 25%
Similar Text: 14.457831325301%
Total Matches: 1

Array ( [0] => coffee ) vs. Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) 
Percent: 40%
Similar Text: 30.769230769231%
Total Matches: 1

Array ( [0] => coffee ) vs. Array ( [0] => guns [1] => bullets [2] => sights ) 
Percent: 0%
Similar Text: 7.4074074074074%
Total Matches: 0

Array ( [0] => coffee ) vs. Array ( [0] => other [1] => more stuff [2] => idk ) 
Percent: 0%
Similar Text: 21.428571428571%
Total Matches: 0

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) 
Percent: 72.727272727273%
Similar Text: 54.716981132075%
Total Matches: 4

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) vs. Array ( [0] => coffee ) 
Percent: 25%
Similar Text: 14.457831325301%
Total Matches: 1

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) 
Percent: 100%
Similar Text: 100%
Total Matches: 7

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) vs. Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) 
Percent: 18.181818181818%
Similar Text: 25.454545454545%
Total Matches: 1

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) vs. Array ( [0] => guns [1] => bullets [2] => sights ) 
Percent: 0%
Similar Text: 18.367346938776%
Total Matches: 0

Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) vs. Array ( [0] => other [1] => more stuff [2] => idk ) 
Percent: 0%
Similar Text: 34.343434343434%
Total Matches: 0

Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) 
Percent: 25%
Similar Text: 19.354838709677%
Total Matches: 1

Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) vs. Array ( [0] => coffee ) 
Percent: 40%
Similar Text: 30.769230769231%
Total Matches: 1

Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) 
Percent: 18.181818181818%
Similar Text: 25.454545454545%
Total Matches: 1

Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) vs. Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) 
Percent: 100%
Similar Text: 100%
Total Matches: 4

Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) vs. Array ( [0] => guns [1] => bullets [2] => sights ) 
Percent: 57.142857142857%
Similar Text: 59.259259259259%
Total Matches: 2

Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) vs. Array ( [0] => other [1] => more stuff [2] => idk ) 
Percent: 0%
Similar Text: 50.909090909091%
Total Matches: 0

Array ( [0] => guns [1] => bullets [2] => sights ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) 
Percent: 0%
Similar Text: 20%
Total Matches: 0

Array ( [0] => guns [1] => bullets [2] => sights ) vs. Array ( [0] => coffee ) 
Percent: 0%
Similar Text: 7.4074074074074%
Total Matches: 0

Array ( [0] => guns [1] => bullets [2] => sights ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) 
Percent: 0%
Similar Text: 18.367346938776%
Total Matches: 0

Array ( [0] => guns [1] => bullets [2] => sights ) vs. Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) 
Percent: 57.142857142857%
Similar Text: 55.555555555556%
Total Matches: 2

Array ( [0] => guns [1] => bullets [2] => sights ) vs. Array ( [0] => guns [1] => bullets [2] => sights ) 
Percent: 100%
Similar Text: 100%
Total Matches: 3

Array ( [0] => guns [1] => bullets [2] => sights ) vs. Array ( [0] => other [1] => more stuff [2] => idk ) 
Percent: 0%
Similar Text: 27.906976744186%
Total Matches: 0

Array ( [0] => other [1] => more stuff [2] => idk ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine ) 
Percent: 0%
Similar Text: 39.21568627451%
Total Matches: 0

Array ( [0] => other [1] => more stuff [2] => idk ) vs. Array ( [0] => coffee ) 
Percent: 0%
Similar Text: 21.428571428571%
Total Matches: 0

Array ( [0] => other [1] => more stuff [2] => idk ) vs. Array ( [0] => coffee [1] => java [2] => cream [3] => caffeine [4] => other beverages [5] => I don't know [6] => even more stuff ) 
Percent: 0%
Similar Text: 34.343434343434%
Total Matches: 0

Array ( [0] => other [1] => more stuff [2] => idk ) vs. Array ( [0] => guns [1] => other stuff [2] => sights [3] => coffee ) 
Percent: 0%
Similar Text: 50.909090909091%
Total Matches: 0

Array ( [0] => other [1] => more stuff [2] => idk ) vs. Array ( [0] => guns [1] => bullets [2] => sights ) 
Percent: 0%
Similar Text: 27.906976744186%
Total Matches: 0

Array ( [0] => other [1] => more stuff [2] => idk ) vs. Array ( [0] => other [1] => more stuff [2] => idk ) 
Percent: 100%
Similar Text: 100%
Total Matches: 3
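
For comparison, here is a minimal sketch of the same "Percent" score as a reusable function. It assumes that exact matches (after trimming and lowercasing) are good enough, so it swaps the similar_text() threshold for array_intersect(); that is a different matching rule from the snippet above, not a drop-in equivalent. The function names parse_keywords and keyword_relevancy are made up for illustration.

```php
<?php
// Hypothetical helper: turn a CSV keyword string into a clean keyword list.
// Assumption: keywords are compared case-insensitively and duplicates are ignored.
function parse_keywords(string $csv): array {
    $keywords = array_map('trim', explode(',', strtolower($csv)));
    // Drop empty entries (e.g. from an empty string or a trailing comma).
    $keywords = array_filter($keywords, function ($keyword) {
        return $keyword !== '';
    });
    return array_values(array_unique($keywords));
}

// Hypothetical helper: shared keywords divided by the average list size, as a percentage.
// Assumption: only exact matches count, unlike the >90% similar_text() rule.
function keyword_relevancy(string $csv1, string $csv2): float {
    $keywords1 = parse_keywords($csv1);
    $keywords2 = parse_keywords($csv2);

    $total = count($keywords1) + count($keywords2);
    if ($total === 0) {
        return 0.0; // both lists empty: avoid dividing by zero
    }

    $matches = count(array_intersect($keywords1, $keywords2));
    return 100 * $matches / ($total / 2);
}

// Same pair as the second comparison in the output above.
echo keyword_relevancy("coffee, java, cream, caffeine", "coffee"), "\n"; // prints 40
```

With exact matching, one pass over each list replaces the nested keyword loops, at the cost of no longer catching near-identical spellings.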



I know that's a lot of information; I'm just looking for some feedback and any wisdom you can offer.

Thanks!