Reading random lines from a file with PHP
While developing a testing framework, I decided it would be nice to use a random sample of records from Alexa’s Top 1 Million domains list. Here is the function I wrote to read a given number of random lines from the file.
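// Note: a line is only returned after a line break has been seen, so the
// first line of the file is never picked, and blank lines are skipped.
function random_lines($filename, $numlines, $unique = true) {
    if (!file_exists($filename) || !is_readable($filename))
        return null;
    $filesize = filesize($filename);
    $lines = array();
    $n = 0;
    $attempts = 0;
    // Give up eventually if the file has too few distinct lines
    $maxattempts = $numlines * 100;
    $handle = @fopen($filename, 'r');
    if ($handle) {
        while ($n < $numlines && $filesize > 0 && $attempts++ < $maxattempts) {
            // Jump to a random byte offset (mt_rand() bounds are inclusive)
            fseek($handle, mt_rand(0, $filesize - 1));
            $started = false;
            $gotline = false;
            $line = "";
            while (!$gotline) {
                if (false === ($char = fgetc($handle))) {
                    $gotline = true; // end of file
                } elseif ($char == "\n" || $char == "\r") {
                    if ($started && $line !== "")
                        $gotline = true;
                    else
                        $started = true; // skip the partial line we landed in
                } elseif ($started) {
                    $line .= $char;
                }
            }
            // Nothing was read: we hit the end of the file first
            if ($line === "")
                continue;
            // in_array() instead of array_search(), which returns index 0
            // (falsy) when the duplicate is the first element
            if ($unique && in_array($line, $lines, true))
                continue;
            $n++;
            array_push($lines, $line);
        }
        fclose($handle);
    }
    return $lines;
}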
// Example usage
$lines = random_lines('top-1m.csv', 100);
echo json_encode($lines) . PHP_EOL;
Sample output (10 lines shown):
["804254,2z2z.info","298052,taronga.org.au","601192,bnsi.net","211144,best.sk","506296,bridge9.com","767784,zibashahr.com","294162,mrbookmarking.com","894095,youtube.com\/user\/Gaja2A","781514,hochschober.at","133134,global.gr"]
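One property of the seek-based approach is that a line is more likely to be picked when the line before it is long, because more byte offsets lead to it. If an even sample matters more than speed, reservoir sampling is an alternative: it reads the whole file once and gives every non-blank line the same chance. The sketch below is not part of the original function; the name reservoir_lines is hypothetical and the code assumes the file fits the one-record-per-line layout shown above.

```php
<?php
// Hypothetical helper (not from the original post): uniform sampling of
// $numlines lines in a single pass, using reservoir sampling.
function reservoir_lines($filename, $numlines) {
    $handle = @fopen($filename, 'r');
    if (!$handle)
        return null;
    $sample = array();
    $seen = 0; // number of non-blank lines read so far
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === "")
            continue; // ignore blank lines
        $seen++;
        if (count($sample) < $numlines) {
            // Fill the reservoir with the first $numlines lines
            $sample[] = $line;
        } else {
            // Keep the new line with probability $numlines / $seen
            $j = mt_rand(0, $seen - 1);
            if ($j < $numlines)
                $sample[$j] = $line;
        }
    }
    fclose($handle);
    return $sample;
}
```

Calling reservoir_lines('top-1m.csv', 100) would return up to 100 lines, each taken from a different position in the file. The trade-off is that every line has to be read, whereas the seek-based function only touches the parts of the file it lands on.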