Remove duplicate lines from multiple files in a folder


Just follow the steps below and you can remove all the duplicate lines from your files.

Paste the script below into a file, say duplicates.pl, and then run perl duplicates.pl. Note: on Windows you will need to install ActivePerl or Strawberry Perl for the script to run.

use strict;
use warnings;
use File::Find;
use File::Basename;
use Cwd qw(getcwd);
# Lines already seen, shared across every file so duplicates are
# removed across the whole folder, not just within a single file.
our %seen;
sub clean_duplicate {
    my ($filepath) = @_;
    my ($filename, $only_path) = fileparse($filepath);
    # fileparse() already returns the directory with a trailing
    # separator, so no extra slash is needed here.
    my $tmp_name = $only_path . "tmp_" . $filename;
    rename($filepath, $tmp_name) or die "Cannot rename $filepath: $!";
    open(my $fr, "<", $tmp_name) or die "Cannot open $tmp_name: $!";
    open(my $fw, ">", $filepath) or die "Cannot open $filepath: $!";
    # Read line by line instead of slurping the whole file into memory.
    while (my $line = <$fr>) {
        chomp $line;
        if (!exists $seen{$line}) {
            print $fw $line . "\n";
            $seen{$line} = 1;
        }
    }
    close $fr;
    close $fw;
    unlink($tmp_name);
}
sub process_file {
    # Use return, not next: exiting a subroutine via next is a mistake.
    return if ($_ eq '.') || ($_ eq '..');
    # Skip any directory named 'fp' and everything under it.
    if (-d && $_ eq 'fp') {
        $File::Find::prune = 1;
        return;
    }
    if (-f) {
        my $path = $File::Find::name;
        if ($path =~ /\.txt\z/) {
            clean_duplicate($path);
        }
    }
}
print "Working Directory " . getcwd . "\n";
find(\&process_file, getcwd); # you can also pass a path here in place of getcwd, e.g. 'C:/demo/desktop/test'
Keep the file duplicates.pl in the directory where all the files to clean exist, or follow the comment on the last line above to point the script at a different path.
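If you would rather not edit the script each time, a small variation (my own tweak, not part of the original script) takes the folder from the command line and falls back to the current directory:

use strict;
use warnings;
use File::Find;
use Cwd qw(getcwd);

# Hypothetical usage: perl duplicates.pl C:/demo/desktop/test
my $dir = @ARGV ? $ARGV[0] : getcwd;
die "Not a directory: $dir\n" unless -d $dir;
print "Working Directory $dir\n";
find(\&process_file, $dir); # process_file as defined in the script above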
The above script basically goes inside every txt file in the specified folder (or the current folder if you haven't changed anything). It keeps a note in memory of every line it sees, and when it finds a duplicate it skips writing that line and moves on.
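The bookkeeping described above is the standard Perl "seen hash" idiom. As a minimal illustration of the same logic, this well-known one-liner de-duplicates a single file (input.txt and output.txt are placeholder names):

perl -ne 'print unless $seen{$_}++' input.txt > output.txt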
The con of this code is that it requires a good amount of RAM. I have tested this script on a 10GB file with 8GB of RAM; it takes a while but gets the job done.
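If RAM becomes the bottleneck, one possible tweak (my suggestion, not something the script above does) is to store a fixed-size MD5 digest of each line instead of the line itself, trading a vanishingly small collision risk for far less memory on long lines. A sketch of the inner loop, reusing the $fr and $fw handles from clean_duplicate:

use Digest::MD5 qw(md5);

our %seen;
while (my $line = <$fr>) {
    chomp $line;
    # Assumption: keying on the 16-byte binary digest keeps the hash
    # small even when individual lines are very long.
    my $key = md5($line);
    if (!exists $seen{$key}) {
        print $fw $line . "\n";
        $seen{$key} = 1;
    }
}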