A question - how do I process all the unique lines from a file in Python? It was asked by a delegate today, and solved neatly and easily using a generator, which means that there's no need to collect the complete list of results before you start using them (only the lines already seen are remembered) - unique values can be passed back and processed onwards as they're found. This is fantastic news if the input isn't really a file, but is some other, slower reporting data source and you would like to get answers even as the data's still flowing in.
    def unique(source):
        sofar = {}                      # lines we have already seen
        for val in open(source):
            if not sofar.get(val):      # first time this line has turned up
                yield val.strip()
                sofar[val] = 1

    for lyne in unique("info.txt"):
        print(lyne)
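The point about the input not really being a file can be sketched as follows. This is only an illustration under stated assumptions: `unique_items` is a variant of the function above that accepts any iterable, and `slow_feed` is a hypothetical stand-in for a slow data source that records what it has handed out so far.

```python
def unique_items(source):
    """Yield each distinct item of any iterable the first time it is seen."""
    seen = set()
    for item in source:
        if item not in seen:
            seen.add(item)
            yield item


def slow_feed(log):
    """Hypothetical slow source: notes each reading in log as it is handed out."""
    for reading in ["alpha", "beta", "alpha", "gamma", "beta"]:
        log.append(reading)
        yield reading


pulled = []
stream = unique_items(slow_feed(pulled))

first = next(stream)            # the first unique value is available at once
pulled_at_first = list(pulled)  # snapshot: what the source had supplied by then
rest = list(stream)             # drain the remaining unique values
```

At the moment `first` is returned, the source has supplied just one reading, so the caller can start work long before the feed has finished.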
Neat, isn't it? I love Python! And to test that love, I thought I would answer the same question in Perl:
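    sub unique {
        open(FH, "<", $_[0]) or die "Cannot open $_[0]: $!";
        my %sofar;      # lines we have already seen
        my @uvals;      # unique lines, in order of first appearance
        while (my $line = <FH>) {
            if (! $sofar{$line}) {
                $sofar{$line} = 1;
                push @uvals, $line;
            }
        }
        close FH;
        return @uvals;
    }

    foreach my $lyne (unique("info.txt")) {
        print $lyne;
    }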
A little longer, and as Perl doesn't have a generator as such, I was tempted to write the code to return the unique list only once the whole incoming data flow had been received, as above. But a little more thought let me produce a generator-like alternative:
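    sub unique {
        # $static, %sofar and FH are globals that hold state between calls
        $static or open(FH, "<", $_[0]) or die "Cannot open $_[0]: $!";
        $static = 1;
        while (my $line = <FH>) {
            if (! $sofar{$line}) {
                $sofar{$line} = 1;
                return $line;       # hand back one unique line per call
            }
        }
        return "";                  # flag the end of the data
    }

    while ($lyne = unique("info.txt")) {
        print $lyne;
    }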
Actually rather neat, but it relies on global variables to note the state of the "generator" routine, and you need to take care to flag the end of the data. Careful code examination will show you that the return ""; is actually redundant, as Perl returns the result of the last expression evaluated, which is false when the loop exits. (The perlsub documentation in fact describes the return value as unspecified when the last statement of a sub is a loop, which is one more reason to keep the explicit return.) However, start applying tricks like this and you're getting into code that's going to be hard to maintain.
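For comparison, here is a minimal sketch of how the Python generator sidesteps both concerns. The name `unique_lines` and the sample data are assumptions made for illustration only: each generator keeps its own record of what it has seen, and the end of the data is signalled by the generator finishing rather than by a special return value.

```python
def unique_lines(lines):
    """Yield each distinct line once; state lives inside the generator."""
    seen = set()                # private to this generator, not a global
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# Two independent "generators" running side by side without interfering
first = unique_lines(["a\n", "b\n", "a\n"])
second = unique_lines(["a\n", "a\n", "c\n"])

interleaved = [next(first), next(second), next(first), next(second)]

# End of data: next() falls back to the default when the generator is done
exhausted = next(first, None)
```

Because the end is marked by the generator stopping, no value in the data itself can be mistaken for the end-of-data flag.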
Truth be known - I love Perl too. See our Perl Courses and Python Courses. Happy to teach you either - to help you use their strengths and write good, maintainable code in either.
(written 2012-03-20, updated 2012-03-24)