09-Dictionaries

What is a Collection?

• A collection is nice because we can put more than one value in it and carry them all around in one convenient package

• We have a bunch of values in a single “variable”

• We do this by having more than one place “in” the variable

• We have ways of finding the different places in the variable

A Story of Two Collections

• List

> A linear collection of values that stay in order

• Dictionary

> A “bag” of values, each with its own label

Dictionaries

• Dictionaries are Python’s most powerful data collection

• Dictionaries allow us to do fast database-like operations in Python

• Dictionaries have different names in different languages

> Associative Arrays - Perl / PHP

> Properties or Map or HashMap - Java

> Property Bag - C# / .Net

• Lists index their entries based on the position in the list

• Dictionaries are like bags - no order

• So we index the things we put in the dictionary with a “lookup tag”

>>> purse = dict()

>>> purse['money'] = 12

>>> purse['candy'] = 3

>>> purse['tissues'] = 75

>>> print purse

{'money': 12, 'tissues': 75, 'candy': 3}

>>> print purse['candy']

3

>>> purse['candy'] = purse['candy'] + 2

>>> print purse

{'money': 12, 'tissues': 75, 'candy': 5}

Comparing Lists and Dictionaries

• Dictionaries are like lists except that they use keys instead of numbers to look up values

>>> lst = list()

>>> lst.append(21)

>>> lst.append(183)

>>> print lst

[21, 183]

>>> lst[0] = 23

>>> print lst

[23, 183]

>>> ddd = dict()

>>> ddd['age'] = 21

>>> ddd['course'] = 182

>>> print ddd

{'course': 182, 'age': 21}

>>> ddd['age'] = 23

>>> print ddd

{'course': 182, 'age': 23}

Dictionary Literals (Constants)

• Dictionary literals use curly braces and have a list of key : value pairs

• You can make an empty dictionary using empty curly braces

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}

>>> print jjj

{'jan': 100, 'chuck': 1, 'fred': 42}

>>> ooo = { }

>>> print ooo

{}

>>>

Many Counters with a Dictionary

• One common use of a dictionary is counting how often we “see” something

>>> ccc = dict()

>>> ccc['csev'] = 1

>>> ccc['cwen'] = 1

>>> print ccc

{'csev': 1, 'cwen': 1}

>>> ccc['cwen'] = ccc['cwen'] + 1

>>> print ccc

{'csev': 1, 'cwen': 2}

Dictionary Tracebacks

• It is an error to reference a key which is not in the dictionary

• We can use the in operator to see if a key is in the dictionary

>>> ccc = dict()

>>> print ccc['csev']

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

KeyError: 'csev'

>>> print 'csev' in ccc

False
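The in check can guard a lookup so that the KeyError never happens. Here is a minimal sketch, written in Python 3 syntax (print is a function); the list of names is made up for illustration.

```python
# Count names, using the in operator to avoid a KeyError on first sight.
ccc = {'csev': 1}

for name in ['csev', 'cwen']:      # illustrative input, not from the slides
    if name in ccc:
        ccc[name] = ccc[name] + 1  # key already present: add one
    else:
        ccc[name] = 1              # first time we see this key

print(ccc)
```

Each name is either updated or created, so no lookup is ever made on a missing key.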

The get method for dictionaries

• The pattern of checking whether a key is already in a dictionary, and assuming a default value if it is not, is so common that there is a method called get() that does this for us

• The second argument to get() is the default value returned if the key does not exist (and no Traceback)

if name in counts:

    x = counts[name]

else:

    x = 0

x = counts.get(name, 0)

For example, counting names with this pattern might produce a dictionary such as {'csev': 2, 'zqian': 1, 'cwen': 2}
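A short sketch of how get() behaves, in Python 3 syntax. The key 'fred' is only an illustrative missing key.

```python
counts = {'csev': 2, 'zqian': 1, 'cwen': 2}

present = counts.get('csev', 0)    # key exists, so its value is returned
absent = counts.get('fred', 0)     # key missing, so the default is returned
no_default = counts.get('fred')    # with no default given, get() returns None
still_missing = 'fred' in counts   # get() does not add the key

print(present, absent, no_default, still_missing)
```

Note that get() only reads the dictionary; to store the default you still need an assignment such as counts[name] = counts.get(name, 0) + 1.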

Counting Pattern

counts = dict()

print 'Enter a line of text:'

line = raw_input('')

words = line.split()

print 'Words:', words

print 'Counting...'

for word in words:

    counts[word] = counts.get(word, 0) + 1

print 'Counts', counts

The general pattern to count the words in a line of text is to split the line into words, then loop through the words and use a dictionary to track the count of each word independently
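The same pattern can be wrapped in a function so it is easy to reuse and check. This is a sketch in Python 3 syntax; the function name count_words and the sample line are assumptions made for illustration.

```python
def count_words(line):
    """Return a dictionary mapping each word in line to how often it occurs."""
    counts = dict()
    for word in line.split():                   # split on whitespace
        counts[word] = counts.get(word, 0) + 1  # 0 is the default for new words
    return counts

result = count_words('the cat and the hat')
print(result)
```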

Definite Loops and Dictionaries

• Even though dictionaries are not stored in order, we can write a for loop that goes through all the entries in a dictionary - actually it goes through all of the keys in the dictionary and looks up the values

>>> counts = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}

>>> for key in counts:

...     print key, counts[key]

...

jan 100

chuck 1

fred 42

>>>
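If a predictable order is wanted, one option is to loop over the sorted keys. A minimal sketch in Python 3 syntax; the list named lines is only there to collect the output.

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

lines = []
for key in sorted(counts):   # sorted() returns the keys in alphabetical order
    lines.append(key + ' ' + str(counts[key]))

print('\n'.join(lines))
```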

Retrieving lists of Keys and Values

• You can get a list of keys, values, or items (both) from a dictionary

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}

>>> print list(jjj)

['jan', 'chuck', 'fred']

>>> print jjj.keys()

['jan', 'chuck', 'fred']

>>> print jjj.values()

[100, 1, 42]

>>> print jjj.items()

[('jan', 100), ('chuck', 1), ('fred', 42)]

>>>
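The same calls work in Python 3, where keys(), values(), and items() return view objects rather than lists. A sketch, assuming Python 3, that turns each one into a sorted list:

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

keys = sorted(jjj.keys())      # sorted list of keys
values = sorted(jjj.values())  # sorted list of values
items = sorted(jjj.items())    # list of (key, value) tuples, sorted by key

print(keys)
print(values)
print(items)
```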

Bonus: Two Iteration Variables!

• We loop through the key-value pairs in a dictionary using *two* iteration variables

• Each iteration, the first variable is the key and the second variable is the corresponding value for the key

>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}

>>> for aaa,bbb in jjj.items() :

...     print aaa, bbb

...

jan 100

chuck 1

fred 42

>>>
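Two iteration variables are handy for finding the entry with the largest value. A sketch in Python 3 syntax; the variable names bigword and bigcount are chosen for illustration.

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

bigword = None
bigcount = None
for word, count in counts.items():
    # Keep the pair with the largest count seen so far
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword, bigcount)
```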

items() vs. iteritems()

dict.items(): Return a copy of the dictionary’s list of (key, value) pairs.

dict.iteritems(): Return an iterator over the dictionary’s (key, value) pairs.

If I run the code below, each seems to return a reference to the same object. Are there any subtle differences that I am missing?

#!/usr/bin/python

d={1:'one',2:'two',3:'three'}

print 'd.items():'

for k,v in d.items():

    if d[k] is v: print '\tthey are the same object'

    else: print '\tthey are different'

print 'd.iteritems():'

for k,v in d.iteritems():

    if d[k] is v: print '\tthey are the same object'

    else: print '\tthey are different'

Output:

d.items():

they are the same object

they are the same object

they are the same object

d.iteritems():

they are the same object

they are the same object

they are the same object

It's part of an evolution.

Originally, Python's items() built a real list of tuples and returned it. That could take a lot of extra memory for a large dictionary.

Then, generators were introduced to the language in general, and that method was reimplemented as an iterator-generator method named iteritems(). The original remains for backwards compatibility.

One of Python 3’s changes is that items() now returns a view object that is iterated lazily, so a list is never fully built. The iteritems() method is also gone, since items() in Python 3 works like viewitems() in Python 2.7.
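The difference shows up when the dictionary changes after the call. This sketch assumes Python 3, where items() returns a view; under Python 2 the result of items() would be a plain list and would not grow.

```python
d = {1: 'one', 2: 'two', 3: 'three'}

view = d.items()            # Python 3: a live view of the dictionary
snapshot = list(d.items())  # an explicit copy, like Python 2's items()

d[4] = 'four'               # change the dictionary afterwards

view_len = len(view)          # the view sees the new entry
snap_len = len(snapshot)      # the copy does not
print(view_len, snap_len)
```

Wrapping the call in list() is the usual way to get the old copy behavior when it is needed.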

Best Practice

movies = list()

movie1 = dict()

movie1['Title'] = 'Avatar'

movie1['Rating'] = 'PG-13'

movies.append(movie1)

movie2 = dict()

movie2['Title'] = 'Matrix'

movie2['Ratng'] = 'PG-13'

movies.append(movie2)

Suppose the convention is to have the keys Title and Rating, but Rating is misspelled as Ratng in one entry.

What is a better way to validate lookups?

We can loop through the keys that are expected to be there.

keys = ['Title', 'Rating']

for item in movies:

    for key in keys:

        print(key + ' : ' + item[key])

The misspelling is caught in such a case: item['Rating'] raises a KeyError for the second movie, so the problem does not go unnoticed.
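A variation is to collect the missing keys instead of stopping at the first error. This is only a sketch in Python 3 syntax; the names expected_keys and problems are assumptions made for illustration.

```python
movies = [
    {'Title': 'Avatar', 'Rating': 'PG-13'},
    {'Title': 'Matrix', 'Ratng': 'PG-13'},   # misspelled key, as in the example
]

expected_keys = ['Title', 'Rating']

problems = []   # (position in movies, missing key) pairs
for index, movie in enumerate(movies):
    for key in expected_keys:
        if key not in movie:
            problems.append((index, key))

print(problems)
```

An empty problems list means every movie has all the expected keys.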