06-Strings

String Data Type

• A string is a sequence of characters (similar to a list, but it cannot be changed in place)

• A string literal uses quotes 'Hello' or "Hello"

• For strings, + means “concatenate”

• When a string contains numbers, it is still a string

• We can convert numbers in a string into a number using int()

• A Python 2 string (str) is a sequence of bytes, so each character is one of 256 values; plain ASCII text uses only the first 128

• A Python 3 string (str) is Unicode based, so it can contain non-English characters

>>> str1 = "Hello"

>>> str2 = 'there'

>>> bob = str1 + str2

>>> print bob

Hellothere

>>> str3 = '123'

>>> str3 = str3 + 1

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

TypeError: cannot concatenate 'str' and 'int' objects

>>> x = int(str3) + 1

>>> print x

124

>>>
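The session above uses Python 2 syntax. As a rough sketch, the same steps in Python 3 might look like the following. The variable names are the ones from the session; the try/except is added here only so the script keeps running after the error.

```python
# Python 3 sketch of the session above.
str1 = "Hello"
str2 = 'there'
bob = str1 + str2          # + concatenates two strings
print(bob)                 # Hellothere

str3 = '123'
try:
    str3 = str3 + 1        # adding an int to a str is not allowed
except TypeError as err:
    print('TypeError:', err)

x = int(str3) + 1          # convert to a number first, then add
print(x)                   # 124
```

Because the failed addition raises before the assignment happens, str3 still holds '123' afterwards.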

String Operations

As we saw, strings are sequences of characters, so many of the operations you can do on a list you can also do on a string. You can concatenate two strings using the plus operator, and multiplying a string by an integer repeats it. You can also search for a substring using the in operator.

firstname='Christopher'

lastname='Brooks'

print(firstname + ' ' + lastname)

print(firstname *3)

print('Chris' in firstname)

Christopher Brooks

ChristopherChristopherChristopher

True
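One caveat to the list comparison is that a string cannot be modified in place. The sketch below, written for Python 3, shows the operations a string shares with a list and the one that fails. The names changed and renamed are made up for this illustration.

```python
firstname = 'Christopher'

# Indexing, len(), and the in operator work as they do for a list.
print(firstname[0], len(firstname), 'Chris' in firstname)

# Unlike a list, a string does not support item assignment.
try:
    firstname[0] = 'K'
    changed = True
except TypeError:
    changed = False
print(changed)

# To get a different string, build a new one from pieces of the old one.
renamed = 'K' + firstname[1:]
print(renamed)
```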

The string type has an associated function called split.

This function breaks the string up into substrings based on a simple pattern. Here for instance, I'll just split my full name based on the presence of a space character. The result is a list of two elements. We can choose the first element with the indexing operator to be the first name, and the last element to be my last name.

firstname='Christopher Brooks'.split(' ')[0]

lastname='Christopher Brooks'.split(' ')[-1]

print(firstname)

print(lastname)

Christopher

Brooks
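Splitting on a single space can give surprising results when the text contains extra spaces. As a small sketch (the variable full_name and its doubled space are made up for this example), calling split with no argument treats any run of whitespace as one separator:

```python
full_name = 'Christopher  Brooks'   # note the two spaces

# Splitting on ' ' keeps the empty piece between the two spaces.
pieces = full_name.split(' ')
print(pieces)                       # ['Christopher', '', 'Brooks']

# With no argument, split() treats a run of whitespace as one separator.
parts = full_name.split()
print(parts)                        # ['Christopher', 'Brooks']

firstname, lastname = parts[0], parts[-1]
print(firstname)
print(lastname)
```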

String Print Formatting

In addition to Unicode, Python uses a special language for formatting the output of strings. One of the challenges with dynamic typing is that it's a bit unclear when you have to do type conversion yourself. If we want to print a name and a number, we can't use concatenation without first calling the str function to convert the number to a string.

print('Chris' + str(2))

Chris2

This creates a lot of nasty looking code where every value you want to concatenate is wrapped in the str function. The Python string formatting mini language lets you write a string containing placeholders for the values to be substituted. You then pass these values as either named or in-order (positional) arguments, and Python handles the string manipulation for you.

Here's an example. Imagine we have purchase order details in a dictionary, which includes a number of items, a price, and a person's name.

We can write a sales statement string which includes these items using curly brackets.

We can then call the format method on that string and pass in the values that we want substituted as appropriate.

sales_record = {'price' : 3.34, 'num_items' : 4, 'person' : 'Chris'}

sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

print(sales_statement.format(sales_record['person'], sales_record['num_items'], sales_record['price'], sales_record['num_items'] * sales_record['price']))

Chris bought 4 item(s) at a price of 3.34 each for a total of 13.36
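The example above fills the placeholders in order. A sketch of the named form might look like this, where the placeholder names match the dictionary keys. The extra name total and the .2f format specifiers (two digits after the decimal point) are choices made for this illustration.

```python
sales_record = {'price': 3.34, 'num_items': 4, 'person': 'Chris'}

# Named placeholders are filled from keyword arguments.
# {price:.2f} and {total:.2f} request two digits after the decimal point.
sales_statement = ('{person} bought {num_items} item(s) at a price of '
                   '{price:.2f} each for a total of {total:.2f}')

total = sales_record['num_items'] * sales_record['price']

# ** unpacks the dictionary into keyword arguments for format().
message = sales_statement.format(total=total, **sales_record)
print(message)
```

With named placeholders, the order of the arguments no longer matters, which makes long format strings easier to read and change.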

Reading and Converting

• We prefer to read data in using strings and then parse and convert the data as we need

• This gives us more control over error situations and/or bad user input

• Raw input numbers must be converted from strings

>>> name = raw_input('Enter:')

Enter:Chuck

>>> print name

Chuck

>>> apple = raw_input('Enter:')

Enter:100

>>> x = apple - 10

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

TypeError: unsupported operand type(s) for -: 'str' and 'int'

>>> x = int(apple) - 10

>>> print x

90
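To illustrate the point about bad user input, here is a minimal Python 3 sketch. The helper name to_int and its default parameter are made up for this example. In Python 3, raw_input is called input; the code below uses fixed strings instead of prompting so that it can run unattended.

```python
def to_int(text, default=None):
    """Return text converted to an int, or default if it is not a whole number."""
    try:
        return int(text)
    except ValueError:
        return default

apple = '100'                  # what input('Enter:') would return for 100
x = to_int(apple) - 10
print(x)                       # 90

print(to_int('ten'))           # None, because 'ten' is not a number
print(to_int(' 42 \n'))        # 42, int() ignores surrounding whitespace
```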

Looking Inside Strings

• We can get at any single character in a string using an index specified in square brackets

• The index value must be an integer and starts at zero

• The index value can be an expression that is computed

b a n a n a

0 1 2 3 4 5

>>> fruit = 'banana'

>>> letter = fruit[1]

>>> print letter

a

>>> x = 3

>>> w = fruit[x - 1]

>>> print w

n

A Character Too Far

• You will get a Python error if you attempt to index beyond the end of a string.

• So be careful when constructing index values and slices

>>> zot = 'abc'

>>> print zot[5]

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

IndexError: string index out of range

>>>
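One way to avoid this error is to compare the index against the length of the string before using it. The sketch below is in Python 3, and the helper name char_at is made up for this example.

```python
def char_at(text, index):
    """Return text[index], or None when index is outside the string."""
    if -len(text) <= index < len(text):
        return text[index]
    return None

zot = 'abc'
print(char_at(zot, 2))      # c
print(char_at(zot, 5))      # None instead of an IndexError
print(zot[len(zot) - 1])    # c, the last valid index is len(zot) - 1
```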

Strings Have Length

• There is a built-in function len that gives us the length of a string

b a n a n a

0 1 2 3 4 5

>>> fruit = 'banana'

>>> print len(fruit)

6

A function is some stored code that we use. A function takes some input and produces an output.

'banana' (a string) ---> len function ---> 6 (a number)

The len function is built in (Guido wrote this code).

Looping Through Strings

• Using a while statement and an iteration variable, and the len function, we can construct a loop to look at each of the letters in a string individually

fruit = 'banana'

index = 0

while index < len(fruit):

    letter = fruit[index]

    print index, letter

    index = index + 1

0 b

1 a

2 n

3 a

4 n

5 a

• A definite loop using a for statement is much more elegant

• The iteration variable is completely taken care of by the for loop

fruit = 'banana'

for letter in fruit:

    print letter

The equivalent while loop:

index = 0

while index < len(fruit) :

    letter = fruit[index]

    print letter

    index = index + 1

b

a

n

a

n

a

Looping and Counting

• This is a simple loop that loops through each letter in a string and counts the number of times the loop encounters the 'a' character

word = 'banana'

count = 0

for letter in word :

    if letter == 'a' :

        count = count + 1

print count
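The counting loop can be wrapped in a function so it works for any word and any letter. This is a Python 3 sketch, and the function name count_letter is made up for this example. The built-in count method, which appears in the dir listing for strings, gives the same answer.

```python
def count_letter(word, target):
    """Count how many times the character target appears in word."""
    count = 0
    for letter in word:
        if letter == target:
            count = count + 1
    return count

print(count_letter('banana', 'a'))   # 3
print('banana'.count('a'))           # 3, the built-in method agrees
```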

Looking deeper into 'in'

• The iteration variable “iterates” through the sequence (ordered set)

• The block (body) of code is executed once for each value in the sequence

• The iteration variable moves through all of the values in the sequence

for letter in 'banana' :

    print letter

Iteration variable - letter

Six-character string - 'banana'

Slicing Strings

• We can also look at any continuous section of a string using a colon operator

• The second number is one beyond the end of the slice - “up to but not including”

• If the second number is beyond the end of the string, it stops at the end

M  o  n  t  y     P  y  t  h  o  n

0  1  2  3  4  5  6  7  8  9  10 11

>>> s = 'Monty Python'

>>> print s[0:4]

Mont

>>> print s[6:7]

P

>>> print s[6:20]

Python

• If we leave off the first number or the last number of the slice, it is assumed to be the beginning or end of the string respectively

>>> s = 'Monty Python'

>>> print s[:2]

Mo

>>> print s[8:]

thon

>>> print s[:]

Monty Python
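Two consequences of the "up to but not including" rule are worth checking. These are illustrated in the Python 3 sketch below, which reuses the string s from the examples above.

```python
s = 'Monty Python'

print(s[0:4])     # Mont
print(s[6:7])     # P
print(s[6:20])    # Python, the slice stops at the end of the string

# When both numbers are inside the string, the slice has (end - start) characters.
print(len(s[3:8]))            # 5

# Splitting at any position and joining the halves gives back the original.
print(s[:5] + s[5:] == s)     # True
```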

String Concatenation

• When the + operator is applied to strings, it means “concatenation”

>>> a = 'Hello'

>>> b = a + 'There'

>>> print b

HelloThere

>>> c = a + ' ' + 'There'

>>> print c

Hello There

>>>

Using in as a logical Operator

• The in keyword can also be used to check to see if one string is “in” another string

• The in expression is a logical expression that returns True or False and can be used in an if statement

>>> fruit = 'banana'

>>> 'n' in fruit

True

>>> 'm' in fruit

False

>>> 'nan' in fruit

True

>>> if 'a' in fruit :

...     print 'Found it!'

...

Found it!

>>>

String Comparison

if word == 'banana':

    print 'All right, bananas.'

if word < 'banana':

    print 'Your word, ' + word + ', comes before banana.'

elif word > 'banana':

    print 'Your word, ' + word + ', comes after banana.'

else:

    print 'All right, bananas.'
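The comparison above can be tried on a few words with a small Python 3 sketch. The function name compare_to_banana is made up for this example. Note that strings are compared character by character using character codes, so every uppercase letter sorts before every lowercase letter.

```python
def compare_to_banana(word):
    """Describe where word falls relative to 'banana'."""
    if word < 'banana':
        return 'Your word, ' + word + ', comes before banana.'
    elif word > 'banana':
        return 'Your word, ' + word + ', comes after banana.'
    else:
        return 'All right, bananas.'

print(compare_to_banana('apple'))
print(compare_to_banana('cherry'))
print(compare_to_banana('banana'))
print(compare_to_banana('Zebra'))   # 'Z' sorts before 'b'
```

Converting both words to lower case before comparing is one way to avoid the uppercase surprise.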

String Library

• Python has a number of string functions which are in the string library

• These functions are already built into every string - we invoke them by appending the function to the string variable

• These functions do not modify the original string, instead they return a new string that has been altered

>>> greet = 'Hello Bob'

>>> zap = greet.lower()

>>> print zap

hello bob

>>> print greet

Hello Bob

>>> print 'Hi There'.lower()

hi there

>>>

>>> stuff = 'Hello world'

>>> type(stuff)

<type 'str'>

• To list the methods available, call dir on a variable of the given type

>>> dir(stuff)

['capitalize', 'center', 'count', 'decode', 'encode',

'endswith', 'expandtabs', 'find', 'format', 'index',

'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',

'istitle', 'isupper', 'join', 'ljust', 'lower',

'lstrip', 'partition', 'replace', 'rfind', 'rindex',

'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',

'splitlines', 'startswith', 'strip', 'swapcase',

'title', 'translate', 'upper', 'zfill']

https://docs.python.org/2/library/stdtypes.html#string-methods

str.capitalize()

str.center(width[, fillchar])

str.endswith(suffix[, start[, end]])

str.find(sub[, start[, end]])

str.lstrip([chars])

str.replace(old, new[, count])

str.lower()

str.rstrip([chars])

str.strip([chars])

str.upper()

Searching a String

• We use the find() function to search for a substring within another string

• find() finds the first occurrence of the substring

• If the substring is not found, find() returns -1

• Remember that string position starts at zero

b a n a n a

0 1 2 3 4 5

>>> fruit = 'banana'

>>> pos = fruit.find('na')

>>> print pos

2

>>> aa = fruit.find('z')

>>> print aa

-1
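Since find() returns only the first occurrence, later occurrences can be located by passing a starting position as the optional second argument. The Python 3 sketch below uses that argument in a loop. The function name find_all is made up for this example.

```python
def find_all(text, sub):
    """Return the starting index of every occurrence of sub in text."""
    positions = []
    pos = text.find(sub)
    while pos != -1:
        positions.append(pos)
        # Resume the search one character past the last match.
        pos = text.find(sub, pos + 1)
    return positions

fruit = 'banana'
print(find_all(fruit, 'na'))   # [2, 4]
print(find_all(fruit, 'a'))    # [1, 3, 5]
print(find_all(fruit, 'z'))    # []
```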

Making everything UPPER CASE

• You can make a copy of a string in lower case or upper case

• Often when we are searching for a string using find() - we first convert the string to lower case so we can search a string regardless of case

>>> greet = 'Hello Bob'

>>> nnn = greet.upper()

>>> print nnn

HELLO BOB

>>> www = greet.lower()

>>> print www

hello bob

>>>
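A search that ignores case can be sketched by converting both strings to lower case before calling find(). This is Python 3, and the function name find_ignoring_case is made up for this example.

```python
def find_ignoring_case(text, sub):
    """Return the position of sub in text, ignoring case, or -1 if absent."""
    return text.lower().find(sub.lower())

greet = 'Hello Bob'
print(greet.find('bob'))                   # -1, the case does not match
print(find_ignoring_case(greet, 'bob'))    # 6
print(greet)                               # Hello Bob, the original is unchanged
```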

Search and Replace

• The replace() function is like a “search and replace” operation in a word processor

• It replaces all occurrences of the search string with the replacement string

>>> greet = 'Hello Bob'

>>> nstr = greet.replace('Bob','Jane')

>>> print nstr

Hello Jane

>>> nstr = greet.replace('o','X')

>>> print nstr

HellX BXb

>>>
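The signature str.replace(old, new[, count]) listed earlier has an optional third argument that limits how many occurrences are replaced. A short Python 3 sketch follows; the names all_replaced and first_replaced are made up for this example.

```python
greet = 'Hello Bob'

all_replaced = greet.replace('o', 'X')         # every 'o' is replaced
first_replaced = greet.replace('o', 'X', 1)    # only the first 'o' is replaced

print(all_replaced)      # HellX BXb
print(first_replaced)    # HellX Bob
print(greet)             # Hello Bob, replace() returns a new string
```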

Stripping Whitespace

• Sometimes we want to take a string and remove whitespace at the beginning and/or end

• lstrip() and rstrip() remove whitespace at the left or right

• strip() removes both beginning and ending whitespace

>>> greet = ' Hello Bob '

>>> greet.lstrip()

'Hello Bob '

>>> greet.rstrip()

' Hello Bob'

>>> greet.strip()

'Hello Bob'

>>>
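Whitespace includes tabs and newlines as well as spaces, which is useful when cleaning up lines read from a file or typed by a user. The Python 3 sketch below also shows the optional chars argument from the signature str.strip([chars]). The sample strings are made up for this example.

```python
line = '   Hello Bob  \n'

both = line.strip()       # removes leading and trailing whitespace
right = line.rstrip()     # removes trailing whitespace only
print(repr(both))         # 'Hello Bob'
print(repr(right))        # '   Hello Bob'

# Passing characters to strip() removes those characters instead of whitespace.
starred = '***Hello Bob***'.strip('*')
print(starred)            # Hello Bob
```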

Prefixes

>>> line = 'Please have a nice day'

>>> line.startswith('Please')

True

>>> line.startswith('p')

False

Parsing and Extracting

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

(The @ is at index 21 and the next space after it is at index 31.)

>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'

>>> atpos = data.find('@')

>>> print atpos

21

>>> sppos = data.find(' ',atpos)

>>> print sppos

31

>>> host = data[atpos+1 : sppos]

>>> print host

uct.ac.za
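The find-and-slice steps above can be collected into a function. The Python 3 sketch below adds checks for lines that have no @ or no space after the host. The function name extract_host and those checks are additions made for this example.

```python
def extract_host(line):
    """Return the text between the first '@' and the next space, or None."""
    atpos = line.find('@')
    if atpos == -1:
        return None                  # no '@' in the line
    sppos = line.find(' ', atpos)
    if sppos == -1:
        sppos = len(line)            # host runs to the end of the line
    return line[atpos + 1:sppos]

data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
print(extract_host(data))                 # uct.ac.za
print(extract_host('no address here'))    # None
```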