Strings
Strings are immutable ordered collections of Unicode code points, and are used to handle textual data.
Python 2 had two separate object types to represent text: str and unicode. In Python 3 these two types have been unified in a single Unicode str object type.
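To make "immutable ordered collection" concrete, here is a minimal sketch. The variable names are only illustrative.

```python
word = "spam"

# Ordered: individual code points can be read by position.
first = word[0]
length = len(word)

# Immutable: assigning to a position raises a TypeError.
try:
    word[0] = "S"
    mutated = True
except TypeError:
    mutated = False

# To "change" a string, build a new one instead.
capitalized = "S" + word[1:]
print(first, length, mutated, capitalized)
```

The original string is never modified; every operation that appears to change a string returns a new one.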
Strings syntax
Strings are typically delimited by single or double quotes, as in many other programming languages:
'this is a string'
"this is also a string"
The quote character used to delimit a string cannot appear as-is inside the string, because it would end the string early.
If the text contains one kind of quote character, delimit the string with the other kind:
"don't worry be happy"
'national "airquoting" competition'
Escaping quote characters
It is also possible to escape the quote character, so it is used literally (as a quote character, not as a string delimiter). In Python strings, you can escape a character by adding a backslash before it:
>>> "this is a \"string\" too"
'this is a "string" too'
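As a quick check, the sketch below compares an escaped literal with one that uses the other quote character as delimiter. Both spellings should produce exactly the same string; the variable names are made up for the example.

```python
escaped = "this is a \"string\" too"
other_quotes = 'this is a "string" too'

# The backslashes are part of the source code only, not of the resulting string.
same = escaped == other_quotes
print(same)

# The same trick works for a single quote inside a single-quoted string.
single = 'don\'t worry be happy'
print(single)
```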
Special characters
The backslash character \ is also used to invoke special characters such as the newline character \n and the tab character \t:
>>> print("hello\n\tworld")
hello
        world
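Each escape sequence stands for a single character, which is easy to verify with len() and repr(). The following is a small sketch with illustrative variable names.

```python
text = "hello\n\tworld"

# 5 letters + newline + tab + 5 letters
print(len(text))

# repr() shows the escape sequences instead of rendering them
print(repr(text))

# A doubled backslash produces one literal backslash, so there is no newline here.
literal = "hello\\nworld"
has_newline = "\n" in literal
print(literal, has_newline)
```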
Multi-line strings
Python strings can also be delimited with triple single or double quotes, so that the string can contain literal newlines and tabs:
>>> txt = '''hello
... python
... world'''
>>> print(txt)
hello
python
world
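A triple-quoted string is an ordinary string: the line breaks typed in the source simply become newline characters. A minimal sketch to confirm this:

```python
txt = '''hello
python
world'''

# The same value written on one line with explicit \n escapes.
one_liner = "hello\npython\nworld"
same = txt == one_liner

# splitlines() returns the individual lines as a list.
lines = txt.splitlines()
print(same, lines)
```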
Operations with strings
Strings support some of the arithmetic operators, even though they are not numbers. This is possible thanks to operator overloading: the same operator behaves differently depending on the type of its operands.
Adding strings
Strings can be added to another string with the + operator (concatenation):
>>> 'a' + 'b'
'ab'
>>> a = "a string"
>>> b = "another string"
>>> a + " " + b
'a string another string'
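One thing to watch out for: + only joins strings with other strings. The sketch below, using a made-up count variable, shows that adding a number raises a TypeError until the number is converted with str().

```python
count = 3

try:
    # Mixing str and int is not allowed.
    message = "eggs: " + count
except TypeError:
    # Convert the number explicitly before concatenating.
    message = "eggs: " + str(count)

print(message)
```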
Multiplying strings
Strings can be multiplied by integers…
>>> "spam " * 5
'spam spam spam spam spam '
…but not by floats (even if they look like an integer):
>>> "spam " * 5.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'float'
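If the repeat count arrives as a float, one possible workaround is to convert it with int() first. This is only a sketch; the names are illustrative.

```python
repeats = 5.0

# Convert the float to an integer before multiplying.
chorus = "spam " * int(repeats)
print(chorus)

# The integer can also come first, and a count of 0 gives an empty string.
doubled = 3 * "ab"
nothing = "spam" * 0

# A common use: drawing a separator line.
rule = "-" * 20
print(rule)
```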
String formatting
The string formatting syntax is a mini-language to create strings by combining fixed and variable parts.
f-Strings
Python 3.6 introduced a new string formatting notation, “formatted string literals” or f-strings, which improves on the two older notations inherited from Python 2. The ‘old’ ways of formatting strings are still supported.
F-strings are prefixed with an f and contain curly braces with expressions that are replaced with their values at runtime.
>>> a = 'eggs'
>>> f'spam spam spam {a} spam'
'spam spam spam eggs spam'
>>> f'spam spam spam {a} spam {a} {a} spam'
'spam spam spam eggs spam eggs eggs spam'
>>> b = 'bacon'
>>> f'spam spam spam {a} spam {b} spam spam {a} spam'
'spam spam spam eggs spam bacon spam spam eggs spam'
Because f-strings are evaluated at runtime, you can put any valid Python expression between the curly braces:
>>> f"A day has {24 * 60 * 60} seconds."
'A day has 86400 seconds.'
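Expressions inside the braces are not limited to arithmetic. The sketch below mixes a method call, a function call and a format specification after a colon; all names and values are invented for the example.

```python
name = "brian"
price = 3.14159
tins = [1, 2]

# A method call, a format specification (.2f) and a small calculation.
line = f"{name.upper()} owes {price:.2f} for {len(tins) + 1} tins of spam"
print(line)
```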
If you look up a dictionary value inside an f-string, watch out for which quotation marks you use. If single quotes are used to delimit the f-string, use double quotes for the dictionary key, and vice-versa.
>>> D = {'name' : 'Brian'}
>>> f"A man called {D['name']}..."
'A man called Brian...'
Before Python 3.12, using the same quote character for the f-string and the dictionary key does not work (Python 3.12 and later accept it):
>>> f"A man called {D["name"]}..."
  File "<stdin>", line 1
    f"A man called {D["name"]}..."
                          ^
SyntaxError: invalid syntax
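Two ways around the quoting problem that should work on any Python version with f-strings are sketched below: swap the quote characters, or look up the value first and keep the f-string free of inner quotes.

```python
D = {'name': 'Brian'}

# Option 1: delimit the f-string with the other quote character.
swapped = f'A man called {D["name"]}...'

# Option 2: fetch the value first, so no quotes are needed inside the braces.
name = D['name']
plain = f"A man called {name}..."

print(swapped)
print(plain)
```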
String.format
The ‘old’ string.format syntax is still supported and useful in some cases. In this notation, the format string contains one or more pairs of braces which work as empty ‘containers’, and its format method receives the matching arguments.
If the braces are empty, the arguments are inserted into the string in the order they are given:
>>> '{}, {}, {}'.format('a', 'b', 'c')
'a, b, c'
If the order of the values needs to be different from the order of the arguments, you can use an index inside the braces to match the arguments by position:
>>> '{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'
Unpacking a sequence argument with * also works:
>>> '{2}, {1}, {0}'.format(*'abc')
'c, b, a'
It’s also possible to use names to match placeholders to keyword arguments:
>>> 'position: ({x}, {y})'.format(x=120, y=-115)
'position: (120, -115)'
The presentation of the value at each replacement field can be controlled with a special “format specifications” notation.
For example, we can control how many digits are displayed after the decimal point when formatting floats:
>>> from math import pi
>>> '{:.4f}'.format(pi)
'3.1416'
Here’s how we can pad a number with leading 0s to a fixed width:
>>> '{:07d}'.format(4)
'0000004'
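Format specifications can also set alignment and width, which is handy for simple text tables. The sketch below uses invented data; it also shows that the same specification works inside an f-string.

```python
from math import pi

rows = [("spam", 4, 2.5), ("eggs", 12, 0.333)]

lines = []
for name, qty, price in rows:
    # <6  : left-align in 6 columns
    # >4d : right-align an integer in 4 columns
    # 8.2f: 2 decimals, right-aligned in 8 columns (the default for numbers)
    lines.append("{:<6}|{:>4d}|{:8.2f}".format(name, qty, price))

for line in lines:
    print(line)

# The same mini-language is available in f-strings.
same = f"{pi:.4f}" == "{:.4f}".format(pi)
```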
String methods
String objects have several useful methods – let’s have a look at some of them.
string.lower() / string.upper()
A string can return an all-lowercase or all-uppercase copy of itself:
>>> 'Hello World'.upper()
'HELLO WORLD'
>>> 'Hello World'.lower()
'hello world'
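A typical use of these methods is comparing strings without regard to case. A minimal sketch, with made-up values:

```python
typed = "Hello World"
stored = "HELLO world"

# Direct comparison is case-sensitive.
exact = typed == stored

# Normalising both sides to lowercase makes the comparison case-insensitive.
loose = typed.lower() == stored.lower()

print(exact, loose)
# The original string is left untouched, because strings are immutable.
print(typed)
```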
string.startswith() / string.endswith()
These two methods let you quickly check if a string starts or ends with a given sequence of characters:
>>> 'abracadabra'.startswith('abra')
True
>>> 'abracadabra'.endswith('abra')
True
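Both methods also accept a tuple of alternatives, which is useful for filtering file names by extension. The file names below are invented for the example.

```python
filenames = ["notes.txt", "photo.JPG", "data.csv", "README"]

# Lowercase first so the check ignores case, then test several endings at once.
text_like = [f for f in filenames if f.lower().endswith((".txt", ".csv"))]
images = [f for f in filenames if f.lower().endswith(".jpg")]

print(text_like)
print(images)
```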
string.find() / string.replace()
Strings have a find method which returns the index of the first match in a string, or -1 if no match is found.
>>> 'abracadabra'.find('c')
4
>>> 'abracadabra'.find('x')
-1
There’s also a replace() method which returns a copy of the string with all occurrences of a substring replaced by another string:
>>> 'abracadabra'.replace('a', 'x')
'xbrxcxdxbrx'
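Both methods take optional extra arguments: find accepts a start position, and replace accepts a maximum number of replacements. A small sketch:

```python
word = 'abracadabra'

# Continue searching after the first match by passing a start index.
first = word.find('bra')
second = word.find('bra', first + 1)

# Replace only the first two occurrences.
partial = word.replace('a', 'x', 2)

# When only a yes/no answer is needed, the in operator is simpler than find.
found = 'cad' in word

print(first, second, partial, found)
```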
string.split() / string.strip()
The string methods split and strip are often useful when working with data from text files.
split breaks a string into a list of substrings at a given separator. If no separator is given, the string is split at runs of whitespace.
>>> 'the quick brown fox jumps'.split()
['the', 'quick', 'brown', 'fox', 'jumps']
>>> '20,20,30,-30,360,240'.split(',')
['20', '20', '30', '-30', '360', '240']
strip returns a new string with the whitespace characters at the beginning and at the end removed.
>>> txt = '\t\tthe quick brown fox jumps '
>>> txt.strip()
'the quick brown fox jumps'
To clear whitespace characters only at the beginning or only at the end of a string, use lstrip() or rstrip() instead:
>>> txt.lstrip()
'the quick brown fox jumps '
>>> txt.rstrip()
'\t\tthe quick brown fox jumps'
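Put together, strip and split cover a lot of everyday text parsing. The sketch below cleans up a made-up line of comma-separated numbers, converts it to integers, and joins the values back with a different separator using the join method.

```python
raw = "  20,20,30,-30,360,240 \n"

# Remove surrounding whitespace, split at commas, convert each piece to int.
values = [int(v) for v in raw.strip().split(",")]
print(values)

# join is the inverse of split: it glues a list of strings together.
rejoined = ";".join(str(v) for v in values)
print(rejoined)

# split() without arguments collapses runs of whitespace;
# split(" ") treats every single space as a separator.
collapsed = "a  b".split()
literal = "a  b".split(" ")
print(collapsed, literal)
```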