Strings are immutable ordered collections of Unicode code points, and are used to handle textual data.

Python 2 had two separate object types to represent text: str and unicode. In Python 3 these two types have been unified in a single unicode str object type.
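
As a minimal sketch of what "immutable" means in practice (the variable names here are only illustrative): trying to change a character in place raises an error, so a modified string is always built as a new object.

```python
# Strings are immutable: assigning to a position raises TypeError.
text = "spam"
try:
    text[0] = "S"
    mutated = True
except TypeError:
    mutated = False

# Build a new string instead of modifying the old one.
capitalized = "S" + text[1:]

print(mutated)      # False
print(capitalized)  # Spam
print(text)         # spam (the original is unchanged)
```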

Strings syntax

Strings are typically delimited by single or double quotes, as in many other programming languages:

'this is a string'
"this is also a string"

The quote character used to delimit a string cannot appear unescaped inside the string itself.

If a quote character is used as text, use the other one to delimit the string:

"don't worry be happy"
'national "airquoting" competition'

Escaping quote characters

It is also possible to escape the quote character, so it is used literally (as a quote character, not as a string delimiter). In Python strings, you can escape a character by adding a backslash before it:

>>> "this is a \"string\" too"
'this is a "string" too'

Special characters

The backslash character \ is also used to write special characters such as the newline character \n and the tab character \t:

>>> print("hello\n\tworld")
hello
    world

Multi-line strings

Python strings can also be delimited with triple single or double quotes to allow using tabs and newline characters literally:

>>> txt = '''hello
...     python
... world'''
>>> print(txt)
hello
    python
world
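
To see how the two notations relate, here is a small sketch (assuming the second line is indented with four spaces) comparing a triple-quoted string with the same text written on one line using \n escapes. Both produce the same string object contents.

```python
# A triple-quoted string keeps the line breaks and indentation as typed.
triple_quoted = '''hello
    python
world'''

# The same text written with explicit newline escapes.
escaped = 'hello\n    python\nworld'

print(triple_quoted == escaped)  # True
print(repr(triple_quoted))       # 'hello\n    python\nworld'
```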

Operations with strings

Strings support some mathematical operators, even though they are not numbers. This works because the str type defines its own behavior for those operators, which is called operator overloading.

Adding strings

Strings can be added to another string:

>>> 'a' + 'b'
'ab'
>>> a = "a string"
>>> b = "another string"
>>> a + " " + b
'a string another string'

Multiplying strings

Strings can be multiplied by integers…

>>> "spam " * 5
'spam spam spam spam spam '

…but not by floats (even if they look like an integer):

>>> "spam " * 5.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'float'
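
If the repeat count arrives as a float, one possible workaround is to convert it to an integer first. The sketch below assumes a hypothetical variable named count and also shows a common use of string multiplication, drawing a separator line.

```python
# Hypothetical repeat count that happens to be stored as a float.
count = 5.0

# Convert explicitly to int before multiplying.
repeated = "spam " * int(count)

# Multiplication is handy for building separator lines.
rule = "-" * 20

# Multiplying by zero or a negative integer gives an empty string.
nothing = "spam " * 0

print(repeated)       # spam spam spam spam spam
print(rule)           # --------------------
print(repr(nothing))  # ''
```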

String formatting

The string formatting syntax is a mini-language to create strings by combining fixed and variable parts.

f-Strings

Python 3.6 introduced a new string formatting notation, “formatted string literals” or f-strings, which improves on the two older notations already available in Python 2 (the % operator and the format method). The ‘old’ ways of formatting strings are still supported.

F-strings are prefixed with an f and contain curly braces with expressions that are replaced with their values at runtime.

>>> a = 'eggs'
>>> f'spam spam spam {a} spam'
'spam spam spam eggs spam'
>>> f'spam spam spam {a} spam {a} {a} spam'
'spam spam spam eggs spam eggs eggs spam'
>>> b = 'bacon'
>>> f'spam spam spam {a} spam {b} spam spam {a} spam'
'spam spam spam eggs spam bacon spam spam eggs spam'

Because f-strings are evaluated at runtime, you can put any valid Python expression between the curly braces:

>>> f"A day has {24 * 60 * 60} seconds."
'A day has 86400 seconds.'

If you look up a dictionary value inside an f-string, watch out for which quotation marks you use. If double quotes are used to delimit the f-string, use single quotes for the dictionary key, and vice versa.

>>> D = {'name' : 'Brian'}
>>> f"A man called {D['name']}..."
'A man called Brian...'

In Python versions before 3.12, using the same quote character for the f-string and the dictionary key will not work (Python 3.12 and later accept it):

>>> f"A man called {D["name"]}..."
  File "<stdin>", line 1
    f"A man called {D["name"]}..."
                          ^
SyntaxError: invalid syntax
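
The sketch below collects a few ways to write this lookup that should behave the same on any Python version with f-strings. The variable name is an illustrative choice, and D is the dictionary from the example above.

```python
D = {'name': 'Brian'}

# Option 1: use a different quote character for the dictionary key.
a = f"A man called {D['name']}..."

# Option 2: pull the value into a variable first.
name = D['name']
b = f"A man called {name}..."

# Option 3: str.format, where the key inside the braces takes no quotes.
c = "A man called {d[name]}...".format(d=D)

print(a)            # A man called Brian...
print(a == b == c)  # True
```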

String.format

The ‘old’ str.format syntax is still supported and useful in some cases. In this notation, the format string contains one or more pairs of braces which work as empty ‘containers’, and the format method receives the matching arguments.

If the braces are empty, the arguments will be passed to the string in the order they are given:

>>> '{}, {}, {}'.format('a', 'b', 'c')
'a, b, c'

If the order of the values needs to be different from the order of the arguments, you can use an index inside the braces to match the arguments by position:

>>> '{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'

Unpacking a sequence argument with * also works:

>>> '{2}, {1}, {0}'.format(*'abc')
'c, b, a'

It’s also possible to use names to match placeholders to keyword arguments:

>>> 'position: ({x}, {y})'.format(x=120, y=-115)
'position: (120, -115)'
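
Two related patterns, sketched here with a hypothetical dictionary named point: keyword arguments can be unpacked from a dictionary with **, and a positional index can be used more than once in the same format string.

```python
# Hypothetical data: a dictionary whose keys match the placeholder names.
point = {'x': 120, 'y': -115}

# ** unpacks the dictionary into keyword arguments.
by_name = 'position: ({x}, {y})'.format(**point)

# The same index can appear several times.
repeated = '{0}, {1}, {0}'.format('a', 'b')

print(by_name)   # position: (120, -115)
print(repeated)  # a, b, a
```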

The presentation of the value at each replacement field can be controlled with a special “format specifications” notation.

For example, we can control how many digits are displayed after the period when formatting floats:

>>> from math import pi
>>> '{:.4f}'.format(pi)
'3.1416'

Here’s how we can pad a number with leading zeros to a fixed width:

>>> '{:07d}'.format(4)
'0000004'
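
The same format specifications also work inside f-strings, written after a colon in the replacement field. The following sketch repeats the two examples above in f-string form and adds a few other common specifiers (alignment and a thousands separator) for comparison.

```python
from math import pi

# Same specifications as above, written inside f-strings.
rounded = f'{pi:.4f}'
padded = f'{4:07d}'

# Right-align and left-align text in a field 8 characters wide.
right = f'{"spam":>8}'
left = f'{"spam":<8}'

# Use a comma as thousands separator.
seconds = f'{24 * 60 * 60:,}'

print(rounded)      # 3.1416
print(padded)       # 0000004
print(repr(right))  # '    spam'
print(repr(left))   # 'spam    '
print(seconds)      # 86,400
```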

String methods

String objects have several useful methods – let’s have a look at some of them.

string.lower() / string.upper()

A string can return an all-lowercase or all-uppercase copy of itself:

>>> 'Hello World'.upper()
'HELLO WORLD'
>>> 'Hello World'.lower()
'hello world'

string.startswith() / string.endswith()

These two methods let you quickly check if a string starts or ends with a given sequence of characters:

>>> 'abracadabra'.startswith('abra')
True
>>> 'abracadabra'.endswith('abra')
True
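
A typical use is filtering names by prefix or suffix. In this sketch the list of file names is made up for illustration. Both methods also accept a tuple of alternatives, which avoids chaining several checks with or.

```python
# Hypothetical list of file names.
filenames = ['report.txt', 'photo.png', 'notes.txt', 'data.csv']

# Keep only names with a single given ending.
text_files = [name for name in filenames if name.endswith('.txt')]

# A tuple checks several endings at once.
text_or_csv = [name for name in filenames if name.endswith(('.txt', '.csv'))]

print(text_files)   # ['report.txt', 'notes.txt']
print(text_or_csv)  # ['report.txt', 'notes.txt', 'data.csv']
```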

string.find() / string.replace()

Strings have a find method which returns the index of the first match in a string, or -1 if no match is found.

>>> 'abracadabra'.find('c')
4
>>> 'abracadabra'.find('x')
-1

There’s also a replace() method which replaces all occurrences of a substring with another string:

>>> 'abracadabra'.replace('a', 'x')
'xbrxcxdxbrx'
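
Both methods take optional extra arguments. As a short sketch: find accepts a start index, which makes it possible to search past an earlier match, and replace accepts a maximum number of replacements. When only a yes/no answer is needed, the in operator is usually simpler than find.

```python
word = 'abracadabra'

# Find the first match, then search again starting after it.
first = word.find('a')
second = word.find('a', first + 1)

# Replace only the first two occurrences.
partly_replaced = word.replace('a', 'x', 2)

# Membership test without needing the index.
has_c = 'c' in word

print(first, second)    # 0 3
print(partly_replaced)  # xbrxcadabra
print(has_c)            # True
```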

string.split() / string.strip()

The string methods split and strip are often useful when working with data from text files.

split allows us to break a string at a given separator into a list of substrings. If no separator is given, the string is split at runs of whitespace.

>>> 'the quick brown fox jumps'.split()
['the', 'quick', 'brown', 'fox', 'jumps']
>>> '20,20,30,-30,360,240'.split(',')
['20', '20', '30', '-30', '360', '240']
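
Since split always returns strings, numeric data usually needs a conversion step afterwards. The sketch below converts the comma-separated values from the example above to integers, and also shows how splitting on an explicit space differs from splitting with no argument when there are repeated spaces.

```python
raw = '20,20,30,-30,360,240'

# split returns strings; convert each item to an int.
values = [int(item) for item in raw.split(',')]

# With no argument, runs of whitespace count as one separator.
default_split = 'a  b'.split()

# With an explicit separator, every occurrence splits, leaving empty strings.
explicit_split = 'a  b'.split(' ')

print(values)          # [20, 20, 30, -30, 360, 240]
print(sum(values))     # 640
print(default_split)   # ['a', 'b']
print(explicit_split)  # ['a', '', 'b']
```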

strip returns a new string without whitespace characters at the beginning and at the end of a string.

>>> txt = '\t\tthe quick brown fox jumps    '
>>> txt.strip()
'the quick brown fox jumps'

To clear whitespace characters only at the beginning or only at the end of a string, use lstrip() or rstrip() instead:

>>> txt.lstrip()
'the quick brown fox jumps    '
>>> txt.rstrip()
'\t\tthe quick brown fox jumps'
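
Putting split and strip together, here is a minimal sketch of cleaning up lines as they might come from a text file. The sample lines are invented for illustration. It also shows that strip accepts an optional argument with the characters to remove instead of whitespace.

```python
# Hypothetical lines, as they might be read from a text file.
lines = ['  20, 20 \n', '\t30,-30\n', '360 ,240  \n']

# Strip each line, split it at commas, then strip and convert each field.
rows = [
    [int(field.strip()) for field in line.strip().split(',')]
    for line in lines
]

# strip can also remove other characters from both ends.
unstarred = '***spam***'.strip('*')

print(rows)       # [[20, 20], [30, -30], [360, 240]]
print(unstarred)  # spam
```
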
Last edited on 01/09/2021