How to use Python's f-strings with pandas
Python introduced f-strings back in version 3.6 (six years ago!), but I've only recently realized how useful they can be.
In this post, I'll start by showing you some simple examples of how f-strings are used, and then I'll walk you through a more complex example using pandas.
Here's what I'll cover:
- Substituting objects
- Calling methods and functions
- Evaluating expressions
- Formatting numbers
- Real-world example using pandas
- Further reading
Substituting objects:
name = 'Kevin'
age = 42
print(f'My name is {name}. I am {age} years old.')
My name is Kevin. I am 42 years old.
To make an f-string, you simply put an f in front of a string. When you put the name and age objects inside curly braces, they are automatically substituted into the string.
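Substitution isn't limited to strings and integers. Here's a minimal sketch, using made-up `scores` and `height` values that aren't part of the original example, showing that other objects are rendered the way `str()` would render them when no format specification is given:

```python
# Hypothetical values, purely for illustration
scores = [90, 85, 77]
height = 1.8

# With no format specification, each object is rendered as str() would render it
message = f'Scores: {scores}. Height: {height} m.'
print(message)  # Scores: [90, 85, 77]. Height: 1.8 m.

# The same string, built by hand with str() and concatenation
manual = 'Scores: ' + str(scores) + '. Height: ' + str(height) + ' m.'
print(message == manual)  # True
```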
Calling methods and functions:
role = 'Daddy'
print(f'Sometimes my 6-year-old yells: {role.upper()}!!!')
Sometimes my 6-year-old yells: DADDY!!!
Strings have an upper() method, so I was able to call that method on the role string from within the f-string.
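The example above calls a method, but functions work the same way. As a small sketch (the `topics` list is a hypothetical addition for illustration), you can call built-in functions such as `len()` inside the braces, and you can combine them with method calls:

```python
role = 'Daddy'
topics = ['python', 'pandas']  # hypothetical list, not from the example above

# Call a method and a built-in function in the same f-string
shout = f'{role.upper()} has {len(role)} letters.'
print(shout)  # DADDY has 5 letters.

# Use double quotes inside the braces when the f-string itself uses single quotes
joined = f'Topics: {", ".join(topics)}'
print(joined)  # Topics: python, pandas
```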
Evaluating expressions:
days_completed = 37
print(f'This portion of the year remains: {(365 - days_completed) / 365}')
This portion of the year remains: 0.8986301369863013
You can evaluate an expression (a math expression, in this case) within an f-string.
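Expressions aren't limited to arithmetic. Here's a short sketch, assuming the same `days_completed` value as above, that uses a conditional expression to choose a word inside the f-string:

```python
days_completed = 37

# A conditional expression picks "more" or "less" depending on the comparison
status = (
    f'The year is {"more" if days_completed > 365 / 2 else "less"} than half over, '
    f'with {365 - days_completed} days left.'
)
print(status)  # The year is less than half over, with 328 days left.
```

Anything much longer than this is usually easier to read if you compute it on its own line first and substitute the result.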
Formatting numbers:
print(f'This percentage of the year remains: {(365 - days_completed) / 365:.1%}')
This percentage of the year remains: 89.9%
This looks much nicer, right? The : begins the format specification, and the .1% means "format as a percentage with 1 digit after the decimal point."
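The percentage format is just one of several format specifications. As a rough sketch (the `revenue` number is made up for illustration), here are a few others that come up often with numeric data:

```python
fraction = (365 - 37) / 365
revenue = 1234567.891  # hypothetical number, for illustration only

as_percent = f'{fraction:.1%}'   # percentage with 1 digit after the decimal point
as_fixed = f'{fraction:.3f}'     # fixed-point with 3 digits after the decimal point
with_commas = f'{revenue:,.2f}'  # thousands separators plus 2 decimal places
padded = f'{42:>6}'              # right-aligned in a field 6 characters wide

print(as_percent)   # 89.9%
print(as_fixed)     # 0.899
print(with_commas)  # 1,234,567.89
print(repr(padded)) # '    42'
```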
Real-world example using pandas:
Recently, I was analyzing the survey data submitted by 500+ Data School community members. I asked each person about their level of experience with 11 different data science topics, plus their level of interest in improving those skills this year.
Thus I had 22 columns of data, with names like:
- python_experience
- python_interest
- pandas_experience
- pandas_interest
- ...
Each “experience” column was coded from 0 (None) to 3 (Advanced), and each “interest” column was coded from 0 (Not interested) to 2 (Definitely interested).
Among other things, I wanted to know the mean level of interest in each topic, as well as the mean level of interest in each topic by experience level.
Here's what I did to answer those questions:
cats = ['python', 'pandas']  # this actually had 11 categories
for cat in cats:
    mean_interest = df[f'{cat}_interest'].mean()
    print(f'Mean interest for {cat.upper()} is {mean_interest:.2f}')
    print(df.groupby(f'{cat}_experience')[f'{cat}_interest'].mean(), '\n')
Mean interest for PYTHON is 1.77
python_experience
0 1.590909
1 1.857143
2 1.781759
3 1.630769
Name: python_interest, dtype: float64
Mean interest for PANDAS is 1.67
pandas_experience
0.0 1.500000
1.0 1.825806
2.0 1.709924
3.0 1.262295
Name: pandas_interest, dtype: float64
Notice how I used f-strings:
- Because of the naming convention, I could access the DataFrame columns using df[f'{cat}_interest'] and df[f'{cat}_experience'].
- I converted the category to uppercase using f'{cat.upper()}' to help it stand out.
- I formatted the mean interest to 2 decimal places using f'{mean_interest:.2f}'.
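If you'd like to try the loop yourself, here's a self-contained sketch. The DataFrame below is a tiny made-up stand-in for the real survey data (the values are invented, so the results won't match the output above), and the two dictionaries are just a convenient way to keep the results around after printing them:

```python
import pandas as pd

# Made-up responses standing in for the real survey data
df = pd.DataFrame({
    'python_experience': [0, 1, 1, 2, 3, 3],
    'python_interest':   [1, 2, 2, 2, 1, 2],
    'pandas_experience': [0, 0, 1, 2, 2, 3],
    'pandas_interest':   [2, 1, 2, 2, 1, 1],
})

cats = ['python', 'pandas']
mean_interest = {}   # overall mean interest per category
by_experience = {}   # mean interest per experience level, per category

for cat in cats:
    # Build the column names from the category using f-strings
    mean_interest[cat] = df[f'{cat}_interest'].mean()
    by_experience[cat] = df.groupby(f'{cat}_experience')[f'{cat}_interest'].mean()

    print(f'Mean interest for {cat.upper()} is {mean_interest[cat]:.2f}')
    print(by_experience[cat], '\n')
```

Because the column names are built from `cat`, adding another topic only requires adding its name to `cats`, as long as the matching columns follow the same naming convention.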
Further reading:
- Guide to f-strings (written by my pal Trey Hunner)
- f-string cheat sheet (also by Trey)
P.S. This blog post originated as one of my weekly data science tips.