How to write a great Stack Overflow question
In my 7 years of teaching data science, I've answered thousands of questions online. I know what makes a great question, because those are the questions that get my attention!
When you have a code question, Stack Overflow is an excellent place to ask. There are tons of skilled developers hungry for questions to answer so that they can earn points and build their reputation.
If you write a great question, it may be answered quickly! If you don't, it will probably be ignored.
So how do you write a great Stack Overflow question? Jake VanderPlas summarizes my thinking nicely:
The single best thing you can do when asking for coding help online is provide a short, complete example script that others can copy, paste, and run **without any modification** to reproduce your problem.
— Jake VanderPlas (@jakevdp) August 9, 2018
This applies to StackOverflow, mailing lists, github issues... everything.
Below, I'll demonstrate my step-by-step process for writing a great question, which you can use any time you need to ask for coding help online!
Here's the initial question:
As an example, I'll use this pandas question, which I received from one of my students:
I'm trying to tackle a dataset where each row represents a home transaction with a unique listing id. I know fillna is a column-based function that applies to all nans, but can we apply per row instead?
This dataset has town and architectural style columns, and I want to replace a nan value in architectural style with a groupby town of that row's town, and then apply the most common (highest value count) architectural style. I was going to use df.groupby('town')['architectural style'].value_counts(). I have a feeling I can do this with transform and a lambda function on the whole dataset, but how do I interact with the row, to get its town within the fillna function?
Conceptually, this is an excellent question, but it needs some work in order to be successful on Stack Overflow!
Here's my process for rewriting that question:
- Write a brief introduction
- Provide a self-contained code example
- Detail the expected results and why I expect those results
- Add any important notes
- Link to any relevant questions
- Write a title that summarizes the question
Below I'll rewrite my student's pandas question, and then I'll explain 9 lessons you can take away from this example!
Here's my rewritten question:
Title: How to fill missing values in a DataFrame with the most frequent value of each group?
I have a pandas DataFrame with two columns: toy and color. The color column includes missing values.
How do I fill the missing color values with the most frequent color for that particular toy?
Here's the code to create a sample dataset:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
})
Here's the sample dataset:
      toy  color
0     car    red
1     car   blue
2     car   blue
3     car    NaN
4   train  green
5   train    NaN
6   train    red
7   train    red
8   train    NaN
9    ball   blue
10   ball    red
11   ball    NaN
12  truck  green
Here's the desired result:
- Replace the first NaN with blue, since that is the most frequent color for a car.
- Replace the second and third NaNs with red, since that is the most frequent color for a train.
- Replace the fourth NaN with either blue or red, since they are tied for the most frequent color for a ball.
Notes about the real dataset:
- There are many different toy types (not just four).
- There are no toy types that only have missing values for color, so the answer does not need to handle that case.
This question is related, but it doesn't answer my question of how to use the most frequent value to fill in missing values.
Here's what makes this a good question:
- I summarized the entire question at the top. If someone might know the answer, I want to give them the question right away in order to motivate them to keep reading. If the question was instead buried at the bottom, the reader might just give up and move on to another question.
- I wrote code that can be copied, pasted, and run. Coding questions can be hard to solve in your head, and so I want to make it as easy as possible for the reader to work on this problem on their own machine. I even included the imports so that they can run the code from a new environment.
- I printed out the dataset. Even though the reader may end up running my code on their machine, I don't want to require them to do so in order to start thinking through a solution. This is also important because they will need something to look at when I explain the expected results. (In case you're wondering, I generated this output by running the sample code in IPython.)
- I used short and simple object names. This makes it easier for the reader to read the code, interact with it on their own machine, and write out an answer.
- I added proper formatting. This makes it easier for the reader to understand your question quickly.
- I provided the expected results and why I expect those results. Detailing what you expect is important because it's unlikely that your question alone will be crystal clear to every reader. Detailing why you expect it is equally important because you don't want to receive solutions that arrive at the right result but for the wrong reason.
- I covered many different cases with the example dataset. My dataset included a toy with one missing value, a toy with two missing values, a toy for which there was a "tie" in terms of most frequent color, and a toy with no missing values. This is important for ensuring that the solution you receive will work for all of the cases present in your real dataset.
- I added any special notes that wouldn't be obvious from the example dataset. In this case, I wanted to ensure that the solution "scaled" to more than four toy types, and I didn't want the reader to worry about handling an edge case that doesn't exist in the real dataset. Thus, I'm trying to increase the likelihood that I get a useful answer as well as eliminate any questions that may have come up in the reader's mind.
- I linked to a related question. This demonstrates that you have already searched for an answer, and it gives you a chance to make the case for why your question is not a duplicate (in which case it may be closed by a moderator). It also gives the reader a "head start" by linking to a resource that might help them to come up with an answer to your question.
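Before posting, it can be worth confirming that an example dataset really does cover the cases you claim it covers. The following is a minimal sketch of one way to check that for the sample data in my question. The names missing_per_toy and color_counts are illustrative only and are not part of the question itself.

```python
import numpy as np
import pandas as pd

# Same sample dataset as in the rewritten question
df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
})

# Count the missing colors for each toy, to confirm the dataset includes
# toys with zero, one, and two missing values
missing_per_toy = df['color'].isna().groupby(df['toy']).sum()
print(missing_per_toy)

# Count each non-missing color within each toy, to spot ties for the
# most frequent color (value_counts skips NaN by default)
color_counts = df.groupby('toy')['color'].value_counts()
print(color_counts)
```

If a case you care about is absent from these counts, add a few rows to the sample data until it shows up.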
Here are some additional things you might want to include:
My guiding principle is to make it easy for the reader by telling them everything they need to know and nothing else. Here are a few things that I didn't include in my question, but may be worth including in some cases:
- An error message, if your question is about how to fix a specific error.
- A diagram, if you think that might help to clarify your question.
- Which software versions you are using, if you think that might be relevant.
- Code you tried that didn't work, if you think that might help to clarify your goal.
Note: Including code that didn't work is often recommended as a way to "prove" that you have put enough effort into solving your own problem, but I don't universally recommend this since it makes your question longer without necessarily adding useful information.
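If you decide that software versions are relevant, here is a short sketch of one way to collect them. It assumes your question involves pandas and NumPy, as mine does, so swap in whichever libraries your own code imports. The versions dictionary is just an illustrative name.

```python
import sys

import numpy as np
import pandas as pd

# Gather the version strings that are worth pasting into a question
versions = {
    'python': sys.version.split()[0],
    'pandas': pd.__version__,
    'numpy': np.__version__,
}

for name, version in versions.items():
    print(f'{name}: {version}')
```

pandas also offers pd.show_versions(), which prints a much longer report. For most questions, the few lines above are probably enough.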
Here are a few more tips:
- Ask questions that have a correct answer. The site is not designed for opinion-based questions.
- Focus on a single issue in your question. The site is optimized for discrete questions with discrete answers, and so questions that contain multiple issues aren't as well-received.
- Keep revising your question until it is as clear and concise as possible. You're asking people to help you for free, so writing a high-quality question is one way of honoring the time that they are giving you.
- If you find the answer on your own, you should post both the question and the answer. This is a great way to share your knowledge with the community, and you may even receive an answer that is better than your own!
Related resources:
- Stack Overflow Help: How do I ask a good question?
- Stack Overflow Help: How to create a Minimal, Reproducible Example
- Jon Skeet: Stack Overflow Question Checklist
- Jon Skeet: Writing the Perfect Question
- Matthew Rocklin: Craft Minimal Bug Reports
Good luck!
If you used this guide to help you write a great Stack Overflow question, feel free to share a link in the comments below and I'll take a look! 👇
P.S. I posted my pandas question on Stack Overflow and received a helpful answer within 5 minutes! 🙌 This is by no means guaranteed, but by writing a high-quality question, you will greatly increase the likelihood of a useful response!
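For readers curious about the pandas problem itself, here is a minimal sketch of one possible approach. It is not presented as the answer that was posted on Stack Overflow. It relies on the assumption stated in the question's notes, namely that every toy has at least one non-missing color, and the helper name fill_with_mode is hypothetical.

```python
import numpy as np
import pandas as pd

# Same sample dataset as in the rewritten question
df = pd.DataFrame({
    'toy': ['car'] * 4 + ['train'] * 5 + ['ball'] * 3 + ['truck'],
    'color': ['red', 'blue', 'blue', np.nan, 'green', np.nan,
              'red', 'red', np.nan, 'blue', 'red', np.nan, 'green']
})


def fill_with_mode(colors):
    # mode() ignores NaN and returns the most frequent value(s) in sorted
    # order, so taking the first entry also settles ties.
    # Assumption: the group has at least one non-missing value.
    return colors.fillna(colors.mode().iloc[0])


# Apply the helper to the color values of each toy separately
df['color'] = df.groupby('toy')['color'].transform(fill_with_mode)
print(df)
```

Because the ball group is tied, either blue or red would satisfy the question. This sketch happens to pick whichever mode sorts first.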