SettingWithCopyWarning

The problem

Sometimes we get the following message when working with a pandas DataFrame in Python:

... SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
...

It seems scary, but it is just a warning that indicates a potential error.

There are many descriptions of this on the net, but I find them too complicated on the one hand and not exhaustive on the other. I try to cover the problem as simply and as exhaustively as possible. Let's walk through it!

Meaning

The background of the problem is that some DataFrame operations create views of the original dataframe, while others create copies.

  • If the result is a view, and we change something, then the original dataframe also changes.
  • If the result is a copy, and we change something, then the original dataframe does not change.

Sometimes we want to change the original dataframe, sometimes not. This warning indicates that our code might not do what we intended. The problem is that whether a selection returns a view or a copy is not guaranteed.

This is sometimes called the chained assignment problem, because it typically occurs with chained assignments, i.e. when an assignment is applied to the result of another indexing operation.

Cases

There are 4 possibilities along 2 dimensions: whether the result is a view or a copy, and whether we want to treat it as a view or a copy. The following will be the starting input for all 4 scenarios:

import pandas as pd
 
my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})

True positive

We think it is a view, but actually it is a copy.

Let's consider the most typical true positive case (most likely the case for which this warning was originally created):

import pandas as pd
 
my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})
my_fruits_df[my_fruits_df['pieces'] < 4]['pieces'] = my_fruits_df['pieces'] + 1

Explanation:

  • We have a data frame containing 2 columns: a fruit name and how many pieces of that fruit we have.
  • We want to increase the number of pieces by 1 where it is lower than 4.

At first glance the code does what we want, but the result might be surprising:

print(my_fruits_df)
#     fruit  pieces
# 0   apple       3
# 1  orange       2
# 2  banana       5

It did not change anything! The problem is the following:

  • my_fruits_df[my_fruits_df['pieces'] < 4] actually creates a copy of the original data frame,
  • therefore the assignment is done on the copy,
  • the copy is dropped after assignment, and the original one remains the same.

So in this case it really is an error, and it should be fixed. The solution is somewhat cumbersome, but straightforward:

my_fruits_df.loc[my_fruits_df['pieces'] < 4, 'pieces'] = my_fruits_df[my_fruits_df['pieces'] < 4]['pieces'] + 1
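The fix above repeats the row filter on the right-hand side. A slightly shorter sketch, assuming the same my_fruits_df, stores the condition in a variable (mask is a name introduced here only for illustration) and applies an in-place addition through loc[]:

```python
import pandas as pd

my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})

# Boolean row condition, computed once; `mask` is an illustrative name.
mask = my_fruits_df['pieces'] < 4

# One indexing operation on the original dataframe: rows by mask, column by name.
my_fruits_df.loc[mask, 'pieces'] += 1

print(my_fruits_df['pieces'].tolist())
# [4, 3, 5]
```

Both forms select the rows and the column in a single loc[] call, which is what makes the assignment reach the original dataframe.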

False positive

We think it is a copy, and actually it is a copy.

In this example we have favourite fruits. We would like to subset the original dataframe based on our preference and perform a change similar to the one above: increasing the number of pieces of our favourites by 2:

import pandas as pd
 
my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})
 
my_favorite_fruits = ['apple', 'banana', 'plum']
my_favorite_fruits_df = my_fruits_df[my_fruits_df['fruit'].isin(my_favorite_fruits)]
my_favorite_fruits_df['pieces'] = my_favorite_fruits_df['pieces'] + 2

We expect that the original remains the same (or we simply don't care about the original anymore) and the copy changes, and this is indeed what happens:

print(my_fruits_df)
#     fruit  pieces
# 0   apple       3
# 1  orange       2
# 2  banana       5
 
print(my_favorite_fruits_df)
#     fruit  pieces
# 0   apple       5
# 2  banana       7

Now the warning suggests that we might think my_favorite_fruits_df is a view, while actually it is a copy.

The dirty solution here is to just ignore the warning. The nice one is to add a copy() call, indicating that we know this is a copy:

import pandas as pd
 
my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})
 
my_favorite_fruits = ['apple', 'banana', 'plum']
my_favorite_fruits_df = my_fruits_df[my_fruits_df['fruit'].isin(my_favorite_fruits)].copy()
my_favorite_fruits_df['pieces'] = my_favorite_fruits_df['pieces'] + 2

See the copy() call at the end of the line that creates my_favorite_fruits_df. Actually it creates a copy of the copy, so it is not so efficient for large dataframes, but it suppresses the warning. The programmer says here: I know that I am working on a copy, please do not warn me that this could be a view.
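As an alternative to the explicit copy() call, the following sketch uses DataFrame.assign(), which returns a new dataframe instead of setting values on the filtered result in place. This approach is not part of the original example; the variable name is_favorite is introduced here only for illustration.

```python
import pandas as pd

my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})

my_favorite_fruits = ['apple', 'banana', 'plum']
is_favorite = my_fruits_df['fruit'].isin(my_favorite_fruits)

# assign() returns a new dataframe; nothing is set on the filtered result in place.
my_favorite_fruits_df = my_fruits_df[is_favorite].assign(
    pieces=lambda df: df['pieces'] + 2
)

print(my_fruits_df['pieces'].tolist())           # [3, 2, 5]
print(my_favorite_fruits_df['pieces'].tolist())  # [5, 7]
```

Which of the two to prefer is a matter of taste: copy() states the intent explicitly, while assign() keeps the filter and the change in one expression.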

True negative

We think it is a copy, but actually it is a view.

Let's come back to the original data frame. In the next example we take the pieces as a plain series of numbers, and we would like to change it:

import pandas as pd
 
my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})
 
my_fruits_pieces = my_fruits_df['pieces']
my_fruits_pieces[0] = 4

We might think it is a copy, but actually it is a view, so the original data frame changes as well:

print(my_fruits_df)
#     fruit  pieces
# 0   apple       4
# 1  orange       2
# 2  banana       5

Here the warning is misleading, because it says "a value is trying to be set on a copy of a slice from a DataFrame", but the contrary is true: a value is trying to be set on a view of a slice from a DataFrame.

If we really don't want to change the original one, we should again use copy(), like this:

my_fruits_pieces = my_fruits_df['pieces'].copy()
my_fruits_pieces[0] = 4

In this case the original remains the same, as we want:

print(my_fruits_df)
#     fruit  pieces
# 0   apple       3
# 1  orange       2
# 2  banana       5

False negative

We think it is a view, and actually it is a view.

Again, starting from the original dataframe, let's change the number of pieces of the apple to 7:

import pandas as pd
 
my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})
 
my_fruits_df['pieces'][0] = 7

The warning is wrong here as well: it should warn about a view instead of a copy. But actually this works as expected:

print(my_fruits_df)
#     fruit  pieces
# 0   apple       7
# 1  orange       2
# 2  banana       5

However, it prints the warning. The warning is really valid here, because the following form, which looks very similar and seemingly does the same, does not work as expected (again, starting with the original dataframe):

import pandas as pd
 
my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})
 
my_fruits_df['pieces']['apple'] = 7
 
print(my_fruits_df)
#     fruit  pieces
# 0   apple       3
# 1  orange       2
# 2  banana       5

In the first case we thought that we were working on the view, and actually we were, but pandas warned us that we might be working on a copy. First let's fix this one, eliminating the warning nicely:

my_fruits_df.loc[0, 'pieces'] = 7

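In the second example we actually worked on a copy: 'apple' is a value in the fruit column, not a row label, so the assignment never reached the row we intended. Here loc[] can also be used to fix the error: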

my_fruits_df.loc[0, 'pieces'] = 7

Or, more generally, fixing the second approach:

my_fruits_df.loc[my_fruits_df['fruit'] == 'apple', 'pieces'] = 7

The latter is actually the most generic solution: the first expression within the square brackets is a row condition selecting where the assignment should be applied, and the second one names the column whose values we want to change.
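If rows are usually addressed by fruit name, another option (a sketch that goes beyond the examples above) is to make the fruit column the index, so that 'apple' becomes a real row label. The name fruits_by_name_df is hypothetical and used only here.

```python
import pandas as pd

my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})

# set_index() returns a new dataframe whose row labels are the fruit names.
fruits_by_name_df = my_fruits_df.set_index('fruit')

# Now 'apple' is a row label, so loc[] can address the cell directly.
fruits_by_name_df.loc['apple', 'pieces'] = 7

print(fruits_by_name_df['pieces'].tolist())
# [7, 2, 5]
```

This assumes the fruit names are unique; with duplicate names the loc[] call would set every matching row.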

General solution

There are 2 general ways to eliminate the warning:

  • If we want to work on a copy, then add a copy() call at the proper place.
  • If we want to work on a view, i.e. we actually want to change the original content, then we should perform the operation directly on the original dataframe, using loc[] or similar accessors (iloc[], at[], iat[]). Used this way, the assignment is a single indexing operation on the original, so no intermediate copy is involved.
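The four accessors mentioned above differ only in how the cell is addressed. A minimal sketch on the starter dataframe, where the assigned values 7 to 10 are arbitrary and chosen for illustration:

```python
import pandas as pd

my_fruits_df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'pieces': [3, 2, 5],
})

my_fruits_df.loc[0, 'pieces'] = 7   # row label 0, column label 'pieces'
my_fruits_df.iloc[1, 1] = 8         # row position 1, column position 1
my_fruits_df.at[2, 'pieces'] = 9    # single cell, addressed by labels
my_fruits_df.iat[0, 1] = 10         # single cell, addressed by positions (overwrites the 7)

print(my_fruits_df['pieces'].tolist())
# [10, 8, 9]
```

loc[] and iloc[] also accept slices, lists and boolean conditions, while at[] and iat[] are meant for a single cell.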

Suppress warning

Suppressing the warning is not recommended, but as a quick and dirty solution we can use the following line of code:

pd.set_option('mode.chained_assignment', None)

The policy can also be stricter: the following line tells pandas to treat the chained assignment problem as an error:

pd.set_option('mode.chained_assignment', 'raise')

The default is the following:

pd.set_option('mode.chained_assignment', 'warn')