DOC: Series.update throws a FutureWarning about df[col] = df[col].method but .update returns None and works inplace #59788

Open
1 task done
spawn-guy opened this issue Sep 13, 2024 · 6 comments
Labels: Copy / view semantics, Needs Discussion, Warnings

Comments

@spawn-guy

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/dev/reference/api/pandas.Series.update.html#pandas.Series.update

Documentation problem

df.update resembles Python's dict.update (it modifies the object in place and returns None), but df.update doesn't support CoW (Copy-on-Write).

Suggested fix for documentation

Remove the FutureWarning for df.update,

or create a (for example) df.coalesce method that actually returns something. This shouldn't break existing code.

spawn-guy added the Docs and Needs Triage labels on Sep 13, 2024
@rhshadrach
Member

df.update doesn't support CoW

Thanks for the report. Can you provide a reproducible example of how CoW is not supported?

rhshadrach added the Copy / view semantics and Needs Info labels and removed the Needs Triage label on Sep 14, 2024
@spawn-guy
Author

@rhshadrach here is some code and log

# select best source: heading
# HeadingTrue > HeadingMagnetic > HeadingAndDeclination (this is also magnetic) > TrackMadeGood
measurements_df["heading"] = measurements_df["gps_course_over_ground"]
# replace if other value is not nan
measurements_df["heading"].update(measurements_df["gps_heading"])

FutureWarning

_task.py:427: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  measurements_df["heading"].update(measurements_df["gps_heading"])

Inconsistency with the warning:

  • update is always "inplace=True"
  • there is no df[col] = df[col].update(value) that actually "returns" something

@rhshadrach
Member

Thanks @spawn-guy, however your example is not reproducible because you did not provide measurements_df. Can you provide a reproducible example?

@spawn-guy
Author

spawn-guy commented Nov 26, 2024

@rhshadrach it took me some time to pick this up, but here is a small test. at first i thought it might be related to the mask that i use, but the FutureWarning is thrown without it as well

import numpy as np
import pandas as pd

# test pandas warnings
df = pd.DataFrame(
    {
        "A": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        "B": [1, 1, 1, 1, 1, 1],
        "C": [np.nan, 5, 6, np.nan, np.nan, np.nan],
        "D": [0, 0, 2, 2, 0, 0],
    }
)

# with mask
# df = df[df["D"] > 0]

df["E"] = df["A"]
df["E"].update(df["B"])
# df["E"].update(df["C"])
print(df)

results in

cli_python_311_upgrade_test.py:209: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df["E"].update(df["B"])

    A  B    C  D    E
0 NaN  1  NaN  0  1.0
1 NaN  1  5.0  0  1.0
2 NaN  1  6.0  2  1.0
3 NaN  1  NaN  2  1.0
4 NaN  1  NaN  0  1.0
5 NaN  1  NaN  0  1.0

the FutureWarning is raised by the df["E"].update(df["B"]) line

so, in the current implementation, i don't see a way to get rid of this FutureWarning, for the reasons mentioned above

@spawn-guy
Author

spawn-guy commented Nov 26, 2024

and if i do as the warning suggests, the result is wrong, because update returns None

import numpy as np
import pandas as pd

# test pandas warnings
df = pd.DataFrame(
    {
        "A": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        "B": [1, 1, 1, 1, 1, 1],
        "C": [np.nan, 5, 6, np.nan, np.nan, np.nan],
        "D": [0, 0, 2, 2, 0, 0],
    }
)

# with mask
# df = df[df["D"] > 0]

df["E"] = df["A"]
df["E"].update(df["B"])
# df["E"].update(df["C"])
print(df)

df["E"] = df["A"]
df["E"] = df["E"].update(df["C"])
print(df)

output

cli_python_311_upgrade_test.py:209: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df["E"].update(df["B"])
    A  B    C  D    E
0 NaN  1  NaN  0  1.0
1 NaN  1  5.0  0  1.0
2 NaN  1  6.0  2  1.0
3 NaN  1  NaN  2  1.0
4 NaN  1  NaN  0  1.0
5 NaN  1  NaN  0  1.0

    A  B    C  D     E
0 NaN  1  NaN  0  None
1 NaN  1  5.0  0  None
2 NaN  1  6.0  2  None
3 NaN  1  NaN  2  None
4 NaN  1  NaN  0  None
5 NaN  1  NaN  0  None
cli_python_311_upgrade_test.py:214: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df["E"] = df["E"].update(df["C"])

notice the all-None column E

@rhshadrach
Member

rhshadrach commented Dec 2, 2024

Thanks for the example. To do this operation, you'd need to have something like:

ser = df["E"]
ser.update(df["C"])
df["E"] = ser

or create a (for example) df.coalesce method that will, actually, return something. this shouldn't break existing code

I think you're looking for combine_first
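As a rough illustration of how combine_first could replace the update call from the heading example earlier in the thread: the sample values below are made up, and only the column names come from that comment. Note that the argument order is reversed relative to update, because combine_first keeps the caller's values and fills its NaN entries from the argument.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
measurements_df = pd.DataFrame(
    {
        "gps_course_over_ground": [10.0, 20.0, np.nan],
        "gps_heading": [np.nan, 90.0, np.nan],
    }
)

# Prefer gps_heading; fall back to gps_course_over_ground where it is NaN.
# combine_first returns a new Series, so plain assignment works.
measurements_df["heading"] = measurements_df["gps_heading"].combine_first(
    measurements_df["gps_course_over_ground"]
)

# Cross-check against the update-based approach on a named Series.
expected = measurements_df["gps_course_over_ground"].copy()
expected.update(measurements_df["gps_heading"])
```

With both columns sharing the same index, heading matches what the update-based version produces. If the indexes differed, combine_first would take the union of both indexes, whereas update only touches labels already present in the caller.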

@jorisvandenbossche - thoughts on changing the warning message here? It would likely need to go into 2.3, yea?

rhshadrach added the Needs Discussion and Warnings labels and removed the Needs Info and Docs labels on Dec 2, 2024