BUG: why arrow no affect in Intel chips? #60466

Open
3 tasks done
wonb168 opened this issue Dec 2, 2024 · 1 comment
Labels
Arrow pyarrow functionality Needs Info Clarification about behavior needed to assess issue Performance Memory or execution speed performance

Comments

@wonb168

wonb168 commented Dec 2, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pyarrow as pa  # needed for pa.Table below

pd.options.mode.copy_on_write = True

def ensure_arrow_format(df):
    # Check whether the DataFrame is already in Arrow format
    if not isinstance(df._mgr, pd.core.internals.ArrayManager):
        return pa.Table.from_pandas(df).to_pandas()
    return df

Issue Description

I updated pandas from 2.1.4 to 2.2.3,
enabled copy-on-write,
and converted the DataFrames to Arrow before merging them.
The runtime dropped from 61s to 36s.

Expected Behavior

On my Mac mini (M2) the runtime went from 61s to 36s,
but on x86 CentOS or Windows 10 it only went from 109 to 102.
Why is there no effect on Intel CPUs?

Installed Versions

pandas 2.2.3

@wonb168 wonb168 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 2, 2024
@asishm asishm added Performance Memory or execution speed performance and removed Bug labels Dec 2, 2024
@rhshadrach
Member

Thanks for the report. Can you provide a fully reproducible example? We need the DataFrame you are operating on to be able to look further into the issue.

pa.Table.from_pandas(df).to_pandas()

In addition, it would be helpful to report benchmarks for each of these two operations.

pa_table = pa.Table.from_pandas(df)
result = pa_table.to_pandas()

@rhshadrach rhshadrach added Needs Info Clarification about behavior needed to assess issue Arrow pyarrow functionality and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 2, 2024