About the method of drawing confidence intervals #3715

Open
z4668640 opened this issue Dec 19, 2024 · 0 comments

What is your suggestion?

Hello, could Altair support drawing a confidence ellipse around each group of points in a PCA scatter plot?
The following PCA scatter plot was drawn in R with ggplot2.
Image
Here is the PCA scatter plot I currently get with Altair, followed by the code that produces it.
Image

import altair as alt

# `data`, `pc1_column` and `pc2_column` are assumed to be defined beforehand.
chart = alt.Chart(data).mark_circle(size=60).encode(
    x=alt.X(pc1_column, title='Principal Component 1'),
    y=alt.Y(pc2_column, title='Principal Component 2'),
    color=alt.Color('Environment1', title='Environment'),
    # tooltip=['Index', pc1_column, pc2_column]
)
chart.display()

Is there a way to do this at the moment? Thanks.

Have you considered any alternative solutions?

Image
That was one of my attempts, but it obviously didn't work.
I don't have any better ideas right now.

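import altair as alt
import numpy as np
import pandas as pd
from scipy.stats import chi2
from sklearn.cluster import KMeans

# Cluster the points on the two principal components.
kmeans = KMeans(n_clusters=3, random_state=0).fit(data[[pc1_column, pc2_column]])
data['Cluster'] = kmeans.labels_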

chart = alt.Chart(data).mark_circle(size=60).encode(
    x=alt.X(pc1_column, title='Principal Component 1'),
    y=alt.Y(pc2_column, title='Environment'),
    color=alt.Color('Environment1', title='Environment'),
    tooltip=[pc1_column, pc2_column, 'Environment1']
)

def plot_confidence_ellipses_altair(df, chart):
    for cluster in df['Cluster'].unique():
        cluster_data = df[df['Cluster'] == cluster]
        mean_x = cluster_data[pc1_column].mean()
        mean_y = cluster_data[pc2_column].mean()
        cov_matrix = np.cov(cluster_data[pc1_column], cluster_data[pc2_column])
        eigvals, eigvecs = np.linalg.eigh(cov_matrix)
        order = eigvals.argsort()[::-1]
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        theta = np.degrees(np.arctan2(*eigvecs[:, 0][::-1]))
        width, height = 2 * np.sqrt(chi2.ppf(0.95, 2)) * np.sqrt(eigvals)

        ellipse_df = pd.DataFrame({
            'x': mean_x + width / 2 * np.cos(np.linspace(0, 2 * np.pi, 100)) * np.cos(np.radians(theta)) -
                 height / 2 * np.sin(np.linspace(0, 2 * np.pi, 100)) * np.sin(np.radians(theta)),
            'y': mean_y + width / 2 * np.sin(np.linspace(0, 2 * np.pi, 100)) * np.cos(np.radians(theta)) +
                 height / 2 * np.cos(np.linspace(0, 2 * np.pi, 100)) * np.sin(np.radians(theta))
        })

        chart += alt.Chart(ellipse_df).mark_line(color='red', opacity=0.5, strokeWidth=2).encode(
            x=alt.X('x:Q', title='Principal Component 1'),
            y=alt.Y('y:Q', title='Environment')
        )

    return chart

chart = plot_confidence_ellipses_altair(data, chart)
chart.display()
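
Two things in the attempt above may explain the scrambled outlines. First, a line mark connects its points in x order by default, so an ellipse needs an explicit `order` encoding to be traced vertex by vertex. Second, the `y` expression pairs `sin(t)` with `cos(theta)` and `cos(t)` with `sin(theta)`, which is not the usual rotation of an ellipse. What follows is only a minimal sketch of one possible workaround, not a built-in Altair feature. It assumes roughly bivariate-normal groups, groups by an existing column instead of running KMeans, and uses the closed form of the chi-square quantile for 2 degrees of freedom instead of `scipy.stats.chi2`. The helper names `confidence_ellipses` and `layer_ellipses` and the `vertex` column are hypothetical.

```python
import numpy as np
import pandas as pd


def confidence_ellipses(df, x, y, group, level=0.95, n_points=100):
    """Return one row per ellipse vertex for each value of `group`.

    Hypothetical helper: assumes every group has enough points for a
    non-degenerate 2x2 covariance matrix.
    """
    # With 2 degrees of freedom, chi2.ppf(level, 2) == -2 * ln(1 - level).
    scale = np.sqrt(-2.0 * np.log(1.0 - level))
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    unit_circle = np.vstack([np.cos(t), np.sin(t)])  # shape (2, n_points)

    frames = []
    for key, part in df.groupby(group):
        center = part[[x, y]].mean().to_numpy()
        eigvals, eigvecs = np.linalg.eigh(np.cov(part[x], part[y]))
        # Stretch the unit circle along each principal axis, rotate it with
        # the eigenvectors, then shift it to the group mean.
        semi_axes = scale * np.sqrt(np.clip(eigvals, 0.0, None))
        pts = eigvecs @ (semi_axes[:, None] * unit_circle) + center[:, None]
        frames.append(pd.DataFrame({
            x: pts[0],
            y: pts[1],
            group: key,
            "vertex": np.arange(n_points),  # drawing order along the outline
        }))
    return pd.concat(frames, ignore_index=True)


def layer_ellipses(points_chart, ellipse_df, x, y, group):
    """Layer the ellipse outlines on top of an existing scatter chart."""
    import altair as alt  # local import: only needed for drawing

    outlines = alt.Chart(ellipse_df).mark_line().encode(
        x=alt.X(x, type="quantitative"),
        y=alt.Y(y, type="quantitative"),
        color=alt.Color(group, type="nominal"),
        order="vertex:Q",  # connect vertices in outline order, not x order
    )
    return points_chart + outlines


# Possible usage with the names from the snippets above:
# ellipses = confidence_ellipses(data, pc1_column, pc2_column, "Environment1")
# chart = layer_ellipses(chart, ellipses, pc1_column, pc2_column, "Environment1")
# chart.display()
```

Encoding `color` by the group column both matches the outline colors to the points and splits the line mark into one path per group. The sketch treats the group column as nominal, which assumes it holds category labels as in the scatter plot.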