Finding out what packages are exploding the build #24

Open
abkfenris opened this issue Jul 30, 2021 · 5 comments

Comments

@abkfenris
Collaborator

Here's a way to start analyzing what packages are causing the build to explode.

import json
from pathlib import Path

import pandas as pd

# Each installed conda package has a JSON record in conda-meta that
# lists the files it installed, along with their sizes.
pkg_files = Path("/opt/conda/conda-meta/").glob("*.json")

paths = []

for pkg_file in pkg_files:
    with pkg_file.open() as f:
        pkg = json.load(f)
        paths += pkg["paths_data"]["paths"]

df = pd.DataFrame(paths)
# Keep only the file path and its size
df = df.drop(
    [
        "path_type",
        "sha256",
        "sha256_in_prefix",
        "no_link",
        "file_mode",
        "prefix_placeholder",
    ],
    axis=1,
)
# Drop entries that have no recorded size
df = df.dropna()
df = df.sort_values("size_in_bytes", ascending=False)
df.head(20)

                                                    _path  size_in_bytes
37480   lib/python3.9/site-packages/tensorflow/python/...    271301744.0
25649                                    lib/libavcodec.a    152533588.0
25258                                   lib/libLLVM-11.so    105929424.0
74839   x86_64-conda-linux-gnu/sysroot/usr/lib64/local...     99188496.0
85596                                    lib/librsvg-2.so     97415432.0
85597                                  lib/librsvg-2.so.2     97415432.0
85598                             lib/librsvg-2.so.2.47.0     97415432.0
91707                                   lib/libLLVM-10.so     95685352.0
50318                        lib/libQt5WebEngineCore.so.5     92408776.0
50320                   lib/libQt5WebEngineCore.so.5.12.9     92408776.0
50317                          lib/libQt5WebEngineCore.so     92408776.0
50319                     lib/libQt5WebEngineCore.so.5.12     92408776.0
122433                                         bin/pandoc     76341040.0
25661                                   lib/libavformat.a     47442464.0
33070   site-packages/compliance_checker/tests/data/ma...     42981152.0
40321                                      lib/libgdal.so     35682192.0
40323                               lib/libgdal.so.28.0.1     35682192.0
40322                                   lib/libgdal.so.28     35682192.0
80393                                lib/libclang.so.11.1     35233816.0
80392                                     lib/libclang.so     35233816.0
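The listing above is per file, so attributing the space to packages still takes a manual step. A minimal sketch, assuming each conda-meta record carries the package's "name" next to "paths_data" (package_sizes is a hypothetical helper name): it totals size_in_bytes per package and skips entries whose path_type is "softlink", on the assumption that repeated rows such as the libQt5WebEngineCore.so* and librsvg-2.so* entries are symlinks to one file rather than separate copies.

```python
import json
from pathlib import Path

import pandas as pd


def package_sizes(conda_meta: Path) -> pd.DataFrame:
    """Total installed bytes per conda package, largest first."""
    rows = []
    for pkg_file in conda_meta.glob("*.json"):
        pkg = json.loads(pkg_file.read_text())
        for entry in pkg.get("paths_data", {}).get("paths", []):
            # Assumption: symlink entries may report their target's size,
            # so skip them to count each file only once.
            if entry.get("path_type") == "softlink":
                continue
            # Entries without a recorded size count as 0.
            rows.append(
                {"package": pkg["name"], "size_in_bytes": entry.get("size_in_bytes") or 0}
            )
    df = pd.DataFrame(rows, columns=["package", "size_in_bytes"])
    return (
        df.groupby("package")["size_in_bytes"]
        .sum()
        .sort_values(ascending=False)
        .to_frame()
    )
```

On this image it would be called as package_sizes(Path("/opt/conda/conda-meta")).head(20). The softlink filter is worth checking against a few known symlinks before trusting the totals.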

Caching some of the build with #23


@abkfenris
Collaborator Author

Removing tensorflow only slims the image down by about another half a gigabyte:

➜ docker images
REPOSITORY             TAG       IMAGE ID       CREATED          SIZE
ohw-no-py-tensorflow   latest    200d425d469f   52 seconds ago   5.3GB
ohw-cache-apt          latest    c646a0031f14   9 hours ago      5.81GB
ohw-cache              latest    274e85773a32   9 hours ago      5.81GB
ohw                    latest    d2014651c42b   10 hours ago     8.27GB
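Before rebuilding to test a removal, the conda-meta records can give a rough estimate of what dropping a package would free. A sketch under stated assumptions (removal_savings is a hypothetical helper): the first token of each "depends" entry is taken as the package name, a dependency counts as removable only when every installed package that needs it is also being removed, and packages explicitly requested in the environment file are not taken into account. It only counts files in the conda prefix, so it will not match the docker images sizes exactly.

```python
from __future__ import annotations

import json
from collections import defaultdict
from pathlib import Path


def removal_savings(conda_meta: Path, target: str) -> tuple[set[str], int]:
    """Estimate the packages and bytes freed by removing `target`.

    Rough estimate only: packages explicitly requested in the environment
    file are not known here, so they may be counted as removable.
    """
    records = {}
    for pkg_file in conda_meta.glob("*.json"):
        pkg = json.loads(pkg_file.read_text())
        records[pkg["name"]] = pkg

    # Reverse dependencies: package name -> installed packages that need it.
    # Assumption: the first token of each match spec is the package name.
    needed_by: dict[str, set[str]] = defaultdict(set)
    for name, pkg in records.items():
        for spec in pkg.get("depends", []):
            needed_by[spec.split()[0]].add(name)

    # Grow the removal set until nothing else becomes orphaned: a package
    # goes only if something needed it and all of its dependents are going.
    removed = {target}
    changed = True
    while changed:
        changed = False
        for name in records:
            dependents = needed_by.get(name, set())
            if name not in removed and dependents and dependents <= removed:
                removed.add(name)
                changed = True

    # Sum file sizes of every removed package, skipping symlink entries.
    freed = sum(
        entry.get("size_in_bytes") or 0
        for name in removed
        if name in records
        for entry in records[name].get("paths_data", {}).get("paths", [])
        if entry.get("path_type") != "softlink"
    )
    return removed, freed
```

For example, removed, freed = removal_savings(Path("/opt/conda/conda-meta"), "tensorflow") would list the packages expected to go along with tensorflow and their combined size in bytes.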

Archive.zip

@ocefpaf
Member

ocefpaf commented Jul 30, 2021

> causing the build to explode.

What do you mean by exploding? Are we not able to upload it?

PS: let's remove tensorflow!

@abkfenris
Collaborator Author

I meant size in this case, but I also didn't have permission to upload to Docker Hub.

@abkfenris
Collaborator Author

I'll remove tensorflow in #23

@abkfenris
Collaborator Author

no_link is no longer a column in the dataframe, and dropping a missing column raises a KeyError, so the snippet should now be:

import json
from pathlib import Path
import pandas as pd

pkg_files = Path("/opt/conda/conda-meta/").glob("*.json")

paths = []

for pkg_file in pkg_files:
    with pkg_file.open() as f:
        pkg = json.load(f)
        paths += pkg["paths_data"]["paths"]

df = pd.DataFrame(paths)
df = df.drop(
    [
        "path_type",
        "sha256",
        "sha256_in_prefix",
        # "no_link",
        "file_mode",
        "prefix_placeholder",
    ],
    axis=1,
)
df = df.dropna()
df = df.sort_values("size_in_bytes", ascending=False)
df
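Commenting out columns works until the metadata changes again. pandas offers two more robust options: DataFrame.drop accepts errors="ignore", which skips labels that are not present, or the frame can be narrowed to just the two columns that are actually used. A sketch of the second approach, wrapped in a hypothetical largest_files helper so the conda-meta path is a parameter:

```python
import json
from pathlib import Path

import pandas as pd


def largest_files(conda_meta: Path, n: int = 20) -> pd.DataFrame:
    """Return the n largest files recorded under conda-meta, largest first."""
    paths = []
    for pkg_file in conda_meta.glob("*.json"):
        with pkg_file.open() as f:
            paths += json.load(f)["paths_data"]["paths"]
    df = pd.DataFrame(paths)
    # Select the columns of interest instead of dropping the rest, so fields
    # that appear or disappear (like no_link) can't raise a KeyError.
    df = df[["_path", "size_in_bytes"]].dropna()
    return df.sort_values("size_in_bytes", ascending=False).head(n)
```

Selecting columns assumes _path and size_in_bytes stay present; if either were renamed, this version would fail loudly rather than silently returning the wrong data.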

It's also useful to include mamba in the environment, so that mamba repoquery can be used to look up package dependencies.
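Where mamba isn't available in the image, a rough offline stand-in for a reverse-dependency query (similar in spirit to mamba repoquery whoneeds) can be built from the "depends" list in each conda-meta record. A minimal sketch, assuming each dependency is a match spec like "numpy >=1.20" whose first whitespace-separated token is the package name; who_needs is a hypothetical name:

```python
from __future__ import annotations

import json
from collections import defaultdict
from pathlib import Path


def who_needs(conda_meta: Path) -> dict[str, set[str]]:
    """Map each dependency name to the installed packages that require it."""
    needed_by: dict[str, set[str]] = defaultdict(set)
    for pkg_file in conda_meta.glob("*.json"):
        pkg = json.loads(pkg_file.read_text())
        for spec in pkg.get("depends", []):
            # Assumption: the package name is the spec's first token,
            # e.g. "numpy >=1.20" -> "numpy".
            needed_by[spec.split()[0]].add(pkg["name"])
    return dict(needed_by)
```

For example, who_needs(Path("/opt/conda/conda-meta")).get("tensorflow", set()) would list the installed packages that pull tensorflow in. Unlike repoquery, this only sees what is already installed.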
