Please add some example with real Dataset, it is not working. #17

Open
MrBenzWorld opened this issue Nov 19, 2022 · 11 comments
@MrBenzWorld

Please use a real dataset in the examples (e.g., one available on Kaggle, such as house pricing).

We are having trouble passing the features and target data.
CatBoost gives the best results in Python.
Please fix it in Julia and add some examples with hyperparameter tuning.

Please help,
Thank you

@ericphanson
Collaborator

ericphanson commented Nov 19, 2022

Is there an example or tutorial you like for the python interface to CatBoost? This is just a light wrapper of the python interface so any examples there should translate pretty easily. If you link one you are having trouble translating I could help.

If you are getting errors, please post them as minimal examples with the error message and stack trace. It could be a bug in the wrapper code here.

@MrBenzWorld
Author

I have DataFrame data, PyList is not working.
https://github.com/catboost/tutorials/tree/master/regression

Please add an example with these features (https://github.com/catboost/tutorials/tree/master/regression).

It is hard to define the input feature data and the label data.
Please add one example.

CatBoost is the best; nothing is comparable to it.
Please provide a good wrapper in Julia with an example.

Thank you.

@tylerjthomas9
Collaborator

tylerjthomas9 commented Nov 21, 2022

I have DataFrame data, PyList is not working.

If you want to work with DataFrames.jl/Tables.jl, you have to convert the table to a Python table using the pytable function in PythonCall.jl.

https://cjdoris.github.io/PythonCall.jl/stable/compat/#PythonCall.pytable

Here is an example:

using CatBoost
using DataFrames
using PythonCall

df_train = DataFrame(a=[1,4,30], b=[4,5,40], c=[5,6,50], d=[6,7,60])
train_data = pytable(df_train)
df_eval = DataFrame(a=[2,1], b=[4,4], c=[6,50], d=[8,60])
eval_data = pytable(df_eval)
train_labels = PyList([10, 20, 30])

# Initialize CatBoostRegressor
model = CatBoostRegressor(; iterations=2, learning_rate=1, depth=2)

# Fit model
fit!(model, train_data, train_labels)

# Get predictions
preds = predict(model, eval_data)

Additionally, you should be able to leave the dataframe as a DataFrames.jl object. Here is an example:

using CatBoost
using DataFrames
using PythonCall

train_data = DataFrame(a=[1,4,30], b=[4,5,40], c=[5,6,50], d=[6,7,60])
eval_data = DataFrame(a=[2,1], b=[4,4], c=[6,50], d=[8,60])
train_labels = PyList([10, 20, 30])

# Initialize CatBoostRegressor
model = CatBoostRegressor(; iterations=2, learning_rate=1, depth=2)

# Fit model
fit!(model, train_data, train_labels)

# Get predictions
preds = predict(model, eval_data)
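
When the data arrives as a single DataFrame with the target stored in one of its columns, the same pattern might look like the sketch below. It is an illustration rather than a tested recipe: the column names (sqft, rooms, price) and the values are made up, and it assumes CatBoost.jl and its Python dependency load correctly.

```julia
using CatBoost
using DataFrames
using PythonCall

# Hypothetical toy data; a real dataset would be loaded from a file instead.
df = DataFrame(sqft=[50, 80, 120, 200], rooms=[1, 2, 3, 5], price=[100, 150, 230, 400])

# Features are every column except the target; the target is a single column.
X = select(df, Not(:price))
y = PyList(df.price)

# Initialize and fit, converting the feature table with pytable as above.
model = CatBoostRegressor(; iterations=10, learning_rate=0.5, depth=2)
fit!(model, pytable(X), y)

# New rows must have the same feature columns as the training table.
X_new = DataFrame(sqft=[100], rooms=[3])
preds = predict(model, pytable(X_new))
```

Keeping the target out of the feature table with select(df, Not(:price)) avoids accidentally training on the label.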

@MrBenzWorld
Author

MrBenzWorld commented Nov 23, 2022

Thank you. It is working.

@WilliamZimmerman83

WilliamZimmerman83 commented Jan 17, 2023

When I run this in Julia 1.8.3, the CatBoost example does not seem to work:

image

I can't stop it, and there is no error message. PythonCall seems to be working, judging by the pytable() call in the previous block. I have the most recent version of CatBoost:

image

Any ideas?

@tylerjthomas9
Collaborator

tylerjthomas9 commented Jan 17, 2023

Any ideas?

I cannot replicate this. What platform are you on?

Can you try in a fresh environment with just CatBoost, DataFrames, PythonCall?

Does the test suite pass for you?

@WilliamZimmerman83

WilliamZimmerman83 commented Jan 18, 2023

When following an example in CatBoost:

using Pkg

using CatBoost
using DataFrames
using PythonCall

train_data = DataFrame(a=[1,4,30], b=[4,5,40], c=[5,6,50], d=[6,7,60])
eval_data = DataFrame(a=[2,1], b=[4,4], c=[6,50], d=[8,60])
train_labels = PyList([10, 20, 30])

# Initialize CatBoostRegressor
model = CatBoost.CatBoostRegressor(; iterations=2, learning_rate=1, depth=2)

# Fit model
fit!(model, train_data, train_labels)

# Get predictions
preds = predict(model, eval_data)

When running this I end up with the following error:

ERROR: InitError: Python: ModuleNotFoundError: No module named 'catboost'
Python stacktrace: none
Stacktrace:
  [1] pythrow()
    @ PythonCall ~/.julia/packages/PythonCall/3GRYN/src/err.jl:94
  [2] errcheck
    @ ~/.julia/packages/PythonCall/3GRYN/src/err.jl:10 [inlined]
  [3] pyimport(m::String)
    @ PythonCall ~/.julia/packages/PythonCall/3GRYN/src/concrete/import.jl:11
  [4] __init__()
    @ CatBoost ~/.julia/packages/CatBoost/1k9L5/src/CatBoost.jl:31
  [5] _include_from_serialized(pkg::Base.PkgId, path::String, depmods::Vector{Any})
    @ Base ./loading.jl:831
  [6] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt64)
    @ Base ./loading.jl:1039
  [7] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1315
  [8] _require_prelocked(uuidkey::Base.PkgId)
    @ Base ./loading.jl:1200
  [9] macro expansion
    @ ./loading.jl:1180 [inlined]
 [10] macro expansion
    @ ./lock.jl:223 [inlined]
 [11] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1144
 [12] eval
    @ ./boot.jl:368 [inlined]
 [13] eval
    @ ./Base.jl:65 [inlined]
 [14] repleval(m::Module, code::Expr, #unused#::String)
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.38.2/scripts/packages/VSCodeServer/src/repl.jl:222
 [15] (::VSCodeServer.var"#107#109"{Module, Expr, REPL.LineEditREPL, REPL.LineEdit.Prompt})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.38.2/scripts/packages/VSCodeServer/src/repl.jl:186
 [16] with_logstate(f::Function, logstate::Any)
    @ Base.CoreLogging ./logging.jl:511
 [17] with_logger
    @ ./logging.jl:623 [inlined]
 [18] (::VSCodeServer.var"#106#108"{Module, Expr, REPL.LineEditREPL, REPL.LineEdit.Prompt})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.38.2/scripts/packages/VSCodeServer/src/repl.jl:187
 [19] #invokelatest#2
    @ ./essentials.jl:729 [inlined]
 [20] invokelatest(::Any)
    @ Base ./essentials.jl:726
 [21] macro expansion
    @ ~/.vscode/extensions/julialang.language-julia-1.38.2/scripts/packages/VSCodeServer/src/eval.jl:34 [inlined]
 [22] (::VSCodeServer.var"#61#62")()
    @ VSCodeServer ./task.jl:484
during initialization of module CatBoost

Is "using CatBoost" trying to run the python version of catboost? I do have the python version installed, but I don't see why it would throw this error at me. If I run "using CatBoost" again it gives no error, but when I run the model line:

model = CatBoost.CatBoostRegressor(; iterations=2, learning_rate=1, depth=2)

the terminal crashes and spits out this error:

The terminal process "julia '-i', '--banner=no', '--project=/Users/williamzimmerman/.julia/environments/v1.8', '/Users/williamzimmerman/.vscode/extensions/julialang.language-julia-1.38.2/scripts/terminalserver/terminalserver.jl', '/var/folders/5s/qsvggw9n407_7r13v8j1xcsm0000gp/T/vsc-jl-repl-2763b893-1eb8-4b48-acdc-cee625375ebc', '/var/folders/5s/qsvggw9n407_7r13v8j1xcsm0000gp/T/vsc-jl-cr-38d35a79-e9bd-4528-8268-cc393f0d316a', 'USE_REVISE=true', 'USE_PLOTPANE=true', 'USE_PROGRESS=true', 'ENABLE_SHELL_INTEGRATION=true', 'DEBUG_MODE=false'" terminated with exit code: 139.

Running the CatBoost tests in Pkg fails, I believe due to:

error    libmamba Selected channel specific (or force-reinstall) job, but package is not available from channel. Solve job will fail.

This process installs 15 packages into a Miniconda environment but can't get catboost from libmamba. I have Anaconda installed with catboost - why does this check here instead of my local? Is there a way to fix this?

  Package              Version  Build               Channel                     Size
──────────────────────────────────────────────────────────────────────────────────────
  Install:
──────────────────────────────────────────────────────────────────────────────────────

  + bzip2                1.0.8  h3422bc3_4          conda-forge/osx-arm64     Cached
  + ca-certificates  2022.12.7  h4653dfc_0          conda-forge/osx-arm64     Cached
  + libffi               3.4.2  h3422bc3_5          conda-forge/osx-arm64     Cached
  + libsqlite           3.40.0  h76d750c_0          conda-forge/osx-arm64     Cached
  + libzlib             1.2.13  h03a7124_4          conda-forge/osx-arm64     Cached
  + ncurses                6.3  h07bb92c_1          conda-forge/osx-arm64     Cached
  + openssl              3.0.7  h03a7124_1          conda-forge/osx-arm64     Cached
  + pip                 22.3.1  pyhd8ed1ab_0        conda-forge/noarch        Cached
  + python              3.11.0  h3ba56d0_1_cpython  conda-forge/osx-arm64     Cached
  + readline             8.1.2  h46ed386_0          conda-forge/osx-arm64     Cached
  + setuptools          66.0.0  pyhd8ed1ab_0        conda-forge/noarch        Cached
  + tk                  8.6.12  he1e0b03_0          conda-forge/osx-arm64     Cached
  + tzdata               2022g  h191b570_0          conda-forge/noarch        Cached
  + wheel               0.38.4  pyhd8ed1ab_0        conda-forge/noarch        Cached
  + xz                   5.2.6  h57fd34a_0          conda-forge/osx-arm64     Cached

  Summary:

  Install: 15 packages

  Total download: 0 B

@tylerjthomas9
Collaborator

tylerjthomas9 commented Jan 18, 2023

why does this check here instead of my local? Is there a way to fix this?

PythonCall.jl automatically creates a python environment. Here is a guide on using your own conda environments with PythonCall.jl: https://cjdoris.github.io/PythonCall.jl/stable/pythoncall/#If-you-already-have-a-Conda-environment
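
As a rough sketch of what that guide describes, the environment variables below tell CondaPkg.jl not to manage its own environment and point PythonCall.jl at an existing interpreter. The interpreter path is a placeholder, not a real location, and the variables need to be set before PythonCall or CatBoost is loaded in a fresh Julia session.

```julia
# Assumption: catboost is already installed in the Python environment used here.
# Disable CondaPkg's automatically created environment.
ENV["JULIA_CONDAPKG_BACKEND"] = "Null"

# Placeholder path: replace it with the python executable of your own environment.
ENV["JULIA_PYTHONCALL_EXE"] = "/path/to/python"

# Load the package only after the variables are set.
using CatBoost
```

With this setup PythonCall.jl no longer installs anything itself, so any missing Python package has to be installed into that environment by hand.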

@ CatBoost ~/.julia/packages/CatBoost/1k9L5/src/CatBoost.jl:31
Is "using CatBoost" trying to run the python version of catboost?

Yes, it is. From your error output, it appears that PythonCall.jl is failing when importing catboost. Can you try manually installing catboost via CondaPkg.jl? I am unable to replicate the issue.

Here is the CondaPkg command to install catboost:

using CondaPkg
CondaPkg.add("catboost", channel="conda-forge")

Is "using CatBoost" trying to run the python version of catboost? I do have the python version installed, but I don't see why it would throw this error at me. If I run "using CatBoost" again it gives no error, but when I run the model line:

The error won't appear the second time, because the Python imports are run when the CatBoost module is initialized, and that happens only on the first load. https://github.com/beacon-biosignals/CatBoost.jl/blob/9cab2d6ebce3cf205923a26140b4bda2443907b3/src/CatBoost.jl#L30-L33
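
The load-once behaviour can be illustrated with a small stand-alone module. This is a toy sketch, not the CatBoost.jl source: InitDemo, handle, and init_calls are hypothetical names, and the string assignment merely stands in for the Python import done in the real `__init__`.

```julia
module InitDemo

# Stand-in for the handle to the imported Python module.
const handle = Ref{Union{Nothing,String}}(nothing)

# Counts how many times the initializer has run.
const init_calls = Ref(0)

# Julia calls __init__ once, when the module is first loaded.
function __init__()
    init_calls[] += 1
    handle[] = "loaded"  # the real package would import catboost here
end

end # module

using .InitDemo
using .InitDemo  # a repeated `using` does not run __init__ again

println("init calls: ", InitDemo.init_calls[])
```

If the initializer fails on the first load, it is not retried by a later `using`, so the handle is never filled in; restarting Julia after fixing the Python environment is the safer route.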

@WilliamZimmerman83

Thanks @tylerjthomas9, this was a huge help! It now works. Maybe we should add some of these parts to the landing page, covering how to get this up and running and how to use native dataframes. I know there are many like myself and @MrBenzWorld who really want to use Julia, but the documentation is either outdated or missing altogether.

Just my $0.02

@tylerjthomas9
Collaborator

how to use native dataframes

What do you mean by native dataframes? Pandas or DataFrames.jl?

I will look into adding more documentation, especially on getting started and troubleshooting PythonCall.jl issues. If you come up with other things, feel free to open another issue where we can track potential documentation deficiencies in one place.

@WilliamZimmerman83

how to use native dataframes

What do you mean by native dataframes? Pandas or DataFrames.jl?

I will look into adding more documentation, especially on getting started and troubleshooting PythonCall.jl issues. If you come up with other things, feel free to open another issue where we can track potential documentation deficiencies in one place.

Thanks for the added documentation, and sorry for not being clear: I meant DataFrames.jl. It would be great to see how this package works from start to finish, including getting plots up and running (I have, after many hours, gotten everything to work except plots = true, despite having all of the necessary libraries installed).
