LightAutoML (LAMA) allows you to create machine learning models using just a few lines of code, or to build your own custom pipeline from ready-made blocks. It supports tabular, time series, image and text data.
Authors: Alexander Ryzhkov, Anton Vakhrushev, Dmitry Simakov, Rinchin Damdinov, Vasilii Bunakov, Alexander Kirilin, Pavel Shvets.
There are two ways to solve machine learning problems using LightAutoML:
- Ready-to-use preset:
from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

# Binary classification task scored with ROC AUC
automl = TabularAutoML(task=Task(name='binary', metric='auc'))
# Fit on the training data and get out-of-fold predictions
oof_preds = automl.fit_predict(train_df, roles={'target': 'my_target', 'drop': ['column_to_drop']}).data
# Predict on the test data
test_preds = automl.predict(test_df).data
- As a framework:
The LightAutoML framework has a lot of ready-to-use parts and extensive customization options. To learn more, check out the resources section.
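The `roles` argument in the preset call above maps role names to column names of the training data. Below is a minimal sketch of a pre-flight check that reports role columns missing from a dataset. The helper `missing_role_columns` and the sample column names are hypothetical and not part of LightAutoML. The sketch also assumes that each role value is either a single column name or a list of names, as in the example above.

```python
from typing import Dict, Iterable, List, Union

# Assumption: a role value is one column name or a list of column names.
RoleValue = Union[str, List[str]]


def missing_role_columns(
    columns: Iterable[str], roles: Dict[str, RoleValue]
) -> Dict[str, List[str]]:
    """Return, per role, the referenced column names that are absent from `columns`."""
    available = set(columns)
    missing: Dict[str, List[str]] = {}
    for role, value in roles.items():
        # Normalize a single name to a one-element list
        names = [value] if isinstance(value, str) else list(value)
        absent = [name for name in names if name not in available]
        if absent:
            missing[role] = absent
    return missing


roles = {'target': 'my_target', 'drop': ['column_to_drop']}
train_columns = ['my_target', 'feature_a', 'feature_b']  # hypothetical column names
print(missing_role_columns(train_columns, roles))
# prints {'drop': ['column_to_drop']}
```

With a pandas DataFrame, `train_df.columns` could be passed as `columns`. An empty result means every column named in `roles` is present.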
- Tabular Playground Series April 2021 competition solution
- Titanic competition solution (80% accuracy)
- Titanic 12-code-lines competition solution (78% accuracy)
- House prices competition solution
- Natural Language Processing with Disaster Tweets solution
- Tabular Playground Series March 2021 competition solution
- Tabular Playground Series February 2021 competition solution
- Interpretable WhiteBox solution
- Custom ML pipeline elements inside existing ones
- Tabular Playground Series November 2022 competition solution with Neural Networks
Google Colab tutorials and other examples:
- Tutorial_1_basics.ipynb - get started with LightAutoML on tabular data.
- Tutorial_2_WhiteBox_AutoWoE.ipynb - creating interpretable models.
- Tutorial_3_sql_data_source.ipynb - shows how to use LightAutoML presets (both standalone and time-utilized variants) for solving ML tasks on tabular data from an SQL database instead of CSV.
- Tutorial_4_NLP_Interpretation.ipynb - example of using the TabularNLPAutoML preset and LimeTextExplainer.
- Tutorial_5_uplift.ipynb - shows how to use LightAutoML for an uplift modeling task.
- Tutorial_6_custom_pipeline.ipynb - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization, etc.
- Tutorial_7_ICE_and_PDP_interpretation.ipynb - shows how to obtain local and global interpretation of model results using ICE and PDP approaches.
- Tutorial_8_CV_preset.ipynb - example of using the TabularCVAutoML preset in a CV multi-class classification task.
- Tutorial_9_neural_networks.ipynb - example of using the Tabular preset with neural networks.
- Tutorial_10_relational_data_with_star_scheme.ipynb - example of using the Tabular preset with relational data (star scheme).
- Tutorial_11_time_series.ipynb - example of using the Tabular preset with time series data.
Note 1: in production there is no need to use the profiler (which increases run time and memory consumption), so please do not turn it on. It is off by default.
Note 2: to take a look at this report after the run, please comment out the last line of the demo, which contains the report deletion command.
- LightAutoML crash courses:
- Video guides:
  - (Russian) LightAutoML webinar for Sberloga community (Alexander Ryzhkov, Dmitry Simakov)
  - (Russian) LightAutoML hands-on tutorial in Kaggle Kernels (Alexander Ryzhkov)
  - (English) Automated Machine Learning with LightAutoML: theory and practice (Alexander Ryzhkov)
  - (English) LightAutoML framework general overview, benchmarks and advantages for business (Alexander Ryzhkov)
  - (English) LightAutoML practical guide - ML pipeline presets overview (Dmitry Simakov)
- Papers:
  - Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin "LightAutoML: AutoML Solution for a Large Financial Services Ecosystem". arXiv:2109.01528, 2021.
- Articles about LightAutoML:
To install the LAMA framework on your machine from PyPI:
# Base functionality:
pip install -U lightautoml
# For partial installation use corresponding option
# Extra dependencies: [nlp, cv, report] or use 'all' to install all dependencies
pip install -U lightautoml[nlp]
# Or extra dependencies with specific version
pip install 'lightautoml[all]==0.4.0'
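After installing, it can be useful to confirm which version actually ended up in the environment, especially when pinning as in the last command. A minimal sketch using only the Python standard library (3.8+) follows. The helper name `installed_version` is hypothetical and not part of LightAutoML.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("lightautoml")
if found is None:
    print("lightautoml is not installed in this environment")
else:
    print(f"lightautoml {found}")
```

The check looks up package metadata only, so it does not import LightAutoML or any of its dependencies.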
Additionally, run the following commands to enable PDF report generation:
# MacOS
brew install cairo pango gdk-pixbuf libffi
# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info
# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2
# Windows
# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows
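Once the system packages are in place, a quick import test can show whether the PDF backend loads. The sketch below assumes that PDF reports rely on WeasyPrint, as the Windows link above suggests, and that a failure to load the native libraries surfaces as `ImportError` or `OSError`. The helper `can_import` is hypothetical.

```python
import importlib


def can_import(name: str) -> bool:
    """Try to import a module; return False if it is missing or fails to load."""
    try:
        importlib.import_module(name)
    except (ImportError, OSError):
        # ImportError: the module is not installed.
        # OSError: assumed failure mode when native libraries cannot be loaded.
        return False
    return True


print("weasyprint importable:", can_import("weasyprint"))
```

Unlike a metadata lookup, this actually imports the module, so it also exercises the loading of the native libraries installed by the commands above.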
Full GPU and Spark pipelines for LightAutoML are currently available for developer testing (still in progress). The code and tutorials for:
- GPU pipeline is available here
- Spark pipeline is available here
If you are interested in contributing to LightAutoML, please read the Contributing Guide to get started.
- Seek prompt advice in the Telegram group.
- Open bug reports and feature requests on GitHub issues.
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.