This repository documents experiments with the scikit-multilearn package for multi-label classification.
The repository is organised around the following notebooks:
- 101-data-exploratiion.ipynb: Explores the dataset [PubMed Multi Label Text Classification Dataset.csv].
- 102_data_preprocessing.ipynb: Preprocesses the data to prepare it for modelling. The workflow can be run from this notebook onwards.
- 103_data_vectorisation_and_modelling.ipynb: Takes the preprocessed data, vectorises the text and builds the multi-label models.
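For readers new to the approach, the following is a minimal sketch of the vectorise-then-classify step, assuming scikit-learn and NumPy are installed. It is not the code from the notebooks. It uses a hypothetical four-document toy corpus and three hypothetical labels instead of the PubMed data, and applies binary relevance (one independent binary classifier per label) through scikit-learn's `OneVsRestClassifier`. scikit-multilearn provides the same strategy as `skmultilearn.problem_transform.BinaryRelevance`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Hypothetical toy corpus standing in for abstracts from the PubMed CSV
texts = [
    "tumour growth in breast cancer patients",
    "gene expression in tumour cells",
    "heart disease risk in elderly patients",
    "cardiac gene mutations and heart failure",
]

# Binary label indicator matrix, one column per label
# (hypothetical labels: oncology, genetics, cardiology)
Y = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
])

# Vectorise the text into a sparse TF-IDF matrix (documents x terms)
vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(texts)

# Binary relevance: fit one logistic regression per label column
model = OneVsRestClassifier(LogisticRegression())
model.fit(X, Y)

# Predictions come back as a 0/1 matrix with one column per label
predictions = model.predict(X)
print(predictions.shape)  # (4, 3)
```

Binary relevance treats each label independently and so ignores correlations between labels; scikit-multilearn's `ClassifierChain` and `LabelPowerset` are alternatives that take those correlations into account.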
Other files of interest are:
- modelling_notes_and_caveats.md: A Markdown document covering the modelling features and caveats encountered in this project.
- data: A folder containing the raw data [PubMed Multi Label Text Classification Dataset.csv], the features and labels, and the vectorised data used for modelling.