UNF (Universal NLP Framework) is built on PyTorch and torchtext. Its design philosophy is:
- Modularity: on the one hand, it is easy to quickly run common NLP tasks; on the other, it is convenient for secondary development and research, such as implementing new models or techniques.
- Efficiency: supports distributed training and half-precision (FP16) training for fast model training, although the current support is still fairly basic.
- Comprehensiveness: supports tracing PyTorch dynamic graphs into static graphs, provides C++ serving, and includes a web server for debugging.
Currently, UNF supports text classification and sequence labeling tasks. Implemented models come from the following papers:
- Convolutional Neural Networks for Sentence Classification 2014
- Bag of Tricks for Efficient Text Classification 2016
- Deep Pyramid Convolutional Neural Networks for Text Categorization 2017, ACL
- Hierarchical Attention Networks for Document Classification 2016, NAACL
- A Structured Self-Attentive Sentence Embedding 2017, ICLR
- Joint Embedding of Words and Labels for Text Classification 2018, ACL
- Neural Architectures for Named Entity Recognition 2016, NAACL
- Semi-supervised Multitask Learning for Sequence Labeling 2017, ACL
| Module name | Module function |
|---|---|
| UNF.data | Loads data from disk into memory, including batching, padding, and numericalization |
| UNF.module | Neural network layers (encoders, decoders, embeddings) for use by the models |
| UNF.model | Neural network architectures (DPCNN, self-attention, LSTM-CRF, etc.) and Python predictors for those models |
| UNF.training | Model training, including early stopping, model saving and reloading, and metric visualization through TensorBoard |
| UNF.tracing | Traces the PyTorch dynamic graph into a static graph and provides C++ serving |
| UNF.web_server | Web server debugging tool |
Requirements: Python 3

pip3 install -r requirement.txt
#quick start
python3 train_flow.py
Only five lines of core code are needed:
# data loader: batching, padding, numericalization
data_loader = DataLoader(data_loader_conf)
train_iter, dev_iter, test_iter = data_loader.generate_dataset()

# model loader: builds the model from config and the data fields
model, model_conf = ModelLoader.from_params(model_conf, data_loader.fields)

# learner loader: wraps the model and iterators into a training loop
learner = LearnerLoader.from_params(model, train_iter, dev_iter, learner_conf, test_iter=test_iter, fields=data_loader.fields, model_conf=model_conf)

# learning
learner.learn()
"use_fp16": False,
"multi_gpu": False
#quick start
python3 score_flow.py
#core code
from models.predictor import Predictor

# load a trained model for offline scoring
predictor = Predictor(model_path, device, model_type)
logits = predictor.predict(input_text)
# example output: (0.18, -0.67)
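The returned logits are unnormalized per-class scores. If you need probabilities instead, you can normalize them with a softmax; this step is plain PyTorch, not a UNF API:

import torch
import torch.nn.functional as F

# logits as returned by predictor.predict(), e.g. (0.18, -0.67)
logits = torch.tensor([0.18, -0.67])
probs = F.softmax(logits, dim=-1)  # normalize logits into class probabilities
label = int(torch.argmax(probs))   # index of the predicted class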
#quick start
python3 trace.py
#core code
import torch

# instantiate the model class by name and load the trained weights
net = globals()[model_cls](**config.__dict__)
net.load_state_dict_trace(torch.load("%s/best.th" % model_path))
net.eval()

# trace the dynamic graph with mock input and save the static graph
mock_input = net.mock_input_data()
tr = torch.jit.trace(net, mock_input)
tr.save("trace/%s" % save_path)
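Before moving to the C++ side, you can sanity-check the traced graph by reloading it in Python with the standard torch.jit API; on the same mock input its output should match the original module:

# reload the traced static graph and compare against the dynamic module
loaded = torch.jit.load("trace/%s" % save_path)
print(loaded(mock_input))  # unpack with *mock_input if it is a tuple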
To build and run the C++ server:

- Install CMake
- Download libtorch and unzip it into the trace folder
cd trace
cmake -DCMAKE_PREFIX_PATH=libtorch .
make
./predict trace.pt predict_vocab.txt
output: 2.2128 -2.3287
To start the debugging web server:

cd web_server
python run.py
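Once the server is running, you can query it for debugging. A minimal sketch using the requests library; the port and endpoint below are assumptions, so check web_server/run.py for the actual route:

import requests

# hypothetical URL and parameters; adjust to match run.py
resp = requests.get("http://127.0.0.1:5000/predict", params={"text": "example input"})
print(resp.text)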