using GLiNER for entities and relationships with explicit data? #174
Unanswered
ElJefeDSecurIT asked this question in Q&A
Replies: 0 comments
Hi! I'm something of an old-timer hacker but a first-timer at ML training, and I'm looking to build a knowledge graph for the security domain. I've just come across GLiNER and I'm interested in trying my hand at either fine-tuning it or perhaps building my own model. I'm working on an experiment that takes the ATT&CK framework relationships, which are all in JSON, and I've cooked up 29k annotated sentences from them, like so:
{"sentence": "Lokibot uses Visual Basic.", "data": [["Lokibot", "malware", 0, 7], ["Visual Basic", "attack-pattern", 13, 25]]}
{"sentence": "Conti uses SMB/Windows Admin Shares.", "data": [["Conti", "malware", 0, 5], ["SMB/Windows Admin Shares", "attack-pattern", 11, 35]]}
{"sentence": "FunnyDream uses ccf32.", "data": [["FunnyDream", "campaign", 0, 10], ["ccf32", "malware", 16, 21]]}
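For reference, here is a minimal sketch of how records like these could be sanity-checked and converted from character offsets (end exclusive, as in the samples above) to token-level spans. The output keys `tokenized_text` and `ner`, the inclusive token indices, and the simple regex tokenizer are all assumptions about what a GLiNER training script expects, so check them against the GLiNER repository before relying on this. The helper name `to_token_spans` is hypothetical.

```python
import json
import re

# Assumption: a simple word/punctuation tokenizer; the one GLiNER uses may differ.
TOKEN_RE = re.compile(r"\w+|\S")


def to_token_spans(record):
    """Convert one record with character offsets (end exclusive) to token-level spans."""
    sentence = record["sentence"]
    tokens = [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(sentence)]
    spans = []
    for text, label, start, end in record["data"]:
        # Sanity check: the offsets must reproduce the annotated surface text.
        if sentence[start:end] != text:
            raise ValueError(f"offset mismatch for {text!r} in {sentence!r}")
        # Indices of tokens that lie fully inside the character span.
        inside = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        if not inside or tokens[inside[0]][1] != start or tokens[inside[-1]][2] != end:
            raise ValueError(f"span for {text!r} does not align with token boundaries")
        # Assumed target layout: [first_token, last_token, label], indices inclusive.
        spans.append([inside[0], inside[-1], label])
    return {"tokenized_text": [t for t, _, _ in tokens], "ner": spans}


line = (
    '{"sentence": "Conti uses SMB/Windows Admin Shares.", '
    '"data": [["Conti", "malware", 0, 5], '
    '["SMB/Windows Admin Shares", "attack-pattern", 11, 35]]}'
)
example = to_token_spans(json.loads(line))
print(example)
```

Running the check over all 29k lines before training would surface any record whose offsets do not match its surface text.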
Now, I already have a list of labels that I think align with my labeled terms, and I also want to keep common-use labels (person, location, etc.). Lastly, I would love for it to be a super-accurate extraction model post-training. But I just need some validation:
Am I on the right path here? What am I missing? Should I split my dataset 70/30 into training and test data? Should I train a clean model or fine-tune? If anyone could offer a few pointers on which way to go, that would be most helpful. 🙏🏽
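To make the split question concrete, a 70/30 split of the annotated records could be sketched as below. This only illustrates the split being asked about and does not recommend that ratio. The function name `split_records`, the fixed seed, and the placeholder records are hypothetical.

```python
import random


def split_records(records, test_fraction=0.3, seed=0):
    """Shuffle records reproducibly and return (train, test) lists."""
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records standing in for the real annotated sentences.
records = [{"sentence": f"placeholder {i}", "data": []} for i in range(10)]
train, test = split_records(records)
print(len(train), len(test))
```

Fixing the seed keeps the test set stable across runs, so scores from different training attempts stay comparable.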