Need guidance on where to go next #10850
Hello, I'm currently working on a research project that involves reading news articles about criminal organizations. I'm starting with Chinese organized crime, i.e., Triads. On my first pass, I ran all of the text from my articles through spaCy to see what would happen. It did well, but it wasn't perfect. For instance, spaCy labeled the alias of one of the Triad's leaders as a "work of art", and when the context changed, the same alias was classified as an organization. For example:
spaCy also couldn't correctly label the organization names. For example, the 14K Triad was never labeled, or if anything, the 14 was labeled as a "Cardinal". Here you can see my stack question on this and how I eventually figured it out with some gentle nudging by polm. As for my code, right now I'm sitting on:
and this is an example of my "patterns".
The total number of patterns I have created is 744, covering various criminal groups and variations on their names, e.g., sometimes they are referred to as the 14K Mafia instead of the 14K Triad. This brings me to my reason for opening this thread: where should I go next? I could do the same for Broken Tooth and create a new pattern for this alias. However, I think if I keep doing this, I'm doing spaCy wrong lol. I feel like I'm using spaCy as a fancy regex and highlighter. Is there something else I should do instead? Eventually, I want to get to a point where I can take an article and accurately associate people with organizations/events/businesses. Thank you for any help you can provide.
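For readers following along, here is a minimal sketch of the kind of rule-based setup described above. It is not the original code from this post. It assumes spaCy v3 and its built-in `entity_ruler` component, and the name lists, labels, and `id` values are hypothetical. Phrase (string) patterns are used so that the patterns and the text go through the same tokenizer, and the shared `id` maps name variants such as "14K Triad" and "14K Mafia" to one key.

```python
import spacy

# Blank English pipeline: tokenizer only, no pretrained model required.
nlp = spacy.blank("en")

# Match phrase patterns on lowercased token text (case-insensitive matching).
ruler = nlp.add_pipe("entity_ruler", config={"phrase_matcher_attr": "LOWER"})

# Hypothetical lists; the real project has 744 hand-written patterns.
group_names = ["14K"]
suffixes = ["Triad", "Mafia"]

# Generate one pattern per name variant instead of writing each by hand.
patterns = [
    {"label": "ORG", "pattern": f"{name} {suffix}", "id": name}
    for name in group_names
    for suffix in suffixes
]
# Aliases of people can be handled the same way.
patterns.append({"label": "PERSON", "pattern": "Broken Tooth", "id": "broken-tooth"})
ruler.add_patterns(patterns)

doc = nlp("The 14K Mafia was linked to Broken Tooth in the report.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
ent_ids = [ent.ent_id_ for ent in doc.ents]
print(ents, ent_ids)
```

This still behaves like a dictionary lookup: it only finds names that are already in the lists, which is the limitation the question is about.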
Replies: 1 comment
It sounds like maybe you are ready to train an NER model. I would suggest you go through the spaCy course. Since it sounds like you're new to NLP, you might also want to look at the Jurafsky and Martin book and read through any chapters that seem relevant to what you're working on. In this case I guess that would be chapters 8 (for NER) and 17 (information extraction) to start with.
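As a rough illustration of a first step toward training an NER model, the sketch below packs annotated text into spaCy's binary training format (`DocBin`). It assumes spaCy v3. The example sentence, the character offsets, and the output file name are hypothetical, and real training data would need many hand-checked examples.

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")

# Hypothetical annotations: (text, [(start_char, end_char, label), ...])
TRAIN_DATA = [
    (
        "Reports mention Broken Tooth and the 14K Triad together.",
        [(16, 28, "PERSON"), (37, 46, "ORG")],
    ),
]

doc_bin = DocBin()
for text, annotations in TRAIN_DATA:
    doc = nlp.make_doc(text)
    spans = []
    for start, end, label in annotations:
        # char_span returns None if the offsets do not align with token boundaries.
        span = doc.char_span(start, end, label=label)
        if span is not None:
            spans.append(span)
    doc.ents = spans
    doc_bin.add(doc)

# Round-trip in memory to check that the annotations survived serialization.
restored = list(doc_bin.get_docs(nlp.vocab))
restored_ents = [(ent.text, ent.label_) for ent in restored[0].ents]
print(restored_ents)

# To use the data for training, write it to disk (hypothetical path):
# doc_bin.to_disk("./train.spacy")
```

With a training file and a separate dev file saved this way, a config can be generated with `python -m spacy init config config.cfg --lang en --pipeline ner`, and training started with `python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy`.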