Puzzled by results processing a Project Gutenberg text #12381

mister-elliott · 2023-03-07T18:45:35Z

I downloaded the Anabasis by Xenophon (pg1170.txt) from Project Gutenberg.
I hoped to use spaCy to extract person names and place names for a database project. I am very new to this.
I am attaching Python code (ugly, due to hacking). I do not understand why, in the opening sentence of the sample text, Parysatis is not classified as a person. Most other names and places are missed as well.

sample text:
Darius and Parysatis had two sons: the elder was named Artaxerxes, and the younger Cyrus. Now, as Darius lay sick and felt that the end of life drew near, he wished both his sons to be with him. The elder, as it chanced, was already there, but Cyrus he must needs send for from the province over which he had made him satrap, having appointed him general moreover of all the forces that muster in the plain of the Castolus. Thus Cyrus went up, taking with him Tissaphernes as his friend, and accompanied also by a body of Hellenes, three hundred hea armed men, under the command of Xenias the Parrhasian . Parrhasia, a district and town in the south-west of Arcadia. Now when Darius was dead, and Artaxerxes was established in the kingdom, Tissaphernes brought slanderous accusations against Cyrus before his brother, the king, of harbouring designs against him. And Artaxerxes, listening to the words of Tissaphernes, laid hands upon Cyrus, desiring to put him to death; but his mother made intercession for him, and sent him back again in safety to his province. He then, having so escaped through peril and dishonour, fell to considering, not only how he might avoid ever again being in his brother's power, but how, if possible, he might become king in his stead. Parysatis, his mother, was his first resource; for she had more lo for Cyrus than for Artaxerxes upon his throne. Moreover Cyrus's behaviour towards all who came to him from the king's court was such that, when he sent them away again, they were better friends to himself than to the king his brother. Nor did he neglect the barbarians in his own service; but trained them, at once to be capable as warriors and devoted adherents of himself. Lastly, he began collecting his Hellenic armament, but with the utmost secrecy, so that he might take the king as far as might be at unawares. 
The manner in which he contrived the levying of the troops was as follows: First, he sent orders to the commandants of garrisons in the cities so held by him, bidding them to get together as large a body of picked Peloponnesian troops as they severally were able, on the plea that Tissaphernes was plotting against their cities; and truly these cities of Ionia had originally belonged to Tissaphernes, being given to him by the king; but at this time, with the exception of Miletus, they had all revolted to Cyrus. In Miletus, Tissaphernes, having become aware of similar designs, had forestalled the conspirators by putting some to death and banishing the remainder. Cyrus, on his side, welcomed these fugitives, and having collected an army, laid siege to Miletus by sea and land, endeavouring to reinstate the exiles; and this ga him another prete for collecting an armament. At the same time he sent to the king, and claimed, as being the king's brother, that these cities should be given to himself rather than that Tissaphernes should continue to govern them; and in furtherance of this end, the queen, his mother, co-operated with him, so that the king not only failed to see the design against himself, but concluded that Cyrus was spending his money on armaments in order to make war on Tissaphernes. Nor did it pain him greatly to see the two at war together, and the less so because Cyrus was careful to remit the tribute due to the king from the cities which belonged to Tissaphernes. A third army was being collected for him in the Chersonese, over against Abydos, the origin of which was as follows: There was a Lacedaemonian exile, named Clearchus, with whom Cyrus had become associated. Cyrus admired the man, and made him a present of ten thousand darics . 
Clearchus took the gold, and with the money raised an army, and using the Chersonese as his base of operations, set to work to fight the Thracians north of the Hellespont, in the interests of the Hellenes, and with such happy result that the Hellespontine cities, of their own accord, were eager to contribute funds for the support of his troops. In this way, again, an armament was being secretly maintained for Cyrus. A Persian gold coin = .grains of gold. Then there was the Thessalian Aristippus, Cyrus's friend , who, under pressure of the rival political party at home, had come to Cyrus and asked him for pay for two thousand mercenaries, to be continued for three months, which would enable him, he said, to gain the upper hand of his antagonists. Cyrus replied by presenting him with six months' pay for four thousand mercenaries only stipulating that Aristippus should not come to terms with his antagonists without final consultation with himself. In this way he secured to himself the secret maintenance of a fourth armament. Lit. "guest-friend." Aristippus was, as we learn from the "Meno" of Plato, a nat of Larisa, of the family of the Aleuadae, and a pupil of Gorgias. He was also a lover of Menon, whom he appears to ha sent on this expedition instead of himself. Further, he bade Proxenus, a Boeotian, who was another friend, get together as many men as possible, and join him in an expedition which he meditated against the Pisidians , who were causing annoyance to his territory. Similarly two other friends, Sophaenetus the Stymphalian , and Socrates the Achaean, had orders to get together as many men as possible and come to him, since he was on the point of opening a campaign, along with Milesian exiles, against Tissaphernes. These orders were duly carried out by the officers in question. Lit. "into the country of the Pisidians. Of Stymphalus in Arcadia.

----------- outputs:
Persons ------------------
TEXT:[Cyrus ] TYPE:[PERSON ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Darius ] TYPE:[PERSON ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[hea ] TYPE:[PERSON ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Parrhasia ] TYPE:[PERSON ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Lacedaemonian ] TYPE:[PERSON ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Clearchus ] TYPE:[PERSON ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Gorgias ] TYPE:[PERSON ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Menon ] TYPE:[PERSON ] BK:[ Anabasis] AUTH:[ Xenophon]
Places ------------------
TEXT:[Arcadia ] TYPE:[GPE ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Ionia ] TYPE:[GPE ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Abydos ] TYPE:[GPE ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Hellespont ] TYPE:[GPE ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Aleuadae ] TYPE:[LOC ] BK:[ Anabasis] AUTH:[ Xenophon]
TEXT:[Arcadia ] TYPE:[GPE ] BK:[ Anabasis] AUTH:[ Xenophon]

cleaner.zip

Answered by thomashacker

thomashacker · 2023-03-09T10:01:11Z

Hey, thanks for your post!
Can you tell us which spaCy model you are using? Is it a pretrained pipeline, or did you train one from scratch?
You could also try other pretrained models and see whether the results improve.

In general, depending on the data, ML models are almost never 100% accurate and will sometimes misclassify.
We have a short thread about this issue here, which covers the topic in more detail.

0 replies

mister-elliott · 2023-03-09T17:54:25Z

I tried all sizes of en_core_web_xx. The results changed a bit with each, but not much for the better. The result posted is with this model:
nlp = spacy.load("en_core_web_lg")

The hacked Python code is in the zip I attached.
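The attached script is not reproduced in the thread. As a rough illustration of the kind of extraction described, here is a minimal sketch that filters `doc.ents` by label. The helper names (`unique_entities`, `report`, `main`) and the file name `sample.txt` are assumptions made for this example, not the contents of cleaner.zip; `PERSON`, `GPE` and `LOC` are the labels that appear in the posted output.

```python
PERSON_LABELS = {"PERSON"}
PLACE_LABELS = {"GPE", "LOC"}


def unique_entities(doc, labels):
    """Return (text, label) pairs for entities whose label is in `labels`,
    de-duplicated in order of first appearance."""
    pairs = ((ent.text, ent.label_) for ent in doc.ents if ent.label_ in labels)
    return list(dict.fromkeys(pairs))


def report(doc, book="Anabasis", author="Xenophon"):
    """Print persons and places in a layout similar to the posted output."""
    for heading, labels in (("Persons", PERSON_LABELS), ("Places", PLACE_LABELS)):
        print(f"{heading} ------------------")
        for text, label in unique_entities(doc, labels):
            print(f"TEXT:[{text}] TYPE:[{label}] BK:[{book}] AUTH:[{author}]")


def main(path="sample.txt"):  # hypothetical input file
    # Imported here so the helpers above can be used without spaCy installed.
    import spacy

    nlp = spacy.load("en_core_web_lg")  # the model named in this post
    with open(path, encoding="utf-8") as fh:
        doc = nlp(fh.read())
    report(doc)
```

Calling `main("sample.txt")` requires spaCy and the `en_core_web_lg` model to be installed. One thing worth checking before blaming the model: the sample text above contains truncated words such as "hea armed men", and "hea" shows up as a PERSON in the output, so the cleaning step may be damaging the input.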

I mostly used snippets of code I found online in website posts purporting to teach spaCy and Python. The snippets were eerily similar across several different websites, as if the authors were each borrowing extensively from other websites.

I am new to this. I don't know what training a model would entail, time-wise. I was hoping spaCy would give a good enough result out of the box, but alas, no. I have no idea how much of a time commitment it would be to train a language model on Project Gutenberg English texts (that is, both text by native English writers and text translated from Russian, Greek, or French).
I did get reasonable results picking out proper nouns from the tokens, but using ents to find and classify names of persons and places yielded poor results.
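For readers comparing the two approaches, a minimal sketch of the token-based route might look like the following. It assumes "proper nouns from tokens" means filtering on the part-of-speech tag `PROPN`; the function names are hypothetical. The second helper lists proper nouns that the entity recognizer left untagged, which is one way to see what `doc.ents` is missing.

```python
def proper_nouns(doc):
    """Distinct proper-noun token texts, in order of first appearance."""
    return list(dict.fromkeys(tok.text for tok in doc if tok.pos_ == "PROPN"))


def untagged_proper_nouns(doc):
    """Proper nouns that are not part of any named entity.

    Token.ent_type_ is an empty string when the token is outside every entity.
    """
    return list(
        dict.fromkeys(
            tok.text for tok in doc if tok.pos_ == "PROPN" and not tok.ent_type_
        )
    )
```

With a real pipeline, `doc` would come from something like `spacy.load("en_core_web_lg")(text)`. Note that a `PROPN` tag only says a token looks like a name; it does not say whether it is a person or a place.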

1 reply

thomashacker Mar 13, 2023

The main issue is that the pretrained models were trained on a different corpus, which is why they seem to have trouble analyzing the Anabasis text, which is a very specific domain.

As for training, you can read all about the process at https://spacy.io/usage/training and decide whether it's worth your time.
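To give a rough idea of what the data-preparation part of training involves, here is a hedged sketch that turns a text plus a hand-made list of names into character-offset annotations and stores them in spaCy's binary training format. The helper names (`find_spans`, `write_docbin`), the name list, and the output path `train.spacy` are assumptions for illustration; real training data would normally be annotated and reviewed by hand and would span many passages, with a separate development set.

```python
import re


def find_spans(text, names, label):
    """Return sorted (start, end, label) character spans for whole-word
    occurrences of each name. Assumes the names do not overlap each other."""
    spans = []
    for name in names:
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def write_docbin(text, spans, path="train.spacy"):  # hypothetical output path
    # Imported here so find_spans can be used without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank("en")  # tokenizer only, no trained components
    doc = nlp.make_doc(text)
    # char_span returns None when offsets do not line up with token boundaries.
    ents = [doc.char_span(start, end, label=lab) for start, end, lab in spans]
    doc.ents = [ent for ent in ents if ent is not None]
    doc_bin = DocBin()
    doc_bin.add(doc)
    doc_bin.to_disk(path)
```

For example, `find_spans(text, ["Parysatis", "Darius"], "PERSON")` yields the offsets that `write_docbin` then attaches to the document. The resulting file is the kind of input the training guide linked above expects.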

I hope this was helpful!

mister-elliott · 2023-03-13T16:09:47Z

OK. To me, the Anabasis, as translated into English, is in the domain of plain, simple English text. Maybe Project Gutenberg should train an English language model for spaCy. I will use the mostly accurate proper nouns to continue my project. Thanks for the feedback.

0 replies
