Text Extraction
This page explains the text extraction pipeline, extraction types, and the outputs.
For how to run the text extraction pipeline, please refer to the README.
The text extraction pipeline consists of three parts: 1) processing an input will text, 2) extracting information from each sentence of the input will text, and 3) assembling the per-sentence extractions into a single extraction for the entire will text.
- Processing an input will text
An input will text should be given in .txt format. When an input will text is given, it is read and tokenized into sentences, which are stored in a Python list. This step is necessary because our extraction model operates at the sentence level. It is done automatically when you run the pipeline through main.py.
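As a rough illustration of this step, the sketch below reads a .txt file and splits it into a Python list of sentences. The tokenizer the pipeline actually uses is not stated on this page, so the regex-based splitter and the function names `split_into_sentences` and `read_will` are assumptions made for the example only.

```python
import re
from pathlib import Path
from typing import List

# Naive boundary (an assumption, not the pipeline's tokenizer): split on
# whitespace that follows ., ! or ? and precedes an uppercase letter or "[".
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\[])")


def split_into_sentences(text: str) -> List[str]:
    """Split raw will text into a list of non-empty, stripped sentences."""
    parts = [part.strip() for part in _SENTENCE_BOUNDARY.split(text)]
    return [part for part in parts if part]


def read_will(path: str) -> List[str]:
    """Read a .txt will file and return its sentences."""
    return split_into_sentences(Path(path).read_text(encoding="utf-8"))


sample = (
    "I, [Person-1], domiciled in Memphis, Tennessee, do make this my Last Will. "
    "I direct my Executor to pay all my just debts."
)
sentences = split_into_sentences(sample)
```

A splitter this simple will mis-split on abbreviations such as "No." or "Mr.", which is one reason a dedicated sentence tokenizer is usually preferable for legal text.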
- Extracting information from will sentences
We use a large language model (LLM) in an in-context learning setting to extract information from will sentences. The LLM we currently use is GPT-4. Each sentence in the Python list (created above) is given as input to the text extraction model. There are two text extraction models to choose from: the classification model and the full examples model.
- Classification model: This model first classifies a sentence and then creates a prompt based on the classification result. The processing time is longer, but the OpenAI API cost is lower because the prompt contains fewer tokens.
- Full examples model: This model creates a prompt using the full set of examples (one example per event type). The processing time is shorter, but the OpenAI API cost is higher because the prompt contains more tokens.
The default is the classification model.
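The trade-off between the two models can be sketched as follows. This is a simplified illustration, not the pipeline's actual prompt code: the `EXAMPLES` dictionary, both builder functions, and the keyword-based `stub_classifier` (a stand-in for the LLM classification call) are hypothetical, and no API request is made.

```python
from typing import Callable, Dict, List

# Hypothetical in-context examples, one per event type (only three shown).
EXAMPLES: Dict[str, str] = {
    "WillCreation": 'Sentence: I do hereby make this instrument my Last Will.\n'
                    'WillCreation: Testator="I", Will="this instrument"',
    "Bequest": 'Sentence: I bequeath all of my property to [Person-2].\n'
               'Bequest: Testator="I", Asset="all of my property", Beneficiary="[Person-2]"',
    "Nomination": 'Sentence: I appoint my daughter Executor.\n'
                  'Nomination: Testator="I", Executor="my daughter"',
}


def build_full_examples_prompt(sentence: str) -> str:
    """Full examples model: include every example in a single prompt."""
    shots = "\n\n".join(EXAMPLES.values())
    return f"{shots}\n\nSentence: {sentence}\n"


def build_classification_prompt(
    sentence: str, classify: Callable[[str], List[str]]
) -> str:
    """Classification model: classify first, then include only matching examples."""
    event_types = classify(sentence)
    shots = "\n\n".join(EXAMPLES[t] for t in event_types if t in EXAMPLES)
    return f"{shots}\n\nSentence: {sentence}\n"


def stub_classifier(sentence: str) -> List[str]:
    """Stand-in for the classification step (assumption: keyword match only)."""
    return ["Bequest"] if "bequeath" in sentence.lower() else []


sentence = "I give, devise and bequeath my house and lot to [Person-16]."
full_prompt = build_full_examples_prompt(sentence)
short_prompt = build_classification_prompt(sentence, stub_classifier)
```

Here `short_prompt` carries one example while `full_prompt` carries all of them, which mirrors the cost difference described above; the extra classification step is where the longer processing time comes from.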
- Assembling the extractions to create an extraction for the entire will text
The extractions from the sentences of a single will text are assembled into a single extraction output. The assembly involves several steps.
- File name, testator name, execution date, and full text are added.
- All entities are given provenances (i.e., sentence id and global character offsets).
- Any entities referring to the same real-world entity are grouped together (through coreference resolution using spaCy).
- Entity ids and event ids are updated.
The final result of the assembly is a single JSON file. The output file's format is described in the Output section below.
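To make the provenance step concrete, here is a minimal sketch of how a sentence id and global character offsets could be computed for an extracted text. It assumes that each sentence appears verbatim in the full text and that offsets are end-exclusive, as the "Lauderdale County" example (57, 74) in the output format suggests. The function names are hypothetical, and only the first occurrence of a text within a sentence is located.

```python
from typing import Dict, List, Tuple


def sentence_start_offsets(full_text: str, sentences: List[str]) -> List[int]:
    """Find where each sentence starts in the full text, scanning left to right."""
    starts = []
    cursor = 0
    for sentence in sentences:
        start = full_text.index(sentence, cursor)  # ValueError if not found
        starts.append(start)
        cursor = start + len(sentence)
    return starts


def add_provenance(
    full_text: str, sentences: List[str], mentions: List[Tuple[int, str]]
) -> List[Dict]:
    """Map (sentence_id, extracted_text) pairs to global character offsets."""
    starts = sentence_start_offsets(full_text, sentences)
    provenance = []
    for sentence_id, text in mentions:
        local = sentences[sentence_id].find(text)
        if local < 0:
            continue  # extracted text does not occur verbatim; skip it
        begin = starts[sentence_id] + local
        provenance.append(
            {"sentence_id": sentence_id, "character_offsets": [begin, begin + len(text)]}
        )
    return provenance


sentences = [
    "I, [Person-1], an adult resident citizen of [Address-1], Lauderdale County, "
    "Tennessee, do make this my Will.",
    "This 11 day of April, 1994.",
]
full_text = " ".join(sentences)
provenance = add_provenance(
    full_text, sentences, [(0, "Lauderdale County"), (1, "This 11 day of April, 1994")]
)
```

With end-exclusive offsets, slicing `full_text[begin:end]` returns the extracted text directly, which makes the provenance easy to verify.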
We currently extract 25 types of entities and 20 types of events*. Below are the lists of the extraction types.
- Entities: Testator, Beneficiary, Witness, State, County, Asset, Bond, Executor, Date, Time, Trustee, Will, Codicil, Debt, Expense, Tax, Duty, Right, Condition, Guardian, Trust, Conservator, Affidavit, NotaryPublic, NonBeneficiary
- Events: WillCreation, SignWill, Attestation, Revocation, Codicil, Bequest, Nomination, Disqualification, Renunciation, Death, Probate, Direction, Authorization, Excuse, Give, Notarization, NonProbateInstrumentCreation, Birth, Residual, Removal
*Note: During the manual annotation process, we also had relations, which were used to mark the relations between the entities. These relations are currently not extracted. If these are needed, we can add them later.
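For downstream code it can be handy to keep the two type lists as constants, for instance to flag unexpected labels in model output. The sketch below copies the lists from this page; the names `ENTITY_TYPES`, `EVENT_TYPES`, and `unknown_types` are hypothetical and are not part of the pipeline.

```python
from typing import Dict, List

ENTITY_TYPES = frozenset({
    "Testator", "Beneficiary", "Witness", "State", "County", "Asset", "Bond",
    "Executor", "Date", "Time", "Trustee", "Will", "Codicil", "Debt", "Expense",
    "Tax", "Duty", "Right", "Condition", "Guardian", "Trust", "Conservator",
    "Affidavit", "NotaryPublic", "NonBeneficiary",
})

EVENT_TYPES = frozenset({
    "WillCreation", "SignWill", "Attestation", "Revocation", "Codicil", "Bequest",
    "Nomination", "Disqualification", "Renunciation", "Death", "Probate",
    "Direction", "Authorization", "Excuse", "Give", "Notarization",
    "NonProbateInstrumentCreation", "Birth", "Residual", "Removal",
})


def unknown_types(extractions: Dict[str, List[Dict]]) -> Dict[str, List[str]]:
    """Return entity and event type labels that are not in the lists above."""
    return {
        "entities": [e["type"] for e in extractions.get("entities", [])
                     if e["type"] not in ENTITY_TYPES],
        "events": [v["type"] for v in extractions.get("events", [])
                   if v["type"] not in EVENT_TYPES],
    }


report = unknown_types({
    "entities": [{"id": "e1", "type": "County"}, {"id": "e2", "type": "City"}],
    "events": [{"id": "v1", "type": "WillCreation"}],
})
```

Note that "Codicil" is both an entity type and an event type, so the two sets have to be checked separately.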
Full taxonomy of entities
- Testator: a person who makes a will
ex) I, [Person-1], domiciled in Memphis, Tennessee, do make, publish and declare this to be my Last Will and Testament, hereby revoking all wills and codicils heretofore made by me.
- Beneficiary*: a person or entity (e.g., organization) that receives something from a will
ex) I will, devise and bequeath my house and lot to [Person-16] and [Person-17].
*Note: It is correct that we distinguished between Beneficiary and NamedBeneficiary during the manual annotation process. However, the distinction was removed during the evaluation and the text extraction using LLM (at least for now). The reason for the removal is that the distinction between Beneficiary and NamedBeneficiary is so subtle that it is not certain if the model would learn to distinguish them in the in-context learning setting. We didn't want to add an extra layer of complication from the beginning, but if it turns out that this distinction is necessary, we can add it back.
- Witness: a person witnessing a will
ex) We, [Person-5] and [Person-6], the witnesses, sign our names to this instrument, consisting of four pages, and being first duly sworn do hereby declare to the undersigned authority that the testatrix signs and executes this instrument as her last Will and that she signs it willingly, and that each of us, in the presence and hearing of the testatrix, hereby signs this Will as witness to the testatrix’s signing, and that to the best of our knowledge the testatrix is eighteen (18) years of age or older, of sound mind, and under no constraint of undue influence.
- State: any US state names
ex) I, [Person-1], domiciled in Memphis, Tennessee, do make, publish and declare this to be my Last Will and Testament, hereby revoking all wills and codicils heretofore made by me.
- County: any US county names
ex) IN WITNESS WHEREOF, I have hereunto signed, published and declared this instrument as my Last Will and Testament, in Lauderdale County, Tennessee, on this 12th day of June, 1989
- Asset: any money, personal property, or real estate owned by a testator
ex) I hereby give, devise and bequeath all of my property, real, personal or mixed, to [Person-2], if living at the time of my death.
- Bond: any bonds (usually probate bonds, which is a type of bond ordered and required by a court before they will appoint a person or entity as the personal representative of an estate)
ex) I name, nominate and appoint my daughter, [Person-3], Executor of this my will and estate, and direct that she be allowed to serve without bond.
- Executor: a person who executes a will (=personal representative)
ex) I name, nominate and appoint my daughter, [Person-3], Executor of this my will and estate, and direct that she be allowed to serve without bond.
- Date: any dates
ex) IN WITNESS WHEREOF, I have hereunto signed, published and declared this instrument as my Last Will and Testament, in Lauderdale County, Tennessee, on this 12th day of June, 1989
- Time: any expression denoting a particular point in time
ex) I hereby give, devise and bequeath all of my property, real, personal or mixed, to [Person-2], if living at the time of my death.
- Trustee: a person who manages a trust
ex) My entire estate, after payment of debts, taxes and expenses, shall be distributed to the Trustee of the Living Trust Agreement of [Person-1], entered into as of this same date of this my Last Will and Testament, to be administered and distributed under the terms of the trust created in the said Living Trust Agreement of [Person-1].
- Will: a legal document containing a person’s wishes regarding the disposal of their assets after death
ex) I, [Person-1], domiciled in Memphis, Tennessee, do make, publish and declare this to be my Last Will and Testament, hereby revoking all wills and codicils heretofore made by me.
- Codicil: a testamentary or supplementary document that modifies or revokes a will or part of a will
ex) I, [Person-1], domiciled in Memphis, Tennessee, do make, publish and declare this to be my Last Will and Testament, hereby revoking all wills and codicils heretofore made by me.
- Debt: any debts
ex) I direct my Executor to pay all my just debts and all funeral expenses, which shall be probated, registered and allowed against my estate, as soon after my death as can conveniently be done.
- Expense: any expenses
ex) I direct my Executor to pay all my just debts and all funeral expenses, which shall be probated, registered and allowed against my estate, as soon after my death as can conveniently be done
- Tax: any taxes
ex) I authorize my personal representative to pay from my general estate any interest which may accrue on debts or taxes due from my estate.
- Duty: any duty directed by a testator to fiduciaries (e.g., executors, trustees, guardians, or conservators)
ex) I direct my Executor to pay all my just debts and all funeral expenses, which shall be probated, registered and allowed against my estate, as soon after my death as can conveniently be done.
- Right: any rights authorized by a testator to fiduciaries (e.g., executors, trustees, guardians, or conservators)
ex) My personal representative shall have the authority and discretion to buy or to sell or lease real property or any interest in real property which I may have and to use and apply the proceeds from a sale or lease to the payment of debts, taxes, and expenses of administration of my estate and may generally treat real property the same as personalty.
- Condition: a condition under which an event (e.g., will execution, bequest, etc.) occurs
ex) I hereby give, devise and bequeath all of my property, real, personal or mixed, to [Person-2], if living at the time of my death.
- Guardian: a person who has the legal right and responsibility to take care of someone who cannot take care of themselves (usually a minor or a legally incompetent person)
ex) If my personal representative determines that income or principal is payable to a minor or to a person under mental or physical disability, whether or not adjudicated, then my fiduciary shall have the discretion to either make payments directly to the beneficiary, to the legally appointed guardian or conservator, or to distribute and pay such amounts directly for the benefit of such beneficiary.
- Trust: a fiduciary arrangement that allows a trustee to hold assets on behalf of a beneficiary
ex) I give, devise and bequeath the sum of five thousand dollars ($5,000.00) to my greatgrandson [Person-11] to be held in trust for his future use and benefit until he reaches the age of twenty-five (25).
- Conservator: a person who handles the financial and personal affairs of someone who cannot handle such affairs by themselves (usually a minor or a legally incompetent person)
ex) If my personal representative determines that income or principal is payable to a minor or to a person under mental or physical disability, whether or not adjudicated, then my fiduciary shall have the discretion to either make payments directly to the beneficiary, to the legally appointed guardian or conservator, or to distribute and pay such amounts directly for the benefit of such beneficiary.
- Affidavit: a legal statement sworn and signed by a testator and witnesses to confirm the validity of a will. It is usually attached to a will.
ex) IN WITNESS WHEREOF, I have executed this my Last Will at Ripley, Tennessee this 13th day of September, 2001 and request the attesting witnesses to make the Affidavit set out below.
- NotaryPublic: a person who is authorized by state government to witness the signing of important documents and administer oaths
ex) Sworn to and subscribed before me on this, the 7th day of April, 1990.
- NonBeneficiary: a person who is excluded from being a beneficiary
ex) In the event that any other person or persons other than those herein named as my heirs should seek to inherit from me and establish a right to so inherit by a final Decree of the Court of competent jurisdiction, then, in such event, I give and bequeath unto such person or persons, nothing.
Full taxonomy of events
- WillCreation: an event in which a testator creates a will
ex) I, [Person-1], an adult resident citizen of [Address-1], Lauderdale County, Tennessee, being of sound and disposing mind, memory and understanding, do hereby make, declare and publish this instrument as my Last Will and Testament, expressly revoking any and all testamentary dispositions heretofore made by me.
- Testator: "I"
- Will: "this instrument"
- SignWill: an event in which a testator or a witness signs a will
ex) IN WITNESS WHEREOF, I have hereunto signed, published and declared this instrument as my Last Will and Testament, in Lauderdale County, Tennessee, on this 11 day of April, 1994.
- Testator: "I"
- Will: "this instrument"
- Date: "on this 11 day of April, 1994"
- Condition: "IN WITNESS WHEREOF"
- Attestation: an event in which a witness attests to the validity of a will
ex) We, the undersigned subscribing witnesses, do hereby certify that we witnessed the foregoing Last Will and Testament of [Person-1], at her request, in her presence and in the presence of each other, and that she signed the same in our presence, and in the presence of each of us, declaring the same to be her Last Will and Testament. This 11 day of April, 1994.
- Witness: "We"
- Attested events:
– Attestation: "we witnessed the foregoing Last Will and Testament of [Person-1], at her request, in her presence and in the presence of each other"
– SignWill: "she signed the same in our presence, and in the presence of each of us"
- Date: "This 11 day of April, 1994"
- Revocation: an event in which a testator revokes a will or a codicil
ex) I, [Person-1], of Gates, Lauderdale County, Tennessee, being of sound and disposing mind and memory, do hereby make, publish, and declare this instrument as my LAST WILL AND TESTAMENT, hereby revoking all wills and codicils to wills heretofore made by me.
- Testator: "my"
- Will: "all wills"
- Codicil: "codicils to wills"
- Codicil: an event in which a codicil is made
ex) I, [Person-1], of Gates, Lauderdale County, Tennessee, being of sound and disposing mind and memory, do hereby make, publish, and declare this instrument as my LAST WILL AND TESTAMENT, hereby revoking all wills and codicils to wills heretofore made by me.
- Testator: "me"
- Codicil: "codicils"
- Time: "heretofore"
- Bequest: an event in which a testator bequeaths an asset to a beneficiary
ex) I hereby give, devise and bequeath all of my property, real, personal or mixed, to [Person-2], if living at the time of my death.
- Testator: "I"
- Asset: "all of my property, real, personal or mixed"
- Beneficiary: "[Person-2]"
- Condition: "if living at the time of my death"
- Nomination: an event in which a testator nominates a fiduciary
ex) I name, nominate and appoint my daughter, [Person-3], Executor of this my will and estate, and direct that she be allowed to serve without bond.
- Testator: "I"
- Executor: "my daughter"
- Disqualification: an event in which a beneficiary or a fiduciary is disqualified
ex) If, for any reason, [Person-3] is unwilling or unable to serve in this capacity, then I nominate and appoint his daughter, [Person-5], to serve in his place as Co-Executor without bond.
- Executor: "[Person-3]"
- Renunciation: an event in which a fiduciary renounces their role
ex) If, for any reason, [Person-3] is unwilling or unable to serve in this capacity, then I nominate and appoint his daughter, [Person-5], to serve in his place as Co-Executor without bond.
- Executor: "[Person-3]"
- Death: an event in which any entity (e.g., testator, beneficiary, executor, etc.) dies
ex) In the event that my said granddaughter does not survive me, then I hereby give, devise and bequeath said property to my great grandson, [Person-3]
- Beneficiary: "my said granddaughter"
- Probate: an event in which a will or any part of the will is probated
ex) I hereby direct my Executor to pay all of my just debts, funeral expenses, taxes and other expenses, which shall be probated, registered and allowed against my estate as soon after my death as can be conveniently done.
- Debt: "all of my just debts"
- Expense: "funeral expenses"
- Tax: "taxes"
- Expense: "other expenses"
- Condition: "against my estate"
- Time: "as soon after my death as can be conveniently done"
- Direction: an event in which a testator gives direction to a fiduciary
ex) I direct that my Executrix not be required to make an accounting to the Court.
- Testator: "I"
- Directed event:
– Excuse: "my Executrix not be required to make an accounting to the Court"
- Authorization: an event in which a testator grants a right to a fiduciary
ex) I further give to my Personal Representative all of the powers of a Personal Representative under the laws of the state of Idaho as now in effect and as may hereafter be amended.
- Testator: "I"
- Executor: "my Personal Representative"
- Right: "all of the powers of a Personal Representative"
- Condition: "under the laws of the state of Idaho as now in effect and as may hereafter be amended"
- Excuse: an event in which a testator excuses a fiduciary from a duty
ex) I name, nominate and appoint my daughter, [Person-3], Executor of this my will and estate, and direct that she be allowed to serve without bond.
- Testator: "I"
- Executor: "she"
- Bond: "bond"
- Give: an event in which a testator gives compensation to a fiduciary
ex) I direct that my Executor shall receive a fee for his services of five (5) percent of my net estate after the above specific bequests are made.
- Executor: "my Executor"
- Asset: "five (5) percent of my net estate"
- Time: "after the above specific bequests are made"
- Notarization: an event in which an affidavit is notarized by a notary public
ex) SWORN TO before me this September 14, 2001. NOTARY PUBLIC My Commission expires: 6/19/06
- NotaryPublic: "me"
- Date: "this September 14, 2001"
- NonProbateInstrumentCreation: an event in which a non-probate instrument (e.g., trust) is created
ex) I give, devise and bequeath the sum of five thousand dollars ($5,000.00) to my greatgrandson [Person-11] to be held in trust for his future use and benefit until he reaches the age of twenty-five (25).
- Asset: "the sum of five thousand dollars ($5,000.00)"
- Trust: "trust"
- Birth: an event in which a beneficiary is born
ex) I have two children: [Person-2], born [Date-1], and [Person-3], born [Date-2].
- Beneficiary: "[Person-2]"
- Date: "[Date-1]"
- Beneficiary: "[Person-3]"
- Date: "[Date-2]"
- Residual: an event in which an asset becomes part of the residuary estate
ex) If the gift of any item of property under this Article 1 fails or lapses, such property shall become a part of my residuary estate and shall be distributed as provided in Article 2.
- Asset: "such property"
- Condition: "If the gift of any item of property under this Article 1 fails or lapses"
- Removal: an event in which a beneficiary is removed from the will
ex) I further direct that if any beneficiary under this Will should contest the terms of this Will they shall receive nothing by the terms of this Will and that share which they would have received shall be divided among the remaining beneficiaries of my estate in the same manner as though they had predeceased me without issue.
- Beneficiary: "they"
- Condition: "if any beneficiary under this Will should contest the terms of this Will"
The output from the text extraction pipeline serves as an input for creating will models and for the backend.
The outputs are in .json format. The format of the output is illustrated below.
Output format
{
    "file_name": file name (e.g., "simple_will_1.txt"),
    "testator_name": Testator's name (e.g., "[Person-1]"),
    "execution_date": execution date (e.g., "This 11 day of April, 1994"),
    "full_text": full will text (e.g., "I, [Person-1], an adult resident citizen of [Address-1], Lauderdale County, Tennessee, ... This 11 day of April, 1994."),
    "extractions": {
        "entities": [
            {
                "id": entity id (e.g., "e1"),
                "type": entity type (e.g., "County"),
                "texts": {
                    extracted text (e.g., "Lauderdale County"): [
                        {
                            "sentence_id": sentence id (e.g., 0),
                            "character_offsets": [
                                character offsets (e.g., 57, 74)
                            ]
                        },
                        ... (sentence id and character offsets for the same text, if there are more instances)
                    ],
                    ... (extracted texts for the same entity)
                }
            },
            ... (more entities)
        ],
        "events": [
            {
                "id": event id (e.g., "v1"),
                "type": event type (e.g., "WillCreation"),
                "sentence_id": sentence id (e.g., 0),
                argument type (e.g., "Will"): [
                    argument's entity id (e.g., "e4")
                ],
                argument type (e.g., "Testator"): [
                    argument's entity id (e.g., "e5")
                ],
                argument type (e.g., "Condition"): [
                    argument's entity id (e.g., "e3")
                ]
            },
            ... (more events)
        ]
    }
}
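As a usage illustration, the sketch below loads an output in this format and replaces each event argument's entity id with the entity's extracted text. The sample data is a tiny hand-made instance written for this example, not real pipeline output, and the helper names are hypothetical. It assumes that every event key other than "id", "type", and "sentence_id" is an argument type.

```python
import json
from typing import Dict, List

# Hand-made sample that follows the format above (not real pipeline output).
SAMPLE_OUTPUT = json.loads("""
{
  "file_name": "simple_will_1.txt",
  "testator_name": "[Person-1]",
  "execution_date": "This 11 day of April, 1994",
  "full_text": "I do hereby make this instrument my Last Will.",
  "extractions": {
    "entities": [
      {"id": "e1", "type": "Testator",
       "texts": {"I": [{"sentence_id": 0, "character_offsets": [0, 1]}]}},
      {"id": "e2", "type": "Will",
       "texts": {"this instrument": [{"sentence_id": 0, "character_offsets": [17, 32]}]}}
    ],
    "events": [
      {"id": "v1", "type": "WillCreation", "sentence_id": 0,
       "Testator": ["e1"], "Will": ["e2"]}
    ]
  }
}
""")

# Keys of an event that are not argument types (assumption based on the format).
RESERVED_EVENT_KEYS = {"id", "type", "sentence_id"}


def entity_texts(output: Dict) -> Dict[str, List[str]]:
    """Map each entity id to the list of texts extracted for it."""
    return {e["id"]: list(e["texts"]) for e in output["extractions"]["entities"]}


def resolve_event_arguments(output: Dict) -> List[Dict]:
    """Replace entity ids in event arguments with each entity's first text."""
    texts = entity_texts(output)
    resolved = []
    for event in output["extractions"]["events"]:
        arguments = {
            role: [texts[entity_id][0] for entity_id in ids]
            for role, ids in event.items()
            if role not in RESERVED_EVENT_KEYS
        }
        resolved.append({"id": event["id"], "type": event["type"], "arguments": arguments})
    return resolved


resolved_events = resolve_event_arguments(SAMPLE_OUTPUT)
```

To read a real output file instead of the inline sample, `json.load` on the opened .json file yields the same structure.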