Legal information retrieval

From Wikipedia, the free encyclopedia

Legal information retrieval is the science of information retrieval applied to legal text, including legislation, case law, and scholarly works.[1] Accurate legal information retrieval is important for giving both laypeople and legal professionals access to the law. Its importance has increased because of the vast and rapidly growing amount of legal documents available electronically.[2] Legal information retrieval is part of the growing field of legal informatics.

In a legal setting, it is frequently important to retrieve all information related to a specific query. However, commonly used boolean search methods (exact matches of specified terms) on full-text legal documents have been shown to have an average recall rate as low as 20 percent,[3] meaning that only 1 in 5 relevant documents is actually retrieved. In that study, the researchers conducting the searches believed they had retrieved over 75% of the relevant documents.[3] Low recall may result in failing to retrieve important or precedential cases. In some jurisdictions this may be especially problematic, as legal professionals are ethically obligated to be reasonably informed about relevant legal documents.[4]

Legal information retrieval attempts to increase the effectiveness of legal searches by increasing the number of relevant documents retrieved (a high recall rate) and reducing the number of irrelevant documents retrieved (a high precision rate). This is a difficult task, as the legal field is prone to jargon,[5] polysemes[6] (words that have different meanings when used in a legal context), and constant change.
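
Recall and precision can be computed per query from two sets: the documents a search returned and the documents that are actually relevant. The Python sketch below uses the standard set-based definitions; the document identifiers are invented for illustration.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a single query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 10 relevant cases exist; the search returns 4 documents,
# only 2 of which are relevant.
relevant = {f"case-{i}" for i in range(10)}
retrieved = {"case-0", "case-1", "memo-7", "memo-8"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.2)
```

A recall of 0.2 is the "1 in 5" situation described earlier: half of what the searcher sees is relevant, yet four fifths of the relevant material is never shown.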

Techniques used to achieve these goals generally fall into three categories: boolean retrieval, manual classification of legal text, and natural language processing of legal text.

Problems

Application of standard information retrieval techniques to legal text can be more difficult than application in other subjects. One key problem is that the law rarely has an inherent taxonomy.[7] Instead, the law is generally filled with open-ended terms, which may change over time.[7] This can be especially true in common law countries, where each decided case can subtly change the meaning of a certain word or phrase.[8]

Legal information systems must also be programmed to deal with law-specific words and phrases. This is less problematic for words that exist solely in law, but legal texts also frequently use polysemes: words that may have different meanings in legal and in common usage, potentially both within the same document. A word's legal meaning may also depend on the area of law in which it is used. For example, in the context of European Union legislation, the term "worker" has four different meanings:[9]

  1. Any worker as defined in Article 3(a) of Directive 89/391/EEC who habitually uses display screen equipment as a significant part of his normal work.
  2. Any person employed by an employer, including trainees and apprentices but excluding domestic servants.
  3. Any person carrying out an occupation on board a vessel, including trainees and apprentices, but excluding port pilots and shore personnel carrying out work on board a vessel at the quayside.
  4. Any person who, in the Member State concerned, is protected as an employee under national employment law and in accordance with national practice.

It also has the common meaning:

  1. A person who works at a specific occupation.[9]

Although these definitions overlap, a retrieval system must distinguish the intended use of a term from irrelevant uses in order to return correct results.
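
One way a system might separate these senses is to record, for each document, the kind of text it comes from and map that context to the sense of "worker" used there. The Python sketch below assumes such context labels already exist; the labels, the mapping, and the helper names (`worker_sense`, `search_worker`) are hypothetical, and a real system would need far richer context than a single tag.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    context: str  # hypothetical label for the kind of text the document comes from
    text: str


# Hypothetical mapping from document context to the sense of "worker" used there,
# loosely paraphrasing the EU-law meanings listed above.
WORKER_SENSES = {
    "display-screen rules": "habitual display-screen user",
    "general employment rules": "any employed person, incl. trainees",
    "maritime rules": "person working on board a vessel",
    "national-law rules": "person protected as an employee nationally",
}
COMMON_MEANING = "person who works at an occupation"


def worker_sense(doc: Document) -> str:
    """Sense of "worker" in this document; unknown contexts get the common meaning."""
    return WORKER_SENSES.get(doc.context, COMMON_MEANING)


def search_worker(docs: list[Document], wanted_sense: str) -> list[str]:
    """Ids of documents that mention "worker" in the wanted sense only."""
    return [
        d.doc_id
        for d in docs
        if "worker" in d.text.lower() and worker_sense(d) == wanted_sense
    ]


docs = [
    Document("A", "maritime rules", "Every worker on board shall be given rest."),
    Document("B", "general employment rules", "The worker may refuse unsafe work."),
    Document("C", "news", "A worker was interviewed at the port."),
]
print(search_worker(docs, "person working on board a vessel"))  # ['A']
print(search_worker(docs, COMMON_MEANING))  # ['C']
```

A plain keyword search for "worker" would return all three documents; filtering on the sense keeps only the one that uses the intended meaning.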

Even if a system overcomes the language problems inherent in law, it must still determine the relevance of each result. In the context of judicial decisions, this requires determining the precedential value of the case.[10] Decisions from senior or superior courts may be more relevant than those from lower courts, even where the lower court's decision contains more discussion of the relevant facts.[10] The opposite may be true, however, if the senior court discusses the topic only briefly (for example, if it is a secondary consideration in the case).[10] An information retrieval system must also be aware of the authority of the jurisdiction: a case from a binding authority is likely to be more valuable than one from a non-binding authority.

Additionally, the user's intentions may determine which cases they find valuable. For instance, where a legal professional is attempting to argue a specific interpretation of law, they might find a lower court's decision that supports their position more valuable than a senior court's position that does not.[10] They may also value similar positions from different areas of law, different jurisdictions, or dissenting opinions.[10]
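
The factors above can be combined into a toy relevance score. The sketch below is purely illustrative: the 1 to 3 court scale, the weights, and the `advocacy` switch are hypothetical choices made for this example, not a method described in the cited sources. It shows how a senior court that only touches on an issue can rank below a lower court that discusses it fully, and how an advocate's goals can reorder results.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    court_level: int     # hypothetical scale: 1 = trial, 2 = appellate, 3 = supreme
    binding: bool        # binding authority in the user's jurisdiction?
    discussion: float    # 0..1: how central the topic is to the decision
    supports_user: bool  # does the holding support the user's argument?


def score(case: Case, advocacy: bool = False) -> float:
    # Court seniority only counts in proportion to how much the court discusses the issue.
    s = case.court_level * case.discussion
    if case.binding:
        s *= 2.0  # hypothetical weight for binding authority
    if advocacy and case.supports_user:
        s += 3.0  # hypothetical bonus: an advocate may prefer a supportive case
    return s


def rank(cases: list[Case], advocacy: bool = False) -> list[str]:
    return [c.name for c in sorted(cases, key=lambda c: score(c, advocacy), reverse=True)]


cases = [
    Case("supreme-passing-mention", 3, True, 0.2, False),
    Case("appellate-on-point", 2, True, 0.8, False),
    Case("trial-supportive", 1, False, 0.9, True),
]
print(rank(cases))                 # ['appellate-on-point', 'supreme-passing-mention', 'trial-supportive']
print(rank(cases, advocacy=True))  # ['trial-supportive', 'appellate-on-point', 'supreme-passing-mention']
```

Without `advocacy`, the on-point appellate case ranks first; with it, the supportive trial-level case moves to the top.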

The large number of available cases makes these problems harder to overcome. The number of legal cases available electronically is constantly increasing (in 2003, US appellate courts handed down approximately 500 new cases per day[2]), so an accurate legal information retrieval system must incorporate methods both for sorting past data and for managing new data.[2][11]

Techniques

Boolean searches

Boolean searches, in which a user specifies conditions such as the presence of particular words or judgments from a particular court, are the most common type of search available in legal information retrieval systems. They are widely implemented but overcome few of the problems discussed above.
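
A boolean search is usually run against an inverted index that maps each term to the documents containing it; AND, OR and NOT then become set intersection, union and difference. The Python sketch below assumes whitespace tokenization, lowercasing, and a court name stored with each document; the court names and documents are invented. It also shows the exact-match weakness behind low recall: a query for "negligent" does not match a document that only says "negligence".

```python
from collections import defaultdict

PUNCT = ".,;:"


def build_index(docs: dict[str, tuple[str, str]]) -> dict[str, set[str]]:
    """Map each lowercased term to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, (_court, text) in docs.items():
        for token in text.lower().split():
            index[token.strip(PUNCT)].add(doc_id)
    return dict(index)


def boolean_search(docs, index, all_of=(), any_of=(), none_of=(), court=None):
    """AND over all_of, OR over any_of, NOT over none_of, then an optional court filter."""
    result = set(docs)
    for term in all_of:
        result &= index.get(term, set())
    if any_of:
        result &= set().union(*(index.get(t, set()) for t in any_of))
    for term in none_of:
        result -= index.get(term, set())
    if court is not None:
        result = {d for d in result if docs[d][0] == court}
    return sorted(result)


# Hypothetical documents: id -> (court, text)
docs = {
    "d1": ("Court of Appeal", "The employer breached the duty of care."),
    "d2": ("High Court", "A duty of care is owed by an employer."),
    "d3": ("Court of Appeal", "Negligence claim; no duty of care found."),
}
index = build_index(docs)
print(boolean_search(docs, index, all_of=("duty", "care"), none_of=("negligence",)))
# ['d1', 'd2']
print(boolean_search(docs, index, all_of=("duty", "care"), court="Court of Appeal"))
# ['d1', 'd3']
print(boolean_search(docs, index, all_of=("negligent",)))
# [] because exact matching misses "negligence"
```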

The recall and precision rates of these searches vary depending on the implementation and the searches analyzed. One study found a basic boolean search's recall rate to be roughly 20% and its precision rate roughly 79%.[3] Another study implemented a generic search (that is, one not designed for legal use) and found a recall rate of 56% and a precision rate of 72% among legal professionals. Both numbers increased when searches were run by non-legal professionals, to a 68% recall rate and a 77% precision rate. The difference is likely explained by the legal professionals' use of complex legal terms.[12]
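
To compare these results on a single number, recall and precision are often combined into the F-measure, whose balanced form (F1) is their harmonic mean, 2PR / (P + R). The Python sketch below applies it to the figures reported in this paragraph; the general F-beta form is included for completeness, and the F1 values are simply computed from the reported percentages rather than taken from the cited studies.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall) pairs as reported in the paragraph above.
reported = {
    "basic boolean search [3]": (0.79, 0.20),
    "generic search, legal professionals [12]": (0.72, 0.56),
    "generic search, non-legal professionals [12]": (0.77, 0.68),
}
for label, (p, r) in reported.items():
    print(f"{label}: F1 = {f_measure(p, r):.2f}")
# basic boolean search [3]: F1 = 0.32
# generic search, legal professionals [12]: F1 = 0.63
# generic search, non-legal professionals [12]: F1 = 0.72
```

The harmonic mean punishes imbalance: the basic boolean search's high precision cannot offset its 20% recall.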

Manual classification

To overcome the limits of basic boolean searches, information systems have attempted to classify case law and statutes into more computer-friendly structures. Usually, this results in the creation of an ontology to classify the texts, based on the way a legal professional might think about them.[13] Such ontologies attempt to link texts on the basis of their type, their value, and/or their topic areas. Most major legal search providers now implement some sort of classification search, such as Westlaw's “Natural Language”[14] or LexisNexis' Headnote[15] searches. Additionally, both services allow users to browse their classifications, via Westlaw's West Key Numbers[14] or Lexis' Headnotes.[15] Though these two search algorithms are proprietary and secret, they are known to employ manual classification of text (though this may be computer-assisted).[13]
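
Searching and browsing a manual classification can be pictured as walking a topic tree whose nodes were assigned to documents by human editors. The Python sketch below is hypothetical throughout: the topic codes, document ids and tags are invented, and it does not reproduce West Key Numbers or Lexis Headnotes, whose details are proprietary. Because matching is on assigned topics rather than words, a query on a broad topic also returns documents filed under its subtopics, whatever terms they use.

```python
# Hypothetical topic taxonomy in the spirit of a key-number system;
# the codes are invented and do not reproduce any provider's classification.
TAXONOMY = {
    "torts": ["torts/negligence", "torts/defamation"],
    "torts/negligence": ["torts/negligence/duty-of-care"],
    "torts/negligence/duty-of-care": [],
    "torts/defamation": [],
}

# Topic codes assigned to each document by a human classifier (hypothetical).
TAGS = {
    "case-1": {"torts/negligence/duty-of-care"},
    "case-2": {"torts/defamation"},
    "statute-9": {"torts/negligence"},
}


def descendants(topic: str) -> set[str]:
    """The topic itself plus every subtopic beneath it."""
    found = {topic}
    for child in TAXONOMY.get(topic, []):
        found |= descendants(child)
    return found


def classified_search(topic: str) -> list[str]:
    """Documents tagged with the topic or any of its subtopics."""
    wanted = descendants(topic)
    return sorted(doc for doc, tags in TAGS.items() if tags & wanted)


print(classified_search("torts/negligence"))  # ['case-1', 'statute-9']
print(classified_search("torts"))             # ['case-1', 'case-2', 'statute-9']
```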

These systems can help overcome the majority of problems inherent in legal information retrieval, in that manual classification has the greatest chance of identifying landmark cases and understanding the issues that arise in the text.[16] In one study, ontological searching resulted in a precision rate of 82% and a recall rate of 97% among legal professionals.[17] The legal texts included, however, were restricted to just a few areas of law in a specific jurisdiction.[18]

The major drawback of this approach is that classifying texts requires highly skilled legal professionals and large amounts of time.[16][19] As the amount of available text continues to increase, some have argued that manual classification is unsustainable.[20]

Natural language processing

To reduce the reliance on legal professionals and the amount of time needed, efforts have been made to create systems that automatically classify legal text and queries.[2][21][22] Adequate automatic classification of both would allow accurate information retrieval without the high cost of human classification. These automatic systems generally employ natural language processing (NLP) techniques adapted to the legal domain, and also require the creation of a legal ontology. Though multiple systems have been proposed,[2][21][22] few have reported results. One system, “SMILE,” which attempted to automatically extract classifications from case texts, achieved an f-measure (a single score combining recall and precision) of under 0.3, compared with a perfect f-measure of 1.0.[23] This is probably far below an acceptable rate for general use.[23][24]
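
As a rough illustration of automatic classification, and explicitly not the method used by SMILE or any other system cited here, the Python sketch below trains a tiny multinomial naive Bayes classifier with add-one smoothing to assign topic labels to text. The four training snippets and the two labels are invented; a real system would need a legal ontology, far more data, and domain-adapted NLP.

```python
import math
from collections import Counter, defaultdict

STRIP = ".,;:()"


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    return [t.strip(STRIP).lower() for t in text.split() if t.strip(STRIP)]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.doc_counts = Counter(labels)
        self.vocab: set[str] = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.n_docs = len(labels)
        return self

    def predict(self, text: str) -> str:
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unseen words
        best_label, best_logp = "", -math.inf
        for label, n in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n / self.n_docs)  # log prior
            for tok in tokens:
                logp += math.log((counts[tok] + 1) / denom)  # smoothed log likelihood
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Invented training snippets and labels.
train = [
    ("The employer owed a duty of care and breached it", "negligence"),
    ("Careless driving caused injury; damages for negligence", "negligence"),
    ("The parties signed a contract and the buyer failed to pay", "contract"),
    ("Breach of contract: goods were not delivered as agreed", "contract"),
]
clf = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(clf.predict("The seller never delivered the goods under the contract"))  # contract
print(clf.predict("The driver was careless and caused injury"))  # negligence
```

Bag-of-words models like this ignore word order and legal context entirely, which hints at why fully automatic classification of case texts has proved difficult.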

Despite the limited results, many theorists predict that the evolution of such systems will eventually replace manual classification systems.[25][26]

Citation-based ranking

In the mid-1990s, the Room 5 case law retrieval project used citation mining for summaries and ranked its search results based on citation type and count. This slightly predated the PageRank algorithm at Stanford, which was also a citation-based ranking. Ranking of results was based as much on jurisdiction as on number of references.[27]
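
A citation-based ranking in the spirit described above can be sketched as a weighted citation count with a jurisdiction adjustment. The Room 5 project's actual weights are not given here, so the citation types, the weights in `TYPE_WEIGHT`, and `SAME_JURISDICTION_BONUS` are hypothetical. Unlike PageRank, this sketch does not propagate importance through the citation graph; it only counts incoming citations.

```python
from collections import defaultdict

# Hypothetical weights; Room 5's actual scheme is not described in this article.
TYPE_WEIGHT = {"followed": 2.0, "cited": 1.0, "distinguished": 0.5}
SAME_JURISDICTION_BONUS = 1.5


def citation_rank(citations, user_jurisdiction, jurisdiction_of):
    """Rank cited cases by weighted incoming citations, boosting the user's jurisdiction.

    citations: iterable of (citing_case, cited_case, citation_type) triples.
    """
    scores: dict[str, float] = defaultdict(float)
    for _citing, cited, ctype in citations:
        scores[cited] += TYPE_WEIGHT.get(ctype, 1.0)
    for case in scores:
        if jurisdiction_of.get(case) == user_jurisdiction:
            scores[case] *= SAME_JURISDICTION_BONUS
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))


jurisdiction_of = {"A": "state-X", "B": "state-Y", "C": "state-X"}
citations = [
    ("D", "A", "cited"), ("E", "A", "cited"), ("F", "A", "distinguished"),
    ("D", "B", "followed"), ("F", "B", "cited"),
    ("E", "C", "cited"),
]
print(citation_rank(citations, "state-X", jurisdiction_of))  # ['A', 'B', 'C']
print(citation_rank(citations, "state-Y", jurisdiction_of))  # ['B', 'A', 'C']
```

The same citation data yields a different order depending on the user's jurisdiction, which is the point of weighting jurisdiction alongside citation counts.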

Notes

  1. ^ Maxwell, K.T., and Schafer, B. 2009, p. 1
  2. ^ a b c d e Jackson et al., p. 60
  3. ^ a b c Blair, D.C., and Maron, M.E. 1985, p. 293
  4. ^ American Bar Association, Model Rules of Professional Conduct Rule 1.1, http://www.abanet.org/cpr/mrpc/rule_1_1.html
  5. ^ Peters, W. et al. 2007, p. 118
  6. ^ Peters, W. et al. 2007, p. 130
  7. ^ a b Peters, W. et al. 2007, p. 120
  8. ^ Saravanan, M. et al. 2009, p. 101
  9. ^ a b Peters, W. et al. 2007, p. 131
  10. ^ a b c d e Maxwell, K.T., and Schafer, B. 2008, p. 8
  11. ^ Maxwell, K.T., and Schafer, B. 2007, p. 1
  12. ^ Saravanan, M. et al. 2009, p. 116
  13. ^ a b Maxwell, K.T., and Schafer, B. 2008, p. 2
  14. ^ a b Westlaw Research, http://www.westlaw.com
  15. ^ a b Lexis Research, http://www.lexisnexis.com
  16. ^ a b Maxwell, K.T., and Schafer, B. 2008, p. 3
  17. ^ Saravanan, M. et al. 2009, p. 116
  18. ^ Saravanan, M. et al. 2009, p. 103
  19. ^ Schweighofer, E. and Liebwald, D. 2007, p. 108
  20. ^ Maxwell, K.T., and Schafer, B. 2008, p. 4
  21. ^ a b Ashley, K.D. and Bruninghaus, S. 2009, p. 125
  22. ^ a b Gelbart, D. and Smith, J.C. 1993, p. 142
  23. ^ a b Ashley, K.D. and Bruninghaus, S. 2009, p. 159
  24. ^ Maxwell, K.T., and Schafer, B. 2009, p. 3
  25. ^ Maxwell, K.T., and Schafer, B. 2009, p. 9
  26. ^ Ashley, K.D. and Bruninghaus, S. 2009, p. 126
  27. ^ Loui, R. P., Norman, J., Altepeter, J., Pinkard, D., Craven, D., Lindsay, J., & Foltz, M. (1997, June). Progress on Room 5: A testbed for public interactive semi-formal legal argumentation. In Proceedings of the 6th international conference on Artificial intelligence and law (pp. 207-214). ACM.

References

  • Maxwell, K.T.; Schafer, B. (2008). "Concept and Context in Legal Information Retrieval". Frontiers in Artificial Intelligence and Applications. 189: 63–72. Retrieved 2009-11-07.
  • Jackson, P.; et al. (1998). "Information extraction from case law and retrieval of prior cases by partial parsing and query generation". Proceedings of the seventh international conference on Information and knowledge management. CIKM '98. ACM. pp. 60–67. doi:10.1145/288627.288642. ISBN 978-1581130614. S2CID 1268465. Retrieved 2009-11-07.
  • Blair, D.C.; Maron, M.E. (1985). "An evaluation of retrieval effectiveness for a full-text document-retrieval system". Communications of the ACM. 28 (3): 289–299. doi:10.1145/3166.3197. hdl:2027.42/35415. S2CID 5144091.
  • Peters, W.; et al. (2007). "The structuring of legal knowledge in LOIS". Artificial Intelligence and Law. 15 (2): 117–135. CiteSeerX 10.1.1.104.7469. doi:10.1007/s10506-007-9034-4. S2CID 2355864.
  • Saravanan, M.; et al. (2009). "Improving legal information retrieval using an ontological framework". Artificial Intelligence and Law. 17 (2): 101–124. doi:10.1007/s10506-009-9075-y. S2CID 8853001.
  • Schweighofer, E.; Liebwald, D. (2007). "Advanced lexical ontologies and hybrid knowledge based systems: First steps to a dynamic legal electronic commentary". Artificial Intelligence and Law. 15 (2): 103–115. doi:10.1007/s10506-007-9029-1. S2CID 80124.
  • Gelbart, D.; Smith, J.C. (1993). "Flexicon". Proceedings of the fourth international conference on Artificial intelligence and law - ICAIL '93. ACM. pp. 142–151. doi:10.1145/158976.158994. ISBN 978-0897916066. S2CID 18952317.
  • Ashley, K.D.; Bruninghaus, S. (2009). "Automatically classifying case texts and predicting outcomes". Artificial Intelligence and Law. 17 (2): 125–165. doi:10.1007/s10506-009-9077-9. S2CID 31791294.

This page was last edited on 8 August 2023, at 06:49
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc.