
RegexSpanFinder

Sean Finan edited this page Feb 9, 2026 · 1 revision

final public class RegexSpanFinder implements Closeable

Use this class to find text spans with regular expressions. It runs Matcher#find() (java.util.regex.Matcher) in a separate thread so that the search can be interrupted when a set timeout elapses. This prevents the effectively endless matching that can be caused by poorly built expressions or unexpected text content. The timeout is specified in milliseconds and must be between 100 and 10,000; the default is 1000 milliseconds. Large timeouts are discouraged: if a large amount of text needs to be parsed, it is better to split the text logically and use smaller timeouts.

Proper usage is:

try ( RegexSpanFinder finder = new RegexSpanFinder( "\\s+" ) ) {
   final List<Pair<Integer>> spans = finder.findSpans( "Hello World !" );
   // ... do something with the discovered spans ...
} catch ( IllegalArgumentException iaE ) {
   // ... handle the null or malformed regular expression ...
}
  • Author: SPF, chip-nlp
  • Version: %I%
  • Since: 11/5/2016
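
The timeout mechanism described above can be sketched with only the JDK. This is a minimal illustration under stated assumptions, not the cTAKES implementation: the class name TimedRegexSketch, the InterruptibleText wrapper, and the use of int[] {begin, end} in place of Pair<Integer> are all hypothetical. The wrapper exists because Matcher#find() does not check the thread's interrupt flag on its own, so the sketch checks it on every character read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class for illustration only; not part of cTAKES.
final class TimedRegexSketch implements AutoCloseable {

   // Wrapper that makes matching abort once the worker thread is interrupted.
   private static final class InterruptibleText implements CharSequence {
      private final CharSequence text;
      InterruptibleText( final CharSequence text ) { this.text = text; }
      @Override public char charAt( final int index ) {
         if ( Thread.currentThread().isInterrupted() ) {
            throw new IllegalStateException( "Regex matching interrupted" );
         }
         return text.charAt( index );
      }
      @Override public int length() { return text.length(); }
      @Override public CharSequence subSequence( final int start, final int end ) {
         return new InterruptibleText( text.subSequence( start, end ) );
      }
      @Override public String toString() { return text.toString(); }
   }

   private final ExecutorService executor = Executors.newSingleThreadExecutor();
   private final Pattern pattern;
   private final int timeoutMillis;

   TimedRegexSketch( final String regex, final int timeoutMillis ) {
      this.pattern = Pattern.compile( regex );
      this.timeoutMillis = timeoutMillis;
   }

   // Returns {begin, end} offsets; assumption: an empty list on timeout or failure.
   List<int[]> findSpans( final String text ) {
      final Future<List<int[]>> future = executor.submit( () -> {
         final List<int[]> spans = new ArrayList<>();
         final Matcher matcher = pattern.matcher( new InterruptibleText( text ) );
         while ( matcher.find() ) {
            spans.add( new int[]{ matcher.start(), matcher.end() } );
         }
         return spans;
      } );
      try {
         // Wait for the worker, but no longer than the timeout.
         return future.get( timeoutMillis, TimeUnit.MILLISECONDS );
      } catch ( TimeoutException | ExecutionException e ) {
         future.cancel( true );   // interrupts the worker thread
         return Collections.emptyList();
      } catch ( InterruptedException e ) {
         future.cancel( true );
         Thread.currentThread().interrupt();   // preserve the caller's interrupt status
         return Collections.emptyList();
      }
   }

   // Stops the worker thread so that it does not outlive the finder.
   @Override public void close() {
      executor.shutdownNow();
   }

   public static void main( final String[] args ) {
      try ( TimedRegexSketch finder = new TimedRegexSketch( "\\s+", 1000 ) ) {
         for ( int[] span : finder.findSpans( "Hello World !" ) ) {
            System.out.println( span[ 0 ] + "," + span[ 1 ] );   // prints 5,6 then 11,12
         }
      }
   }
}
```

Returning an empty list on timeout is a choice made for this sketch; the page does not state what findSpans returns when the timeout is reached.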

public RegexSpanFinder( final String regex ) throws IllegalArgumentException

Uses the default timeout of 1000 milliseconds

  • Parameters:
    • regex regular expression
  • Exceptions:
    • IllegalArgumentException if the regular expression is null or malformed

public RegexSpanFinder( final String regex, final int flags, final int timeoutMillis ) throws IllegalArgumentException

Compiles the regular expression with the given pattern flags and uses the given timeout

  • Parameters:
    • regex regular expression
    • flags pattern flags; CASE_INSENSITIVE, etc.
    • timeoutMillis milliseconds at which the regex match should abort, between 100 and 10000
  • Exceptions:
    • IllegalArgumentException if the regular expression is null or malformed
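
The flags parameter is an int, which suggests the bit-mask constants of java.util.regex.Pattern, combined with the bitwise OR operator. The sketch below shows only that JDK behavior; the class PatternFlagsSketch and its helper method are hypothetical, and it is an assumption that RegexSpanFinder passes the flags through to Pattern.compile unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of cTAKES.
final class PatternFlagsSketch {

   // Compiles the regex with the given flags and reports whether it occurs in the text.
   static boolean findsWithFlags( final String regex, final int flags, final String text ) {
      return Pattern.compile( regex, flags ).matcher( text ).find();
   }

   public static void main( final String[] args ) {
      final String text = "Exam: normal\nHISTORY: none";
      // CASE_INSENSITIVE ignores letter case; MULTILINE lets ^ match at each line start.
      final int flags = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
      System.out.println( findsWithFlags( "^history:", flags, text ) );   // prints true
      System.out.println( findsWithFlags( "^history:", 0, text ) );       // prints false
   }
}
```

A malformed expression makes Pattern.compile throw PatternSyntaxException, which is a subclass of IllegalArgumentException.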

public RegexSpanFinder( final String regex, final int timeoutMillis ) throws IllegalArgumentException

  • Parameters:
    • regex regular expression
    • timeoutMillis milliseconds at which the regex match should abort, between 100 and 10000
  • Exceptions:
    • IllegalArgumentException if the regular expression is null or malformed

public RegexSpanFinder( final Pattern pattern ) throws IllegalArgumentException

Uses the default timeout of 1000 milliseconds

  • Parameters:
    • pattern Pattern compiled from a regular expression
  • Exceptions:
    • IllegalArgumentException if the pattern is null or malformed

public RegexSpanFinder( final Pattern pattern, final int timeoutMillis ) throws IllegalArgumentException

Uses the given timeout

  • Parameters:
    • pattern Pattern compiled from a regular expression
    • timeoutMillis milliseconds at which the regex match should abort, between 100 and 10000
  • Exceptions:
    • IllegalArgumentException if the pattern is null or malformed

public List<Pair<Integer>> findSpans( final String text )

  • Parameters:
    • text text in which a find should be conducted
  • Returns: List of Integer Pairs representing text span begin and end offsets
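
When text is split into smaller pieces before searching, as the class description recommends, offsets found in each piece are relative to that piece and need shifting back to document coordinates. The sketch below illustrates the bookkeeping with a plain java.util.regex.Matcher standing in for findSpans. The class ChunkedSpanSketch, the line-based split, and the int[] {begin, end} representation are hypothetical, and it assumes offsets follow the Matcher#start() / Matcher#end() convention (end exclusive).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of cTAKES.
final class ChunkedSpanSketch {

   // Searches each line separately and returns {begin, end} offsets in the full text.
   static List<int[]> findSpansByLine( final Pattern pattern, final String text ) {
      final List<int[]> spans = new ArrayList<>();
      int lineStart = 0;
      while ( lineStart <= text.length() ) {
         int lineEnd = text.indexOf( '\n', lineStart );
         if ( lineEnd < 0 ) {
            lineEnd = text.length();
         }
         // Offsets reported by the matcher are relative to this line only.
         final Matcher matcher = pattern.matcher( text.substring( lineStart, lineEnd ) );
         while ( matcher.find() ) {
            // Shift by the line's start to get offsets in the full text.
            spans.add( new int[]{ lineStart + matcher.start(), lineStart + matcher.end() } );
         }
         lineStart = lineEnd + 1;
      }
      return spans;
   }

   public static void main( final String[] args ) {
      final String text = "BP 120/80\nHR 72\nTemp 98";
      for ( int[] span : findSpansByLine( Pattern.compile( "\\d+" ), text ) ) {
         // With an exclusive end offset, substring recovers the matched text.
         System.out.println( span[ 0 ] + "," + span[ 1 ] + " " + text.substring( span[ 0 ], span[ 1 ] ) );
      }
   }
}
```

Splitting on line breaks is only one possible choice; an expression that needs to match across a boundary would require a different split.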

@Override public void close()

Shuts down the executor

static private final class RegexCallable implements Callable<List<Pair<Integer>>>

Simple Callable that runs a Matcher on text to find text span begin and end offsets

@Override public List<Pair<Integer>> call()

  • Returns: text span begin and end offsets
