Ph.D. title
Extraction of information about events from domain documents
Status: in progress
Abstract:
Text is a natural way for humans to describe the surrounding world, especially the relations between entities and the changes in those relations over time (a change in the relation between entities is called an event). For instance, in the sentence “Marek Nowak, who is a chairman of Nowak PLC, was chosen as a businessman of the year.” there are two entities (PERSON:Marek Nowak and COMPANY:Nowak PLC) and a relation between them, chairman(PERSON, COMPANY). Information extraction is the automatic identification of entities, relations between entities, and events in text. Its goal is to identify relevant information (pieces of text) in a set of domain documents (for instance, reports from the stock exchange domain) that describe selected types of events (for instance, changes in a management board), and to present that information in a structured format that allows further processing at the level of named entities and relations.

Relevant information is identified with information extraction patterns: expressions in a formal language that identify pieces of text, distinguish chunks, and assign them an interpretation in the context of the information extraction task. The presentation will discuss selected problems with acquiring information extraction patterns for Polish and will explain why this task is more complex for Polish than for English.
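To make the notion of an extraction pattern concrete, here is a minimal sketch in Python that uses a regular expression as the formal language. It matches the example sentence above, marks the PERSON and COMPANY chunks with named groups, and interprets a match as the relation chairman(PERSON, COMPANY). The pattern, the `Relation` type, and the `extract_chairman` function are hypothetical illustrations, not the patterns developed in this thesis.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    """Structured output: relation name plus its two arguments."""
    name: str
    person: str
    company: str


# Hypothetical pattern: a capitalised multi-word person name, the fixed
# English phrase ", who is a chairman of ", then a company name ending in
# a legal-form suffix. Named groups distinguish the two entity chunks.
CHAIRMAN_PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+), who is a chairman of "
    r"(?P<company>(?:[A-Z][a-z]+ )+(?:PLC|Ltd|SA))"
)


def extract_chairman(text: str) -> list[Relation]:
    """Assign the interpretation chairman(PERSON, COMPANY) to every match."""
    return [
        Relation("chairman", m.group("person"), m.group("company"))
        for m in CHAIRMAN_PATTERN.finditer(text)
    ]


sentence = ("Marek Nowak, who is a chairman of Nowak PLC, "
            "was chosen as a businessman of the year.")
print(extract_chairman(sentence))
```

This sketch is tied to one English word order and one fixed surface form of each word, and its `[A-Z]`/`[a-z]` classes cover only ASCII letters, so names containing Polish characters such as Ł or ó would not match.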
Keywords:
information extraction, patterns, automatic pattern acquisition, natural language processing, Polish


