When dealing with large-scale document collections, traditional search engines show their limits in supporting a wide range of complex queries.
For example, while an organization may be interested in knowing all the units performing a specific activity and all the other units that benefit from that activity, a search engine can only retrieve the documents mentioning the target office and the specific (possibly ambiguous) actions.
Reading the retrieved texts remains a time-consuming manual activity. On the other hand, queries of this kind can easily be answered by a database, but the information required to populate such a transactional system is only reported in unstructured and heterogeneous documents.
In these scenarios, it is crucial to adopt techniques for automatically extracting knowledge from texts, since most of the valuable information is only implicit (or hidden) within them. The extracted information can be used to improve access to, and management of, the knowledge hidden in large text corpora. Entities such as persons, activities or organizations form the most basic units of information. Moreover, occurrences of entities in a sentence are often linked through well-defined relations; e.g., occurrences of a unit and an activity in a sentence may be linked through a relation such as unit-performs-activity.
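As a minimal sketch of why extracted relations make such queries easy, the snippet below stores entity-relation triples in memory and answers the question "which units perform, and which units benefit from, a given activity". Everything in it is an illustrative assumption rather than part of the system described here: the `Relation` type, the `benefits_from` predicate name, the `units_for` helper and the hand-written sample triples are all hypothetical, and the extraction step itself is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One extracted relation between two entity occurrences (hypothetical schema)."""
    subject: str    # e.g. an organizational unit
    predicate: str  # e.g. "performs" or "benefits_from" (assumed names)
    obj: str        # e.g. an activity


# Hand-written sample triples standing in for the output of an extraction step.
EXTRACTED = [
    Relation("Unit A", "performs", "maintenance"),
    Relation("Unit B", "benefits_from", "maintenance"),
    Relation("Unit C", "performs", "training"),
    Relation("Unit A", "benefits_from", "training"),
]


def units_for(relations, predicate, activity):
    """Return the sorted, de-duplicated units linked to `activity` by `predicate`."""
    return sorted(
        {r.subject for r in relations
         if r.predicate == predicate and r.obj == activity}
    )


performers = units_for(EXTRACTED, "performs", "maintenance")
beneficiaries = units_for(EXTRACTED, "benefits_from", "maintenance")
```

Once relations are available in this structured form, the query is a simple filter over triples, whereas a keyword search would only return documents that still have to be read.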
The automatic extraction of domain-specific knowledge supports the creation of semantic metadata related to activities and to concepts relevant to the Organization’s domain (e.g. events, locations and persons). Such metadata make it possible to:
- Track activities automatically
- Search for them in past archives
- Visualize the aggregated information in meaningful forms
- Navigate across such information ecosystems
- Target intelligent aggregation (knowledge) and analysis (decisions).