Knowledge extraction
{{short description|Creation of knowledge from structured and unstructured sources}}
'''Knowledge extraction''' is the creation of [[Knowledge representation and reasoning|knowledge]] from structured ([[relational databases]], [[XML]]) and unstructured ([[text (literary theory)|text]], documents, [[image]]s) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must [[Knowledge representation and reasoning|represent knowledge]] in a manner that facilitates inferencing. Although it is methodologically similar to [[information extraction]] ([[Natural language processing|NLP]]) and [[extract, transform, load|ETL]] (data warehousing), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a [[Database schema|relational schema]]. It requires either the reuse of existing [[Knowledge representation and reasoning|formal knowledge]] (reusing identifiers or [[ontologies]]) or the generation of a schema based on the source data.
The RDB2RDF W3C group is currently standardizing a language for the extraction of [[Resource Description Framework]] (RDF) data from [[relational databases]]. Another popular example of knowledge extraction is the transformation of Wikipedia into [[structured data]] and the mapping to existing [[Knowledge representation and reasoning|knowledge]] (see [[DBpedia]] and [[Freebase (database)|Freebase]]).
==Overview==
After the standardization of knowledge representation languages such as [[Resource Description Framework|RDF]] and [[Web Ontology Language|OWL]], much research has been conducted in the area, especially regarding transforming relational databases into RDF, [[identity resolution]], [[knowledge discovery]] and [[ontology learning]]. The general process uses traditional methods from [[information extraction]] and [[extract, transform, load|extract, transform, and load]] (ETL), which transform the data from the sources into structured formats.
The following criteria can be used to categorize approaches in this topic (some of them only account for extraction from relational databases):
{| class="wikitable"
|-
! Source
| Which data sources are covered: Text, Relational Databases, XML, CSV
|-
! Exposition
| How is the extracted knowledge made explicit (ontology file, semantic database)? How can you query it?
|-
! Synchronization
| Is the knowledge extraction process executed once to produce a dump, or is the result synchronized with the source? Static or dynamic. Are changes to the result written back (bi-directional)?
|-
! Reuse of vocabularies
| Is the tool able to reuse existing vocabularies in the extraction? For example, the table column 'firstName' can be mapped to foaf:firstName. Some automatic approaches are not capable of mapping vocabularies.
|-
! Automation
| The degree to which the extraction is assisted/automated: manual, GUI, semi-automatic, automatic.
|-
! Requires a domain ontology
| Is a pre-existing domain ontology needed to map to? Either a mapping is created or a schema is learned from the source ([[ontology learning]]).
|}
==Examples==
===Entity linking===
[[DBpedia Spotlight]], [[Calais (Reuters Product)|OpenCalais]], [http://dandelion.eu/datatxt/ Dandelion dataTXT]{{Dead link|date=March 2026 |bot=InternetArchiveBot }}, the Zemanta API, [http://www.extractiv.com/demo.html Extractiv] and [http://poolparty.biz/products/poolparty-extractor/ PoolParty Extractor] analyze free text via [[named-entity recognition]], then disambiguate candidates via [[Name resolution (semantics and text extraction)|name resolution]] and link the found entities to the [[DBpedia]] knowledge repository ([https://dandelion.eu/products/datatxt/nex/demo/?exec=true#results Dandelion dataTXT demo] {{Webarchive|url=https://web.archive.org/web/20131102190834/https://dandelion.eu/products/datatxt/nex/demo/?exec=true#results |date=2013-11-02 }} or [http://spotlight.dbpedia.org/rest/annotate?text=President%20Obama%20called%20Wednesday%20on%20Congress%20to%20extend%20a%20tax%20break%20for%20students%20included%20in%20last%20year%27s%20economic%20stimulus%20package,%20arguing%20that%20the%20policy%20provides%20more%20generous%20assistance.&confidence=0.2&support=20 DBpedia Spotlight web demo] or [http://poolparty.biz/demozone/?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Findex.php%3Ftitle%3DKnowledge_extraction%26printable%3Dyes&domain=ssw PoolParty Extractor Demo]).
:As President Obama is linked to a DBpedia [[LinkedData]] resource, further information can be retrieved automatically, and a [[Semantic Reasoner]] can, for example, infer that the mentioned entity is of the type [http://xmlns.com/foaf/0.1/Person Person] (using [[FOAF (software)]]) and of type [http://dbpedia.org/class/yago/PresidentsOfTheUnitedStates Presidents of the United States] (using [[YAGO (Ontology)|YAGO]]). Counterexamples: methods that only recognize entities or link to Wikipedia articles and other targets that do not provide further retrieval of structured data and formal knowledge.
===Relational databases to RDF===
[[Triplify]], D2R Server, [https://capsenta.com/#section-ultrawrap Ultrawrap] {{Webarchive|url=https://web.archive.org/web/20161127152526/https://capsenta.com/#section-ultrawrap |date=2016-11-27 }}, and [[Virtuoso Universal Server|Virtuoso]] RDF Views are tools that transform relational databases to RDF, allowing existing vocabularies and [[Ontology (information science)|ontologies]] to be reused during the conversion. When transforming a typical relational table named ''users'', one column (e.g. ''name'') or an aggregation of columns (e.g. ''first_name'' and ''last_name'') has to provide the URI of the created entity. Normally the primary key is used. Every other column can be extracted as a relation to this entity. Properties with formally defined semantics are then used (and reused) to interpret the information. For example, a column in a user table called ''marriedTo'' can be defined as a symmetric relation, and a column ''homepage'' can be converted to a property from the [[FOAF (software)|FOAF Vocabulary]] called [http://xmlns.com/foaf/spec/#term_homepage foaf:homepage], thus qualifying it as an [[Web Ontology Language#Properties|inverse functional property]]. Then each entry of the ''users'' table can be made an instance of the class [http://xmlns.com/foaf/spec/#term_Person foaf:Person] (ontology population). Additionally, [[domain knowledge]] (in the form of an ontology) could be created from the ''status_id'', either by manually created rules (if ''status_id'' is 2, the entry belongs to class Teacher) or by (semi-)automated methods ([[ontology learning]]). Here is an example transformation:
{| class="wikitable"
|-
! Name !! marriedTo !! homepage !! status_id
|-
| Peter || Mary || https://example.org/Peters_page{{Dead link|date=February 2020 |bot=InternetArchiveBot |fix-attempted=yes }} || 1
|-
| Claus || Eva || https://example.org/Claus_page{{Dead link|date=February 2020 |bot=InternetArchiveBot |fix-attempted=yes }} || 2
|}
<syntaxhighlight lang="turtle">
:Peter :marriedTo :Mary .
:marriedTo a owl:SymmetricProperty .
:Peter foaf:homepage <https://example.org/Peters_page> .
:Peter a foaf:Person .
:Peter a :Student .
:Claus a :Teacher .
</syntaxhighlight>
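The example transformation can be sketched in code. The following is a minimal, illustrative Python sketch: the empty ':' prefix and the ''status_id'' rule mirror the example above, while everything else (such as the helper name <code>row_to_triples</code>) is a hypothetical assumption, not part of any of the tools mentioned.

```python
# Minimal sketch of the example transformation above: rows of the
# relational 'users' table become RDF triples (as Turtle-like tuples).
# The ':' prefix and the status_id -> class rule are illustrative only.

ROWS = [
    {"name": "Peter", "marriedTo": "Mary",
     "homepage": "https://example.org/Peters_page", "status_id": 1},
    {"name": "Claus", "marriedTo": "Eva",
     "homepage": "https://example.org/Claus_page", "status_id": 2},
]

# Manually created rule: status_id 1 -> Student, 2 -> Teacher.
STATUS_CLASSES = {1: ":Student", 2: ":Teacher"}

def row_to_triples(row):
    subject = ":" + row["name"]  # the 'name' column provides the URI
    return [
        (subject, "a", "foaf:Person"),  # ontology population
        (subject, "a", STATUS_CLASSES[row["status_id"]]),
        (subject, ":marriedTo", ":" + row["marriedTo"]),
        (subject, "foaf:homepage", "<%s>" % row["homepage"]),
    ]

for row in ROWS:
    for s, p, o in row_to_triples(row):
        print(s, p, o, ".")
```

In a real tool, the subject IRI would be derived from the primary key rather than a name column, as noted above.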
== Extraction from structured sources to RDF ==
=== 1:1 Mapping from RDB Tables/Views to RDF Entities/Attributes/Values ===
When building an RDB representation of a problem domain, the starting point is frequently an entity-relationship diagram (ERD). Typically, each entity is represented as a database table, each attribute of the entity becomes a column in that table, and relationships between entities are indicated by foreign keys. Each table typically defines a particular class of entity, each column one of its attributes. Each row in the table describes an entity instance, uniquely identified by a primary key. The table rows collectively describe an entity set. In an equivalent RDF representation of the same entity set:
* Each column in the table is an attribute (i.e., predicate)
* Each column value is an attribute value (i.e., object)
* Each row key represents an entity ID (i.e., subject)
* Each row represents an entity instance
* Each row (entity instance) is represented in RDF by a collection of triples with a common subject (entity ID).
So, to render an equivalent view based on RDF semantics, the basic mapping algorithm would be as follows:
# create an RDFS class for each table
# convert all primary keys and foreign keys into IRIs
# assign a predicate IRI to each column
# assign an rdf:type predicate for each row, linking it to an RDFS class IRI corresponding to the table
# for each column that is part of neither a primary nor a foreign key, construct a triple containing the primary key IRI as the subject, the column IRI as the predicate and the column's value as the object.
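Under the assumption of a simple base IRI and dictionary-shaped rows (both illustrative conventions, not prescribed by any standard), the basic mapping algorithm above can be sketched as:

```python
# Sketch of the direct mapping: one RDFS class per table, one IRI per
# column, one rdf:type triple per row, and one triple per non-key column.
BASE = "http://example.org/db/"  # hypothetical base IRI

def direct_mapping(table_name, primary_key, columns, rows):
    cls = BASE + table_name  # RDFS class IRI for the table
    triples = []
    for row in rows:
        # primary key value -> entity IRI (the triple subject)
        subject = "%s%s/%s" % (BASE, table_name, row[primary_key])
        triples.append((subject, "rdf:type", cls))
        for col in columns:
            if col == primary_key:
                continue
            # column IRI as predicate, column value as object
            triples.append((subject, BASE + col, row[col]))
    return triples

triples = direct_mapping(
    "users", "id", ["id", "name"],
    [{"id": 1, "name": "Peter"}, {"id": 2, "name": "Claus"}],
)
for t in triples:
    print(t)
```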
An early mention of this basic or direct mapping can be found in [[Tim Berners-Lee]]'s comparison of the [[Entity-relationship model|ER model]] to the RDF model.
=== Complex mappings of relational databases to RDF ===
The 1:1 mapping mentioned above exposes the legacy data as RDF in a straightforward way; additional refinements can be employed to improve the usefulness of the RDF output with respect to the given use cases. Normally, information is lost during the transformation of an entity-relationship diagram (ERD) to relational tables (details can be found in [[object-relational impedance mismatch]]) and has to be [[Reverse Engineering|reverse engineered]]. From a conceptual view, approaches for extraction can come from two directions. The first direction tries to extract or learn an OWL schema from the given database schema. Early approaches used a fixed number of manually created mapping rules to refine the 1:1 mapping. More elaborate methods employ heuristics or learning algorithms to induce schematic information (these methods overlap with [[ontology learning]]). While some approaches try to extract the information from the structure inherent in the SQL schema (analysing, e.g., foreign keys), others analyse the content and the values in the tables to create conceptual hierarchies (e.g., a column with few distinct values is a candidate for becoming a category). The second direction tries to map the schema and its contents to a pre-existing domain ontology (see also: [[ontology alignment]]). Often, however, a suitable domain ontology does not exist and has to be created first.
=== XML ===
As XML is structured as a tree and RDF is structured as a graph, any XML data can be represented in RDF. [http://rhizomik.net/html/redefer/xml2rdf/ XML2RDF] is one example of an approach that uses RDF blank nodes and transforms XML elements and attributes to RDF properties. The topic, however, is more complex than in the case of relational databases. In a relational table the primary key is an ideal candidate for becoming the subject of the extracted triples. An XML element, however, can be transformed, depending on the context, into the subject, the predicate or the object of a triple. [[XSLT]] can be used as a standard transformation language to manually convert XML to RDF.
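As a rough illustration of the blank-node approach, the following Python sketch walks an XML tree with the standard library and emits triples. The predicate naming (element and attribute names used directly, rdf:value for text content) is a simplifying assumption for illustration, not the actual XML2RDF algorithm.

```python
# Sketch: walk an XML tree and emit triples, using generated blank-node
# identifiers (_:bN) for elements, in the spirit of blank-node-based
# XML-to-RDF approaches. Predicate naming is an illustrative assumption.
import itertools
import xml.etree.ElementTree as ET

counter = itertools.count()

def element_to_triples(elem, triples):
    node = "_:b%d" % next(counter)  # blank node for this element
    for name, value in elem.attrib.items():
        triples.append((node, name, value))  # attributes -> properties
    text = (elem.text or "").strip()
    if text:
        triples.append((node, "rdf:value", text))  # text content
    for child in elem:
        child_node = element_to_triples(child, triples)
        triples.append((node, child.tag, child_node))  # nesting -> property
    return node

root = ET.fromstring('<user id="1"><name>Peter</name></user>')
triples = []
element_to_triples(root, triples)
print(triples)
```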
=== Survey of methods / tools ===
{| class="wikitable sortable"
|-
! Name !! Data Source !! Data Exposition !! Data Synchronisation !! Mapping Language !! Vocabulary Reuse !! Mapping Automat. !! Req. Domain Ontology !! Uses GUI
|-
| [http://www.w3.org/TR/rdb-direct-mapping/ A Direct Mapping of Relational Data to RDF] || Relational Data || SPARQL/ETL || dynamic || {{N/A}} || false || automatic || false || false
|-
| [http://logd.tw.rpi.edu/technology/csv2rdf4lod CSV2RDF4LOD] || CSV || ETL || static || RDF || true || manual || false || false
|-
|[https://github.com/acoli-repo/conll-rdf CoNLL-RDF]
|TSV, CoNLL
|SPARQL/ RDF stream
|static
|none
|true
|automatic (domain-specific, for use cases in language technology, preserves relations between rows)
|false
|false
|-
| [http://www.mindswap.org/~mhgrove/ConvertToRDF/ Convert2RDF] || Delimited text file || ETL || static || RDF/DAML || true || manual || false || true
|-
| [http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/ D2R Server] || RDB || SPARQL || bi-directional || D2R Map || true || manual || false || false
|-
| [https://web.archive.org/web/20090428013624/http://ccnt.zju.edu.cn/projects/dartgrid/ DartGrid] || RDB || own query language || dynamic || Visual Tool || true || manual || false || true
|-
| [http://protegewiki.stanford.edu/wiki/DataMaster DataMaster] || RDB || ETL || static || proprietary || true || manual || true || true
|-
| [https://web.archive.org/web/20120621115715/http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/ Google Refine's RDF Extension] || CSV, XML || ETL || static || {{CNone|none}} || || semi-automatic || false || true
|-
| [https://web.archive.org/web/20170718122006/https://kwarc.info/projects/krextor/ Krextor] || XML || ETL || static || xslt || true || manual || true || false
|-
| [http://www.cs.toronto.edu/semanticweb/maponto/ MAPONTO] || RDB || ETL || static || proprietary || true || manual || true || false
|-
| [https://metamorphoses.sourceforge.net/ METAmorphoses] || RDB || ETL || static || proprietary xml based mapping language || true || manual || false || true
|-
| [https://web.archive.org/web/20110723042155/http://protege.cim3.net/cgi-bin/wiki.pl?MappingMaster MappingMaster] || CSV || ETL || static || MappingMaster || true || GUI || false || true
|-
| [https://web.archive.org/web/20160304121216/http://neon-toolkit.org/wiki/ODEMapster ODEMapster] || RDB || ETL || static || proprietary || true || manual || true || true
|-
| [https://web.archive.org/web/20110724231333/http://aksw.org/Projects/Stats2RDF OntoWiki CSV Importer Plug-in - DataCube & Tabular] || CSV || ETL || static || The RDF Data Cube Vocabulary || true || semi-automatic || false || true
|-
| [http://poolparty.biz/products/poolparty-extractor/ Poolparty Extraktor (PPX)] || XML, Text || LinkedData || dynamic || RDF (SKOS) || true || semi-automatic || true || false
|-
| [https://web.archive.org/web/20160816225339/http://tao-project.eu/researchanddevelopment/demosanddownloads/RDBToOnto.html RDBToOnto] || RDB || ETL || static || {{CNone|none}} || false || automatic (the user can additionally fine-tune the results) || false || true
|-
| [http://ebiquity.umbc.edu/project/html/id/82/RDF123 RDF 123] || CSV || ETL || static || false || false || manual || false || true
|-
| [https://sourceforge.net/projects/rdote/ RDOTE] || RDB || ETL || static || SQL || true || manual || true || true
|-
| [https://sourceforge.net/projects/relational-owl/ Relational.OWL] || RDB || ETL || static || {{CNone|none}} || false || automatic || false || false
|-
| [http://ebiquity.umbc.edu/paper/html/id/480/T2LD-An-automatic-framework-for-extracting-interpreting-and-representing-tables-as-Linked-Data T2LD] || CSV || ETL || static || false || false || automatic || false || false
|-
| [https://web.archive.org/web/20110630083409/http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html The RDF Data Cube Vocabulary] || Multidimensional statistical data in spreadsheets || || || Data Cube Vocabulary || true || manual || false ||
|-
| [https://web.archive.org/web/20110717074012/http://www.topquadrant.com/products/TB_Composer.html TopBraid Composer] || CSV || ETL || static || SKOS || false || semi-automatic || false || true
|-
| [http://triplify.org Triplify] || RDB || LinkedData || dynamic || SQL || true || manual || false || false
|-
| [https://capsenta.com/#section-ultrawrap Ultrawrap] {{Webarchive|url=https://web.archive.org/web/20161127152526/https://capsenta.com/#section-ultrawrap |date=2016-11-27 }} || RDB || SPARQL/ETL || dynamic || [[R2RML]] || true || semi-automatic || false || true
|-
| [http://virtuoso.openlinksw.com Virtuoso RDF Views] || RDB || SPARQL || dynamic || Meta Schema Language || true || semi-automatic || false || true
|-
| [http://virtuoso.openlinksw.com Virtuoso Sponger] || structured and semi-structured data sources || SPARQL || dynamic || Virtuoso PL & XSLT || true || semi-automatic || false || false
|-
| [https://web.archive.org/web/20130514044520/http://www.cn.ntua.gr/~nkons/essays_en.html#t VisAVis] || RDB || RDQL || dynamic || SQL || true || manual || true || true
|-
| [https://xlwrap.sourceforge.net/ XLWrap: Spreadsheet to RDF] || CSV || ETL || static || TriG Syntax || true || manual || false || false
|-
| [http://rhizomik.net/html/redefer/#XML2RDF XML to RDF] || XML || ETL || static || false || false || automatic || false || false
|}
==Extraction from natural language sources==
The largest portion of information contained in business documents (about 80%) is encoded in natural language and is therefore unstructured. Because [[unstructured data]] is a challenge for knowledge extraction, more sophisticated methods are required, which generally tend to yield worse results than those for structured data. The potential for a massive acquisition of extracted knowledge, however, should compensate for the increased complexity and decreased quality of extraction. In the following, natural language sources are understood as sources of information where the data is given in an unstructured fashion as plain text. If the given text is additionally embedded in a markup document (e.g. an HTML document), the mentioned systems normally remove the markup elements automatically.
=== Linguistic annotation / natural language processing (NLP) ===
{{Main|Natural language processing|Linguistic Annotation}}
As a preprocessing step to knowledge extraction, it can be necessary to perform linguistic annotation with one or more [[Natural language processing|NLP]] tools. Individual modules in an NLP workflow normally build on tool-specific formats for input and output, but in the context of knowledge extraction, structured formats for representing linguistic annotations have been applied.
Typical NLP tasks relevant to knowledge extraction include:
* [[part-of-speech tagging|part-of-speech (POS) tagging]]
* lemmatization (LEMMA) or stemming (STEM)
* [[word sense disambiguation]] (WSD, related to semantic annotation below)
* named entity recognition (NER, also see IE below)
* syntactic parsing, often adopting syntactic dependencies (DEP)
* shallow syntactic parsing (CHUNK): if performance is an issue, chunking yields a fast extraction of nominal and other phrases
* [[anaphor resolution]] (see coreference resolution in IE below, but seen here as the task to create links between textual mentions rather than between the mention of an entity and an abstract representation of the entity)
* [[semantic role labelling]] (SRL, related to relation extraction; not to be confused with semantic annotation as described below)
* discourse parsing (relations between different sentences, rarely used in real-world applications)
In NLP, such data is typically represented in TSV formats (CSV formats with TAB as separators), often referred to as CoNLL formats. For knowledge extraction workflows, RDF views on such data have been created in accordance with the following community standards:
* NLP Interchange Format (NIF, for many frequent types of annotation){{Cite web|title=NLP Interchange Format (NIF) 2.0 - Overview and Documentation|url=https://persistence.uni-leipzig.org/nlp2rdf/|access-date=2020-06-05|website=persistence.uni-leipzig.org}}{{Cite book|last1=Hellmann|first1=Sebastian|last2=Lehmann|first2=Jens|last3=Auer|first3=Sören|last4=Brümmer|first4=Martin|date=2013|editor-last=Alani|editor-first=Harith|editor2-last=Kagal|editor2-first=Lalana|editor3-last=Fokoue|editor3-first=Achille|editor4-last=Groth|editor4-first=Paul|editor5-last=Biemann|editor5-first=Chris|editor6-last=Parreira|editor6-first=Josiane Xavier|editor7-last=Aroyo|editor7-first=Lora|editor8-last=Noy|editor8-first=Natasha|editor9-last=Welty|editor9-first=Chris|chapter=Integrating NLP Using Linked Data|title=The Semantic Web – ISWC 2013|series=Lecture Notes in Computer Science|volume=7908|language=en|location=Berlin, Heidelberg|publisher=Springer|pages=98–113|doi=10.1007/978-3-642-41338-4_7|isbn=978-3-642-41338-4|doi-access=free}}
* [[Web Annotation]] (WA, often used for entity linking){{Cite journal |last1=Verspoor |first1=Karin |author-link=Karin Verspoor |last2=Livingston |first2=Kevin |date=July 2012 |title=Towards Adaptation of Linguistic Annotations to Scholarly Annotation Formalisms on the Semantic Web |url=https://www.aclweb.org/anthology/W12-3610 |journal=Proceedings of the Sixth Linguistic Annotation Workshop |location=Jeju, Republic of Korea |publisher=Association for Computational Linguistics |pages=75–84}}
* CoNLL-RDF (for annotations originally represented in TSV formats){{Citation|title=acoli-repo/conll-rdf|date=2020-05-27|url=https://github.com/acoli-repo/conll-rdf|publisher=ACoLi|access-date=2020-06-05}}{{Cite book|last1=Chiarcos|first1=Christian|last2=Fäth|first2=Christian|date=2017|editor-last=Gracia|editor-first=Jorge|editor2-last=Bond|editor2-first=Francis|editor3-last=McCrae|editor3-first=John P.|editor4-last=Buitelaar|editor4-first=Paul|editor5-last=Chiarcos|editor5-first=Christian|editor6-last=Hellmann|editor6-first=Sebastian|chapter=CoNLL-RDF: Linked Corpora Done in an NLP-Friendly Way|chapter-url=https://link.springer.com/chapter/10.1007/978-3-319-59888-8_6|title=Language, Data, and Knowledge|series=Lecture Notes in Computer Science|volume=10318|language=en|location=Cham|publisher=Springer International Publishing|pages=74–88|doi=10.1007/978-3-319-59888-8_6|isbn=978-3-319-59888-8}}
Other, platform-specific formats include
* LAPPS Interchange Format (LIF, used in the LAPPS Grid){{Cite book|last1=Verhagen|first1=Marc|last2=Suderman|first2=Keith|last3=Wang|first3=Di|last4=Ide|first4=Nancy|last5=Shi|first5=Chunqi|last6=Wright|first6=Jonathan|last7=Pustejovsky|first7=James|date=2016|editor-last=Murakami|editor-first=Yohei|editor2-last=Lin|editor2-first=Donghui|chapter=The LAPPS Interchange Format|chapter-url=https://link.springer.com/chapter/10.1007/978-3-319-31468-6_3|title=Worldwide Language Service Infrastructure|series=Lecture Notes in Computer Science|volume=9442|language=en|location=Cham|publisher=Springer International Publishing|pages=33–47|doi=10.1007/978-3-319-31468-6_3|isbn=978-3-319-31468-6}}{{Cite web|title=The Language Application Grid {{!}} A web service platform for natural language processing development and research|url=http://www.lappsgrid.org/|access-date=2020-06-05|language=en-US}}
* NLP Annotation Format (NAF, used in the NewsReader workflow management system){{Citation|title=newsreader/NAF|date=2020-05-25|url=https://github.com/newsreader/NAF|publisher=NewsReader|access-date=2020-06-05}}{{Cite journal|last1=Vossen|first1=Piek|last2=Agerri|first2=Rodrigo|last3=Aldabe|first3=Itziar|last4=Cybulska|first4=Agata|last5=van Erp|first5=Marieke|last6=Fokkens|first6=Antske|last7=Laparra|first7=Egoitz|last8=Minard|first8=Anne-Lyse|last9=Palmero Aprosio|first9=Alessio|last10=Rigau|first10=German|last11=Rospocher|first11=Marco|date=2016-10-15|title=NewsReader: Using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news|journal=Knowledge-Based Systems|language=en|volume=110|pages=60–85|doi=10.1016/j.knosys.2016.07.013|issn=0950-7051|doi-access=free|hdl=1871.1/cbc310a9-677c-42ce-a84b-c59ebb344cc3|hdl-access=free}}
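As a minimal illustration of the TSV/CoNLL convention mentioned above, the following Python sketch reads a tab-separated fragment into per-token annotation records. The three-column layout (WORD, POS, LEMMA) is an assumed simplification; real CoNLL dialects differ in their column inventories.

```python
# Sketch: parse a simplified CoNLL-style TSV fragment into per-token
# annotation dictionaries. Columns are an illustrative assumption.
CONLL = """\
Obama\tNNP\tObama
called\tVBD\tcall
Congress\tNNP\tCongress
"""

def parse_conll(text, columns=("WORD", "POS", "LEMMA")):
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate sentences in CoNLL formats
        fields = line.split("\t")
        sentence.append(dict(zip(columns, fields)))
    return sentence

tokens = parse_conll(CONLL)
print(tokens)
```

A converter such as CoNLL-RDF would then turn each such token record into RDF triples, preserving the relations between rows.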
===Traditional information extraction (IE)===
Traditional [[information extraction]] is a technology of natural language processing, which extracts information from typically natural language texts and structures it in a suitable manner. The kinds of information to be identified must be specified in a model before beginning the process, which is why the whole process of traditional information extraction is domain dependent. IE is split into the following five subtasks.
* [[Named-entity recognition|Named entity recognition]] (NER)
* [[Coreference|Coreference resolution]] (CO)
* Template element construction (TE)
* Template relation construction (TR)
* Template scenario production (ST)
The task of [[named entity recognition]] is to recognize and categorize all named entities contained in a text (assignment of a named entity to a predefined category). This works by applying grammar-based methods or statistical models.
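A toy sketch of the grammar-based flavour: the following Python fragment combines a capitalization pattern with a small gazetteer. Both the pattern and the lexicon are illustrative assumptions, not a real NER system.

```python
# Toy rule-based NER: match capitalized tokens and assign each one a
# predefined category from a small gazetteer (illustrative only).
import re

GAZETTEER = {"IBM": "Organization", "Obama": "Person", "Europe": "Location"}

def toy_ner(text):
    entities = []
    for match in re.finditer(r"\b[A-Z][A-Za-z]*\b", text):  # capitalized tokens
        category = GAZETTEER.get(match.group())
        if category:
            entities.append((match.group(), category))
    return entities

print(toy_ner("Obama visited IBM Europe."))
```

Statistical NER systems replace the hand-written pattern and gazetteer with models learned from annotated corpora.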
Coreference resolution identifies equivalent entities that were recognized by NER within a text. There are two relevant kinds of equivalence relationship: the first relates two different representations of the same entity (e.g. IBM Europe and IBM), the second an entity and its [[Anaphora (linguistics)|anaphoric references]] (e.g. it and IBM). Both kinds can be recognized by coreference resolution.
During template element construction the IE system identifies descriptive properties of entities recognized by NER and CO. These properties correspond to ordinary qualities like red or big.
Template relation construction identifies relations that exist between the template elements. These relations can be of several kinds, such as works-for or located-in, with the restriction that both domain and range correspond to entities.
During template scenario production, events described in the text are identified and structured with respect to the entities recognized by NER and CO and the relations identified by TR.
=== Ontology-based information extraction (OBIE) ===
Ontology-based information extraction is a subfield of information extraction in which at least one [[Ontology (information science)|ontology]] is used to guide the process of information extraction from natural language text. The OBIE system uses methods of traditional information extraction to identify [[concept]]s, instances and relations of the used ontologies in the text, which are structured into an ontology after the process. Thus, the input ontologies constitute the model of the information to be extracted.{{cite journal | last1 = Chicco | first1 = D | last2 = Masseroli | first2 = M | year = 2016 | title = Ontology-based prediction and prioritization of gene functional annotations | journal = IEEE/ACM Transactions on Computational Biology and Bioinformatics | volume = 13 | issue = 2 | pages = 248–260 | doi=10.1109/TCBB.2015.2459694 | pmid = 27045825 | s2cid = 2795344 | url = https://doi.org/10.1109/TCBB.2015.2459694| url-access = subscription }}
===Ontology learning (OL)===
{{Main|Ontology learning}}
Ontology learning is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms from natural language text. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.
===Semantic annotation (SA)===
During semantic annotation, natural language text is augmented with metadata (often represented in [[RDFa]]), which should make the semantics of the contained terms machine-understandable. In this generally semi-automatic process, knowledge is extracted in the sense that a link is established between lexical terms and, for example, concepts from ontologies. Thus it becomes known which meaning of a term was intended in the processed context, and therefore the meaning of the text is grounded in [[machine-readable data]] with the ability to draw inferences. Semantic annotation is typically split into the following two subtasks.
* [[Terminology extraction]]
* [[Entity linking]]
At the terminology extraction level, lexical terms are extracted from the text. For this purpose a tokenizer first determines the word boundaries and resolves abbreviations. Afterwards, terms from the text that correspond to a concept are extracted with the help of a domain-specific lexicon, to be linked during entity linking.
In entity linking a link is established between the extracted lexical terms from the source text and the concepts from an ontology or knowledge base such as [[DBpedia]]. For this, candidate concepts are detected for the several meanings of a term with the help of a lexicon. Finally, the context of the terms is analyzed to determine the most appropriate disambiguation and to assign the term to the correct concept.
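Both subtasks can be sketched together. In the following Python sketch, the lexicon of candidate concepts and the context-overlap disambiguation heuristic are illustrative assumptions; production systems use far richer lexicons and statistical disambiguation.

```python
# Sketch of terminology extraction + entity linking: tokenize, look up
# candidate concepts in a lexicon, then pick the candidate whose context
# words overlap most with the surrounding text (toy disambiguation).
LEXICON = {
    "Paris": [
        {"concept": "dbpedia:Paris", "context": {"france", "city", "capital"}},
        {"concept": "dbpedia:Paris_(mythology)", "context": {"troy", "greek"}},
    ],
}

def link_entities(text):
    # crude tokenizer: split on whitespace, strip punctuation, lowercase
    words = {w.strip(".,").lower() for w in text.split()}
    links = {}
    for term, candidates in LEXICON.items():
        if term.lower() in words:
            # choose the candidate with the largest context overlap
            best = max(candidates, key=lambda c: len(c["context"] & words))
            links[term] = best["concept"]
    return links

print(link_entities("Paris is the capital of France."))
```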
Note that "semantic annotation" in the context of knowledge extraction is not to be confused with [[semantic parsing]] as understood in natural language processing (also referred to as "semantic annotation"): semantic parsing aims at a complete, machine-readable representation of natural language, whereas semantic annotation in the sense of knowledge extraction tackles only a very elementary aspect of it.
===Tools===
The following criteria can be used to categorize tools that extract knowledge from natural language text.
{| class="wikitable"
|-
! Source
| Which input formats can be processed by the tool (e.g. plain text, HTML or PDF)?
|-
! Access Paradigm
| Can the tool query the data source, or does it require a whole dump for the extraction process?
|-
! Data Synchronization
| Is the result of the extraction process synchronized with the source?
|-
! Uses Output Ontology
| Does the tool link the result with an ontology?
|-
! Mapping Automation
| How automated is the extraction process (manual, semi-automatic or automatic)?
|-
! Requires Ontology
| Does the tool need an ontology for the extraction?
|-
! Uses GUI
| Does the tool offer a graphical user interface?
|-
! Approach
| Which approach (IE, OBIE, OL or SA) is used by the tool?
|-
! Extracted Entities
| Which types of entities (e.g. named entities, concepts or relationships) can be extracted by the tool?
|-
! Applied Techniques
| Which techniques are applied (e.g. NLP, statistical methods, clustering or [[machine learning]])?
|-
! Output Model
| Which model is used to represent the result of the tool (e.g. RDF or OWL)?
|-
! Supported Domains
| Which domains are supported (e.g. economy or biology)?
|-
! Supported Languages
| Which languages can be processed (e.g. English or German)?
|}
The following table characterizes some tools for Knowledge Extraction from natural language sources.
{| class="wikitable sortable" |- ! Name !! Source !! Access Paradigm !! Data Synchronization !! Uses Output Ontology !! Mapping Automation !! Requires Ontology !! Uses GUI !! Approach !! Extracted Entities !! Applied Techniques !! Output Model !! Supported Domains !! Supported Languages |- | [http://www.rocketsoftware.com] || plain text, HTML, XML, SGML || dump || no || yes || automatic || yes || yes || IE || named entities, relationships, events || linguistic rules || proprietary || domain-independent || English, Spanish, Arabic, Chinese, indonesian |- | [https://web.archive.org/web/20160513114853/http://www.alchemyapi.com/api AlchemyAPI] || plain text, HTML || || || || automatic || || yes || SA || || || || || multilingual |- | [http://gate.ac.uk/sale/tao/splitch6.html#chap:annie ANNIE] || plain text || dump || || || || yes || yes || IE || || finite state algorithms || || || multilingual |- | [http://www-ai.ijs.si/~ilpnet2/systems/asium.html ASIUM] || plain text || dump || || || semi-automatic || || yes || OL || concepts, concept hierarchy || NLP, clustering || || || |- | [https://web.archive.org/web/20120711232021/http://www.attensity.com/products/technology/semantic-server/exhaustive-extraction/ Attensity Exhaustive Extraction] || || || || || automatic || || || IE || named entities, relationships, events || NLP || || || |- | [https://dandelion.eu/ Dandelion API] || plain text, HTML, URL || REST || no || no || automatic || no || yes || SA || named entities, concepts || statistical methods || JSON || domain-independent || multilingual |- | [https://web.archive.org/web/20120712015122/http://dbpedia.org/spotlight DBpedia Spotlight] || plain text, HTML || dump, SPARQL || yes || yes || automatic || no || yes || SA || annotation to each word, annotation to non-stopwords || NLP, statistical methods, machine learning || RDFa || domain-independent || English |- | [http://entityclassifier.eu EntityClassifier.eu] || plain text, HTML || dump || yes || yes || automatic || no 
|| yes || IE, OL, SA || annotation to each word, annotation to non-stopwords || rule-based grammar || XML || domain-independent || English, German, Dutch |- | [http://wit.istc.cnr.it/stlab-tools/fred/ FRED] || plain text || dump, REST API || yes || yes || automatic || no || yes || IE, OL, SA, ontology design patterns, [[frame semantics (linguistics)|frame semantics]] || (multi-)word NIF or EarMark annotation, predicates, instances, compositional semantics, concept taxonomies, frames, semantic roles, periphrastic relations, events, modality, tense, entity linking, event linking, sentiment || NLP, machine learning, heuristic rules || RDF/OWL || domain-independent || English, other languages via translation |- | [http://idocument.opendfki.de iDocument] || HTML, PDF, DOC || SPARQL || || yes || || || yes || OBIE || instances, property values || NLP || || personal, business || |- | [http://www.netowl.com/ NetOwl Extractor] || plain text, HTML, XML, SGML, PDF, MS Office || dump || no || yes || automatic || yes || yes || IE || named entities, relationships, events || NLP || XML, JSON, RDF-OWL, others || multiple domains || English, Arabic, Chinese (Simplified and Traditional), French, Korean, Persian (Farsi and Dari), Russian, Spanish |- | [http://ontogen.ijs.si OntoGen] {{Webarchive|url=https://web.archive.org/web/20100330060600/http://ontogen.ijs.si/ |date=2010-03-30 }} || || || || || semi-automatic || || yes || OL || concepts, concept hierarchy, non-taxonomic relations, instances || NLP, machine learning, clustering || || || |- | [http://wwwusers.di.uniroma1.it/~velardi/CL.pdf OntoLearn] {{Webarchive|url=https://web.archive.org/web/20170809104810/http://wwwusers.di.uniroma1.it/~velardi/CL.pdf |date=2017-08-09 }} || plain text, HTML || dump || no || yes || automatic || yes || no || OL || concepts, concept hierarchy, instances || NLP, statistical methods || proprietary || domain-independent || English |- | 
[http://wwwusers.di.uniroma1.it/~navigli/pubs/IJCAI_2011_Navigli_Velardi_Faralli.pdf OntoLearn Reloaded] || plain text, HTML || dump || no || yes || automatic || yes || no || OL || concepts, concept hierarchy, instances || NLP, statistical methods || proprietary || domain-independent || English |- | [http://turing.cs.washington.edu/papers/iswc2006McDowell-final.pdf OntoSyphon] || HTML, PDF, DOC || dump, search engine queries || no || yes || automatic || yes || no || OBIE || concepts, relations, instances || NLP, statistical methods || RDF || domain-independent || English |- | [http://ieg.ifs.tuwien.ac.at/projects/ontox ontoX] {{Webarchive|url=https://web.archive.org/web/20160527000719/http://ieg.ifs.tuwien.ac.at/projects/ontox/ |date=2016-05-27 }} || plain text || dump || no || yes || semi-automatic || yes || no || OBIE || instances, datatype property values || heuristic-based methods || proprietary || domain-independent || language-independent |- | [http://www.opencalais.com/ OpenCalais] || plain text, HTML, XML || dump || no || yes || automatic || yes || no || SA || annotation to entities, annotation to events, annotation to facts || NLP, machine learning || RDF || domain-independent || English, French, Spanish |- | [http://www.semantic-web.at/de/poolparty-extractor PoolParty Extractor] || plain text, HTML, DOC, ODT || dump || no || yes || automatic || yes || yes || OBIE || named entities, concepts, relations, concepts that categorize the text, enrichments || NLP, machine learning, statistical methods || RDF, OWL || domain-independent || English, German, Spanish, French |- | [http://www.rosoka.com/ Rosoka] || plain text, HTML, XML, SGML, PDF, MS Office || dump || yes || yes || automatic || no || yes || IE || named entity extraction, entity resolution, relationship extraction, attributes, concepts, multi-vector [[sentiment analysis]], geotagging, [[language identification]] || NLP, machine learning || XML, JSON, POJO, RDF || multiple domains || Multilingual 200+ 
Languages |- | [https://github.com/benjamin-adrian/scoobie SCOOBIE] || plain text, HTML || dump || no || yes || automatic || no || no || OBIE || instances, property values, RDFS types || NLP, machine learning || RDF, RDFa || domain-independent || English, German |- | [http://www2003.org/cdrom/papers/refereed/p831/p831-dill.html SemTag] || HTML || dump || no || yes || automatic || yes || no || SA || || machine learning || database record || domain-independent || language-independent |- | [http://www.insiders-technologies.de/produkte/smart-produkte/smart-fix/ smart FIX] {{Webarchive|url=https://web.archive.org/web/20160517163815/http://www.insiders-technologies.de/produkte/smart-produkte/smart-fix/ |date=2016-05-17 }} || plain text, HTML, PDF, DOC, e-mail || dump || yes || no || automatic || no || yes || OBIE || named entities || NLP, machine learning || proprietary || domain-independent || English, German, French, Dutch, Polish |- | [https://code.google.com/p/text2onto/ Text2Onto] || plain text, HTML, PDF || dump || yes || no || semi-automatic || yes || yes || OL || concepts, concept hierarchy, non-taxonomic relations, instances, axioms || NLP, statistical methods, machine learning, rule-based methods || OWL || domain-independent || English, German, Spanish |- | [https://texttoonto.sourceforge.net/ Text-To-Onto] || plain text, HTML, PDF, PostScript || dump || || || semi-automatic || yes || yes || OL || concepts, concept hierarchy, non-taxonomic relations, lexical entities referring to concepts, lexical entities referring to relations || NLP, machine learning, clustering, statistical methods || || || German |- | [http://www.thatneedle.com/nlp-api.html ThatNeedle] || plain text || dump || || || automatic || || no || || concepts, relations, hierarchy || NLP, proprietary || JSON || multiple domains || English |- | [https://web.archive.org/web/20120719171047/http://thewikimachine.fbk.eu/html/index.html The Wiki Machine] || plain text, HTML, PDF, DOC || dump || no || yes || automatic || yes || 
yes || SA || annotation to proper nouns, annotation to common nouns || machine learning || RDFa || domain-independent || English, German, Spanish, French, Portuguese, Italian, Russian |- | [https://web.archive.org/web/20120629052702/http://inxightfedsys.com/products/sdks/tf/ ThingFinder] || || || || || || || || IE || named entities, relationships, events || || || || multilingual |}
==Knowledge discovery== Knowledge discovery describes the process of automatically searching large volumes of [[data]] for patterns that can be considered [[knowledge]] ''about'' the data. It is often described as ''deriving'' knowledge from the input data. Knowledge discovery developed out of the [[data mining]] domain, and is closely related to it both in terms of methodology and terminology.
The most well-known branch of [[data mining]] is knowledge discovery, also known as [[knowledge discovery in databases]] (KDD). As with many other forms of knowledge discovery, it creates [[abstraction]]s of the input data. The ''knowledge'' obtained through the process may itself become additional ''data'' that can be used for further discovery. Because the outcomes of knowledge discovery are often not actionable, techniques such as [[domain driven data mining]]{{Cite journal|last=Cao|first=L.|year=2010|title=Domain driven data mining: challenges and prospects|journal=IEEE Transactions on Knowledge and Data Engineering|volume=22|issue=6|pages=755–769|doi=10.1109/tkde.2010.32|citeseerx=10.1.1.190.8427|s2cid=17904603}} aim to discover and deliver actionable knowledge and insights.
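The abstraction step can be illustrated with a toy pattern-mining sketch (illustrative only; the function and variable names below are invented for this example, not taken from any KDD tool). It counts co-occurring item pairs in a set of transactions, Apriori-style, and keeps those meeting a minimum support threshold, turning raw baskets into a pattern that can be read as knowledge ''about'' the data:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Count co-occurring item pairs and keep those meeting min_support."""
    counts = Counter()
    for basket in transactions:
        # Sort and deduplicate so each pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["butter", "jam"],
]
print(frequent_pairs(baskets, min_support=2))
# → {('bread', 'butter'): 2, ('bread', 'milk'): 2}
```

The surviving pair counts are the extracted abstraction; whether such a pattern is actionable is precisely the concern that domain driven data mining addresses.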
Another promising application of knowledge discovery is [[software modernization]], weakness discovery, and compliance, which involves understanding existing software artifacts. This process is related to the concept of [[reverse engineering]]. Usually the knowledge obtained from existing software is presented in the form of models against which specific queries can be made when necessary. An [[entity relationship]] model is a common format for representing knowledge obtained from existing software. The [[Object Management Group]] (OMG) developed the [[Knowledge Discovery Metamodel]] (KDM) specification, which defines an ontology for software assets and their relationships for the purpose of performing knowledge discovery in existing code. Knowledge discovery from existing software systems, also known as [[software mining]], is closely related to [[data mining]], since existing software artifacts hold enormous value for risk management and [[Business Value|business value]] and are key to the evaluation and evolution of software systems. Instead of mining individual [[data set]]s, [[software mining]] focuses on [[metadata]], such as process flows (e.g. data flows, control flows, and call maps), architecture, database schemas, and business rules/terms/processes.
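The kind of metadata that software mining targets can be sketched with a minimal call-map extractor. The example below is a toy sketch, not part of KDM or any OMG specification (all function names and the sample code are invented); it uses Python's standard <code>ast</code> module to map each function in a source fragment to the names it calls:

```python
import ast

def call_map(source):
    """Map each function defined in `source` to the names it calls."""
    tree = ast.parse(source)
    calls = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # Collect simple-name calls (foo()); attribute calls (x.foo())
            # are skipped in this sketch.
            callees = {
                c.func.id
                for c in ast.walk(node)
                if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
            }
            calls[node.name] = sorted(callees)
    return calls

code = """
def load():
    return open("data.txt").read()

def report():
    text = load()
    print(len(text))
"""
print(call_map(code))
# → {'load': ['open'], 'report': ['len', 'load', 'print']}
```

A real software-mining tool would also resolve methods and attributes and would emit the result as a queryable model, for example in KDM or RDF.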
===Input data===
- [[Data mining|Databases]]
** [[Relational data mining|Relational data]]
** [[Database]]
** [[Document warehouse]]
** [[Data warehouse]]
- [[Software mining|Software]]
** [[Source code]]
** [[Configuration files]]
** [[Build automation|Build scripts]]
- [[text mining|Text]]
** [[Concept mining]]
- [[Graph mining|Graphs]]
** [[Molecule mining]]
- [[Sequence mining|Sequences]]
** [[Data stream mining]]
** [[Concept drift|Learning from time-varying data streams under concept drift]]
- [[Web mining|Web]]
===Output formats===
- [[Data model]]
- [[Metadata]]
- [[Metamodeling|Metamodels]]
- [[Ontology]]
- [[Knowledge representation]]
- [[Knowledge tags]]
- [[Business rule]]
- [[Knowledge Discovery Metamodel]] (KDM)
- [[Business Process Modeling Notation]] (BPMN)
- [[Intermediate representation]]
- [[Resource Description Framework]] (RDF)
- [[Software metric]]s
==See also==
- [[Cluster analysis]]
- [[Data archaeology]]
== Further reading == *{{cite journal | last1 = Chicco | first1 = D | last2 = Masseroli | first2 = M | year = 2016 | title = Ontology-based prediction and prioritization of gene functional annotations | journal = IEEE/ACM Transactions on Computational Biology and Bioinformatics | volume = 13 | issue = 2 | pages = 248–260 | doi=10.1109/TCBB.2015.2459694 | pmid = 27045825 | s2cid = 2795344 | url = https://doi.org/10.1109/TCBB.2015.2459694| url-access = subscription }}
==References==
{{reflist|30em| refs= {{cite web |url=http://www.opencalais.com/node/9501 |title=Life in the Linked Data Cloud |publisher=www.opencalais.com |access-date=2009-11-10 |quote=Wikipedia has a Linked Data twin called DBpedia. DBpedia has the same structured information as Wikipedia – but translated into a machine-readable format. |url-status=dead |archive-url=https://web.archive.org/web/20091124182935/http://www.opencalais.com/node/9501 |archive-date=2009-11-24 }}
RDB2RDF Working Group, Website: http://www.w3.org/2001/sw/rdb2rdf/, charter: http://www.w3.org/2009/08/rdb2rdf-charter, R2RML: RDB to RDF Mapping Language: http://www.w3.org/TR/r2rml/
LOD2 EU Deliverable 3.1.1 Knowledge Extraction from Structured Sources http://static.lod2.eu/Deliverables/deliverable-3.1.1.pdf {{Webarchive|url=https://web.archive.org/web/20110827231506/http://static.lod2.eu/Deliverables/deliverable-3.1.1.pdf |date=2011-08-27 }}
Frawley, William F. et al. (1992), "Knowledge Discovery in Databases: An Overview", ''AI Magazine'' (Vol 13, No 3), 57-70 (online full version: http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1011 {{Webarchive|url=https://web.archive.org/web/20160304054249/http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1011 |date=2016-03-04 }})
Fayyad U. et al. (1996), "From Data Mining to Knowledge Discovery in Databases", ''AI Magazine'' (Vol 17, No 3), 37-54 (online full version: http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1230 {{Webarchive|url=https://web.archive.org/web/20160504232218/http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1230 |date=2016-05-04 }}
Tim Berners-Lee (1998), [http://www.w3.org/DesignIssues/RDB-RDF.html "Relational Databases on the Semantic Web"]. Retrieved: February 20, 2011.
Farid Cerbah (2008). "Learning Highly Structured Semantic Repositories from Relational Databases", The Semantic Web: Research and Applications, volume 5021 of Lecture Notes in Computer Science, Springer, Berlin / Heidelberg http://www.tao-project.eu/resources/publications/cerbah-learning-highly-structured-semantic-repositories-from-relational-databases.pdf {{Webarchive|url=https://web.archive.org/web/20110720172603/http://www.tao-project.eu/resources/publications/cerbah-learning-highly-structured-semantic-repositories-from-relational-databases.pdf |date=2011-07-20 }}
Tirmizi et al. (2008), "Translating SQL Applications to the Semantic Web", Lecture Notes in Computer Science, Volume 5181/2008 (Database and Expert Systems Applications). http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=15E8AB2A37BD06DAE59255A1AC3095F0?doi=10.1.1.140.3169&rep=rep1&type=pdf
Hu et al. (2007), "Discovering Simple Mappings Between Relational Database Schemas and Ontologies", In Proc. of 6th International Semantic Web Conference (ISWC 2007), 2nd Asian Semantic Web Conference (ASWC 2007), LNCS 4825, pages 225‐238, Busan, Korea, 11‐15 November 2007. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.6934&rep=rep1&type=pdf
R. Ghawi and N. Cullot (2007), "Database-to-Ontology Mapping Generation for Semantic Interoperability". In Third International Workshop on Database Interoperability (InterDB 2007). http://le2i.cnrs.fr/IMG/publications/InterDB07-Ghawi.pdf
Li et al. (2005) "A Semi-automatic Ontology Acquisition Method for the Semantic Web", WAIM, volume 3739 of Lecture Notes in Computer Science, page 209-220. Springer. {{doi|10.1007/11563952_19}}
Gangemi, Aldo; Presutti, Valentina; Reforgiato Recupero, Diego; Nuzzolese, Andrea Giovanni; Draicchio, Francesco; Mongiovì, Misael (2016). "Semantic Web Machine Reading with FRED", ''Semantic Web Journal'', {{doi| 10.3233/SW-160240}}, http://www.semantic-web-journal.net/system/files/swj1379.pdf
Adrian, Benjamin; Maus, Heiko; Dengel, Andreas (2009). "iDocument: Using Ontologies for Extracting Information from Text", http://www.dfki.uni-kl.de/~maus/dok/AdrianMausDengel09.pdf (retrieved: 18.06.2012).
Attensity (2012). "Exhaustive Extraction", http://www.attensity.com/products/technology/semantic-server/exhaustive-extraction/ {{Webarchive|url=https://web.archive.org/web/20120711232021/http://www.attensity.com/products/technology/semantic-server/exhaustive-extraction/ |date=2012-07-11 }} (retrieved: 18.06.2012).
Cimiano, Philipp; Völker, Johanna (2005). "Text2Onto - A Framework for Ontology Learning and Data-Driven Change Discovery", ''Proceedings of the 10th International Conference of Applications of Natural Language to Information Systems'', 3513, p. 227 - 238, http://www.cimiano.de/Publications/2005/nldb05/nldb05.pdf (retrieved: 18.06.2012).
Cunningham, Hamish (2005). "Information Extraction, Automatic", ''Encyclopedia of Language and Linguistics'', 2, p. 665 - 677, http://gate.ac.uk/sale/ell2/ie/main.pdf (retrieved: 18.06.2012).
Dill, Stephen; Eiron, Nadav; Gibson, David; Gruhl, Daniel; Guha, R.; Jhingran, Anant; Kanungo, Tapas; Rajagopalan, Sridhar; Tomkins, Andrew; Tomlin, John A.; Zien, Jason Y. (2003). "SemTag and Seeker: Bootstrapping the Semantic Web via Automated Semantic Annotation", ''Proceedings of the 12th international conference on World Wide Web'', p. 178 - 186, http://www2003.org/cdrom/papers/refereed/p831/p831-dill.html (retrieved: 18.06.2012).
Erdmann, M.; Maedche, Alexander; Schnurr, H.-P.; Staab, Steffen (2000). "From Manual to Semi-automatic Semantic Annotation: About Ontology-based Text Annotation Tools", ''Proceedings of the COLING'', http://www.ida.liu.se/ext/epa/cis/2001/002/paper.pdf (retrieved: 18.06.2012).
Fortuna, Blaz; Grobelnik, Marko; Mladenic, Dunja (2007). "OntoGen: Semi-automatic Ontology Editor", ''Proceedings of the 2007 conference on Human interface, Part 2'', p. 309 - 318, http://analytics.ijs.si/~blazf/papers/OntoGen2_HCII2007.pdf {{Webarchive|url=https://web.archive.org/web/20130918152126/http://analytics.ijs.si/~blazf/papers/OntoGen2_HCII2007.pdf |date=2013-09-18 }} (retrieved: 18.06.2012).
ILP Network of Excellence. "ASIUM (LRI)", http://www-ai.ijs.si/~ilpnet2/systems/asium.html (retrieved: 18.06.2012).
Inxight Federal Systems (2008). "Inxight ThingFinder and ThingFinder Professional", http://inxightfedsys.com/products/sdks/tf/ {{Webarchive|url=https://web.archive.org/web/20120629052702/http://inxightfedsys.com/products/sdks/tf/ |date=2012-06-29 }} (retrieved: 18.06.2012).
Machine Linking. "We connect to the Linked Open Data cloud", http://thewikimachine.fbk.eu/html/index.html {{Webarchive|url=https://web.archive.org/web/20120719171047/http://thewikimachine.fbk.eu/html/index.html |date=2012-07-19 }} (retrieved: 18.06.2012).
Maedche, Alexander; Volz, Raphael (2001). "The Ontology Extraction & Maintenance Framework Text-To-Onto", ''Proceedings of the IEEE International Conference on Data Mining'', http://users.csc.calpoly.edu/~fkurfess/Events/DM-KM-01/Volz.pdf (retrieved: 18.06.2012).
McDowell, Luke K.; Cafarella, Michael (2006). "Ontology-driven Information Extraction with OntoSyphon", ''Proceedings of the 5th international conference on The Semantic Web'', p. 428 - 444, http://turing.cs.washington.edu/papers/iswc2006McDowell-final.pdf (retrieved: 18.06.2012).
Mendes, Pablo N.; Jakob, Max; Garcia-Sílva, Andrés; Bizer, Christian (2011). "DBpedia Spotlight: Shedding Light on the Web of Documents", ''Proceedings of the 7th International Conference on Semantic Systems'', p. 1 - 8, http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/Mendes-Jakob-GarciaSilva-Bizer-DBpediaSpotlight-ISEM2011.pdf {{Webarchive|url=https://web.archive.org/web/20120405211554/http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/Mendes-Jakob-GarciaSilva-Bizer-DBpediaSpotlight-ISEM2011.pdf |date=2012-04-05 }} (retrieved: 18.06.2012).
Missikoff, Michele; Navigli, Roberto; Velardi, Paola (2002). "Integrated Approach to Web Ontology Learning and Engineering", ''Computer'', 35(11), p. 60 - 63, http://wwwusers.di.uniroma1.it/~velardi/IEEE_C.pdf {{Webarchive|url=https://web.archive.org/web/20170519011529/http://wwwusers.di.uniroma1.it/~velardi/IEEE_C.pdf |date=2017-05-19 }} (retrieved: 18.06.2012).
Orchestr8 (2012): "AlchemyAPI Overview", http://www.alchemyapi.com/api {{Webarchive|url=https://web.archive.org/web/20160513114853/http://www.alchemyapi.com/api |date=2016-05-13 }} (retrieved: 18.06.2012).
Rao, Delip; McNamee, Paul; Dredze, Mark (2011). "Entity Linking: Finding Extracted Entities in a Knowledge Base", ''Multi-source, Multi-lingual Information Extraction and Summarization'', http://www.cs.jhu.edu/~delip/entity-linking.pdf{{Dead link|date=February 2020 |bot=InternetArchiveBot |fix-attempted=yes }} (retrieved: 18.06.2012).
Rocket Software, Inc. (2012). "technology for extracting intelligence from text", http://www.rocketsoftware.com/products/aerotext {{Webarchive|url=https://web.archive.org/web/20130621113445/http://www.rocketsoftware.com/products/aerotext |date=2013-06-21 }} (retrieved: 18.06.2012).
semanticweb.org (2011). "PoolParty Extractor", http://semanticweb.org/wiki/PoolParty_Extractor {{Webarchive|url=https://web.archive.org/web/20160304185625/http://semanticweb.org/wiki/PoolParty_Extractor |date=2016-03-04 }} (retrieved: 18.06.2012).
SRA International, Inc. (2012). "NetOwl Extractor", http://www.sra.com/netowl/entity-extraction/ {{Webarchive|url=https://web.archive.org/web/20120924081059/http://www.sra.com/netowl/entity-extraction/ |date=2012-09-24 }} (retrieved: 18.06.2012).
The University of Sheffield (2011). "ANNIE: a Nearly-New Information Extraction System", http://gate.ac.uk/sale/tao/splitch6.html#chap:annie (retrieved: 18.06.2012).
Uren, Victoria; Cimiano, Philipp; Iria, José; Handschuh, Siegfried; Vargas-Vera, Maria; Motta, Enrico; Ciravegna, Fabio (2006). "Semantic annotation for knowledge management: Requirements and a survey of the state of the art", ''Web Semantics: Science, Services and Agents on the World Wide Web'', 4(1), p. 14 - 28, http://staffwww.dcs.shef.ac.uk/people/J.Iria/iria_jws06.pdf{{Dead link|date=February 2020 |bot=InternetArchiveBot |fix-attempted=yes }}, (retrieved: 18.06.2012).
Wimalasuriya, Daya C.; Dou, Dejing (2010). "Ontology-based information extraction: An introduction and a survey of current approaches", ''Journal of Information Science'', 36(3), p. 306 - 323, http://ix.cs.uoregon.edu/~dou/research/papers/jis09.pdf (retrieved: 18.06.2012).
Yildiz, Burcu; [[Silvia Miksch|Miksch, Silvia]] (2007). "ontoX - A Method for Ontology-Driven Information Extraction", ''Proceedings of the 2007 international conference on Computational science and its applications'', 3, p. 660 - 673, http://publik.tuwien.ac.at/files/pub-inf_4769.pdf {{Webarchive|url=https://web.archive.org/web/20170705135417/https://publik.tuwien.ac.at/files/pub-inf_4769.pdf |date=2017-07-05 }} (retrieved: 18.06.2012). }}
{{Semantic Web}} {{Authority control}}
{{DEFAULTSORT:Knowledge Extraction}} [[Category:Knowledge economy]] [[Category:Knowledge transfer]] [[Category:Information economics]]