NATURAL LANGUAGE PROCESSING GLOSSARY


A - B - C - D - E - F - G - H - I - J - K - L - M - N - O - P - Q - R - S - T - U - V - W - X - Y - Z


A

Authority list

A list of the descriptors permitted for indexing a document. The list can be structured in different ways (see Thesaurus, Taxonomy, Classification plan).


C

Categorization

A process that consists of assigning a document to one or several points of a classification plan.

Also see Indexing

Lingway KM can carry out an automatic categorization within a classification plan defined by the user (not to be confused with Clustering).




Classification plan

A hierarchical structure enabling the classification and location of documents or documentary sets. It generally consists of a hierarchical list of descriptors, or Taxonomy.

Also see Categorization

Lingway KM enables you to download a classification plan.



Clustering

A process that consists in extracting groups (“clusters”) of documents from a set of unclassified documents. The aim is to automatically organize a set of documents into sub-groups. The process is usually based on a calculation of closeness between documents. Clustering is a bottom-up information search method.

Lingway KM carries out the clustering process for the documentary set found in response to a query (“group” button).
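The idea of grouping documents by closeness can be illustrated with a minimal sketch. It is not Lingway KM's method: the closeness measure (Jaccard overlap of word sets), the single-link grouping, the threshold of 0.3 and the sample documents are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Closeness of two documents as word-set overlap (0.0 to 1.0)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(docs, threshold=0.3):
    """Group documents bottom-up: two documents share a cluster when
    their closeness reaches the (assumed) threshold, directly or through
    a chain of close documents (single-link grouping)."""
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent pointers to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(docs[i], docs[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical result set returned by a query.
docs = [
    "car repair garage",
    "garage car service",
    "bridge card game",
    "card game rules",
]
clusters = cluster(docs)  # [[0, 1], [2, 3]]
```

No classification plan is supplied in advance: the two groups emerge from the documents themselves, which is what distinguishes clustering from categorization.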



Concept

An object that represents the idea behind a term, generally a set of synonyms in one or several languages. The object is independent of its linguistic expression, or actual name, and is used to describe properties independently of the language used (conceptual properties). For instance, the concept <hammer> belongs to the field <tool> whatever the language used.

In Lingway KM, the dictionary describes 150,000 concepts mapped to five languages. The concepts are connected by a set of links, creating a semantic network. For instance, concept n° 344 is linked to concept n° 765: concept n° 344 is named “armchair” in English and “fauteuil” in French, and concept n° 765 is named “furniture” in English and “meuble” in French.
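The armchair/furniture example can be written down as a small data structure. This is only a sketch of the idea of language-independent concepts: the `Concept` class, the `linked_names` helper and the "is-a" label on the link are assumptions (the entry says only that the two concepts are linked).

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    ident: int
    names: dict                                 # language code -> name
    links: list = field(default_factory=list)   # (relation, target id) pairs


# The two concepts from the example; the "is-a" relation is assumed.
network = {
    344: Concept(344, {"en": "armchair", "fr": "fauteuil"}, [("is-a", 765)]),
    765: Concept(765, {"en": "furniture", "fr": "meuble"}),
}


def linked_names(network, ident, lang):
    """Names, in the requested language, of the concepts linked to `ident`."""
    return [network[target].names[lang] for _, target in network[ident].links]


linked_names(network, 344, "en")  # ['furniture']
linked_names(network, 344, "fr")  # ['meuble']
```

The link is stored once, between concept numbers, and is valid for every language mapped to those concepts.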

 


 


D

Descriptor

A term or named entity used to characterize a document (see Indexing - 2). A descriptor can be free or controlled.




[Controlled] Descriptor

Descriptor imposed by an authority list, usually a thesaurus.

Lingway KM enables you to define a principal list, also known as an authority list.




[Free] Descriptor

Descriptor selected independently of an authority list.

Lingway KM carries out two different types of free descriptor indexing: “themes,” which are terms extracted from the text, and “named entities,” which are also extracted from the text.



Dictionary

[Electronic] dictionary


Database with all the linguistic and conceptual information needed for text and query analysis. The dictionary includes the morphological description of the words, their different meanings, their link to the concepts, and the semantic network between concepts.

Lingway’s generic dictionary comprises 150,000 concepts mapped to five languages.



[User] dictionary

A dictionary offering the linguistic coverage specific to an application.

The Lingway KM user dictionary is built on a simplified model, making it easy to manage and use.

 





I

Indexing

Indexing - 1

Indexing referred to as “full-text.” The process consists in compiling a list (an “inverted list”) of the words that appear in each document of a given documentary set. It is applied to all words except those listed in an “anti-dictionary” (stop list), which generally contains function words, verbs such as “to be”, etc.

In Lingway KM, full-text indexing is carried out and used for the semantic search.
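A minimal sketch of an inverted list follows. The contents of the anti-dictionary, the whitespace tokenization and the sample documents are assumptions made for the example; a real indexer would also handle punctuation and morphology.

```python
from collections import defaultdict

# Hypothetical anti-dictionary: words that are never indexed.
ANTI_DICTIONARY = {"the", "a", "is", "of", "to", "be", "in"}


def build_inverted_list(docs):
    """Map each indexed word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word not in ANTI_DICTIONARY:
                index[word].add(doc_id)
    return index


docs = {1: "the garage is in the city", 2: "a car in the garage"}
index = build_inverted_list(docs)

index["garage"]  # {1, 2}
index["car"]     # {2}
```

Looking up a word then costs one dictionary access, rather than a scan of every document.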



Indexing - 2

A process that consists in assigning descriptors to documents.



[Controlled] indexing

The indexing of documents with descriptors selected in a principal list.

Lingway KM enables you to define an authority list. In this case, a text that includes a descriptor from the principal list will be systematically indexed by the descriptor, independently of any statistical calculation.
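The rule that a text containing a descriptor is indexed by it, with no statistics involved, can be sketched as follows. The function name, the whole-word matching on lower-cased text and the sample authority list are assumptions; the entry does not say how Lingway KM matches descriptors against the text.

```python
import re


def controlled_index(text, authority_list):
    """Return every descriptor of the authority list that occurs in the text."""
    lowered = text.lower()
    found = []
    for descriptor in authority_list:
        # Whole-word (or whole-expression) match, case-insensitive.
        pattern = r"\b" + re.escape(descriptor.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.append(descriptor)
    return found


# Hypothetical authority list.
authority = ["car repair", "insurance", "garage"]
descriptors = controlled_index("The garage offers car repair.", authority)
# ['car repair', 'garage']
```

A single occurrence is enough: frequency plays no part in the decision.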


[Free] indexing

The indexing of documents with descriptors selected independently of a principal, or authority, list.

In Lingway KM, free descriptors are selected from all of the terms and named entities extracted from the text. The statistical methods used take into account the frequency of the terms in both the document and the entire corpus to be indexed.
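One common way to combine the frequency of a term in a document with its frequency in the corpus is TF-IDF weighting. The sketch below uses it purely as an illustration: the entry does not name the statistical method Lingway KM applies, and the corpus is invented.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Score each word of each document: frequent in the document,
    rare in the corpus gives a high score. Documents must be non-empty."""
    tokenized = [doc.lower().split() for doc in corpus]
    # Number of documents in which each word appears.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    n_docs = len(corpus)
    scores = []
    for words in tokenized:
        counts = Counter(words)
        scores.append({
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical corpus to be indexed.
corpus = ["garage car repair car", "garage opening hours", "garage car sale"]
scores = tf_idf(corpus)
best = max(scores[0], key=scores[0].get)  # 'repair'
```

In the first document, "garage" scores zero because it appears in every document of the corpus, while "repair" ranks first because it appears nowhere else.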


[Mixed] indexing

A process combining controlled and free indexing methods.

Lingway KM enables mixed indexing.

 

 


 



M

Metadata

Metadata can best be described as data about data. In linguistic analysis, it means the data that describe a document or a set of documents. These data can include the descriptors, as well as other types of information about the document, such as author, publication, date, legal information and document format.

Lingway KM automatically generates metadata for each processed document, according to the standard formats suggested by the Dublin Core agreements.
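As an illustration of what such metadata can look like, the sketch below renders a record as HTML meta tags using the `DC.` naming convention of Dublin Core. The element names (`title`, `creator`, `date`, `format`, `subject`) belong to the Dublin Core element set; the record values and the helper function are invented, and the output format Lingway KM actually produces is not described in this entry.

```python
from html import escape


def dublin_core_meta(record):
    """Render a {element: value} record as Dublin Core HTML meta tags."""
    return [
        f'<meta name="DC.{element}" content="{escape(str(value), quote=True)}">'
        for element, value in record.items()
    ]


# Hypothetical metadata for one processed document.
record = {
    "title": "Garage opening hours",
    "creator": "J. Smith",
    "date": "2008-05-14",
    "format": "text/html",
    "subject": "car repair",   # a descriptor assigned by indexing
}
tags = dublin_core_meta(record)
# tags[0] is '<meta name="DC.title" content="Garage opening hours">'
```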

 



N


Named entity

A specific descriptor indicating an object (typically a person, place, or organization) by its name. Named entities also include values and dates by extension.

Lingway KM is able to extract all of these different types of named entities.






P


Post-coordination

Indexing that combines several basic descriptors. For instance, a document describing a “garage” will have both “repair” and “car” as descriptors.

Also see Pre-Coordination.



Pre-coordination


Indexing carried out with complex descriptors (compound words or expressions). For instance, a document describing a garage is indexed with the single descriptor “car repairing.”

Also see Post-Coordination and Classification plan

Lingway KM is able to recognize the pre-coordinated descriptors.
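The difference between the two approaches shows up at search time. In the sketch below, which uses the garage example, the document identifiers, the second document and both search helpers are assumptions made for illustration.

```python
# Post-coordination: basic descriptors, combined when the query is made.
post_coordinated = {
    "doc1": {"repair", "car"},
    "doc2": {"repair", "bicycle"},
}

# Pre-coordination: the combination is fixed when the document is indexed.
pre_coordinated = {
    "doc1": {"car repairing"},
    "doc2": {"bicycle repairing"},
}


def search_post(index, *descriptors):
    """Documents carrying all of the requested basic descriptors."""
    wanted = set(descriptors)
    return sorted(doc for doc, descs in index.items() if wanted <= descs)


def search_pre(index, descriptor):
    """Documents carrying the requested complex descriptor."""
    return sorted(doc for doc, descs in index.items() if descriptor in descs)


search_post(post_coordinated, "repair", "car")  # ['doc1']
search_post(post_coordinated, "repair")         # ['doc1', 'doc2']
search_pre(pre_coordinated, "car repairing")    # ['doc1']
```

Post-coordination allows any combination of descriptors to be queried; pre-coordination is more precise but answers only the combinations that were foreseen when indexing.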





 



S


Semantic expansion

A process that starts from a given term and, using the dictionary’s semantic network, compiles a list of terms with close meanings, usually to build a search equation for the (full-text) documentary database.

In Lingway KM, the “semantic distance” can be customized, so that the expansion can be adjusted to search more broadly or more narrowly.
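The effect of a semantic distance on expansion can be sketched as a breadth-first walk over a small network. Everything here is an assumption made for illustration: the network contents, the reading of distance as a number of links, and the use of `OR` to build the search equation.

```python
from collections import deque

# Hypothetical semantic network: term -> directly related terms.
NETWORK = {
    "armchair": ["chair", "furniture"],
    "chair": ["armchair", "seat"],
    "furniture": ["armchair", "table"],
    "seat": ["chair"],
    "table": ["furniture"],
}


def expand(term, max_distance):
    """Terms reachable from `term` in at most `max_distance` links."""
    distances = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if distances[current] == max_distance:
            continue  # do not expand beyond the allowed distance
        for neighbor in NETWORK.get(current, []):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return sorted(distances)


def search_equation(term, max_distance):
    """Join the expanded terms into a full-text query."""
    return " OR ".join(expand(term, max_distance))


search_equation("armchair", 1)  # 'armchair OR chair OR furniture'
```

Raising `max_distance` to 2 adds "seat" and "table" to the query, which broadens the search; a distance of 0 searches for the original term only.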




Semantic field

A subject or activity that gives a particular meaning to a word. For instance, the word “bridge” has a specific meaning in the field “game” and another in the field “building.”

Lingway’s dictionary contains approximately 350 semantic fields that serve to interpret the meaning of the words in queries. The fields appear in the “analysis” frame, which provides the query interpretation.


Semantic network


A set of objects (concepts) that are linked by semantic relations.

In Lingway KM, the semantic network comprises both the concepts and the graph of the semantic relations. There are approximately 20 semantic relations, including the hierarchical relation (is-a), semantic closeness, part-of, etc.







T

Taxonomy

A semantic network in which the only relation is hierarchical (generic-specific).

Also see Classification plan.



Term

A simple word, compound word or expression, complex or not, usually designating an object or a process.

Lingway KM uses both linguistic and statistical tools to identify the key terms of a document or a corpus. The linguistic “patterns” describe the possible syntactic forms of a term (noun-preposition-noun, noun-adjective, etc.) and the statistical calculations determine which terms are retained as descriptors for document or corpus indexing.
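The linguistic half of this process can be sketched by matching the two syntactic patterns quoted in the entry, read here as noun-preposition-noun and noun-adjective, against words that already carry a part-of-speech tag. The tag names, the hand-tagged French phrase and the function are assumptions; a real system would obtain the tags from a morphological analyzer and then filter the candidates statistically.

```python
# Syntactic patterns a term may follow (tag names are assumed).
PATTERNS = [("NOUN", "PREP", "NOUN"), ("NOUN", "ADJ")]


def candidate_terms(tagged):
    """Return every word sequence whose tags match one of the patterns."""
    terms = []
    for start in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[start:start + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                terms.append(" ".join(word for word, _ in window))
    return terms


# Hand-tagged example phrase (hypothetical input).
tagged = [
    ("traitement", "NOUN"),
    ("de", "PREP"),
    ("texte", "NOUN"),
    ("automatique", "ADJ"),
]
candidate_terms(tagged)  # ['traitement de texte', 'texte automatique']
```

The patterns only propose candidates; deciding which of them become descriptors is left to the statistical step.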



Theme


A descriptor that is a term as opposed to a named entity.



Thesaurus

An authority list with a structure similar to that of a semantic network.
Both are based on two main relations: the hierarchical relation, is-a (from a descriptor to its generic term or, in the opposite direction, from a descriptor to a specific term), and the closeness relation (or “related term”). In addition, the thesaurus often lists non-descriptor terms linked to descriptors.

Lingway KM allows you to integrate a thesaurus. The descriptors appear in the list of themes prefixed by “TH”.






 
Lingway is a software company that develops specialized search solutions based on powerful multilingual semantic tools and business-specific linguistic resources. The solutions offer companies analysis and search capabilities tailored to their line of business.