Natural Language Processing
Natural Language Processing or “Computational Linguistics” belongs to the artificial intelligence and serves as a transition between language and computer science.
Based on grammar rules, statistics and algorithms the machine learns to understand written language.
A brief description on how to learn the computer languages:
Step 1To begin with, the letter combinations are counted and then divided into words and sentences. This step is called tokenization.
Step 2Then follows the morphological analysis which is conducted on the level of single words. It recognizes people, places, or organizations for example.
Step 3The analysis on the level of sentences, the so called ‘syntactical analysis‘, is concerned with the structures within a sentence. It identifies the verb, the actor of the sentence, and so
Step 4What follows now is the semantic analysis which is the most difficult part of the process. The machine has to assign a meaning to the single parts of the sentence. Since this is a rather complicated task, there are several ways how to approach it.
Step 5In a final step there is the discourse analysis. It aims at recognizing the connection between sentences. This analysis beyond the sentence boundary is not any less difficult and has its own challenges.
Why is Valuescope’s Natural Language Processing unique?
Several lemmas for one term:
Valuescope automatically considers multiple spellings (lemma) for a keyword. So i.e. for Rolls Royse: – Rolls Royce – Rolls-Royce – rollsroyce – Rolls Royces etc. …
We not only search the sentence in which the keyword occurs, but also the entire environment. Pronouns that refer to the keyword are identified and assigned. Example:
“Peer Steinbrück verweigert sich dem Internet. Das kostet ihn Sympathisanten.”
Peer Steinbrueck is detected in both sentences, although his name appears only in the first one.
Declension and conjugation:
Our program recognizes the entire paradigm of verbs, adjectives, nouns, pronouns, etc., and assigns them accordingly. It is difficult at words that appear in several word classes – with no changes in their writing:
Love. I love you. This must be love
“The shorter the better” is often the motto for communication in the internet like in tweets (140 signs). Valuescope faces only limited challenges to also recognise the common used abbreviations. Example:
“Ich hab mir grad ein Buch bei Amazon gekauft.”
It’s the aim of our computational linguists to guarantee a reliable and effective tonality analysis. Therefore, we face the new challenges that holds an ever-changing language every day.