INTRODUCTION TO NATURAL LANGUAGE PROCESSING
What is natural language? Any language that humans use to converse with one another is a natural language. Language is at the core of human evolution. It is one of the most fundamental expressions of our intelligence, and our ability to communicate through language is truly spectacular. What if computers could understand and converse in languages just like humans?
Natural Language Processing (NLP) is the use of computer programs to read,
understand, and respond to human language in the form of text and voice
data. It is often described as a subset of machine learning (ML), and both
NLP and ML fall under the larger category of artificial intelligence. NLP
covers the analysis, understanding, manipulation, and generation of written
and spoken natural language by software.
The technology draws on a wide range of disciplines, including machine
learning and computational linguistics. This means that when you write or
speak phrases, sentences, or even longer content, an NLP system can
interpret it based on the grammatical rules of the specific language.
Natural Language Processing Terminologies:
The following terms describe techniques used to clean and prepare text
data so that a machine can analyze it.
Tokenization
Tokenization is a basic task in natural language processing: breaking raw
text into smaller units, such as words or sentences, called tokens.
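As a minimal sketch of tokenization, the snippet below splits text into sentences and words with regular expressions. The function names and the splitting rules are assumptions made for illustration; real tokenizers handle abbreviations, numbers, and other edge cases that this sketch ignores.

```python
import re

def sentence_tokenize(text):
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    # This breaks on abbreviations such as "Dr. Smith".
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def word_tokenize(text):
    # A token is either a run of word characters (optionally with internal
    # apostrophes, e.g. "don't") or a single punctuation mark.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

text = "NLP is fun. It breaks text into tokens!"
print(sentence_tokenize(text))
print(word_tokenize("It breaks text into tokens!"))
```

Punctuation marks are kept as separate tokens here; whether to keep or drop them depends on the task.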
Sequencing
Sequencing creates sequences of numbers from sentences and processes them
(for example, by padding them to a common length) so that they are ready
to train neural networks.
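One way to picture sequencing is the sketch below, which assigns each word an integer index and pads every sentence to the same length. The reserved indices (0 for padding, 1 for unknown words), the function names, and the sample sentences are assumptions chosen for this example, not part of any particular library.

```python
PAD, UNK = 0, 1  # assumed reserved indices: padding and unknown words

def build_vocab(sentences):
    # Map each distinct lowercased word to the next free integer index.
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def to_sequences(sentences, vocab, max_len):
    # Replace words by indices, then truncate or pad to exactly max_len.
    sequences = []
    for sentence in sentences:
        ids = [vocab.get(w, UNK) for w in sentence.lower().split()]
        ids = ids[:max_len] + [PAD] * (max_len - len(ids))
        sequences.append(ids)
    return sequences

sentences = ["I love my dog", "I love my cat", "Do you love my dog"]
vocab = build_vocab(sentences)
print(to_sequences(sentences, vocab, max_len=5))
print(to_sequences(["I love my hamster"], vocab, max_len=5))
```

A word that was never seen when the vocabulary was built ("hamster" above) maps to the unknown index rather than raising an error.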
Normalization
Normalization puts all words on an equal footing so that processing can
proceed uniformly, for example by converting all words to the same case
(upper or lower) so that equivalent words are treated alike.
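A minimal sketch of case normalization, assuming the text has already been split into tokens (the function name is hypothetical):

```python
def normalize(tokens):
    # casefold() is a more aggressive lower() intended for caseless matching.
    return [token.casefold() for token in tokens]

tokens = ["Hello", "HELLO", "hello"]
print(normalize(tokens))
print(len(set(tokens)), "distinct forms before,", len(set(normalize(tokens))), "after")
```

After normalization, the three spellings count as one word.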
Corpus
A corpus (plural: corpora) is a collection of texts. Corpora are used for
statistical linguistic analysis and hypothesis testing.
Bag of Words
It is a text representation that records how many times each word occurs
in a document, ignoring word order.
Sample text
“Hello, hello, hello,” said Josh
“Here, here,” said John. “Here, here,”
The resulting bag-of-words representation as a dictionary (with all words
converted to lower case before counting):
{
‘hello’:3,
‘said’:2,
‘josh’:1,
‘here’:4,
‘john’:1
}
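The counts for the sample text can be reproduced with a short sketch like the one below. It assumes that words are lowercased and that only runs of the letters a to z count as words; the function name is chosen for this example.

```python
import re
from collections import Counter

def bag_of_words(text):
    # Lowercase the text, keep only alphabetic runs, and count each word.
    return Counter(re.findall(r"[a-z]+", text.lower()))

sample = '"Hello, hello, hello," said Josh\n"Here, here," said John. "Here, here,"'
bag = bag_of_words(sample)
for word, count in bag.items():
    print(word, count)
```

Word order is discarded: any rearrangement of the same words produces the same bag.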
N-gram
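An n-gram is a contiguous sequence of n items (such as words) from a text.
An n-gram model is a language model that estimates the probability of a
sequence of words in a sentence by predicting each next item from the n-1
items before it, which makes it a Markov model of order n-1.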
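To make this concrete, here is a small sketch that extracts n-grams and estimates a bigram (n = 2) probability by simple counting. The function names and the toy sentence are assumptions for illustration, and no smoothing is applied, so unseen pairs get probability zero.

```python
from collections import Counter

def ngrams(tokens, n):
    # Slide a window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bigram_probability(tokens, prev_word, word):
    # Maximum-likelihood estimate:
    # P(word | prev_word) = count(prev_word, word) / count(prev_word as context)
    bigram_counts = Counter(ngrams(tokens, 2))
    context_counts = Counter(tokens[:-1])  # the last token is never a context
    if context_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / context_counts[prev_word]

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))
print(bigram_probability(tokens, "the", "cat"))
```

In this toy sentence "the" is followed once by "cat" and once by "mat", so each continuation is estimated at one half.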
What are the techniques used in NLP?
An NLP system has two components:
Natural language understanding (NLU)
It is also called natural language interpretation (NLI) and covers the
human-to-machine direction. It maps a given natural language input into a
useful representation and analyzes different aspects of the language.
Natural language generation (NLG)
It is a software process that generates meaningful phrases and sentences
in the form of natural language output. This process involves text
planning, sentence planning, and text realization.
Example:
Automated journalism
Difficulties in Natural Language Processing
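Lexical ambiguity
It arises at a very primitive level, the word level: a single word can
have more than one meaning. For example, "bank" can mean a financial
institution or the side of a river.
Syntax level ambiguity
A sentence can be parsed in more than one way. For example, in "I saw the
man with the telescope", either the speaker or the man may have the
telescope.
Referential ambiguity
Very often a text mentions an entity (someone or something) and then
refers to it again, possibly in a different sentence, using another word
such as a pronoun. It can be unclear which entity is meant.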
Phases of Natural Language Processing
Lexical (structure) analysis
It is the process of identifying and analyzing the structure of words. The
collection of words and phrases in a language is the lexicon of that
language.
Syntactic analysis (parsing)
Parsing analyzes the words in a sentence using a formal grammar. It
arranges the words in a way that shows the relationships between them.
Examples:
Lemmatization: It is one of the most common text preprocessing techniques
used in NLP and ML. It reduces a word to its citation (dictionary) form.
For example, stemming the word "better" fails to return its citation form;
however, lemmatization would result in the following:
better -> good
Stemming:
Stemming refers to the method of reducing an inflected or derived word to
its stem by removing the suffixes and prefixes attached to it.
running -> run
Part of Speech (POS) tagging: Also called grammatical tagging, it is the process of categorizing the words in a text by part of speech, such as noun, adjective, verb, or adverb, depending on the meaning of each word and its context.
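The difference between stemming and lemmatization can be sketched as follows. This is a toy illustration only: the suffix list, the consonant-undoubling rule, and the small lemma table are assumptions made for this example, and real stemmers and lemmatizers use far richer rules and dictionaries.

```python
def toy_stem(word):
    # Strip one common suffix if at least three characters remain.
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[:-len(suffix)]
            # Undouble a final consonant: "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word

# Hypothetical lookup table mapping irregular forms to their citation forms.
LEMMAS = {"better": "good", "went": "go", "mice": "mouse"}

def toy_lemmatize(word):
    # Fall back to the lowercased word when no entry exists.
    return LEMMAS.get(word.lower(), word.lower())

print(toy_stem("running"))       # suffix stripping finds the stem
print(toy_stem("better"))        # no rule applies, so the word is unchanged
print(toy_lemmatize("better"))   # a dictionary lookup finds the citation form
```

Suffix stripping cannot relate "better" to "good", because the two words share no stem; only a dictionary-based lookup can.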
Semantic analysis
Semantic analysis is the process of identifying the meaning and tone of
unstructured text. It maps syntactic structures to objects in the task
domain.
Examples:
Named entity recognition (NER):
It identifies named entities in the text and categorizes them into groups,
such as people and locations.
Word sense disambiguation: This refers to determining the meaning of a
word based on its context.
Discourse integration
In this step, the meaning of a sentence depends on the meaning of the
sentence just before it. It can in turn affect the meaning of the sentence
that immediately follows.
Pragmatic analysis
It is the process of extracting the intended meaning from text: what was said is reinterpreted in terms of what was actually meant.
How does natural language processing work?
Segmentation breaks the entire document down into its constituent sentences, using punctuation such as full stops and commas as boundaries.
For the algorithm to understand these sentences, we take the words in each sentence and explain them individually to the algorithm. So we break each sentence down into its constituent words and store them. This is called tokenizing, and each word is called a token.
We can make the learning process faster by getting rid of non-essential words, which do not add much meaning to a statement and are only there to make it sound more cohesive. These words, such as ‘as’, ‘are’, and ‘the’, are called stop words. For instance, “the”, “and”, and “a” are all required words in a particular passage, but they do not contribute much to understanding its content. Once they are removed, only the more meaningful words in the text remain.
Next, we need to explain the basic form of the words in our document to the machine. First, we explain that some words are the same word with added prefixes and suffixes; this is called stemming. Then we identify the base word for different tenses, moods, genders, and so on. This is called lemmatization, a name that comes from the base word, the lemma.
We explain the concept of nouns, verbs, articles, and other parts of speech to the machine by adding these tags to our words; this is called part-of-speech tagging. We then introduce our machine to pop culture references and everyday names by flagging names of movies, important personalities, locations, and so on that may occur in the document. This is called named entity tagging.
Once we have our base words and tags, we use a machine learning algorithm such as Naïve Bayes to teach our model human sentiment and speech. Most of the techniques used in NLP are simple grammar techniques.
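The first steps of this pipeline (segmentation, tokenizing, and stop word removal) can be sketched as below. The stop word set is a small illustrative sample, and the function name and splitting rules are assumptions for this example; here sentences are split only at full stops, exclamation marks, and question marks.

```python
import re

# A small illustrative stop word set; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "as", "is", "of", "the", "to"}

def preprocess(document):
    # 1. Segmentation: split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    result = []
    for sentence in sentences:
        # 2. Tokenizing: lowercase the sentence and keep alphabetic words.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # 3. Stop word removal: drop words that carry little meaning.
        result.append([t for t in tokens if t not in STOP_WORDS])
    return result

doc = "The cat and the dog are friends. The dog chased a ball!"
for tokens in preprocess(doc):
    print(tokens)
```

The remaining tokens would then go on to stemming or lemmatization, tagging, and finally a classifier.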
Challenges of NLP
Natural language processing has the potential to bring significant social
benefits. These technologies are advancing rapidly; however, they face
many challenges. Here are a couple of them.
The first challenge is data. Natural language processing analyzes vast
amounts of data to extract a particular piece of information. To function
effectively, NLP models have to be trained on a curated corpus. However,
finding the right or relevant answer is challenging because of the
enormous complexity of machine learning algorithms that examine millions
of unstructured and semi-structured data sets.
In human language, we often use the same vocabulary with different
contextual meanings. Natural language processing algorithms are not yet
fully able to distinguish between these contextual uses. The same
challenge exists with ambiguity and homonyms, where NLP has to make a
guess. However, as more data is captured and the technology learns, the
model will improve.
Slang and colloquialisms pose a further challenge. Formal language has
rules and forms that rarely change, whereas informal language keeps
evolving. This means that using natural language processing for a wide
variety of applications is more challenging, because the data needed to
train the model becomes larger, ever-changing, and more unstructured.
However, advances in the technology suggest that this problem may be
overcome.








