Comprehensive notes Unit 6 Natural Language Processing AI Class 10

In this article, I will provide comprehensive notes on Unit 6 Natural Language Processing AI Class 10. So let's start the article with an introduction to Natural Language Processing AI Class 10, here we go!

Unit 6 Natural Language Processing AI Class 10

Natural Language Processing is one of the branches of AI that helps machines to understand, interpret, and manipulate human languages such as English or Hindi to analyze and derive their meaning.

NLP takes as input data from spoken words, verbal commands, or the speech recognition software that humans use in their daily lives, and operates on it. Before getting deeper into the concept, follow the link below to play a game based on NLP.

The mystery animal game: Play from here

Just open the browser and play the game then answer a few questions like:

  1. Were you able to guess the animal?
  2. If yes, in how many questions were you able to guess it?
  3. If no, how many times did you try playing this game?
  4. What according to you was the task of the machine?
  5. Were there any challenges that you faced while playing this game? If yes, list them down.
  6. What approach must one follow to win this game?

In the next section of Unit 6 Natural Language Processing AI Class 10, we are going to cover the applications of natural language processing.

Applications of Natural Language Processing


We have already seen some common uses of Natural Language Processing in our daily lives, like virtual assistants and Google Translate. Here are some more applications of NLP:

Automatic Summarization

  • Today the internet is a huge source of information, so it is very difficult to access specific information from the web.
  • Summarization is useful not only for condensing documents but also for understanding the emotional meaning within information, for example on social media.
  • It can also provide an overview of a blog post or news story, avoiding redundancy from multiple sources and maximising the diversity of content obtained.
  • For example, newsletters, social media marketing, video scripting etc.

Sentiment Analysis

  • Companies need to identify opinions and sentiments expressed online to understand what customers think about their products and services.
  • Sentiment analysis also helps a company gauge its overall reputation.
  • It also helps customers decide whether to purchase a product or service based on the opinions expressed, and it understands sentiment in context for better insight.
  • For example, a customer might say, “I like the new smartphone, but it has weak battery backup”. Use cases include brand monitoring, customer support analysis, customer feedback analysis, market research etc.

Text Classification

  • It assigns predefined categories to a document and organizes it in a way that helps customers find the information they want.
  • It also helps to simplify these activities.
  • For example, spam filtering in email, auto-tagging on social media, categorization of news articles etc.

Virtual Assistants

  • We use Google Assistant, Cortana, Siri, and Alexa daily.
  • We can talk to them, and they make our tasks easy and comfortable.
  • They can be used for keeping notes, setting reminders for scheduled tasks, making calls, sending messages etc.
  • They can also recognize the words spoken by the user.


Getting Started with NLP

Let’s start Natural Language Processing by revisiting the AI project cycle. I will use the same example as given in the CBSE study material. So let’s go through it!

Revisiting the AI project cycle – The scenario

  • The world is competitive nowadays.
  • People face competition in every task and are expected to give their best at every point in time.
  • When people are not able to meet these expectations, they get stressed and could even go into depression.
  • We hear of a lot of cases where people are depressed due to many reasons.
  • These reasons include pressure of studies, family issues, and relationships.
  • To overcome this, Cognitive Behavioural Therapy (CBT) is considered one of the best methods to address stress, as it is easy to implement on people and also gives good results.
  • CBT includes understanding the behaviour and mindset of a person in their normal life.

Follow this link to get more information about CBT:

Cognitive Behavioural Therapy

Problem Scoping

The CBT technique is used to cure depression and stress. But what about people who do not want to consult a psychiatrist? Let us look at this problem with the 4Ws problem canvas.

Who Canvas

Who? People who suffer from stress and depression.
What do we know about them? People who are going through stress are reluctant to consult a psychiatrist.

What Canvas

What is the problem? People who need help are reluctant to consult a psychiatrist and hence live miserably.
How do you know it is a problem? Studies around mental stress and depression are available from various authentic sources.

Where Canvas

What is the context/situation in which the stakeholders experience this problem? When they are going through a stressful period of time, or due to some unpleasant experiences.

Why Canvas

What would be of key value to the stakeholders? People get a platform where they can talk and vent out their feelings anonymously. People get a medium that can interact with them, apply primitive CBT, and suggest help whenever needed.
How would it improve their situation? People would be able to vent out their stress. They would consider going to a psychiatrist whenever required.

After 4Ws canvas the problem statement template is created which is as follows:

Our: People undergoing stress (Who?)
Have a problem of: Not being able to share their feelings (What?)
While: They need help in venting out their emotions (Where?)
An ideal solution would: Provide them with a platform to share their thoughts anonymously and suggest help whenever required (Why?)

This leads us to the goal of our project which is:

“To create a chatbot which can interact with people, help them to vent out their feelings and take them through primitive CBT.”

Data Acquisition

  • To understand the sentiments of people, data needs to be collected.
  • Once the data is collected, the machine can interpret the words people use and understand their meaning.
  • Data can be collected by various means:
    • Surveys
    • Observing the therapist’s sessions
    • Databases available on the internet
    • Interviews

Data Exploration

  • Once the data is collected, it needs to be processed and cleaned.
  • After processing, a simpler version of the text can be sent to the machine.
  • The text is normalised through various steps and reduced to a minimal vocabulary, since the machine does not need grammatically correct statements, only the essence of the text.

Modelling

  • Once the text has been normalised, it is then fed to an NLP based AI model.
  • Note that in NLP, modelling requires the data to be pre-processed first; only then is it fed to the machine.
  • Depending upon the type of chatbot we try to make, there are a lot of AI models available which help us build the foundation of our project.

Evaluation

  • The model trained is then evaluated and the accuracy for the same is generated on the basis of the relevance of the answers that the machine gives to the user’s responses.
  • To understand the efficiency of the model, the suggested answers by the chatbot are compared to the actual answers.
[Figure: three graphs comparing the model’s output (blue line) with the actual output (green line) and the data samples: underfitting, perfect fit, and overfitting]

In the above diagram, the blue line shows the model’s output, while the green line shows the actual output along with the data samples.

  1. Figure 1: The model’s output does not match the true function at all. Hence the model is said to be underfitting and its accuracy is lower.
  2. Figure 2: In the second one, the model’s performance matches well with the true function which states that the model has optimum accuracy and the model is called a perfect fit.
  3. Figure 3: In the third case, model performance is trying to cover all the data samples even if they are out of alignment with the true function. This model is said to be overfitting and this too has lower accuracy.

Once the model is evaluated thoroughly, it is then deployed in the form of an app that people can use easily.

Chatbot

Try interacting with any chatbot available online, and then answer the following questions:

  • Which chatbot did you try? Name anyone.
  • What is the purpose of this chatbot?
  • How was the interaction with the chatbot?
  • Did the chat feel like talking to a human or a robot? Why do you think so?
  • Do you feel that the chatbot has a certain personality?

Types of chatbot

There are two types of chatbots:

  1. Script-bot:
    • Script-bots are very easy to make
    • They work around the script that is programmed into them
    • Free to use
    • Easy to integrate into a messaging platform
    • Little or no language-processing skill required
    • Offer limited functionality
    • Story Speaker is an example of a script-bot
  2. Smart-bot:
    • Flexible and powerful
    • Work on bigger databases and other resources directly
    • Learn with more data
    • Coding is required to build and deploy them
    • Wide functionality
    • Google Assistant, Alexa, Siri and Cortana are examples of smart-bots

In the next section of Unit 6 Natural Language Processing AI Class 10, we will discuss human language vs computer language.

Human Language vs Computer Language

  • Humans communicate through language which we process all the time.
  • Our brain keeps on processing the sounds that it hears around itself and tries to make sense of them all the time.
  • Even in the classroom, as the teacher delivers the session, our brain is continuously processing everything and storing it in someplace. Also, while this is happening, when your friend whispers something, the focus of your brain automatically shifts from the teacher’s speech to your friend’s conversation.
  • So now, the brain is processing both the sounds but is prioritising the one on which our interest lies.
  • The sound reaches the brain through a long channel.
  • As a person speaks, the sound travels from his mouth and goes to the listener’s eardrum.
  • The sound striking the eardrum is converted into neuron impulse, gets transported to the brain and then gets processed.
  • After processing the signal, the brain understands its meaning.
  • If it is clear, the signal gets stored. Otherwise, the listener asks for clarity from the speaker.
  • This is how human languages are processed by humans.


Let us see some basics of computer language in the next section of Unit 6 Natural Language Processing AI Class 10.

  • On the other hand, the computer understands the language of numbers.
  • Everything that is sent to the machine has to be converted to numbers.
  • And while typing, if a single mistake is made, the computer throws an error and does not process that part.
  • The communications made by the machines are very basic and simple.
  • Now, if we want the machine to understand our language, how should this happen?
  • What are the possible difficulties a machine would face in processing natural language?

Arrangement of the words and meaning

  • Every language has its own rules.
  • In human language, we have nouns, verbs, adverbs, adjectives etc.
  • A word can be a noun at one time or a verb or adjective at another time.
  • The rules provide a structure for the language.
  • Every language has its own syntax. Syntax refers to the grammatical structure of the sentence.
  • Similarly, rules are required to process the computer language.
  • To do so, part-of-speech tagging is required, which allows the computer to identify the different parts of speech (see the sketch after this list).
  • Human language has multiple characteristics that are easy for a human to understand but extremely difficult for a computer to understand.
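Below is a minimal sketch of part-of-speech tagging using the NLTK library (an assumption: the unit does not prescribe a tool, and any POS tagger would work similarly):

```python
# A minimal sketch of part-of-speech tagging with NLTK.
# Assumes the nltk package is installed and its data can be downloaded.
import nltk

nltk.download("punkt", quiet=True)                       # tokenizer data
nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger data

sentence = "The red car zoomed past his nose"
tokens = nltk.word_tokenize(sentence)  # split the sentence into word tokens
print(nltk.pos_tag(tokens))            # tag each token with its part of speech
# e.g. [('The', 'DT'), ('red', 'JJ'), ('car', 'NN'), ('zoomed', 'VBD'), ...]
```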

Analogy with programming

  • Different syntax, same semantics: 2+3 = 3+2
    • The way these two statements are written is different, but their meaning is the same, that is 5.
  • Different semantics, same syntax: 3/2 (Python 2.7) ≠ 3/2 (Python 3)
    • These statements have the same syntax but different meanings: in Python 2.7, division of integers truncates, so 3/2 results in 1, while in Python 3 it gives an output of 1.5.

In the next section of Unit 6 Natural Language Processing AI Class 10 we are going to discuss the multiple meanings of words. Here it is!

Multiple meanings of the word

Let’s consider these three sentences:

“His face turned red after he found out that he took the wrong bag”

  • What does this mean?
  • Is he feeling ashamed because he took another person’s bag instead of his?
  • Is he feeling angry because he did not manage to steal the bag that he has been targeting?

“The red car zoomed past his nose”

  • Probably talking about the colour of the car

“His face turns red after consuming the medicine”

  • Is he having an allergic reaction?
  • Or is he not able to bear the taste of that medicine?
  • Here we can see that context is important.
  • We understand a sentence almost intuitively, depending on our history of using the language, and the memories that have been built within.
  • In all three sentences, the word red is used in three different ways; the context of the statement changes its meaning completely.
  • Thus, in natural language, it is important to understand that a word can have multiple meanings, and the meaning that fits is determined by the context of the statement.

Perfect Syntax, no meaning

  • Sometimes a sentence has perfect syntax but no meaning. For example,

“Chickens feed extravagantly while the moon drinks tea.”

  • This statement is correct in syntax but does this make any sense?
  • In human language, a perfect balance of syntax and semantics is important for better understanding.


Data Processing

  • Humans communicate with each other very easily.
  • Natural language is very convenient and efficient for humans to speak and understand.
  • At the same time, it is very difficult and complex for computers to process.
  • The question, then, is how machines can understand and speak natural language just like humans.
  • Since the computer understands only numbers, the basic step is to convert each word or letter into numbers.
  • This conversion requires text normalization.

Text Normalization

  • Human languages are very complex, so they need to be simplified for a machine to understand them.
  • Text normalization cleans up the textual data and brings its complexity down to a level lower than that of the raw data.
  • In text normalization, we follow several steps to normalise the text to a lower level.
  • First, we need to collect the text that is to be normalized.

What does text normalization include?

Text normalization is the process of transforming a text into a canonical (standard) form. For example, the words ‘Well’ and ‘Wel’ can both be transformed to ‘Well’.

Why is text normalization important?

During text normalization, we reduce the randomness in the text and bring it closer to a predefined standard. This reduces the amount of different information the computer has to deal with, and therefore improves efficiency.

Corpus

  • The entire textual data collected from all the documents together is known as the corpus.
  • To normalise the corpus, the following steps are required:

Sentence Segmentation

  • Sentence segmentation divides the corpus into sentences.
  • Each sentence is then treated as a separate piece of data, so the corpus gets reduced to sentences.
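As a quick illustration, here is a minimal sketch of sentence segmentation using NLTK’s sent_tokenize (an assumption: the unit itself only names an online tool for this step):

```python
# A minimal sketch of sentence segmentation with NLTK.
import nltk
nltk.download("punkt", quiet=True)  # sentence tokenizer data

corpus = ("Divya and Rani both are stressed. Rani went to a therapist. "
          "Divya went to download a health chatbot.")
for sentence in nltk.sent_tokenize(corpus):  # corpus -> list of sentences
    print(sentence)
```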

Tokenisation

  • After sentence segmentation, each sentence is further divided into tokens.
  • A token is the term used for any word, number or special character occurring in a sentence.
  • Under tokenisation, every word, number and special character is considered separately and each of them is now a separate token.
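A minimal tokenisation sketch with NLTK’s word_tokenize (again an assumption; any tokenizer would do):

```python
# A minimal sketch of tokenisation: every word and special character
# in the sentence becomes a separate token.
import nltk
nltk.download("punkt", quiet=True)

sentence = "Divya went to download a health chatbot!"
print(nltk.word_tokenize(sentence))
# ['Divya', 'went', 'to', 'download', 'a', 'health', 'chatbot', '!']
```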

Removing Stopwords, Special Characters and Numbers

  • Stopwords are words that occur very frequently in the corpus but do not add any value to it.
  • In human language, certain words are needed for grammar but add no essence to the meaning of the corpus.
  • Some examples of stop words are:
[Image: examples of common stopwords, such as a, an, and, the, is, to]

The above words have little or no meaning in the corpus, hence they are removed so that the focus stays on the meaningful terms.

Along with these stopwords, the corpus may contain special characters and numbers. Sometimes these are meaningful, sometimes not: in an email id, for example, the @ symbol and the numbers are very important. Special characters and numbers that are not meaningful can be removed just like stopwords.
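Here is a minimal sketch of this step using NLTK’s built-in English stopword list (an assumption: real projects often use custom lists tuned to the corpus):

```python
# A minimal sketch of stopword and special-character removal.
import nltk
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
tokens = ["divya", "and", "rani", "both", "are", "stressed", "!"]

# keep only alphanumeric tokens that are not in the stopword list
filtered = [t for t in tokens if t.isalnum() and t not in stop_words]
print(filtered)  # ['divya', 'rani', 'stressed']
```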

Converting text to a common case

  • The next step after removing stopwords is to convert the whole text into the same case.
  • The preferred case is lowercase.
  • This ensures that the machine’s case sensitivity does not treat the same words as different just because of their case.
[Image: the word “hello” written in six different cases, all converted to lowercase]

In the above example, the word “hello” is written in six different forms; all of them are converted to lowercase and hence treated as the same word by the machine.
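In Python, this step is a one-liner:

```python
# Converting all tokens to a common (lower) case.
tokens = ["Hello", "HELLO", "hello", "HeLLo", "heLLO", "HELLo"]
print([t.lower() for t in tokens])
# ['hello', 'hello', 'hello', 'hello', 'hello', 'hello']
```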

Stemming

  • In this step, words are reduced to their root words.
  • Stemming is the process in which the affixes of words are removed and the words are converted to their base form.
  • Note that the stemmed words (the words we get after removing the affixes) might not be meaningful.
  • In the example below, healed, healing and healer are all reduced to heal, but studies is reduced to studi, which is not a meaningful word.
  • Stemming does not check whether the stemmed word is meaningful.
  • It just removes the affixes, which is why it is fast.
Word | Affix | Stem
healed | -ed | heal
healing | -ing | heal
healer | -er | heal
studies | -es | studi
studying | -ing | study
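Here is a minimal sketch using NLTK’s Porter stemmer (one of several stemming algorithms; note that a real stemmer can differ slightly from the table above, e.g. Porter reduces studying to studi as well):

```python
# A minimal sketch of stemming with NLTK's Porter stemmer.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["healed", "healing", "studies", "studying"]:
    print(word, "->", stemmer.stem(word))
# healed -> heal, healing -> heal, studies -> studi, studying -> studi
```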

In the next section of Unit 6 Natural Language Processing AI Class 10 we are going to discuss lemmatization. Here we go!

Lemmatization

  • Lemmatization is an alternative to stemming.
  • It also removes affixes from the words of the corpus.
  • The difference is that the output of lemmatization is always a meaningful word.
  • The final output is known as a lemma.
  • It takes longer to execute than stemming.
  • The following table shows the process:
Word | Affix | Lemma
healed | -ed | heal
healing | -ing | heal
healer | -er | heal
studies | -es | study
studying | -ing | study

Compare the stemming and lemmatization tables: stemming converts the word studies into studi, whereas its lemma is study.
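A minimal sketch of lemmatization with NLTK’s WordNetLemmatizer (an assumption; the unit points to a spaCy-based online tool instead):

```python
# A minimal sketch of lemmatization. A part-of-speech hint (pos="v"
# for verbs) helps the lemmatizer pick the right lemma.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("studies"))           # study
print(lemmatizer.lemmatize("healed", pos="v"))   # heal
print(lemmatizer.lemmatize("healing", pos="v"))  # heal
```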

Observe the following example to understand stemming and lemmatization:

[Image: example showing the difference between stemming and lemmatization]

After normalisation of the corpus, let’s convert the tokens into numbers. To do this, the bag of words algorithm is used.


Bag of words

  • A bag of words is an NLP model that extracts the features of the text which can be helpful in machine learning algorithms.
  • We get the occurrences of each word and develop the vocabulary for the corpus.
[Image: a normalised corpus (left) passed through the bag of words algorithm (middle), producing the unique words and their frequencies (right)]
  • The above image shows how the bag of words algorithm works.
  • The text on the left is a normalised corpus, obtained after going through all the steps of text processing.
  • The image in the middle shows the bag of words algorithm, into which we put all the words obtained from text processing.
  • The image on the right shows the unique words returned by the bag of words algorithm, along with their occurrences in the text corpus.
  • Eventually, the bag of words returns us two things:
    • A vocabulary of words
    • The frequency of words
  • The name “bag” of words symbolises that the sequence of sentences or tokens does not matter here, as all we need are the unique words and their frequencies; a small code sketch follows.
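For a quick experiment, scikit-learn’s CountVectorizer implements the same idea (an assumption: the manual steps below do not require it; note that its default tokenizer drops one-letter words such as “a”):

```python
# A minimal sketch of bag of words using scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "Divya and Rani both are stressed",
    "Rani went to a therapist",
    "Divya went to download a health chatbot",
]
vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(documents)  # one row per document
print(vectorizer.get_feature_names_out())     # the vocabulary
print(matrix.toarray())                       # word frequencies per document
```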

The step-by-step process to implement the bag of words algorithm

The following steps should be followed to implement the bag of words:

  1. Text Normalization
  2. Create dictionary
  3. Create document vectors
  4. Create document vectors for all documents

Text Normalization:

This step collects the data and pre-processes it. For example:

Document 1: Divya and Rani both are stressed

Document 2: Rani went to a therapist

Document 3: Divya went to download a health chatbot

The above example consists of three documents having one sentence each. After text normalization, the text would be:

Document 1: [Divya, and, Rani, both, are, stressed]

Document 2: [Rani, went, to, a, therapist]

Document 3: [Divya, went, to, download, a, health, chatbot]

  • Note that no tokens have been removed in the stopwords removal step.
  • It is because we have very little data and since the frequency of all the words is almost the same, no word can be said to have lesser value than the other.

Step 2 Create a Dictionary

To create a dictionary, write down all the words that occur in the three documents.

Dictionary:

Divya, and, Rani, went, are, stressed, to, a, therapist, download, health, chatbot

In this step, repeated words are written just once, giving us a list of unique words.

Step 3 Create a document vector

  • In this step, write all the dictionary words in the top row.
  • Then check each document for each word: write 1 (one) under the word if it occurs in the document, otherwise write 0 (zero).
  • If the same word appears more than once, increment its count.
Word | Divya | and | Rani | went | are | stressed | to | a | therapist | download | health | chatbot
Document 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0
Document 2 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0
Document 3 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1

In the above table,

  1. The header row contains the vocabulary of the corpus.
  2. Three rows represent the three documents.
  3. Analyse the 1s and 0s.
  4. This gives the document vector table for the corpus.
  5. Finally, the tokens should be converted into numbers.
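The same steps can also be coded by hand. Here is a minimal sketch (note that it also keeps the word “both”, which the tables above leave out):

```python
# A minimal sketch of steps 2-4: build the dictionary, then create
# one frequency vector per document.
documents = [
    ["Divya", "and", "Rani", "both", "are", "stressed"],
    ["Rani", "went", "to", "a", "therapist"],
    ["Divya", "went", "to", "download", "a", "health", "chatbot"],
]

# Step 2: keep every unique word once, in order of first appearance
dictionary = []
for doc in documents:
    for word in doc:
        if word not in dictionary:
            dictionary.append(word)

# Steps 3-4: count each dictionary word in each document
vectors = [[doc.count(word) for word in dictionary] for doc in documents]
print(dictionary)
for vector in vectors:
    print(vector)
```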

To give these tokens meaningful numeric values, TFIDF (Term Frequency and Inverse Document Frequency) is used.

TFIDF (Term Frequency and Inverse Document Frequency)

There are two terms in TFIDF, one is Term Frequency and another one is Inverse Document Frequency.

Term Frequency

  • Term frequency helps to identify the value of each word within a document.
  • It can be read directly from the document vector table, where we record the frequency of each vocabulary word in each document (refer to the table above).
  • The numbers represent the frequency of each word.
Divya | and | Rani | went | are | stressed | to | a | therapist | download | health | chatbot
1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0
0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0
1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1

Inverse Document Frequency

  • Now let’s discuss Inverse Document Frequency.
  • First, look at the document frequency: the number of documents in which a word occurs, irrespective of how many times it occurs in those documents.
  • For example,
Divya | and | Rani | went | are | stressed | to | a | therapist | download | health | chatbot
2 | 1 | 2 | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 1
  • In the above table you can observe that the words “Divya”, “Rani”, “went”, “to” and “a” have a document frequency of 2, as they occur in two documents.
  • The rest of the terms occur in just one document.
  • Now, for the inverse document frequency, put the document frequency in the denominator and the total number of documents in the numerator.
  • For example,
Divya | and | Rani | went | are | stressed | to | a | therapist | download | health | chatbot
3/2 | 3/1 | 3/2 | 3/2 | 3/1 | 3/1 | 3/2 | 3/2 | 3/1 | 3/1 | 3/1 | 3/1
  • The formula is:
TFIDF(W) = TF(W) * log( IDF(W) )
  • Here, TF(W) is the term frequency of W in the document, IDF(W) is the total number of documents divided by the number of documents in which W occurs, and the log is to the base 10. Don’t worry, you don’t need to calculate the log values by yourself; simply use the log function on a calculator.
  • Now, let’s multiply the TF values by the log of the IDF values.
  • Note that the TF values are for each document, while the IDF values are for the whole corpus.
  • Hence, we multiply the IDF values into each row of the document vector table.
Divya | and | Rani | went | are | stressed | to | a | therapist | download | health | chatbot
1*log(3/2) | 1*log(3) | 1*log(3/2) | 0*log(3/2) | 1*log(3) | 1*log(3) | 0*log(3/2) | 0*log(3/2) | 0*log(3) | 0*log(3) | 0*log(3) | 0*log(3)
0*log(3/2) | 0*log(3) | 1*log(3/2) | 1*log(3/2) | 0*log(3) | 0*log(3) | 1*log(3/2) | 1*log(3/2) | 1*log(3) | 0*log(3) | 0*log(3) | 0*log(3)
1*log(3/2) | 0*log(3) | 0*log(3/2) | 1*log(3/2) | 0*log(3) | 0*log(3) | 1*log(3/2) | 1*log(3/2) | 0*log(3) | 1*log(3) | 1*log(3) | 1*log(3)
  • The resulting TFIDF values for each word are as follows:
Divya | and | Rani | went | are | stressed | to | a | therapist | download | health | chatbot
0.176 | 0.477 | 0.176 | 0 | 0.477 | 0.477 | 0 | 0 | 0 | 0 | 0 | 0
0 | 0 | 0.176 | 0.176 | 0 | 0 | 0.176 | 0.176 | 0.477 | 0 | 0 | 0
0.176 | 0 | 0 | 0.176 | 0 | 0 | 0.176 | 0.176 | 0 | 0.477 | 0.477 | 0.477
  • Finally, the words have been converted to numbers.
  • These numbers are the values of each word in each document.
  • Here, since we have a small amount of data, even words like ‘are’ and ‘and’ have a high value.
  • But as a word occurs in more and more documents, its value decreases.
  • For example:

Total Number of documents: 10

Number of documents in which ‘and’ occurs: 10

Therefore, IDF(and) = 10/10 = 1

Which means: log(1) = 0.

Hence, the value of ‘and’ becomes 0.

On the other hand, suppose the word ‘Artificial’ occurs in 3 documents: IDF(Artificial) = 10/3 = 3.3333…

This means log(3.3333) = 0.522, which shows that the word ‘Artificial’ has considerable value in the corpus.
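The whole calculation above can be reproduced with a short script. A minimal sketch, following the unit’s formula TFIDF(W) = TF(W) * log(IDF(W)) with base-10 logs:

```python
# A minimal sketch of TFIDF:
# TFIDF(W) = TF(W) * log10(total documents / documents containing W)
import math

documents = [
    ["Divya", "and", "Rani", "both", "are", "stressed"],
    ["Rani", "went", "to", "a", "therapist"],
    ["Divya", "went", "to", "download", "a", "health", "chatbot"],
]
N = len(documents)
vocabulary = sorted({word for doc in documents for word in doc})

for doc in documents:
    row = {}
    for word in vocabulary:
        tf = doc.count(word)                         # term frequency
        df = sum(1 for d in documents if word in d)  # document frequency
        row[word] = round(tf * math.log10(N / df), 3)
    print({w: v for w, v in row.items() if v > 0})
# e.g. the first document gives {'Divya': 0.176, 'Rani': 0.176,
# 'and': 0.477, 'are': 0.477, 'both': 0.477, 'stressed': 0.477}
```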

Applications of TFIDF

Document Classification | Helps in classifying the type and genre of a document.
Topic Modelling | Helps in predicting the topic of a corpus.
Information Retrieval System | Helps to extract the important information out of a corpus.
Stop word filtering | Helps in removing the unnecessary words from a text body.


DIY – Do It Yourself!
Here is a corpus for you to challenge yourself with the given tasks. Use the knowledge you have gained in the above sections and try completing the whole exercise by yourself.
The Corpus
Document 1: We can use health chatbots for treating stress.
Document 2: We can use NLP to create chatbots and we will be making health chatbots now!
Document 3: Health Chatbots cannot replace human counsellors now.
Accomplish the following challenges on the basis of the corpus given above. You can use the tools available online for these challenges.

The link for each tool is given below:

  1. Sentence Segmentation: https://tinyurl.com/y36hd92n
  2. Tokenisation: https://text-processing.com/demo/tokenize/
  3. Stopwords removal: https://demos.datasciencedojo.com/demo/stopwords/
  4. Lowercase conversion: https://caseconverter.com/
  5. Stemming: http://textanalysisonline.com/nltk-porter-stemmer
  6. Lemmatisation: http://textanalysisonline.com/spacy-word-lemmatize
  7. Bag of Words: Create a document vector table for all documents.
  8. Generate TFIDF values for all the words.
  9. Find the words having the highest value.
  10. Find the words having the least value.

Follow this link for Term 2 important questions for AI class 10.

Term 2 Important Questions Class 10 AI

If you are looking for Term 2 Sample Paper AI Class 10, follow this link:

Term 2 Sample Paper AI Class 10

Thank you for reading this article – Unit 6 Natural Language Processing AI Class 10. Share your valuable feedback and views regarding this article in the comment section.

  1. What is Natural Language Processing class 10?

    Natural Language Processing class 10 is one of the topics of the Artificial Intelligence skill course. It is unit 6 of the CBSE Artificial Intelligence curriculum of class 10.

  2. What is NLP explain with an example?

NLP stands for Natural Language Processing. It is one of the subfields or domains of AI that helps machines understand, interpret, and manipulate human languages such as English or Hindi to analyze and derive their meaning. For example, virtual assistants such as Siri and Alexa use NLP to interpret spoken commands.

  3. What is Natural Language Processing? Explain in detail.

    Natural Language Processing is one of the branches of AI that helps machines to understand, interpret, and manipulate human languages such as English or Hindi to analyze and derive their meaning.

    NLP takes the data as input from the spoken words, verbal commands or speech recognition software that humans use in their daily lives and operates on this.

  4. What are the steps of NLP?

    NLP follows these steps:
    1. Text Normalization
    2. Sentence Segmentation
    3. Tokenization
    4. Bag of Words
    5. TFIDF
