Introduction to Natural Language Processing (NLP)

Natural Language Processing (NLP) is a branch of Artificial Intelligence (AI) that enables computers to understand, process, and generate human language.

Examples of NLP Applications:

  • ChatGPT
  • Google Translate
  • Siri
  • Alexa
  • Grammarly
  • Predictive Keyboards
  • Search Engines

One of the most important concepts in NLP is Language Modeling.

What is a Language Model?

A Language Model is a probabilistic model that predicts the likelihood of a sequence of words.

In simple words:

It predicts what word is most likely to come next in a sentence.

Example:

Sentence:

“I am going to the ____”

Possible predictions:

  • market
  • school
  • office

The language model assigns a probability to each candidate word and, in the simplest case, selects the word with the highest probability.

Why is Language Modeling Important?

Language Models are used in many modern AI systems.

Applications

1. Predictive Typing

Mobile keyboards suggest the next word.

Example:

  • “How are ___”
  • Prediction: “you”

2. Speech Recognition

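Language models help choose the most likely word sequence when spoken language is converted into text.

3. Machine Translation

Language models help translation systems such as Google Translate choose fluent output.

4. Chatbots

Language models help chatbots generate coherent, meaningful responses.

5. Search Engines

Language models rank likely query completions to provide search suggestions.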

Probability in Language Modeling

Language models are based on probability.

Suppose we have two sentences:

  1. “I love machine learning”
  2. “Machine learning love I”

The first sentence has a higher probability because its word order follows natural grammar and is far more likely to occur in real text.

Mathematical Representation:

Chain Rule of Probability

The probability of a sentence is calculated using the chain rule.

$$P(w_1, w_2, \dots, w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1, w_2) \cdots P(w_n \mid w_1, \dots, w_{n-1})$$

This means:

  • Each word is conditioned on all the words that come before it.

Example:

Sentence:

“I love NLP”

Probability:

$$P(\text{I}) \times P(\text{love} \mid \text{I}) \times P(\text{NLP} \mid \text{I}, \text{love})$$
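
The chain-rule product can be sketched in Python. This is a minimal sketch under stated assumptions: the three-sentence corpus is invented for illustration, and the helper names `prefix_count` and `chain_rule_probability` are hypothetical. Each conditional probability is estimated as the share of sentences beginning with the history that continue with the next word.

```python
# Toy corpus, invented for illustration only (not real data).
corpus = [
    "I love NLP",
    "I love Python",
    "I like NLP",
]
sentences = [s.split() for s in corpus]


def prefix_count(prefix):
    """Count how many corpus sentences start with the given word sequence."""
    n = len(prefix)
    return sum(1 for s in sentences if s[:n] == prefix)


def chain_rule_probability(words):
    """Multiply P(w_i | w_1..w_{i-1}) for every position i in the sequence."""
    prob = 1.0
    for i in range(len(words)):
        history_count = prefix_count(words[:i])       # sentences starting with the history
        extended_count = prefix_count(words[:i + 1])  # ...that also continue with words[i]
        if history_count == 0:
            return 0.0  # history never seen, so no estimate is possible
        prob *= extended_count / history_count
    return prob


# P(I) * P(love | I) * P(NLP | I, love) = 3/3 * 2/3 * 1/2
print(chain_rule_probability(["I", "love", "NLP"]))
```

Note that the history grows with every word, so longer sentences need counts for ever longer prefixes.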

Problem with Full Probability Models

Considering all previous words creates problems:

  • Very high computation
  • Huge memory usage
  • Complex calculations

To solve this problem, NLP uses N-gram Models, which condition each word on only the previous N−1 words instead of the full history.

What are N-grams?

An N-gram is a sequence of N consecutive words.
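
As a minimal sketch of this definition, the helper below (the name `ngrams` is an assumption for illustration) slides a window of size N over a list of tokens and returns every run of N consecutive words.

```python
def ngrams(tokens, n):
    """Return every sequence of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "I love NLP and I love Python".split()

print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams
print(ngrams(tokens, 3))  # trigrams
```

A sentence with fewer than N tokens simply yields an empty list.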

Types of N-gram Models

1. Unigram Model

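A unigram model treats every word as independent of the words around it.

Formula:

$$P(w_1, w_2, \dots, w_n) \approx P(w_1)\,P(w_2) \cdots P(w_n)$$

Example:

Sentence:

“I love NLP”

Probability:

$$P(\text{I}) \times P(\text{love}) \times P(\text{NLP})$$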

Characteristics

  • No context dependency
  • Simplest model
  • Fast computation

Advantages

  • Easy implementation
  • Low memory usage

Disadvantages

  • Ignores sentence context
  • Poor prediction quality
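
The loss of context can be seen directly in a small sketch. Assuming word probabilities are estimated by simple relative frequency from one short text, a unigram model gives a sentence and its scrambled version the same probability. The function name `unigram_sentence_probability` is hypothetical.

```python
from collections import Counter
from math import prod

text = "I love NLP and I love Python"
words = text.split()
counts = Counter(words)
total = len(words)


def unigram_sentence_probability(sentence):
    """Multiply the individual word probabilities; word order plays no role."""
    return prod(counts[w] / total for w in sentence.split())


p_natural = unigram_sentence_probability("I love NLP")
p_scrambled = unigram_sentence_probability("NLP love I")

# Both equal (2/7) * (2/7) * (1/7), up to floating-point rounding.
print(p_natural, p_scrambled)
```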

Python Example — Unigram Model

from collections import Counter

text = "I love NLP and I love Python"

words = text.split()

word_counts = Counter(words)

total_words = len(words)

print("Unigram Probabilities:\n")

for word, count in word_counts.items():
    probability = count / total_words
    print(f"{word} : {probability:.2f}")

Step-by-Step Explanation

Input Sentence:

I love NLP and I love Python

Words:

['I', 'love', 'NLP', 'and', 'I', 'love', 'Python']

Total Words:

7

Count of “love”:

2

Probability:

$$P(\text{love}) = \frac{2}{7} \approx 0.29$$

2. Bigram Model

A bigram model conditions each word on the one word before it.

Formula:

$$P(w_n \mid w_{n-1})$$

Example:

Sentence:

“I love NLP”

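Probability:

$$P(\text{I}) \times P(\text{love} \mid \text{I}) \times P(\text{NLP} \mid \text{love})$$

Bigram Probability Formula

$$P(w_n \mid w_{n-1}) = \frac{\text{Count}(w_{n-1}, w_n)}{\text{Count}(w_{n-1})}$$

Bigram Example Calculation

In the text “I love NLP and I love Python”, the pair “I love” occurs 2 times and the word “I” occurs 2 times:

$$P(\text{love} \mid \text{I}) = \frac{\text{Count}(\text{I}, \text{love})}{\text{Count}(\text{I})} = \frac{2}{2} = 1$$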

Python Example — Bigram Model

from collections import Counter

text = "I love NLP and I love Python"

words = text.split()

bigrams = []

for i in range(len(words)-1):
    bigrams.append((words[i], words[i+1]))

bigram_counts = Counter(bigrams)

print("Bigram Counts:\n")

for bigram, count in bigram_counts.items():
    print(f"{bigram} : {count}")

Output

('I', 'love') : 2
('love', 'NLP') : 1
('NLP', 'and') : 1
('and', 'I') : 1
('love', 'Python') : 1
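
Counts like these can be turned into conditional probabilities by dividing each bigram count by the count of its first word. The following is a minimal sketch under that assumption; the function name `bigram_probability` is hypothetical.

```python
from collections import Counter

text = "I love NLP and I love Python"
words = text.split()

unigram_counts = Counter(words)
bigram_counts = Counter(zip(words, words[1:]))  # pair each word with its successor


def bigram_probability(prev_word, word):
    """Estimate P(word | prev_word) = Count(prev_word, word) / Count(prev_word)."""
    if unigram_counts[prev_word] == 0:
        return 0.0  # the previous word was never seen
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


print(bigram_probability("I", "love"))       # 2 / 2
print(bigram_probability("love", "NLP"))     # 1 / 2
print(bigram_probability("love", "Python"))  # 1 / 2
```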

How Bigram Works

Sentence:

“I love Python”

Bigram pairs:

('I', 'love')
('love', 'Python')

3. Trigram Model

A trigram model conditions each word on the two words before it.

Formula:

$$P(w_n \mid w_{n-2}, w_{n-1})$$

Trigram Example

Sentence:

“I love machine learning”

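Probability:

$$P(\text{I}) \times P(\text{love} \mid \text{I}) \times P(\text{machine} \mid \text{I}, \text{love}) \times P(\text{learning} \mid \text{love}, \text{machine})$$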

Python Example — Trigram Model

from collections import Counter

text = "I love NLP and I love Python"

words = text.split()

trigrams = []

for i in range(len(words)-2):
    trigrams.append((words[i], words[i+1], words[i+2]))

trigram_counts = Counter(trigrams)

print("Trigram Counts:\n")

for trigram, count in trigram_counts.items():
    print(f"{trigram} : {count}")

Output

('I', 'love', 'NLP') : 1
('love', 'NLP', 'and') : 1
('NLP', 'and', 'I') : 1
('and', 'I', 'love') : 1
('I', 'love', 'Python') : 1
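
A trigram probability can be estimated in the same count-based way: divide the trigram count by the count of its two-word context. This is a minimal sketch; the function name `trigram_probability` is an assumption for illustration.

```python
from collections import Counter

text = "I love NLP and I love Python"
words = text.split()

bigram_counts = Counter(zip(words, words[1:]))
trigram_counts = Counter(zip(words, words[1:], words[2:]))


def trigram_probability(w1, w2, w3):
    """Estimate P(w3 | w1, w2) = Count(w1, w2, w3) / Count(w1, w2)."""
    context_count = bigram_counts[(w1, w2)]
    if context_count == 0:
        return 0.0  # the two-word context was never seen
    return trigram_counts[(w1, w2, w3)] / context_count


print(trigram_probability("I", "love", "NLP"))     # 1 / 2
print(trigram_probability("I", "love", "Python"))  # 1 / 2
```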

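Comparison of N-gram Models

Model     N   Previous words used   Example term for “I love NLP”
Unigram   1   0                     P(NLP)
Bigram    2   1                     P(NLP | love)
Trigram   3   2                     P(NLP | I, love)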

Training a Simple N-gram Model

Steps:

  1. Collect text data
  2. Tokenize sentences into words
  3. Create N-grams
  4. Count frequencies
  5. Calculate probabilities
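
The five steps above can be sketched as a single function. This is a minimal illustration, not a production trainer: the corpus is invented, whitespace splitting stands in for a real tokenizer, and the name `train_ngram_model` is hypothetical.

```python
from collections import Counter, defaultdict


def train_ngram_model(sentences, n):
    """Return {context: {next_word: probability}} for an n-gram model."""
    counts = defaultdict(Counter)
    for sentence in sentences:                    # step 1: text data
        tokens = sentence.split()                 # step 2: tokenize
        for i in range(len(tokens) - n + 1):      # step 3: create n-grams
            context = tuple(tokens[i:i + n - 1])  # the first n-1 words
            word = tokens[i + n - 1]              # the word being predicted
            counts[context][word] += 1            # step 4: count frequencies

    model = {}
    for context, counter in counts.items():       # step 5: calculate probabilities
        total = sum(counter.values())
        model[context] = {w: c / total for w, c in counter.items()}
    return model


# Toy corpus, invented for illustration only.
corpus = ["I love NLP", "I love Python", "I like NLP"]

bigram_model = train_ngram_model(corpus, 2)
print(bigram_model[("I",)])     # probabilities of words that follow "I"
print(bigram_model[("love",)])  # probabilities of words that follow "love"
```

Passing `n=1` gives an empty context `()`, so the same function also produces unigram probabilities.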

Tokenization in NLP

Tokenization means breaking text into smaller units called tokens, most commonly words.

Example:

Sentence:

“I love AI”

Tokens:

['I', 'love', 'AI']

Python Example — Tokenization

text = "I love Artificial Intelligence"

tokens = text.split()

print(tokens)
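
Splitting on whitespace leaves punctuation attached to words, so “AI,” and “AI” would be counted as different tokens. One possible refinement, sketched here with Python's standard `re` module, separates punctuation into its own tokens. The function name `tokenize` and the pattern are assumptions for illustration.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation-mark tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches any single character that is neither of those nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "I love AI, and AI loves me!"

print(sentence.split())    # punctuation stays glued to the words
print(tokenize(sentence))  # punctuation becomes separate tokens
```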

Applications of N-gram Models

1. Search Engines

Search engines such as Google can use N-gram statistics to suggest likely completions of a query.

Example:

  • “Best AI ___”
  • Suggestions:
  • tools
  • courses
  • software

2. Auto-complete Systems

Smartphone keyboards predict the next word as the user types.

3. Machine Translation

Used to score candidate translations so that the most fluent sentence can be chosen.

4. Speech Recognition

Converts voice into text.

5. Spam Detection

Detects spam messages based on word patterns.

Problems in N-gram Models

1. Data Sparsity Problem

Some word combinations never appear in training data.

Example:

“Quantum banana robot”

Under a plain count-based model, its probability becomes zero, even though the phrase is possible.
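
The zero-probability effect can be reproduced with a small sketch. It also shows add-one (Laplace) smoothing, one common remedy that is offered here only as an illustration. The function names `mle` and `add_one` are hypothetical.

```python
from collections import Counter

words = "I love NLP and I love Python".split()
vocab = set(words)  # 5 distinct words

unigram_counts = Counter(words)
bigram_counts = Counter(zip(words, words[1:]))


def mle(prev_word, word):
    """Unsmoothed estimate: any pair not seen in training gets probability zero."""
    if unigram_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


def add_one(prev_word, word):
    """Add-one estimate: pretend every possible pair was seen one extra time."""
    return (bigram_counts[(prev_word, word)] + 1) / (unigram_counts[prev_word] + len(vocab))


# "NLP Python" never occurs in the text.
print(mle("NLP", "Python"))      # 0 / 1
print(add_one("NLP", "Python"))  # (0 + 1) / (1 + 5)
```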

2. Large Memory Requirement

Higher-order N-grams require huge storage, because the number of possible word combinations grows exponentially with N.
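
A rough sense of scale, assuming a hypothetical vocabulary of 10,000 words: with V words there are at most V to the power N distinct N-grams. Real text contains far fewer, but the upper bound shows how quickly the table can grow.

```python
def possible_ngrams(vocab_size, n):
    """Upper bound on distinct n-grams: vocab_size choices at each of n positions."""
    return vocab_size ** n


vocab_size = 10_000  # hypothetical vocabulary size, chosen only for illustration

for n in (1, 2, 3):
    print(f"N={n}: up to {possible_ngrams(vocab_size, n):,} distinct N-grams")
```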

3. Limited Context Understanding

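A bigram model only sees one previous word, so it misses dependencies between words that are farther apart.

4. Overfitting

Higher-order models may memorize exact phrases from the training data and generalize poorly to new text.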

Real-World Example

Google Keyboard predicts words using language models.

Input:

“How are”

Predictions:

  • you
  • we
  • they

An N-gram model would estimate the most probable next word from how often each word followed “How are” in its training text.
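
A prediction of this kind can be sketched with bigram counts. This is a toy illustration, not how any particular keyboard works: the corpus is invented, only the last typed word is used as context, and the name `predict_next` is hypothetical.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = [
    "how are you",
    "how are you doing",
    "how are we doing",
    "how are they",
    "where are you",
]

# For every word, count which words follow it.
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev_word, word in zip(tokens, tokens[1:]):
        next_word_counts[prev_word][word] += 1


def predict_next(text, k=3):
    """Return up to k of the most frequent followers of the last word in text."""
    tokens = text.lower().split()
    if not tokens:
        return []
    followers = next_word_counts.get(tokens[-1], Counter())
    return [word for word, _ in followers.most_common(k)]


print(predict_next("How are"))
```

In this corpus “you” follows “are” three times, so it is ranked first, ahead of “we” and “they”.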

Advantages of N-gram Models

  • Easy to implement
  • Fast training
  • Useful for small datasets
  • Good for beginner NLP systems

Disadvantages of N-gram Models

  • Poor long-distance understanding
  • Large storage requirements
  • Sparse data problem

Modern Alternatives to N-grams

Modern NLP systems use deep learning models:

  • RNN
  • LSTM
  • Transformers
  • GPT Models

These models capture long-range dependencies better than traditional N-grams.

Summary

In this lecture, we studied:

  • Natural Language Processing
  • Language Modeling
  • Probability in NLP
  • Chain Rule
  • N-gram Models
  • Unigram
  • Bigram
  • Trigram
  • Python Examples
  • Applications
  • Limitations

N-gram models are foundational concepts in NLP and are still important for understanding how modern AI language systems work.

Conclusion

Language Modeling is a core concept in NLP that lets machines model human language statistically. N-gram models simplify sentence probability calculations by considering only a limited number of previous words. Unigram, Bigram, and Trigram models have been widely used in predictive systems, search engines, speech recognition, and other AI applications.

Although modern AI systems now use advanced neural networks, N-grams remain an essential foundation for learning NLP concepts.