Effortless Text Summarization Using NLP: Quickly Capture Key Ideas

Tags: NLP · NLTK · Python · nltk.tokenize

Overview

This project served as a quick and practical introduction to Natural Language Processing (NLP). The main goal was to explore whether NLP techniques could be used to take a large chunk of text—such as a story—and reduce its length while preserving its core meaning and important information. Through this experiment, I was able to successfully summarize lengthy text into a much shorter version, maintaining the original context. The approach is simple and accessible, making it easy for anyone to try out and understand the power of NLP in text summarization.

Tech Stack

What needs to be downloaded

The entire program is written in Python, making it simple and lightweight. To run the code, you just need to install a few required packages as listed below.
  • nltk ( pip install nltk )
Then the following NLTK resources need to be downloaded; the code for that is as follows:
  • nltk.download('punkt') # for sent_tokenize and word_tokenize
  • nltk.download('stopwords') # for stopwords
  • nltk.download('averaged_perceptron_tagger') # for pos_tag (part-of-speech tagging)
Note that recent NLTK releases ship the tokenizer and tagger resources under the names 'punkt_tab' and 'averaged_perceptron_tagger_eng', which is why the commented-out downloads in the code below use those names.

The code

Needed Imports

from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk import pos_tag
from collections import Counter
import nltk

# Optionally download the following if not already downloaded
# nltk.download('punkt_tab')
# nltk.download('stopwords')
# nltk.download('averaged_perceptron_tagger_eng')

Summarization function

The function takes in the text to be summarized and a parameter n_topic, which determines the number of key topics to include. A higher n_topic value results in a longer, more detailed summary.
def Summarizer(text, n_topic=4):

    # Split the text into sentences and build the stop-word set
    sentences = sent_tokenize(text)
    stop_words = set(stopwords.words('english'))

    # Keep only alphabetic, non-stop-word nouns (POS tags starting with 'NN')
    tokens = [word.lower() for word, pos in pos_tag(word_tokenize(text))
              if pos.startswith('NN') and word.lower() not in stop_words and word.isalpha()]

    # The n_topic most frequent nouns become the key topics
    main_keywords = [word for word, _ in Counter(tokens).most_common(n_topic)]
    goal_indicators = set(main_keywords)

    # Keep every sentence that mentions at least one key topic
    summary_sentences = [sentence for sentence in sentences
                         if any(word in sentence.lower() for word in goal_indicators)]

    return ' '.join(summary_sentences)
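The keyword-selection and sentence-filtering steps at the heart of the function can be seen in isolation. The sketch below uses illustrative tokens and sentences (not from the original project) to show how Counter.most_common ranks the nouns and how the resulting keyword set filters sentences:

```python
from collections import Counter

# Illustrative noun tokens, as they might come out of the POS-tag filter
tokens = ['fox', 'dog', 'fox', 'forest', 'dog', 'fox', 'river']

# Rank by frequency and keep the top n_topic nouns as key topics
n_topic = 2
main_keywords = [word for word, _ in Counter(tokens).most_common(n_topic)]
print(main_keywords)  # ['fox', 'dog']

# A sentence is kept if it mentions any key topic (substring match, as above)
sentences = ['The fox ran.', 'The river was calm.', 'A dog barked.']
summary = [s for s in sentences if any(w in s.lower() for w in main_keywords)]
print(summary)  # ['The fox ran.', 'A dog barked.']
```

Because the function keeps whole sentences from the original text, the summary stays readable and interpretable: every line in the output can be traced back to the source.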

Testing

This section shows the text summarization function in action.
Path_text_toSummarize = r'path to the text file'
# enter the path of the file containing the text you want to summarize
with open(Path_text_toSummarize, 'r', encoding='utf-8') as file:
    text = file.read()

summary = Summarizer(text)

print(f'Initial text length: {len(text)}, summary length: {len(summary)}')
print(summary)

Why It Matters

This summarizer demonstrates how NLP can be used to extract meaning from large text bodies in a lightweight and interpretable way.

Interesting Use Cases

  • Search Functionality: Summaries can be indexed instead of—or alongside—full documents to improve the efficiency and relevance of search results.
  • Chatbot and Virtual Assistant Support: By reducing the size and complexity of user inputs, summaries help virtual assistants process queries more efficiently and respond faster.
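The search use case can be sketched with a tiny inverted index built over summaries rather than full documents. The document texts and index structure below are illustrative, assuming summaries have already been produced (for example by the Summarizer above):

```python
from collections import defaultdict

# Hypothetical per-document summaries, keyed by document id
summaries = {
    'doc1': 'The fox ran through the forest.',
    'doc2': 'A dog barked at the river.',
}

# Build a word -> document-ids inverted index over the summaries only
index = defaultdict(set)
for doc_id, summary in summaries.items():
    for word in summary.lower().split():
        index[word.strip('.')].add(doc_id)

def search(query):
    # Return the ids of documents whose summary mentions the query word
    return sorted(index.get(query.lower(), set()))

print(search('fox'))    # ['doc1']
print(search('river'))  # ['doc2']
```

Indexing summaries instead of full documents keeps the index small, and because each keyword-bearing sentence survived the summarization step, hits tend to point at the topically relevant documents.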

Conclusion

This was a fun and practical way to get started with Natural Language Processing. I built a simple text summarizer using NLTK that takes a large chunk of text and pulls out the most important parts. It worked surprisingly well and showed how even basic NLP techniques can make tasks like summarizing or improving search more efficient. Overall, this project gave me a solid foundation in NLP and opened up a lot of ideas for where to go next.