top of page

How to Categorize Keywords By Search Intent Using Python: A Complete Guide [2024]

Jul 23 -  Written By Jon-Paul LaFerriere

Support The Author: If you’ve found value in this article, please leave me a positive Google My Business review!

A positive review would really help go a long way! It’s super easy and only takes a few seconds. Everything is much appreciate, especially the positive feedback and how this content may have helped you. For any negative feedback or questions about this article, please reach out to me at hello@raredog.co and I’d be happy to help.

( Bonus: Check out our latest guide on how to scrape web content in bulk using python for just the main areas of the site! )

 

Python is a versatile and powerful programming language that can be used for a wide range of tasks, including data analysis and manipulation. In this guide, I'll walk through how you can use Python to categorize a list of keywords according to search intent, using a pre-prepared Excel file.

Before we get started, you'll need to have Python installed on your computer. If you don't have Python installed yet, follow these steps:

  1. Go to the official Python website.

  2. Download the latest version of Python.

  3. Open the downloaded file and follow the instructions to install Python.

After installing Python, we'll need to install a few Python libraries that we'll be using in this project: Pandas, NLTK, and Openpyxl. Here's how you can install them:

  1. Open Command Prompt (on Windows) or Terminal (on macOS or Linux).

  2. Type the following commands and press Enter after each one:

pip install pandas

pip install nltk

pip install openpyxl

These commands will install the necessary Python libraries using pip, which is a package manager for Python.

  • Pandas: This is a powerful data manipulation library that provides data structures and functions needed to manipulate structured data. It provides tools for reading and writing data, handling missing data, merging datasets, reshaping data, and much more.

  • NLTK (Natural Language Toolkit): This is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

  • Openpyxl: This is a Python library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files. It's a comprehensive tool to create, modify and read Excel documents while maintaining all of the Excel file’s properties.

For this project, you'll need two Excel files:

1. An Excel file with your list of keywords. We'll refer to this file as kws.xlsx in this guide.

  1. The structure here is important! The first (and only) column should be titled “Keyword”, with your list of keywords underneath it. Also the name of the file is important as well if you want to copy and paste the python script below as is (otherwise you’ll have to modify the python script example I give you here to match the name). File extension is important as well, needs to be an xlsx (or you can update the script to use csv’s but I use xlsx here)

2. An Excel file with your search intent categories and corresponding keyword modifiers.

  1. This file should have two columns: one for the search intent category and one for the keyword modifiers that correspond to that category. The first column should be named “Search Intent Type” and the second column should be named “Keyword Modifiers”. We'll refer to this file as kw-intent.xlsx in this guide (name of file and file type need to match this).

  1. Here’s an example document you can make a copy of/ download for yourself

Place these two Excel files in a new folder on your computer. For this guide, we'll use the folder path (update to your path as needed) “C:\Users\YourUsername\Documents\PythonScripts\KeywordCategorization\”

Next, we'll create two Python scripts: one for preprocessing the data and one for categorizing the intents of the keywords.

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
import pandas as pd
import re

# Initialize lemmatizer and stopwords
lemmatizer = WordNetLemmatizer()
stopwords = set(stopwords.words('english'))

def preprocess(text):
    # Convert to string
    text = str(text)
    # Convert to lowercase
    text = text.lower()
    # Remove special characters
    text = re.sub(r'\W', ' ', text)
    # Remove single characters
    text = re.sub(r'\s+[a-zA-Z]\s+', ' ', text)
    # Substitute multiple spaces with single space
    text = re.sub(r'\s+', ' ', text, flags=re.I)
    # Lemmatize and remove stopwords
    text = [lemmatizer.lemmatize(word) for word in text.split() if word not in stopwords]
    return ' '.join(text)

# Apply preprocess function to a list of words
def preprocess_list(word_list):
    return [preprocess(word) for word in word_list]

# Load data
keywords_df = pd.read_excel('kws.xlsx')
intent_df = pd.read_excel('kw-intent.xlsx')

# Keep a copy of the original keywords
keywords_df['Original Keyword'] = keywords_df['Keyword']

# Preprocess keywords
keywords_df['Keyword'] = keywords_df['Keyword'].apply(preprocess)

# Split the "Keyword Modifiers" column into lists of words
intent_df["Keyword Modifiers"] = intent_df["Keyword Modifiers"].str.split(", ")

# Preprocess each word in the "Keyword Modifiers" column
intent_df["Keyword Modifiers"] = intent_df["Keyword Modifiers"].apply(preprocess_list)

# Save the processed data to Excel files
keywords_df.to_excel('preprocessed_kws.xlsx', index=False)
intent_df.to_excel('preprocessed_kw-intent.xlsx', index=False)

This script loads your keywords and search intent data, preprocesses the data (converts it to lowercase, removes special characters, lemmatizes the words, and removes English stopwords), and then saves the preprocessed data back to Excel files.

import pandas as pd

# Load the preprocessed data
keywords_df = pd.read_excel('preprocessed_kws.xlsx')
intent_df = pd.read_excel('preprocessed_kw-intent.xlsx')

# Convert the Keyword Modifiers column into lists of words


# Since the "Keyword Modifiers" column has been preprocessed and saved as a string, we need to evaluate the string to get the list back
intent_df["Keyword Modifiers"] = intent_df["Keyword Modifiers"].apply(eval)

# Initialize a new column in keywords_df to store the intents
keywords_df["Intent"] = "Uncategorized"

# Loop through each keyword
for i, keyword in keywords_df["Keyword"].items():
    # Initialize a list to store the intents of the current keyword
    intents = []
    
    # Convert keyword to string and split it into individual words
    keyword_words = set(str(keyword).split())
    
    # Loop through each intent
    for _, row in intent_df.iterrows():
        # Check if any word from the current intent appears in the keyword_words set
        if any(word in keyword_words for word in row["Keyword Modifiers"]):
            # If it does, append the intent to the list
            intents.append(row["Search Intent Type"])
    
    # If the list of intents is not empty, assign it to the Intent column of the current keyword
    if intents:
        keywords_df.at[i, "Intent"] = ", ".join(intents)

# Save the results to a new Excel file
keywords_df[['Original Keyword', 'Intent']].to_excel('keywords_with_intents.xlsx', index=False)

This script loads the preprocessed data, checks each keyword to see if it contains any of the keyword modifiers associated with each search intent, and then assigns the corresponding search intent(s) to each keyword. The results are saved to a new Excel file.

Now that you have your Python scripts ready, you can run them to categorize your keywords.

1. Open Command Prompt or Terminal.

2. Navigate to the folder where you saved your Python scripts. You can do this by typing cd followed by the folder path. For this guide, the command would be:

cd C:\Users\YourUsername\Documents\PythonScripts\KeywordCategorization\

1. Run the preprocessing script by typing python preprocess.py and pressing Enter.

2. Once the preprocessing script has finished running, run the intent categorization script by typing python categorize.py and pressing Enter.

The intent categorization script will create a new Excel file (keywords_with_intents.xlsx) in the same folder. This file will contain your list of keywords along with their categorized intents.

If you encounter any errors while running the Python scripts, here are a few common issues and how you can resolve them:

  • ModuleNotFoundError: If you see an error message like ModuleNotFoundError: No module named 'pandas', it means that the Python library (in this case, Pandas) is not installed. You can install it by typing pip install pandas in Command Prompt or Terminal and pressing Enter.

  • FileNotFoundError: If you see an error message like FileNotFoundError: [Errno 2] No such file or directory: 'kws.xlsx', it means that the Python script cannot find the Excel file. Make sure that your Excel files are in the same folder as your Python scripts and that the filenames in your Python scripts match the actual filenames.

  • AttributeError: If you see an error message like AttributeError: 'float' object has no attribute 'split', it means that one or more of your keywords are recognized as a float (numeric) type, not a string. This can happen if your data contains numeric values or NaN (Not a Number) values, which are default interpreted as float by pandas. The Python scripts in this guide are designed to handle this issue by converting keywords to strings before splitting them.

In this guide, we've walked through how you can use Python to categorize a list of keywords according to search intent. With a basic understanding of Python and a pre-prepared Excel file, you can automate this categorization process and gain valuable insights from your keyword data. Happy coding!

We can help. Reach out to us and we’ll help you stand out from the pack by optimizng your site and company’s SEO presence.

Support The Author: If you’ve found value in this article, please leave me a positive Google My Business review!

A positive review would really help go a long way! It’s super easy and only takes a few seconds. Everything is much appreciate, especially the positive feedback and how this content may have helped you. For any negative feedback or questions about this article, please reach out to me at hello@raredog.co and I’d be happy to help.

Husband, father, digital marketing nerd and tech junkie.

https://www.linkedin.com/in/jon-laferriere/

https://www.raredogmarketing.com

Free Marketing Tips & Tricks

From analytics to content ideation, sign up for our monthly newsletter for marketing tips and workflow hacks that save you time and money.

PS - We will never sell your information

“Stand out from the pack!”

hello@raredog.co

(860) 255-8557

West Hartford, CT 06107

LINKEDIN / TWITTER

©2023 Rare Dog Marketing Co.

Jon-Paul LaFerriere - Owner, Principal Consultant

bottom of page