Persian poetry is a rich cultural treasure, and Ganjoor is one of the most accessible and comprehensive platforms dedicated to classical Persian literature. This invaluable resource hosts works by iconic poets such as Hafez, Rumi, Saadi, and Ferdowsi, offering a user-friendly interface to explore their timeless writings. Beyond simply reading these masterpieces, Ganjoor enables researchers, enthusiasts, and developers to engage with Persian poetry in innovative ways.
By leveraging programming, we can analyze and visualize these texts creatively. In this project, I will demonstrate how I crawled Hafez’s ghazals from Ganjoor to create a Persian word cloud. This visualization highlights the most frequently used words in his work, offering an engaging way to appreciate the themes and motifs of classical Persian poetry.
Before proceeding, install the necessary Python libraries:
pip install selenium
pip install wordcloud
pip install arabic-reshaper
pip install python-bidi
pip install matplotlib
Additionally, download and configure ChromeDriver for Selenium. Ensure it matches the version of your installed Chrome browser. Place the chromedriver executable in a known directory, and set its path in the script.
Below is the script for extracting every hemistich (مصرع, mesra, one half of a verse) from Hafez’s ghazals on Ganjoor:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from itertools import zip_longest

# Replace with the path to your chromedriver executable.
driver_path = "path to your driver"
service = Service(driver_path)
driver = webdriver.Chrome(service=service)

mesras = []
for i in range(494):
    try:
        url = f"https://ganjoor.net/hafez/ghazal/sh{i+1}"
        driver.get(url)
        # Elements with class "m1" and "m2" hold the two halves of each verse.
        odd_mesras = driver.find_elements(By.CLASS_NAME, "m1")
        odd_mesras_texts = [mesra.text for mesra in odd_mesras]
        even_mesras = driver.find_elements(By.CLASS_NAME, "m2")
        even_mesras_texts = [mesra.text for mesra in even_mesras]
        # Interleave the two lists so the lines stay in reading order.
        for odd_mesra, even_mesra in zip_longest(odd_mesras_texts, even_mesras_texts, fillvalue=""):
            if odd_mesra:
                mesras.append(odd_mesra)
            if even_mesra:
                mesras.append(even_mesra)
    except Exception:
        # Skip any page that fails to load or parse.
        pass

# Close the browser once crawling is finished.
driver.quit()
This block of code uses Selenium to scrape the hemistichs from each ghazal’s page. It visits the pages sh1 through sh494 in turn, collects the elements with the class names m1 and m2, and appends their text to a list in reading order. Any error raised while processing a page is ignored, so a single failed page does not stop the crawl, but the lines from that page will be silently missing from the result.
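The interleaving step can be checked without opening a browser. The following is a minimal offline sketch, not part of the original script: the helper name `interleave_mesras` and the placeholder strings are hypothetical, standing in for the text pulled from the m1 and m2 elements.

```python
from itertools import zip_longest


def interleave_mesras(first_halves, second_halves):
    """Merge two lists of hemistichs in reading order, skipping empty entries."""
    merged = []
    for first, second in zip_longest(first_halves, second_halves, fillvalue=""):
        if first:
            merged.append(first)
        if second:
            merged.append(second)
    return merged


# Placeholder strings stand in for the scraped "m1" and "m2" texts.
sample_m1 = ["a1", "b1", "c1"]
sample_m2 = ["a2", "b2"]
print(interleave_mesras(sample_m1, sample_m2))
# ['a1', 'a2', 'b1', 'b2', 'c1']
```

Because zip_longest pads the shorter list with an empty string, a page with an unmatched line still contributes every hemistich it has.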
With the collected verses, we preprocess the text, remove stopwords, and generate a word cloud. The following script accomplishes this:
from wordcloud import WordCloud
import arabic_reshaper
from bidi.algorithm import get_display
import matplotlib.pyplot as plt
from matplotlib import cm

# Join all collected lines into one string.
text = " ".join(mesras)

# Common function words that would otherwise dominate the cloud.
persian_stopwords = {'ای', 'چو', 'گر', 'بود', 'شد', 'بر', 'ز', 'است', 'و',
                     'که', 'از', 'به', 'در', 'را', 'با', 'این', 'آن', 'برای',
                     'تا', 'چه', 'یا', 'هم', 'هر', 'چون', 'اما', 'اگر'}
filtered_text = " ".join(word for word in text.split() if word not in persian_stopwords)

# Join the letters into their connected forms, then reorder for right-to-left display.
reshaped_text = arabic_reshaper.reshape(filtered_text)
display_text = get_display(reshaped_text)

wordcloud = WordCloud(
    font_path="font path",  # replace with the path to a font that supports Persian
    width=800,
    height=600,
    background_color="white",
    max_words=200,
    colormap=cm.plasma
).generate(display_text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")

# The title needs the same reshaping and reordering as the body text.
title_text = "ابر کلمات غزلیات حافظ"
reshaped_title = arabic_reshaper.reshape(title_text)
display_title = get_display(reshaped_title)
# Replace the font path below with the location of the font on your machine.
plt.title(display_title, fontsize=60, color="black", fontproperties={'fname': '/Users/sirwan/Downloads/vazirmatn-v33.003/fonts/ttf/Vazirmatn-Light.ttf'})
plt.show()
This block handles preprocessing, including combining verses into a single string, removing common stopwords, and reshaping text for proper display. It then generates a word cloud using a Persian font and displays it with a title in Persian.
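To see the numbers behind the picture, the filtered words can also be counted directly. This is a rough sketch using only the standard library; the function name `top_words` and the two sample lines are hypothetical placeholders, and the counts may differ slightly from what WordCloud computes, since it applies its own tokenization.

```python
from collections import Counter


def top_words(lines, stopwords, n=10):
    """Return the n most common whitespace-separated words, ignoring stopwords."""
    words = (word for line in lines for word in line.split())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(n)


# Placeholder lines; in the real script this would be the mesras list.
sample_lines = ["دل و جان", "دل از دست"]
sample_stopwords = {"و", "از"}
print(top_words(sample_lines, sample_stopwords, n=3))
```

Calling top_words(mesras, persian_stopwords) after the crawl would list the words that should appear largest in the cloud.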
The resulting word cloud is a visually striking representation of the most frequently used words in Hafez’s ghazals. This approach can easily be extended to other poets or Persian texts available on Ganjoor, providing a unique lens to explore the language and themes of Persian poetry.
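One way to prepare for that extension is to keep the page address and page count out of the loop. The sketch below is an assumption about how this could be organized, not the original code: `poem_urls` is a hypothetical helper, and only the Hafez base address comes from the script above. The address and the number of poems for any other poet would have to be looked up on Ganjoor.

```python
def poem_urls(base_url, count):
    """Build the list of page URLs base_url/sh1 ... base_url/sh<count>."""
    return [f"{base_url}/sh{number}" for number in range(1, count + 1)]


# The Hafez base address and count match the crawl loop shown earlier.
hafez_urls = poem_urls("https://ganjoor.net/hafez/ghazal", 494)
print(hafez_urls[0])
print(len(hafez_urls))
```

The crawl loop could then iterate over the returned list instead of building each address inline.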
By following these steps, you can delve into the beauty of Persian literature and create engaging visualizations that inspire deeper appreciation and analysis.