Data Science Portfolio

Text Data and Word Cloud: William Shakespeare

Data Source: "The Merchant of Venice" by Shakespeare

I obtained the text from MIT's website (http://shakespeare.mit.edu/merchant/full.html). The text was copied and pasted into a CSV file and loaded into Python.

Raw file

I wrote a function that drops any blank rows and then joins the remaining lines into a single string, with the words separated by spaces, ready for the word cloud. The large block of text at the bottom of this picture shows the final input to the word cloud.
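For readers who want to try this step, here is a minimal sketch of such a cleaning function. It assumes the CSV holds one line of the play per row; the name `rows_to_text` and the in-memory sample are placeholders of mine, not the original code.

```python
import csv
import io


def rows_to_text(rows):
    """Drop blank rows and join the rest into one space-separated string."""
    lines = []
    for row in rows:
        # Each CSV row is a list of cells; join the non-empty cells into one line.
        line = " ".join(cell.strip() for cell in row if cell.strip())
        if line:  # skip rows that are empty or contain only whitespace
            lines.append(line)
    return " ".join(lines)


# Small stand-in for the real CSV file: the play's opening lines plus blank rows.
sample_csv = io.StringIO(
    "ANTONIO\n"
    "\n"
    "\"In sooth, I know not why I am so sad:\"\n"
    "   \n"
    "\"It wearies me; you say it wearies you;\"\n"
)

text = rows_to_text(csv.reader(sample_csv))
print(text)
```

With a real file, the same function should work on `csv.reader(open(path, newline=""))`, where `path` points to wherever the CSV was saved.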

Now the data has been processed and is ready to use. Let's start with a very basic word cloud, created with the built-in stop words.
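A basic cloud like this can be sketched with the `wordcloud` package and matplotlib, assuming both are installed. The function names `basic_cloud` and `show_cloud` are illustrative, and `text` stands for the single processed string.

```python
def basic_cloud(text):
    """Build a word cloud from `text` using the built-in English stop words."""
    # Imported inside the function so the module loads even without wordcloud.
    from wordcloud import WordCloud, STOPWORDS

    # STOPWORDS is the package's built-in set; it is passed explicitly here
    # to make the choice visible.
    return WordCloud(stopwords=STOPWORDS).generate(text)


def show_cloud(cloud):
    """Display a generated word cloud with matplotlib."""
    import matplotlib.pyplot as plt

    plt.imshow(cloud.to_array(), interpolation="bilinear")
    plt.axis("off")  # hide the axes; only the picture matters
    plt.show()
```

Calling `show_cloud(basic_cloud(text))` should then display the default cloud on a black background.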

I can also change the background color from the default black to a color of my choice. Let's try white!
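As a sketch, the background is set through the `background_color` argument of `WordCloud`. The `width` and `height` values here are arbitrary choices of mine, and `white_cloud` is an illustrative name.

```python
def white_cloud(text):
    """Build a word cloud on a white background instead of the default black."""
    from wordcloud import WordCloud

    # width and height are in pixels; the values are assumptions, not the
    # settings used for the pictures in this post.
    return WordCloud(background_color="white", width=800, height=400).generate(text)
```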

I have my word cloud visualization! However, you may notice that the biggest (most frequent) words are mostly names. That makes total sense: there are many characters in the play, and their names are repeated many, many times. But what if I'm not interested in which names Shakespeare used most often? I also noticed that archaic English words like "thy" and "thou" were included in my basic chart, most likely because the built-in stop words do not cover archaic English. To get a better picture of the words Shakespeare used often, I created my own list of stop words and combined it with the built-in list. Problem solved! See below for the code and the final output!
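One way to combine a custom list with the built-in one is sketched here. The two word sets are examples I picked (character names from the play and a few archaic forms), not the exact list used for the picture, and `build_stopwords` and `filtered_cloud` are illustrative names.

```python
# Illustrative entries only; extend both sets to suit the text.
CHARACTER_NAMES = {
    "antonio", "bassanio", "portia", "shylock", "gratiano", "lorenzo",
    "jessica", "nerissa", "launcelot", "salarino", "salanio",
}
ARCHAIC_WORDS = {"thou", "thee", "thy", "thine", "hath", "doth", "tis", "art"}


def build_stopwords(builtin, *extra_sets):
    """Return one lowercase set holding the built-in and the custom stop words."""
    combined = {word.lower() for word in builtin}
    for extra in extra_sets:
        combined |= {word.lower() for word in extra}
    return combined


def filtered_cloud(text):
    """Build a word cloud that ignores names and archaic words as well."""
    from wordcloud import WordCloud, STOPWORDS

    stopwords = build_stopwords(STOPWORDS, CHARACTER_NAMES, ARCHAIC_WORDS)
    return WordCloud(stopwords=stopwords, background_color="white").generate(text)
```

Building a new set rather than editing `STOPWORDS` in place leaves the built-in list untouched for other charts.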


In The Merchant of Venice, Shakespeare used "come", "well", and "Jew" most often. "Good" and "love" are also among his favorites. What's the word I use most often? "Data"!!!

Tired of seeing rectangle-shaped clouds all the time? Let me add some more fun! Use a mask made from a picture you like to shape the word cloud into a US map, a yoga pose, or whatever you feel like!
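As a rough sketch of the masking idea, the example builds a simple circular mask in code instead of loading a picture, so it runs without any image file. In the `wordcloud` package, pure white (255) mask entries are treated as masked out and all other entries are free to draw on. The helper names are mine, and NumPy is assumed to be installed alongside `wordcloud`.

```python
def circle_mask_rows(size=400):
    """Return a size x size grid: 0 inside a centered circle, 255 (white) outside."""
    center = (size - 1) / 2
    radius_sq = (0.45 * size) ** 2
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius_sq else 255
         for x in range(size)]
        for y in range(size)
    ]


def masked_cloud(text, mask_rows):
    """Build a word cloud that only draws where the mask is not white."""
    import numpy as np
    from wordcloud import WordCloud

    mask = np.array(mask_rows, dtype=np.uint8)  # wordcloud expects an array
    return WordCloud(background_color="white", mask=mask).generate(text)
```

To use a real picture instead, the mask could be loaded with something like `np.array(Image.open("us_map.png"))` from Pillow, where the file name is a placeholder for an image with a white background.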

Stay Connected!

If you are interested in our Data Science and Business Analytics solutions, feel free to reach out to admin@abetterme.us