How to Write a Good Quote? A Data-Driven Investigation …
Analyzing quotes from top Medium articles
Last month I decided to validate a new idea. There are many “modern” quotes out there today, but they don’t get as much coverage as historical ones. So I decided to collect them in one place, a website I called trendywisdom.com. On it I gathered hundreds of “highlighted” sentences (quotes) from top Medium articles.
Being a super curious data guy with such a unique dataset at hand, I decided to find out why great quotes are great. In other words:
What makes a great quote?
For this research I used the Python libraries spaCy and NLTK to analyze the text, and seaborn for charting. The dataset contains 1,900+ quotes.
Top Words & Topics
Let’s start by finding the words used most often in the quotes.
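Counting top words comes down to tokenizing every quote, dropping stopwords and tallying what is left. Here is a minimal sketch. It assumes a plain regex tokenizer and a small hand-picked stopword list. Both are stand-ins: the analysis itself used NLTK, whose English stopword list is much longer.

```python
import re
from collections import Counter

# Small hand-picked stopword list (an assumption for this sketch; NLTK's
# English list is far more complete).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "do", "for",
    "if", "in", "is", "it", "its", "of", "on", "or", "so", "that", "the",
    "to", "what", "who", "with", "you", "your", "yourself", "we", "i",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(quotes, n=10):
    """Count non-stopword tokens across all quotes and return the n most common."""
    counts = Counter(
        word
        for quote in quotes
        for word in tokenize(quote)
        if word not in STOPWORDS
    )
    return counts.most_common(n)

# Sample quotes taken from this article (not the full dataset).
quotes = [
    "Surround yourself with people who represent what you ultimately want to become",
    "If you want something, anything, do the work and earn it",
    "The problem isn't imperfection. It is inaction. All you have to do is anything.",
]
print(top_words(quotes, 3))
```

To stay closer to the original setup, the set could be replaced by NLTK's `stopwords.words("english")`, which is available after running `nltk.download("stopwords")`.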
So, if you want to write a great quote:
Talk about People, Time & Life …
“Surround yourself with people who represent what you ultimately want to become”
Start Words
These are the words the quotes most often start with:
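Start words are a simple tally of each quote's first token. The sketch below assumes a regex tokenizer that ignores punctuation and quotation marks. `edge_word_counts` is a hypothetical helper name, not part of spaCy or NLTK. Passing `position=-1` counts last words instead, which covers the End Words section as well.

```python
import re
from collections import Counter

def tokenize(text):
    """Split a quote into word tokens, ignoring punctuation (an assumed tokenizer)."""
    return re.findall(r"[A-Za-z']+", text)

def edge_word_counts(quotes, position=0):
    """Count the word at `position` of each quote: 0 = first word, -1 = last word."""
    counts = Counter()
    for quote in quotes:
        words = tokenize(quote)
        if words:  # skip entries that contain no word tokens at all
            counts[words[position].lower()] += 1
    return counts

# Sample sentences adapted from quotes in this article.
quotes = [
    "The purpose of life is not to be happy.",
    "If you want something, anything, do the work and earn it",
    "The problem isn't imperfection. It is inaction.",
]
print(edge_word_counts(quotes, 0).most_common())   # first words
print(edge_word_counts(quotes, -1).most_common())  # last words
```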
Start the quote with “The”, “If”, “I” or “We” …
“The purpose of life is not to be happy. It is to be useful, to be honorable, to be compassionate, to have it make some difference that you have lived and lived well.” Ralph Waldo Emerson
End Words
These are the words the quotes most often end with:
Quotes usually end with “It”, “You”, “Them” or the quote’s main topic …
“If you want something, anything, do the work and earn it”
Quote Length
How long should a quote be?
Try to keep your quote between 10 and 20 words.
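Checking the 10 to 20 word guideline only needs a word count per quote. Here is a minimal sketch. It assumes a word is a run of letters, digits or apostrophes, so the original tokenizer may count slightly differently. `length_summary` is a hypothetical helper. The sample quotes come mostly from this article, with one short line added for contrast.

```python
import re
from statistics import median

def word_count(quote):
    """Number of word tokens in a quote (letters, digits and apostrophes)."""
    return len(re.findall(r"[\w']+", quote))

def length_summary(quotes, low=10, high=20):
    """Return the median word count and the share of quotes with low..high words (inclusive)."""
    lengths = [word_count(q) for q in quotes]
    in_range = sum(low <= n <= high for n in lengths)
    return median(lengths), in_range / len(lengths)

quotes = [
    "Surround yourself with people who represent what you ultimately want to become",
    "If you want something, anything, do the work and earn it",
    "The problem isn't imperfection. It is inaction. All you have to do is anything.",
    "Do the work.",  # short made-up line, added for contrast
]
med, share = length_summary(quotes)
print(f"median length: {med} words, share within 10-20 words: {share:.0%}")
```

The same `lengths` list is what you would pass to a seaborn histogram to chart the distribution.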
Patterns — Trigrams
Let’s find the most common sequences of three consecutive words (trigrams) in the quotes.
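A trigram is every run of three consecutive tokens, so counting them takes a few lines of plain Python. This sketch assumes stopwords are kept, because phrases like "have to do" would vanish otherwise. NLTK's `nltk.util.ngrams(tokens, 3)` yields the same tuples. The sample quotes are made up for illustration and are not from the dataset.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def trigrams(tokens):
    """Yield every run of three consecutive tokens as a tuple."""
    return zip(tokens, tokens[1:], tokens[2:])

def top_trigrams(quotes, n=5):
    """Count trigrams across all quotes and return the n most common."""
    counts = Counter()
    for quote in quotes:
        # Build trigrams per quote so that no trigram spans two different quotes.
        counts.update(trigrams(tokenize(quote)))
    return counts.most_common(n)

quotes = [
    "All you have to do is show up",
    "You have to do the work",
    "What you have to do is start",
]
for gram, count in top_trigrams(quotes, 3):
    print(" ".join(gram), count)
```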
In your quote, talk about people’s needs and guide them to action …
“The problem isn’t imperfection. It is inaction. All you have to do is anything.”
Patterns — Part of Speech
Can we find common linguistic patterns in quotes?
Using the part-of-speech (PoS) tagging feature in spaCy, I extracted recurring linguistic patterns used by quote writers.
The meaning of each PoS tag can be found here.
As you can see, two common patterns stand out, used in almost 25% of the quotes:
- Noun + Adposition + Determiner + Noun
- Verb + Adposition + Determiner + Noun
Adposition: a preposition or postposition, often used to express spatial or temporal relations (example: of)
Determiner: may indicate whether the noun refers to a definite or indefinite element of a class (example: the)
Some examples:
Sides of the table → Noun Adposition Determiner Noun
Look at the problem → Verb Adposition Determiner Noun
The key to success → Determiner Noun Adposition Noun
“It’s easy to criticize or redesign other people’s work. But it’s only effective if you look at the problem and solution from both sides of the table”
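Once each quote has been turned into a sequence of coarse PoS tags (spaCy exposes these as `token.pos_`), finding patterns like the ones above is n-gram counting over tags instead of words. This sketch assumes four-tag windows and drops punctuation. So that it runs without a spaCy model, it hard-codes the tags of the three examples above. `pos_patterns` and `share_containing` are hypothetical helper names.

```python
from collections import Counter

# With spaCy installed, a tag list could be produced like this
# (requires the en_core_web_sm model to be downloaded):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   tags = [tok.pos_ for tok in nlp(quote) if not tok.is_punct]

def pos_patterns(tagged_quotes, size=4):
    """Count every run of `size` consecutive PoS tags across all quotes."""
    counts = Counter()
    for tags in tagged_quotes:
        for i in range(len(tags) - size + 1):
            counts[tuple(tags[i:i + size])] += 1
    return counts

def share_containing(tagged_quotes, pattern):
    """Fraction of quotes whose tag sequence contains `pattern` at least once."""
    pattern = tuple(pattern)
    size = len(pattern)
    hits = sum(
        any(tuple(tags[i:i + size]) == pattern for i in range(len(tags) - size + 1))
        for tags in tagged_quotes
    )
    return hits / len(tagged_quotes)

# Hard-coded tags for the examples in this section.
tagged = [
    ["NOUN", "ADP", "DET", "NOUN"],  # "Sides of the table"
    ["VERB", "ADP", "DET", "NOUN"],  # "Look at the problem"
    ["DET", "NOUN", "ADP", "NOUN"],  # "The key to success"
]
counts = pos_patterns(tagged)
print(counts.most_common(3))
print(share_containing(tagged, ["VERB", "ADP", "DET", "NOUN"]))
```

Measuring "used in X% of the quotes" means counting each quote once even if the pattern repeats inside it. That is why `share_containing` uses `any` per quote rather than summing raw window counts.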
To sum up, do the following to write a good quote:
- Talk about People, Time & Life
- Talk about needs
- Motivate the reader by suggesting actions
- Make it short, around 15 words
This analysis only scratches the surface. If you are interested in diving deeper, you can find the code and the dataset on GitHub through this link. You can also find them on Kaggle.com here.