How to write a good Quote? A data-driven investigation …

Karim Ouda
4 min read · Feb 21, 2020

Analyzing quotes from top Medium articles

Image by Mert Talay on Unsplash

Last month I decided to validate a new idea. There are many “modern” quotes out there these days, but they don’t get as much coverage as historical ones, so I decided to collect and aggregate them in one place: a website holding hundreds of “highlighted” sentences (Quotes) from top Medium articles.

Being a super curious data guy with such a unique dataset at hand, I decided to find out why great quotes are great. In other words:

What makes a great Quote?

For this research I used the Python libraries spaCy and NLTK to analyze the text, and seaborn for charting. The dataset contains 1900+ quotes.

Top Words & Topics

Let’s start by finding out which words are used most often in the Quotes.

Top words (NOUNS) used in Quotes
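A minimal sketch of how a noun ranking like this could be produced. It assumes the quotes are already tagged as (word, part-of-speech) pairs with Universal POS labels, the kind of pairs you could build from spaCy's `token.text` and `token.pos_`. The function name `top_nouns` and the toy input are hypothetical, not the code behind the chart.

```python
from collections import Counter

def top_nouns(tagged_quotes, n=10):
    """Return the n most frequent nouns across all quotes.

    tagged_quotes: iterable of quotes, each a list of (word, pos) pairs
    where pos is a Universal POS tag such as "NOUN" or "VERB".
    """
    counts = Counter()
    for quote in tagged_quotes:
        for word, pos in quote:
            if pos == "NOUN":
                # Lowercase so "People" and "people" are counted together
                counts[word.lower()] += 1
    return counts.most_common(n)

# Hand-tagged toy input (hypothetical, not taken from the real dataset)
sample = [
    [("Surround", "VERB"), ("yourself", "PRON"), ("with", "ADP"), ("people", "NOUN")],
    [("People", "NOUN"), ("need", "VERB"), ("time", "NOUN")],
]

print(top_nouns(sample, 2))  # [('people', 2), ('time', 1)]
```

Counting lemmas instead of surface forms would merge "person" and "people", which may or may not be what you want for a chart like this.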

So, if you want to write a great Quote:

Talk about People, Time & Life …

“Surround yourself with people who represent what you ultimately want to become”

Start words

Most of the quotes started with the following words

Top first words in all Quotes

Start the Quote with “The”, “If”, “I” or “We” …

“The purpose of life is not to be happy. It is to be useful, to be honorable, to be compassionate, to have it make some difference that you have lived and lived well.” Ralph Waldo Emerson

End Words

Quotes usually end with the following words

Top last words for all Quotes
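The start-word and end-word counts can be sketched with the standard library alone. This is an illustration under simple assumptions: words are runs of letters and apostrophes, everything is lowercased, and the helper name `first_and_last_words` is made up for this example.

```python
import re
from collections import Counter

# A "word" here is a run of letters, allowing straight and curly apostrophes
WORD = re.compile(r"[a-z'’]+")

def first_and_last_words(quotes, n=5):
    """Return (top first words, top last words) as lists of (word, count)."""
    firsts, lasts = Counter(), Counter()
    for quote in quotes:
        words = WORD.findall(quote.lower())
        if words:  # skip quotes with no recognizable words
            firsts[words[0]] += 1
            lasts[words[-1]] += 1
    return firsts.most_common(n), lasts.most_common(n)

quotes = [
    "If you want something, anything, do the work and earn it",
    "The problem isn’t imperfection. It is inaction. All you have to do is anything.",
]
firsts, lasts = first_and_last_words(quotes)
print(firsts)
print(lasts)
```

Stripping punctuation matters most for the last word: without it, "it" and "it." would be counted as different endings.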

Quotes usually end with “It”, “You”, “Them” or the main topic of the Quote

“If you want something, anything, do the work and earn it”

Quote Length

How long should the Quote be?

Number of words in top Quotes

Try to keep your quote between 10 and 20 words
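A rough way to check a collection of quotes against that range. This sketch assumes words are separated by whitespace; the function name `length_summary` and its `low`/`high` parameters are hypothetical.

```python
from statistics import median

def length_summary(quotes, low=10, high=20):
    """Summarize word counts: the median length and the share within [low, high]."""
    lengths = [len(q.split()) for q in quotes]
    if not lengths:
        return {"median": None, "share_in_range": 0.0}
    in_range = sum(low <= n <= high for n in lengths)
    return {"median": median(lengths), "share_in_range": in_range / len(lengths)}

quotes = [
    "If you want something, anything, do the work and earn it",
    "Surround yourself with people who represent what you ultimately want to become",
    "Do it",
]
print(length_summary(quotes))
```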

Patterns — Trigrams

Let’s find the most common sequences of three consecutive words in the Quotes.

Top Tri-grams in Quotes
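Trigram counting takes only a few lines. The sketch below uses just the standard library and a simple letters-only tokenizer, which is an assumption for this example. NLTK's `ngrams` helper could produce the same three-word windows. The name `top_trigrams` is hypothetical.

```python
import re
from collections import Counter

def top_trigrams(quotes, n=10):
    """Return the n most common runs of three consecutive words."""
    counts = Counter()
    for quote in quotes:
        words = re.findall(r"[a-z'’]+", quote.lower())
        # zip over three offset copies yields every 3-word window in the quote
        counts.update(zip(words, words[1:], words[2:]))
    return [(" ".join(trigram), c) for trigram, c in counts.most_common(n)]

quotes = ["All you have to do is try", "You have to start"]
print(top_trigrams(quotes, 1))  # [('you have to', 2)]
```

Trigrams are counted within each quote, so no window ever spans two different quotes.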

In your Quote, talk about people’s Needs and guide them to Action …

“The problem isn’t imperfection. It is inaction. All you have to do is anything.”

Patterns — Part of Speech

Can we find common linguistic patterns in Quotes?

Using the Part of Speech (PoS) tagging feature in spaCy, I managed to extract common repetitive linguistic patterns used by quote writers.

The meaning of each PoS Tag can be found here

As you can see, there are two common patterns used in almost 25% of the Quotes:

  1. Noun + Adposition + Determiner + Noun
  2. Verb + Adposition + Determiner + Noun

Adposition: expresses spatial or temporal relations (example: of)

Determiner: may indicate whether the noun refers to a definite or indefinite element of a class (example: the)

Some examples:

Sides of the table → Noun Adposition Determiner Noun

Look at the problem → Verb Adposition Determiner Noun

The key to success → Determiner Noun Adposition Noun
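Patterns like these can be found by sliding a four-tag window over each tagged quote. The sketch below assumes the quotes are already tagged as (word, tag) pairs with Universal POS labels, for instance built from spaCy's `token.pos_`. How the 25% figure was computed is not stated, so counting the share of quotes that contain a pattern at least once is an assumption, and `pos_pattern_share` is a hypothetical name.

```python
from collections import Counter

def pos_pattern_share(tagged_quotes, size=4):
    """Map each tag sequence of length `size` to the fraction of quotes containing it."""
    containing = Counter()
    total = 0
    for quote in tagged_quotes:
        total += 1
        tags = [pos for _, pos in quote]
        # A set, so a pattern repeated within one quote is counted once
        windows = {tuple(tags[i:i + size]) for i in range(len(tags) - size + 1)}
        containing.update(windows)
    if total == 0:
        return {}
    return {pattern: count / total for pattern, count in containing.items()}

# Hand-tagged toy input (hypothetical tags, not real tagger output)
sample = [
    [("Sides", "NOUN"), ("of", "ADP"), ("the", "DET"), ("table", "NOUN")],
    [("Look", "VERB"), ("at", "ADP"), ("the", "DET"), ("problem", "NOUN")],
    [("both", "DET"), ("sides", "NOUN"), ("of", "ADP"), ("the", "DET"), ("table", "NOUN")],
]

shares = pos_pattern_share(sample)
for pattern, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(" + ".join(pattern), round(share, 2))
```

Counting every occurrence instead of every quote would weight long quotes more heavily, so the two approaches can rank patterns differently.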

“It’s easy to criticize or redesign other people’s work. But it’s only effective if you look at the problem and solution from both sides of the table”

To sum up, do the following to write a good Quote:

  • Talk about People, Time & Life
  • Talk about needs
  • Motivate the reader by suggesting actions
  • Make it short, around 15 words

This analysis only scratches the surface. If you want to dive deeper, you can find the code and the dataset on GitHub through this link. You can also find the same here.



Karim Ouda

Freelance Consultant — Data & Product. Writing about Data, Entrepreneurship and Life.