How to Write a Good Quote? A Data-Driven Investigation

Analyzing quotes from top Medium articles

Image by Mert Talay on Unsplash

Last month I decided to validate a new idea. There are many “modern” quotes out there these days, but they don’t get as much coverage as historical ones. So I decided to collect and aggregate them in one place, which I called trendywisdom.com. On this website I collected hundreds of “highlighted” sentences (Quotes) from top Medium articles.

Being a super curious data guy with such a unique dataset at hand, I decided to find out why great quotes are great. In other words, how do you write a good Quote?

For this research I used the Python libraries spaCy and NLTK to analyze the text, and seaborn for charting. The dataset contains 1900+ quotes.

Top Words & Topics

Let’s start by finding out which words are used most often in the Quotes.

Top words (NOUNS) used in Quotes
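A count like the one behind this chart can be sketched in a few lines. This is not the article's actual code: it assumes each Quote has already been PoS-tagged (with spaCy, a token's tag is available as `token.pos_`) and uses a small hand-tagged sample built from fragments of the Quotes in this post, so it runs with the standard library alone. The names `tagged_quotes` and `top_nouns` are illustrative.

```python
from collections import Counter

# Hypothetical hand-tagged sample: each Quote is a list of (word, pos) pairs.
# In a real run the tags would come from a tagger such as spaCy (token.pos_).
tagged_quotes = [
    [("Surround", "VERB"), ("yourself", "PRON"), ("with", "ADP"), ("people", "NOUN")],
    [("do", "VERB"), ("the", "DET"), ("work", "NOUN")],
    [("look", "VERB"), ("at", "ADP"), ("the", "DET"), ("problem", "NOUN")],
    [("sides", "NOUN"), ("of", "ADP"), ("the", "DET"), ("table", "NOUN")],
    [("The", "DET"), ("problem", "NOUN"), ("is", "AUX"), ("n't", "PART"),
     ("imperfection", "NOUN")],
]


def top_nouns(tagged_quotes, n=10):
    """Return the n most frequent nouns (lowercased) across all Quotes."""
    counts = Counter(
        word.lower()
        for quote in tagged_quotes
        for word, pos in quote
        if pos == "NOUN"
    )
    return counts.most_common(n)


for noun, count in top_nouns(tagged_quotes, n=5):
    print(f"{noun:<15}{count}")
```

On the full dataset, lemmatizing the words first (for example with spaCy's `token.lemma_`) would merge forms such as "person" and "people" or "side" and "sides"; whether the original analysis did that is not stated.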

So, if you want to write a great Quote, talk about people, time and life.

“Surround yourself with people who represent what you ultimately want to become”

Nicolas Cole

Start Words

Most of the quotes start with the following words:

Top first words in all Quotes
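Counting the word at a given position is a simple frequency count. The sketch below is a minimal version under stated assumptions: it splits on whitespace rather than using the spaCy or NLTK tokenizers, and the helper names `word_at` and `top_words_at` are made up for illustration. The sample is four Quotes from this post plus the first sentence of the Emerson Quote.

```python
from collections import Counter

quotes = [
    "Surround yourself with people who represent what you ultimately want to become",
    "If you want something, anything, do the work and earn it",
    "The problem isn't imperfection. It is inaction. All you have to do is anything.",
    "It's easy to criticize or redesign other people's work.",
    "The purpose of life is not to be happy.",
]


def word_at(quote, index):
    """Return the lowercased word at `index`, without surrounding punctuation."""
    words = quote.split()
    return words[index].strip(".,!?\"'“”").lower()


def top_words_at(quotes, index, n=10):
    """Count which word appears at position `index` (0 = first, -1 = last)."""
    counts = Counter(word_at(q, index) for q in quotes if q.split())
    return counts.most_common(n)


print(top_words_at(quotes, 0, n=3))
```

Passing `-1` as the index gives the same count for the last word of each Quote.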

“The purpose of life is not to be happy. It is to be useful, to be honorable, to be compassionate, to have it make some difference that you have lived and lived well.”

Ralph Waldo Emerson

End Words

Quotes usually end with the following words:

Top last words for all Quotes

“If you want something, anything, do the work and earn it”

Casey Neistat

Quote Length

How long should a Quote be?

Number of words in top Quotes
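A length distribution like this one comes from counting the words in each Quote. The following is a rough sketch, not the original code: it counts whitespace-separated words and prints a plain-text histogram in place of a seaborn chart. The function name `word_count` and the three-Quote sample are illustrative only.

```python
import statistics
from collections import Counter

quotes = [
    "Surround yourself with people who represent what you ultimately want to become",
    "If you want something, anything, do the work and earn it",
    "The problem isn't imperfection. It is inaction. All you have to do is anything.",
]


def word_count(quote):
    """Number of whitespace-separated words in a Quote."""
    return len(quote.split())


lengths = [word_count(q) for q in quotes]
median_length = statistics.median(lengths)
print("median length:", median_length)

# Plain-text histogram: one '#' per Quote with that word count.
for length, count in sorted(Counter(lengths).items()):
    print(f"{length:>3} words | {'#' * count}")
```

The median is used here because a few very long Quotes would pull a mean upward. Which statistic the article's chart summarizes is an assumption.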

Patterns — Trigrams

Let’s find the most common sequences of three consecutive words (trigrams) in the Quotes.

Top Tri-grams in Quotes
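Trigram counting slides a window of three words over each Quote and tallies the results. The sketch below uses only the standard library; NLTK's `ngrams` helper could replace the `trigrams` function. The regex tokenizer and the function names are assumptions for illustration, not the article's implementation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def trigrams(tokens):
    """Yield every run of three consecutive tokens."""
    return zip(tokens, tokens[1:], tokens[2:])


def top_trigrams(quotes, n=10):
    """Count trigrams within each Quote (never across Quotes)."""
    counts = Counter()
    for quote in quotes:
        counts.update(trigrams(tokenize(quote)))
    return counts.most_common(n)


quotes = [
    "If you want something, anything, do the work and earn it",
    "The problem isn't imperfection. It is inaction. All you have to do is anything.",
]
for gram, count in top_trigrams(quotes, n=5):
    print(" ".join(gram), count)
```

This simple tokenizer lets a trigram span a sentence boundary inside one Quote. Splitting each Quote into sentences first would avoid that.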

“The problem isn’t imperfection. It is inaction. All you have to do is anything.”

Your Fat Friend

Patterns — Part of Speech

Can we find common linguistic patterns in Quotes?

Using the Part of Speech (PoS) tagging feature in spaCy, I managed to extract common repetitive linguistic patterns used by quote writers.

Image for post

The meaning of each PoS Tag can be found here

As you can see, two common patterns are used in almost 25% of the Quotes:

  1. Noun + Adposition + Determiner + Noun
  2. Verb + Adposition + Determiner + Noun

Adposition: expresses spatial or temporal relations (example: of).

Determiner: may indicate whether the noun refers to a definite or indefinite element of a class (example: the).

Some examples:

Sides of the table → Noun Adposition Determiner Noun

Look at the problem → Verb Adposition Determiner Noun

The key to success → Determiner Noun Adposition Noun
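Pattern extraction of this kind can be sketched as a sliding window over each Quote's tag sequence. The sketch makes several assumptions: the tag sequences are hand-written here (with spaCy they would be `[token.pos_ for token in doc]`), the window is fixed at four tags, and a pattern's share is the fraction of Quotes that contain it at least once. The article does not say how its 25% figure was computed, and the names `pos_ngrams` and `pattern_share` are made up.

```python
from collections import Counter

# Hand-tagged fragments (Universal PoS tags), for illustration only.
tagged_quotes = [
    ["ADP", "DET", "NOUN", "ADP", "DET", "NOUN"],  # from both sides of the table
    ["PRON", "VERB", "ADP", "DET", "NOUN"],        # you look at the problem
    ["NOUN", "ADP", "DET", "NOUN"],                # sides of the table
    ["DET", "NOUN", "ADP", "NOUN"],                # the key to success
]


def pos_ngrams(tags, size=4):
    """Return every run of `size` consecutive tags as a tuple."""
    return [tuple(tags[i:i + size]) for i in range(len(tags) - size + 1)]


def pattern_share(tagged_quotes, size=4, n=10):
    """Return (pattern, fraction of Quotes containing it) for the top n patterns."""
    counts = Counter()
    for tags in tagged_quotes:
        # set() counts each pattern at most once per Quote
        counts.update(set(pos_ngrams(tags, size)))
    total = len(tagged_quotes)
    return [(pattern, count / total) for pattern, count in counts.most_common(n)]


for pattern, share in pattern_share(tagged_quotes, n=3):
    print(" + ".join(pattern), f"{share:.0%}")
```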

“It’s easy to criticize or redesign other people’s work. But it’s only effective if you look at the problem and solution from both sides of the table.”

Jason Li

To sum up, do the following to write a good Quote:

  • Talk about People, Time & Life
  • Talk about needs
  • Motivate the reader by suggesting actions
  • Make it short, around 15 words

This analysis only scratches the surface. If you are interested in diving deeper, you can find the code and the dataset on GitHub through this link. You can also find the same on Kaggle.com here.

Written by

Freelance Consultant — Data & Product. Writing about Data, Entrepreneurship and Life. https://karim.ouda.net
