HEY   HOMBRE!
It looks like this is your first visit. For the next hour you can use the promo code:
to get a free week of GHTV. You’ll never see this deal again, so don’t be a stupid head.
deal countdown

Please rotate your tablet to
use GrowthHacker TV.

Intermediate

Scientific Blogging


Web scraping to optimize sharing
goal
Have you ever noticed how some blog posts take off like an intergalactic rocket ship while others… don’t? Using web scraping and basic analysis, you can now isolate some of the differences and optimize future performance.
Ingredients
Import.io
Excel
SharedCount.com
Textalyser.net
Personnel
Growth Hacker


Time
30 Minutes
Your account is past due.
Your account is currently past due. In order to interact with the community, please update your billing info.
Update

Overview

I’m terrible at predicting the virality of my posts. Sometimes an article I’ve poured hours of writing and editing into will flop while the post I threw together before work drives thousands of visits.

At first I was just frustrated, but then I remembered: working smart always gets better results than working hard. So I decided to analyze the performance of my current posts in order to optimize my future writing. I wanted to know what day of the week to publish, what headlines to use, and what topics to cover.

Step One

First, you’ll need to scrape your blog to find your post titles, authors, categories, tags, publication dates, and URLs. To do this, we’ll use a web scraping tool. Personally, I’m a big fan of http://www.import.io because it’s free and super easy to use.

After you’ve installed the program, navigate to your blog, select crawler mode, and choose the multiple rows option. Then, you’ll need to highlight all rows of content on your blog index (a row would be a one post block). Once it’s accurately identifying rows, it will ask you to highlight columns, and you’ll add, one by one, your post title, author name, publication date, URL, and any other information you can find. URL will be the only semi-difficult field to add - just select “html” instead of “text” when you highlight your post’s link, and it will give you the information you need.

After verifying the crawler on at least five pages, you can save this API template.

Step Two

Once you’ve trained http://www.import.io, you will run the crawler on your website. First, you’ll need to create a list of your archive page URLs. Open Excel, and create a column that lists numbers from 1 to 50. Then, looking at your website’s URL structure, create a simple formula that combines those numbers with your website. (For my site, the formula might look like this: =“http://www.natedesmond.com/blog/”&A2&“/”) This will give you a full list of archive page URLs for your website.

Then open the crawler, paste in the list of URLs, and run the crawler you created in step one.

Once it finishes, you can export this list of all your blog’s meta data to Excel.

Step Three

So now you have all your post meta data, but how do you know if a post performed well? To measure performance, we’ll pull social statistics for each post.

Going to http://SharedCount.com, you’ll upload your list of URLs (which is in the export from http://Import.io). SharedCount will then run automatic analysis to see how each post performed on various social networks.

After it finishes running, simply export from SharedCount and use a vlookup formula (https://exceljet.net/excel-functions/excel-vlookup-function) to combine your social count data with your post meta data.

Step Four

Now the real fun starts. You’ve got your data in hand, and you’re ready to start finding key insights.

Here are a few data points you can pull:

Popular Authors

To see which author writes the most popular posts, simply add a pivot table (https://exceljet.net/excel-tips/pivot-tables) with one column for author name and one column for average social score. To make sure your data is statistically significant, you can also add a third column that counts total posts per author.

Best Day of Week

Using a day of week formula (https://exceljet.net/formula/get-day-of-week-name-from-date), add an additional column that shows which day of week each article was published on. Then create another pivot table showing day of week, social score, and number of posts.

Best Topics

Using a basic text analyzer (http://textalyser.net/), see which words occur most commonly in your titles. Then, for the most popular words, add another column to your Excel table that checks whether each title contains that word (it might look something like this: =IF(ISNUMBER(SEARCH("wordtocheckfor",A2)),"Contains", "Doesn’t Contain") ) Then just run another pivot table to see whether particular topics increase or decrease the popularity of a post.

Conclusion

Now you have actionable steps that can help you make your future posts more likely to become popular. Of course, virality is never guaranteed, but you can increase your chances significantly. Apply your new knowledge.

Recipe courtesy of Nate Desmond (http://www.natedesmond.com)
Photo courtesy of Duncan Hull