The web is crawled by a lot of bots. Some are good and vital for your website, such as Googlebot; others can be harmful, like email-harvesting crawlers and content scrapers. Link crawlers come short of harmful but far from useful: they do not try to scrape your content or do anything malicious, yet they bring your website no benefit and can consume server resources for nothing.
For SEOs who adopt black-hat tactics like PBNs (private blog networks), those crawlers are a nightmare: left unblocked, they can expose the network to competitors, which in most cases leads to a spam report that gets the whole network de-indexed, plus a manual action on the money site, if not its total de-indexation.
The most popular link crawlers are Majestic, Ahrefs, Moz and SEMrush. Note that their crawlers' user-agents do not always match the brand names and can change in the future, so it is very important to keep an up-to-date list of the user-agents those crawlers use. Below are different ways to block them:
Robots.txt:
Adding a few lines to your robots.txt file will disallow the most popular link crawlers:
User-agent: Rogerbot
User-agent: Exabot
User-agent: MJ12bot
User-agent: Dotbot
User-agent: Gigabot
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: SemrushBot-SA
Disallow: /
The method above will be very effective assuming:
- You trust those crawlers to obey the directives in the robots.txt file.
- The crawlers do not keep changing their user-agent names.
- The companies that operate those crawlers do not use third-party crawling services that come under different user-agents.
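If the site runs WordPress (as in the PHP examples later), one way to keep the list in a single place is the robots_txt filter, which lets you modify the virtual robots.txt that WordPress serves when no physical file exists in the web root. A minimal sketch for functions.php (the function name is my own):

// Append the blocking rules to WordPress's virtual robots.txt
add_filter('robots_txt', 'block_link_crawlers_robots', 10, 2);
function block_link_crawlers_robots($output, $public) {
    $bots = array('Rogerbot', 'Exabot', 'MJ12bot', 'Dotbot', 'Gigabot', 'AhrefsBot', 'SemrushBot', 'SemrushBot-SA');
    foreach ($bots as $bot) {
        $output .= "\nUser-agent: {$bot}";
    }
    $output .= "\nDisallow: /\n";
    return $output;
}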
.htaccess:
The catch with this method is that it requires an Apache-based host. If your host supports .htaccess, you can use the code below to block the most popular link crawlers:
<IfModule mod_rewrite.c>
RewriteEngine on
# Match any of the known link-crawler user-agents ([NC] = case-insensitive)
RewriteCond %{HTTP_USER_AGENT} (ahrefsbot|mj12bot|rogerbot|exabot|dotbot|gigabot|semrush) [NC]
# Serve them a 403 Forbidden and stop processing further rules
RewriteRule .* - [F,L]
</IfModule>
This method is better than robots.txt, as the crawlers have no choice but to obey, assuming they are not changing their user-agents or using third-party crawlers; any matching request is refused with a 403 Forbidden response.
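To confirm the rules are active, you can request a page while sending one of the blocked user-agents and check that the server answers 403. A minimal sketch using PHP's cURL extension (https://example.com/ is a placeholder for your own domain):

$ch = curl_init('https://example.com/');
curl_setopt($ch, CURLOPT_USERAGENT, 'AhrefsBot'); // pretend to be a blocked crawler
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOBODY, true); // we only need the status code
curl_exec($ch);
echo curl_getinfo($ch, CURLINFO_HTTP_CODE); // expect 403 if the block works
curl_close($ch);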
Using PHP:
If your website is built with PHP, like WordPress, you can add the code below to your theme's header.php to block the link crawlers:
// Known link-crawler user-agent substrings, all lowercase
$badAgents = array('rogerbot', 'mj12bot', 'ahrefsbot', 'semrush', 'dotbot', 'gigabot', 'archive.org_bot');
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
foreach ($badAgents as $badAgent) {
    if (strpos($userAgent, $badAgent) !== false) {
        header('HTTP/1.1 403 Forbidden'); // refuse before any content is sent
        exit();
    }
}
This method is useful if your server doesn't support .htaccess. If you use it, make sure you also block WordPress's RSS feeds: feed requests are rendered by WordPress core and never load your theme's header.php, so the check above will not run for them. You can block the feeds by adding the code below to the functions.php file in your theme folder:
function wpb_disable_feed() {
    wp_die( __('No feed available, please visit our <a href="' . get_bloginfo('url') . '">homepage</a>!') );
}
add_action('do_feed_xml', 'wpb_disable_feed', 1);
add_action('do_feed', 'wpb_disable_feed', 1);
add_action('do_feed_rdf', 'wpb_disable_feed', 1);
add_action('do_feed_rss', 'wpb_disable_feed', 1);
add_action('do_feed_rss2', 'wpb_disable_feed', 1);
add_action('do_feed_atom', 'wpb_disable_feed', 1);
add_action('do_feed_rss2_comments', 'wpb_disable_feed', 1);
add_action('do_feed_atom_comments', 'wpb_disable_feed', 1);
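The snippet above kills feeds for everyone, including regular visitors. If you would rather deny feeds only to the link crawlers, a possible variant (my own sketch, reusing the user-agent list from the header.php example) is:

// Variant: disable feeds only for blacklisted crawlers (sketch)
function wpb_disable_feed_for_crawlers() {
    $badAgents = array('rogerbot', 'mj12bot', 'ahrefsbot', 'semrush', 'dotbot', 'gigabot');
    $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
    foreach ($badAgents as $badAgent) {
        if (strpos($userAgent, $badAgent) !== false) {
            wp_die( __('No feed available, please visit our <a href="' . get_bloginfo('url') . '">homepage</a>!') );
        }
    }
}
// Hook it on the same do_feed_* actions as above, for example:
add_action('do_feed_rss2', 'wpb_disable_feed_for_crawlers', 1);
add_action('do_feed_atom', 'wpb_disable_feed_for_crawlers', 1);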
Aggressive blocking (for PBN users):
If you are a regular webmaster who just wants to save some server resources by blocking link crawlers, any of the methods above should suffice; however, if you want to leave those crawlers no chance to sneak in, you need to apply harsher measures.
Your robots.txt will look like this:
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
This will allow only Googlebot to crawl the website, assuming the crawlers obey robots.txt directives. You can also allow the agents used by other major search engines like Bing, as shown below.
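For example, Bing's main crawler identifies itself as Bingbot, so you would append:
User-agent: Bingbot
Allow: /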
If you are using WordPress, you can hide the links from all user-agents except Google by adding the code below to functions.php:
add_filter( 'the_content', 'link_remove_filter' );
function link_remove_filter( $content ) {
    $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
    $hostname = gethostbyaddr($_SERVER['REMOTE_ADDR']);
    // Strip the links unless the user-agent claims to be Google AND the
    // visitor's reverse DNS resolves to a Google-owned hostname
    if (!preg_match('/google/', $userAgent) || !preg_match('/\.(googlebot|google)\.com$/i', $hostname)) {
        $content = preg_replace('#<a.*?>(.*?)</a>#is', '\1', $content);
    }
    return $content;
}
This code will show the links only to Google: besides the user-agent, it also checks via a reverse DNS lookup that the visiting IP address belongs to Google, so faking Googlebot's user-agent alone is not enough to see the links.
Make sure also to block RSS using the code listed in the previous step. Since this approach whitelists verified Google instead of blacklisting known bots, it will not be affected by those crawlers changing their user-agents or coming under different agent names.
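One caveat: a plain reverse DNS lookup can be spoofed by whoever controls the IP's PTR record, so Google's documented verification procedure adds a forward lookup that must resolve back to the original IP. A minimal sketch of that check (the helper name is my own; gethostbyname handles IPv4 only):

// Forward-confirmed reverse DNS check for Googlebot (sketch)
function is_verified_googlebot($ip) {
    // Step 1: reverse-resolve the IP to a hostname
    $hostname = gethostbyaddr($ip);
    // Step 2: the hostname must end in googlebot.com or google.com
    if (!$hostname || !preg_match('/\.(googlebot|google)\.com$/i', $hostname)) {
        return false;
    }
    // Step 3: forward-resolve the hostname; it must map back to the same IP
    return gethostbyname($hostname) === $ip;
}

You could then use is_verified_googlebot($_SERVER['REMOTE_ADDR']) in place of the gethostbyaddr() check inside link_remove_filter().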