The web is full of crawlers. Some of them are good and vital for your website, such as Googlebot; others can be harmful, like email-harvesting crawlers and content scrapers. Link crawlers fall short of harmful but far from useful: they bring no benefit to your website, and while they are not harmful in the sense of scraping content or anything like that, they could be consuming your server resources for nothing.
For SEOs who adopt black hat tactics like PBNs (private blog networks), those crawlers are a nightmare and, if left open, can expose the network to competitors. In most cases that leads to a spam report causing the whole network to be de-indexed, plus a manual action applied to the money site if not its total de-indexation.
The most popular link crawlers are Majestic, Ahrefs, Moz and SEMRush. Please note that their crawlers' user agents will not match their brand names and can change in the future, so it is very important to keep an up-to-date list of the user agents those crawlers use. Below are different ways to block them:
You can add a few lines to your robots.txt file to disallow the most popular link crawlers:
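(A minimal sketch; these user-agent tokens, MJ12bot for Majestic, AhrefsBot for Ahrefs, rogerbot and dotbot for Moz, and SemrushBot for SEMrush, are accurate at the time of writing but should be re-checked against each vendor's documentation.)

```
# Block the most popular link crawlers (tokens may change over time)
User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: rogerbot
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: SemrushBot
Disallow: /
```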
This method is good if your server doesn't support .htaccess. If you are using this method, you also need to make sure you block the RSS feed feature in WordPress; you can do that by adding the code below to your functions.php file in the theme folder:
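(A common sketch; WordPress's standard do_feed hooks are the usual way to intercept feed requests, and the function name my_disable_feeds is just a placeholder. Test on a staging site first.)

```php
// Add to your theme's functions.php: disable all RSS/Atom feeds
// by hooking into WordPress's feed actions.
function my_disable_feeds() {
    wp_die( 'No feed available, please visit the homepage.' );
}
add_action( 'do_feed',      'my_disable_feeds', 1 );
add_action( 'do_feed_rdf',  'my_disable_feeds', 1 );
add_action( 'do_feed_rss',  'my_disable_feeds', 1 );
add_action( 'do_feed_rss2', 'my_disable_feeds', 1 );
add_action( 'do_feed_atom', 'my_disable_feeds', 1 );
```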
If you are a regular webmaster who simply wants to save some server resources by blocking link crawlers, applying any of the methods above should suffice; however, if you are a webmaster who wants to leave those crawlers no chance to sneak in, you need to apply harsher measures.
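One such harsher measure is rejecting the requests at the server level, since robots.txt is only advisory while the server block is enforced. A minimal .htaccess sketch, assuming Apache with mod_rewrite enabled and using the same user-agent tokens listed earlier:

```apache
# Return "403 Forbidden" to known link-crawler user agents (case-insensitive).
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (MJ12bot|AhrefsBot|rogerbot|dotbot|SemrushBot) [NC]
RewriteRule .* - [F,L]
</IfModule>
```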
Technical SEO refers to the optimization work done on a website's technical infrastructure (HTML source code, server-side code, hosting, assets and more) to make it search engine friendly.
When it comes to technical SEO, there are a few buzzwords that you always need to keep in mind:
Indexability (Readability)
How do search engines work (mainly Google):
When search engines find a new web page, they send their crawlers (software that behaves like a browser, installed on powerful computers) to read the source code of that web page and save it to the search engine's storage servers to be parsed and processed later.
Once processing is completed, a functional copy of the page is stored on the search engine's servers; they will eventually make it available to the public via the cache:URL command. At this point the page is fully indexed and able to rank for whatever keywords search engines decide it is relevant to, based on the quality of the content and the authority of the website (in other words, the ranking algorithm).
Crawlability (Optimization, Speed and Mobile Friendliness):
The first step in making a website crawlable is providing access points to the crawlers, such as:
Internal links from other pages
URL submission to Google or Bing Search Console, or using their indexing API where available (only available for a few industries)
Discuss with your webmaster what happens when new content is added to the website and make sure one or more access points are available for the crawler to find that page (ideally a sitemap plus one or more internal links from prominent pages). A modern CMS like WordPress will provide access points automatically when you add a new post, but not when you add a new page; that is where you need to manually modify the information architecture to include a link to that page.
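For reference, a minimal sitemap entry following the sitemaps.org protocol looks like this (the example.com URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/new-page/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```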
Search engines have a limit on the number of pages they can crawl from a website, per session and in total; they call it the crawl budget. Most websites (with fewer than a million pages and not a lot of new content added every day) do not need to worry about that. If you have a large website, you need to make sure that new content is getting crawled and indexed quickly; providing strong internal links for new pages and pushing them to the sitemap quickly can help a lot with that.
URL management: Many websites use parameters in URLs for different reasons. Many eCommerce websites use parameters in URLs to provide pages for the same product in different colours or sizes (faceted navigation). Sometimes search engines will be able to index those pages and end up with a near-infinite number of pages to crawl, which can create a crawling issue as well as a duplicate content issue. Search engines provide webmasters with different tools to control crawlability and indexability by excluding pages from crawling; ideally, if the website is structured well, there will be less need to use any of the tools below to influence crawlability:
Robots.txt, a file located at the root of your website where you can provide rules directing search engines how to crawl the website; you can disallow search engines from crawling a folder, a pattern, a file or a file type (see the robots.txt sketch after this list).
Canonical tags, <link rel="canonical" href="https://www.wisamabdulaziz.com/" />, which you can place in page B's header to tell search engines that the page with the original content is page A, located at that canonical URL. Using canonical tags is a good alternative to 301 redirects as they do not need any server-side coding, which makes them easier to implement.
Redirects, by which I mean server-side redirects (301 for example), used to tell search engines that page A was moved to page B; this should be used only when the content on page A was actually moved to page B. It can also be used when there is more than one page with very similar content.
Meta refresh, <meta http-equiv="refresh" content="0;URL='http://newpage.example.com/'" />, normally located in the header area; it directs browsers to redirect users to another page. Search engines listen to meta refresh, and when the waiting time is 0 they will treat it like a 301 redirect.
Noindex tags, <meta name="robots" content="noindex">, which should be placed in the header of a page that you do not want search engines to index.
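As referenced above, here is a robots.txt sketch showing a folder, a pattern, a single file and a file-type rule (the paths are hypothetical, and the * and $ wildcards are supported by Google and Bing rather than the original robots.txt standard):

```
User-agent: *
Disallow: /admin/          # a folder
Disallow: /*?sort=         # a URL pattern (e.g. faceted navigation parameters)
Disallow: /old-page.html   # a single file
Disallow: /*.pdf$          # a file type
```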
The final thing to optimize for crawlability is website speed, which is also a ranking factor. A few quick steps you can take to have a fast website:
Use a fast host and always make sure you have extra resources with your host; if your shared host is not doing the job, upgrade to a VPS or a dedicated server. It is a worthwhile investment.
Use a reliable fast CMS like WordPress
Cache your dynamic pages (WordPress posts, for example) as static HTML
With mobile users surpassing desktop users a few years ago, mobile-friendly websites are becoming more important to search engines and web developers. Search engines like Google have created a mobile crawler to better understand how a website will look for mobile users; when they find a website ready for mobile users, they set their mobile crawler as the default crawler for that website (what they call mobile-first). There are a few steps you can take to make sure your website is mobile friendly (for both users and crawlers):
Keep the mobile version as fast as possible; if you cannot do that for technical or design reasons, consider using Accelerated Mobile Pages (AMP)
Indexability / Readability (Schema)
When a website is optimized well for crawlability and renderability, indexability will almost take care of itself. The key point for indexability is providing a page with a dedicated, clean URL that returns unique content with substance and loads fast, so search engines can crawl it and store it on their servers.
Content that can cause indexability issues:
Thin content, as it may not be kept in the index.
Content that can help search engines with parsing and indexability:
Structured data, mainly Schema, can help search engines turn content into a searchable database almost without any processing; eventually that will help your website get rich results in the SERP (an example is the five-star review snippet that Google adds for some websites; see the JSON-LD sketch after this list).
Using HTML markup to organize content (i.e. <h2>, <strong>, <ol>, <li>, <p>) will make it easier for search engines to index your content and show it when applicable in their featured snippets like the answer box.
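A minimal JSON-LD sketch of the aggregate-rating markup mentioned above, using the schema.org vocabulary (the product name and numbers are made up for illustration):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "reviewCount": "24"
  }
}
</script>
```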
Monitoring and error fixing:
Continuous monitoring of a website's crawlability and indexability is key to avoiding any situation where part of the website becomes uncrawlable (it could be your webmaster adding a noindex tag to every page on the website). There are different tools that can help you with that:
Google Search Console (GSC): after verifying your website with GSC, Google will start providing you with feedback regarding your website's health with Google. The index coverage report is the most important section in the dashboard to keep an eye on to find out about crawlability and indexability issues. Google will send messages through the message centre (there is an option to forward them to your email) for serious crawlability issues.
Crawling tools: SEMrush, Ahrefs, Oncrawl and Screaming Frog can be helpful for finding errors.
Monitor 404 errors in Google Analytics and GSC. Make sure to customize your 404 error pages and add the words "404 not found" to the title tag so it becomes easier to find 404 error pages using Google Analytics (as in the sketch after this list).
Monitor indexability: check whether the number of indexed pages in GSC makes sense given the size of your website (it should not be too big or too small compared to the actual number of unique pages on your website).
Monitor renderability using the URL inspection tool in GSC. Make sure Google can render the pages as closely as possible to how users see them, and pay attention to blocked resources that are required to render the website (the URL inspection tool will notify you about them).
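For the 404 title customization mentioned above, a minimal WordPress sketch using the standard pre_get_document_title filter (adapt the wording to your theme):

```php
// Add to functions.php: prepend "404 Not Found" to the document <title>
// on 404 pages so they are easy to isolate in Google Analytics reports.
add_filter( 'pre_get_document_title', function ( $title ) {
    if ( is_404() ) {
        return '404 Not Found - ' . get_bloginfo( 'name' );
    }
    return $title;
} );
```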
Index coverage monitoring and analysis definitely needs to be a service that you offer to your clients as an SEO specialist; a monthly or quarterly GSC audit is strongly recommended.
SEO is a very dynamic industry that changes almost daily; there is something new to read or learn about every day. It is strongly recommended to read, watch videos or listen to podcasts for an hour or more every day to stay on top of all changes related to SEO specifically and internet marketing in general. Below I list the websites (mostly blogs) that I check daily for news and updates. The links below will take you directly to their RSS feed pages so you can add them to any reader of your choice (I personally use Feeder.co, which has a Google Chrome plugin, a phone app and a web-based interface); recently both Firefox and Chrome added built-in RSS reading features.
This is how Feeder.co looks in my Google Chrome; the number in the blue box represents the number of unread posts from the websites I subscribe to.
Available from multiple locations, on multiple devices and using different internet speeds; provides the speed index (the time required for the site to visually load for users even if there is still processing going on in the background).
This tool provides insights on how to speed up the website, a report, and a video showing the load progress.
The tool has become more valuable after adding Lighthouse data and Google Chrome data (not available for every website). Be aware that score is not speed; speed is measured in seconds only, and a very low score does not matter much if your web page loads in 3 seconds or less.
This tool provides insights on how to speed up the website, a report, a video showing the load progress and an industry comparison.
Available from multiple locations, on multiple devices and using different internet speeds; this tool can track speed history (a paid feature), which is handy for evaluating website speed throughout the whole day or week.
This tool provides insights on how to speed up the website, a report and a video showing the load progress.
This tool is designed to analyze the speed of mobile websites on a slow (3G) connection; it provides insights on how to speed up the website, a report and an industry comparison.
Chrome Developer Tools (Advanced)
This tool is built into Google Chrome; it has the ability to change connection speed, emulate different devices, and disable/enable the cache.
Google Analytics (the numbers there are not very reliable)
GA provides average page load time in seconds. I did not find it that reliable, possibly because it averages numbers from different users, and it works based on code load completion, which is not always a reflection of the actual page load.
The SEO certificates post includes a lot of learning resources, as all the certificates require you to go through some training before you can take the exam. If you want a faster route, in case you are applying for a job that needs some SEO knowledge (not an SEO specialist job), you can find many online resources that cover the SEO fundamentals and give you a good jump start on your SEO knowledge.
Google Quality Guidelines: this is a very important one to read, especially if you are planning to be aggressive in your link building efforts.
Google Quality Raters Guidelines: Google uses quality raters (humans) to evaluate its search results so its engineers can improve them. What we have learned about Google's algorithm throughout the years is that it will always try to replicate human quality judgment; reading this document will give you an idea of where Google's algorithm is going in the future.
Google Best Practices (mainly for ads): this is Google's best practices document for ads. Quality guidelines for ads apply in most cases to SEO as well, which makes this document worth reading even for SEO specialists.
Conferences to attend:
Going to conferences to learn SEO is not going to give you the best ROI; however, going there to network and meet new people is the investment you should be looking for.
Another benefit of going to those conferences is the status and credibility they give you with your clients (especially the big ones). Major search engines like Google send speakers to many of those conferences, so you will have a chance to hear from the horse's mouth; then you can communicate your SEO recommendations to your clients by saying, for example, "I heard Google say this at SMX Advanced".
In this post I am going to cover which certifications can help you land your next SEO job (I will add another post for training courses). Most of the certificates below have online training sections that you need to go through before taking the exam. If you pass the exam, you will be granted a digital certificate that you can print and hang in your office, and you will also get a web page that you can add to your LinkedIn profile. With the education system falling behind when it comes to digital marketing, the certificates below will give you an instant advantage with any potential employer.
MOZ Academy (the essential SEO certificate is a good start)
If you are asking why you would need paid search or marketing automation certificates when applying for an SEO job: for most companies, SEO is one piece of the whole marketing landscape, which in most cases includes social media marketing, PPC, SEO and marketing automation. SEO specialists work closely with other digital marketing channels, and they need at least a basic understanding of how those other channels work. The other benefit of having PPC certificates is that smaller companies tend to hire one in-house marketing individual to manage all their digital marketing channels; being familiar with the PPC channels will increase the number of jobs you can apply to.
SEO certificates are not a must for gaining SEO knowledge; all the information you need to learn SEO is available online. However, they can help you with three things:
As the education system has not caught up yet, they will give you some credibility and increase your chances of landing an SEO job
They will streamline your learning curve and test your knowledge
They will increase your commitment to learning SEO and pursuing it as a career, especially if you pay for some of those courses
There are some tools and platforms that you need to master if you want to be an SEO expert. Some of these tools are used for monitoring and tracking; others make your work more efficient. Some of them are required for the SEO specialist role, and employers ask for them in job postings.
Google Search Console (GSC) (Bing also has its own):
This tool has been growing for years and has become the most important tool for SEO specialists. Why is GSC that important?
This is the only place where you can see which keywords are receiving impressions and clicks, their CTR, and where they rank
The message centre is a great communication tool that Google uses to tell webmasters about issues and improvements for their websites
The index coverage and crawlability reports contain very valuable insights that will help webmasters understand how Google crawls and indexes their websites
A sample of backlinks is available in GSC
A dedicated post about Google Search Console will be published soon; I will make sure to link to it from this post.
Google Analytics (GA):
"If you can not measure it, you can not improve it.", GA is the tool when it comes to track traffic sources and users' interaction with a website. Key things you need to know how to do in GA:
Setup goals and track goals per source
Understand and analyze bounce rate, time on site and pages per session
How to monitor users' interactions like reviews and phone calls
How to tag URLs using UTM tags so you can identify GMB (Google My Business) traffic in GA, as in the example below
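For example, tagging the website link in your GMB listing like this lets you segment that traffic in GA (the parameter values below are a common convention, not a requirement):

```
https://www.example.com/?utm_source=google&utm_medium=organic&utm_campaign=gmb-listing
```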
Link analysis tools:
These tools enable you to analyze the link profile of any website. Their most popular use is creating competitive analysis reports that help you understand the authority gap between your website and competitors. The most popular ones are:
All of the tools above are paid; buying just one of them could suffice.
Keyword research Tools:
Learning about the client's business and then finding keywords relevant to that business is the starting point for any SEO project. There are many tools that can help with that, and all of them use Google's database:
Google Keyword Planner
Website speed tools:
Website speed is a ranking factor with Google; more importantly, it can improve user experience and eventually increase conversion rates. It is important to monitor website speed on a regular basis, and the best tools that can help with that are:
Google Analytics (the numbers there are not very reliable)
Mobile friendliness tools:
Mobile users surpassed desktop users a long time ago, and Google is following that trend by focusing more on mobile users; Googlebot desktop is being replaced by Google's mobile crawler for most mobile-ready websites. Having a mobile friendly website that is fast and provides a good user experience is key for SEO success. Tools that can help with improving mobile friendliness are:
Major search engines use structured data (Schema is the most popular one) to better understand content structure, as structured data provides content in a database-friendly format (almost ready to save to a database without processing). Once structured data is added to a website, there are many tools that can help preview and test it for errors:
The first step any search engine needs to take is crawling the web; if content is not crawlable or reachable (via a link or a sitemap), search engines will not find it, index it or rank it. The best tools for finding crawlability issues are: