Technical SEO

Technical SEO refers to the optimization work done on a website's technical infrastructure (HTML source code, server-side code, hosting, assets and more) to make it search-engine friendly.

When it comes to technical SEO, there are a few buzzwords you always need to keep in mind:

  • Crawlability
  • Renderability
  • Indexability (Readability)

How search engines work (mainly Google):

When search engines find a new web page, they send their crawlers (software that behaves like a browser running on a powerful computer) to read the source code of that page and save it to the search engine's storage servers to be parsed and processed later.

The second step is parsing that source code and possibly saving missing resources like images, CSS and other dependencies. If the crawler decides that the content in the source code is readable without rendering (which happens when the content is included in plain HTML or simple JavaScript), the search engine starts processing it by turning the source code into structured data (think of an Excel sheet with columns and rows) that eventually becomes a database. Given the size of the web, searching raw files and returning results quickly is impossible, but searching a database and returning results in less than a second is possible. To turn web content into a searchable database, search engines need to isolate text from code. If they find content marked up with Schema (structured data), turning it into database entries is a lot easier; HTML elements like title tags and meta description tags are also easy to process.
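To make the "isolate text from code" step concrete, here is a minimal sketch using Python's standard html.parser module. It pulls out the title tag, the meta description and the visible text while skipping script and style blocks. The class and function names (PageParser, parse_page) are hypothetical, and real search engine parsers are far more sophisticated.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Separates indexable text from code in a raw HTML document."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script>, <style> or <noscript>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html):
    """Return the title, meta description and visible text of a page."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "text": " ".join(parser.text_parts),
    }


page = parse_page(
    "<html><head><title>Blue Widget</title>"
    '<meta name="description" content="A blue widget.">'
    '<script>var tracking = "not content";</script></head>'
    "<body><h1>Blue Widget</h1><p>Great for SEO.</p></body></html>"
)
print(page["text"])  # Blue Widget Great for SEO.
```

The script contents never reach the text output, which mirrors why content that only exists inside JavaScript is invisible to a crawler that does not render.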

If the crawler decides that this is a JavaScript-heavy page that needs to be rendered to reveal its text content, a more advanced crawler will be sent later (sometimes a few days later) to render the page and get it ready for processing. Watch the video below to understand how Google's crawlers work.

Renderability was not a thing when search engines started, since rendering is a very resource-intensive and slow process. But as more websites adopted advanced JavaScript frameworks like Angular (where, in some cases, the source code doesn't contain any text content), search engines started to see a need to fully render pages in order to capture the content. Another benefit of fully rendering a page is understanding the location and state of the content on it (hidden content, above the fold, below the fold, etc.).
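A rough way to see whether a page depends on rendering is to check how much text its raw HTML contains before any JavaScript runs. The sketch below is a hypothetical heuristic: the needs_rendering name and the 50-word threshold are assumptions, and regex-based tag stripping is only good enough for a quick check. An app-shell page like a typical Angular build will usually fail it.

```python
import re


def needs_rendering(html, min_words=50):
    """Rough heuristic: little visible text in the raw HTML, but scripts present."""
    # Drop script and style blocks, then strip the remaining tags.
    without_code = re.sub(r"<(script|style)\b.*?</\1\s*>", " ", html, flags=re.S | re.I)
    visible_text = re.sub(r"<[^>]+>", " ", without_code)
    word_count = len(visible_text.split())
    script_count = len(re.findall(r"<script\b", html, flags=re.I))
    return word_count < min_words and script_count > 0


app_shell = '<html><body><app-root></app-root><script src="/main.js"></script></body></html>'
static_page = "<html><body><p>" + "useful words " * 40 + "</p></body></html>"
print(needs_rendering(app_shell))    # True
print(needs_rendering(static_page))  # False
```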

Currently the only search engine that does well with rendering is Google. It sends a basic crawler first (one that can understand simple JavaScript but not frameworks like Angular), then comes back later with another crawler that can fully render the page. Check this tool to see how a web page renders for Googlebot.

Once processing is complete, a functional copy of the page is stored on the search engine's servers, and it may eventually be made available to the public through the cache:URL command (Google has since retired this operator). At this point the page is fully indexed and able to rank for whatever keywords the search engine decides it is relevant to, based on the quality of the content and the authority of the website (in other words, the ranking algorithm).

Crawlability Optimization (Speed and Mobile Friendliness):

The first step in making a website crawlable is providing access points for crawlers, such as:

  • Sitemap
  • Internal links from other pages
  • External links
  • RSS feed
  • URL submission to Google Search Console or Bing Webmaster Tools, or their indexing API where available (only available for a few industries)
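A sitemap is one of the easiest access points to automate. This sketch builds a minimal XML sitemap in the sitemaps.org format with Python's standard library; the page list and the build_sitemap name are hypothetical, and in practice the URLs would come from your CMS or database.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of (url, lastmod) pairs; lastmod is 'YYYY-MM-DD' or None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; a newly published page is pushed to the sitemap right away.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/blog/new-post/", "2024-02-01"),
    ("https://www.example.com/guides?topic=seo&lang=en", None),
])
print(sitemap_xml)
```

ElementTree escapes characters such as & in URLs automatically, which avoids a common cause of invalid hand-written sitemaps.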

Discuss with your webmaster what happens when new content is added to the website, and make sure there are one or more access points for the crawler to find that page (ideally the sitemap plus one or more internal links from prominent pages). Modern CMSs like WordPress provide access points automatically when you add a new post, but they do not do that when you add a new page; that is where you need to manually modify the information architecture to include a link to that page.

Each piece of content on a website must have its own dedicated URL, and this URL must be clean, not fragmented using characters like #. A Single Page Application (SPA) built on fragment URLs is an example of a situation you need to avoid: the whole website operates on a single URL (normally the root domain, with the rest being fragment URLs), and technologies like AJAX (a mix of HTML and JavaScript) load content from the database to answer any new page request (#anotherpage). Users will not see any issue with that, but search engines will not be able to crawl the website, because they use dedicated URLs as the keys that identify a page in their index. Those keys are not available in this case, since search engines ignore fragment URLs entirely.
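The difference between fragment URLs and dedicated URLs is easy to demonstrate: once the part after # is dropped (browsers never send the fragment to the server), every "page" of a fragment-based SPA collapses into the same URL. This sketch uses urllib.parse.urldefrag; the example.com URLs and the index_keys helper are placeholders.

```python
from urllib.parse import urldefrag

# Hypothetical SPA that routes pages with fragments.
spa_urls = [
    "https://www.example.com/#home",
    "https://www.example.com/#anotherpage",
    "https://www.example.com/#contact",
]
# The same site with a dedicated, clean URL for each page.
clean_urls = [
    "https://www.example.com/",
    "https://www.example.com/anotherpage/",
    "https://www.example.com/contact/",
]


def index_keys(urls):
    """Distinct URLs left once the fragment is dropped."""
    return {urldefrag(url).url for url in urls}


print(index_keys(spa_urls))         # {'https://www.example.com/'}
print(len(index_keys(clean_urls)))  # 3
```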

Search engines have a limit on the number of pages they can crawl from a website, both per session and in total; they call it crawl budget. Most websites (with fewer than a million pages and not a lot of new content added every day) do not need to worry about it. If you have a large website, you need to make sure that new content is crawled and indexed quickly; providing strong internal links to new pages and adding them to the sitemap promptly can help a lot with that.
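One way to check that new pages get strong internal links is to measure their click depth from the homepage. The sketch below runs a breadth-first search over a hypothetical internal-link map and flags sitemap URLs that are unreachable or deeper than three clicks. The function names are illustrative, and the three-click limit is a common rule of thumb, not a documented search engine threshold.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: fewest clicks needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_needing_links(links, home, sitemap_urls, max_depth=3):
    """Sitemap URLs that are orphaned or buried deeper than max_depth clicks."""
    depths = click_depths(links, home)
    return sorted(url for url in sitemap_urls if depths.get(url, max_depth + 1) > max_depth)


# Hypothetical internal-link map: page -> pages it links to.
links = {
    "/": ["/blog/", "/about/"],
    "/blog/": ["/blog/post-1/"],
    "/blog/post-1/": ["/blog/archive/"],
    "/blog/archive/": ["/blog/old-post/"],
}
sitemap = ["/", "/blog/post-1/", "/blog/old-post/", "/new-page/"]
print(pages_needing_links(links, "/", sitemap))  # ['/blog/old-post/', '/new-page/']
```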

URL management: Many websites use parameters in URLs for different reasons. Many eCommerce websites use URL parameters to provide pages for the same product in different colours or sizes (faceted navigation). Sometimes search engines are able to index those pages and end up with an almost infinite number of pages to crawl, which can create a crawling issue and also a duplicate content issue. Search engines provide webmasters with different tools to control crawlability and indexability by excluding pages from crawling or indexing; ideally, if the website is well structured, there will be less need to use any of the tools below:

  • Robots.txt: a file located in the root of your website where you can provide rules telling search engines how to crawl the website. You can disallow search engines from crawling a folder, a pattern, a file or a file type.
  • Canonical tags: <link rel="canonical" href="https://www.wisamabdulaziz.com/" /> You can place one in the header of page B to tell search engines that the page with the original content is page A, located at that canonical URL. Canonical tags are a good alternative to 301 redirects, as they do not need any server-side coding, which makes them easier to implement.
  • Redirects: here I mean server-side redirects (301, for example), which tell search engines that page A was moved to page B. Use them mainly when the content of page A has actually moved to page B; they can also consolidate several pages with very similar content into one.
  • Meta refresh: <meta http-equiv="refresh" content="0;URL='http://newpage.example.com/'" /> Normally located in the header area, it tells browsers to redirect users to another page. Search engines respect meta refresh; when the waiting time is 0, they treat it like a 301 redirect.
  • noindex tags: <meta name="robots" content="noindex"> should be placed in the header of any page you do not want search engines to index. The page must remain crawlable, otherwise search engines cannot see the tag.
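For parameter-driven URLs, the canonical tag is often generated by stripping the parameters that only create variants of the same content. This is a minimal sketch: the FACET_PARAMS set and both function names are hypothetical and must be adapted to the parameters your site actually uses (a parameter such as page usually changes the content and should be kept).

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only create variants of the same product page.
FACET_PARAMS = {"color", "size", "sort", "sessionid"}


def canonical_url(url):
    """Drop facet parameters and the fragment; keep content-changing ones, sorted."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in FACET_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Build the <link rel="canonical"> element for the page's header."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}" />'


print(canonical_tag("https://Shop.example.com/shoes/runner?color=red&size=9"))
# <link rel="canonical" href="https://shop.example.com/shoes/runner" />
print(canonical_url("https://shop.example.com/shoes?sort=price&page=2"))
# https://shop.example.com/shoes?page=2
```

Sorting the kept parameters means ?a=1&b=2 and ?b=2&a=1 map to the same canonical URL.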

The final thing to optimize for crawlability is website speed, which is also a ranking factor. Here are a few quick steps you can take to have a fast website:

  • Use a fast host and always make sure you have extra resources with it; if your shared host is not doing the job, upgrade to a VPS or a dedicated server, as it is a worthwhile investment
  • Use a reliable, fast CMS like WordPress
  • Cache your dynamic pages (WordPress posts, for example) as static HTML
  • Compress and optimize images
  • Minify CSS and JavaScript
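The page-caching step can be sketched as a small file-based cache: the expensive dynamic render runs once, and later requests are served from the stored HTML file until it expires. In practice a caching plugin or a reverse proxy does this for you; the HtmlPageCache class, its directory layout and the one-hour TTL are assumptions for illustration.

```python
import hashlib
import os
import time


class HtmlPageCache:
    """File-based cache that stores rendered HTML and reuses it until it expires."""

    def __init__(self, cache_dir, ttl_seconds=3600):
        self.cache_dir = cache_dir
        self.ttl = ttl_seconds
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url_path):
        # Hash the URL path so any path maps to a safe file name.
        name = hashlib.sha256(url_path.encode("utf-8")).hexdigest() + ".html"
        return os.path.join(self.cache_dir, name)

    def get(self, url_path, render):
        """Serve cached HTML if it is fresh; otherwise render, store and return it."""
        path = self._path(url_path)
        if os.path.exists(path) and time.time() - os.path.getmtime(path) < self.ttl:
            with open(path, encoding="utf-8") as cached:
                return cached.read()
        html = render(url_path)  # the slow dynamic step (database queries, templates)
        with open(path, "w", encoding="utf-8") as out:
            out.write(html)
        return html
```

Usage: create the cache once, for example HtmlPageCache("/var/cache/pages"), then call cache.get(request_path, render_post) in the request handler, where render_post is your (hypothetical) dynamic rendering function.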

To test your website, you can use the speed testing tools listed here.

Googlebot for Mobile:

With mobile users surpassing desktop users a few years ago, mobile-friendly websites are becoming more important to search engines and web developers. Search engines like Google have created a mobile crawler to better understand how a website looks to mobile users; when they find a website ready for mobile users, they set their mobile crawler as the default crawler for that website (what they call mobile-first indexing). There are a few steps you can take to make sure your website is mobile friendly (for both users and crawlers):

  • Use responsive design for your website
  • Keep important content above the fold
  • Make sure your responsive website is error-free; you can use GSC or Google's Mobile-Friendly Test for that
  • Keep the mobile version as fast as possible; if you cannot do that for technical or design reasons, consider using Accelerated Mobile Pages (AMP)
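Responsive design depends on a viewport meta tag; without it, mobile browsers typically lay the page out at desktop width and shrink it. This hypothetical check (the ViewportFinder and has_responsive_viewport names are illustrative) uses Python's html.parser to confirm the tag is present and uses width=device-width, a quick first test before a full mobile audit.

```python
from html.parser import HTMLParser


class ViewportFinder(HTMLParser):
    """Records the content of <meta name="viewport">, if the page has one."""

    def __init__(self):
        super().__init__()
        self.viewport = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "viewport":
            self.viewport = attrs.get("content") or ""


def has_responsive_viewport(html):
    """True if the page declares a viewport that follows the device width."""
    finder = ViewportFinder()
    finder.feed(html)
    finder.close()
    content = (finder.viewport or "").replace(" ", "").lower()
    return "width=device-width" in content


print(has_responsive_viewport(
    '<head><meta name="viewport" content="width=device-width, initial-scale=1"></head>'
))  # True
print(has_responsive_viewport("<head><title>Desktop only</title></head>"))  # False
```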

Renderability Optimization:

The best renderability optimization you can do is to remove the need for search engines to render your website in a second (advanced) crawl. If your website is built on an advanced JavaScript framework like Angular, it is strongly recommended to give crawlers like Googlebot a pre-rendered HTML copy of each page (regular users can still get the Angular version of the website; this practice is called dynamic rendering). This can be done with the built-in Angular Universal feature or with third-party solutions like Prerender.io.
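Dynamic rendering boils down to a routing decision based on the user agent. The sketch below is a hypothetical illustration: the crawler signatures, the prerendered snapshot dictionary and the function names are assumptions, and a production setup would usually rely on the prerendering tool's own middleware. Note that Google's documentation now describes dynamic rendering as a workaround rather than a long-term solution, and the pre-rendered copy must show the same content users see.

```python
# Hypothetical crawler signatures; a real setup would use a maintained list
# (and ideally verify crawlers by reverse DNS rather than trusting the header).
CRAWLER_SIGNATURES = ("googlebot", "bingbot", "duckduckbot", "yandexbot", "baiduspider")


def is_crawler(user_agent):
    ua = (user_agent or "").lower()
    return any(signature in ua for signature in CRAWLER_SIGNATURES)


def choose_response(user_agent, path, prerendered, app_shell):
    """Serve a pre-rendered HTML snapshot to crawlers and the JavaScript app to users.

    prerendered: dict mapping URL path -> static HTML produced ahead of time
    (for example by a prerendering service); app_shell: the normal SPA page.
    """
    if is_crawler(user_agent) and path in prerendered:
        return prerendered[path]
    return app_shell


snapshots = {"/pricing/": "<html><body><h1>Pricing</h1><p>Plans start at $10.</p></body></html>"}
shell = '<html><body><app-root></app-root><script src="/main.js"></script></body></html>'
googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
chrome = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0 Safari/537.36"
print(choose_response(googlebot, "/pricing/", snapshots, shell) == snapshots["/pricing/"])  # True
print(choose_response(chrome, "/pricing/", snapshots, shell) == shell)  # True
```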

Do not block resources (CSS, JavaScript and images) that are needed to render the website. Back in the day, webmasters used to block those resources in robots.txt to reduce server load or for security purposes. Inspect your website using Google Search Console; if you see any blocked resources that Google needs to render the website, discuss with your web developer how you can safely allow those resources to be crawled.
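To find rendering resources that robots.txt blocks, you can test their URLs against your rules with Python's urllib.robotparser module. This is a minimal sketch with hypothetical paths and a hypothetical blocked_resources helper. The standard-library parser applies the first matching rule and does not implement Google's wildcard and longest-match handling, so treat the URL Inspection tool in GSC as the final word.

```python
from urllib.robotparser import RobotFileParser


def blocked_resources(robots_txt, resource_urls, user_agent="Googlebot"):
    """Return the resource URLs that the given robots.txt rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical legacy robots.txt that blocks a JavaScript folder.
robots_txt = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /assets/js/
"""
resources = [
    "https://www.example.com/assets/css/style.css",
    "https://www.example.com/assets/js/app.js",
    "https://www.example.com/images/logo.png",
]
print(blocked_resources(robots_txt, resources))
# ['https://www.example.com/assets/js/app.js']
```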

Indexability and Readability (Schema):

When a website is well optimized for crawlability and renderability, indexability is almost automatically taken care of. The key point for indexability is providing each page with a dedicated, clean URL that returns unique, substantial content and loads fast, so search engines can crawl it and store it on their servers.
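These indexability conditions can be checked page by page. The sketch below flags the common blockers: a non-200 status, a fragment URL, a noindex directive in the meta robots tag or the X-Robots-Tag header, and a canonical tag pointing elsewhere. The function and class names and the plain dict of headers (with case-sensitive keys) are assumptions; fetching the page is left to your crawler.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the meta robots directive and canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots = ""
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = (attrs.get("content") or "").lower()
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def indexability_issues(url, status, html, headers=None):
    """List reasons a fetched page may stay out of the index (empty list = none found)."""
    issues = []
    if status != 200:
        issues.append(f"status {status}")
    if "#" in url:
        issues.append("fragment URL")
    signals = HeadSignals()
    signals.feed(html)
    signals.close()
    x_robots = (headers or {}).get("X-Robots-Tag", "").lower()
    if "noindex" in signals.robots or "noindex" in x_robots:
        issues.append("noindex")
    if signals.canonical and signals.canonical != url:
        issues.append(f"canonical points to {signals.canonical}")
    return issues


page_url = "https://www.example.com/guide/"
print(indexability_issues(page_url, 200, f'<link rel="canonical" href="{page_url}">'))  # []
print(indexability_issues(page_url, 200, '<meta name="robots" content="noindex, follow">'))
# ['noindex']
```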

Content that can cause indexability issues:

  • Thin content, as it may not be kept in the index
  • Duplicate content
  • Text content in images, SWF files, video files and complex JavaScript files: this type of content will not make its way into the index and will not be searchable in Google
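Thin and duplicate content can be flagged with a word count and a hash of the normalized text. This hypothetical sketch only catches exact duplicates after lowercasing and collapsing whitespace (near-duplicates need techniques such as shingling or SimHash), and the 300-word threshold is an arbitrary assumption rather than a search engine rule.

```python
import hashlib
import re


def fingerprint(text):
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def content_report(pages, min_words=300):
    """pages: dict mapping URL -> extracted body text."""
    thin = sorted(url for url, text in pages.items() if len(text.split()) < min_words)
    groups = {}
    for url, text in pages.items():
        groups.setdefault(fingerprint(text), []).append(url)
    duplicates = sorted(sorted(urls) for urls in groups.values() if len(urls) > 1)
    return {"thin": thin, "duplicates": duplicates}


pages = {
    "/widget/": "Blue widget with free shipping. " * 80,
    "/widget/?color=blue": "Blue  widget with FREE shipping.  " * 80,
    "/about/": "We sell widgets.",
}
print(content_report(pages))
# {'thin': ['/about/'], 'duplicates': [['/widget/', '/widget/?color=blue']]}
```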

Content that can help search engines with parsing and indexability:

  • Structured data, mainly Schema, can help search engines turn content into a searchable database almost without any processing. Eventually that can help your website earn rich results in the SERP (for example, the five-star review ratings that Google shows for some websites)
  • Using HTML markup to organize content (e.g. <h2>, <strong>, <ol>, <li>, <p>) makes it easier for search engines to index your content and show it, when applicable, in featured snippets like the answer box.
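Schema markup is usually added to a page as a JSON-LD script. The sketch below generates a schema.org Product snippet with an AggregateRating, the kind of markup behind star ratings in the SERP. The product_jsonld helper and the product values are hypothetical, and valid markup makes a page eligible for rich results but does not guarantee them.

```python
import json


def product_jsonld(name, url, rating_value, review_count):
    """Build a schema.org Product JSON-LD block with an aggregate star rating."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        },
    }
    # Escape "</" so text inside the JSON can never close the <script> element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(product_jsonld("Blue Widget", "https://www.example.com/widget/", 4.6, 128))
```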

Monitoring and Fixing Errors:

Continuous monitoring of your website's crawlability and indexability is key to avoiding any situation where part of the website becomes uncrawlable or unindexable (it could be your webmaster adding a noindex tag to every page on the website). There are different tools that can help you with that:

  • Google Search Console (GSC): after you verify your website with GSC, Google will start giving you feedback about your website's health in Google. The Index Coverage report is the most important section of the dashboard to keep an eye on for crawlability and indexability issues. Google will send messages through the message centre (with an option to forward them to your email) for serious crawlability issues
  • Crawling tools: SEMrush, Ahrefs, Oncrawl and Screaming Frog can be helpful for finding errors
  • Monitor 404 errors in Google Analytics and GSC. Make sure to customize your 404 error page and add the words "404 not found" to its title tag, so it is easier to find 404 error pages in Google Analytics
  • Monitor indexability: check whether the number of indexed pages in GSC makes sense for the size of your website (it should not be much larger or smaller than the actual number of unique pages on your website)
  • Monitor renderability using the URL Inspection tool in GSC. Make sure Google can render your pages as close as possible to how users see them, and pay attention to blocked resources that are required to render the website (the URL Inspection tool will notify you about them)
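Part of this monitoring can be automated by comparing two crawls of your site. The sketch assumes a hypothetical snapshot format (each URL mapped to its status code and a noindex flag, as exported from your crawling tool of choice) and raises alerts for new 404s, new noindex tags and a sharp drop in indexable pages. The crawl_alerts name and the 20% drop threshold are assumptions.

```python
def crawl_alerts(previous, current, max_drop_ratio=0.2):
    """previous/current: dict mapping URL -> {"status": int, "noindex": bool}."""
    alerts = []
    for url, now in sorted(current.items()):
        before = previous.get(url)
        if now["status"] == 404 and (before is None or before["status"] != 404):
            alerts.append(f"new 404: {url}")
        if now["noindex"] and (before is None or not before["noindex"]):
            alerts.append(f"new noindex: {url}")

    def indexable(snapshot):
        return sum(1 for page in snapshot.values() if page["status"] == 200 and not page["noindex"])

    before_count, now_count = indexable(previous), indexable(current)
    if before_count and (before_count - now_count) / before_count > max_drop_ratio:
        alerts.append(f"indexable pages dropped from {before_count} to {now_count}")
    return alerts


last_week = {
    "/": {"status": 200, "noindex": False},
    "/a/": {"status": 200, "noindex": False},
    "/b/": {"status": 200, "noindex": False},
}
today = {
    "/": {"status": 200, "noindex": True},
    "/a/": {"status": 404, "noindex": False},
    "/b/": {"status": 200, "noindex": False},
}
for alert in crawl_alerts(last_week, today):
    print(alert)
```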

 

Index coverage monitoring and analysis should definitely be a service you offer your clients as an SEO specialist; a monthly or quarterly GSC audit is strongly recommended.

Next step: user experience and conversion rate optimization

 
