Post by amirmukaddas on Mar 12, 2024 4:22:24 GMT -5
How to easily identify and resolve crawling problems on e-commerce sites and, more generally, on all tree-structured websites that host large amounts of content.

The inspiration for today's post comes from an SEO consultancy I provided last week for a large e-commerce site in the sports sector. Specifically, Google had indexed hundreds of thousands of pages for a website with "only" a few thousand products.

Robots.txt

The technical manager of the project dealt with the case by moving the website to HTTPS while simultaneously blocking the paths that generated infinite crawling loops, mostly linked to content such as user and checkout pages, search pages and technical folders. A well-written robots.txt file can do a lot in these cases.
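As a rough illustration, a robots.txt along these lines keeps crawlers away from the kind of loop-generating paths mentioned above. The directory and parameter names here are hypothetical placeholders and need to be adapted to the shop's actual URL structure:

User-agent: *
Disallow: /checkout/
Disallow: /cart/
Disallow: /account/
Disallow: /search/
Disallow: /*?orderby=

Sitemap: https://www.example-shop.com/sitemap.xml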
This time, however, that alone was not enough: the website did not recover, and organic traffic remained low despite the company being well established and the website well built. Yes, it's one of those bitter Montenegro stories.

Crawl analysis

The first alarm bell went off when I realized that Google generated a cache for category pages both when calling them with HTTPS and without, while for many product pages the cache often existed only when calling the URLs with HTTPS; otherwise Google returned a 404 code. In any case, all cached pages were the HTTPS versions. This happens because Google tries to save crawling resources when dealing with huge websites or with websites that have serious crawling problems.
For example, if you call up the Google cache of the Fanpage Facebook page (5 million fans) with HTTPS it shows up; without HTTPS it does not. The website I examined was nowhere near the size of Facebook, but it had 4,000 crawl errors on 14,000 indexed pages, with a heavy internal link structure and little verticalization towards its objectives. Consequently, the recrawl frequency was high (cache refreshed within one or two days) on shallow content such as category pages and very low (cache refreshed every two months or so) on product pages, and this compromised the entire organic visibility of the website.
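For a quick first check on the site's own responses (the Google cache itself was inspected by hand, as described above), a small script along the following lines shows whether each URL answers consistently on HTTP and HTTPS, e.g. with a clean 301 to the HTTPS version rather than a 200 or a 404 on both. This is a minimal sketch; the domain and paths are hypothetical placeholders:

import requests

# Hypothetical URLs; in a real audit these would come from the sitemap or a crawl export.
paths = [
    "www.example-shop.com/category/running-shoes/",
    "www.example-shop.com/product/trail-shoe-x1/",
]

for path in paths:
    for scheme in ("http", "https"):
        url = f"{scheme}://{path}"
        try:
            # allow_redirects=False so the raw status of each variant is visible (200/301/404...)
            response = requests.get(url, allow_redirects=False, timeout=10)
            location = response.headers.get("Location", "-")
            print(f"{url} -> {response.status_code} (redirect: {location})")
        except requests.RequestException as exc:
            print(f"{url} -> request failed: {exc}")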