
How a Search Engine Works


A search engine works in three stages: crawling, indexing and retrieval.

1) Crawling

Search engines use programs called web crawlers, or spiders, to perform crawling. A crawler's task is to visit a web page, read it and follow its links to other pages of the site. Each time the crawler visits a web page, it makes a copy of the page and adds its URL to the index. After adding the URL, it revisits the site regularly, for example every month or two, to look for updates or changes.
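The crawling loop described above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is implemented: `FAKE_WEB`, `LinkExtractor` and `crawl` are hypothetical names, and a small in-memory dictionary stands in for real HTTP requests so that the example runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (URL -> HTML) standing in for real HTTP fetches.
FAKE_WEB = {
    "http://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/blog": '<a href="/about">About</a> <a href="/missing">Gone</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch=FAKE_WEB.get):
    """Breadth-first crawl starting at seed; returns {url: copy of the page}."""
    copies = {}
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # the page could not be fetched, so skip it
            continue
        copies[url] = html  # keep a copy of the page under its URL
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:  # visit each URL only once
                seen.add(target)
                queue.append(target)
    return copies


pages = crawl("http://example.com/")
print(sorted(pages))
```

A real crawler would also respect robots.txt, limit its request rate and schedule revisits; those parts are left out here to keep the sketch short.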

2) Indexing




In this stage, the search engine builds its index from the pages the crawler has collected. The index is like a huge book that contains a copy of each web page found by the crawler. If a web page changes, the book is updated with the new content.
The index therefore comprises the URLs of the web pages visited by the crawler, together with the information collected from them.
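One common way to make such a collection of page copies searchable is an inverted index, which maps each word to the URLs that contain it. The article does not say which data structure search engines use, so treat the following as a minimal sketch under that assumption; the `Index` class and its method names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class Index:
    """Toy index: keeps a copy of each page plus a word -> URLs mapping."""

    def __init__(self):
        self.copies = {}                  # url -> latest copy of the page text
        self.postings = defaultdict(set)  # word -> set of URLs containing it

    def add_or_update(self, url, text):
        # Drop the entries that came from the old copy, if there was one.
        for word in tokenize(self.copies.get(url, "")):
            self.postings[word].discard(url)
            if not self.postings[word]:
                del self.postings[word]
        # Store the new copy and record its words.
        self.copies[url] = text
        for word in tokenize(text):
            self.postings[word].add(url)

    def lookup(self, word):
        """Return the URLs whose current copy contains the word."""
        return sorted(self.postings.get(word.lower(), set()))


index = Index()
index.add_or_update("http://example.com/a", "Crawlers follow links")
index.add_or_update("http://example.com/b", "Links connect web pages")
before = index.lookup("links")

# The page at /a changes, so its entry in the "book" is rewritten.
index.add_or_update("http://example.com/a", "Crawlers copy pages")
after = index.lookup("links")
```

Calling `add_or_update` again for a URL that is already present mirrors the "book is updated" step: the old words are removed before the new ones are recorded.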

3) Retrieval

This is the final stage, in which the search engine provides the most useful and relevant answers in a particular order. Search engines use algorithms to improve the search results so that genuine information reaches users; PageRank, for example, is a well-known ranking algorithm. The search engine sifts through the pages recorded in the index and shows on the first page of results those web pages that it judges to be the best.
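To give a feel for how a link-based algorithm such as PageRank can order pages, here is a simplified power-iteration sketch. It is a textbook-style approximation, not Google's production algorithm; the toy link graph, the function name `pagerank`, and the damping value of 0.85 (a commonly cited default) are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to. Every link
    target must itself be a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page receives a small base score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: "c" is linked to by three other pages.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(graph)
ordered = sorted(ranks, key=ranks.get, reverse=True)
print(ordered)
```

In this toy graph, page "c" ends up with the highest score because the most pages link to it, while "d", which nothing links to, ends up last. A real search engine combines many more signals than links when ordering results.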

Google Algorithm Updates

In the 1990s, search engines were not as effective as they are today; ranking relied mainly on keyword matching and backlinks. It was therefore quite easy for low-quality websites to rank highly by targeting exact keywords and accumulating lots of backlinks.
To solve this problem, Google introduced algorithm updates to filter the results and clean up the web. Since then, Google has continuously updated its algorithm to maintain and improve the efficiency of its search engine.
Some of the major Google updates that helped it filter sites more precisely and clean up the web effectively are given below:

2016 Updates

Penguin 4.0

Penguin 4.0 was announced on September 23, 2016, with a few changes: it became part of the core algorithm, it updates in real time, and it is page-specific instead of affecting the entire domain.

Mobile Friendly Boost Update

It was launched on May 12, 2016, to boost mobile-friendly sites in mobile search results.

2015 Updates

Panda 4.2

On 17 July 2015, Google rolled out a Panda refresh (Panda 4.2). It had no immediate effect on rankings. According to Google, it impacted 2-3% of English-language search queries.

Mobile-Friendly Update (Mobilegeddon)

It was rolled out on 21 April 2015. It made mobile-friendliness an important ranking factor for mobile searches. Its job was to boost the rankings of mobile-ready pages so that relevant, high-quality content could be provided to mobile users.

2014 Updates

Penguin 3.0

It was introduced on 17 October 2014. It was just a refresh that helped websites de-ranked by the previous update (Penguin 2.1) to boost their rankings.

Panda 4.1

It was the 27th version of Panda, released by Google on 23 September 2014. Google said that it would help the search engine identify poor content so that small and medium-sized websites with quality content could rank better.

Pigeon

It was rolled out in July 2014 for local businesses. Google said that it would create closer ties between its local and core algorithms so that people could find useful and accurate information in local search results.

Panda 4.0

This Panda update was introduced on 19 May 2014 to help small websites and businesses with limited resources. It combined a data refresh with a change to the Panda algorithm itself.

2013 Updates

Hummingbird 1.0

It was introduced by Google on 20 August 2013 to better understand the changing face of the Web. It could understand the intent behind long search terms instead of just recognizing specific keywords. This helped Google interpret long-tail search terms and rank answers to them accurately, so users could ask questions and get appropriate answers.

2012 Updates

Penguin

It was introduced on 24 April 2012 to target sites that were spamming the search results by buying links or using link networks designed specifically to boost rankings. Google issued warnings through Webmaster Tools and penalized sites that did not follow its guidelines.

2011 Updates

Panda/Farmer

It was first launched on 24 February 2011. This algorithm assigned a score to web pages based on the quality of their content and de-ranked sites with low-quality content. Its job was to identify and de-rank content farms, sites offering thin content and sites with a high ad-to-content ratio.

2010 Updates

Caffeine

In June 2010, Google rolled out Caffeine, a new web indexing system. It improved the speed of the search engine and integrated crawling and indexing more tightly, which resulted in a 50 percent fresher index.

2009 Updates

Caffeine (Preview)

In August 2009, Google released a preview of Caffeine, an upcoming infrastructure change intended to improve and integrate crawling and indexing and to expand the range of its search index.

Vince

It was introduced in February 2009. It was seen as a big change that would favor big brands, but Google's Matt Cutts clarified that it was a minor change focused on ranking signals such as trust and authority.

2007 Updates

Buffy

It was introduced in June 2007. This update was named in honor of Google's Vanessa Fox. Matt Cutts said that it consisted of just some minor changes, such as the integration of search results with news, images and videos.

2005 Updates

Bigdaddy

It was rolled out in December 2005. It was an infrastructure change that altered how Google handled technical matters such as URL canonicalization and redirects. It helped Google prepare for future developments.

2004 Updates

Brandy

This update was launched in February 2004. It expanded Google's index and incorporated Latent Semantic Indexing (LSI), which enabled Google to better understand synonyms.

Austin

It was introduced on 23 January 2004. This update refined the Florida update. It targeted on-page spam tactics such as invisible text and meta-tag stuffing.

2003 Updates

Florida

It was introduced on 16 November 2003. It brought a significant change to Google's algorithm and put an end to the use of keyword stuffing to manipulate search engine results.

Fritz

It was introduced in July 2003. With this update, Google changed the way it updated its index: instead of indexing on a monthly basis, it started indexing on a daily basis.




