How do search engines work?
A search engine is a service that allows Internet users to search for content via the World Wide Web (WWW). A user enters keywords or key phrases into a search engine and receives a list of Web content results in the form of websites, images, videos or other online data. The list of content returned by a search engine is known as a search engine results page (SERP). Some search engines also mine data available in databases or open directories. Search engines keep their information current by continually running crawling algorithms against the Web. Internet content that cannot be reached by a web search engine is generally described as the deep web.
Google is the world's most popular search engine, with a market share of 90.14 percent as of February 2018. But Google is not the only search engine available on the Internet today. Besides Google and Bing, there are other search engines that may not be as well known but still serve millions of search queries per day.
Other widely used search engines include Yahoo, Ask.com, AOL.com, Baidu, WolframAlpha, DuckDuckGo, archive.org and Yandex.ru.
How do search engines work?
Every search engine has three main functions:
Crawling to discover content
Crawling is where it all begins - the acquisition of data about a website. This involves scanning sites and collecting details about each page - titles, images, keywords, other linked pages, etc. Different crawlers may also look for different details, like page layouts, where advertisements are placed, whether links are crammed in, etc. When a web crawler visits a page, it collects every link on the page and adds them to its list of next pages to visit. It goes to the next page in its list, collects the links on that page, and repeats. Web crawlers also revisit past pages once in a while to see if any changes happened.
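The crawl loop described above can be sketched as a breadth-first traversal: visit a page, queue every link you have not yet seen, and repeat. The sketch below uses a hypothetical in-memory link graph (the `PAGES` dictionary) in place of real HTTP fetches, so it stays self-contained.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: URL -> links found on that page.
PAGES = {
    "a.com": ["a.com/about", "b.com"],
    "a.com/about": ["a.com"],
    "b.com": ["c.com"],
    "c.com": [],
}

def crawl(seed):
    """Breadth-first crawl: visit a page, add its unseen links to the frontier, repeat."""
    frontier = deque([seed])  # the crawler's list of next pages to visit
    visited = []
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue  # already collected this page on an earlier hop
        visited.append(url)
        for link in PAGES.get(url, []):
            if link not in visited:
                frontier.append(link)
    return visited

# crawl("a.com") visits every page reachable from the seed, level by level.
```

A real crawler would fetch each URL over HTTP, respect robots.txt, and periodically re-queue known pages to detect changes, but the frontier-and-visited-set structure is the same.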
Indexing to track and store content
Indexing is when the data from a crawl is processed and placed in a database. Imagine making a list of all the books you own, their publishers, their authors, their genres, their page counts, etc. Crawling is when you comb through each book while indexing is when you log them to your list.
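Continuing the analogy, the "list" a search engine logs pages into is typically an inverted index: a map from each word to the set of pages that contain it. The URLs and page text below are hypothetical placeholders for crawled data.

```python
# Toy corpus: page text keyed by URL, as a crawl stage might produce (hypothetical data).
DOCS = {
    "a.com": "search engines crawl the web",
    "b.com": "engines rank web pages",
}

def build_index(docs):
    """Inverted index: each lowercase word maps to the set of pages containing it."""
    index = {}
    for url, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

# build_index(DOCS)["engines"] is the set of pages mentioning "engines".
```

Looking a word up in this structure is a single dictionary access, which is why indexing up front makes later searches fast.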
Retrieval to fetch relevant content
Retrieval is when the search engine processes your search query and returns the most relevant pages that match your query. Most search engines differentiate themselves through their retrieval methods - they use different criteria to pick and choose which pages fit best with what you want to find.
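One of the simplest retrieval criteria is to score each page by how many of the query's terms it contains. The sketch below assumes a small hand-written inverted index (`INDEX`); real engines combine hundreds of signals, but the lookup-and-rank shape is similar.

```python
# Minimal inverted index (word -> pages), hypothetical output of an indexing stage.
INDEX = {
    "web": {"a.com", "b.com"},
    "crawl": {"a.com"},
    "rank": {"b.com"},
}

def search(index, query):
    """Score each page by the number of query terms it contains, best match first."""
    scores = {}
    for term in query.lower().split():
        for url in index.get(term, set()):
            scores[url] = scores.get(url, 0) + 1
    # Sort pages by score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

# search(INDEX, "web rank") ranks b.com (2 matching terms) above a.com (1 term).
```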
Most search engines build their index based on crawling, the process through which engines find new pages to index.
Mechanisms known as bots or spiders crawl the Web looking for new pages.
The bots typically start with a list of website URLs determined from previous crawls.
When they detect new links on these pages, through attributes like href and src, they add these to the list of sites to index.
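Extracting those href and src values from a page's HTML can be done with a standard HTML parser. This is a minimal sketch using Python's built-in html.parser module; the sample HTML snippet is made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects URLs from href and src attributes, as a crawler's parser might."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

html = '<a href="/about">About</a><img src="/logo.png">'
collector = LinkCollector()
collector.feed(html)
# collector.links now holds ["/about", "/logo.png"]
```

Each collected link would then be resolved against the page's base URL and appended to the crawl frontier.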
Then, the search engine applies its ranking algorithm to its index and returns a list of Web results ordered by how relevant each page should be to the search terms you used.
Other elements like personalized and universal results may also change your page ranking.
In personalized results, the search engine utilizes additional information it knows about the user to return results that are directly catered to their interests.
Universal search results combine video, images, news and other content types into a single results page, which can mean greater competition from other websites for the same keywords.