What are search engines and how do they work?
I would like to explain in detail how search engines work and to distinguish the different kinds of work they do. This matters because once you understand what search engines are and how they work, it becomes easier to interpret the results that appear on the page after you search for a specific keyword.
This will help you think through, and then optimize, the way web pages are built so that they get added to a search engine's database, and help you better understand why a search engine is needed.
5 Components of a Search Engine
Search engines consist of five separate software components.
□ Search Engine Results Page (SERP) - retrieves the search results from the database and presents them to the user.
□ Spider - a program that downloads web pages. It works much like your browser when you connect to a website and load a page. You can see the same result (the downloaded code) when you use your browser's View Source command to display a page's HTML code.
□ Crawler (a "traveling" spider) - a program that automatically follows all the links found on a page. The crawler extracts every link on the page. Its task is to decide where the spider should go next, based either on those links or on a predetermined list of addresses. By following the links it finds, the crawler discovers new documents not yet known to the search engine.
□ Indexer - breaks the page into parts and analyzes them. Elements such as page titles, link text, body text, structural elements, bold and italic text, and other styled parts of the page are extracted and analyzed.
□ Database - a repository of all the data that the search engine downloads and analyzes. It often requires huge resources.
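The spider and crawler roles in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real search engine is built: the in-memory `WEB` dictionary stands in for downloading pages over HTTP, and the names `WEB`, `spider_fetch`, `LinkExtractor` and `crawl` are hypothetical. The spider "downloads" one page, and the crawler extracts its links and decides where the spider goes next, skipping addresses it has already seen.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (URL -> HTML). A real spider downloads pages
# over HTTP; a dictionary keeps this sketch offline and deterministic.
WEB = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<p>Page A</p> <a href="/b">B</a>',
    "https://example.com/b": '<p>Page B</p> <a href="https://example.com/">Home</a>',
}


class LinkExtractor(HTMLParser):
    """Crawler helper: collects the absolute URL of every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "/a" against the page URL.
                self.links.append(urljoin(self.base_url, value))


def spider_fetch(url):
    """Spider: 'download' one page; returns None if it cannot be reached."""
    return WEB.get(url)


def crawl(seed_urls, max_pages=100):
    """Crawler: follow links breadth-first, visiting each URL at most once."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    downloaded = {}  # pages handed on to the indexer, in discovery order
    while frontier and len(downloaded) < max_pages:
        url = frontier.popleft()
        html = spider_fetch(url)
        if html is None:
            continue  # unreachable page: nothing to index
        downloaded[url] = html
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            if link not in seen:  # a document not yet known to the engine
                seen.add(link)
                frontier.append(link)
    return downloaded


print(list(crawl(["https://example.com/"])))
# prints ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

Starting from a single seed address, the crawler reaches every page that is linked from somewhere, which is also why pages that nothing links to can stay invisible to it.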
It is a mistake to assume that a search engine goes out to look for sites the moment you type a keyword into the query box. The idea that search engines crawl the entire Internet in response to each query is a myth. A search engine can search only within its own database (index).
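To make "searching only within its own index" concrete, here is a minimal sketch of an inverted index, a common data structure for this job (assumed here, not taken from any particular engine). The indexer maps each word to the set of pages containing it, and a query is answered purely from that map, without fetching anything. The function names, the sample pages and the "every query word must match" rule are hypothetical simplifications; real engines also rank the matches.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexer: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Answer a query from the index alone: URLs containing every query word."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is untouched
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages captured at indexing time.
pages = {
    "https://example.com/a": "Search engines build an index",
    "https://example.com/b": "A library catalog is an index of books",
}
index = build_index(pages)
print(sorted(search(index, "index")))         # both pages
print(sorted(search(index, "search index")))  # only https://example.com/a
```

Because the lookup touches only `index`, a page that was never crawled can never be returned, however relevant it is.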
The amount and freshness of the data therefore depend on how often this database is updated. Such a refresh of a particular search engine's database, in which new information is added to it, is often simply called an update. Major search engines index information much like a library catalogs its books.
Because they store so much information, search engines must be able to find the relevant documents for given keywords or phrases quickly. The Internet as a whole has no clear structure, and sites offer far more ways of authoring content than standard texts do. This makes it almost impossible for a search engine to rely on the standard methods used in database management and general information retrieval.
Each search service's search algorithms (the mathematical methods that sort the results it finds) are unique. You can check this yourself: enter a keyword or phrase into the Bing search engine (www.bing.com) and note the results.
Now go to Google (www.google.com) and repeat the same search. You will almost always get different results from different search engines. Given this, each search service calls for an individual approach.
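A toy example shows how two engines can hold exactly the same pages and still order them differently. Both scoring rules below are hypothetical and far simpler than any real ranking algorithm: "engine 1" counts how often the query words occur, while "engine 2" rewards pages that contain more of the distinct query words.

```python
def term_count_score(doc_words, query_words):
    """Hypothetical engine 1: total occurrences of the query words."""
    return sum(doc_words.count(w) for w in query_words)


def coverage_score(doc_words, query_words):
    """Hypothetical engine 2: fraction of distinct query words present."""
    distinct = set(query_words)
    return len(distinct & set(doc_words)) / len(distinct)


def rank(docs, query, score):
    """Sort document names by descending score, breaking ties by name."""
    query_words = query.lower().split()
    return sorted(docs, key=lambda name: (-score(docs[name].lower().split(), query_words), name))


docs = {
    "A": "cheap cheap cheap flights",
    "B": "cheap flights to rome",
}
query = "cheap flights rome"
print(rank(docs, query, term_count_score))  # ['A', 'B']: A scores 4, B scores 3
print(rank(docs, query, coverage_score))    # ['B', 'A']: B covers 3/3 words, A 2/3
```

Neither ordering is "wrong"; each follows from a different idea of relevance, which is the point of comparing the two engines by hand.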
It is worth noting that some search engines use the database and algorithms of more authoritative counterparts. Google is the most popular search engine in the world and has the largest database. The Web is growing at an unstoppable pace: research conducted in 2000 found that approximately 7.5 million pages were being added each day. So it is hard to imagine any search engine ever catching up with the Web as a whole.
On average, each page contains about 5-10 KB of text. So even if a search engine stored information only about text pages, this would already amount to tens of terabytes in its database. There is also a so-called invisible web, which represents more than 550 billion documents.
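The two figures above (about 7.5 million new pages a day, 5-10 KB of text per page) are enough for a back-of-the-envelope check of the "tens of terabytes" claim. The sketch assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and counts only the pages added in one year, not the whole existing Web.

```python
PAGES_PER_DAY = 7_500_000  # figure from the 2000 study cited above
KB = 1_000                 # assumption: decimal kilobytes
TB = 1_000 ** 4            # assumption: decimal terabytes


def yearly_text_volume_tb(pages_per_day, kb_per_page, days=365):
    """Text volume, in TB, of one year's new pages at an average page size."""
    return pages_per_day * days * kb_per_page * KB / TB


low = yearly_text_volume_tb(PAGES_PER_DAY, 5)
high = yearly_text_volume_tb(PAGES_PER_DAY, 10)
print(f"{PAGES_PER_DAY * 365:,} new pages/year -> {low:.1f} to {high:.1f} TB of text")
# prints "2,737,500,000 new pages/year -> 13.7 to 27.4 TB of text"
```

Even a single year of new pages, stored as plain text only, already lands in the tens-of-terabytes range.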
Search engines either do not know about these documents (nothing on other sites links to them), cannot access them (some areas are password-protected), or their technology simply does not allow them to "capture" these pages (for example, pages consisting only of complex file types: images, audio, animation, compressed files, password-protected files, etc.). Constantly crawling the Web and indexing so many documents, as search engines do, is not a cheap task.
You will see this for yourself when we study the "anatomy" of a search engine. Maintaining a search engine's database takes heavy investment to keep it running, to provide the necessary technical resources, and to continue scientific research. We must understand that search engine databases are constantly changing. Google may have more pages indexed than any other search engine.
However, if, for example, Bing updates its data faster than Google, then even with a relatively small number of pages it can give the user fresher and more comprehensive results. Apart from this technical factor, many others must also be taken into account.
It should be mentioned that search engines often advertise the large number of pages in their databases as a sign of their exclusivity. This is a kind of game, or competition, between the quantity and the quality of the available resources.
While size is an important indicator, other factors related to data quality may produce results that match the user's query much better. Finding relevant pages on the Web to index is a search engine's priority. But how can a machine determine how important one page or another is? Some search engines use manual relevance checks carried out by people known by the technical term "assessors".
Assessors work according to a given methodology, with defined criteria for measuring page quality. An assessor enters a search query and grades how relevant the returned data is. The sites that appear in the search results are judged against these criteria. I will talk more later about the methods search engines use to determine what makes some web pages more important than others.
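One way to turn assessor grades into a single number is precision at k: the share of the top k results that assessors judged relevant. It is a standard information-retrieval measure, but whether any given search engine uses it is an assumption here. The 0-2 grading scale, the "grade of 1 or more counts as relevant" rule, and the sample data are hypothetical; unjudged results are treated as not relevant.

```python
def precision_at_k(ranked_urls, judgments, k):
    """Share of the top-k results that assessors judged relevant.

    judgments maps URL -> grade on a hypothetical 0-2 scale; a grade of 1
    or more counts as relevant, and an unjudged URL counts as not relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = sum(1 for url in ranked_urls[:k] if judgments.get(url, 0) >= 1)
    return relevant / k


# Hypothetical result list for one query and the assessor's grades.
results = ["u1", "u2", "u3", "u4"]
grades = {"u1": 2, "u2": 0, "u3": 1}  # u4 was not judged

print(precision_at_k(results, grades, 3))  # 2 of the top 3 are relevant
print(precision_at_k(results, grades, 4))  # 0.5
```

Comparing this number for two candidate rankings of the same query gives a simple, repeatable way to say which ranking the assessors' methodology prefers.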
Since search engines often return inappropriate results, we should also look more closely at the fact that the information in their databases must be constantly updated. Besides the new pages that appear on the Web every day, existing ones are continuously updated. Consider an example: one study that followed half a million pages over four months found that more than 23% of all web pages were updated daily. About half of the pages were updated every 10 days, and some documents moved to an entirely new domain.
Search engine spiders find millions of pages a day, which are added to the database and indexed. But, as the above shows, it is very difficult for search engines to determine how often a page changes. A search robot can crawl a page once, return later to refresh it, and find that some changes have been made. But it cannot find out how many times the page has changed since its last visit.
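The limitation described above is easy to see with content fingerprints. A robot can store a hash of the page it downloaded and compare it with a hash of the page on its next visit. This sketch uses SHA-256 from Python's standard `hashlib`; the helper names and the sample page versions are hypothetical. The comparison answers only "is it different now?", never "how many times did it change?".

```python
import hashlib


def fingerprint(content):
    """SHA-256 hash of the page content as it was at download time."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def changed_since_last_visit(stored_fingerprint, current_content):
    """True if the page now differs from the copy stored at the last visit.

    One comparison cannot reveal how many edits happened in between,
    or whether the page was changed and then changed back.
    """
    return fingerprint(current_content) != stored_fingerprint


stored = fingerprint("Price: $10")  # saved on the first visit

# Between two visits the page is edited twice; only the last version is seen.
edits_between_visits = ["Price: $12", "Price: $15"]
print(changed_since_last_visit(stored, edits_between_visits[-1]))  # True

# Edited and then reverted before the next visit: the change is invisible.
print(changed_since_last_visit(stored, "Price: $10"))  # False
```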
Some websites change very frequently: for example, news sites, where information must be constantly updated, or online stores, where prices, product ranges, and so on change regularly.
Today a great deal of both scientific and commercial research is being carried out to develop methods for discovering fresh information quickly. Even if an "important" page is checked by the robot every 48 hours, webmasters may update it much more often.
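One simple way a robot could adapt its visits to how often a page changes is to shorten the revisit interval when it finds a change and lengthen it when it does not. The sketch below uses a hypothetical halve/double rule with made-up bounds, starting from the 48-hour interval mentioned above; real crawlers use more elaborate scheduling, and this is not any particular engine's policy.

```python
MIN_HOURS = 6.0    # hypothetical floor: never revisit more often than this
MAX_HOURS = 720.0  # hypothetical ceiling: about 30 days


def next_interval(current_hours, page_changed):
    """Halve the revisit interval after a change, double it otherwise."""
    proposed = current_hours / 2 if page_changed else current_hours * 2
    return max(MIN_HOURS, min(MAX_HOURS, proposed))


def simulate(start_hours, observations):
    """Return the interval chosen after each visit's observation."""
    intervals = []
    current = start_hours
    for changed in observations:
        current = next_interval(current, changed)
        intervals.append(current)
    return intervals


# A news page that changes on every visit, then goes quiet once.
print(simulate(48.0, [True, True, True, True, False]))
# prints [24.0, 12.0, 6.0, 6.0, 12.0]
```

The floor matters: however fast a page changes, the robot still sees only snapshots spaced at least `MIN_HOURS` apart, so edits in between are missed.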
It is critical that your site is up when a search engine robot visits it. If it is down at that moment, you may disappear from the index until the next update! After several failed visits, the search engine concludes that your site does not exist and removes it from its list.
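The "removed after several failed visits" behavior can be modeled as a counter of consecutive failures per indexed URL. The threshold of three and the helper name `record_visit` are hypothetical; the text does not say how many failures a real engine tolerates.

```python
MAX_CONSECUTIVE_FAILURES = 3  # hypothetical threshold


def record_visit(failures, url, site_responded):
    """Update the index after one robot visit.

    failures maps each indexed URL to its count of consecutive failed visits.
    Returns True if the URL is still in the index afterwards.
    """
    if url not in failures:
        return False
    if site_responded:
        failures[url] = 0  # one successful visit resets the count
        return True
    failures[url] += 1
    if failures[url] >= MAX_CONSECUTIVE_FAILURES:
        del failures[url]  # treated as no longer existing
        return False
    return True


index = {"https://example.com/": 0}
visits = [False, False, True, False, False, False]
print([record_visit(index, "https://example.com/", ok) for ok in visits])
# prints [True, True, True, True, True, False]
```

In this model one successful visit resets the count, which is why a site that is only occasionally down survives while a site that stays down drops out.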
If a webmaster uploads a page to the server and then submits it to the search engine via the Add URL option, or if the search engine simply finds the page through a link from another site, then the page's content in the index will be exactly what it was when the search engine first indexed it.
So if, on the day of indexing, the page has a certain number of words spread over a certain number of paragraphs and relates to the keyword to a certain degree, all of this is recorded in the search engine's index until the next indexing.
If the page's author later decides to change it (add images or headings, edit the text), the search engine will not know about it until its next visit to the page.
If a user searches on a certain topic on the day the search engine has just updated this page, they will receive the updated information already in the search engine's database.
However, if the user searches after the author has changed the page, the search engine will still lead them to the same page for that key phrase, even if the author has changed the context or removed important links on the topic without notifying the search engines.
Of course, this frustrates users who wanted to find web pages relevant to their queries. This, as you can see, is the main problem of search engines: they simply cannot keep up with every change to web pages.
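The situation above can be reproduced with a tiny index that keeps a snapshot of each page's words from the last indexing. The class name, URL and page texts are hypothetical. Until the page is re-indexed, queries are answered from the old snapshot, so the rewritten page is still returned for words it no longer contains.

```python
import re


def words_of(text):
    """Set of lower-case alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class SnapshotIndex:
    """Keeps each page's words as they were at the last indexing."""

    def __init__(self):
        self.words_by_url = {}

    def index_page(self, url, text):
        self.words_by_url[url] = words_of(text)  # replaces any older snapshot

    def search(self, word):
        word = word.lower()
        return sorted(url for url, words in self.words_by_url.items() if word in words)


URL = "https://example.com/guide"
live_web = {URL: "A guide to search engine spiders"}

index = SnapshotIndex()
index.index_page(URL, live_web[URL])    # first visit by the robot

live_web[URL] = "A recipe for apple pie"  # the author rewrites the page

stale = index.search("spiders")         # still [URL]: the index is out of date
index.index_page(URL, live_web[URL])    # next visit refreshes the snapshot
fresh = index.search("spiders")         # now []
print(stale, fresh, index.search("apple"))
```

Between the rewrite and the next visit, a user searching for "spiders" lands on a page about apple pie, which is exactly the mismatch described above.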
© 2022 Temoor Dar