
Search Features from a Search Engine's Point of View


One question that comes up in the field of automated search is how well search engines perform when they rely on conventional search methods. Conventional information storage and cataloging, such as in a library, is simply not suited to searching the web.

What does a library catalog have that search engines don't? Let's look at the characteristics of the search conducted by search engines, and then come back to the library catalog.

Web searches fall into three main categories. Before looking at them, it helps to see how traditional information retrieval differs from the problems a search faces on the web. Although algorithms from traditional retrieval have been adapted for web search systems, the web calls for a clearer structure and a clear distinction between these related kinds of systems.

For example, small, carefully controlled private collections of texts, such as scientific papers or news articles, are easier to match against selection criteria. The Text REtrieval Conference (TREC) takes 100 GB as the starting point for a large collection (of letters, texts, and so on).

(Search engines already store tens of terabytes of information. To give an idea of the scale: one large encyclopedia takes up about 1 GB, and a public library of more than 300,000 books amounts to roughly 3 TB.)
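To make these sizes easier to compare, here is a small back-of-the-envelope sketch. It uses only the figures quoted above and assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes); the function name is made up for illustration.

```python
# Decimal units are an assumption; the article does not say which it uses.
GB = 10**9
TB = 10**12


def average_book_size_mb(library_bytes: int, book_count: int) -> float:
    """Average size of one book in megabytes (1 MB = 10**6 bytes)."""
    return library_bytes / book_count / 10**6


library_bytes = 3 * TB          # public library of about 300,000 books
encyclopedia_bytes = 1 * GB     # one large encyclopedia
trec_large_bytes = 100 * GB     # TREC starting point for a large collection

print(average_book_size_mb(library_bytes, 300_000))  # MB per book
print(library_bytes // encyclopedia_bytes)           # encyclopedias per library
print(library_bytes // trec_large_bytes)             # TREC collections per library
```

With these figures, a book averages about 10 MB, the library holds as much as 3,000 encyclopedias, and it is 30 times the size of a 100 GB TREC collection.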

The web, as you know, is a huge collection of heterogeneous pages that anyone can create and develop without being subject to any control. This lack of governance and standardization has led to an 'explosion' in the amount of information available, but it also makes it much harder for search engines to find the information you need.

The main question here is how useful the results of a search really are. The answer depends on the kind of search, and there are three:

□ Informational. Such a search is carried out by users who really are looking for information on the web. They phrase the request much as they would in speech, for example "low hemoglobin". This is a medical term, and the user is looking for specific information about it. This is very close to classical information retrieval.

□ Navigational. This is used when the user wants to go to a particular website. The request is formulated as, for example, "RosBusinessConsulting". In this case, what the user really wants is the website of the company RosBusinessConsulting itself. (If someone typed "RIA Novosti" into a search engine, they would probably like to get to the RIA Novosti website, not find a story about RIA.) We all make a large number of such requests, and they in fact make up about 20% of all requests.

□ Business. This means that, ultimately, the user wants to do something through the network. A good example is visiting an online shop. Do you really want to buy something? Or do you need to download a file, or find a service such as the Yellow Pages? What you really want to do involves you in a transaction. In the case of stores, people want to buy a product, so they want an answer to their request that satisfies that need.

Thus, when we talk about the accuracy and relevance of the response to a query, it is important to distinguish between these three classes. For example, the best answer to a typical business query will be different for a resident of St. Petersburg than for someone living in Moscow.
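The three classes above can be sketched as a toy classifier. This is only an illustration: real search engines do not work from two hand-written word lists, and `KNOWN_SITES`, `BUSINESS_WORDS`, and `classify_query` are all hypothetical names invented for the example.

```python
from enum import Enum


class QueryType(Enum):
    INFORMATIONAL = "informational"
    NAVIGATIONAL = "navigational"
    BUSINESS = "business"


# Hypothetical lookup tables, invented for this example only.
KNOWN_SITES = {"rosbusinessconsulting", "ria novosti"}
BUSINESS_WORDS = {"buy", "download", "order", "price", "shop"}


def classify_query(query: str) -> QueryType:
    """Guess the class of a query with two very crude rules."""
    normalized = " ".join(query.lower().split())
    # Rule 1: the query is exactly the name of a known site.
    if normalized in KNOWN_SITES:
        return QueryType.NAVIGATIONAL
    # Rule 2: the query contains a word that signals a transaction.
    if BUSINESS_WORDS & set(normalized.split()):
        return QueryType.BUSINESS
    # Everything else is treated as a request for information.
    return QueryType.INFORMATIONAL


for q in ("low hemoglobin", "RIA Novosti", "buy laptop"):
    print(q, "->", classify_query(q).value)
```

Even this crude version shows why the distinction matters: the same engine has to answer "low hemoglobin" with documents, "RIA Novosti" with a single site, and "buy laptop" with places where a purchase can be made.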

For some business queries, it is difficult to decide which of the results are best, and context plays a big role. Sometimes a better result comes from other sources rather than from what we collect ourselves. It's like shopping: to buy something, you go to the mall, not to the library.

Library Cataloging

Let's go back to the library. It is quite difficult for a search engine to grasp the nature of a user's request. The machine can find a site that matches a keyword, a topic, or a link in a particular text, and can even pick out a quotation, but it cannot intuitively understand the purpose of the request. If you go to the library of a small town and ask the librarian to help you find some piece of information, the librarian will most likely understand why you need it and will suggest a specific place to look.


As already mentioned, much of what search engines are trying to achieve is based on conventional information retrieval. Suppose I go to the librarian and ask whether the library has a certain popular book. If the librarian knows that the book exists, he will either find it or order it. When the book arrives, the librarian will make an entry for it in the library catalog.

The entry will include the title of the book, the name of the author, some keywords describing the content, an identification number (ISBN), a subject heading that categorizes the book, and an index number for finding the book later. The book is then placed on the library shelf in alphabetical order, and its index card is filed in the catalog.

A good library catalog lets you search not only by title but also by author or category. After receiving many requests about a particular book or topic, a librarian can sometimes say intuitively where a given book is located, or at least point to the right section.
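A catalog like the one just described can be sketched as a record type plus one lookup table per searchable field. This is a minimal illustration, not how any real library system is built; the class names, field names, and the sample book (including its placeholder ISBN) are invented.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    title: str
    author: str
    keywords: tuple[str, ...]
    isbn: str          # identification number
    subject: str       # heading that categorizes the book
    shelf_index: str   # index number used to find the book on the shelf


class Catalog:
    """Keeps one lookup table per field a reader may search by."""

    def __init__(self) -> None:
        self._by_title: dict[str, list[CatalogEntry]] = defaultdict(list)
        self._by_author: dict[str, list[CatalogEntry]] = defaultdict(list)
        self._by_subject: dict[str, list[CatalogEntry]] = defaultdict(list)

    def add(self, entry: CatalogEntry) -> None:
        # File the same entry under each searchable field.
        self._by_title[entry.title.lower()].append(entry)
        self._by_author[entry.author.lower()].append(entry)
        self._by_subject[entry.subject.lower()].append(entry)

    def by_title(self, title: str) -> list[CatalogEntry]:
        return list(self._by_title.get(title.lower(), []))

    def by_author(self, author: str) -> list[CatalogEntry]:
        return list(self._by_author.get(author.lower(), []))

    def by_subject(self, subject: str) -> list[CatalogEntry]:
        return list(self._by_subject.get(subject.lower(), []))


catalog = Catalog()
catalog.add(CatalogEntry(
    title="A History of Italy",        # made-up book
    author="A. Author",                # made-up author
    keywords=("italy", "history", "culture"),
    isbn="000-0-00-000000-0",          # placeholder, not a real ISBN
    subject="History",
    shelf_index="H-001",
))
print([e.shelf_index for e in catalog.by_subject("history")])
```

Filing each entry under several keys is what lets the reader start from whichever detail they happen to remember, which is the property the library catalog has and a bare keyword match lacks.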

By recording when books are checked out and returned, the librarian can also notice how long a given book was out and how many readers borrowed it; in other words, which books are the most popular and which are just gathering dust on the shelves. All these observations help improve the library's system: 'outdated' books that are no longer popular can be moved to more distant shelves, making room for more popular literature.
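The librarian's bookkeeping can be sketched as counting loans and ranking titles by that count. This is a simplified illustration under stated assumptions: the loan log is just a list of borrowed titles, popularity is the raw number of loans, and `split_shelves` is a made-up helper.

```python
from collections import Counter


def split_shelves(all_titles, loan_log, front_shelf_size):
    """Return (front_shelf, back_shelf), most-borrowed titles first.

    loan_log holds one title per loan; titles never borrowed count as 0.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(loan_log)
    ranked = sorted(all_titles, key=lambda title: (-counts[title], title))
    return ranked[:front_shelf_size], ranked[front_shelf_size:]


# Made-up titles and loan log for illustration.
titles = ["Guide to Italy", "Old Almanac", "History of Italy"]
loans = ["History of Italy", "Guide to Italy", "History of Italy"]

front, back = split_shelves(titles, loans, front_shelf_size=2)
print(front)  # ['History of Italy', 'Guide to Italy']
print(back)   # ['Old Almanac']
```

A real system would also weigh how recent the loans are, but the idea is the same: usage data decides what is kept within easy reach.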

If you think about the last few paragraphs, you will realize that, in a rather roundabout way, they describe the principle on which search engines work. At first glance this seems to be a simple process, but in reality the problems faced by the largest developers in this area as they try to imitate it are extensive and serious.

Take the example of a schoolboy who comes to the library and asks for a book about Italy. The librarian, with little information to go on, can assume that the student has to write an essay and therefore needs books about the history and culture of Italy.


If a teenager comes to the library in the summer and asks for books about Italy, the librarian may think he is going to Italy on vacation and needs travel guides. The librarian therefore points him to books of that kind.

In other words, the librarian can answer you more precisely by narrowing down your question, which eventually leads you to the topics most appropriate for you.

This shows that databases sorted by category have fewer such problems, which is why some people call the editors of Internet catalogs "Internet librarians". Search engines are not always able to pinpoint the scope of your request, but they can at least try to display the pages they consider specific to and associated with your query, using "network topology".

In addition, regardless of the subject of the query, most search engines will display several thousand or even several million results, of which only some are really relevant. So how quickly does the relevance of the results fall off? The answer is that after the first two pages of search results it starts to drop sharply.

This content is accurate and true to the best of the author’s knowledge and is not meant to substitute for formal and individualized advice from a qualified professional.

© 2022 Temoor Dar
