Leo

Leo的恒河沙

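A business consultant / cross-border e-commerce specialist / investor / tech geek / cycling enthusiast / king of two border collies and a pack of stray kittens / married, active in the Pearl River Delta and the Yangtze River Delta. Subscriptions welcome: I post daily with articles I have selected as worth a close read, spanning business, economics, relationships, technology, and more; in short, everything I find interesting.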

2023-11-05 - A "Key" to Test the Unsolvable Lock: Review of Kagi Search - Minority

A "Key" to Unlocking the Unsolvable Lock: Review of Kagi Search - Minority#


TL;DR

The Good:

  • Clean, organized, anti-content farm, highly customizable search results
  • Acceptable Chinese results, which are rare among grassroots engines
  • Comprehensive yet unobtrusive AI features

The Bad:

  • Handling of "daily life" searches can be lackluster
  • Pricing is too high

The Ugly:

  • The search engine market is not friendly to small players; survival depends not only on strength but also on luck

Once, a parent sought advice from an entrepreneurial master. The parent said, "My child has a passion for searching; how about starting a search engine?"

The master’s eyes widened in shock. After a moment, he regained his composure and said earnestly, "Please, don’t. Just take the directory from Product Hunt, close your eyes, and pick any direction from it; it’s better than a search engine. Let me put it this way: if I were a parent and my child insisted on making a search engine, I would do one thing—knock him out. Then find him another direction."

……

Of course, the above is fictional, but if the teacher being parodied really did switch to entrepreneurship consulting, one could reasonably expect him to pass a similar verdict on the prospects of starting a search engine.

Why? Building a search engine from scratch is not something that can be accomplished on a whim. In the 1998 paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Google's two founders described the components needed to build a search engine: (1) crawlers to traverse the web and gather page content, (2) indexers to parse and classify content, (3) databases to store indexes and archives, and (4) front-end services to respond to user requests; and, of course, (5) the result ranking algorithm, which plays a decisive role in user experience.

Every component of this architecture has only become harder to build since then, and the enormous upfront costs are enough to deter most small players.

image

Google's macro architecture (Source: Brin, Sergey, and Lawrence Page. "The anatomy of a large-scale hypertextual web search engine." Computer networks and ISDN systems 30.1-7 (1998): 107-117.)
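The five components listed above can be sketched in toy form. This is only an illustration under heavy assumptions: the `PAGES` dictionary stands in for a crawler's output, an in-memory dictionary stands in for the database, and plain term frequency stands in for a real ranking algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for crawled pages: URL -> page text.
PAGES = {
    "https://example.com/a": "kagi is a paid search engine",
    "https://example.com/b": "google is a search engine funded by ads",
    "https://example.com/c": "border collies are clever dogs",
}

def build_index(pages):
    """Indexer + database: map each word to the pages containing it."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index

def search(index, query):
    """Front end + ranking: score pages by summed term frequency."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken by URL for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [url for url, _ in ranked]

index = build_index(PAGES)
print(search(index, "paid search engine"))
```

Even this toy version hints at where the real cost lies: each stage has to scale to billions of pages, and the ranking step needs far richer signals than word counts.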

There are also many problems that cannot be solved by simply throwing money at them. Even if one has the computational resources to crawl the entire web, many websites today have already closed their doors to crawlers, allowing only a few major players into their walled gardens. At the same time, the raw data obtained will inevitably be filled with duplicates and irrelevant information, making it difficult to derive practical value before training and tuning with real usage data.

Moreover, users' habits have been influenced and tamed by Google, making them unconsciously regard the results presented by it as "correct" and "convenient," using this as a standard to judge other search engines. Therefore, even if tech users year after year loudly proclaim that "Google is dying," it is still difficult to find resonance among a larger group.

All of the above further reinforces the monopoly in the search engine market. According to Similarweb's statistics from June 2023, Google's share of the global search market remains a staggering 90.68%, while Bing, in second place, only has 3.23%—essentially unchanged from six months ago. In other words, even with Microsoft's strong resources and the boost from GPT, it has not been able to shake people's deeply ingrained habitual choices, let alone other competitors.

image

Source: Similarweb

Of course, over the years, some competitors have emerged, the most well-known of which is probably DuckDuckGo (which I specifically wrote about a few years ago). However, due to the aforementioned difficulties, they generally cannot compete with Google on the hard metric of search quality (Chinese search results are even more dismal), and can only lean ever harder on slogan-like, movement-style value propositions such as respect for privacy, value neutrality, and even environmental protection, which grow tiresome after a while.

Thus, when I first came across Kagi, a new search engine, last year, I approached it with skepticism. What special qualities does this grassroots creation, which calls itself a "key" (the Japanese word "鍵"/ かぎ), have to surpass its mediocre peers and unlock some novelty for the ailing search engine market?

image

However, two somewhat unserious reasons made me decide to give Kagi a try.

First, its pricing is indeed a bit too high. Kagi does not have a free plan; it has charged $10 per month ever since its testing phase. Although I consider myself quite willing to pay for software and subscriptions, in a search engine market where charging is rare, asking for so much upfront made me suspect this might be another cash grab, but it also made me more curious to see what it was really worth.

image

Kagi's super confident pricing plan

Second, there are simply too many positive reviews about it. On Hacker News, news and discussions related to Kagi frequently hit the top spots, each time attracting hundreds of comments, many praising Kagi for making them "excited about a new service for the first time in years," "regaining the joy of using Google in its early days," and "completely convinced."

Friends familiar with Hacker News know that it is a place crowded with tech-savvy users who are picky and sharp-tongued; many self-promoting startup products get "publicly executed" here. The fact that Kagi has escaped criticism and instead gained a group of "new fans" suggests it must have something special.

So, I signed up and used it for over six months.

The conclusion first: Kagi is indeed "better" than Google in many cases. However, this sense of "better" does not come from superior hard metrics like index size or algorithm quality (as mentioned at the beginning, this is unrealistic), but rather from a number of clever choices in indexing methods and result presentation, as well as an accurate understanding of its main audience's preferences and needs.

Next, let's see how Kagi achieves this.

Like most small and medium search engines, Kagi's search results primarily come from several upstream engines. Until the first half of this year, Kagi was using Google and Bing. However, after Microsoft, having tasted the sweetness of AI, raised its API prices tenfold earlier this year, Kagi had to make a change for cost reasons; its current combination includes Google, Mojeek (from the UK), and Yandex (from Russia).

From my usage experience, this replacement did not significantly affect the quality of results. Especially for Chinese searches, drawing on Google's results is far more practical than drawing on Bing's; just look at DuckDuckGo's dismal Chinese results to see that. By choosing Google, Kagi has thus become one of the few grassroots engines with usable Chinese results. Additionally, increasing the diversity of sources is a good thing, as it keeps results from being overly "US-centric."

However, Kagi did not settle for merely patching together others' results; it made many adjustments and optimizations based on this foundation.

Prioritizing "Non-commercial" Content in Independent Indexing#

While building a comprehensive index of the entire web is quite challenging, it is feasible to independently index some small-scale, vertical content. Kagi has focused its main efforts on indexing "non-commercial" content. To this end, it has independently established two indexes, Teclis and TinyGem, both of which have very interesting approaches.

Among them, Teclis (named after a character from the Warhammer game) uses a Python-controlled browser as its crawler. This browser has an ad-blocking plugin, uBlock Origin, but its purpose is not to block ads; rather, it is to detect the cleanliness of a page. Pages with excessive ads and tracking scripts will be directly excluded from the index.
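The idea of using an ad blocker as a cleanliness detector can be sketched as follows. This is not Teclis's actual code: the blocklist, the threshold, and the function names are all assumptions made for illustration, and the list of request URLs is taken as given rather than captured from a real browser.

```python
from urllib.parse import urlparse

# Assumed blocklist of ad/tracker hosts; real filter lists are far larger.
BLOCKED_HOSTS = {"ads.example.net", "tracker.example.org"}
# Assumed threshold, not a value published by Kagi.
MAX_BLOCKED_REQUESTS = 2

def count_blocked(request_urls):
    """Count the requests a blocker would have stopped."""
    return sum(1 for u in request_urls if urlparse(u).hostname in BLOCKED_HOSTS)

def is_clean(request_urls):
    """A page stays in the index only if few requests were blocked."""
    return count_blocked(request_urls) <= MAX_BLOCKED_REQUESTS

clean_page = ["https://blog.example/post", "https://blog.example/style.css"]
noisy_page = clean_page + [
    "https://ads.example.net/a.js",
    "https://ads.example.net/b.js",
    "https://tracker.example.org/t.gif",
]
print(is_clean(clean_page), is_clean(noisy_page))
```

The point of the design is that the blocker's verdict is reused as a quality signal instead of being applied to the page.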

Teclis also pulls results from a relatively new independent engine, Marginalia, which likewise focuses on "niche" content on the web and favors long-form writing; pages whose main content is short, or whose average sentence length is short, are "penalized" in ranking.
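A length-based penalty of this kind might look like the sketch below. The thresholds and the halving factor are invented for illustration and are not Marginalia's real parameters; sentence splitting is deliberately naive.

```python
import re

# Assumed thresholds, chosen only for illustration.
MIN_WORDS = 200
MIN_AVG_SENTENCE_WORDS = 8

def length_penalty(text):
    """Return a multiplier in [0, 1]; lower means ranked further down."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    if not sentences:
        return 0.0
    avg_sentence_words = len(words) / len(sentences)
    penalty = 1.0
    if len(words) < MIN_WORDS:
        penalty *= 0.5  # short main content
    if avg_sentence_words < MIN_AVG_SENTENCE_WORDS:
        penalty *= 0.5  # choppy, slogan-like sentences
    return penalty

print(length_penalty("Buy now. Click here."))
```

A ranking function would multiply a page's relevance score by this value, so that long-form pages are not penalized while thin pages sink.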

TinyGem also has a similar "non-commercial" preference but primarily indexes news content, originating from the founder's other side project, the similarly named bookmark service. When indexing articles, TinyGem analyzes their topics, timeliness, and stance from a semantic level, allowing it to find relevant results even without keyword matching.

In this way, by supplementing results from upstream engines with independent results from Teclis and TinyGem, and giving higher weight to "non-commercial" results in ranking, Kagi strikes a balance: the baseline of search results stays comprehensive, while higher-quality and often overlooked niche content is shown to users first.

Clean and Organized Results Page#

The search results page is a high-density information interface; if not designed properly, it can easily leave users feeling lost. The current trend of increasing ads among mainstream search engines exacerbates this issue.

In contrast, Kagi's inherent advantage is that it has no promotional content at all, so there is nothing for users to tell apart; combined with the previously mentioned indexing mechanism that emphasizes high-quality content, users can trust that every link is there "on merit." This reduction in cognitive load translates directly into more efficient use.

image

image

image

image

Some grouping methods for content in Kagi

In addition to ensuring the cleanliness of results at the source, Kagi has also put considerable effort into page design. Some details I particularly like include: grouping similar results from the same website; collapsing display of "listicle" content with titles in the format of "___'s most ___"; highlighting update dates, matching keywords, and other information that aids in assessing page relevance; displaying summaries for community discussion pages like Reddit and Zhihu; and more.
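Grouping results from the same website, the first detail mentioned above, is simple to sketch. The function name and sample URLs are hypothetical, and this is only one plausible way to do it, not Kagi's implementation.

```python
from urllib.parse import urlparse

def group_by_domain(urls):
    """Group result URLs by host, keeping the first-seen order of hosts."""
    groups = {}
    for url in urls:
        host = urlparse(url).hostname
        groups.setdefault(host, []).append(url)
    return groups

results = [
    "https://a.example/1",
    "https://b.example/x",
    "https://a.example/2",
]
for host, links in group_by_domain(results).items():
    print(host, links)
```

Because Python dictionaries preserve insertion order, each group appears at the position of its best-ranked link.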

Highly Flexible Personalization Settings#

As mentioned at the beginning, an independent search engine must face the initial challenge of lacking sufficient early users and usage data to improve search results, making it seem less intelligent. In response, Kagi has adopted a win-win approach: providing ample space for personalization settings. The wisdom of this approach lies in the fact that for Kagi, by granting more control to users, it can also "subcontract" part of the task of tuning results to them; for users, although the manual configuration required is more than with other engines, the tangible effects make this initial work worthwhile.

Kagi's main personalization features are "Lenses" and "Personalized Ranking."

Lens Feature. A "Lens" is a set of predefined search rules for a specific search scenario, including websites to search (or exclude), keywords to always include (or block), time range for results, regional range, file types, etc. This is similar to Google's still-offered but increasingly hidden "Custom Search," but much easier to set up and use.

For example, Kagi ships with several built-in "Lenses" for forums, programming, world news, academics, PDFs, etc. I also replicated a lens version of my previous "whitelisted news search" (somewhat simplified, as a lens can only include up to ten domains).

image
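Conceptually, a lens is just a bundle of filter rules. The sketch below models a small subset of them (site include/exclude lists and required keywords); the class and field names are hypothetical, and only the ten-domain limit comes from the text above.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

MAX_INCLUDED_DOMAINS = 10  # limit mentioned in the text above

@dataclass
class Lens:
    """Hypothetical model of a lens: a named set of search rules."""
    name: str
    include_domains: set = field(default_factory=set)
    exclude_domains: set = field(default_factory=set)
    required_keywords: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.include_domains) > MAX_INCLUDED_DOMAINS:
            raise ValueError("a lens can include at most ten domains")

    def matches(self, url, text):
        """Return True if a result passes every rule of this lens."""
        host = urlparse(url).hostname or ""
        if self.include_domains and host not in self.include_domains:
            return False
        if host in self.exclude_domains:
            return False
        lowered = text.lower()
        return all(k.lower() in lowered for k in self.required_keywords)

news = Lens("news", include_domains={"news.example"}, required_keywords=["kagi"])
print(news.matches("https://news.example/a", "Kagi raises prices"))
```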

To invoke a lens, you can select it from below the search box when entering keywords, switch to it on the results page, or assign it a shortcut phrase; skilled automation users can also select a lens by appending the l parameter to the search link.
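For the automation case, building such a link takes a few lines. The base URL and the `q` parameter are assumed here, and the lens value `"2"` is a made-up placeholder; the value that selects a given lens on your account has to be looked up in Kagi itself.

```python
from urllib.parse import urlencode

# Assumed search endpoint; only the `l` parameter is named in the text.
KAGI_SEARCH = "https://kagi.com/search"

def lens_search_url(query, lens_id):
    """Build a search link that selects a lens via the `l` parameter."""
    return f"{KAGI_SEARCH}?{urlencode({'q': query, 'l': lens_id})}"

# "2" is a placeholder lens value, not a documented identifier.
print(lens_search_url("minimalist CSS frameworks", "2"))
```

A link like this can be dropped into a launcher or a browser keyword shortcut so that a specific lens is one keystroke away.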

Personalized Ranking. Although Kagi's "non-commercial" preference has already significantly improved the overall quality of search results, the needs and judgments regarding content are ultimately personalized. For example, developers who primarily use a specific programming language may want documentation for that language to rank higher; users who demand reading quality may wish to block some poorly produced sources, even if they are not strictly content farms.

Kagi provides this convenience. For any domain, it allows users to choose one of five ranking rules to apply, in descending order of priority: pin, increase, normal, decrease, block. Personalized ranking has its own independent settings page and can also be set instantly by clicking the "Domain Details" button next to links on the search results page.

image
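The five rules can be read as a re-ordering step applied on top of the upstream ranking. The sketch below treats them as hard tiers, which is a simplifying assumption: the real system may well adjust scores more subtly. Host names and the function name are hypothetical.

```python
# Lower number sorts first; "block" is handled by filtering.
RULE_ORDER = {"pin": 0, "increase": 1, "normal": 2, "decrease": 3}

def apply_rules(results, rules):
    """Re-rank (host, url) pairs according to per-domain rules.

    `results` is in upstream order; `rules` maps host -> rule name.
    Hosts without a rule are treated as "normal".
    """
    kept = [r for r in results if rules.get(r[0], "normal") != "block"]
    # sorted() is stable, so upstream order survives within each tier.
    return sorted(kept, key=lambda r: RULE_ORDER[rules.get(r[0], "normal")])

results = [
    ("tutorials.example", "u1"),
    ("docs.example", "u2"),
    ("farm.example", "u3"),
    ("blog.example", "u4"),
]
rules = {"docs.example": "pin", "tutorials.example": "decrease", "farm.example": "block"}
print(apply_rules(results, rules))
```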

Personalized ranking is undoubtedly one of Kagi's most popular features. The "Domain Leaderboard" built from usage of this feature has become a widely discussed spectacle: Kagi users are effectively voting websites onto an honor roll or a blacklist.

image

image

Websites most pinned and blocked by Kagi users

Among the domains that received ranking boosts, the overwhelming majority are technical discussion communities and technical documentation for various languages, services, and software, followed by Wikipedia, (somewhat liberal) mainstream media, public health institutions, and platforms like Goodreads, Rotten Tomatoes, and Steam. On the other hand, social platforms like Facebook and TikTok, quasi-content farms like Pinterest and Medium, low-quality technical sites like W3Schools and GeeksforGeeks, and unscrupulous media like New York Post and Breitbart have been ruthlessly nailed to the pillar of shame.

In addition to lenses and ranking, Kagi's settings page also offers seemingly endless small features. From the length of webpage summaries, whether to include videos or images in results, to custom CSS, shortcut phrases, and even URL redirection. Any one of these could almost be something dissatisfied Google users previously tried to solve manually through plugins and scripts; it can only be said that the Kagi team has indeed done their homework.

Comprehensive Yet Unobtrusive AI Features#

In a year when large language models are all the rage, online services that don't incorporate some AI elements seem almost embarrassed to show their faces. This frenzy has spawned many forced and awkward so-called AI features.

In contrast, even though Kagi's team worked on AI before turning to search, Kagi remains unusually level-headed about it. On its various pages, you won't find the trendy, flashy generation buttons and chat boxes; if you're not interested in AI, you can ignore these features completely. On the other hand, if you need AI features like natural language Q&A and web content summarization, you'll find them readily available in Kagi, saving you the cost of seeking alternative solutions.

First, let's look at Kagi's Quick Answer feature. It can be manually triggered by clicking the Quick Answer button on the results page, automatically triggered by including !answer in the search keywords, or accessed through a dedicated entry.

The principle of Quick Answer is similar to Bing's chat feature: it inputs the question along with web search results into the model to generate an answer, citing sources in footnote format. Compared to Bing, Kagi's Q&A response speed is much faster (mainly due to not intentionally throttling and lacking pretentious animations), but the functionality is limited to "one question, one answer," with no follow-up questions allowed, and it is not suitable for "creative" tasks.

image
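The "question plus search results in, cited answer out" principle can be sketched as a prompt-building step. Nothing here is Kagi's real prompt: the wording, the function name, and the result format are assumptions, and the model call itself is left out.

```python
def build_prompt(question, results, year):
    """Assemble a prompt from a question and numbered search results.

    `results` is a list of (title, url, snippet) tuples. The numbering
    lets the model cite sources as footnotes like [1].
    """
    lines = [
        f"The current year is {year}. Answer concisely and cite sources as [n].",
        "Sources:",
    ]
    for n, (title, url, snippet) in enumerate(results, start=1):
        lines.append(f"[{n}] {title} ({url}): {snippet}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)

prompt = build_prompt(
    "What is Kagi?",
    [("Kagi", "https://kagi.com", "A paid search engine")],
    2023,
)
print(prompt)
```

Because each answer is generated from a single self-contained prompt like this, a "one question, one answer" design needs no conversation state, which fits the limitation described above.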

Simple prompting tricks reveal that this feature calls the Claude model from Anthropic to generate content. The instructions given to the model include telling it the current year and requiring it to be concise, provide useful information, avoid follow-up questions, and not disclose the model or the instructions.

(Kagi is currently beta testing a more general "Assistant" feature that supports research, programming, dialogue, and custom modes, allowing users to choose from models like OpenAI, Claude, or PaLM 2, depending on their subscription level.)

Another AI feature of Kagi is the "Universal Summarizer." "Universal" means it supports various formats; in addition to webpages, it can summarize audio, video, and documents as well.

The Universal Summarizer can be invoked through the menu in the upper right corner of the results page or by filling in any link at a dedicated entry. It includes two modes: "Summarize," which quickly provides a bullet-point list summary, and "Discuss," which allows users to ask specific questions about the page content in a dialogue interface.

It is worth mentioning that the summarization feature supports Chinese; simply select Chinese from the output language dropdown menu or ask questions in Chinese in the inquiry interface. However, this is essentially just translating the output into Chinese; even if the original page is in Chinese, it goes through a Chinese-to-English and then back to Chinese conversion, which may result in many awkward expressions, so users should be discerning when using it.

image

image

The same prompting tricks reveal that the Universal Summarizer also uses the Claude model. Its instructions include telling the model the current date and requiring it to be accurate, enthusiastic, and concise, to leave links out of its answers, and not to disclose the model or the instructions.

(Kagi currently does not limit the usage of AI features for paid users, but a review of the terms of use reveals that it reserves the right to soft-limit AI usage to "500 interactions per day," although it seems this has not been actively enforced yet.)

Overall, Kagi's AI features come across as serviceable; while they are not as "brilliant" as ChatGPT Plus, they are sufficient for simple Q&A and summarization needs. Moreover, they are tightly integrated with search, which removes the need to jump between multiple services and pay extra for them. From a competitive perspective, the rise of AI is also good news for small and medium search engines: it helps offset the user-experience disadvantage caused by insufficient native data, letting them start much closer to the larger players in understanding search intent and natural language.

It should be acknowledged that none of Kagi's features has a high technical barrier. If Google were willing, it could easily do better given its strength. But the problem is that Google is unwilling. In fact, many of these features are precisely those that Google has gradually abandoned in the course of "becoming the evil dragon," due to the conflicts of interest arising from its identity as an advertising platform. It is precisely Kagi's pure subscription business model that allows it to completely disregard the interests of third parties other than its users.


Demonstration#

Seeing is believing. To give readers a direct impression of Kagi's result quality and let them judge for themselves, I randomly selected several keywords from my search history over the past few months, in both Chinese and English, and then searched them using Kagi and five competing products. These five competitors are Google, Bing, DuckDuckGo, Brave, and a self-hosted SearXNG instance (set, per my own habits, to fetch results from Startpage and DuckDuckGo).

“minimalist CSS frameworks”: English, a "resource-seeking" technical question, a listicle hotspot.

image

“thai food near me”: English, local business recommendations, a recent popular SEO meme.

image

“Certificate Transparency database”: English, a relatively niche online tool demand.

image

“inevitable disclosure doctrine”: English term, but specified to search for Chinese results, testing the Chinese index situation.

image

“best custom iem”: English, a relatively niche shopping recommendation question.

image

「品味 品位」: Chinese, a classic easily confused word distinction problem.

image

「查询手机号绑定了哪些服务」: Chinese, a popular technical question.

image

「中国人民银行汇率」: Chinese, commonly used information.

image

「零的焦点」: Chinese, book and media information.

image

「蔡司 清锐 智锐」: Chinese, product marketing term explanation.

image

I wonder what impressions and preferences readers have after comparing these results. My feeling, as mentioned earlier, is that if we compare search results one by one, Kagi may not appear particularly outstanding; most expected results can also be found through other engines with similar rankings; in non-technical vertical topics like local businesses and media, Kagi is still clearly at a disadvantage compared to Google, which has big data.

However, Kagi's results page does look "more comfortable" overall. This feeling comes from its clean, ad-free page, as well as from design details such as grouping results from the same domain and highlighting matching text, which make the results faster to scan.


The Confidence and Helplessness of Charging High Prices#

Having praised Kagi, let's finally discuss a more realistic issue: money.

In any discussion about Kagi, its high fees are a huge "but" after every compliment. There is no denying that even considering Kagi's many advantages, spending so much money on search remains difficult for most people to understand.

Kagi's explanation is also quite simple: as a product that does not profit from advertising, it has only one source of income, namely user payments, and the high costs of operating an independent search engine must be reflected in its pricing. According to official figures, each search costs Kagi an average of $0.0125, and since Kagi users are mostly heavy users who search far more frequently than average, its $10 monthly plan has actually been operating at a loss.

Therefore, Kagi has also experienced several pricing fluctuations; in March of this year, it even limited the usage of the $10 "Professional" version to 700 searches per month, charging $0.015 for each additional search, and only allowing unlimited searches with an upgrade to the $25 "Ultimate" version. This effectively violated commitments made to early users and sparked considerable controversy. It wasn't until September of this year that Kagi announced, following cost optimization measures and an increase in user scale, that it could once again offer unlimited searches at the $10 tier. Additionally, the family plan (at $20) that allows up to six people to "carpool" no longer has limits.
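The arithmetic behind these pricing changes is easy to check. The prices, the 700-search cap, and the per-search figures come from the text above; the 1,000-searches-per-month heavy user is an illustrative assumption.

```python
from decimal import Decimal

COST_PER_SEARCH = Decimal("0.0125")  # Kagi's stated average cost
PLAN_PRICE = Decimal("10")           # "Professional" monthly price
INCLUDED_SEARCHES = 700              # cap introduced in March
OVERAGE_PRICE = Decimal("0.015")     # charge per additional search

def break_even_searches(price=PLAN_PRICE):
    """Searches per month at which a flat price exactly covers cost."""
    return price / COST_PER_SEARCH

def capped_bill(searches):
    """Monthly bill under the 700-search cap with per-search overage."""
    extra = max(0, searches - INCLUDED_SEARCHES)
    return PLAN_PRICE + extra * OVERAGE_PRICE

def margin(revenue, searches):
    """Revenue minus Kagi's search cost for one user in one month."""
    return revenue - searches * COST_PER_SEARCH

heavy_user = 1000  # assumed monthly search volume, for illustration
print(break_even_searches())                        # 800
print(margin(PLAN_PRICE, heavy_user))               # loss on unlimited plan
print(margin(capped_bill(heavy_user), heavy_user))  # profit under the cap
```

Under these figures a flat $10 covers 800 searches; a 1,000-search user costs Kagi $12.50, a $2.50 loss on the unlimited plan, whereas the capped plan would have billed $14.50 and left a $2.00 margin.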

For me personally, search engines are the most frequently used online service, bar none. Due to my strong curiosity and the habit of searching before asking questions, my average monthly search volume easily exceeds a thousand. Considering that Kagi indeed provides a more efficient search experience and can save some money that would otherwise go to the OpenAI API, I find this fee worthwhile. On the other hand, for the majority of users accustomed to "search = free," convincing them to spend an extra $10 per month (or even about $3.33 each when a $20 family plan is split six ways) is understandably challenging.

This difficulty is clearly reflected in Kagi's user growth. According to the official seven-day rolling statistics released since August this year, its number of paid users has been almost steadily... growing linearly, with about 150 new users added daily. If it were a venture-backed company, seeing this situation would likely raise concerns about how to explain it to investors.

image

Unless... it doesn’t need to explain to investors?

In line with its unusual product approach, Kagi has followed a "bootstrapped" route since its inception, without accepting any formal investment. Many of Kagi's early users come from the tech industry and have seen firsthand how products deteriorate under growth pressure, so they count Kagi's self-sufficiency as a plus. Even when it did raise outside money in June of this year, the total was only $670,000 from 42 investors, structured as a SAFE.

To briefly explain, SAFE (Simple Agreement for Future Equity) is a financing instrument for early-stage projects pioneered by the famous incubator Y Combinator. It resembles a more lenient convertible note: the invested amount converts into shares according to an agreed formula only when the project raises formal financing in the future, and until then it carries no interest and no repayment date.

This "super mini financing," which is like a drizzle by Silicon Valley standards, drew many user comments saying, "I confirmed I didn't see an extra zero and felt relieved," implying "take your time, don’t rush."

In fact, if we temporarily set aside the narrative shaped by venture capital over the past decade, Kagi's current unhurried, "zen" growth may not be a bad thing. For a platform service like a search engine, the growth of operating costs is not proportional to the growth in user numbers. If Kagi were to pursue the "power law" growth that Silicon Valley folks love to promote, the rapidly expanding costs for servers, customer service, compliance, etc., would inevitably strain its limited money and attention. It might also attract the attention of the giants prematurely and run into obstacles in areas like API supply, ending up as yet another flash-in-the-pan failure.

Moreover, Kagi's indexing methods and functional settings are inherently more aligned with the needs of tech and research-oriented users; to fully leverage its advantages, a certain learning curve is required. It is easy to imagine that if a user primarily relies on search engines to answer everyday service-related questions, depending more on "big data" mind-reading rather than adjusting filtering rules to obtain results, Kagi may not provide a better experience than Google; trying to significantly expand this type of user base may also be a thankless task for Kagi.

Conversely, by adhering to a relatively "slow" growth trend and defining its potential market as users with higher demands for search quality and efficiency, Kagi can more comfortably address emerging problems and needs, having ample time to accumulate its independent indexing data and technology; this aligns more with its own and its users' interests. Last September, in its operational status report released three months after launch, Kagi stated that it could achieve break-even with 25,000 paying users. At its current pace, it is expected to reach this goal by the end of 2023. By this standard, Kagi's development and sustainability are promising.

The search engine market may be destined to lean towards natural monopolies, but Kagi's mission is not to pry open the doors held by giants. Like a finely crafted key, Kagi can open a window for those unwilling to have their perspectives limited, which is enough to justify its existence and be a respected achievement.
