Leo

Leo's Ganges Sands

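Business consultant / cross-border e-commerce expert / investor / tech geek / cycling enthusiast / king of two Border Collies and a bunch of little stray cats / married, active in the Pearl River Delta and the Yangtze River Delta. Subscriptions welcome: I post articles daily that I have handpicked as worth reading closely, spanning business, economics, relationships, technology and more; in short, everything I find interesting.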

2024-02-23 | Hiding Disruptive Elements with Ad Blockers: Common Approaches and Syntax Introduction | sspai Member π+Prime



To address the need to hide disruptive elements on web pages, this article introduces the basic principles of "cosmetic rules" in ad blockers, along with common approaches to writing element selectors.


Introduction

In today's chaotic online content ecosystem, ad blockers have almost become a necessity rather than an option; a browsing experience without ad blocking is nearly unacceptable.

However, simply installing an ad blocker is not enough to get satisfactory results. The rule sets that ad blockers rely on (essentially lists of elements to block or hide) are maintained by the community and vary in quality; even popular lists accumulate redundant and erroneous entries over time. Enabling too many of them indiscriminately can easily degrade performance and cause false positives.

What content should be blocked is also a very subjective judgment. Some ads that are visually less disruptive, infrequent, and do not involve privacy tracking are actually acceptable and provide a way to support independent, small-scale websites. (This is also the stance of projects like Better Ads and Acceptable Ads, although both have been criticized for their ties to advertisers.)

Conversely, some elements that are not strictly "ads" can also waste resources and occupy attention, such as eye-catching recommendation panels in sidebars, auto-playing videos forcibly inserted into articles, and email subscription requests, among others.

Therefore, it is hard to find a ready-made rule set that fully meets one's needs. Even lists aimed at "annoying" elements, such as Fanboy’s Annoyance List and AdGuard Annoyances Filter, conservatively cover only a small share of disruptive elements, and mostly on websites with substantial traffic and popularity. To get truly satisfactory results, you have to take matters into your own hands.

However, the official documentation of ad blockers (such as uBlock Origin and AdGuard) is generally hard to follow and can seem daunting to beginners. This article therefore introduces some common approaches and the corresponding syntax from a beginner's perspective, in the hope of helping readers write the rules they need.

(Note: There are many ad blockers on the market, of varying quality and with different rule syntaxes. Unless otherwise specified, the examples below are based on the well-regarded uBlock Origin. If uBlock Origin is not available for your browser, AdGuard also works, as the two syntaxes are largely compatible.)

Basic Preparation

Before writing rules, it is essential to understand that the rules used by ad blockers are mainly divided into two categories:

  • Basic rules, also known as blocking rules. These target specific file addresses (URLs) and prevent the data needed to display ads from loading at all. They are therefore better suited to blocking ad resources and tracking scripts with distinctive URL features (e.g., fixed patterns in file names or paths). Only this type of rule can protect privacy, because it stops the requests from being sent in the first place.
  • Cosmetic rules, also known as hiding rules. These target specific page elements and hide the parts of the page where ads appear. They are therefore better suited to elements that are hard to tell apart by URL but sit in predictable positions on the page; they are often used to supplement basic rules by hiding the blank spaces left behind by ads that were prevented from loading.

(Some more complex rule types, such as those that inject custom CSS styles or JavaScript, are specific to individual ad blockers and not broadly portable, so this article does not discuss them.)

For the disruptive elements discussed in this article, which often mix text and media and are frequently interspersed with the main content, blocking rules are neither convenient nor practical. Cosmetic rules are the main tool here, on an "out of sight, out of mind" basis, so the rest of this article focuses on them.

The general format of cosmetic rules is as follows:

example.org##selector

Here, the part before the delimiter ## is the domain targeted by the rule, and the part after is the expression for the elements to be blocked.
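For illustration, here is a minimal Python sketch of how such a rule splits apart. It assumes only the simple form shown above, plus comma-separated domain lists and an empty domain part for generic rules (both of which uBlock Origin accepts); exception and procedural variants are ignored, and the names `CosmeticRule` and `parse_cosmetic_rule` are hypothetical, not part of any ad blocker's code.

```python
from dataclasses import dataclass


@dataclass
class CosmeticRule:
    domains: list[str]  # empty list = generic rule that applies to every site
    selector: str       # the CSS selector after the "##" delimiter


def parse_cosmetic_rule(line: str) -> CosmeticRule:
    """Split a rule like 'example.org##selector' at the FIRST '##' delimiter."""
    domain_part, sep, selector = line.strip().partition("##")
    if not sep or not selector:
        raise ValueError(f"not a cosmetic rule: {line!r}")
    # Several domains may be listed before the delimiter, separated by commas.
    domains = [d.strip() for d in domain_part.split(",") if d.strip()]
    return CosmeticRule(domains=domains, selector=selector)


rule = parse_cosmetic_rule("example.org##.ad-banner")
print(rule.domains, rule.selector)  # ['example.org'] .ad-banner
```

Splitting at the first `##` is what keeps an ID rule such as `example.org###main` intact: the third `#` stays with the selector.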

How do you specify which elements to hide? Mainly with CSS selectors.


Interlude: Introduction to CSS Selectors

If you are unfamiliar with CSS and how it works, here is a brief explanation: a webpage's appearance is determined by two parts, "content" and "style." For example, what text is on the page is a matter of content, while which font it uses is a matter of style; which images are displayed is content, while their size and effects are style; and so on.

CSS is a set of syntax rules for applying styles to content. Selectors are the part of CSS that determines which content gets styled. The basis for selection can include:

  1. Element type, such as headings (h1) or images (img);
  2. Class, which can be understood as a custom "tag" used to categorize a group of related elements;
  3. ID, which can be understood as a custom name used to identify one specific element;
  4. Any attribute of the element, such as the specific URL a link points to: a[href="https://example.com"];
  5. Specific states or positions of the element, such as being hovered over by the mouse (:hover) or being the first child of its parent (:first-child);
  6. Recognizable features on the page that have no corresponding element in the code, such as content generated after an element's own content (::after) or the first letter of a paragraph (::first-letter); or
  7. Combinations of the above.

Clearly, CSS selectors provide ample tools for specifying elements on a webpage. Moreover, since it is a universally supported syntax across all browsers, using CSS selectors to specify the range to be blocked is a reasonable choice.


Therefore, writing a cosmetic rule that hides a specific disruptive element comes down to finding a CSS selector that matches it. In fact, the basic mechanism of cosmetic rules is to inject the style display: none !important for the selected elements into the current page, meaning "do not display this element, overriding the page's own conflicting styles."
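A rough Python sketch of that mechanism: gather the selectors whose domain matches the current hostname and turn each one into a `display: none !important` declaration. It assumes, as a simplification, that a rule for `example.org` also applies to its subdomains; the function names and sample rules are hypothetical, and a real ad blocker injects the result through browser extension APIs rather than returning a string.

```python
def applies_to(rule_domain: str, hostname: str) -> bool:
    """Assumption: a rule for example.org also covers www.example.org etc."""
    return hostname == rule_domain or hostname.endswith("." + rule_domain)


def build_hiding_stylesheet(rules: list[tuple[str, str]], hostname: str) -> str:
    """Collect selectors for this hostname and emit one hiding rule per selector."""
    selectors = [sel for domain, sel in rules if applies_to(domain, hostname)]
    # One CSS rule per selector: in a comma-separated selector list, a single
    # invalid selector would make the browser discard the whole list.
    return "\n".join(f"{sel} {{ display: none !important; }}" for sel in selectors)


rules = [
    ("example.org", ".ad-banner"),
    ("example.org", "#popup"),
    ("other.net", ".promo"),
]
print(build_hiding_stylesheet(rules, "www.example.org"))
```

Emitting a separate rule per selector is the main design choice here: it keeps one malformed rule from silently disabling all the others on the page.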

How do you find such a selector? To make custom cosmetic rules easier to create, most ad blocker extensions offer a "point and block" feature. For example, the image below shows uBlock Origin's "element picker" mode, which you can open by right-clicking on the page and choosing Block element, or by clicking the eyedropper icon in the extension's popup.

[Image: uBlock Origin's element picker mode]

In element selector mode, by clicking on the part of the page you want to block, uBlock Origin will automatically record its corresponding CSS selector and add it to the rule list. (The two sliders on the left and right are used to fine-tune the "depth" and "breadth" of the selection, which you can imagine as controlling the height and width of the claw in a claw machine.)

This looks convenient, but it is becoming increasingly hard to get satisfactory results this way. For a cosmetic rule to be useful, its selector must be sufficiently "representative," that is, general enough to keep matching the same kind of element; a rule that only works on the current page is of little value. This is precisely the element picker's flaw: it often mechanically copies the properties of the selected element, producing obscure, redundant rules like the one shown in the image above.

The causes and drawbacks of these redundant rules are explained in detail later, but intuitively it is clear that they are too "specific" and therefore no longer representative.

Rather than relying on the ad blocker's overly mechanical element picker, this article recommends the browser's built-in DevTools (Developer Tools), specifically the inspector. To open it, right-click a blank area of the page and choose Inspect (or Inspect Element, or a similar menu item), or press F12 (in Safari, the developer features must first be enabled manually).

[Image: the browser's DevTools inspector]

The inspector's main panel shows a tree of all the elements on the current page, known as the DOM (Document Object Model). Clicking the arrow button in the upper left corner (the target button in the upper right corner in Safari) lets you highlight and select elements directly on the page and locate their position in the DOM.

With this groundwork in place, the following sections introduce several common approaches to writing rules that block disruptive elements.
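To get a feel for the tree the inspector displays, the following Python sketch (standard library only) parses an HTML snippet and records each element's path from the root, labelled in selector shorthand (tag#id.class). It is an illustration under simplifying assumptions, not a model of how DevTools builds the DOM; `PathRecorder` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are not kept on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class PathRecorder(HTMLParser):
    """Record a 'parent > child' path for every start tag, like a breadcrumb."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, str]] = []  # (tag, label) of open elements
        self.paths: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        label = tag
        if attr_map.get("id"):
            label += "#" + attr_map["id"]
        for cls in (attr_map.get("class") or "").split():
            label += "." + cls
        self.paths.append(" > ".join([lbl for _, lbl in self.stack] + [label]))
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, label))

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


recorder = PathRecorder()
recorder.feed('<body><div id="main"><div class="promo-box">'
              '<img src="ad.png"><p>Buy now</p></div></div></body>')
recorder.close()
for path in recorder.paths:
    print(path)
```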

Blocking Elements by Class, ID, or Other Attribute Values

As mentioned earlier, class and ID are similar to "categorization tags" and "exclusive names" for web elements, respectively, and are therefore significant for identifying and locating web elements. In the context of blocking disruptive elements, matching class and ID is often the most efficient way to write blocking rules.

The syntax for blocking elements by class and ID values is as follows:

.value
#value

Here, value is the value of the class or ID.

As for which values to match, you can generally focus on keywords related to function, such as ad (advertisement), promo (promotion), popup (pop-up), and cta (call to action), and keywords related to form and format, such as video, banner, and carousel.

Conversely, you generally should not consider names like row-span-2 (indicating "occupying 2 rows in a grid layout") or col-md-6 (indicating "occupying 6 columns in a grid on medium and larger devices"). These names are products of using CSS frameworks like Tailwind and Bootstrap, which specify the layout of elements in the page grid. Using them to locate elements is neither targeted enough nor likely to remain effective as the website layout is adjusted.
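The keyword heuristics above can be sketched as a small Python classifier. Everything here is an assumption for illustration: the keyword list comes from the text, while the regular expression for grid utility classes is a hypothetical approximation of Tailwind/Bootstrap-style names such as row-span-2 and col-md-6.

```python
import re

# Keywords named in the text: function-related and form/format-related.
PROMISING = {"ad", "promo", "popup", "cta", "video", "banner", "carousel"}
# Hypothetical approximation of grid utility classes (row-span-2, col-md-6, ...).
LAYOUT_UTILITY = re.compile(r"(col|row)(-[a-z]+)?-\d+")


def classify_class_name(name: str) -> str:
    """Rough triage of a class name as a blocking target."""
    if LAYOUT_UTILITY.fullmatch(name):
        return "skip: layout utility"
    # Compare whole tokens so "ad" does not match inside "header" or "download".
    tokens = re.split(r"[-_]+", name.lower())
    if PROMISING.intersection(tokens):
        return "candidate"
    return "unclear"


for name in ["col-md-6", "row-span-2", "article__youtube-video", "header"]:
    print(name, "->", classify_class_name(name))
```

Token matching is the key design choice: a plain substring test for ad would flag countless innocent names.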

Here's an example. On the 9to5Mac page shown below, the "related videos" block embedded at the end of the article is a typical example of self-serving design: it takes up a large amount of page space purely to drive traffic and offers readers nothing of value. The inspector shows that this video sits inside a div element with the class name article__youtube-video.

[Image: inspector view of the 9to5Mac related-video container]

Therefore, the corresponding blocking rule would be:

9to5mac.com##.article__youtube-video

Similarly, the following rule can be used to block the comments section:

9to5mac.com###comments

(Note the three # characters here: the first two form the delimiter, and the third marks the ID selector.)
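As a sketch of how these pieces combine, the hypothetical Python helpers below assemble class and ID rules from a domain and a value. They deliberately reject values that are not plain CSS identifiers (for example, ones starting with a digit), since those would need CSS escaping, which this sketch does not attempt.

```python
import re

# Conservative check for a plain CSS identifier that needs no escaping.
IDENT = re.compile(r"-?[A-Za-z_][A-Za-z0-9_-]*")


def _check(value: str) -> str:
    if not IDENT.fullmatch(value):
        raise ValueError(f"needs CSS escaping, not handled here: {value!r}")
    return value


def hide_by_class(domain: str, value: str) -> str:
    # "##" delimiter followed by the "." class selector
    return f"{domain}##.{_check(value)}"


def hide_by_id(domain: str, value: str) -> str:
    # "##" delimiter followed by the "#" ID selector: three "#" in a row
    return f"{domain}###{_check(value)}"


print(hide_by_class("9to5mac.com", "article__youtube-video"))  # 9to5mac.com##.article__youtube-video
print(hide_by_id("9to5mac.com", "comments"))  # 9to5mac.com###comments
```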

But this is just the simplest case. In many cases, the class names you see may look like the following:

[Image: inspector view of an ad banner with a hashed class name]

Here, although the class name tells you it is an ad banner, if you write the complete class name directly into a rule, you may find that it stops working after a short while.

This is because the random string __LvD17 at the end of the class name is generated by the web development framework during compilation, and it can change whenever a new version of the site is deployed. The purpose is not to play "hide and seek" with users by making elements deliberately hard to block, but to spare developers from naming every class by hand and to avoid naming conflicts; this is mainstream practice in large-scale site development. The downside, of course, is that it makes the page's code less readable and, indirectly, makes elements harder to block.

There is still a way around this, because CSS selectors can match fragments (substrings) of attribute values:

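Syntax            Meaning
[class^=value]    Selects elements whose class attribute value starts with value
[class$=value]    Selects elements whose class attribute value ends with value
[class*=value]    Selects elements whose class attribute value contains value anywhere

(These operators compare against the entire class attribute string, which may contain several space-separated class names, so ^= and $= only match at the very start or end of that whole string. Adding the letter i before the closing bracket, as in [class^=value i], makes the match case-insensitive.)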
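The following Python sketch emulates these three operators to make one detail explicit: the comparison runs against the whole attribute string. For class, that string may hold several space-separated class names, so ^= only matches when the fragment begins the whole string, normally meaning the first class listed. The function name and sample value are hypothetical, and lowercasing only approximates the ASCII case folding of the i flag.

```python
from typing import Optional


def attr_matches(attr_value: Optional[str], op: str, value: str,
                 ignore_case: bool = False) -> bool:
    """Emulate [attr^=value], [attr$=value] and [attr*=value], optionally with i."""
    # A missing attribute, or an empty value, never matches these operators.
    if attr_value is None or value == "":
        return False
    if ignore_case:
        # Approximation: the CSS i flag folds ASCII letters only.
        attr_value, value = attr_value.lower(), value.lower()
    if op == "^=":
        return attr_value.startswith(value)
    if op == "$=":
        return attr_value.endswith(value)
    if op == "*=":
        return value in attr_value
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical element carrying two classes: class="card inlineoffer__LvD17"
cls = "card inlineoffer__LvD17"
print(attr_matches(cls, "^=", "inlineoffer"))  # False: the string starts with "card"
print(attr_matches(cls, "*=", "inlineoffer"))  # True
```

In practice this is why *= tends to be the more forgiving choice when an element carries extra classes.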

Therefore, either [class^=inlineoffer] (matching the beginning) or [class*=inline-img-offer-container] (matching a middle part) selects the ad banner above. There is no standard answer as to which is better; judge case by case based on testing. In general, a longer fragment is more specific and less prone to false positives, but it may also mean writing several rules to cleanly hide what a shorter fragment could have caught in one go.

Here, the shorter inlineoffer (roughly "inline ad") unambiguously marks a disruptive element, and testing confirms that it works, so it is the one to choose.
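Part of this decision can be automated. This Python sketch strips a build-generated suffix from a class name and proposes prefix and substring selectors for the stable part. The suffix pattern (a double underscore followed by 4 to 8 letters or digits, including at least one digit or capital letter, like __LvD17) is a hypothetical heuristic that will misfire on some names, and inlineoffer__LvD17 is a made-up example; testing on the real page remains the final check.

```python
import re
from typing import Optional

# Hypothetical heuristic for a build-generated suffix such as "__LvD17".
HASH_SUFFIX = re.compile(r"__([A-Za-z0-9]{4,8})$")


def stable_prefix(class_name: str) -> Optional[str]:
    """Return the class name without a hash-like suffix, or None if there is none."""
    m = HASH_SUFFIX.search(class_name)
    # Require a digit or capital letter so BEM names like "block__title" survive.
    if m is None or not re.search(r"[0-9A-Z]", m.group(1)):
        return None
    return class_name[: m.start()]


def fragment_selectors(class_name: str) -> list[str]:
    """Propose selectors that keep working when the suffix changes."""
    prefix = stable_prefix(class_name)
    if not prefix:
        return [f".{class_name}"]  # nothing to strip; the full name may be stable
    # Quoted values are valid CSS and avoid identifier restrictions.
    return [f'[class^="{prefix}"]', f'[class*="{prefix}"]']


print(fragment_selectors("inlineoffer__LvD17"))
# ['[class^="inlineoffer"]', '[class*="inlineoffer"]']
```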

Additionally, as mentioned above, class and ID are just two special kinds of HTML element attributes, and the selector syntax in this section applies to any other attribute as well. For example, on The Economist page shown below, the ad element on the right can be identified by the value of the custom attribute data-test-id.

[Image: The Economist page with the right-rail ad element]

Therefore, the following rule can be used to block it:

www.economist.com##[data-test-id="right-hand-rail-ads"]

In practice, other attributes that help identify disruptive elements include data-ad-type, data-ad-zone, role, src, aria-label, etc. Observing these attributes can often reveal patterns.
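Spotting such patterns can be sketched in Python as well: scan a handful of the attributes mentioned in this section for ad-related words and turn each hit into an exact-value rule in the form shown above. The attribute and keyword lists are illustrative assumptions, and `suggest_attribute_rules` is a hypothetical helper; its output is a starting point for manual testing, not a finished rule.

```python
import re

# Illustrative lists; extend them based on what you observe in the inspector.
HINT_ATTRS = ("data-test-id", "data-ad-type", "data-ad-zone", "role", "aria-label")
AD_WORDS = {"ad", "ads", "advert", "advertisement", "sponsor", "sponsored", "promo"}


def suggest_attribute_rules(domain: str, attrs: dict[str, str]) -> list[str]:
    """Return exact-value attribute rules for attributes that look ad-related."""
    rules = []
    for name in HINT_ATTRS:
        value = attrs.get(name)
        if not value:
            continue
        words = set(re.split(r"[^a-z0-9]+", value.lower()))
        # data-ad-* attributes are treated as ad-related by their name alone.
        if name.startswith("data-ad-") or words & AD_WORDS:
            # Escape backslashes and double quotes for a CSS string value.
            quoted = value.replace("\\", "\\\\").replace('"', '\\"')
            rules.append(f'{domain}##[{name}="{quoted}"]')
    return rules


print(suggest_attribute_rules(
    "www.economist.com", {"data-test-id": "right-hand-rail-ads", "class": "x1"}))
# ['www.economist.com##[data-test-id="right-hand-rail-ads"]']
```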

Blocking Elements Based on Positional Features

Although class, ID, and attribute selectors cover many blocking needs, they depend on spotting patterns in attribute values, which is not always possible. Sometimes a disruptive element has no obvious attribute features but sits in a relatively fixed position. In such cases, combinators, which select elements based on hierarchy and positional relationships, are a better choice.
