Google Does Try To Handle Broken Canonicals

With all this talk about clustering and canonicalization today, what happens if you get it wrong, or your CMS breaks and serves the wrong thing to Google?

Search Engine Roundtable
Google Search: How Clustering Works With Localization

As part of the topic of clustering and canonicalization with Google Search today, Allan Scott from Google explained how Google Search's clustering technology works with its localization technology.

Search Engine Roundtable
Google Marauding Black Holes With Clustering & Error Pages

As part of the topic of clustering and canonicalization with Google Search today, Allan Scott from Google explained what he called "marauding black holes" in Google Search.

Search Engine Roundtable

Google Search uses about 40 signals for canonicalization, including redirects, canonicals and x-default, and got rid of Redirect to Shortener https://www.seroundtable.com/google-search-40-signals-for-canonicalization-38528.html

#google #canonicalization #seo #redirects

Google Has 40 Signals For Canonicalization

Did you know that Google uses around 40 different signals to perform canonicalization in Google Search? This includes not only numerous types of redirects and the rel=canonical attribute, but also other signals.

Search Engine Roundtable

Google on the difference between clustering and canonicalization: "Clustering is basically taking the pages that we think are the same. And then canonicalization is, from those pages, which one is the best one" @johnmu said https://www.seroundtable.com/google-search-clustering-canonicalization-38529.html

#seo #google #canonicalization #clustering #search

Google Search Difference Between Clustering & Canonicalization

Google's John Mueller explained the difference between clustering and canonicalization within Google Search. He said, "Clustering is basically taking the pages that we think are the same. And then canonicalization is, from those pages, which one is the best one."

Search Engine Roundtable

Google's @johnmu clarifies that #s in the Google Search Console reports are unrelated to canonicalization, they are there for reporting on Sitelinks https://www.seroundtable.com/pounds-in-google-search-console-reports-37583.html with more from @LilyRay @thetafferboy

#googlesearchconsole #google #googleseo #googlesitelinks #googleperformancereport #googlesearch #canonicalization

Google: #s In Google Search Console Reports Are Unrelated To Canonicalization

There is a lot of talk in the wider SEO community suggesting that pound signs (#) in the Google Search Console reports mean something about canonicalization. John Mueller from Google, along with a number of others, clarified that they are unrelated.

Search Engine Roundtable

#Business #Explainers
Google explains how it chooses canonical webpages · Why duplicate pages can be important for SEO https://ilo.im/15yhco

_____
#SEO #Google #SearchEngine #Website #Webpage #Content #Deduplication #Canonicalization #Design #WebDesign

Google Explains How It Chooses Canonical Webpages

Google's Gary Illyes describes the signals it uses to choose canonical pages and shares why duplicate pages can be important for SEO

Search Engine Journal


In a previous article we defined folding functions, and used them to enable some canonicalization and the sccp constant propagation pass for the poly dialect. This time we’ll see how to add more general canonicalization patterns.

The code for this article is in this pull request, and as usual the commits are organized to be read in order.

Why is Canonicalization Needed?

MLIR provides folding as a mechanism to simplify an IR, which can result in simpler, more efficient ops (e.g., replacing multiplication by a power of 2 with a shift) or removing ops entirely (e.g., deleting $y = x+0$ and using $x$ for $y$ downstream). It also has a special hook for constant materialization.
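As a plain-C++ analogy (a toy sketch, not MLIR's actual API; the `Expr` struct and `fold` function are invented for illustration), a folder is a per-op local simplification: it either replaces an op with a cheaper one or removes it entirely.

```cpp
#include <cassert>

// Toy expression node, standing in for a single IR op.
struct Expr {
  enum Kind { Const, Add, Mul, Shl } kind;
  long value = 0;   // payload for Const, or shift amount after folding
  Expr *lhs = nullptr;
  Expr *rhs = nullptr;
};

static bool isPow2(long n) { return n > 0 && (n & (n - 1)) == 0; }
static int log2i(long n) { int k = 0; while (n >>= 1) ++k; return k; }

// fold: simplify one op locally, mirroring MLIR's per-op fold hook.
// Returns the simplified expression (possibly a different node).
Expr *fold(Expr *e) {
  // x + 0  ->  x  (the op disappears entirely)
  if (e->kind == Expr::Add && e->rhs && e->rhs->kind == Expr::Const &&
      e->rhs->value == 0)
    return e->lhs;
  // x * 2^k  ->  x << k  (strength reduction to a cheaper op)
  if (e->kind == Expr::Mul && e->rhs && e->rhs->kind == Expr::Const &&
      isPow2(e->rhs->value)) {
    e->kind = Expr::Shl;
    e->rhs->value = log2i(e->rhs->value);
  }
  return e;
}
```

The key property is locality: a folder looks at exactly one op and its immediate operands, which is what distinguishes it from the more general rewrites below.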

General canonicalization differs from folding in MLIR in that it can transform many operations. The MLIR docs call this “DAG-to-DAG” rewriting, where DAG stands for “directed acyclic graph” in the graph theory sense, but really just means a section of the IR.

Canonicalizers can be written in the standard way: declare the op has a canonicalizer in tablegen and then implement a generated C++ function declaration. The official docs for that are here. Or you can do it all declaratively in tablegen, the docs for that are here. We’ll do both in this article.

Aside: there is a third way, to use a new system called PDLL, but I haven’t figured out how to use that yet. It should be noted that PDLL is under active development, and in the meantime the tablegen-based approach in this article (called “DRR” for Declarative Rewrite Rules in the MLIR docs) is considered to be in maintenance mode, but not yet deprecated. I’ll try to cover PDLL in a future article.

Canonicalizers in C++

Reusing our poly dialect, we’ll start with the binary polynomial operations, adding let hasCanonicalizer = 1; to the op base class in this commit, which generates the following method headers on each of the binary op classes

// PolyOps.h.inc
static void getCanonicalizationPatterns(
    ::mlir::RewritePatternSet &results, ::mlir::MLIRContext *context);

The body of this method asks to add custom rewrite patterns to the input results set, and we can define those patterns however we feel in the C++.

The first canonicalization pattern we’ll write in this commit is for the simple identity $x^2 - y^2 = (x+y)(x-y)$, which is useful because it replaces a multiplication with an addition. The only caveat is that this canonicalization is only more efficient if the squares have no other downstream uses.

// Rewrites (x^2 - y^2) as (x+y)(x-y) if x^2 and y^2 have no other uses.
struct DifferenceOfSquares : public OpRewritePattern<SubOp> {
  DifferenceOfSquares(mlir::MLIRContext *context)
      : OpRewritePattern<SubOp>(context, /*benefit=*/1) {}

  LogicalResult matchAndRewrite(SubOp op,
                                PatternRewriter &rewriter) const override {
    Value lhs = op.getOperand(0);
    Value rhs = op.getOperand(1);
    if (!lhs.hasOneUse() || !rhs.hasOneUse()) {
      return failure();
    }

    auto rhsMul = rhs.getDefiningOp<MulOp>();
    auto lhsMul = lhs.getDefiningOp<MulOp>();
    if (!rhsMul || !lhsMul) {
      return failure();
    }

    bool rhsMulOpsAgree = rhsMul.getLhs() == rhsMul.getRhs();
    bool lhsMulOpsAgree = lhsMul.getLhs() == lhsMul.getRhs();
    if (!rhsMulOpsAgree || !lhsMulOpsAgree) {
      return failure();
    }

    auto x = lhsMul.getLhs();
    auto y = rhsMul.getLhs();

    AddOp newAdd = rewriter.create<AddOp>(op.getLoc(), x, y);
    SubOp newSub = rewriter.create<SubOp>(op.getLoc(), x, y);
    MulOp newMul = rewriter.create<MulOp>(op.getLoc(), newAdd, newSub);

    rewriter.replaceOp(op, {newMul});
    // We don't need to remove the original ops because MLIR already has
    // canonicalization patterns that remove unused ops.
    return success();
  }
};

The test in the same commit shows the impact:

// Input:
func.func @test_difference_of_squares(
    %0: !poly.poly<3>, %1: !poly.poly<3>) -> !poly.poly<3> {
  %2 = poly.mul %0, %0 : !poly.poly<3>
  %3 = poly.mul %1, %1 : !poly.poly<3>
  %4 = poly.sub %2, %3 : !poly.poly<3>
  %5 = poly.add %4, %4 : !poly.poly<3>
  return %5 : !poly.poly<3>
}

// Output:
// bazel run tools:tutorial-opt -- --canonicalize $FILE
func.func @test_difference_of_squares(%arg0: !poly.poly<3>, %arg1: !poly.poly<3>) -> !poly.poly<3> {
  %0 = poly.add %arg0, %arg1 : !poly.poly<3>
  %1 = poly.sub %arg0, %arg1 : !poly.poly<3>
  %2 = poly.mul %0, %1 : !poly.poly<3>
  %3 = poly.add %2, %2 : !poly.poly<3>
  return %3 : !poly.poly<3>
}

Other than this pattern being used in the getCanonicalizationPatterns function, there is nothing new here compared to the previous article on rewrite patterns.

Canonicalizers in Tablegen

The above canonicalization is really a simple kind of optimization attached to the canonicalization pass. It seems that this is how many minor optimizations end up being implemented, making the --canonicalize pass a heavyweight and powerful pass. However, the name “canonicalize” also suggests that it should be used to put the IR into a canonical form so that later passes don’t have to check as many cases.
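A minimal illustration of the "canonical form" sense of the word (again plain C++, not MLIR code; the `Operand` struct and function name are invented for the sketch): if canonicalization always commutes a constant operand of a commutative op to the right-hand side, every later pattern only needs to match `x + const`, never `const + x`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical operand representation for the sketch.
struct Operand {
  bool isConst;
  int value;        // meaningful when isConst is true
  std::string name; // meaningful otherwise
};

// Canonicalize a commutative op: move the constant (if any) to the right.
// After this runs, downstream patterns never need a "const on the left" case.
std::pair<Operand, Operand> canonicalizeCommutative(Operand lhs, Operand rhs) {
  if (lhs.isConst && !rhs.isConst)
    std::swap(lhs, rhs);
  return {lhs, rhs};
}
```

This is the sense in which canonicalization reduces the number of cases later passes must check, independent of any performance benefit.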

So let’s implement a canonicalization like that as well. We’ll add one that has poly interact with the complex dialect, and implement a canonicalization that ensures complex conjugation always comes after polynomial evaluation. This works because of the identity $f(\overline{z}) = \overline{f(z)}$ for all polynomials $f$ and all complex numbers $z$.
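The identity is easy to check numerically in ordinary C++, independent of MLIR; note that it relies on $f$ having real coefficients (as the poly dialect's polynomials do).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Numeric sanity check of the identity justifying the rewrite: for a
// polynomial f with *real* coefficients, f(conj(z)) == conj(f(z)).
std::complex<double> evalPoly(const std::vector<double> &coeffs,
                              std::complex<double> z) {
  // Horner's method; coeffs[i] is the coefficient of z^i.
  std::complex<double> acc = 0.0;
  for (auto it = coeffs.rbegin(); it != coeffs.rend(); ++it)
    acc = acc * z + *it;
  return acc;
}
```

Because conjugation commutes with evaluation, moving the `conj` after the `eval` preserves semantics while giving the IR a single canonical ordering of the two ops.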

This commit adds support for complex inputs in the poly.eval op. Note that in MLIR, complex types are forced to be floating point, because all the op verifiers that construct complex numbers require it. The complex type itself, however, suggests that a complex of integers is perfectly legal, so it seems nobody in the MLIR community needs Gaussian integers yet.

The rule itself is implemented in tablegen in this commit. The main tablegen code is:

include "PolyOps.td"
include "mlir/Dialect/Complex/IR/ComplexOps.td"
include "mlir/IR/PatternBase.td"

def LiftConjThroughEval : Pat<
  (Poly_EvalOp $f, (ConjOp $z)),
  (ConjOp (Poly_EvalOp $f, $z))
>;

The two new pieces here are the Pat class (source) that defines a rewrite pattern, and the parenthetical notation that defines sub-trees of the IR being matched and rewritten. The source documentation on the Pattern parent class is quite well written, so read that for extra detail, and the normal docs provide a higher level view with additional semantics.

But the short story here is that the inputs to Pat are two “IR tree” objects (MLIR calls them “DAG nodes”), and each node in the tree is written in parentheses ( ), with the first item in the parentheses being the name of an operation (the tablegen name, e.g., Poly_EvalOp, which comes from PolyOps.td) and the remaining arguments being the op’s arguments or attributes. Naturally, the nodes can nest, and that corresponds to a match applied to the argument. I.e., (Poly_EvalOp $f, (ConjOp $z)) means “an eval op whose first argument is anything (bind that to $f) and whose second argument is the output of a ConjOp whose input is anything (bind that to $z).”

When running tablegen with the -gen-rewriters option, this generates this code, which is not much more than a thorough version of the pattern we’d write manually. Then in this commit we show how to include it in the codebase. We still have to tell MLIR which pass to add the generated patterns to. You can add each pattern by name, or use the populateWithGenerated function to add them all.

As another example, this commit reimplements the difference of squares pattern in tablegen. This one uses three additional features: a pattern that generates multiple new ops (which uses Pattern instead of Pat), binding the ops to names, and constraints that control when the pattern may be run.

// PolyPatterns.td
def HasOneUse: Constraint<CPred<"$_self.hasOneUse()">, "has one use">;

// Rewrites (x^2 - y^2) as (x+y)(x-y) if x^2 and y^2 have no other uses.
def DifferenceOfSquares : Pattern<
  (Poly_SubOp (Poly_MulOp:$lhs $x, $x), (Poly_MulOp:$rhs $y, $y)),
  [
    (Poly_AddOp:$sum $x, $y),
    (Poly_SubOp:$diff $x, $y),
    (Poly_MulOp:$res $sum, $diff),
  ],
  [(HasOneUse:$lhs), (HasOneUse:$rhs)]
>;

The HasOneUse constraint merely injects the quoted C++ code into a generated if guard, with $_self as a magic string to substitute in the argument when it’s used.

But then notice the syntax of (Poly_MulOp:$lhs $x, $x): the colon binds $lhs to refer to the op as a whole (or, via method overloads, its result value), so that it can be passed to the constraint. Similarly, the generated ops are all given names so they can be fed as arguments to other generated ops. Finally, the second argument of Pattern is a list of generated ops to replace the matched input IR, rather than a single node as in Pat.

The benefit of doing this is significantly less boilerplate related to type casting, checking for nulls, and emitting error messages. But because you still occasionally need to inject random C++ code, and inspect the generated C++ to debug, it helps to be fluent in both styles. I don’t know how to check a constraint like “has a single use” in pure DRR tablegen.


#c_ #canonicalization #compilers #complexNumbers #mathematics #mlir #polynomials #primer #programming

https://jeremykun.com/2023/09/20/mlir-canonicalizers-and-declarative-rewrite-patterns/

GitHub - j2kun/mlir-tutorial

Contribute to j2kun/mlir-tutorial development by creating an account on GitHub.

GitHub
Google Does Major Refresh Of The Canonicalization Help Documentation

Google has updated its search help documentation around canonicalization this morning. The Google Search Relations team split the documentation into three distinct sections and updated a lot of the content.

Search Engine Roundtable