How can I obtain this simplification of an expression?

Question

I saw this question on the Maple forum.

The input is

expr = G1*P3 + G1*P5 + G1*P6 + G2*P3 + G2*P6 + G3*P2 + G3*P5 + G4*P2 +
    G4*P3 + G4*P5 + G4*P6 + G5*P2 + G5*P3 + G5*P5 + G5*P6 + G6*P3 +
   G6*P6 + G7*P2 + G7*P5 + G8*P2 + G8*P3 + G8*P5 + G8*P6;
LeafCount[expr]
(* 70 *)

They wanted to convert it to this:

desired = (G1 + G2 + G4 + G5 + G6 + G8)*(P3 + P6) + (G1 + G3 + G4 +
      G5 + G7 + G8)*P5 + (G3 + G4 + G5 + G7 + G8)*P2;
LeafCount[%]
(* 29 *)

Nothing I tried in Mathematica worked, so I think this requires a special transformation. The two forms are equivalent:

Simplify[expr - desired]
(* 0 *)

Some things I tried:

Simplify[expr]
LeafCount[%]


FullSimplify[expr]
LeafCount[%]


Collect[expr, {P3, P6, P5, P2}]
LeafCount[%]


etc.

Some of the answers in the linked Maple thread use Maple commands that I could not reproduce in Mathematica.

How can I do it? I am using version 13.1.

I am not sure what the constraints are, but since you used Collect, a direct approach could be Collect[expr /. P3 -> -P6 + s, {P5, P2, s}] /. s -> (P3 + P6) — userrandrand, Nov 24 '22 at 22:51
@userrandrand In the linked question, they had no constraints whatsoever. It would be better to get a solution that does not require a specific hard-coded transformation for each variable, as in the Maple answers. One can obtain the list of variables, of course, but an automated answer like the one given on the linked page would be better. One answer uses V:= indets(ex)[]: codegen[optimize](unapply(ex, [V]), tryhard)(V); but I do not know how to reproduce this in Mathematica, as it uses a special Maple package. indets finds the variables in an expression. I can reproduce indets but not the optimize command. — Nasser, Nov 24 '22 at 22:56
In addition, Collect[expr,{P2,P3,P5,P6}]//Collect[#,#[[1]]&/@List@@#]& gives a LeafCount of 25 — user1066, Nov 25 '22 at 21:18
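A minimal sketch that runs the two comment suggestions side by side and checks that each result expands back to expr (fresh kernel assumed). The substitution P3 -> -P6 + s is hand-picked for this particular expression, and s is only a fresh placeholder symbol; no LeafCount values are asserted here.

```mathematica
expr = G1*P3 + G1*P5 + G1*P6 + G2*P3 + G2*P6 + G3*P2 + G3*P5 + G4*P2 +
   G4*P3 + G4*P5 + G4*P6 + G5*P2 + G5*P3 + G5*P5 + G5*P6 + G6*P3 +
   G6*P6 + G7*P2 + G7*P5 + G8*P2 + G8*P3 + G8*P5 + G8*P6;

(* userrandrand: write P3 as s - P6, collect, then put s = P3 + P6 back;
   the G*P6 terms cancel in the expansion that Collect performs *)
viaSubstitution = Collect[expr /. P3 -> -P6 + s, {P5, P2, s}] /. s -> (P3 + P6);

(* user1066: collect on the P's, then collect again on the first factor
   (the coefficient sum) of every resulting term *)
viaDoubleCollect = Collect[expr, {P2, P3, P5, P6}] //
   Collect[#, #[[1]] & /@ List @@ #] &;

(* both rearrangements must expand back to the original sum *)
{Expand[viaSubstitution - expr], Expand[viaDoubleCollect - expr]}
LeafCount /@ {expr, viaSubstitution, viaDoubleCollect}
```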

Bob Hanlon · Answer 1 · 2022-11-25T03:23:42.993

$Version

(* "13.1.0 for Mac OS X x86 (64-bit) (June 16, 2022)" *)

Clear["Global`*"]

expr = G1*P3 + G1*P5 + G1*P6 + G2*P3 + G2*P6 + G3*P2 + G3*P5 + G4*P2 +
    G4*P3 + G4*P5 + G4*P6 + G5*P2 + G5*P3 + G5*P5 + G5*P6 + G6*P3 + 
   G6*P6 + G7*P2 + G7*P5 + G8*P2 + G8*P3 + G8*P5 + G8*P6;

desired = ((Coefficient[expr, #] & /@ {P2, P3, P5, P6}) . {P2, P3, P5,
      P6}) /. a_*b_ + c_*b_ :> (a + c) b

(* (G3 + G4 + G5 + G7 + G8) P2 + (G1 + G3 + G4 + G5 + G7 + G8) P5 + 
    (G1 + G2 + G4 + G5 + G6 + G8) (P3 + P6) *)

expr == desired // Simplify

(* True *)
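Why the rule finds something to factor: every summand of expr contains exactly one of P2, P3, P5, P6, so the Coefficient/dot-product step rebuilds the same polynomial, and the coefficients of P3 and P6 come out identical. That shared coefficient is the b the rule pulls out. A minimal check, assuming a fresh kernel (ps and coeffs are illustrative names):

```mathematica
expr = G1*P3 + G1*P5 + G1*P6 + G2*P3 + G2*P6 + G3*P2 + G3*P5 + G4*P2 +
   G4*P3 + G4*P5 + G4*P6 + G5*P2 + G5*P3 + G5*P5 + G5*P6 + G6*P3 +
   G6*P6 + G7*P2 + G7*P5 + G8*P2 + G8*P3 + G8*P5 + G8*P6;

ps = {P2, P3, P5, P6};
coeffs = Coefficient[expr, #] & /@ ps;

(* each summand holds exactly one P, so coeffs . ps is the same polynomial *)
Expand[coeffs . ps - expr]
(* 0 *)

(* the P3 and P6 coefficients coincide; this is the common factor that
   a_*b_ + c_*b_ :> (a + c) b combines into (...) (P3 + P6) *)
coeffs[[2]] === coeffs[[4]]
(* True *)
```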

EDIT: As suggested by @userrandrand in a comment, this can be simplified to

desired2 = Collect[expr, {P2, P3, P5, P6}] /. 
  a_*b_ + c_*b_ :> (a + c) b
(* (G3 + G4 + G5 + G7 + G8) P2 + (G1 + G3 + G4 + G5 + G7 + G8) P5 + 
    (G1 + G2 + G4 + G5 + G6 + G8) (P3 + P6) *)

Maybe the Coefficient and dot product part can be simplified with Collect[expr, {P2, P3, P5, P6}]. I ended up changing my final code to use your replacement rule. — userrandrand, Nov 25 '22 at 02:03

userrandrand · Accepted Answer · 2022-11-25T16:01:37.693

Outline

Usage example
Code (includes cases where the expression has powers of terms)
Explanation
Comparing with Simplify on random expressions (major addition since last edit)
Previous version of code

One could consider a home-cooked simplify.

The purpose of the code below is to eliminate any guidance from the user: there is no hard-coding and no way to incorporate insight from the user.

The main idea is to find the terms that occur the most in multiplications and use Collect with those variables.

Usage example

Using simplify defined below:

expr // simplify

(G3 + G4 + G5 + G7 + G8) P2 + (G1 + G3 + G4 + G5 + G7 + G8) P5 + (G1 + G2 + G4 + G5 + G6 + G8) (P3 + P6)

In LaTeX:

$$(\text{P3}+\text{P6}) (\text{G1}+\text{G2}+\text{G4}+\text{G5}+\text{G6}+\text{G8})+\text{P5} (\text{G1}+\text{G3}+\text{G4}+\text{G5}+\text{G7}+\text{G8})+\text{P2} (\text{G3}+\text{G4}+\text{G5}+\text{G7}+\text{G8})$$

(expr // simplify) == desired

(*True*)

Code:

(code unpacked and mostly explained below)

Note: The code checks that the expression is a sum containing powers or multiplications and aborts otherwise.

Note: this version of the code uses ReplaceRepeated and so it might end up in an infinite loop for some expressions.

I changed the code to use Bob Hanlon's replacement rule, as my idea looks silly now. The beginning of the code, which did not involve finding multiple occurrences of a factor, is unchanged. The previous version of the code is given at the end of this answer.

Clear[simplify];
simplify[expression_]:=
Module[{check,var,tocollect,
simplified},
(******************************)
(* Begin check (End check below) *)
(* 
check that the expression is a 
sum of products or powers
 *)
check= Head[expression]===Plus && 
(List@@expression//AllTrue[#,MatchQ[_Power | _Times]]&);
If[check===False,
    Print["simplify is not adapted to this structure"];
    Abort[]
];
(* End check *)
(**********************************)
(* all variables of the polynomial *)
var=(expression//Variables);
(* variables that occur in the largest number of product terms *)
tocollect={Count[expression,#*_],#}&/@var
            //MaximalBy[First]
            //Map[Last];
(* collect on them (Simplify is applied to each coefficient), then
   repeatedly factor out any b shared by two summands a*b and c*b *)
simplified=Collect[expression, tocollect, Simplify]
//. a_*b_+c_*b_:> (a+c)*b
]

Explanation

Step 1

Find the variables:

var = (expr // Variables);

Step 2

Find which variables occur the most in the multiplications:

{Count[expr, #*_], #} & /@ var // Sort

{{2, G2}, {2, G3}, {2, G6}, {2, G7}, {3, G1}, {4, G4}, {4, G5}, {4, G8}, {5, P2}, {6, P3}, {6, P5}, {6, P6}}

(transposed below to reduce the displayed height)

$$\left( \begin{array}{cccccccccccc} 2 & 2 & 2 & 2 & 3 & 4 & 4 & 4 & 5 & 6 & 6 & 6 \\ \text{G2} & \text{G3} & \text{G6} & \text{G7} & \text{G1} & \text{G4} & \text{G5} & \text{G8} & \text{P2} & \text{P3} & \text{P5} & \text{P6} \\ \end{array} \right)$$
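Steps 1 and 2 can be reproduced on their own as a short sketch (same expr as in the question, fresh kernel assumed). Count looks only at level 1 by default, i.e. at the summands, and #*_ matches any product containing the variable because Times is Orderless. Note that a power such as P3^2 would not be counted by this pattern.

```mathematica
expr = G1*P3 + G1*P5 + G1*P6 + G2*P3 + G2*P6 + G3*P2 + G3*P5 + G4*P2 +
   G4*P3 + G4*P5 + G4*P6 + G5*P2 + G5*P3 + G5*P5 + G5*P6 + G6*P3 +
   G6*P6 + G7*P2 + G7*P5 + G8*P2 + G8*P3 + G8*P5 + G8*P6;

var = Variables[expr];

(* {number of product summands containing v, v} for every variable v *)
counts = {Count[expr, #*_], #} & /@ var;

(* keep every variable tied for the largest count
   (here the three variables with count 6: P3, P5 and P6) *)
tocollect = Last /@ MaximalBy[counts, First]
```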

Step 3

Collect the variables that occur the most in multiplications:

simplified = Collect[expr, {P6, P5, P3}, Simplify]

(G3 + G4 + G5 + G7 + G8) P2 + (G1 + G2 + G4 + G5 + G6 + G8) P3 + (G1 + G3 + G4 + G5 + G7 + G8) P5 + (G1 + G2 + G4 + G5 + G6 + G8) P6

$$ \text{P3} (\text{G1}+\text{G2}+\text{G4}+\text{G5}+\text{G6}+\text{G8})+\text{P6} (\text{G1}+\text{G2}+\text{G4}+\text{G5}+\text{G6}+\text{G8})+\text{P5} (\text{G1}+\text{G3}+\text{G4}+\text{G5}+\text{G7}+\text{G8})+\text{P2} (\text{G3}+\text{G4}+\text{G5}+\text{G7}+\text{G8}) $$
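The third argument of Collect is what groups the P2 products even though P2 is not in the collection list: Collect applies Simplify to every coefficient, including the part that is free of P3, P5 and P6. A small comparison sketch (fresh kernel assumed; plain and withSimplify are illustrative names):

```mathematica
expr = G1*P3 + G1*P5 + G1*P6 + G2*P3 + G2*P6 + G3*P2 + G3*P5 + G4*P2 +
   G4*P3 + G4*P5 + G4*P6 + G5*P2 + G5*P3 + G5*P5 + G5*P6 + G6*P3 +
   G6*P6 + G7*P2 + G7*P5 + G8*P2 + G8*P3 + G8*P5 + G8*P6;

plain = Collect[expr, {P3, P5, P6}];
withSimplify = Collect[expr, {P3, P5, P6}, Simplify];

(* same polynomial either way *)
Expand[plain - withSimplify]

(* without the third argument the P2 products stay as separate summands *)
MemberQ[List @@ plain, G3 P2]
(* with Simplify they are grouped into a single summand *)
MemberQ[List @@ withSimplify, (G3 + G4 + G5 + G7 + G8) P2]
```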

Step 4

Factor out common factors with //. a_*b_+c_*b_:> (a+c)*b (using Bob Hanlon's rule instead of my approach from the previous code).
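Why ReplaceRepeated rather than a single ReplaceAll: one pass of the rule factors only one pair of summands. The toy sum below (u, w, x, y are hypothetical symbols, not from the question) needs several passes before no two summands share a factor:

```mathematica
(* the Step 4 rule; Plus and Times are Orderless, so any two summands
   that share a factor b can match, in any position *)
rule = a_*b_ + c_*b_ :> (a + c) b;

(* hypothetical toy sum *)
toy = u x + u y + w x + w y;

once = toy /. rule;      (* one pass: a single pair is factored *)
repeated = toy //. rule; (* repeat until nothing matches any more *)

{once, repeated}
(* repeated is (u + w) (x + y) *)
```

The same repetition is what lets simplify handle larger expressions, at the price noted above: ReplaceRepeated can loop on inputs where the rule keeps matching.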

Comparing with Simplify on random expressions

List of variables:

vars = Array[m, 30];

Test expression:

expression = RandomChoice[vars, 20] . RandomChoice[vars, 20];
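Because the test expressions are random, the LeafCount numbers below change from run to run. A sketch that fixes the seed (the value 1234 is arbitrary) and confirms the structure that the check in simplify expects: the dot product gives a sum whose terms are products or, when the same variable is drawn at the same position in both lists, powers such as m[i]^2.

```mathematica
SeedRandom[1234]; (* arbitrary seed, only for reproducibility *)
vars = Array[m, 30];
expression = RandomChoice[vars, 20] . RandomChoice[vars, 20];

(* a sum of products/powers, exactly what simplify's check accepts *)
Head[expression]
AllTrue[List @@ expression, MatchQ[_Power | _Times]]
```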

Comparison between Simplify and simplify on this example:

simplified1 = expression // Simplify;
simplified2 = expression // simplify ;
simplified1 // LeafCount
simplified2 // LeafCount

Simplify: 94

simplify: 82

Check:

simplified1 == simplified2 // Simplify

(* True *)

Statistics (20 random expressions):

expressiontable = 
  Table[RandomChoice[vars, 20] . RandomChoice[vars, 20], 20];
simps1 = simplify /@ expressiontable ;
simps2 = Simplify /@ expressiontable;
simps1 == simps2 // Simplify

(* True *)

LeafCount /@ simps1 // Mean // N
LeafCount /@ simps2 // Mean // N

Average complexity using simplify: 76.3

Average complexity using Simplify: 94.

Previous version of the code

Clear[simplify];
simplify[expression_]:=
Module[{check,var,to⎵collect,
simplified,duplicated,x,renaming⎵rule,
renaming⎵rule⎵inversed,new⎵variables},
check= Head[expression]===Plus && 
(List@@expression//AllTrue[#,MatchQ[_Times]]&);
If[check===False,
    Print["simplify is not adapted to this structure"];
    Abort[]
];
var=(expression//Variables);
to⎵collect={Count[expression,#*_],#}&/@var
            //MaximalBy[First]
            //Map[Last];
simplified=Collect[expression, to⎵collect, Simplify];
(* Find subexpressions that occur
 multiple times *)
duplicated=simplified
//Level[#,{2}]&
//Gather
//Select[Length@#>1&]
//Map[DeleteDuplicates]
//Flatten;
Collect[simplified, 
        duplicated,
        Simplify]
]

The major difference from the newer version is that it explicitly collects subexpressions that occur more than once.

Explanation:

One could perhaps use Experimental`OptimizeExpression (see, for example, the question "Common subexpression from two expressions") to find common subexpressions in the expression above, but instead I consider a simpler approach for this kind of structure:

simplified // Level[#, {2}] & // Tally

{{G3 + G4 + G5 + G7 + G8, 1}, {P2, 1}, {G1 + G2 + G4 + G5 + G6 + G8, 2}, {P3, 1}, {G1 + G3 + G4 + G5 + G7 + G8, 1}, {P5, 1}, {P6, 1}}

$$\left( \begin{array}{cc} \text{G3}+\text{G4}+\text{G5}+\text{G7}+\text{G8} & 1 \\ \text{P2} & 1 \\ \text{G1}+\text{G2}+\text{G4}+\text{G5}+\text{G6}+\text{G8} & 2 \\ \text{P3} & 1 \\ \text{G1}+\text{G3}+\text{G4}+\text{G5}+\text{G7}+\text{G8} & 1 \\ \text{P5} & 1 \\ \text{P6} & 1 \\ \end{array} \right)$$

G1 + G2 + G4 + G5 + G6 + G8 appears twice, so we can collect that term:

Collect[simplified , G1 + G2 + G4 + G5 + G6 + G8]

$$(\text{P3}+\text{P6}) (\text{G1}+\text{G2}+\text{G4}+\text{G5}+\text{G6}+\text{G8})+\text{P5} (\text{G1}+\text{G3}+\text{G4}+\text{G5}+\text{G7}+\text{G8})+\text{P2} (\text{G3}+\text{G4}+\text{G5}+\text{G7}+\text{G8})$$
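The Tally/Collect steps of the previous version can be run on their own as a sketch (fresh kernel assumed): Level[..., {2}] lists the factors of each product summand, Tally counts them, and any factor seen more than once is passed back to Collect, as in the previous version's final line.

```mathematica
expr = G1*P3 + G1*P5 + G1*P6 + G2*P3 + G2*P6 + G3*P2 + G3*P5 + G4*P2 +
   G4*P3 + G4*P5 + G4*P6 + G5*P2 + G5*P3 + G5*P5 + G5*P6 + G6*P3 +
   G6*P6 + G7*P2 + G7*P5 + G8*P2 + G8*P3 + G8*P5 + G8*P6;

simplified = Collect[expr, {P3, P5, P6}, Simplify];

(* level {2}: the factors inside each product summand *)
tally = Tally[Level[simplified, {2}]];

(* factors that occur in more than one summand *)
duplicated = Select[tally, Last[#] > 1 &][[All, 1]]

(* collect on the repeated factor(s) *)
result = Collect[simplified, duplicated, Simplify]
```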

Nice. A suggestion: I do not think it is a good idea to use \[Ellipsis] in variable names. It actually makes the code a little harder to read. So instead of renaming...rule you could use camelCase, as in renamingRule. I also prefer an underscore to separate the words of long variable names, but that is not possible in Mathematica. Replacing that _ with ... makes it harder to read, as looking at three dots is not the same as looking at one underscore. It is up to you, of course; just a suggestion. — Nasser, Nov 25 '22 at 01:08
@Nasser It's interesting to see an opinion on the legibility of \[Ellipsis], in particular when sharing code. I would be interested in a survey on Ellipsis, because from the comments I have seen on Stack Exchange I feel it goes both ways. For me, camelCase bothers me because of the asymmetry, and it is dense to read. I have been trying a few alternatives. Some of my favorites so far are \[UnderBracket] (Esc u[ Esc), ˘ (Esc bv Esc) and Ellipsis, but I am leaning away from Ellipsis and more towards UnderBracket. — userrandrand, Nov 25 '22 at 01:18
Maybe it would be best to wait a bit more before accepting the answer. I am curious as to why this is not the default behavior of Simplify and whether there is some complexity rule that implements some of the steps I took. — userrandrand, Nov 25 '22 at 01:20
"Maybe it would be best to wait a bit": Sure, as you wish. I will unaccept and wait a few more days. But I like your answer as well as Bob's answer. It is always hard to pick which of several good answers to accept. — Nasser, Nov 25 '22 at 01:22
@Nasser I removed the renaming part; I realized it was not necessary, since I could collect a sum of terms directly. I also changed ... to UnderBracket. To me it looks nicer in Mathematica than here. — userrandrand, Nov 25 '22 at 01:29
Bob's /. a_*b_ + c_*b_ :> (a + c) b rule is what I should have used in my code instead of searching for terms that appear more than once. — userrandrand, Nov 25 '22 at 01:32
@Nasser I included Bob's replacement rule in the code (I used ReplaceRepeated so that it also simplifies larger expressions). Now, other than the check at the beginning, it's a three-liner. — userrandrand, Nov 25 '22 at 01:49
What? \[UnderBracket] can be converted by the SE editor? I thought it couldn't be… — xzczd, Nov 25 '22 at 01:58
@xzczd Maybe there was an update. We probably use the same editor, from halirutan. — userrandrand, Nov 25 '22 at 02:00

David G. Stork · Answer 3 · 2022-11-25T06:34:40.153

HornerForm rewrites a polynomial in nested (Horner) form, which reduces the number of arithmetic operations needed to evaluate it. Thus:

expr = G1*P3 + G1*P5 + G1*P6 + G2*P3 + G2*P6 + G3*P2 + G3*P5 + G4*P2 + G4*P3 + G4*P5 + G4*P6 + G5*P2 + G5*P3 + G5*P5 + G5*P6 + G6*P3 + G6*P6 + G7*P2 + G7*P5 + G8*P2 + G8*P3 + G8*P5 + G8*P6;
HornerForm[expr]

(G4 + G5 + G7 + G8) P2 + (G2 + G6) P3 + G7 P5 + G3 (P2 + P5) + G2 P6 + G6 P6 + G1 (P3 + P5 + P6) + G4 (P3 + P5 + P6) + G5 (P3 + P5 + P6) + G8 (P3 + P5 + P6)

LeafCount[%]

(* 51 *)

This is superior to:

LeafCount[expr // FullSimplify]

(* 57 *)

but inferior to versions that require a lot of insight from the user.
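HornerForm also accepts an explicit variable list as a second argument, and the grouping it returns depends on that choice. A sketch for experimenting with it (fresh kernel assumed; h1 and h2 are illustrative names). Only the equivalence with expr is guaranteed, so compare the LeafCount values yourself:

```mathematica
expr = G1*P3 + G1*P5 + G1*P6 + G2*P3 + G2*P6 + G3*P2 + G3*P5 + G4*P2 +
   G4*P3 + G4*P5 + G4*P6 + G5*P2 + G5*P3 + G5*P5 + G5*P6 + G6*P3 +
   G6*P6 + G7*P2 + G7*P5 + G8*P2 + G8*P3 + G8*P5 + G8*P6;

h1 = HornerForm[expr];                    (* default variable order *)
h2 = HornerForm[expr, {P2, P3, P5, P6}];  (* nest with respect to the P's *)

LeafCount /@ {h1, h2}

(* Horner forms only regroup terms, so both expand back to expr *)
Expand /@ ({h1, h2} - expr)
```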
