In C, why limit || and && to evaluate to booleans?

Question

In some languages, e.g. Ruby, an expression like:

val = call_fn() || DEFAULT_VAL;

will set val to the result of call_fn() if it is truthy (non-zero, non-NULL, etc.), or to DEFAULT_VAL otherwise. This can be really useful.

In C, the result of the || operator is always a boolean value (1 or 0).

Is there a strong argument for C limiting the result to a boolean value (other than tradition)?

Addendum

Using some GNU extensions, a macro that approximates the behavior I'm looking for could be written as:

#define OP_OR(a, b)                                 \
    ({                                              \
        __typeof__(a) _a = (a);                     \
        _a ? _a : (b);                              \
    })

This evaluates to a iff it is truthy (i.e. non-0 for numeric values, non-NULL for pointers, not '\0' for chars), otherwise it evaluates to b. The __typeof__ line guarantees that a is evaluated exactly once.
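To make the single-evaluation claim concrete, here is a minimal sketch that uses the macro with a counting helper. It assumes a compiler that supports GNU statement expressions and __typeof__ (GCC or Clang); the names counted, demo_or and check_op_or are hypothetical and not part of the question.

```c
#include <assert.h>

/* GNU C only: relies on statement expressions and __typeof__. */
#define OP_OR(a, b)                 \
    ({                              \
        __typeof__(a) _a = (a);     \
        _a ? _a : (b);              \
    })

static int calls = 0;   /* hypothetical counter used to observe evaluations */

/* Hypothetical stand-in for call_fn(): counts its calls and returns v. */
static int counted(int v)
{
    calls++;
    return v;
}

/* Yields v if it is non-zero, otherwise the fallback. */
static int demo_or(int v, int fallback)
{
    return OP_OR(counted(v), fallback);
}

static void check_op_or(void)
{
    calls = 0;
    assert(demo_or(7, 42) == 7);    /* truthy: first operand is kept */
    assert(demo_or(0, 42) == 42);   /* zero: fallback is used */
    assert(calls == 2);             /* one evaluation per use, even when truthy */
}
```

A plain a ? a : b written without the temporary would call counted() twice in the truthy case, which is exactly what the _a variable avoids.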

IMO, the desire for this is actually a direct result of the language lacking a nice optional type. This would be useful for not just that, but other things as well. For example, you could store optional integers in a container and you don't need to worry about whether 0 means failure or just the usual meaning of 0. You could also have types which aren't "truthy" that can be directly used with optional. You can also have "optional optional" things (which could be an intermediate result, where there are two different kinds of failure that can happen which we want to distinguish) — David, Nov 03 '22 at 06:52
I would think this is simply caused by the operands being of potentially different types, and C does not support values having multiple types, so the result of the operator must be a specific type (int in C). Ruby does not have this issue, as it isn't strongly typed. — Remember Monica, Nov 04 '22 at 10:17
@RememberMonica, I don't see why that would be an issue. In C, 3 + 4.0 is also valid, despite the operands having different types. This is solved by defining type conversion rules. And this does not explain why int x = 0 || 3 would result in x being 1 instead of 3, even though x, 0 and 3 are all integers. — wovano, Nov 04 '22 at 10:20
The first sentence is kinda misleading. In Ruby, 0 is truthy. — Eric Duminil, Nov 04 '22 at 10:52
In haskell, this is called the Alternative: https://en.wikibooks.org/wiki/Haskell/Alternative_and_MonadPlus — Agnishom Chattopadhyay, Nov 04 '22 at 15:45
They actually enumerate to the ints 1 and 0, respectively, not to booleans. This distinction is important; there’s often conditionals used in arithmetics, or in funny stuff like printf("%c\n", "yn"[condition]); (simple example to show the idea, I’ve seen good use cases over the years). For your scenario, you have the ternary operator, best with a GCC extension: val = call_fn() ? : DEFAULT_VAL; — if you don’t have GCC, you need v = x ? x : y; but if x has side effects like in your example, store it temporarily: val = call_fn(); val = val ? val : DEFAULT_VAL; • can make this a macro — mirabilos, Nov 04 '22 at 18:31

score 21 · Answer 1 · edited Nov 03 '22 at 14:03

Boolean types and Boolean expressions are the crux of the logical formalism and the latter is ubiquitous in programming. C was lacking them in the beginning, yet another design flaw of this language (took like twenty years to be fixed!).

Tinkering a boolean type with integers is not just a conceptual mistake, it is a source of problems, such as comparisons not behaving as expected.

I don't know about the semantics of && in Ruby, but presumably it has to make an asymmetric choice between the two values in case of two non-zeroes.

That said, the ternary ?: operator allows constructs like

val= call_fn();
val= val != 0 ? val : DEFAULT_VAL;

Independently of this answer, I am not fond of the trick

val = call_fn() || DEFAULT_VAL;

because it forces you to reserve the value 0 to mean "do not use this returned value", so it cannot work with functions that can legitimately return 0. I'd prefer a function that returns DEFAULT_VAL directly.
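One way to avoid reserving 0 as a sentinel is to carry an explicit "present" flag next to the value. The following is only a sketch under that assumption; struct opt_int, lookup and value_or are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical optional-int type: 'present' says whether 'value' is meaningful. */
struct opt_int {
    bool present;
    int value;
};

/* Hypothetical lookup: succeeds only for keys 0 and 1, and may legitimately yield 0. */
static struct opt_int lookup(int key)
{
    struct opt_int r = { false, 0 };
    if (key == 0 || key == 1) {
        r.present = true;
        r.value = key * 10;   /* key 0 maps to the valid value 0 */
    }
    return r;
}

/* The default is chosen from the flag, not from the truthiness of the value. */
static int value_or(struct opt_int o, int fallback)
{
    return o.present ? o.value : fallback;
}

static void check_value_or(void)
{
    assert(value_or(lookup(0), -1) == 0);    /* a real 0 is kept */
    assert(value_or(lookup(1), -1) == 10);
    assert(value_or(lookup(5), -1) == -1);   /* absent: fallback */
}
```

With this shape, a result of 0 and "no result" are distinguishable, at the cost of a slightly wider return type.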

edited Nov 03 '22 at 14:03

Trang Oul

answered Nov 02 '22 at 16:10

You can actually (and unfortunately) write call_fn() ?: DEFAULT_VAL. E.g. 5 ?: 3 evaluates to 5, and 0 ?: 3 evaluates to 3. – Dmitry Nov 03 '22 at 00:23
@Dmitry: Implicit reuse of the condition of a ternary as one of the values is a GNU C extension. https://gcc.gnu.org/onlinedocs/gcc/Conditionals.html . In ISO C, you'd need to capture the return value in a temporary variable. (Or repeat the function call, including any side effects.) Unless ISO C added that in some later version. – Peter Cordes Nov 03 '22 at 06:25
IIRC, Perl's || works the same way, returning the first operand unchanged if it would be true in a boolean context (like if you did if(call_fn())). Otherwise return the 2nd operand, whatever it is. This semantic makes sense when you consider the short-circuit evaluation rule; it's not commutative when the operands are expressions with side-effects. – Peter Cordes Nov 03 '22 at 06:33
@PeterCordes: such operators are arithmetic (just like the bitwise ones), not boolean. They should not be first class citizens. – Nov 03 '22 at 08:09
I take issue with your C bashing. Yes, booleans are "the crux of the logical formalism"; but that's not what C was designed for. It is not a scientific language built for conceptual purity but an engineering language built to implement an operating system as efficiently as is possible in a high-level language. It is very closely aligned with the generated machine code, in many places so close that it is not much more than a macro assembler. Treating ints as booleans is exactly what the CPU does, and it's conceptually really elegant -- for an engineer ;-). – Peter - Reinstate Monica Nov 03 '22 at 10:41
@Peter-ReinstateMonica: I am an engineer. I have practiced and suffered C++ for ages. I acknowledge that for systems programming C/C++ are still irreplaceable. – Nov 03 '22 at 13:10
FYI, only false and nil are falsy in Ruby, every other object is truthy (e.g. 0, 1, empty string, empty array, ...). It's different in Python, in which 0, empty string, empty array, empty dict are falsy. – Eric Duminil Nov 03 '22 at 14:04
"Tinkering a boolean type with integers is a conceptual mistake" - sure, but that would preclude the operators being overloaded to either bool || bool: bool or int || int: int – Bergi Nov 04 '22 at 02:41
IIRC, Ruby's && and Python's and work like C's X ? Y : X, except that X is only evaluated once. – dan04 Nov 04 '22 at 18:57
@dan04: very funny. Didn't you observe that I took care to assign the expression to val ? – Nov 04 '22 at 19:00

IMSoP · Answer 2 · 2022-11-07T18:27:07.473

An operator can be described in terms of three things:

The accepted range of values (type) of each of its operands
Its behaviour
The possible values (type) of its result

The simplest "or" operator is defined as having two operands and a result all with values "true" or "false", i.e. a "boolean" type. With such a definition, the result of 42 || 69 would be undefined, just as the value of "hello" / "world" is undefined. The result of any valid pair of operands would always be either "true" or "false", never another value.

C evolved gradually from a language called B, which had no distinction between types, so all operators were defined over the range of binary values that could be contained in a particular memory location. For operators that would normally require "true" and "false" values, B defined a value of all zero bits as "false", and any other value as "true". This carried over into early versions of C, where the || operator was defined† for any pair of integers, with the same rules. However, the result was still defined as only having two possibilities: 0 for false, 1 for true.

Other languages have generalised the operator further, and said that the range of results also has more than two values: the representation of "false" is fixed, but "true" is replaced by a direct copy of one of the inputs.

We can speculate why early versions of C didn't generalise the result in that way:

A fixed result may be more efficient to implement than a memory copy
The possibility simply didn't occur to the language's designers at the time

More clearly, we can answer why C didn't later adopt a different definition: it could break existing code. Consider the following snippet of code:

c = a || b;
z = x || y;
switch ( c + z ) {
    case 2: ... // both c and z are true
    case 1: ... // one or other is true
    case 0: ... // neither is true
}

This code relies on the guaranteed range of outputs from the || operator. If a || b evaluated to 2, the wrong branch would be entered; if it evaluated to 42, none of the branches would be entered.
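A compilable version of the snippet above, as a sketch: the function name count_true and the default branch are additions for illustration only. Under C's rules the default branch can never be reached, which is precisely the guarantee that widening the result of || would remove.

```c
#include <assert.h>

/* Returns how many of the two OR-expressions are true: 0, 1 or 2. */
static int count_true(int a, int b, int x, int y)
{
    int c = a || b;   /* always 0 or 1 in C */
    int z = x || y;   /* always 0 or 1 in C */
    switch (c + z) {
    case 2:  return 2;    /* both c and z are true */
    case 1:  return 1;    /* one or other is true */
    case 0:  return 0;    /* neither is true */
    default: return -1;   /* unreachable while || yields only 0 or 1 */
    }
}

static void check_count_true(void)
{
    assert(count_true(42, 0, 0, 69) == 2);   /* 42 and 69 both collapse to 1 */
    assert(count_true(0, 0, 7, 0) == 1);
    assert(count_true(0, 0, 0, 0) == 0);
}
```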

In general, this is similar to the rules around covariance and contravariance:

It is safe to widen the input, e.g. to define a result for "hello" / "world" if that was previously forbidden or Undefined Behaviour
It is safe to restrict the output more narrowly than previously guaranteed, e.g. to add a guarantee that a certain operator will never evaluate to NaN
It is not safe to narrow the input, and forbid code that was previously acceptable
It is not safe to widen the output, and produce values that code was not previously expecting

† - According to Dennis Ritchie's description of the language's development, B only had the bitwise or operator, |. That gives the same result as a boolean or with the above definition of "true" and "false", because 0 | 0 yields 0, and any other operands yield a non-zero result. The downside is that you need to examine both operands to produce a result, whereas a boolean or can "short-circuit" if the first argument is true. The separate || operator was added to early C as a replacement for a special case that allowed | to short-circuit in conditional statements, along with even more complex rules for &.
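The difference between | and || described in the footnote can be observed in present-day C with a helper that counts how often it is called. This is a small sketch; touch and the two wrapper functions are hypothetical names.

```c
#include <assert.h>

static int calls = 0;   /* hypothetical counter of operand evaluations */

/* Counts its calls and returns v unchanged. */
static int touch(int v)
{
    calls++;
    return v;
}

/* Bitwise or: both operands are always evaluated. */
static int calls_with_bitwise_or(void)
{
    calls = 0;
    int r = touch(1) | touch(0);
    (void)r;
    return calls;
}

/* Logical or: the right operand is skipped once the left one is non-zero. */
static int calls_with_logical_or(void)
{
    calls = 0;
    int r = touch(1) || touch(0);
    (void)r;
    return calls;
}

static void check_short_circuit(void)
{
    assert(calls_with_bitwise_or() == 2);   /* both operands evaluated */
    assert(calls_with_logical_or() == 1);   /* right operand skipped */
}
```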

If the operands of || and && were limited to integers, having the operator yield whichever operand was evaluated last would be faster and easier than having to yield 1 for all non-zero scenarios. Having it convert all operands to integers by an implied "is not equal to zero" operation, however, was probably the easiest way of accommodating non-integer values. — supercat, Nov 03 '22 at 19:57
@supercat I'm not sure why that would be the case; setting a memory location to a constant seems like a simpler operation than copying one memory location to another. Also, I don't think the original implementation had any concept of "converting to integer", it was all in terms of binary values in memory, and "is memory location all zeroes?" is a standard concept in a lot of instruction sets. Converting to integer is a later rationalisation once the language introduced more static typing features. — IMSoP, Nov 03 '22 at 22:09
The way a typical C compiler works is to have operations that evaluate an expression place the result in a register, so something having "expr1 || expr2" yield "expr1" if it's non-zero and otherwise yield expr2 would be "compute expr1. If non-zero, branch to skipExpr2. Compute expr2. skipExpr2:". The way || is defined in C ends up requiring something more like "compute expr1. If non-zero branch to yep. Compute expr2. If zero, branch to nope. yep: Load 1 into result register. nope:". — supercat, Nov 03 '22 at 22:16
Otherwise, as a general observation, setting a memory location to a constant may be cheaper than copying a memory location to another in cases where no register is known to hold the contents of that location, storing the contents of a register to memory is generally cheaper than storing a constant, though some processors have a specialized 'store zero' instruction. — supercat, Nov 03 '22 at 22:20
@supercat: Also, if the result is not actually used (or only used in a boolean context), the optimizer is free to elide the register store altogether (so long as the resulting code behaves the same). — Kevin, Nov 03 '22 at 23:17
@supercat That does sound reasonable (although remember we're talking about the historical behaviour of the very first C compiler, not the typical behaviour of one even a few years later). I did say that section was very speculative, and that still leaves open the "just didn't think of it" possibility - they might just have assumed that limiting the result to one of two values was the natural definition of that operator. — IMSoP, Nov 04 '22 at 08:49
C has been (all) statically typed at least from the time of the first edition of K&R (1978). Until C99, the types of some objects and functions could be determined implicitly, but that does not make them any less static. — John Bollinger, Nov 04 '22 at 13:36
@IMSoP: I suppose another issue may have been that if one has a construct like int1 = expr1 || expr2;, generating code that ends up branching to one of two labels based upon whether the overall condition is true or false, and then having code at those labels unconditionally load 0 or 1 into R0 before storing it to int1, may be easier than trying to identify cases where R0 would already have a non-zero value that could be stored directly. — supercat, Nov 04 '22 at 16:58
@JohnBollinger I've reworded that sentence to more specifically say that it was B that lacked types, and C added them gradually, rather than talking about "static typing" per se. I also added a footnote about when exactly || appeared, which I found interesting. — IMSoP, Nov 07 '22 at 18:20
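The two branch structures supercat describes in the comments above can be mimicked in C source with goto. This is only an illustration of the control flow, not a claim about what any particular compiler emits; the thunk type and all function names are hypothetical.

```c
#include <assert.h>

typedef int (*thunk)(void);   /* hypothetical: an operand modelled as a callable */

static int zero(void)  { return 0; }
static int seven(void) { return 7; }

/* "Keep the operand" variant: one test, one branch, result left as computed. */
static int or_keep_operand(thunk e1, thunk e2)
{
    int r = e1();
    if (r != 0) goto skip_e2;
    r = e2();
skip_e2:
    return r;
}

/* C's actual semantics: the result is normalised to 0 or 1. */
static int or_c(thunk e1, thunk e2)
{
    int r;
    if (e1() != 0) goto yep;
    if (e2() == 0) goto nope;
yep:
    r = 1;
    goto done;
nope:
    r = 0;
done:
    return r;
}

static void check_lowerings(void)
{
    assert(or_keep_operand(seven, zero) == 7);
    assert(or_keep_operand(zero, seven) == 7);
    assert(or_keep_operand(zero, zero) == 0);
    assert(or_c(seven, zero) == 1);
    assert(or_c(zero, seven) == 1);
    assert(or_c(zero, zero) == 0);
}
```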

Peter - Reinstate Monica · Answer 3 · 2022-11-03T16:36:40.017

Edit: After re-reading the question and wovano's comment I think my answer below misses the point because the suggestion to return the non-zero operand or zero if there isn't any would not be overly complicated or abstract if the operands stayed restricted to scalar types. I let the answer stand because I still think it's a good text ;-).

The reason is that C, by design, is a really simple language that provides operations closely aligned with what machine code of machines like the PDP11 offered. You posted the question in the computer science stack exchange, but computer science is not what C was designed for or from which it came. C is the creation of engineers writing an operating system and hitting a limit of what was maintainable and portable in assembler. They were in a unique spot because they had the insight, skill, time and attitude to step back and write a tool for that: C. They seized the day and, one may say without exaggeration, changed the world.

C is a language created to solve an engineering problem.

Because of the memory and computing limits the language had to be dirt simple, so that even a simple compiler could produce good code.

The language had to be closely aligned with what the machine offered.

While I'm not familiar with PDP11 machine code I bet you there is a test for zero, an AND operation on registers, and a conditional jump. On that level there is no concept of a boolean; everything is integers.

This is what C mimics. The machine operation has a C equivalent which makes it totally obvious what the compiler would produce. (The same "everything is an integer" concept was used for pointers.)

In order to appreciate the difference between C's "engineering" approach and a computer science approach, have a look at Algol68. That is a (seriously) great language designed without concerns for the resources the compiler or code would need. It is also instructive to compare the language descriptions¹ ², a contrast which more than highlights the pragmatic approach of the C creators.

As others remarked, a shortcut for selective evaluation dependent on a condition exists with the ternary ?: which is about as complicated as you get with C.

¹ For Algol68: https://www.softwarepreservation.org/projects/ALGOL/report/Algol68_revised_report-AB.pdf

² For C: The first edition of Kernighan/Ritchie: The C Programming Language

FORTRAN introduced the LOGICAL data types (and operators) already in 1962. Algol had a boolean type and Pascal naturally adopted it. It seems that K&R did not learn the lessons. Unfortunately, they did influence innumerable language designers later. — , Nov 03 '22 at 11:49
@YvesDaoust: The behavior of && and || doesn't map to anything I know of in the PDP-11 nor most other machines as well as would an integer operator that evaluates the left operand and then either keeps the resulting value or evaluates the right operand. Further, a specification for a "logical" type designed to facilitate machine implementation would say that reading a logical-value object in which zero is stored will yield zero, reading a logical-value object in which an odd value was stored will yield an odd value, and each read in other cases would yield an unspecified value. — supercat, Nov 03 '22 at 20:03
Fun fact: MIPS and RISC-V can materialize integer 0 / 1 directly, according to a register being 0 or non-zero: sltu $dst, $zero, $src does 0U < x which is equivalent to 0U != x. It doesn't directly implement A||B, though; you'd still have to bne (branch on not-equal) to select which input to booleanize, or if there aren't side-effects, booleanize both sides and bitwise AND or OR. But yeah, most ISAs need at least two instructions to booleanize an integer from zero / non-zero to 0 / 1. Producing a 0 or 1 isn't the most efficient thing they could have chosen in terms of asm. — Peter Cordes, Nov 04 '22 at 07:37
@PeterCordes Ah, you are saying that simply keeping one of the values (which may be 0 in case both are) would be faster? Interesting. So my entire argument is wrong. Bummer :-). — Peter - Reinstate Monica, Nov 04 '22 at 08:27
Yes, I think that's true. If you're using it as the controlling expression for if() or while() etc. they're equivalent. Otherwise if assigning the result to a value of the same type as both operands (which in C don't actually have to be the same type), yeah, branch on the first operand and then you have one more mov to do. (Or two if you need to load+store if the 2nd operand wasn't already hot in a register. Or of course if it was an expression that needs evaluating then you eval it, but then you use the value directly instead of doing the extra step of e != 0 or !!e booleanizing) — Peter Cordes, Nov 04 '22 at 09:37
"You posted the question in the computer science stack exchange, but computer science is not what C was designed for or from which it came." As the OP, I didn't know if this was the right place for the question. But I'm gratified -- and educated -- by all the discussion this has generated! :) — fearless_fool, Nov 04 '22 at 16:10

score 5 · Answer 4 · answered Nov 03 '22 at 16:14

The operand of if, while, etc. isn't limited to integer types, but will also work with floating-point or pointer types. For a value x of any integer, floating-point, or pointer type, if(x) is equivalent to if ((x) != 0), with the literal zero being interpreted as a value of x's type. Making a construct like if ((x) || (y)) work with any combination of values x and y that could be used as conditionals individually, without regard for whether the types would be compatible with each other, would require that the || operator be capable of yielding a value of a type that was incompatible with at least one operand.

Note that the behavior of a construct like:

double d = 0.125;
char *p = "Hello there!";
if (d || p)
  printf("The values are %5.3f, %s!", d, p);

is quite different from what it would be if the condition were written as

if ((int)d || (int)p)

The value of (int)d would be zero, even though d is regarded as non-zero by the || operator, and while the value of (int)p probably wouldn't be zero, there are some platforms where pointers are longer than integers, and where the value of that expression might be zero by chance.
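The floating-point half of this point can be checked directly. The sketch below deliberately leaves out the (int)p cast, since its result is implementation-defined; or_direct and or_truncated are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Each operand is compared against zero in its own type. */
static int or_direct(double d, const char *p)
{
    return d || p;
}

/* The double is truncated to int before the test, so 0.125 becomes 0. */
static int or_truncated(double d, int i)
{
    return (int)d || i;
}

static void check_truth_values(void)
{
    assert(or_direct(0.125, NULL) == 1);             /* 0.125 != 0.0 */
    assert(or_truncated(0.125, 0) == 0);             /* (int)0.125 == 0 */
    assert(or_direct(0.0, "Hello there!") == 1);     /* non-null pointer */
    assert(or_direct(0.0, NULL) == 0);
}
```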

While there probably would have been no difficulty saying that the && operator be syntactic sugar for leftOp ? rightOp : 0, I can't think of any nice way of describing the behavior of || that would allow it to be used in scenarios where the types of the operands differ, but wouldn't introduce behavioral inconsistencies in such cases.

Given that C is in fact generally parsed outside-in, and efficient processing of || and && operators would require knowing whether they are being used as part of a top-level conditional construct, it might have been practical to specify that when the result of || or && is coerced to a "conditional test", the operands would be likewise coerced, and that the operands of those operators must be integers in all other cases. A similar principle could also have been usefully applied to constructs like uint1 = ushort1*ushort2; on implementations that don't use quiet-wraparound two's-complement semantics: since the result of the multiplication will be coerced to unsigned int, the operands should be likewise coerced and then multiplied as that type. Unfortunately, this kind of thing is easier to do than to formally specify, and thus language specs make no accommodation for it.

Interesting point with the integer cast not being the "truth value". If I read you correctly you want to say that finding a common type for x and y the way it is necessary in the ?: ternary (under the OP's proposal that would be necessary) compromises the condition outcome the way you show. One could avoid inconsistencies though by separating the issues: For the condition, the operands are compared against zero as always, without aligning the types; for the value of the expression they are then "aligned" to a common type. — Peter - Reinstate Monica, Nov 06 '22 at 05:54
Oh, and in the ternary b ? x : y the "truth value vs. aligned value" issue does not occur because the operator is designed to separate concerns, at the cost of possibly needing to repeat the condition (p ? p : defaultval, even though p ? *p : defaultval is more common, in fact so common that C# has the ?? operator for a similar purpose). — Peter - Reinstate Monica, Nov 06 '22 at 06:00

score 4 · Answer 5 · edited Nov 03 '22 at 17:57

The || and && operators in Ruby are closer to basic process algebra composition operators than to boolean connectives. The languages just happen to use the same symbols for these two different sorts of operators due to the lack of glyphs on the keyboard.

Keeping this in mind there are important distinctions between the || operator in Ruby and alternative composition. One such distinction is that || in Ruby is not commutative.

The analogy stands in the sense that in basic process algebra one can simulate boolean algebra but not vice versa.

Coming now to the original question. The C programming language exists as a language on its own and not as a subset of C++ precisely because it has a relatively clean semantics and it is relatively easy to tell how an expression is to be evaluated. If we instead allow things like -1L || 4U or 0.0 || 42 with numeric type promotion, this would make compilation results less predictable.

Finally, as was mentioned before, the ternary conditional operator x != 0 ? x : y in C would be an almost perfect drop-in replacement for the proposed numeric choice operator x || y.

C || isn't commutative in general either, but it is when neither operand has side-effects. But it wouldn't be if the value it produced was always one of the operands, so fair point. — Peter Cordes, Nov 04 '22 at 07:05
@PeterCordes what I wanted to say is that the alternative composition operator is commutative while || is not. — Dima Chubarov, Nov 04 '22 at 09:02
Interestingly, your examples produce the intuitive results. supercat shows examples where that may not be the case, for example with float values 0 < x < 1. — Peter - Reinstate Monica, Nov 06 '22 at 06:10

score 4 · Answer 6 · answered Nov 04 '22 at 18:12

In many Lisp dialects, there is usually an or operator which returns the leftmost value that is not nil, and stops evaluating the rest.

GNU C does have an operator which evaluates A, and returns A if A is nonzero, otherwise B, namely: A ?: B.

In C, we don't want || to have this semantics because C is statically typed in a particular way. The expression A || B has to have a type. The type has to be derived from the types of A and B. If it has the semantics of evaluating either to A or B, that then requires A and B to be compatible. They have to have the same type, or else one has to convert to the type of the other.

Suppose P, Q and R are pointers to different object types, and thus incompatible. You cannot write P || Q || R under this semantics; it violates a type constraint. Under the C semantics of || as they are, the above is allowed: it yields true if any of the three pointers is non-null.
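A short sketch of the permitted case: three pointers to unrelated object types combined with || under C's existing rules. The function name any_non_null is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pointers to three unrelated object types. The expression is valid
 * because each operand is only compared against null and the result
 * is an int (0 or 1), so the pointer types never have to be reconciled.
 */
static int any_non_null(const int *p, const double *q, const char *r)
{
    return p || q || r;
}

static void check_any_non_null(void)
{
    double d = 1.0;
    assert(any_non_null(NULL, NULL, NULL) == 0);
    assert(any_non_null(NULL, &d, NULL) == 1);
    assert(any_non_null(NULL, NULL, "x") == 1);
}
```

By contrast, an expression such as p ? p : q with these parameter types is a constraint violation in ISO C, because int * and double * are incompatible pointer types.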

In terms of type theory, such an operator calls for a sum type: its type wants to be the sum type of the constituent expressions. Thus if we have W || G where W is a Widget and G is a Grommet, the resulting type is Widget | Grommet: a type that can be either.

This is why the or operator works in Lisp dialects, and its equivalent in other dynamic languages: they are dynamically typed, and so every value is potentially a sum type of everything, making the type problem go away into run-time. (or W G) has no problem yielding a value that is either a Widget or a Grommet, because that dynamism is already everywhere.

They wouldn't necessarily have to be compatible with each other, just all separately compatible with the consumer of the value. That's how ? : works currently, IIRC. e.g. void *first_non_null = P || Q || R could work, as could if( P || Q || R ). Or intptr_t boolval = (intptr_t)(P || Q || R). But C didn't have void* initially, so indeed, it would make it cumbersome to use with different pointer types, although you could presumably use it in an if. Of course you could (P!=0 || Q!=0 || R!=0) to get an integer with value 0 or 1. — Peter Cordes, Nov 05 '22 at 20:40
@PeterCordes That's not what ISO C says. The type of A?B:C is synthesized from the types of B and C according to a small set of rules. If B and C are numeric then the type is determined similarly to that of expressions B + C. If they are pointers, they have to be compatible. If they are struct/union types, they must be the same type. The consumer is not considered. For instance void *VP = widget_ptr || gadget_ptr could work on the basis of the pointers being independently convertible to void *; but that changes how types work in C. — Kaz, Nov 05 '22 at 20:54
Oh, apparently I was mis-remembering or guessing wrong about how ?: worked. Thanks. — Peter Cordes, Nov 05 '22 at 21:01
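The rule Kaz describes, that the type of A ? B : C is derived from B and C rather than from the consumer, can be observed with C11's _Generic. This is a sketch assuming a C11 compiler; TYPE_TAG and the tag_ functions are hypothetical names.

```c
#include <assert.h>

/* C11: maps the type of an (unevaluated) expression to a small tag. */
#define TYPE_TAG(e) _Generic((e), int: 'i', long: 'l', double: 'd', default: '?')

/* int and double operands: the conditional has type double. */
static int tag_int_double(int cond)
{
    return TYPE_TAG(cond ? 1 : 2.0);
}

/* int and long operands: the conditional has type long. */
static int tag_int_long(int cond)
{
    return TYPE_TAG(cond ? 1 : 2L);
}

/* || always has type int, whatever its operand types are. */
static int tag_logical_or(double a, long b)
{
    return TYPE_TAG(a || b);
}

static void check_result_types(void)
{
    assert(tag_int_double(1) == 'd');   /* double even when the int branch is chosen */
    assert(tag_int_long(0) == 'l');
    assert(tag_logical_or(0.5, 3L) == 'i');
}
```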

Davislor · Answer 7 · 2022-11-05T01:47:59.153

One possible reason is that bitwise operations on relations would give the wrong results. For example, the code

if ((printf( "hello, %s, it's", higher_power ) == 16) &
    (printf( " me, %s\n", luser ) <= 15)) {

would evaluate to false if both conditions are met, if logical operations returned the value of one of their operands, and not execute the body of the if statement! And the programmer could not change & to && without potentially short-circuiting the second function call, which has side-effects. When both results are constrained to be either 1 or 0, the above works.

This doesn’t literally use && and ||; I picked a more natural example using comparison relations. The principle is exactly the same. But, in B, the predecessor language to C, there were no || or && operators, and everyone used & and | instead. This worked because false was always 0 and true was always ~0.

The choice of 1 as TRUE in C, rather than ~0, also means that the representation of TRUE is the same for every integral type, and can round-trip convert with floating-point, so treating boolean results as int values could just work. If x == y returned 0xFFFFU on a DEC PDP-11 when the types of x and y were unsigned int, but 0xFFFFFFFFUL when the types were unsigned long int, suddenly any subexpression that widened the result of a 16-bit unsigned boolean true to unsigned long int width would have produced 0x0000FFFFUL and caused these logical operations to fail on the upper word. (This had not been an issue for B, as it was an untyped language.) With no special Boolean type, automatic conversion to and from a float would have been an issue as well. Later, although we know this was unfortunately not something on their minds at the time, the default type promotion rules would have caused even more bugs in the 16-to-32-bit transition. Migrating this code to a 32-bit VAX would have caused all the boolean operations on int values to suddenly produce 0xFFFFFFFF, not 0xFFFF.
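The widening problem can be reproduced with fixed-width types. This sketch uses uint16_t and uint32_t as stand-ins for the PDP-11's unsigned int and unsigned long int, which is an assumption made for portability; widen_true16 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* A hypothetical all-ones 16-bit "true", widened to 32 bits. */
static uint32_t widen_true16(void)
{
    uint16_t t16 = (uint16_t)~0u;   /* 0xFFFF */
    return (uint32_t)t16;           /* zero-extends to 0x0000FFFF */
}

static void check_widening(void)
{
    assert(widen_true16() == 0x0000FFFFu);
    assert(widen_true16() != UINT32_MAX);    /* no longer "all ones" */
    assert((uint32_t)(uint16_t)1 == 1u);     /* the value 1 survives widening */
    assert((int)(double)1 == 1);             /* and a floating-point round trip */
}
```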

The choice of 1 as the value of Boolean true also follows the precedent of Algol, a trendy language in the early ’70s.

The above is my own speculation. Dennis Ritchie later wrote about the creation of the separate logical operators. In his own words:

Rapid changes continued after the language had been named, for example the introduction of the && and || operators. In BCPL and B, the evaluation of expressions depends on context: within if and other conditional statements that compare an expression's value with zero, these languages place a special interpretation on the and (&) and or (|) operators. In ordinary contexts, they operate bitwise, but in the B statement
if (e1 & e2) ...
the compiler must evaluate e1 and if it is non-zero, evaluate e2, and if it too is non-zero, elaborate the statement dependent on the if. The requirement descends recursively on & and | operators within e1 and e2. The short-circuit semantics of the Boolean operators in such `truth-value' context seemed desirable, but the overloading of the operators was difficult to explain and use. At the suggestion of Alan Snyder, I introduced the && and || operators to make the mechanism more explicit.

Their tardy introduction explains an infelicity of C's precedence rules. In B one writes
if (a==b & c) ...
to check whether a equals b and c is non-zero; in such a conditional expression it is better that & have lower precedence than ==. In converting from B to C, one wants to replace & by && in such a statement; to make the conversion less painful, we decided to keep the precedence of the & operator the same relative to ==, and merely split the precedence of && slightly from &. Today, it seems that it would have been preferable to move the relative precedences of & and ==, and thereby simplify a common C idiom: to test a masked value against another value, one must write
if ((a&mask) == b) ...
where the inner parentheses are required but easily forgotten.
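(End of quotation.) The precedence pitfall Ritchie mentions can be shown with two small functions. In this sketch the second function writes out explicitly how a & mask == b is parsed when the inner parentheses are omitted; both function names are hypothetical.

```c
#include <assert.h>

/* The intended test: mask first, then compare. */
static int masked_equals(unsigned a, unsigned mask, unsigned b)
{
    return (a & mask) == b;
}

/* How `a & mask == b` is parsed without the inner parentheses. */
static unsigned as_parsed_without_parens(unsigned a, unsigned mask, unsigned b)
{
    return a & (mask == b);
}

static void check_precedence(void)
{
    /* 0x12 & 0x0F is 0x02, so the intended test succeeds ... */
    assert(masked_equals(0x12u, 0x0Fu, 0x02u) == 1);
    /* ... but 0x0F == 0x02 is 0, and 0x12 & 0 is 0. */
    assert(as_parsed_without_parens(0x12u, 0x0Fu, 0x02u) == 0u);
}
```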

In Section 2.6 of The C Programming Language, Kernighan and Ritchie specify:

By definition, the numeric value of a relational or logical expression is 1 if the relation is true, and 0 if the relation is false.

Since then, C and all of its successor languages have been boxed in by backward compatibility.

Postscript

Since one commenter requested an example of code involving literal && and || that breaks when a boolean sub-expression either takes the value of its truthy argument, as you ask about, or promotes from 0xFFFFU to 0x0000FFFFUL, as it would have if Ritchie et al. had represented TRUE as in B and given C the same implicit-promotion rules, here is an arbitrary one.

(function_returning_unsigned_int() && function_returning_signed_int()) ^
(unsigned_long_value || function_call()) /* 0x0000FFFFUL ^ 0xFFFFFFFFUL is truthy. */

I make no claim about how likely it is to occur, but at the time, Ritchie, Kernighan and Thompson had a large codebase that always used bitwise operators on logical expressions.

Your example with two printf calls doesn't involve || or &&. I don't think the question is proposing that the values of comparison operators could have been different, just that || could have produced its first truthy operand instead of always 1 on true, and A&&B could have been !A ? A : B. (But as supercat points out, int x = 0.75 || 1.25 would be 0 from (int)0.75. I guess one could write A!=0 || B!=0 if necessary, but then it's a tradeoff in needing more code for some cases.) — Peter Cordes, Nov 04 '22 at 06:52
You could make an even more complex expression that did (A||B) & (C&&D) where some of A,B,C, or D are expressions with side effects, but that's starting to feel very contrived, like something a normal person would have done with a temp variable and/or some if/else. Also, the question is why C wasn't designed this way originally. As other answers have pointed out, this couldn't be changed later without breaking existing code that uses the 0/1 value. (But interesting history that it was ~0 in B; that would often be more useful for bit-manipulation.) — Peter Cordes, Nov 04 '22 at 07:00
@PeterCordes I picked a more natural example using different logical operations, yes. It would have been odd to have <= and == always be 1 or 0, but not && and ||. I think the actual reason is that, in B, there were only & and |, and programmers would pass complex conditional expressions to these bitwise operations. That worked because false was always 0 and true was always ~0. — Davislor, Nov 05 '22 at 00:41
@PeterCordes I expanded my answer with additional possible reasons, although I admit they are speculative, not based on Ritchie's own recollections. — Davislor, Nov 05 '22 at 01:03
You're still kind of going beyond what the question proposed, inventing different results for == for no reason. Or begging the question by asserting that == would have to return the same result as &&. — Peter Cordes, Nov 05 '22 at 01:09
Also, 0x0000FFFF & 0xFFFFFFFF is still a non-zero value, thus still true when used with if ((A==B)&(C==D)), and also with &&. You'd only run into trouble if you were doing something like (A||B) == (C==D) or something. (Comparing truth values for exact equality is kind of like logical XOR, if your inputs are booleanized integers. If not then it's problematic.) So some expressions might need an extra !! to booleanize to 0/1 if C had originally been designed to work as proposed in the question. The B history does help explain why they weren't thinking that way, though. — Peter Cordes, Nov 05 '22 at 01:11
@PeterCordes Where they fail is when someone migrates B code or writes B-style code using bitwise operations in C. One reason to do so is to use exclusive or, and another is to avoid short-circuiting operations with side-effects. I chose an example that happened to use different logical relations, but the principle is the same. With respect, I think you’re focusing on that irrelevant detail. — Davislor, Nov 05 '22 at 01:14
@PeterCordes To sum up, the default promotion rules would have failed to maintain the invariants that any Boolean expression evaluates to one of two values and that the bitwise operations work as expected on them. Is that the actual reason? I don’t have a statement from K, R or T saying so. But it’s an educated guess. — Davislor, Nov 05 '22 at 01:18
I think there is a point here involving the B history and how the language designers thought about booleans, but I don't think your answer is making it very well. First of all, if((x <= 16) & (y == 15)) does work even if they return all-ones for some reason, even though C doesn't do that, B does. if(int_expression) runs the if body for any non-zero value of the expression, it doesn't have to be all-ones. So your chosen example isn't demonstrating any problem at all, and claims without proof that it's somehow similar to something involving && or ||. — Peter Cordes, Nov 05 '22 at 01:20
@PeterCordes I will make one further edit giving a specific example of code that would have failed if a boolean expression widened to long could have value 0x0000FFFFL, and then move on. Respectfully, you admit that it’s possible to write code that generates the same behavior with bitwise operators, which do not use truthy/falsy. And K/R/T had a large codebase written in B which did. That I chose an example using comparisons is unimportant. — Davislor, Nov 05 '22 at 01:24
The question is about why || and && can't return one of their inputs. Yes, with those semantics you might need an extra !! if you want to use the result of one as an operand for & with ints, but (A&&B) & C is a bit unusual. And isn't what your answer seems to be talking about. This stuff about 0x0000FFFFL is something you made up based on the B semantics for == and <= and how they might hypothetically work in C with other things changed as well (for reasons you haven't stated). I don't find it clear why you're bringing that up, and doesn't seem to answer the question. — Peter Cordes, Nov 05 '22 at 01:30
@PeterCordes I have now provided an example of actual code that breaks, under the implicit type-promotion rules of C, if the && and || operators return either the value of a truthy operand or ~0 for TRUE. I hope other readers will agree with me that this is pertinent. I am now, as I said, going to move on. — Davislor, Nov 05 '22 at 01:45
Ok, that final example is starting to get somewhere. But the B equivalent would have needed extra work because without &&, foo() & bar() where those functions return non-booleanized int values doesn't necessarily work. So it's not an example of breaking existing B code, or B code where & has been replaced with &&. I'm also fine with moving on, and I'm glad you posted something about how things worked in B, but I still don't think the answer you've constructed around that really answers the question, at least not very directly. — Peter Cordes, Nov 05 '22 at 01:54

score 1 · Answer 8 · answered Nov 05 '22 at 01:14

I think the direct answer is, "No, there is not a strong argument."

It seems to me that the functionality you're describing would have worked overall pretty consistently in C (if introduced at the start), and if you had been able to make your case to the creators of C at the time they might well have at least considered it on the strength of your arguments (if you had some good examples for them of why "this can be really useful").

Brendan McKay · Answer 9 · 2022-11-04T09:27:36.743

0

The question states

"In C, the result of the || operator is always a boolean."

This is misleading. The result of both || and && is type int. The value of the int is either 0 or 1.

Note that the C99 standard introduced type _Bool, and the upcoming C23 standard goes further with primitive type bool. However, in neither case is the type of || or && changed. It is still int.

This statement should always test true even if the size of an int is different from the size of a boolean:

sizeof(A || B) == sizeof(int)
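
A compiler can confirm this claim directly. The following is a minimal sketch, assuming a C11 (or later) compiler for _Static_assert and _Generic; the function name and operand values are arbitrary choices for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The result type of || and && is int whatever the operand types are,
 * so these hold at compile time. sizeof does not evaluate its operand. */
_Static_assert(sizeof(3.5 || 0) == sizeof(int), "|| yields an int");
_Static_assert(sizeof((bool)1 && (bool)1) == sizeof(int), "&& yields an int");

static void check_logical_results(void)
{
    int zero = 0, five = 5, seven = 7;
    double x = 0.75, y = 1.25;

    /* The value is always exactly 0 or 1, never one of the operands. */
    assert((zero || five) == 1);
    assert((seven && five) == 1);
    assert((zero && five) == 0);

    /* _Generic selects on the expression's type and confirms it is int,
     * even though both operands are double. */
    assert(_Generic(x || y, int: 1, default: 0) == 1);
}
```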

Anyway, the answer to the question is that || means "or" and is the C version of "or" in earlier languages like Algol. It was never intended to mean "else". The rule that says the right side is not evaluated if the left side is true was intended as an efficiency measure plus a convenient way to guard the second part (such as "if (i > n || a[i] == 0)" when a[n] is the last element of the array).
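
The guard idiom mentioned above can be sketched as follows. The function name is hypothetical; n is the last valid index, as in the example expression.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if index i is past the last valid index n, or if a[i] is 0.
 * The bounds check comes first, so a[i] is never read when i > n. */
static int out_of_range_or_zero(const int *a, size_t n, size_t i)
{
    return i > n || a[i] == 0;
}

static void check_guard(void)
{
    const int a[] = { 4, 0, 7 };   /* last valid index is 2 */

    assert(out_of_range_or_zero(a, 2, 0) == 0);  /* a[0] is 4 */
    assert(out_of_range_or_zero(a, 2, 1) == 1);  /* a[1] is 0 */
    /* i is out of range: short-circuiting skips the read of a[5]. */
    assert(out_of_range_or_zero(a, 2, 5) == 1);
}
```

Swapping the two operands would read a[i] before the bounds check, which is undefined behaviour for an out-of-range i.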

Brendan.

edited Nov 04 '22 at 09:27

answered Nov 04 '22 at 05:09

Brendan McKay

3

This probably could have just been a comment on the question, pointing out that it's a "boolean-valued int", not a value of type _Bool. But as you say, C didn't have _Bool or bool for decades, so when you talk about a "boolean" in C, it's normal for that to mean a boolean-valued integer, especially if talking about C history before _Bool existed. – Peter Cordes Nov 04 '22 at 06:44
3

I don't agree the statement is misleading, since there doesn't exist a boolean type in C (there are _Bool and bool though). Boolean is just an English word for something that could be true or false (or as in C, 1 or 0). And I think the context makes it clear that it's about the value, not the type of the expression. – wovano Nov 04 '22 at 06:50
@wovano C23 does indeed have a boolean type. It is bool and it has its own rules distinct from the rules of int. Its values are restricted to the new constants false and true which are language keywords, not macros. For example, if i is int and b is bool, then "i=3; b=i; i=b;" sets i to 1. – Brendan McKay Nov 04 '22 at 07:31
2

C99 already had a boolean type (called _Bool). But my point is that when talking about "boolean logic", we're referring to logic that has two values, not necessarily to logic implemented with a data type called "boolean". So the question is why in C the statement 0 || 5 evaluates to 1 and not to 5, which is the case in languages like Ruby and Python. Your answer does not answer that. NB: the question is now updated to be more clear about this. – wovano Nov 04 '22 at 08:49

score 0 · Answer 10 · answered Nov 05 '22 at 00:21

If I understand the use here correctly it's because C languages already have another operator for that: ?:

So in C languages

val = call_fn() || DEFAULT_VAL;

would be written as

val = call_fn() ? call_fn() : DEFAULT_VAL;

or

tmp = call_fn();
val = tmp ? tmp : DEFAULT_VAL;

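Which form is appropriate depends on the constness of call_fn(): the first form calls it twice, so it is only safe if the call has no side effects and returns the same value both times.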

The || and && operators are meant exclusively for boolean logic in C and so they condense to only the minimum definition of true and false for their output.
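
To show why the two forms above are not interchangeable, here is a small sketch. The call_fn below is a hypothetical stand-in that counts its calls, and the value of DEFAULT_VAL is an arbitrary assumption.

```c
#include <assert.h>

#define DEFAULT_VAL 42

static int call_count = 0;

/* Hypothetical stand-in for call_fn(): returns 1, 2, 3, ... on
 * successive calls, so repeated evaluation is observable. */
static int call_fn(void)
{
    return ++call_count;
}

/* First form: call_fn() is evaluated twice when its result is truthy. */
static int default_evaluating_twice(void)
{
    return call_fn() ? call_fn() : DEFAULT_VAL;
}

/* Second form: the temporary guarantees a single evaluation. */
static int default_evaluating_once(void)
{
    int tmp = call_fn();
    return tmp ? tmp : DEFAULT_VAL;
}

static void check_default_forms(void)
{
    call_count = 0;
    assert(default_evaluating_twice() == 2);  /* value of the second call */
    assert(call_count == 2);

    call_count = 0;
    assert(default_evaluating_once() == 1);   /* value of the only call */
    assert(call_count == 1);
}
```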
