An Exchange on Bayesian Inference and Formal Axiomatic Systems
Alan Baker and Paul Grobstein
(continuing a discussion begun at a meeting of the Information Working Group
of the Center for Science in Society at Bryn Mawr College)
Grobstein to Baker, 22 July 2004
Alan -
Pleased you were able to be around today (particularly), and appreciated the issues you raised/helped to raise. Which do relate to/depend on expertise in philosophy of math (that I, obviously, do not have). Two related points in particular seem to me important, worth pursuing further:
- The issue of whether and how Bayes' Theorem is derivable from some other set of axioms (ie is itself an expression of a FAS), and the implications of that, whatever the answer.
- The possible significance of the fact (I think) that Bayes' Theorem at the very least permits the use of incompressible numbers.
I think these are inter-related in the following potentially interesting ways:
Bayes' Theorem, as I currently understand it, requires as "primitives" the concepts of arithmetic operators, of current and next (but not of "time" in any continuous sense), and of "input from outside" (the "new" observation; this is less demanding than a concept of "space"). I accept your assertion that the theorem is derivable from some existing set of axioms but that doesn't I think necessarily equate to "can't create anything more than can be created by a formal axiomatic system", which is what I'm interested in. I suspect that any derivation of Bayes' theorem depends ALSO on an "indeterminacy" primitive (something that allows for "probability"). If so, that would, as I understand it, take the axiomatic system which gives rise to Bayes' theorem out of the realm of "formal axiomatic systems" and make it not subject to the Godel/Turing limitations. There exists also the possibility, even if Bayes CAN be derived in a strictly formal axiomatic system, that it could instead be taken as an axiom for a new axiomatic system lacking one or more of the starting points in the system from which it can be derived and that this in turn could potentially yield a less "incomplete" deductive system.
What particularly intrigues me is the idea (perhaps incorrect?) that the history of math takes integer as the primitive, and consistency as the sine qua non, and then gets into trouble when one hits successively infinities of various sizes, incompleteness, formally undecidable propositions and incompressible numbers. I can't help but wonder what would happen if one took as primitives ALL the reals BETWEEN zero and one and a few other things (as perhaps in Bayes' theorem). Might one be able to work backwards to integers in a way that would perhaps create less of a struggle between "consistency" and "completeness"?
Paul
Baker to Grobstein, with further comments by Grobstein indented, 25 July 2004
Paul,
Thanks for your message. I am definitely interested in many of the
issues concerning the pros and cons of formal axiomatic systems
which seem to have arisen in the course of the Group's discussions
over the past couple of months. With regard to the specific topic of
probability I have less background, though the question of how exactly to
interpret probability remains a philosophically "hot" topic.
The standard, and widely accepted, formal axiomatization of probability
theory is due to Kolmogorov. This "classical" theory seems to have a
similar status to that of classical logic. Bayes' Theorem can be derived
from the Kolmogorov axioms. In general, the notion of consistency in
classical logic is replaced - in probability - with the notion of coherence.
As you mentioned in discussion, agents are free to assign any
probability to any individual (contingent) proposition, but if their overall
distribution of probabilities (at a given time) is not coherent then the
axioms can be used to derive more than one different probability
assignment for some propositions.
Ah so. If I'm understanding correctly, "coherence" basically means that the probability assignments for "related" hypotheses ("statements" in classical logic) must sum to less than 1.0 (I imagine that defining "related" is a major task in its own right?). And this then assures that there is one and only one probability assignment for permitted assemblies of hypotheses (parallel to the truth or falsity of well-constructed compound statements in classical logic, yes?). Is there an accessible (to me) literature on this? Does it follow from the parallels that the Kolmogorov logic generates a Godel-like "incompleteness" as classical logic does? ie are there compound statements whose probability assignment is indeterminate?
The Bayesian approach treats probabilities as subjective (about
degrees of belief) rather than objective (about intrinsic properties of the
world). One philosophical issue concerns just what is meant by "degree
of belief." It's easy enough to see intuitively what this is, but not easy to
define it in a non-circular way. [e.g. it won't do to say that your degree of
belief measures how likely you think the hypothesis is to be true,
because this is still an inherently probabilistic notion] One popular
approach is to define degrees of belief in terms of betting quotients (this
is due to Ramsey, in the 1920's) - i.e. a degree of belief can be
expressed in terms of a ratio between a stake and a pay-off where the
pay-off occurs if the hypothesis in question is true. e.g. to say that I have
degree of belief .25 that it will rain tomorrow is to say that I would be
indifferent w.r.t. a bet which involves staking $1 and pays out $3 if it rains
tomorrow.
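[Editor's sketch, not part of the original exchange: Baker's betting-quotient example can be checked numerically. The function name and dollar figures are just an illustration of the Ramsey idea, assuming "indifference" means the bet has zero expected value at the agent's degree of belief.]

```python
# Ramsey-style betting quotient: degree of belief p makes you indifferent
# to staking S against a net pay-off W when p * W - (1 - p) * S = 0.
def expected_value(belief, stake, net_payoff):
    """Expected value of the bet to the bettor, at his own degree of belief."""
    return belief * net_payoff - (1 - belief) * stake

# Baker's example: degree of belief .25 in rain, stake $1, net pay-off $3.
ev = expected_value(0.25, stake=1.0, net_payoff=3.0)
print(ev)  # 0.0 -- indifference: the bet is exactly fair at these odds
```

With any other degree of belief the same bet would have a non-zero expected value, so the agent would strictly prefer (or refuse) it.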
Think I understand both the problem and the Ramsey approach, both in spirit and in practice. Assume that the Ramsey approach does not, by itself, assure "coherence"?, that that's handled as a separate problem? FEELS to me like the K-derived probability logic stems from the same mindset as classical two-valued logic, ie not only does it have the same "problems" but has also the same "disconnect" from how the brain actually works (I think). That "degree of belief" would be a "philosophical issue" might be an indication of that. Could one not simply take "degree of belief" as something along the lines of "existence in brain of some relevant degree of order" defined re "randomness"?
On this interpretation, it is difficult to see how degrees of belief could be
irrational, let alone incompressible.
[There is also Mark's argument that the things we attach probabilities to
tend to involve ratios between discrete quantities, but this seems less
convincing for two reasons.
- We are not required to fit our degrees of belief to any particular
feature of the phenomena.
- We can make up examples where the phenomena themselves
involve irrational ratios (e.g. the probability of a random point picked
inside a unit square falling inside a circle inscribed in the square).]
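[Editor's sketch, not part of the original exchange: the probability in Baker's second example is pi/4, an irrational number, and a quick Monte Carlo estimate (hypothetical code, Python) illustrates it.]

```python
# A uniformly random point in the unit square lands inside the inscribed
# circle (radius 1/2, centered at (1/2, 1/2)) with probability pi/4.
import math
import random

random.seed(0)  # fixed seed so the estimate is reproducible
n = 100_000
hits = sum(
    1 for _ in range(n)
    if (random.random() - 0.5) ** 2 + (random.random() - 0.5) ** 2 <= 0.25
)
print(hits / n, math.pi / 4)  # the estimate approaches 0.7853...
```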
Like/share your sense of limitations of Mark's argument. Had your first point in my mind; like your second and think it likely to be more compelling to Mark. I was, I freely admit, hoping the incompressibility notion might give some heft to intuitions about how things work that I have trouble describing in terms compelling to most people. So it could well be a red herring. Agree it WOULD be a red herring on the Ramsey/Kolmogorov approach. On flip side, though, I have no reason to believe that the nervous system is constrained in its basic functions to operations that would yield only rational numbers and it is demonstrably capable of conceiving not only irrationals but incompressibles as well.
One other thing which follows from Bayes' Theorem (which I think is
contrary to what you said in discussion) is that conditionalizing on
evidence that is inconsistent with an hypothesis results in a posterior
probability of zero for that hypothesis. This is because the likelihood
in the numerator (i.e. P(e/H)) is itself zero when e implies ~H.
[Of course, there is a fudge here between "The sun did not come up this
morning" and "It appears to me that the sun did not come up this
morning." I agree that conditionalizing on the second of these would not
reduce the posterior probability of "The sun rises every day" to zero.]
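[Editor's sketch, not part of the original exchange: Baker's point, in hypothetical numbers. When the evidence is inconsistent with the hypothesis the likelihood P(e/H) is zero, and Bayes' Theorem drives the posterior to zero whatever the prior was.]

```python
# Bayes' Theorem: P(H|e) = P(H) * P(e|H) / P(e), assuming P(e) > 0.
def posterior(prior, likelihood, p_evidence):
    return prior * likelihood / p_evidence

# H = "the sun rises every day"; e = "the sun did not come up this morning".
# e implies not-H, so P(e|H) = 0 and the posterior collapses, even from a
# very confident prior.  (The numbers here are made up for illustration.)
print(posterior(prior=0.99, likelihood=0.0, p_evidence=0.01))  # 0.0
```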
I had enough trouble persuading myself that I partially understood the "likelihood factor" (and STILL don't understand what constrains it in such a way as to preclude the posterior probability from exceeding one). So I'm more than happy to receive correction/tutelage on this one. Actually, I LIKE the idea that the posterior probability can go to zero. I have for years been arguing that "science" cannot "prove" an hypothesis (an argument for which I will find Bayes' Theorem helpful) but CAN "disprove" one. The latter is important to me for other (though perhaps related) reasons, basically because it's the incentive to "change the story", ie go off looking for observations in some new direction (which I think is one essential component of "science", something akin to Kuhn's "revolutions").
With respect to what Bayes' Theorem does or does not require in terms
of assumptions, I'm not entirely convinced that it requires a primitive
concept of current / next. For instance, there is nothing in principle to
stop us conditionalizing on temporally "older" evidence before we
conditionalize on more recent evidence.
Also, any "indeterminacy" primitive that is necessary for Bayes' Theorem
seems to me to already be necessary anyway for classical probability
theory (at least for the theory to be interesting). All "indeterminacy'
seems to mean in this context is that propositions about the world can
take numerical values between 0 and 1.
Hence I remain unclear why Bayes' Theorem takes us "out of the realm
of 'formal axiomatic systems'" (as you put it). [We know from examples
such as fuzzy logic that formality and indeterminacy / vagueness can be
combined]
In here is, I think, the core of what makes this worth talking through/trying to clarify for me (and perhaps for you too?). Am not of course sure what Bayes had in mind, nor do I have the background to know what potentially related developments there were at the time/have been since (though am intrigued by the notion that Bayes was working BEFORE the rational/analytic mindset became dominant). What I AM interested in is whether "indeterminacy" is ONLY "that propositions about the world can take numerical values between 0 and 1". My intuition is that "indeterminacy" can usefully be seen as more than that. It is, for me, the "wiggle room" that precludes adequate understanding through the mechanism of ANY "formal axiomatic system" and that makes the next iteration fundamentally unpredictable (to some degree) from the results of the previous one. Is not just "vagueness" in understanding; is at the core of the phenomena one is trying to "understand". And if THAT's the case, then one DOES have to recognize the fundamental limits of any/all FAS's (since they are "mechanical" in the specific sense that the next iteration IS fully predictable from the prior one). And one might aspire to a formalization of the inquiry/understanding process that doesn't have those limitations. My guess is that whatever the "primitive" is that represents "indeterminacy" in classical probability theory, it doesn't have the additional character I want, but I'm prepared to be told I'm wrong about this (the issue is not dissimilar from the Copenhagen arguments re quantum theory).
You suggest that even if BT is derivable from the standard probability
axioms one could construct an alternate system which takes BT as an
axiom and drops one or more of the other standard axioms. In fact it can
be shown by logical argument that this scenario is impossible.
Let P, Q, R be the axioms of probability theory. [it doesn't matter exactly
how many there are]
By assumption {P, Q, R} |- B
Try dropping an axiom, say R, and adding B to the other axioms, so we
have {P, Q, B}
Assume that this allows us to prove something we couldn't prove in the
original theory, call it K.
So {P, Q, B} |- K and {P, Q, R} does not |- K
But B is derivable from {P, Q, R}, so {P, Q, R} and {P, Q, R, B} are
logically equivalent theories. So {P, Q, R, B} does not |- K.
But {P, Q, R, B} is a stronger theory than {P, Q, B} so {P, Q, B} does not |-
K, and we have contradicted our assumption above.
Had a suspicion there was a problem along these lines when I made the suggestion. And may be another red herring. Let's see whether it's worth returning to in light of other things. What I had in mind was, I think, not quite the one the proof responds to. The idea was not to substitute Bayes' theorem for an existing axiom or set of axioms but rather to use Bayes' theorem as a starting point and then add additional axioms as seemed necessary.
I don't think that I have a complete handle on your suggestion
concerning integers and reals. Perhaps it is because I am not sure
what you mean (in this context) by "taking as primitive." In terms of
formal systems, we can give an axiomatization of arithmetic (viz. Peano
arithmetic) where the only primitives are "0" and the successor relation.
So none of the other natural numbers are "primitive" in this formal
sense. Real numbers are typically defined in terms of sequences of
rational numbers, which are in turn defined in terms of ratios of integers.
From this perspective, are you suggesting an axiomatization of the reals
which does not proceed in this cumulative way from the integers? I'm
pretty sure this can be (and has been) done, by focusing directly on the
defining properties of the real numbers, such as being ordered, dense,
closed under polynomial composition, etc. But I am unclear how this
might (even in principle) avoid the Godel-style incompleteness results,
since the resulting theory will be strong enough to embed integer
arithmetic.
Yeah, had precisely in mind "an axiomatization of the reals which does not proceed in this cumulative way from the integers". And, here too, may be a red herring (or something already fully explored). I'm not though entirely dissuaded by your notion that doing so would not "avoid the G-style incompleteness results, since the resulting theory would be strong enough to embed integer arithmetic". It seems to me at least possible (and if possible desirable) that it would further illuminate G-incompleteness by showing it to hold in one realm (integer arithmetic) of a larger space in which it does not generally hold.
On a separate note, relating to Godel and to Thursday's discussion, I
think that it is important not to lose sight of the limited scope and
strength of Godel's result. He's not saying that there are truths which
cannot be captured in any formal system, but that any given formal
system will unavoidably leave out some truths. Also Godel does not
show that there are any interesting true arithmetical claims which are
unprovable (i.e. of independent interest to mathematicians, aside from
their unprovability). There has been considerable work done over the
past twenty years by logicians and mathematicians to try to find
"interesting' arithmetical claims which are not provable in Peano
arithmetic (PA) but are provable in Zermelo Frankel set theory (ZF).
Increasingly interesting and natural claims have been found, but nothing
which approaches the simplicity or intuitiveness of (say) Fermat's Last
Theorem, or the Goldbach Conjecture.
Do understand that Godel incompleteness is NOT "absolute", that what one can't reach in one formal system CAN be reached in another. In fact, one of the things that I think is important in all this is that FAS's themselves GENERATE the unreachable (for themselves). Would need some tutoring on PA, ZF etc if it looks relevant in the long run, but had the impression that Chaitin was quite directly concerned with establishing "interesting arithmetical claims" (his omega number and its relation to diophantine equations) not provable in FAS's as he understands them at least.
Baker to Grobstein, 29 July 2004,
referring by indent italics to previous Grobstein
Paul,
I think that you are correct to emphasize the parallels between classical
logic and the standard axiomatization of probability. One further parallel
- often overlooked - is that classical logic also gives you "freedom" in
initial assignments. From the point of view of pure logic, each contingent
atomic proposition can be assigned either "true" or "false." Below I have
some more specific responses to your comments from the last
message.
Ah so. If I'm understanding correctly, "coherence" basically means
that the probability assignments for "related" hypotheses
("statements" in classical logic) must sum to less than 1.0 (I
imagine that defining "related" is a major task in its own right?).
And this then assures that there is one and only one probability
assignment for permitted assemblies of hypotheses (parallel to the
truth or falsity of well-constructed compound statements in classical
logic, yes?). Is there an accessible (to me) literature on this?
Does it follow from the parallels that the Kolmogorov logic generates
a Godel-like "incompleteness" as classical logic does? ie are there
compound statements whose probability assignment is
indeterminate?
The point about incoherence, for the probability calculus, is not that you
have a choice of different probabilities you can assign (since this is also
true for coherent systems), but that you can derive - within the
(incoherent) system - two (or more) different probability assignments
for a single proposition. Hence the analogy with inconsistency in
classical logic - where you can derive two truth-value assignments (true
and false) for a single proposition. For example, say I assign probability
.25 to the proposition that it will rain tomorrow. If I am working with an
incoherent set of probability axioms then I will be able to derive, within
the system, a second, distinct probability for this same proposition, say
.45.
It's not right, I think, to say that the probabilities for "related" hypotheses
must sum to less than 1 (at least not on any understanding of "related"
that I can think of). Perhaps you are thinking of the theorem of classical
probability which says that if A and B are mutually inconsistent
propositions then P(A) + P(B) is less than or equal to 1.
More generally, there seem to be plenty of counterexamples to your
claim.
e.g.
let C = "the dice does not come up 6"
D = "the dice comes up odd"
C and D are related, since D logically implies C.
But (for a fair dice), P(C) + P(D) = 5/6 + 1/2 = 4/3 > 1.
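[Editor's sketch, not part of the original exchange: Baker's counterexample can be confirmed by simply enumerating the six outcomes of a fair die, using exact fractions.]

```python
# C = "the die does not come up 6", D = "the die comes up odd".
# D logically implies C, yet P(C) + P(D) exceeds 1.
from fractions import Fraction

outcomes = range(1, 7)
p_C = Fraction(sum(1 for o in outcomes if o != 6), 6)      # 5/6
p_D = Fraction(sum(1 for o in outcomes if o % 2 == 1), 6)  # 1/2
print(p_C + p_D)  # 4/3
```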
There *are* more general constraints on the probability assignments of
compound statements, relative to the probability assignments of their
atomic components. But there is no single such constraint.
e.g.
P(A & B) <= P(A)
P(A & B) <= P(B)
P(A v B) >= max [P(A), P(B)]
Also, if A -> B then P(B) >= P(A)
The issue of completeness in probability is an interesting one. What the
above remarks indicate is that there are "compound statements
whose probability is indeterminate" (relative to the probabilities of their
components). Perhaps the simplest case is conjunctions of
non-independent events.
Let A be the hypothesis that George Bush is re-elected. Let B be the
hypothesis that the stock market rises in 2005. Say you assign
probabilities of .5 to each. Then the only coherence constraint on P(A &
B) is that it be no greater than .5. You are free to assign any probability to A&B
between 0 and .5.
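[Editor's sketch, not part of the original exchange: the coherence constraints on a conjunction, given only the probabilities of its components, are what are sometimes called the Frechet bounds. A hypothetical helper function makes Baker's Bush/stock-market example concrete.]

```python
# Coherent range for P(A & B) given only P(A) and P(B):
# max(0, P(A) + P(B) - 1)  <=  P(A & B)  <=  min(P(A), P(B))
def conjunction_bounds(p_a, p_b):
    return max(0.0, p_a + p_b - 1.0), min(p_a, p_b)

# Baker's example: P(A) = P(B) = .5 leaves P(A & B) free anywhere in [0, .5].
print(conjunction_bounds(0.5, 0.5))  # (0.0, 0.5)
```

The lower bound only bites when the two probabilities sum to more than 1, e.g. conjunction_bounds(0.9, 0.8) gives (0.7, 0.8).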
When discussing Godel and related results it is important to be clear
about what "completeness" means in the logical context. A logic typically
consists of a syntax (rules for well-formed formulas, rules for proof) and
a semantics (an interpretation of the syntax, in which meaning is
assigned to the logical symbols). In classical 1st-order propositional
logic, the semantics is simply the truth-table interpretation of each
logical connective. A logic is complete iff every semantically valid claim
is syntactically derivable. (Soundness is the converse condition.) And in
this sense, classical (1st-order) logic is complete.
Once we move beyond purely logical systems, a more useful definition
of completeness is this: for any sentence S of the system, either S or
~S is provable in the system. This is the sense of completeness
relevant to Godel's result. Clearly this is an unreasonable - indeed
undesirable - condition to place on logical systems in general. (We
don't want to be able to prove logically that it will rain tomorrow or prove
logically that it won't rain tomorrow!) But in the special case of
arithmetic (and in mathematical systems more generally) this sort of
completeness is a desirable goal. We would like our formal
mathematical systems to decide every question in the given domain. It
is this goal, in the particular case of mathematical systems strong
enough to embed arithmetic, which Godel showed is impossible.
Think I understand both the problem and the Ramsey approach, both
in
spirit and in practice. Assume that the Ramsey approach does not, by
itself, assure "coherence"?, that that's handled as a separate
problem? FEELS to me like the K-derived probability logic stems from
the same mindset as classical two-valued logic, ie not only does it
have the same "problems" but has also the same "disconnect" from
how
the brain actually works (I think). That "degree of belief" would be
a "philosophical issue" might be an indication of that. Could one
not simply take "degree of belief" as something along the lines of
"existence in brain of some relevant degree of order" defined re
"randomness"?
I didn't mention this in my previous message, but the Ramsey-style
"betting quotient" approach does assure coherence. There are some
theorems (known as "Dutch Book" theorems) which show that a person
will accept a set of bets in which he is guaranteed to lose money (a
so-called "Dutch Book") iff his method for calculating the probabilities of
compound statements does not conform to the standard probability
axioms.
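[Editor's sketch, not part of the original exchange: a minimal Dutch Book, with made-up numbers. An agent whose degrees of belief in "rain" and "no rain" sum to more than 1 accepts two bets, each fair by his own lights, and loses money however the world turns out.]

```python
# The agent buys a bet with total payout `stake_total` at his betting
# quotient `belief`: he pays belief * stake_total up front and collects
# stake_total if the bet wins.
def net_gain(belief, stake_total, wins):
    price = belief * stake_total
    return (stake_total if wins else 0.0) - price

# Incoherent beliefs: .6 in "rain" AND .6 in "no rain" (they sum to 1.2).
for rain in (True, False):
    total = net_gain(0.6, 100, rain) + net_gain(0.6, 100, not rain)
    print(rain, total)  # the agent loses $20 in both cases
```

With coherent beliefs (say .6 and .4) the two bets would exactly cancel, whichever way it goes.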
You write: "Could one not simply take "degree of belief" as something
along the lines of "existence in the brain of some relevant degree of
order" defined re "randomness"?"
I'm unclear what you mean here. Degrees of belief are supposed to
attach to particular hypotheses. These could relate to immediate
sensory experience (e.g. "the coin will come up heads on the next flip")
or they could relate to much more abstruse matters (e.g. "the 109th
element in the periodic table will be created by scientists in the next 24
hours"). I don't see how a particular number - corresponding to the
person's degree of belief for each such hypothesis - can (even in
principle) be extracted from the physical state of the brain. And where
does randomness come in, exactly?
Like/share your sense of limitations of Mark's argument. Had your
(i) in my mind; like your (ii) and think it likely to be more
compelling to Mark. I was, I freely admit, hoping the
incompressibility notion might give some heft to intuitions about how
things work that I have trouble describing in terms compelling to
most people. So it could well be a red herring. Agree it WOULD be a
red herring on the Ramsey/Kolmogorov approach. On flip side,
though,
I have no reason to believe that the nervous system is constrained in
its basic functions to operations that would yield only rational
numbers and it is demonstrably capable of conceiving not only
irrationals but incompressibles as well.
I totally agree with your observation that there is "no reason to believe
that the nervous system is constrained "to operations that would yield
only rational numbers." But I think that it is important not to conflate the
numbers which enter into a description of the nervous system with the
numbers which our minds can conceive of. I don't see any clear
inference, for example, from "We can conceive of incompressibles" to
"The best description of the functioning of our nervous system involves
incompressibles."
I had enough trouble persuading myself that I partially understood
the "likelihood factor" (and STILL don't understand what constrains
it in such a way as to preclude the posterior probability from
exceeding one). So I'm more than happy to receive
correction/tutelage on this one. Actually, I LIKE the idea that the
posterior probability can go to zero. I have for years been arguing
that "science" cannot "prove" an hypothesis (an argument for which I
will find Bayes' Theorem helpful) but CAN "disprove" one. The latter
is important to me for other (though perhaps related) reasons,
basically because it's the incentive to "change the story", ie go off
looking for observations in some new direction (which I think is one
essential component of "science", something akin to Kuhn's
"revolutions").
The possibility of posterior probabilities going to zero clearly fits nicely
with the Popperian view of scientific hypotheses as falsifiable. The
easiest way, I think, to see why the posterior probability cannot exceed 1
is to cash out the conditional probability in the numerator using the
formula P(A/B) = P(A & B) / P(B) (assuming P(B) is not 0).
Bayes' Theorem says P(H/e) = P(H) . P(e/H) / P(e)
Rewriting P(e/H) on the right-hand side, using the above formula, yields
P(H/e) = P(H) . [P(e & H) / P(H)] / P(e)
But now we can cancel out the two P(H)'s, so the resulting (posterior)
probability is P(e&H) / P(e). And since the probability of a conjunction is
less than or equal to the probability of either conjunct, this expression
cannot exceed 1.
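[Editor's sketch, not part of the original exchange: Baker's cancellation can be checked against a small made-up joint distribution over H and e. The numbers are arbitrary; the point is that the full Bayes formula and the reduced form P(e&H)/P(e) agree, and the latter is at most 1 because P(e&H) <= P(e).]

```python
# Hypothetical joint probabilities over (H, e); they sum to 1.
p_joint = {
    (True, True): 0.3, (True, False): 0.2,
    (False, True): 0.1, (False, False): 0.4,
}
p_e = p_joint[(True, True)] + p_joint[(False, True)]   # P(e) = 0.4
p_h = p_joint[(True, True)] + p_joint[(True, False)]   # P(H) = 0.5
p_eh = p_joint[(True, True)]                           # P(e & H) = 0.3

likelihood = p_eh / p_h               # P(e|H) = P(e & H) / P(H)
bayes = p_h * likelihood / p_e        # full Bayes' Theorem
print(bayes, p_eh / p_e)              # both ~0.75, and necessarily <= 1
```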
In here is, I think, the core of what makes this worth talking
through/trying to clarify for me (and perhaps for you too?). Am not
of course sure what Bayes had in mind, nor do I have the background
to know what potentially related developments there were at the
time/have been since (though am intrigued by the notion that Bayes
was working BEFORE the rational/analytic mindset became
dominant).
What I AM interested in is whether "indeterminacy" is ONLY "that
propositions about the world can take numerical values between 0
and
1". My intuition is that "indeterminacy" can usefully be seen as
more than that. It is, for me, the "wiggle room" that precludes
adequate understanding through the mechanism of ANY "formal
axiomatic
system" and that makes the next iteration fundamentally unpredictable
(to some degree) from the results of the previous one. Is not just
"vagueness" in understanding; is at the core of the phenomena one is
trying to "understand". And if THAT's the case, then one DOES have
to recognize the fundamental limits of any/all FAS's (since they are
"mechanical" in the specific sense that the next iteration IS fully
predictable from the prior one). And one might aspire to a
formalization of the inquiry/understanding process that doesn't have
those limitations. My guess is that whatever the "primitive" is that
represent "indeterminacy" in classical probability theory, it doesn't
have the additional character I want, but I'm prepared to be told I'm
wrong about this (the issue is not dissimilar from the Copenhagen
arguments re quantum theory).
The connections between issues concerning formal axiomatic systems,
Godel's results, and quantum mechanics are deep and interesting.
[The Lucas-Penrose argument is one well-known attempt to exploit
such connections.] When you claim that FAS's are "mechanical in the
specific sense that the next iteration IS fully predictable from the prior
one," are you referring to iterations *within* the system or iterations *of*
the system? In the latter case it seems up to us (to a considerable
degree) how we change one FAS to yield another. In the former case,
I'm not sure exactly what is being iterated.
repeated application of a rule of the system? But even in this case we
may not have full predictability. For example, the Disjunction rule in
classical logic allows you to infer from "P" to "P v Q", and you are free to
choose for "Q" *any* well-formed formula in the language. (In other
cases we do have full predictability. For example, the Modus Ponens
rule gives us no choice but to infer "B" from "If A then B" and "A".)
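[Editor's sketch, not part of the original exchange: the contrast Baker draws, in toy code. Formulas are represented as strings and tuples; the representation is entirely hypothetical, but it shows that Modus Ponens determines its conclusion while Disjunction introduction leaves Q as a free choice.]

```python
# Modus Ponens: from ("if", A, B) and A, infer B -- no choice involved.
def modus_ponens(conditional, antecedent):
    kind, a, b = conditional
    assert kind == "if" and a == antecedent, "rule does not apply"
    return b

# Disjunction introduction: from P infer (P v Q), for ANY chosen Q --
# the premise alone does not determine the conclusion.
def or_intro(p, q):
    return ("or", p, q)

print(modus_ponens(("if", "A", "B"), "A"))  # B
print(or_intro("P", "it will rain tomorrow"))
```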
Had a suspicion there was a problem along these lines when I made
the
suggestion. And may be another red herring. Let's see whether it's
worth returning to in light of other things. What I had in mind was,
I think, not quite the one the proof responds to. The idea was not
to substitute Bayes' theorem for an existing axiom or set of axioms
but rather to use Bayes' theorem as a starting point and then add
additional axioms as seemed necessary.
Using Bayes' Theorem as a starting point "and then add[ing] other
axioms as seem necessary" still sounds weird to me. I think that my
worry is that the expressions appearing in BT require more fundamental
axioms to define their meaning. (In particular the concept of conditional
probability.) It seems analogous to starting, say, with Fermat's Last
Theorem as an axiom, without having in place basic axioms for defining
the concept of number.
Yeah, had precisely in mind "an axiomatization of the reals which
does not proceed in this cumulative way from the integers". And,
here too, may be a red herring (or something already fully explored).
I'm not though entirely dissuaded by your notion that doing so would
not "avoid the G-style incompleteness results, since the resulting
theory would be strong enough to embed integer arithmetic". It seems
to me at least possible (and if possible desirable) that it would
further illuminate G-incompleteness by showing it to hold in one
realm (integer arithmetic) of a larger space in which it does not
generally hold.
As I understand the Godel incompleteness results, any formal system
which is strong enough to embed arithmetic is either inconsistent or
incomplete. Hence it cannot be the case that Godel incompleteness
holds only in "one realm of a larger space in which it does not generally
hold."
Do understand that Godel incompleteness is NOT "absolute", that
what
one can't reach in one formal system CAN be reached in another. In
fact, one of the things that I think is important in all this is that
FAS's themselves GENERATE the unreachable (for themselves).
Would
need some tutoring on PA, ZF etc if it looks relevant in the long
run, but had the impression that Chaitin was quite directly concerned
with establishing "interesting arithmetical claims" (his omega number
and its relation to diophantine equations) not provable in FAS's as
he understands them at least.
I think we need to be careful in interpreting the claim that "FAS's
themselves GENERATE the unreachable (for themselves)." Certainly,
the demonstrably unprovable statements are typically system-specific.
But remember that Godel's system for coding statements about
provability-in-a-system into arithmetical statements is itself arbitrary.
There are infinitely many different coding schemes for each system, and
each coding system generates a different set of demonstrably
unprovable (in that system) statements. In other words, it is misleading
to think that the system itself leads us inexorably to some particular
(unprovable) Godel sentence.
As for your point about the omega number, I'm sure you're right. I
haven't looked at Chaitin's stuff for a while, and I had forgotten (or was
never aware) that the omega number has independent mathematical
interest.
As for accessible references concerning probability and its
philosophical ramifications, as I said in my original reply this is not my
area of professional expertise so I am not really up on the literature.
Having said that, one good place to start is the article "Interpretations of
Probability" in the Stanford Encyclopedia of Philosophy. This is an online
encyclopedia which (in general) has excellent survey articles on a wide
range of topics. It is peer reviewed and considered highly respectable
within the field. At the end of the article is a fairly extensive bibliography. The author of
the article, Alan Hajek, is a good friend of mine from grad school,
currently at Caltech. There is another article in the Stanford
Encyclopedia specifically on Bayes' Theorem. I haven't read it, but it
might be useful.
Grobstein responding to Baker (indented italics), 4 August 2004
I think that you are correct to emphasize the parallels between classical
logic and the standard axiomatization of probability. One further parallel
"often overlooked" is that classical logic also gives you "freedom" in
initial assignments. From the point of view of pure logic, each contingent
atomic proposition can be assigned either "true" or "false."
Do think it's interesting/potentially productive to ask both about the degree of parallelism and about the inevitability or lack thereof of whatever parallelisms exist. Seems to me possible (subject to some of the below) that parallelisms might reflect the fact that formalization of probability theory occurred during/after people got interested in formalization of arithmetic, and might have gone differently if it had proceeded more independently.
A gingerly suggestion, since I'm finding myself constantly reminded of my naivete in these realms, and there may in consequence be an important something I keep overlooking. Perhaps in the area of what is/is not taken as "givens" in FAS's (aspects of which we touched on earlier). It hadn't been clear in my mind that the "givens" are not only axioms and "sentence generation mechanisms" and "inference rules" (my terms, do they bear some relation to your vocabulary?) but also (in all cases?) "atomic propositions" and arbitrary "value" assignments. Wouldn't mind clearing this up, if you don't mind what are probably some very elementary clarifications. While on this topic, I assume there is nothing in an FAS that corresponds to the "new observation" of the "scientist", ie a something from OUTSIDE the formal system that has to be taken into account? Suspect this bears on a point below where you found me unclear ("I'm not sure what is being iterated"), will see when we get there.
The point about incoherence, for the probability calculus, is not that you
have a choice of different probabilities you can assign (since this is also
true for coherent systems), but that you can derive - within the
(incoherent) system - two (or more) different probability assignments
for a single proposition. Hence the analogy with inconsistency in
classical logic - where you can derive two truth-value assignments (true
and false) for a single proposition. For example, say I assign probability
.25 to the proposition that it will rain tomorrow. If I am working with an
incoherent set of probability axioms then I will be able to derive, within
the system, a second, distinct probability for this same proposition, say
.45.
It's not right, I think, to say that the probabilities for "related" hypotheses
must sum to less than 1 (at least not on any understanding of "related"
that I can think of). Perhaps you are thinking of the theorem of classical
probability which says that if A and B are mutually inconsistent
propositions then P(A) + P(B) is less than or equal to 1.
More generally, there seem to be plenty of counterexamples to your
claim.
e.g.
let C = "the die does not come up 6"
D = "the die comes up odd"
C and D are related, since D logically implies C.
But (for a fair die), P(C) + P(D) = 5/6 + 1/2 = 4/3 > 1.
There *are* more general constraints on the probability assignments of
compound statements, relative to the probability assignments of their
atomic components. But there is no single such constraint.
e.g.
P(A & B) ≤ P(A)
P(A & B) ≤ P(B)
P(A v B) ≥ max [P(A), P(B)]
Also, if A -> B then P(B) ≥ P(A)
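The die example and the coherence constraints just listed can be checked by brute enumeration. A minimal sketch (Python, my illustration rather than anything from the exchange itself; exact fractions avoid rounding issues):

```python
from fractions import Fraction

# A fair six-sided die: six equiprobable outcomes.
outcomes = range(1, 7)

def prob(event):
    """Probability of an event (a predicate over outcomes) by counting."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_C = prob(lambda o: o != 6)      # C: "the die does not come up 6"
p_D = prob(lambda o: o % 2 == 1)  # D: "the die comes up odd"

# Related propositions can still sum to more than 1:
assert p_C + p_D == Fraction(4, 3)

# The general coherence constraints, checked for these two events:
p_conj = prob(lambda o: o != 6 and o % 2 == 1)  # P(C & D)
p_disj = prob(lambda o: o != 6 or o % 2 == 1)   # P(C v D)
assert p_conj <= p_C and p_conj <= p_D  # P(A & B) <= P(A), P(B)
assert p_disj >= max(p_C, p_D)          # P(A v B) >= max[P(A), P(B)]
assert p_C >= p_D                       # D implies C, so P(C) >= P(D)
```

Enumeration over a finite outcome space is enough here; nothing about the constraints themselves depends on the die example.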
There IS something here that I'm having trouble getting my head around, and so may require us to backtrack a bit (if you're willing). The "sum to less than 1" idea was my "guess" as to what the requirement was to achieve "coherence" in probability calculus. The assumption I was making (I think) is that what probability calculus does is to derive probabilities given a set of starting conditions (which include some probability assignments to "atomic sentences"?) and that "coherence" (parallel to completeness) meant, as you say, that there would never be "two or more probability assignments for a single proposition". Since I assumed probability assignments (like true/false?) could be ARBITRARILY assigned only to "atomic sentences", I don't quite understand the idea of "probability assignments" of COMPOUND sentences. Am I missing something? On the other hand, the "general constraints" do make sense to me (and indeed probably more so than my "sum to less than 1" idea).
It's interesting though, assuming I'm understanding you, that the "operators" used in setting the constraints are those of "classical logic". Again, I can't help but wonder what kind of "formalization" of probability might have resulted if it had been done (was done) independent of that framework.
The issue of completeness in probability is an interesting one. What the
above remarks indicate is that there are "compound statements
whose probability is indeterminate" (relative to the probabilities of their
components). Perhaps the simplest case is conjunctions of
non-independent events.
Let A be the hypothesis that George Bush is re-elected. Let B be the
hypothesis that the stock market rises in 2005. Say you assign
probabilities of .5 to each. Then the only coherence constraint on P(A &
B) is that it be no greater than .5. You are free to assign any probability
to A&B between 0 and .5.
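The claim that P(A & B) is constrained only to an interval can be made concrete via the standard Fréchet bounds (my gloss; Baker states only the upper bound). A minimal sketch, with illustrative function names:

```python
def conjunction_bounds(p_a, p_b):
    """Coherence interval for P(A & B) given only P(A) and P(B).

    These are the standard Frechet bounds: any value in the interval
    is coherent, and nothing narrower follows without further
    assumptions (e.g. independence of A and B).
    """
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    return lower, upper

# The re-election / stock-market example: both hypotheses assigned .5,
# so any P(A & B) between 0 and .5 is coherent.
low, high = conjunction_bounds(0.5, 0.5)
assert (low, high) == (0.0, 0.5)
```

Note that when P(A) + P(B) exceeds 1 the lower bound rises above 0, which is why mutually inconsistent propositions must have probabilities summing to at most 1.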
Hmmmm. So there is no "rule" for determining the value of A&B in probability calculus as there is (I thought) in classical logic (ie true if A and B are both true, otherwise false)? Is there a difference between "undetermined" and "inconsistent"? In probability calculus? In classical logic? (are Godel statements simply "undetermined" or do they themselves or indirectly lead to inconsistencies?).
When discussing Godel and related results it is important to be clear
about what "completeness" means in the logical context. A logic typically
consists of a syntax (rules for well-formed formulas, rules for proof) and
a semantics (an interpretation of the syntax, in which meaning is
assigned to the logical symbols). In classical propositional
logic, the semantics is simply the truth-table interpretation of each
logical connective. A logic is complete iff every semantically valid claim
is syntactically derivable. (Soundness is the converse condition.) And in
this sense, classical (1st-order) logic is complete.
Hmmmm. This may be part of where my naivete gets me into trouble. I understand (I think) "syntax" (that is what I called above "construction rules" together with "inference rules", yes?). What's less clear to me is the distinction between that and "semantics". I guess I assumed the "truth table" was part of the definition of the "logic", and am less clear about how "semantics" would be characterized in other cases. Even so, I thought that Godel showed "incompleteness" for 1st order logic. Not so, huh? Only for more sophisticated logics adequate to support arithmetic? And those have a different "semantics"?
Once we move beyond purely logical systems, a more useful definition
of completeness is: for any sentence S of the system, either S or
~S is provable in the system. This is the sense of completeness
relevant to Godel's result. Clearly this is an unreasonable, indeed
undesirable, condition to place on logical systems in general. (We
don't want to be able to prove logically that it will rain tomorrow or prove
logically that it won't rain tomorrow!) But in the special case of
arithmetic (and in mathematical systems more generally) this sort of
completeness is a desirable goal. We would like our formal
mathematical systems to decide every question in the given domain. It
is this goal, in the particular case of mathematical systems strong
enough to embed arithmetic, which Godel showed is impossible.
Back on track here (thanks for the detour). And agree that, for ARITHMETIC, "completeness" in the sense defined would be desirable. Is less obvious, MUCH less obvious, that it would be desirable for "science", much less day-to-day life. Assume this has been appreciated; issue is whether people have proposed alternate formal systems more appropriate for ... science?
I didn't mention this in my previous message, but the Ramsey-style
"betting quotient" approach does assure coherence. There are some
theorems (known as "Dutch Book" theorems) which show that a person
will accept a set of bets in which he is guaranteed to lose money (a
so-called "Dutch Book") iff his method for calculating the probabilities of
compound statements does not conform to the standard probability
axioms.
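A toy illustration of the Dutch Book situation (my own example, not one from the theorems themselves): if the prices an agent will pay for bets on an event and on its negation sum to more than 1, a bookie who sells both bets profits no matter what happens. A minimal sketch:

```python
def dutch_book_loss(price_rain, price_no_rain, stake=1.0):
    """Agent pays price * stake for each bet; a winning bet pays `stake`.

    Sold bets on both "rain" and "no rain", exactly one bet pays off,
    so the agent's net outcome is the same whatever the weather.
    """
    total_paid = (price_rain + price_no_rain) * stake
    payoff = stake  # exactly one of the two complementary bets wins
    return payoff - total_paid  # negative means a guaranteed loss

# Incoherent degrees of belief (.6 rain, .6 no rain, summing to 1.2):
# the agent is guaranteed to lose 0.2 units whichever way it goes.
assert dutch_book_loss(0.6, 0.6) < 0

# Coherent degrees of belief (.6 and .4): no guaranteed loss.
assert abs(dutch_book_loss(0.6, 0.4)) < 1e-9
```

The Dutch Book theorems generalize this: susceptibility to such a guaranteed-loss book is equivalent to violating the probability axioms.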
You write: "Could one not simply take 'degree of belief' as something
along the lines of 'existence in the brain of some relevant degree of
order' defined re 'randomness'?"
I'm unclear what you mean here. Degrees of belief are supposed to
attach to particular hypotheses. These could relate to immediate
sensory experience (e.g. "the coin will come up heads on the next flip")
or they could relate to much more abstruse matters (e.g. "the 109th
element in the periodic table will be created by scientists in the next 24
hours"). I don't see how a particular number "corresponding to the
person's degree of belief for each such hypothesis" can (even in
principle) be extracted from the physical state of the brain. And where
does randomness come in, exactly?
May want to come back to interesting "Dutch book" issue; "theorems ... which show that a person" confuses me (observations can show ..., but a theorem?). But, let's stay for the moment with your question to me. If Bayes was right, that probability = degree of certainty, then "degree of certainty" MUST be a feature of the "physical state of the brain" and, if it is, it can be "extracted" by appropriate observations. This is true whether the "cause" of that particular brain state is "immediate sensory experience" or "more abstruse matters". Randomness may be a factor in the "more abstruse matters". But, more immediately, my speculation is that the brain representation of "degree of certainty" uses a state of randomness (ie 0 certainty) as a baseline.
I totally agree with your observation that there is "no reason to believe
that the nervous system is constrained" to operations that would yield
only rational numbers." But I think that it is important not to conflate the
numbers which enter into a description of the nervous system with the
numbers which our minds can conceive of. I don't see any clear
inference, for example, from "We can conceive of incompressibles" to
"The best description of the functioning of our nervous system involves
incompressibles."
No argument about lack of "clear inference", is speculative leap. BUT am not conflating description with what "our minds can conceive of". The latter is a very small part of what the nervous system is/does (consciousness, the "story teller"). The rest of the nervous system may well not only be "describable" using incompressible numbers but may operate in ways that involve them (whether consciousness can conceive them or not ... the rest of the ns, for example, clearly works happily with high numbers of degrees of freedom/dimensions whereas most people can only "conceive" three).
The possibility of posterior probabilities going to zero clearly fits nicely
with the Popperian view of scientific hypotheses as falsifiable. The
easiest way, I think, to see why the posterior probability cannot exceed 1
is to cash out the conditional probability in the numerator using the
formula P(A/B) = P(A & B) / P(B) (assuming P(B) ≠ 0).
Bayes' Theorem says P(H/e) = P(H) · P(e/H) / P(e)
Rewriting the right-hand side, using the above formula, yields
P(H) · [P(e & H) / P(H)] / P(e)
But now we can cancel out the two P(H)'s, so the resulting (posterior)
probability is P(e&H) / P(e). And since the probability of a conjunction is
less than or equal to the probability of one conjunct, this expression
cannot exceed 1.
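The cancellation argument can be checked numerically: the posterior P(H/e) reduces to P(e & H) / P(e), and a conjunction is never more probable than either conjunct. A minimal sketch (the sample numbers are illustrative only):

```python
def posterior(p_e_and_h, p_e):
    """P(H/e) computed as P(e & H) / P(e), per the cancellation above."""
    # A conjunction is no more probable than its conjunct, so any
    # coherent input satisfies P(e & H) <= P(e); we require P(e) > 0.
    assert 0 < p_e <= 1 and 0 <= p_e_and_h <= p_e
    return p_e_and_h / p_e

# Since P(e & H) can never exceed P(e), the posterior is capped at 1:
for p_joint, p_evidence in [(0.1, 0.4), (0.3, 0.3), (0.0, 0.5)]:
    assert posterior(p_joint, p_evidence) <= 1.0
```

The middle case, P(e & H) = P(e), is the extreme where the evidence occurs only when the hypothesis is true, giving posterior exactly 1.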
Nice, got it. Relates to one of the coherence criteria above, yes? No, wait a minute. If that argument holds (as I'm understanding it) then the "likelihood factor" can't exceed one and the posterior probability can never be increased by a new observation? What am I missing?
The connections between issues concerning formal axiomatic systems,
Godel's results, and quantum mechanics are deep and interesting.
[The Lucas-Penrose argument is one well-known attempt to exploit
such connections.] When you claim that FAS's "are "mechanical" in the
specific sense that the next iteration IS fully predictable from the prior
one," are you referring to iterations *within* the system or iterations *of*
the system. In the latter case it seems up to us (to a considerable
degree) how we change one FAS to yield another. In the former case,
I'm not sure exactly what is being iterated. Are we talking about the
repeated application of a rule of the system? But even in this case we
may not have full predictability. For example, the Disjunction rule in
classical logic allows you to infer from "P" to "P v Q", and you are free to
choose for "Q" *any* well-formed formula in the language. (In other
cases we do have full predictability. For example, the Modus Ponens
rule gives us no choice but to infer "B" from "If A then B" and "A".)
IS (perhaps?) the key issue (in my way of thinking/concerns). Had in mind iteration WITHIN the system and the ambition that some scientists have explicitly (and many scientists as well as non-scientists have unconsciously) to achieve a "theory of everything": a small set of starting conditions and transformation rules from which everything else in "reality" follows. The week before the session you were at, I made a strong (I think) argument (see http://serendip.brynmawr.edu/local/scisoc/information/grobstein15july04.html) that trying to describe "reality" as an FAS would inevitably fail (whether it actually IS one or not), and so one needed some formalization of the inquiry process that didn't include the presumption that reality could be usefully described as a particular FAS. DO understand that "fully predictable iteration" is NOT a property of formal logic (nor of arithmetic; both involve "irreversible" transformations (your examples in logic; 3 + 5 = 8 in arithmetic)). But physics tries to use only time-symmetric, hence reversible, hence deterministically iterative transformations, and many other sciences try to copy that. And THAT is what I am fundamentally trying to get people to get away from.
Hmmmm .... effort to clarify here between us useful (yeah other places too but maybe here especially). Am realizing that my quarrel may be less with FAS per se and more with a particular KIND of FAS?, a kind exemplified by Turing computability. I thought FAS's and Turing computability systems were formally equivalent? Does anyone make a distinction between FAS's with and without irreversibility?, with and without transformation rules that have some "arbitrariness" (observer choice) to them? What is presumed to be the "source" of the arbitrary choices in the theory of FAS?
And another line to think more about: do NOT want to argue that FAS's and/or Turing computability are irrelevant to science, only that they cannot serve as the aspired-to goal. So COULD imagine a characterization of science in terms of "how we change one FAS to yield another". Are there efforts to "formalize" science in those terms?
Using Bayes' Theorem as a starting point "and then add[ing] other
axioms as seem necessary" still sounds weird to me. I think that my
worry is that the expressions appearing in BT require more fundamental
axioms to define their meaning. (In particular the concept of conditional
probability.) It seems analogous to starting, say, with Fermat's Last
Theorem as an axiom, without having in place basic axioms for defining
the concept of number.
Yeah, fair enough. But ... what if one DID take the "message" of Bayes' Theorem (ie "degree of uncertainty", no presumptions about nature/even existence of external world, iterative change) as a starting point and then attempted to create a "formal system" around that, defining axioms/terms as needed?
As I understand the Godel incompleteness results, any formal system
which is strong enough to embed arithmetic is either inconsistent or
incomplete. Hence it cannot be the case that Godel incompleteness
holds only in "one realm of a larger space in which it does not generally
hold."
Maybe issue here, what needs to be clarified, is "formal system", what exactly the requirements/constraints presumed in it are?
I think we need to be careful in interpreting the claim that "FAS's
themselves GENERATE the unreachable (for themselves)." Certainly,
the demonstrably unprovable statements are typically system-specific.
But remember that Godel's system for coding statements about
provability-in-a-system into arithmetical statements is itself arbitrary.
There are infinitely many different coding schemes for each system, and
each coding system generates a different set of demonstrably
unprovable (in that system) statements. In other words, it is misleading
to think that the system itself leads us inexorably to some particular
(unprovable) Godel sentence.
As for your point about the omega number, I'm sure you're right. I
haven't looked at Chaitin's stuff for a while, and I had forgotten (or was
never aware) that the omega number has independent mathematical
interest.
Was not aware of the dependence of the definition of the "unreachable" on the choice of coding scheme. Is important to know that, thanks. Was myself impressed by learning about Chaitin's persistence in establishing "independent mathematical interest". All of which makes me wonder still more about my presumption of a formal equivalence between Godel and Turing. There may be more of interest to chew over there.