Abstract:-
First-order probabilistic logic is a powerful knowledge representation language. Unfortunately, deductive reasoning based on the standard semantics for this logic does not support certain desirable patterns of reasoning, such as indifference to irrelevant information or substitution of constants into universal rules. We show that both these patterns rely on a first-order version of probabilistic independence, and provide semantic conditions to capture them. The resulting insight enables us to understand the effect of conditioning on independence, and allows us to describe a procedure for determining when independencies are preserved under conditioning. We apply this procedure in the context of a sound and powerful inference algorithm for reasoning from statistical knowledge bases.
Introduction:-
The tradeoff between expressiveness and tractability is a central problem in AI. First-order logic can express most knowledge compactly, but is intractable. As a result, much research has focused on developing tractable subsets of it. First-order logic and its subsets ignore the fact that most knowledge is uncertain, severely limiting their applicability. Graphical models like Bayesian and Markov networks can represent many probability distributions compactly, but their expressiveness is only at the level of propositional logic, and inference in them is also intractable.
Many languages have been developed to handle uncertain first-order knowledge. Most of these use a tractable subset of first-order logic, like function-free Horn clauses. Unfortunately, the probabilistic extensions of these languages are intractable, losing the main advantage of restricting the representation. Other languages, like Markov logic (Domingos and Lowd 2009), allow a full range of first-order constructs. Despite much progress in the last decade, the applicability of these representations continues to be significantly limited by the difficulty of inference.
History:-
The ideal language for most AI applications would be a tractable first-order probabilistic logic, but the prospects for such a language that is not too restricted to be useful seem dim. Cooper (1990) showed that inference in propositional graphical models is already NP-hard, and Dagum and Luby (1993) showed that this is true even for approximate inference. Probabilistic inference can be reduced to model counting, and Roth (1996) showed that even approximate counting is intractable, even for very restricted propositional languages, like CNFs with no negation and at most two literals per clause, or Horn CNFs with at most two literals per clause and three occurrences of any variable.

Although probabilistic inference is commonly assumed to be exponential in treewidth, a line of research that includes arithmetic circuits (Darwiche 2003), AND-OR graphs (Dechter and Mateescu 2007) and sum-product networks (Poon and Domingos 2011) shows that this need not be the case. However, the consequences of this for first-order probabilistic languages seem to have so far gone unnoticed. In addition, recent advances in lifted probabilistic inference (e.g., Gogate and Domingos 2011) make tractable inference possible even in cases where the most efficient propositional structures cannot. In this paper, we take advantage of these to show that it is possible to define a first-order probabilistic logic that is tractable, and yet expressive enough to encompass many cases of interest, including junction trees, non-recursive PCFGs, networks with cluster structure, and inheritance hierarchies.
Logic:-
The choice of logic programs as the underlying representation is motivated by their expressive power and generality. Indeed, logic programs allow us to represent relational databases, grammars and also programs. This will turn out to be useful when discussing commonalities and differences in the probabilistic relational or logical representations later on. In addition, logic programs are well understood, have a clear semantics and are widespread. Consider the alarm program in Figure 2. In this example, there are five propositions that occur, i.e. {alarm, earthquake, marycalls, johncalls, burglary}. This is the so-called Herbrand base HB. It consists of all facts that one can construct with the predicate, constant and function symbols in the program.
alarm :- burglary, earthquake.
johncalls :- alarm.
marycalls :- alarm.
Figure 2: The alarm program.

The Herbrand base, in a sense, specifies the set of all possible worlds described by the program. Indeed, for the alarm program, there are 2^5 = 32 possible assignments of truth-values to the Herbrand base. These assignments are sometimes called (Herbrand) interpretations, and they each describe a world.
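To make the possible-worlds reading concrete, the following minimal Python sketch (not part of the original text; only the five propositions are taken from the alarm program above) enumerates all 2^5 = 32 Herbrand interpretations:

from itertools import product

# The five propositions in the Herbrand base of the alarm program.
herbrand_base = ["alarm", "earthquake", "marycalls", "johncalls", "burglary"]

# A Herbrand interpretation assigns a truth-value to every fact in
# the Herbrand base, so there are 2^5 = 32 possible worlds.
interpretations = [
    dict(zip(herbrand_base, values))
    for values in product([True, False], repeat=len(herbrand_base))
]

print(len(interpretations))  # prints 32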
PROBABILISTIC LOGICS:-
The model-theoretic and proof-theoretic views of logic are also useful from a probabilistic perspective. Indeed, many of the present probabilistic representations can be regarded as defining probabilities on possible worlds (e.g., Bayesian networks) or on proofs (e.g., stochastic context-free grammars). As these probabilistic representations form the basis for the probabilistic logics, we will now provide a short overview of them. We will use P to denote a probability distribution, e.g. P(X), and P to denote a probability value, e.g. P(X = x).
Probabilities on Possible Worlds:-
The most popular formalism for defining probabilities on possible worlds is that of Bayesian networks. As an example of a Bayesian network, consider Judea Pearl's famous alarm network, where each node corresponds to a random variable Xi (e.g., {alarm, earthquake, marycalls, johncalls, burglary}) and each edge indicates a direct influence among the random variables. We will denote the set of parents of a node Xi by Pa(Xi), e.g., Pa(alarm) = {earthquake, burglary}. A Bayesian network specifies a joint probability distribution P(X1, . . . , Xn) over a fixed, finite set {X1, . . . , Xn} of random variables. As we will, for simplicity, assume that the random variables are all boolean, i.e., they have the domain {true, false}, this actually amounts to specifying a probability distribution on the set of all possible interpretations. Indeed, in our alarm example, the Bayesian network defines a probability distribution over truth-assignments to {alarm, earthquake, marycalls, johncalls, burglary}.

A Bayesian network stipulates the following conditional independence assumption: each node Xi in the graph is conditionally independent of any subset A of nodes that are not descendants of Xi given a joint state of Pa(Xi), i.e. P(Xi | A, Pa(Xi)) = P(Xi | Pa(Xi)). Together with the chain rule, this assumption implies that the joint distribution factorizes as P(X1, . . . , Xn) = ∏i P(Xi | Pa(Xi)).
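The following minimal Python sketch (not part of the original text) makes this factorization concrete: it computes the probability of one possible world of the alarm network as the product of the local conditional probabilities. The CPT numbers are the illustrative values commonly quoted for Pearl's example, not values given here.

def p_burglary(b):
    return 0.001 if b else 0.999

def p_earthquake(e):
    return 0.002 if e else 0.998

def p_alarm(a, b, e):
    # P(alarm | burglary, earthquake); illustrative values only.
    table = {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001}
    p = table[(b, e)]
    return p if a else 1.0 - p

def p_johncalls(j, a):
    p = 0.90 if a else 0.05
    return p if j else 1.0 - p

def p_marycalls(m, a):
    p = 0.70 if a else 0.01
    return p if m else 1.0 - p

def joint(b, e, a, j, m):
    # P(burglary, earthquake, alarm, johncalls, marycalls)
    # computed via the factorization prod_i P(Xi | Pa(Xi)).
    return (p_burglary(b) * p_earthquake(e) * p_alarm(a, b, e)
            * p_johncalls(j, a) * p_marycalls(m, a))

# Probability of the world where a burglary (but no earthquake)
# triggers the alarm and both neighbours call:
print(joint(True, False, True, True, True))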