root/library/doc/tutorial/02userguide2.dox @ 909

Revision 870, 10.6 kB (checked in by mido, 15 years ago)

LOG_LEVEL final cut (or rather semifinal? I hope to improve work with ids soon)
and also it rests to adapt applications, work is in progress

Line 
1/*!
2\page userguide2 BDM Use - Estimation and Bayes Rule
3
4Baysian theory is predominantly used in system identification, or estimation problems.
5This section is concerned with recursive estimation, as implemneted in prepared scenario \c estimator.
6
7The function of the \c estimator is graphically illustrated:
8\dot
9digraph estimation{
10        node [shape=box];
11        {rank="same"; "Data Source"; "Bayesian Model"}
12        "Data Source" -> "Bayesian Model" [label="data"];
13        "Bayesian Model" -> "Result Logger" [label="estimation\n result"];
14        "Data Source" -> "Result Logger" [label="Simulated\n data"];
15}
16\enddot
17
18Here,
19\li Data Source is an object (class DS) providing sequential data, \f$ [d_1, d_2, \ldots d_t] \f$.
20\li Bayesian Model is an object (class BM) performing Bayesian filtering,
21\li Result Logger is an object (class logger) dedicated to storing important data from the experiment.
22
23Since objects  datasource and the logger has already been introduced in section \ref userguide, it remains to introduce
24object \c Bayesian \c Model (bdm::BM).
25
26\section ug2_theory Bayes rule and estimation
27The object bdm::BM is basic software image of the Bayes rule:
28\f[ f(x_t|d_1\ldots d_t) \propto f(d_t|x_t,d_1\ldots d_{t-1}) f(x_t| d_1\ldots d_{t-1}) \f]
29
30Since this operation can not be defined universally, the object is defined as abstract class with methods for:
31 - <b> Bayes rule </b> as defined above, operation bdm::BM::bayes() which expects to get the current data record \c dt, \f$ d_t \f$
32 - <b> log-likelihood </b> i.e. numerical value of \f$ f(d_t|d_1\ldots d_{t-1})\f$ as a typical side-product, since it is required in denominator of the above formula.
33   For some models, computation of this value may require extra effort, hence it computation can be suppressed by setting BM::set_evalll(false).
34 - <b> prediction </b> the object has enough information to create the one-step ahead predictor, i.e. \f[ f(d_{t+1}| d_1 \ldots d_{t}), \f]
35        this object can be either created bdm::BM::predictor(), sometimes it is enought only a few values of prediction hence construction of the full predictor would be too expensive operation. For this situation were designed cheaper operations bdm::BM::logpred() or bdm::BM::epredictor().
36These are only basic operations, see full documentation for full range of defined operations.
37       
38These operation are abstract, i.e. not implemented for the general class.
39Implementation of these operations is heavily dependent on the specific class of prior pdf, or its approximations. We can identify only a few principal approaches to this problem. For example, analytical estimation which is possible within sufficient the Exponential Family, or estimation when both prior and posterior are approximated by empirical densities.
40These approaches are first level of descendants of class \c BM, classes bdm::BMEF and bdm::PF, repectively.
41
42Variants of these approaches are implemented as descendats of these level-two classes. This way, each estimation method (represented a class) is fitted in its place in the tree of approximations. This is useful even from software point of view, since related approximations have common methods and data fields.
43
44\section ug2_arx_basic Estimation of ARX models
45
46Autoregressive models has already been introduced in \ref ug_arx_sim where their simulator has been presented.
47We will use results of simulation of the ARX datasource defined there to provide data for estimation using MemDS.
48
49The following code is from bdmtoolbox/tutorial/userguide/arx_basic_example.m
50\code
51A1.class = 'ARX';
52A1.rv = y;
53A1.rgr = RVtimes([y,y],[-3,-1]) ;
54A1.log_level = 'logbounds,logevidence';
55\endcode
56This is the minimal configuration of an ARX estimator. Optional elements of bdm::ARX::from_setting() were set using their default values:
57
58The first three fileds are self explanatory, they identify which data are predicted (field \c rv) and which are in regressor (field \c rgr).
59The field \c log_level is a string of options passed to the object. In particular, class \c BM understand only options related to storing results:
60 - logbounds - store also lower and upper bounds on estimates (obtained by calling BM::posterior().qbounds()),
61 - logevidence - store also loglikelihood of each step of the Bayes rule.
62These values are stored in given logger (\ref ug_loggers). By default, only mean values of the estimate are stored.
63
64Storing of the log-likelihood is useful, e.g. in model selection task when two models are compared.
65
66The bounds are useful e.g. for visualization of the results. Run of the example should provide result like the following:
67\image html arx_basic_example_small.png
68\image latex arx_basic_example.png "Typical run of tutorial/userguide/arx_basic_example.m" width=\linewidth
69
70\section ug2_model_sel Model selection
71
72In Bayesian framework, model selection is done via comparison of marginal likelihood of the recorded data. See [some theory].
73
74A trivial exammple how this can be done is presented in file bdmtoolbox/tutorial/userguide/arx_selection_example.m. The code extends the basic A1 object as follows:
75\code
76A2=A1;
77A2.constant = 0;
78
79A3=A2;
80A3.frg = 0.95;
81\endcode
82That is, two other ARX estimators are created,
83 - A2 which is the same as A1 except it does not model constant term in the linear regression. Note that if the constant was set to zero, then this is the correct model.
84 - A3 which is the same as A2, but assumes time-variant parameters with forgetting factor 0.95.
85 
86Since all estimator were configured to store values of marginal log-likelihood, we can easily compare them by computint total log-likelihood for each of them and converting them to probabilities. Typically, the results should look like:
87\code
88Model_probabilities =
89
90    0.0002    0.7318    0.2680
91\endcode
92Hence, the true model A2 was correctly identified as the most likely to produce this data.
93
94For this task, additional technical adjustments were needed:
95\code
96A1.name='A1';
97A2.name='A2';
98A2.rv_param = RV({'a2th', 'r'},[2,1],[0,0]);
99A3.name='A3';
100A3.rv_param = RV({'a3th', 'r'},[2,1],[0,0]);
101\endcode
102First, in order to distinguish the estimators from each other, the estimators were given names. Hence, the results will be logged with prefix given by the name, such as M.A1ll for field \c ll.
103
104Second, if the parameters of a ARX model are not specified, they are automatically named \c theta and \c r. However, in this case, \c A1 and \c A2 differ in size, hence their random variables differ and can not use the same name. Therefore, we have explicitly used another names (RVs) of the parameters.
105
106\section ug2_bm_composition Composition of estimators
107
108Similarly to pdfs which could be composed via \c mprod, the Bayesian models can be composed. However, justification of this step is less clear than in the case of epdfs.
109
110One possible theoretical base of composition is the Marginalized particle filter, which splits the prior and the posterior in two parts:
111\f[ f(x_t|d_1\ldots d_t)=f(x_{1,t}|x_{2,t},d_1\ldots d_t)f(x_{2,t}|d_1\ldots d_t) \f]
112each of these parts is estimated using different approach. The first part is assumed to be analytically tractable, while the second is approximated using empirical approximation.
113
114The whole algorithm runs by parallel evaluation of many \c BMs for estimation of \f$ x_{1,t}\f$, each of them conditioned on value of a sample of \f$ x_{2,t}\f$.
115
116For example, the forgetting factor, \f$ \phi \f$ of an ARX model can be considered to be unknown. Then, the whole parameter space is \f$ [\theta_t, r_t, \phi_t]\f$ decomposed as follows:
117\f[ f(\theta_t, r_t, \phi_t) = f(\theta_t, r_t| \phi_t) f(\phi_t) \f]
118Note that for known trajectory of \f$ \phi_t \f$ the standard ARX estimator can be used if we find a way how to feed the changing \f$ \phi_t \f$ into it.
119This is achieved by a trivial extension using inheritance method bdm::BM::condition().
120
121Extension of standard ARX estimator to conditional estimator is implemented as class bdm::ARXfrg. The only difference from standard ARX is that this object will change its forgetting factor via method ARXfrg::condition(). Existence of this function is assumed by the MPF estimator.
122Informally, the name 'ARXfrg' means: "if anybody calls your condition(0.9), it tells you new value of forgetting factor".
123
124The MPF estimator is implemented by class bdm::MPF. In the toolbox, it can be constructed as follows:
125\code
126%%%%%% ARX estimator conditioned on frg
127
128A1.class = 'ARXfrg';
129A1.rv = y;
130A1.rgr = RVtimes([y,u],[-3,-1]) ;
131A1.log_level ='logbounds,logevidence';
132A1.frg = 0.9;
133A1.name = 'A1';
134
135%%%%%% Random walk on frg - Dirichlet
136phi_pdf.class = 'mDirich';         % random walk on coefficient phi
137phi_pdf.rv    = RV('phi',2);       % 2D random walk - frg is the first element
138phi_pdf.k     = 0.01;              % width of the random walk
139phi_pdf.betac = [0.01 0.01];       % stabilizing elememnt of random walk
140
141%%%%%% Combining estimators in Marginalized particle filter
142E.class = 'MPF';
143E.BM = A1;                         % ARX is the analytical part
144E.parameter_pdf = phi_pdf;         % Random walk is the parameter evolution model
145E.n = 20;                          % number of particles
146E.prior.class = 'eDirich';         % prior on non-linear part
147E.prior.beta  = [1 1]; %
148E.log_level ='logbounds,logevidence';
149E.name = 'MPF';
150
151M=estimator(DS,{E});
152
153\endcode
154 
155Here, the configuration structure \c A1 is a description of an ARX model, as used in previous examples, the only difference is in its name 'ARXfrg'.
156
157The configuration structure \c phi_pdf defines random walk on the forgetting factor. It was chosen as Dirichlet, hence it will produce 2-dimensional vector of \f$[\phi, 1-\phi]\f$. The class \c ARXfrg was designed to read only the first element of its condition.
158The random walk of type mDirich is:
159\f[ f(\phi_t|\phi_{t-1}) = Di (\phi_{t-1}/k + \beta_c) \f]
160where \f$ k \f$ influences the spread of the walk and \f$ \beta_c \f$ has the role of stabilizing, to avoid traps of corner cases such as [0,1] and [1,0].
161Its influence on the results is quite important.
162
163This example is implemented as bdmtoolbox/tutorial/userguide/frg_example.m
164Its typical run should look like the following:
165\image html frg_example_small.png
166\image latex frg_example.png "Typical run of tutorial/userguide/frg_example.m" width=\linewidth
167
168Note: error bars in this case are not directly comparable with those of previous examples. The MPF class implements the qbounds function as minimum and maximum of bounds in the condidered set (even if its weight is extreemly small). Hence, the bounds of the MPF are probably larger than it should be. Nevertheless, they provide great help when designing and tuning algorithms.
169
170*/
Note: See TracBrowser for help on using the browser.