Missing agricultural price data: an application of mixed estimation

This article shows the convenience of combining the Country–Product–Dummy (CPD) model and the Theil–Goldberger (TG) mixed estimator to obtain better estimates of missing prices than those obtained by marginal mean imputation. We use the TG estimator to combine aggregate price data for regions of Ecuador with producer-level data (the sample data) to fill in the missing price observations. Our results show better price estimates for predicting missing data than estimates obtained from using the CPD model on the sample data only. Through this approach, missing data can be replaced with economically meaningful data.
I. Introduction
This article shows how combining the Country–Product–Dummy (CPD) model (Summers, 1973) and the Theil–Goldberger (TG) mixed estimator (Theil and Goldberger, 1961) results in better estimates of missing prices than those obtained by marginal mean imputation. Missing data are a frequent nemesis for applied econometrics, especially when developing country data is the subject of analysis. For example, agricultural rents and profits are usually key variables needed for developing country welfare analyses; however, producer-level observations almost always miss at least some of the output price information.
  Our study examines the use of the TG estimator to combine aggregate price data for regions in Ecuador with producer-level data to fill in the missing price observations. The mixed estimator allows for a general specification of price at the producer level by expanding individual data with regional data.
  The CPD model provides the formulation for price estimation. The CPD model estimates the price of a commodity in a region by regressing the natural logarithm of price on country and product dummies. It has been used extensively to explain price variations in international comparisons (Theil et al., 1989; Rao, 2004, 2005; Diewert, 2005). Rao (2004) and Diewert (2005) indicate that the CPD is a simple hedonic regression model in which the only characteristic of the commodity is the commodity itself.
  The TG mixed estimator (an extension of Durbin’s (1953) work) allows the use of prior or extraneous information (such as a priori expectations on coefficients based on theory or experience) in estimation. This additional information improves the estimation over the estimates using sample data only, but it introduces a bias. In fact, the TG mixed estimator, although popular during the 1960s through the 1980s, was abandoned due to criticism of the subjectivism in the choice of a priori coefficients (see Mittelhammer and Conway, 1988). However, the mixed estimation used in this article does not depend on the definition of a priori expectations on coefficients but makes use of two independent sets of data as a means of obtaining more accurate price predictions. Our results show that through this approach, the missing data can be replaced with economically meaningful data.
II. Data and Methods
The data to be used in this study come from two main sources. The first is a household survey performed by the World Bank and the National Institute of Statistics and Census from Ecuador. This survey was applied to 5816 urban and rural households in the Coast and Sierra regions of Ecuador (encompassing a total of 15 provinces), and it collects farmers’ information on production, costs and prices, among other data. Out of those households, 1807 reported quantities of at least one harvested crop. Counting each crop as an independent observation, we have 7085 observations, 67% of which do not have price information. From this data set, 2308 valid price observations are used for the estimations.
  The second data set includes price data gathered at farmers’ markets carried out in cities across Ecuador and reported by the Ecuadorian Ministry of Agriculture. This data set, although smaller than the household survey, provides price information for some provinces for which the household survey has only missing or scarce information. The number of price observations from this set is 472. Both data sets are cross-sectional and correspond to the same year, 1999.
*Corresponding author. Applied Economics Letters, 2010, 17, 537–541
The country–product–dummy model
The CPD model suggests that price variation can be mostly explained by the type of commodity (commodity effect) and the country or region it belongs to (country effect). For our case, the variation in agricultural prices in Ecuador is expected to be explained by the type of crop and the province where the farmer is located. The model specification of the CPD model in this case is the following.

$$p_{ij} = a_i \, b_j \, e_{ij}$$

In logarithms,

$$\ln p_{ij} = \ln a_i + \ln b_j + \ln e_{ij}$$

where $p_{ij}$ refers to the price of crop $i$ in province $j$, $a_i$ and $b_j$ are the crop and province coefficients to be estimated and $e_{ij}$ gathers residual effects explaining variations in $p_{ij}$. If we re-specify $y_{ij} = \ln p_{ij}$, $\alpha_i = \ln a_i$, $\beta_j = \ln b_j$ and $u_{ij} = \ln e_{ij}$ and express the model using dummy variables ($D$) for crops ($i = 1, 2, \ldots, C$) and provinces ($j = 1, 2, \ldots, P$), we obtain

$$y_{ij} = \alpha_1 D_1 + \alpha_2 D_2 + \cdots + \alpha_C D_C + \beta_1 D_1 + \beta_2 D_2 + \cdots + \beta_P D_P + u_{ij} \qquad (1)$$

Here the dummies multiplying the $\alpha$'s indicate the crop and those multiplying the $\beta$'s indicate the province.
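The dummy-variable regression in Equation 1 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `cpd_design`, the integer coding of crops and provinces, and the toy prices are hypothetical and not taken from the article; the first province is treated as the base (coefficient fixed at zero) so that the dummy columns are not collinear.

```python
import numpy as np

def cpd_design(crop_idx, prov_idx, n_crops, n_provs):
    """Dummy matrix for Equation 1: one column per crop, plus one column per
    province except province 0, which serves as the base (coefficient 0)."""
    crop_idx = np.asarray(crop_idx)
    prov_idx = np.asarray(prov_idx)
    rows = np.arange(len(crop_idx))
    X = np.zeros((len(crop_idx), n_crops + n_provs - 1))
    X[rows, crop_idx] = 1.0                            # crop dummies
    keep = prov_idx > 0                                # base province gets no column
    X[rows[keep], n_crops + prov_idx[keep] - 1] = 1.0  # province dummies
    return X

# Toy data (hypothetical): 3 crops x 2 provinces, with the price of
# crop 2 in province 1 missing from the sample.
crop_idx = np.array([0, 0, 1, 1, 2])
prov_idx = np.array([0, 1, 0, 1, 0])
prices = np.array([1.0, 0.5, 2.0, 1.0, 4.0])

X = cpd_design(crop_idx, prov_idx, n_crops=3, n_provs=2)
coef, *_ = np.linalg.lstsq(X, np.log(prices), rcond=None)  # OLS on log prices

# Fill in the missing price from its crop effect plus its province effect.
x_missing = cpd_design([2], [1], n_crops=3, n_provs=2)
predicted_price = float(np.exp(x_missing @ coef)[0])
print(predicted_price)
```

Because the toy prices are exactly multiplicative in a crop factor and a province factor, the fitted model reproduces the missing cell from the other five observations. Real data would leave a residual, and exponentiating a log prediction ignores any retransformation correction.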
After dropping one province dummy ($\beta_1 = 0$), which would become the base province, Equation 1 is the regression equation to be estimated by ordinary least squares, the best linear unbiased estimator for this model (Summers, 1973). Thus, studies applying this method have used the estimated coefficients in Equation 1 to predict the missing price values (Theil et al., 1989; Rao, 2005).
Theil–Goldberger mixed estimator
The TG mixed estimator procedure uses data outside the sample to improve sample-based coefficients. It combines the two sources of information, as follows:

$$\begin{bmatrix} y \\ r \end{bmatrix} = \begin{bmatrix} X \\ R \end{bmatrix} \beta + \begin{bmatrix} u \\ v \end{bmatrix}$$

where $y$ is the $N \times 1$ vector of observations on the dependent variable, $X$ the $N \times K$ matrix of explanatory variables and $u$ the $N \times 1$ vector of disturbances of the sample data. For the alternative data set, $r$ is the $M \times 1$ vector of the dependent variable, $R$ the $M \times K$ matrix of independent variables and $v$ the $M \times 1$ vector of errors. The coefficient $\beta$ is the $K \times 1$ vector of unknown parameters to be estimated.
Three standard assumptions allow us to derive the best linear unbiased coefficient for the model: both sets of explanatory variables are nonstochastic;

$$E\begin{bmatrix} u \\ v \end{bmatrix} = 0;$$

and

$$E\left( \begin{bmatrix} u \\ v \end{bmatrix} \begin{bmatrix} u' & v' \end{bmatrix} \right) = \begin{bmatrix} \Sigma_u & 0 \\ 0 & \Sigma_v \end{bmatrix}$$

where $\Sigma_u$ and $\Sigma_v$ denote the covariance matrices of $u$ and $v$. Generalized least squares can then be applied to the model, yielding the following coefficient estimator:

$$\hat{\beta} = \left( \begin{bmatrix} X' & R' \end{bmatrix} \begin{bmatrix} \Sigma_u & 0 \\ 0 & \Sigma_v \end{bmatrix}^{-1} \begin{bmatrix} X \\ R \end{bmatrix} \right)^{-1} \begin{bmatrix} X' & R' \end{bmatrix} \begin{bmatrix} \Sigma_u & 0 \\ 0 & \Sigma_v \end{bmatrix}^{-1} \begin{bmatrix} y \\ r \end{bmatrix}$$

If the previous assumptions are met, $\hat{\beta}$ is a best linear unbiased estimate of $\beta$, with this characteristic applying to the sample and the extraneous data simultaneously (Theil and Goldberger, 1961; Theil, 1963). If in addition we assume that the $u$'s are mutually independent with a constant variance $\sigma_u^2$ and that, following Brehm and Guenthner (1990), the $v$'s are also mutually independent with a constant variance $\sigma_v^2$, $\hat{\beta}$ can be simplified to the following weighted estimator:

$$\hat{\beta} = \left( \frac{1}{\sigma_u^2} X'X + \frac{1}{\sigma_v^2} R'R \right)^{-1} \left( \frac{1}{\sigma_u^2} X'y + \frac{1}{\sigma_v^2} R'r \right) \qquad (2)$$

The variance matrix of this estimator is given by

$$V(\hat{\beta}) = E\left[ (\hat{\beta} - \beta)(\hat{\beta} - \beta)' \right] = \left( \frac{1}{\sigma_u^2} X'X + \frac{1}{\sigma_v^2} R'R \right)^{-1} \qquad (3)$$

M. J. Castillo et al.
CPD–TG combined procedure
The two procedures (CPD model and TG’s mixed estimator) are combined by, first, estimating the CPD model for each data set separately, so as to be able to compute an estimate for $\sigma_u^2$ and $\sigma_v^2$. Then, the two data sets are pooled, in order to obtain $\hat{\beta}$ making use of the respective weights (i.e. the inverse of the variances).
  More specifically, because our sample data consists of 101 crops and 15 provinces while the extraneous data includes 47 crops and 14 provinces, the equation with the information to be used for mixed estimation will look as follows:

$$\begin{bmatrix} y^S_{ij} \\ y^E_{ij} \end{bmatrix} = \begin{bmatrix} D^S_1 & D^S_2 & \cdots & D^S_{47} & \cdots & D^S_{101} & D^S_2 & \cdots & D^S_{14} & D^S_{15} \\ D^E_1 & D^E_2 & \cdots & D^E_{47} & 0 \ \cdots & 0 & D^E_2 & \cdots & D^E_{14} & 0 \end{bmatrix} \beta + \begin{bmatrix} u_{ij} \\ v_{ij} \end{bmatrix} \qquad (4)$$

where $D^S_i$ ($i = 1, 2, \ldots, 101$) and $D^S_j$ ($j = 2, 3, \ldots, 15$) are $2308 \times 1$ vectors of crop and province dummies corresponding to the sample data, and $D^E_i$ ($i = 1, 2, \ldots, 47$) and $D^E_j$ ($j = 2, 3, \ldots, 14$) are $472 \times 1$ vectors of crop and province dummies from the extraneous data. When computing Equation 2, all data in the first row of the dummies matrix in Equation 4 will be weighted by $1/\sigma_u^2$ and the second row by $1/\sigma_v^2$.
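The two-step procedure just described can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of the OLS residual variance as the estimate of each $\sigma^2$, and the simulated data are assumptions made for the example. One regressor is zeroed out in the extraneous matrix to mimic the zero columns of Equation 4.

```python
import numpy as np

def residual_variance(X, y):
    """OLS residual variance, used here as the estimate of sigma^2 (an assumption)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - np.linalg.matrix_rank(X)   # rank handles all-zero columns
    return float(resid @ resid / dof)

def tg_mixed(X, y, R, r):
    """Step 1: estimate a variance for each data set separately.
    Step 2: pool both data sets with inverse-variance weights (Equations 2 and 3)."""
    s2u = residual_variance(X, y)
    s2v = residual_variance(R, r)
    precision = X.T @ X / s2u + R.T @ R / s2v
    rhs = X.T @ y / s2u + R.T @ r / s2v
    cov = np.linalg.inv(precision)            # Equation 3
    beta = cov @ rhs                          # Equation 2
    return beta, cov, s2u, s2v

# Simulated data (hypothetical): a noisy "sample" and a smaller, cleaner
# "extraneous" set that carries no information on the third regressor.
rng = np.random.default_rng(0)
true_beta = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(200, 3))
R = rng.normal(size=(50, 3))
R[:, 2] = 0.0
y = X @ true_beta + rng.normal(scale=0.7, size=200)
r = R @ true_beta + rng.normal(scale=0.2, size=50)

beta, cov, s2u, s2v = tg_mixed(X, y, R, r)
ols_cov = s2u * np.linalg.inv(X.T @ X)        # sample-only variance, for comparison
print(np.sqrt(np.diag(cov)), np.sqrt(np.diag(ols_cov)))
```

Since the extraneous term adds a positive semi-definite matrix to the precision, the mixed standard errors cannot exceed the sample-only ones computed with the same variance estimate.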
  In addition, we calculate the contribution of each independent data set in the mixed coefficient and standard error estimates. This measure is based on the precision of the TG estimator, Equation 3. More specifically, the shares of the sample ($\theta_S$) and the extraneous ($\theta_E$) information in the posterior knowledge are as follows (Theil, 1963):

$$\theta_S = \frac{1}{K} \, \mathrm{tr}\left[ \hat{\sigma}_u^{-2} X'X \left( \hat{\sigma}_u^{-2} X'X + \hat{\sigma}_v^{-2} R'R \right)^{-1} \right]$$

$$\theta_E = \frac{1}{K} \, \mathrm{tr}\left[ \hat{\sigma}_v^{-2} R'R \left( \hat{\sigma}_u^{-2} X'X + \hat{\sigma}_v^{-2} R'R \right)^{-1} \right]$$

where $K$ is the number of explanatory variables in the sample data and tr refers to the trace of a matrix.
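The two shares can be computed directly from the design matrices and the variance estimates. The sketch below is illustrative only: the function name and the tiny identity design matrices are hypothetical, and the two variances are simply plugged in as given numbers.

```python
import numpy as np

def information_shares(X, R, s2u, s2v):
    """Shares of sample and extraneous information in the mixed estimator."""
    K = X.shape[1]                       # number of explanatory variables
    A_sample = X.T @ X / s2u             # precision contributed by the sample
    A_extra = R.T @ R / s2v              # precision contributed by the extraneous data
    total_inv = np.linalg.inv(A_sample + A_extra)
    theta_s = float(np.trace(A_sample @ total_inv) / K)
    theta_e = float(np.trace(A_extra @ total_inv) / K)
    return theta_s, theta_e

# Hypothetical designs: with identical X and R the shares depend only on the variances.
X_demo = np.eye(2)
R_demo = np.eye(2)
theta_s, theta_e = information_shares(X_demo, R_demo, s2u=0.512, s2v=0.057)
print(theta_s, theta_e, theta_s + theta_e)
```

The two shares always sum to one, because the two precision terms add up to the full precision matrix. With real data the shares also reflect the size and spread of each design matrix, so a larger sample can keep a large share even when its residual variance is higher.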
III. Results
For the sample data, the CPD model showed an explanatory power for price variations of 0.54 (the adjusted $R^2$ was 0.51), which suggests a good fit of the regression model. The extraneous data showed an even better $R^2$ of 0.90 with an adjusted $R^2$ of 0.88.
  With homoscedasticity tests not rejected for both data sets at 1% significance, the sample residual variance ($\sigma_u^2$) is 0.512 and that of the alternative data ($\sigma_v^2$) is 0.057. Because the weight assigned to each set of data is the inverse of their respective variance, the extraneous data had a larger weight in the mixed estimation. The mixed estimates are reported in Table 1. Observation of the standard errors indicates that they are overall smaller than the standard errors using the sample data only. Also, t-statistics show that there was a net increase of statistically significant coefficients compared to the sample results.
  Residuals obtained from the mixed estimates show a coefficient of variation ($cv = \sigma/\mu$) smaller than the residuals obtained using the sample data only (a difference of 0.11 between the two coefficients of variation computed using absolute values of the residuals). This indicates a decreased dispersion in the distribution of the residuals when using the mixed estimator (see footnote 1). Thus, the use of the mixed estimation procedure improves the precision of the predicted prices by allowing the inclusion of alternative informative data sources in the estimation. This results in predicted prices that are closer to the true prices than predictions obtained from the traditional methodologies.
  Finally, by calculating the contribution of each of the data sets in the estimation, we observe that the sample data still carry the largest importance: its share is 0.63, while that of the extraneous information is only 0.37.
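As a quick check on the weighting, the inverse variances implied by the two reported residual variances can be worked out directly. This is arithmetic on the reported values only (rounded to two decimals), not an additional result from the article.

```latex
\[
\frac{1}{\hat{\sigma}_u^2} = \frac{1}{0.512} \approx 1.95,
\qquad
\frac{1}{\hat{\sigma}_v^2} = \frac{1}{0.057} \approx 17.54,
\qquad
\frac{1/\hat{\sigma}_v^2}{1/\hat{\sigma}_u^2} = \frac{0.512}{0.057} \approx 8.98
\]
```

Each extraneous observation therefore enters Equation 2 with roughly nine times the weight of a sample observation.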
  1. In order to test for the robustness of the mixed estimator, we incorporated some of the sample variance to the weight of the extraneous data so as to balance the importance of the latter. We introduced a parameter $\lambda \in [0,1]$ such that

$$\hat{\beta} = \left[ \frac{1}{\sigma_u^2} X'X + \frac{1}{\sigma_v^2 + \lambda \sigma_u^2} R'R \right]^{-1} \left[ \frac{1}{\sigma_u^2} X'y + \frac{1}{\sigma_v^2 + \lambda \sigma_u^2} R'r \right].$$

In this way, we would compensate for the lack of idiosyncratic variation in the aggregated price data, hence reducing its weight in the mixed estimator. Our findings indicate that for values of $\lambda$ greater than 0, the coefficients of variation of the residuals are always smaller than those obtained by using only the sample data (ordinary least squares estimation). However, results turn out to be best, in terms of this same criterion, when $\lambda = 0$, i.e. the regular TG estimator.
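The robustness variant in the footnote changes only the weight on the extraneous data. A small sketch follows, with a hypothetical function name, the parameter written as `lam`, and a one-regressor toy problem chosen so the result can be checked by hand.

```python
import numpy as np

def tg_mixed_lambda(X, y, R, r, s2u, s2v, lam):
    """Mixed estimator with the extraneous variance inflated by lam * s2u.
    lam = 0 gives the regular Theil-Goldberger weights."""
    w_sample = 1.0 / s2u
    w_extra = 1.0 / (s2v + lam * s2u)
    precision = w_sample * (X.T @ X) + w_extra * (R.T @ R)
    rhs = w_sample * (X.T @ y) + w_extra * (R.T @ r)
    return np.linalg.solve(precision, rhs)

# Toy problem (hypothetical): one coefficient; the sample "says" 0 and the
# extraneous data "say" 1, so the estimate is a weighted average of the two.
X1, y1 = np.array([[1.0]]), np.array([0.0])
R1, r1 = np.array([[1.0]]), np.array([1.0])

beta_regular = tg_mixed_lambda(X1, y1, R1, r1, 0.512, 0.057, lam=0.0)[0]
beta_balanced = tg_mixed_lambda(X1, y1, R1, r1, 0.512, 0.057, lam=1.0)[0]
print(beta_regular, beta_balanced)
```

Raising `lam` pulls the estimate away from the extraneous value and toward the sample value, which is the rebalancing the footnote describes.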
Table 1. Theil–Goldberger’s mixed estimates (CPD model)

Dummy                 Mixed estimates      SE           t
Crops
Avena                     -1.5062        0.2960     -5.0879
Cebada                    -0.3196        0.0813     -3.9329
Maiz                      -0.3586        0.0529     -6.7679
Morocho                   -0.4045        0.3617     -1.1184
Quinua                     0.3430        0.5088      0.6742
Trigo                     -0.2374        0.0872     -2.7234
Other cereals             -0.2735        0.4160     -0.6574
Acelga                    -0.9616        0.2207     -4.3573
Achogchas                  0.4557        0.7178      0.6349
Aji                       -0.4047        0.5093     -0.7946
Ajo                        1.1718        0.0743     15.7758
Alcachofa                  1.3678        0.7178      1.9056
Apio                      -2.6987        0.7200     -3.7481
Arveja                     0.2209        0.0643      3.4342
Broccoli                   0.5654        0.7178      0.7877
Cebolla blanca            -0.4102        0.0673     -6.0954
Cebolla paitena           -0.7052        0.0691    -10.1988
Chochos                    0.7010        0.1390      5.0243
Col                       -1.4321        0.0700    -20.4592
Col morada                -1.5380        0.5091     -3.0212
Colifor                   -0.5385        0.1855     -2.9023
Culantro                  -0.8379        0.1904     -4.4002
Espinaca                  -0.9793        0.4162     -2.3531
Frijol                     0.4413        0.0612      7.2046
Habas                      0.0092        0.0671      0.1366
Habichuelas                0.1236        0.5086      0.2430
Lechuga                   -0.8247        0.0839     -9.8252
Lenteja                    0.7263        0.0746      9.7321
Mani                       1.0097        0.1523      6.6305
Nabo                      -1.8815        0.2433     -7.7327
Pepinillo                 -0.9691        0.1521     -6.3715
Pimiento                  -0.2480        0.0701     -3.5398
Rabano                    -1.2279        0.2212     -5.5516
Soya                      -0.0606        0.1236     -0.4900
Tomate rinon              -0.0235        0.0690     -0.3409
Vainitas                  -1.0267        0.2395     -4.2860
Zambo                     -0.6352        0.2430     -2.6136
Zapallo                   -0.0345        0.3232     -0.1069
Otras verdures            -0.4625        0.3610     -1.2811
Aguacate                  -0.4436        0.0738     -6.0118
Babaco                    -0.1456        0.2211     -0.6585
Badea                      0.3308        0.5082      0.6509
Banano                    -1.5669        0.0824    -19.0036
Caimito                   -0.2599        0.7177     -0.3621
Capuli                    -1.3544        0.2435     -5.5621
Cherimoya                 -0.2263        0.5085     -0.4452
Ciruelas                  -1.2046        0.2965     -4.0619
Claudia                   -0.2895        0.1203     -2.4064
Coco                      -1.3516        0.2757     -4.9028
Durazno                   -0.2967        0.1201     -2.4702
Frutilla                   1.7953        0.7178      2.5011
Granadilla                -0.9238        0.3241     -2.8499
Guaba                     -1.7891        0.3608     -4.9586
Guayaba                   -2.0238        0.5083     -3.9814
Higo                      -1.2935        0.7178     -1.8020
Limon                     -0.2866        0.0684     -4.1893
Mandarina                 -0.7708        0.1010     -7.6316
Mango                     -0.5862        0.1077     -5.4445
Manzana                   -0.0968        0.1104     -0.8768
Maracuya                  -0.4769        0.0695     -6.8640
Melon                     -0.5720        0.1048     -5.4555
Mora                       0.8240        0.0845      9.7487
Naranja                   -0.9899        0.0678    -14.6004
Naranjilla                -0.2902        0.0711     -4.0806
Papaya                    -0.6175        0.0790     -7.8132
Pepino                    -0.7997        0.3296     -2.4264
Pera                      -0.2687        0.1402     -1.9164
Pina                      -0.7989        0.0774    -10.3145
Platano                   -1.4448        0.0614    -23.5251
Sandia                    -0.8421        0.1073     -7.8465
Taxo                      -1.0532        0.5084     -2.0717
Tomate de arbol           -0.1341        0.0710     -1.8879
Tuna                       0.1682        0.5083      0.3309
Zapote                    -1.5066        0.1270    -11.8593
Camote                    -0.6917        0.3233     -2.1397
Meyoco                    -1.0135        0.1595     -6.3549
Oca                       -1.2934        0.7177     -1.8021
Papanabo                  -0.8437        0.5086     -1.6590
Papa                      -0.9043        0.0586    -15.4414
Remolacha                 -0.8562        0.0717    -11.9441
Yucca                     -0.6682        0.0686     -9.7407
Zanahoria amarilla        -1.1864        0.0683    -17.3808
Zanahoria blanca          -1.9054        0.5085     -3.7471
Otros tuberculos          -1.1022        0.7176     -1.5359
Achiote                    1.1026        0.5084      2.1686
Alfalfa                   -2.1985        0.1707    -12.8811
Algodon                    0.2213        0.2961      0.7475
Cabuya                     1.0457        0.7173      1.4578
Cacao                      0.8411        0.0676     12.4240
Cafe en grano              0.0913        0.0830      1.1004
Cana de azucan            -0.4468        0.3233     -1.3821
Flores                    -0.0941        0.2126     -0.4425
Hierba y potreros         -2.6619        0.2572    -10.3483
Hierbas aromaticas        -1.0735        0.2038     -5.2660
Higuerilla                -0.8178        0.7175     -1.1397
Linaza                     0.2143        0.7176      0.2987
Oregano                   -1.1059        0.7178     -1.5406
Paja toquilla             -1.1055        0.7175     -1.5407
Pimiento                   0.1682        0.7178      0.2343
Tabaco                     1.0182        0.7177      1.4187
Provinces
Bolivar                   -0.3068        0.0510     -6.0061
Cañar                     -0.2018        0.0505     -3.9911
Carchi                    -0.1945        0.0559     -3.4807
Chimborazo                -0.4628        0.0459    -10.0719
Cotopaxi                  -0.1125        0.0457     -2.4647
(continued)
IV. Conclusions
This article has shown the advantage of using a mixed CPD–TG procedure for the prediction of producer prices when dealing with incomplete price sample data. By allowing alternative informative data sources to be systematically incorporated into the regression analysis, the mixed procedure provides better price estimates than imputing marginal means or using the CPD model with the sample data alone. The ability to include extraneous data in estimation, in this case official price data reported by the Ecuadorian Ministry of Agriculture, allows the researcher to make better use of incomplete data, raises her confidence in the resulting price predictions and contributes to the computation of richer and unbiased estimates of producer revenues and profit functions.
References
Brehm, P. and Guenthner, D. (1990) The econometric method of mixed estimation: an application to the credibility of trend, Casualty Actuarial Society Discussion Paper Program, 1, 172–216.
Diewert, E. (2005) Weighted country product dummy variable regressions and index number formulae, Review of Income and Wealth, 51, 561–70.
Durbin, J. (1953) A note on regression when there is extraneous information about one of the coefficients, Journal of the American Statistical Association, 48, 799–808.
Mittelhammer, R. and Conway, K. (1988) Applying mixed estimation in econometric research, American Journal of Agricultural Economics, 70, 859–66.
Rao, P. (2004) The country-product-dummy method: a stochastic approach to the computation of purchasing power parities in the ICP, Paper presented at the SSHRC Conference on Index Numbers and Productivity Measurement, Vancouver, Canada.
Rao, P. (2005) On the equivalence of weighted country-product-dummy method and the Rao-system for multilateral price comparisons, Review of Income and Wealth, 51, 571–80.
Summers, R. (1973) International price comparisons based upon incomplete data, Review of Income and Wealth, 19, 1–16.
Theil, H. (1963) On the use of incomplete prior information in regression analysis, Journal of the American Statistical Association, 58, 401–14.
Theil, H., Chung, C. F. and Seale, J. (1989) International Evidence on Consumption Patterns, JAI Press, Greenwich, Connecticut.
Theil, H. and Goldberger, A. S. (1961) On pure and mixed statistical estimation in economics, International Economic Review, 2, 65–78.

Table 1. (Continued)

Dummy                 Mixed estimates      SE           t
El Oro                    -0.0235        0.0555     -0.4237
Esmeraldas                -0.2895        0.0489     -5.9108
Guayas                    -0.3966        0.0468     -8.4778
Imbabura                   0.0353        0.0645      0.5477
Loja                      -0.1544        0.0472     -3.2730
Los Rios                  -0.3023        0.0469     -6.4499
Manabí                    -0.1367        0.0445     -3.0717
Pichincha                 -0.3222        0.0481     -6.6943
Tungurahua                -0.5305        0.0732     -7.2490
Constant                  -1.8222        0.0548    -33.2318
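To illustrate how the Table 1 coefficients translate into a filled-in price, consider Papa grown in Bolivar. The calculation below assumes that a prediction adds the constant to the relevant crop and province coefficients (with the omitted base categories contributing zero) and that exponentiating the log prediction is an acceptable point estimate; neither step is spelled out in the article, and the price unit is whatever unit the underlying data use.

```latex
\[
\widehat{\ln p} = \underbrace{-1.8222}_{\text{Constant}}
                + \underbrace{(-0.9043)}_{\text{Papa}}
                + \underbrace{(-0.3068)}_{\text{Bolivar}}
                = -3.0333,
\qquad
\hat{p} = e^{-3.0333} \approx 0.048
\]
```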
