<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1748-5908-4-37</ui>
   <ji>1748-5908</ji>
   <fm>
      <dochead>Systematic Review</dochead>
      <bibl>
         <title>
            <p>Are there valid proxy measures of clinical behaviour? a systematic review</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Hrisos</snm>
               <fnm>Susan</fnm>
               <insr iid="I1"/>
               <email>susan.hrisos@ncl.ac.uk</email>
            </au>
            <au id="A2">
               <snm>Eccles</snm>
               <mi>P</mi>
               <fnm>Martin</fnm>
               <insr iid="I1"/>
               <email>martin.eccles@ncl.ac.uk</email>
            </au>
            <au id="A3">
               <snm>Francis</snm>
               <mi>J</mi>
               <fnm>Jill</fnm>
               <insr iid="I2"/>
               <email>j.francis@abdn.ac.uk</email>
            </au>
            <au id="A4">
               <snm>Dickinson</snm>
               <mi>O</mi>
               <fnm>Heather</fnm>
               <insr iid="I1"/>
               <email>heather.dickinson@ncl.ac.uk</email>
            </au>
            <au id="A5">
               <snm>Kaner</snm>
               <mi>FS</mi>
               <fnm>Eileen</fnm>
               <insr iid="I1"/>
               <email>e.f.s.kaner@ncl.ac.uk</email>
            </au>
            <au id="A6">
               <snm>Beyer</snm>
               <fnm>Fiona</fnm>
               <insr iid="I1"/>
               <email>fiona.beyer@ncl.ac.uk</email>
            </au>
            <au id="A7">
               <snm>Johnston</snm>
               <fnm>Marie</fnm>
               <insr iid="I3"/>
               <email>m.johnston@abdn.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Institute of Health and Society, Newcastle University, 21 Claremont Place, Newcastle upon Tyne, NE2 4AA, UK</p>
            </ins>
            <ins id="I2">
               <p>Health Services Research Unit, University of Aberdeen, Health Sciences Building, Foresterhill, Aberdeen AB25 2ZD, UK</p>
            </ins>
            <ins id="I3">
               <p>Department of Psychology, University of Aberdeen, Health Sciences Building, Foresterhill, Aberdeen AB25 2ZD, UK</p>
            </ins>
         </insg>
         <source>Implementation Science</source>
         <issn>1748-5908</issn>
         <pubdate>2009</pubdate>
         <volume>4</volume>
         <issue>1</issue>
         <fpage>37</fpage>
         <url>http://www.implementationscience.com/content/4/1/37</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19575790</pubid>
               <pubid idtype="doi">10.1186/1748-5908-4-37</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>14</day>
               <month>1</month>
               <year>2009</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>03</day>
               <month>7</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>03</day>
               <month>7</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Hrisos et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Accurate measures of health professionals' clinical practice are critically important to guide health policy decisions, as well as for professional self-evaluation and for research-based investigation of clinical practice and process of care. It is often not feasible or ethical to measure behaviour through direct observation, and rigorous behavioural measures are difficult and costly to use. The aim of this review was to identify the current evidence relating to the relationships between proxy measures and direct measures of clinical behaviour. In particular, the accuracy of medical record review, clinician self-reported and patient-reported behaviour was assessed relative to directly observed behaviour.</p>
            </sec>
            <sec>
               <st>
                  <p>Methods</p>
               </st>
               <p>We searched: PsycINFO; MEDLINE; EMBASE; CINAHL; Cochrane Central Register of Controlled Trials; science/social science citation index; Current contents (social &amp; behavioural med/clinical med); ISI conference proceedings; and Index to Theses. We included empirical, quantitative studies that examined clinical behaviours. An independent, direct measure of behaviour (by standardised patient, other trained observer or by video/audio recording) was considered the 'gold standard' for comparison. Proxy measures of behaviour included: retrospective self-report; patient-report; or chart-review. All titles, abstracts, and full text articles retrieved by electronic searching were screened for inclusion and abstracted independently by two reviewers. Disagreements were resolved by discussion with a third reviewer where necessary.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Fifteen reports originating from 11 studies met the inclusion criteria. The method of direct measurement was by standardised patient in six reports, trained observer in three reports, and audio/video recording in six reports. Multiple proxy measures of behaviour were compared in five of 15 reports. Only four of 15 reports used appropriate statistical methods to compare measures. Some direct measures failed to meet our validity criteria. The accuracy of patient report and chart review as proxy measures varied considerably across a wide range of clinical actions. The evidence for clinician self-report was inconclusive.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Valid measures of clinical behaviour are of fundamental importance to accurately identify gaps in care delivery, improve quality of care, and ultimately to improve patient care. However, the evidence base for three commonly used proxy measures of clinicians' behaviour is very limited. Further research is needed to better establish the methods of development, application, and analysis for a range of both direct and proxy measures of behaviour.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The measurement, reporting and improvement of the quality of health care provision are central to many current health care initiatives that aim to increase the delivery of optimal, evidence-based care to patients (<it>e.g.</it>, quality and outcomes framework (QOF) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, new GMS contract <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>). In the UK, the new GMS contract <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> introduced in 2004 represents a growing trend towards pay-for-performance incentives in primary care, delivered through the QOF. Accurate measures of health professionals' clinical practice are therefore critically important not only to policy makers in guiding health policy decisions but also to practitioners in the evaluation of their own practice and to researchers both in identifying deficits and evaluating changes in the process of care.</p>
         <p>Clinical practice can be measured directly &#8211; by actual observation of clinicians while practising, or indirectly &#8211; by the use of a proxy measure, such as reviewing medical records or interviewing the clinician. Direct measures include observation by a trained observer, video- or audio-recording of consultations, and the use of 'standardised' or 'simulated' patients. These are generally considered to provide an accurate reflection of the behaviour under observation, and as such represent a 'gold standard' measure of performance. However, direct measures are intrusive, can promote (unrepresentative) socially-desirable behaviour in the individuals being observed, and are time-consuming and costly to use, which largely restricts them to small studies. Thus, they are not always a feasible option.</p>
         <p>Measurement of clinical behaviour has therefore commonly relied on less costly and more readily available indirect sources of performance data, including review of medical records (chart review), clinician self-report, and patient report. Having effective and less costly proxy measures of behaviour could expand both the policy and research agendas to include important clinical behaviours that might otherwise go unexamined because of measurement difficulties. However, despite their widespread use, the extent to which these proxy measures of clinical behaviour accurately reflect a clinician's actual behaviour is unclear.</p>
         <p>The aim of this review was to identify the current evidence relating to the relationships between direct measures and proxy measures of clinical behaviour. In order to establish whether any indirect measures can be used as proxies for actual clinical behaviour, the accuracy of medical record review, clinician self-reported behaviour, and patient-reported behaviour was assessed relative to a direct measure of behaviour.</p>
      </sec>
      <sec>
         <st>
            <p>Objective</p>
         </st>
         <p>The objective of the review was to assess whether there is a relationship between measures of actual clinical behaviour and proxy measures of the same behaviour, and how this relationship can best be described both on average and for individual clinicians.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Inclusion and exclusion criteria</p>
            </st>
            <p>We included any study that examined clinical behaviour (behaviour enacted by a clinician &#8211; doctors, nurses and allied health professionals &#8211; with respect to a patient or their care) within a clinical context. Studies were included if they reported a quantitative evaluation of the relationship between a direct measure representing actual behaviour and an indirect, proxy measure of the same behaviour. We excluded studies of undergraduate students. A direct measure of behaviour was defined as one based on direct observation of a clinician's actual behaviour in a clinical context, either by a trained observer or a simulated patient, or from a video- or audio-recording of that behaviour. A proxy measure of behaviour was defined as one based on clinician self-report of recent or usual behaviour in a specified clinical situation, patient-report of clinicians' behaviour, or medical record review.</p>
         </sec>
         <sec>
            <st>
               <p>Search strategy for identification of studies</p>
            </st>
            <p>The following databases were searched: PsycINFO (1840 to Aug 2004), MEDLINE (1966 to Aug wk 3 2004), EMBASE (1980 to Aug wk 34), CINAHL (1982 to Aug wk 3 2004), Cochrane central register of controlled trials (2004 issue 2), science/social science citation index (1970 to Aug 2004), current contents (social and behavioural med/clinical med) (1998 to Aug 2004), ISI conference proceedings (1990 to Aug 2004), and Index to Theses (1716 to Aug 2004). The search terms for behaviour, health professionals, and scenarios are shown in Table <tblr tid="T1">1</tblr>. The search strategy was devised to also identify studies for a related review that examined the relationship between intention and clinical behaviour, and hence contained the additional search term 'intention' <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. The search domains were combined as follows: (Intention) AND (Behaviour) AND (health professionals), (Intention-behaviour) AND (health professionals), (behaviour) AND (outcomes) AND (health professionals). The reference lists of all included papers were checked manually.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Keyword combinations for three domains, combined for the database search</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="center">
                        <p>
                           <b>Behaviour</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Health professionals</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p><b>Intention</b></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Thesaurus headings:</p>
                        <p>&#8226; BEHAVIOR</p>
                        <p>&#8226; CHOICE BEHAVIOR</p>
                        <p>&#8226; PLANNED BEHAVIOR</p>
                        <p>&#8226; Behaviour?*</p>
                        <p>&#8226; Clinician performance*</p>
                        <p>&#8226; (Actor or abstainer) near behaviour*</p>
                     </c>
                     <c ca="left">
                        <p>(Intention or intend*) near behaviour?*</p>
                        <p>Thesaurus headings:</p>
                        <p>&#8226; HEALTH PERSONNEL</p>
                        <p>&#8226; ATTITUDE OF HEALTH PERSONNEL</p>
                        <p>&#8226; CLINICIANS</p>
                        <p>Clinician*</p>
                        <p>Counsellor*</p>
                        <p>Dentist*</p>
                        <p>Doctor*</p>
                        <p>Family practition*</p>
                        <p>General practition*</p>
                        <p>GP*/FP*</p>
                        <p>Gynaecologist*</p>
                        <p>Haematologist*</p>
                        <p>Health professional*</p>
                        <p>Internist*</p>
                        <p>Neurologist*</p>
                        <p>Nurse*</p>
                        <p>Obstetrician*</p>
                        <p>Occupational therapist*</p>
                        <p>Optometrist*</p>
                        <p>OT*</p>
                        <p>Paediatrician*</p>
                        <p>Paramedic*</p>
                        <p>Pharmacist*</p>
                        <p>Physician*</p>
                        <p>Physiotherapist*</p>
                        <p>Primary care</p>
                        <p>Psychiatrist*</p>
                        <p>Psychologist*</p>
                        <p>Radiologist*</p>
                        <p>Social worker*</p>
                        <p>Surgeon*/surgery</p>
                        <p>Therapist*</p>
                     </c>
                     <c ca="left">
                        <p>Thesaurus heading:</p>
                        <p>INTENTION</p>
                        <p>&#8226; Intend* or intention*</p>
                        <p>&#8226; Inclin* or disinclin*</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Example thesaurus headings are given for the PsycINFO database and were adjusted and exploded as appropriate for other databases.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Review methods</p>
            </st>
            <p>All titles and abstracts retrieved by electronic searching were downloaded to a reference management database; duplicates were removed, the remaining references were screened independently by two reviewers, and those studies which did not meet the inclusion criteria were excluded. Where it was not possible to exclude articles based on title and abstract, full text versions were obtained and their eligibility was assessed by two reviewers. Full text versions of all potentially relevant articles identified from the reference lists of included articles were obtained. The eligibility of each full text article was assessed independently by two reviewers. Disagreements were resolved by discussion or were adjudicated by a third reviewer.</p>
         </sec>
         <sec>
            <st>
               <p>Quality assessment</p>
            </st>
            <sec>
               <st>
                  <p>External validity</p>
               </st>
               <p>External validity relates to the generalisability of study findings. We assessed this for included studies on the basis of:</p>
               <p>1. whether the target population of clinicians was local, regional, or national.</p>
               <p>2. whether the target population of clinicians was sampled or whether the entire population was approached &#8211; and if the population was sampled, whether it was a valid random (or systematic) sample &#8211; in order to assess the potential for selection bias.</p>
               <p>3. the number of clinicians recruited and the total number of consultations assessed.</p>
               <p>4. the percentage of participants enrolled for whom the relationship between direct and proxy measures of behaviour was analysed (attrition bias).</p>
            </sec>
            <sec>
               <st>
                  <p>Internal validity</p>
               </st>
               <p>Internal validity relates to the rigour with which a study was conducted, and how confident we can be about any inferences that are subsequently made <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Important aspects of internal validity that are particularly relevant to the included studies are the reliability and validity of the measurement methods used to assess the performance of clinical behaviours. We therefore assessed internal validity on the basis of the psychometric evaluations performed by each study:</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Reliability</p>
            </st>
            <p>1. Measurement of inter-rater and intra-rater reliability for checklist scoring by trained observers and simulated patients.</p>
            <p>2. Test-retest reliability of either direct or indirect measures.</p>
         </sec>
         <sec>
            <st>
               <p>Validity of the scoring checklist</p>
            </st>
            <p>Content and face validity of the scoring checklist: <it>e.g.</it>, the rationale and process for the choice of items included and for any weights assigned to them.</p>
         </sec>
         <sec>
            <st>
               <p>Validity of the direct measure method</p>
            </st>
            <p>General: The ability of the direct measure to accurately detect the aspects of behaviour under scrutiny (<it>e.g.</it>, the range of clinical actions on the scoring checklist).</p>
         </sec>
         <sec>
            <st>
               <p>Simulated patients</p>
            </st>
            <p>1. Content validity of simulated cases: the level of correspondence between components of simulated cases and actual clinical presentations of the condition in question.</p>
            <p>2. Face validity: judgments made by individuals other than the research team that the simulated case 'looks like' a valid case representation of the clinical condition in question.</p>
            <p>3. Training of simulated patients in the case protocol.</p>
            <p>4. Assessment of cueing and reporting of detection of simulation.</p>
         </sec>
         <sec>
            <st>
               <p>Validity of the proxy methods</p>
            </st>
            <sec>
               <st>
                  <p>Patient vignettes</p>
               </st>
               <p>Content validity: Correspondence between the operationalisation of the simulated case in the standardised patient protocols and in the written vignettes.</p>
            </sec>
            <sec>
               <st>
                  <p>Patient report and clinician self-report</p>
               </st>
               <p>Content validity: Correspondence between the content and wording of items on the scoring checklist and the items on the questionnaire or interview schedule.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Appropriateness of the statistical methods used</p>
            </st>
            <p>The studies included in the current review used a range of statistical methods to summarise and compare direct and proxy measures of behaviour. To help us synthesise the data from included studies, we conducted a companion review to assess the appropriateness of the different statistical methods they used (Dickinson HO et al. Are there valid proxy measures of clinical behaviour? Statistical considerations, submitted). Our conclusions are summarised below.</p>
            <p>The included studies were based on recording whether a clinician performed one or more clinical actions that we refer to as 'items'. Some studies compared direct and proxy measures 'item-by-item'; other studies combined items into summary scores and then compared direct and proxy summary scores.</p>
            <p>Statistical methods used by studies that compared direct and proxy measures item-by-item included: sensitivity and specificity; total agreement; total disagreement; and kappa coefficients. For these studies, we concluded that sensitivity and specificity were generally the best statistics to assess the performance of a proxy measure, provided these statistics were not based on a combination of items describing different clinical actions.</p>
            <p>Statistical methods used by studies that compared summary scores included: comparisons of means; analysis of variance (ANOVA); t-tests; and Pearson correlation. For these studies, we concluded that summary measures should capture a single underlying aspect of behaviour and measure that construct using a valid measurement scale. The average relationship between the direct and proxy measures should be evaluated over the entire range of the direct measure, and the variability about this average relationship should also be reported. Hence, comparisons of mean scores are inappropriate. ANOVA and t-tests are likewise inappropriate because they are essentially methods of testing whether the mean score is the same in both groups. Correlation is inappropriate because it cannot assess whether there is systematic bias in the proxy measure (<it>i.e.</it>, whether the proxy measure consistently under- or overestimates performance by a certain amount). Furthermore, the strength of the estimated correlation depends on the range of scores of the proxy and direct measures.</p>
         </sec>
         <sec>
            <st>
               <p>Data extraction</p>
            </st>
            <p>For each study, we extracted: the age and professional role of participants; the behaviour assessed; quantitative data measuring the relationship between the direct and proxy measures of behaviour; the method of measuring behaviour and the psychometric properties of the measure; and the quality criteria specified above.</p>
         </sec>
         <sec>
            <st>
               <p>Evidence synthesis</p>
            </st>
            <p>For studies that reported single binary (yes/no) items, we extracted, if possible, the number of consultations for which: both the direct and proxy measures recorded the item as performed (true positives); both the direct and the proxy measures recorded the item as not performed (true negatives); the direct measure recorded the item as performed but the proxy measure did not (false negatives); and the direct measure recorded the item as not performed but the proxy measure recorded it as performed (false positives).</p>
            <p>We estimated the mean and 95% confidence intervals (CI) for the sensitivity, specificity, and positive predictive value of each item and presented these on forest plots. If studies did not report the above numbers but reported the sensitivity and/or specificity, these statistics were extracted. For all studies for which mean values were available, the sensitivity was plotted against the false positive rate (1 - specificity), because studies that fall in the top left of this plot are generally regarded as having better diagnostic accuracy (high sensitivity and high specificity). A summary ROC curve was not fitted to these plots because of the heterogeneity between studies in the behaviours measured and the methods of measurement. Where possible, we also calculated the positive and negative predictive values for individual items.</p>
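            <p>As a minimal sketch of the item-level calculation described above (not the review's own analysis code), the Python snippet below derives the accuracy statistics for a single binary item from the four counts, treating the direct measure as the reference standard. The function name and the example counts are hypothetical, and the confidence interval calculation is left out because the method used is not stated here.</p>

```python
def proxy_accuracy(tp, fp, fn, tn):
    """Accuracy of a proxy measure for one binary item.

    The direct measure is treated as the reference standard:
      tp: both measures record the item as performed
      tn: both measures record the item as not performed
      fn: direct records performed, proxy does not
      fp: direct records not performed, proxy records performed
    """
    def ratio(numerator, denominator):
        # Undefined (None) when the denominator is zero.
        return numerator / denominator if denominator else None

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        # x-axis of the sensitivity against (1 - specificity) plot
        "false_positive_rate": None if specificity is None else 1 - specificity,
    }


# Hypothetical counts for one item across 100 consultations.
stats = proxy_accuracy(tp=40, fp=10, fn=20, tn=30)
print(stats)
```

            <p>Returning None for an undefined ratio (for example, specificity when the direct measure never records the item as not performed) is one possible design choice; it keeps such items visible rather than silently dropping them.</p>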
            <p>For studies that reported aggregated scores summarising several items, we extracted any statistics presented that summarised the mean and variance of the direct measure and/or proxy summary scores and the relationship between the direct measure and proxy.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Description of included studies</p>
            </st>
            <p>The search strategy identified 5,260 references (Figure <figr fid="F1">1</figr>). The titles and abstracts of these references were screened independently by two reviewers. Ten papers were retrieved for full text review and their reference lists screened for other potential papers. A further 102 papers were identified from the reference lists of retrieved papers, their abstracts were again reviewed independently by two reviewers, and 41 of these were retrieved for full text review. Fifteen papers, based on comparisons from eleven separate source studies, fulfilled the inclusion criteria and their data were abstracted <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. As papers reporting different findings from the same study <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B10">10</abbr><abbr bid="B12">12</abbr><abbr bid="B14">14</abbr><abbr bid="B18">18</abbr></abbrgrp> present different data and, with the exception of two <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B18">18</abbr></abbrgrp>, used different methods of analysis, we have considered them as 15 separate reports for the purpose of this review.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Identification of included references (QUOROM diagram)</p>
               </caption>
               <text>
                  <p><b>Identification of included references (QUOROM diagram)</b>.</p>
               </text>
               <graphic file="1748-5908-4-37-1"/>
            </fig>
            <p>Across the 15 reports, 771 clinicians were enrolled, and proxy measures of the clinical behaviour of 717 (93%) clinicians were evaluated relative to a direct measure. A summary of the characteristics of the 15 included reports is presented in Table <tblr tid="T2">2</tblr>, with further detail presented in Additional File <supplr sid="S1">1</supplr>. Ten reports originated in the United States, two in the Netherlands and one each in the United Kingdom, Australia, and Canada. The aim of 12 of 15 reports was to validate or to assess the 'accuracy' of an indirect measure of clinician behaviour relative to a specific direct measure. The aim of the remaining three reports was to assess the validity of different measures (both indirect and direct) relative to each other.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Characteristics of included studies</b>. Detailed description of the characteristics of all studies included in the review.</p>
               </text>
               <file name="1748-5908-4-37-S1.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Summary of included study characteristics and clinical behaviours measured</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Study</b>
                        </p>
                     </c>
                     <c cspan="7" ca="center">
                        <p>
                           <b>Characteristics</b>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Behaviour measured</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>1. Type of participants</b>
                        </p>
                        <p>
                           <b>2. Target population</b>
                        </p>
                        <p>
                           <b>3. Sampling strategy</b>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Participants approached &amp; analysed</b>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Consultations/sessions/indications observed/vignettes completed &amp; analysed</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>1. Clinical area/s</b>
                        </p>
                        <p>
                           <b>2. Behaviour/s observed</b>
                        </p>
                        <p>
                           <b>(No. of clinical actions scored)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>No. of checklist</b>
                        </p>
                        <p>
                           <b>items</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Summarised</b>
                        </p>
                        <p>
                           <b>(weighted)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>N</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>n</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>%</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>N</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>n</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>%</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Stange </b>
                           <abbrgrp>
                              <abbr bid="B5">5</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1998</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Family practice physicians</p>
                        <p>2. Members of the Ohio Academy of FPs, practising within a 50-mile radius of Cleveland &amp; Youngstown</p>
                        <p>3. Convenience sample</p>
                     </c>
                     <c ca="center">
                        <p>138</p>
                     </c>
                     <c ca="center">
                        <p>128</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>4454</p>
                     </c>
                     <c ca="center">
                        <p>4432</p>
                        <p>(MR)</p>
                        <p>3283</p>
                        <p>(PR)</p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                        <p>(MR)</p>
                        <p>74</p>
                        <p>(PR)</p>
                     </c>
                     <c ca="left">
                        <p>1. Delivery of a range of outpatient medical services</p>
                        <p>2. Counselling (29), physical examination (16), screening (5), Lab tests (10), immunisation (7), Referral (4)</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Flocke </b>
                           <abbrgrp>
                              <abbr bid="B6">6</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>2004</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Family physicians</p>
                        <p>2. Primary care physicians in North West Ohio</p>
                        <p>3. All physicians approached</p>
                     </c>
                     <c ca="center">
                        <p>138</p>
                     </c>
                     <c ca="center">
                        <p>128</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="center">
                        <p>4454</p>
                     </c>
                     <c ca="center">
                        <p>2670</p>
                     </c>
                     <c ca="center">
                        <p>60</p>
                     </c>
                     <c ca="left">
                        <p>1. Health promotion</p>
                        <p>2. Smoking (2), alcohol, exercise, diet, substance use, sun exposure, seatbelt use, HIV &amp; STD prevention</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Wilson </b>
                           <abbrgrp>
                              <abbr bid="B7">7</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1994</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. General practitioners (GPs)</p>
                        <p>2. 10 general practices in Nottinghamshire</p>
                        <p>3. Selection of GPs not reported. Minimum of two non-random consultations were recorded</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>3324</p>
                     </c>
                     <c ca="center">
                        <p>516 (MR)</p>
                        <p>335 (PR)</p>
                     </c>
                     <c ca="center">
                        <p>16 (MR)</p>
                        <p>10 (PR)</p>
                     </c>
                     <c ca="left">
                        <p>1. Health promotion</p>
                        <p>2. Asked patient about 4 health behaviours: smoking (1), alcohol (1), diet &amp; exercise (1); measurement of blood pressure (1)</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Ward </b>
                           <abbrgrp>
                              <abbr bid="B8">8</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1996</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Post-graduate trainees</p>
                        <p>2. Training general practices in New South Wales</p>
                        <p>3. Trainees who were having their first experience in supervised general practice</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>34</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>1500</p>
                     </c>
                     <c ca="center">
                        <p>1075</p>
                     </c>
                     <c ca="center">
                        <p>72</p>
                     </c>
                     <c ca="left">
                        <p>1. Smoking cessation</p>
                        <p>2. Establish smoking status &amp; provide smoking cessation counselling (2)</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Zuckerman </b>
                           <abbrgrp>
                              <abbr bid="B9">9</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1975</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Paediatricians</p>
                        <p>2. Physicians working in a university medical centre serving an inner-city population</p>
                        <p>3. All 3 staff physicians</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>51</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>1. Paediatric consultation</p>
                        <p>2. Diagnosis and management (8), historical items (7)</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Luck </b>
                           <abbrgrp>
                              <abbr bid="B10">10</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>2000</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Primary care physicians</p>
                        <p>2. 2 general internal medicine primary care outpatient clinics</p>
                        <p>3. Random sample of 10 physicians at each site</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>160</p>
                     </c>
                     <c ca="center">
                        <p>160</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>1. Management of LBP, DM, COPD, CAD.</p>
                        <p>2. History, Physical exam, Tests ordered, Diagnosis &amp; Treatment/management (21 for LBP)</p>
                     </c>
                     <c ca="center">
                        <p>NR</p>
                     </c>
                     <c ca="center">
                        <p>&#8730; (w)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Page </b>
                           <abbrgrp>
                              <abbr bid="B11">11</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1980</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Community pharmacists</p>
                        <p>2. Participants on a continuing education course in British Columbia, Canada</p>
                        <p>3. All participants</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>58</p>
                     </c>
                     <c ca="center">
                        <p>58</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>1. Management of: Cold, Pain</p>
                        <p>2. Recommend either: non-prescription medication (cold = 17, pain = 15) or see physician (cold = 17, pain = 18)</p>
                     </c>
                     <c ca="center">
                        <p>103</p>
                     </c>
                     <c ca="center">
                        <p>&#8730; (w)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Gerbert </b>
                           <abbrgrp>
                              <abbr bid="B12">12</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1988</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Primary care physicians</p>
                        <p>2. Primary care physicians serving 6 counties in California</p>
                        <p>3. Convenience sample</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>197</p>
                     </c>
                     <c ca="center">
                        <p>197</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>1. Medication regimens in the management of COPD</p>
                        <p>2. Prescription of theophyllines (1), sympathomimetics (2), oral corticosteroids (1)</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Pbert </b>
                           <abbrgrp>
                              <abbr bid="B13">13</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1999</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Primary care physicians</p>
                        <p>2. Attending physicians &amp; their patients at a university medical centre in Massachusetts</p>
                        <p>3. Convenience sample</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>154</p>
                     </c>
                     <c ca="center">
                        <p>108</p>
                     </c>
                     <c ca="center">
                        <p>70</p>
                     </c>
                     <c ca="left">
                        <p>1. Smoking cessation</p>
                        <p>2. Cessation counselling (15)</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Gerbert </b>
                           <abbrgrp>
                              <abbr bid="B14">14</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1986</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Primary care physicians</p>
                        <p>2. NR</p>
                        <p>3. Convenience sample</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>214</p>
                     </c>
                     <c ca="center">
                        <p>192</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="left">
                        <p>1. Management of COPD</p>
                        <p>2. Symptoms (8), signs (2), Tests (3), Treatments (3), Patient education (4)</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Dresselhaus </b>
                           <abbrgrp>
                              <abbr bid="B15">15</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>2000</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Primary care physicians</p>
                        <p>2. 2 general internal medicine primary care outpatient clinics</p>
                        <p>3. Random sample of 10 physicians at each site</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>160</p>
                     </c>
                     <c ca="center">
                        <p>160</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>1. Management of low back pain, diabetes mellitus, COPD, CAD.</p>
                        <p>2. Preventive care: tobacco screening (1), smoking cessation advice (1), prevention measures (1), alcohol screening (1), diet evaluation (1), exercise assessment (1) &amp; exercise advice (1)</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Rethans </b>
                           <abbrgrp>
                              <abbr bid="B16">16</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1987</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. GPs</p>
                        <p>2. GPs working in Maastricht</p>
                        <p>3. All participants</p>
                     </c>
                     <c ca="center">
                        <p>55</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>46</p>
                     </c>
                     <c ca="center">
                        <p>27</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>93</p>
                     </c>
                     <c ca="left">
                        <p>1. Management of Urinary Tract Infection</p>
                        <p>2. History taking (8); Physical Examination (3); Instructions to patients (7); Treatment (2); Follow-up (4)</p>
                     </c>
                     <c ca="center">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Rethans </b>
                           <abbrgrp>
                              <abbr bid="B17">17</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1994</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. GPs</p>
                        <p>2. Sampling strategy reported elsewhere.</p>
                        <p>3. Sampling strategy reported elsewhere</p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>140</p>
                     </c>
                     <c ca="center">
                        <p>101</p>
                     </c>
                     <c ca="center">
                        <p>72</p>
                     </c>
                     <c ca="left">
                        <p>1. Management of tension headache; acute diarrhoea; pain in the shoulder; check-up for non-insulin dependent diabetes.</p>
                        <p>2. History, Physical exam, Lab exam, Advice, Medication &amp; follow-up (range over 4 conditions: 25&#8211;36)</p>
                     </c>
                     <c ca="center">
                        <p>25&#8211;36</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Peabody </b>
                           <abbrgrp>
                              <abbr bid="B18">18</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>2000</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Primary care physicians</p>
                        <p>2. 2 general internal medicine primary care outpatient clinics</p>
                        <p>3. Random sample of 10 physicians at each site</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>160</p>
                     </c>
                     <c ca="center">
                        <p>160</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>1. Management of low back pain (LBP), diabetes mellitus (DM), chronic obstructive pulmonary disease (COPD), and coronary artery disease (CAD).</p>
                        <p>2. History taking (7), Physical examination (3), lab tests (5), Diagnosis (2), Management (6) (Averaged 21 actions per case)</p>
                     </c>
                     <c ca="center">
                        <p>168</p>
                     </c>
                     <c ca="center">
                        <p>&#8730; (w)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>O'Boyle </b>
                           <abbrgrp>
                              <abbr bid="B19">19</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>2001</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. Nurses</p>
                        <p>2. ICU staff in 4 metropolitan teaching hospitals in "Mid-West" USA</p>
                        <p>3. ICUs with comparable patient populations</p>
                     </c>
                     <c ca="center">
                        <p>124</p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>97</p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="left">
                        <p>1. Adherence to hand hygiene recommendations</p>
                        <p>2. Hand washing (for a maximum of 10 indications)</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>Participants in 12 reports were primary care physicians <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B10">10</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>; in other reports participants were nurses <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, community pharmacists <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, and paediatricians <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Clinical behaviours</p>
            </st>
            <p>Five reports considered a range of clinical behaviours (<it>e.g.</it>, history taking, physical examination, ordering of laboratory tests, referral, diagnosis, treatment, patient education, and follow-up) in relation to the management of a variety of common outpatient conditions: urinary tract infection (UTI) <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>; tension headache, acute diarrhoea, and pain in the shoulder <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>; coronary artery disease (CAD), low back pain, and chronic obstructive pulmonary disease (COPD) <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B14">14</abbr><abbr bid="B18">18</abbr></abbrgrp>; and diabetes <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. One report considered the behaviour of recommending non-prescription medication or a physician visit for common cold and pain symptoms <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, and one report evaluated medication regimens prescribed for patients with COPD <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Six reports considered health promotion behaviours, <it>e.g.</it>, giving advice about: smoking cessation <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr></abbrgrp>; alcohol use, exercise, and diet <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>; preventive care in relation to CAD, low back pain, and COPD <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>; and sun exposure, substance use, seatbelt use, and sexual health <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. One report considered the provision of a wide range of outpatient services including counselling, screening, and physical examination <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>; and one evaluated physician communication in paediatric consultations <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. One report considered hand hygiene <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
            <p>With the exception of two studies <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B13">13</abbr></abbrgrp>, the clinical behaviours measured were 'necessary' or 'recommended' clinical actions categorized as such according to either national guidelines or expert consensus. Four studies also included actions that were unnecessary or that should not be performed (<it>e.g.</it>, prescribing an antibiotic for a viral infection) <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B16">16</abbr><abbr bid="B18">18</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Methods used for measuring clinical behaviour</p>
            </st>
            <p>In all studies a checklist was used to record the performance of clinical actions relevant to the clinical area studied. All clinical actions were discrete activities; that is, each could be coded as 'yes' or 'no' (<it>e.g.</it>, the recording of blood pressure, asking about smoking habits). The number of possible clinical actions observed in each study ranged from one <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> to 168 <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>.</p>
            <p>A summary of the proxy and direct measures used by the 15 included reports is presented in Table <tblr tid="T3">3</tblr>, with further detail presented in Additional File <supplr sid="S2">2</supplr>. The direct measure of clinical behaviour was based on one of the following: post-encounter reports from simulated patients <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>; prospective reports made by trained observers during direct observation of actual consultations <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B19">19</abbr></abbrgrp>; or post-encounter reports from trained observers rating audio- or video-recordings of consultations <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>.</p>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p><b>Results presented by studies included in the review</b>. Detail of the samples, analyses and outcomes presented by studies included in the review.</p>
               </text>
               <file name="1748-5908-4-37-S2.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Summary of the measures used by included studies, methods of analysis and results of comparisons</p>
               </caption>
               <tblbdy cols="12">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Study</b>
                        </p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>
                           <b>Proxy measure</b>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Direct Measure (DM)</b>
                        </p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>
                           <b>Analysis</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="12">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Description</b>
                        </p>
                        <p>
                           <b>1. Method</b>
                        </p>
                        <p>V = Clinical vignette (No. of case simulations)</p>
                        <p>CI/Q = Clinician interview/questionnaire</p>
                        <p>MR = Medical Record review</p>
                        <p>PI/Q = Patient interview/questionnaire</p>
                        <p>
                           <b>2. Timing</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Clinician self report (SR)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Medical Record Review (MR)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Patient report (PR)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Description</b>
                        </p>
                        <p>
                           <b>1. Method</b>
                        </p>
                        <p>SP = Simulated Patients</p>
                        <p>DO = Direct Observation</p>
                        <p>VR = Video recording</p>
                        <p>AR = Audio recording</p>
                        <p>
                           <b>2. Timing</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>SP Training reported</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Psychometrics (IRR)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Compared Item by Item</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Compared Summary Scores</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Agreement between measures:</b>
                        </p>
                        <p>Correlation coefficient (r); kappa (k); structural equation modelling (SEM); sensitivity (Sens) &amp; specificity (Spec)</p>
                        <p>
                           <b>Difference between mean scores:</b>
                        </p>
                        <p>ANOVA; T-test</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>P</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="12">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Stange </b>
                           <abbrgrp>
                              <abbr bid="B5">5</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1998</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. MR; PQ</p>
                        <p>2. At end of consultation</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>DO</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.39 to 1.00 (kappa)</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MR</p>
                        <p>Sens = 8% (diet advice) &#8211; 92% (Lab tests)</p>
                        <p>Spec = 83% (social history) &#8211; 100% (counselling services, physical exam, lab tests)</p>
                        <p>k = 0.12 to 0.92 (79 comparisons)</p>
                        <p>PR</p>
                        <p>Sens = 17% (mammogram) &#8211; 89% (Pap test)</p>
                        <p>Spec = 85% (in-office referral) &#8211; 99% (immunisation, physical exam, lab tests)</p>
                        <p>k = 0.03 to 0.86 (53 comparisons)</p>
                     </c>
                     <c ca="left">
                        <p>NR</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Flocke </b>
                           <abbrgrp>
                              <abbr bid="B6">6</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>2004</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. PQ</p>
                        <p>2. At end of consultation (24%) or postal return (76%)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>DO</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>NR</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Sens* = 11% (substance use) &#8211; 76% (smoking cessation)</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Wilson </b>
                           <abbrgrp>
                              <abbr bid="B7">7</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1994</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. MR; PQ</p>
                        <p>2. At end of consultation</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>AR</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.79 to 1.00</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>MR</p>
                        <p>% agreement between DM &amp; MR:</p>
                        <p>45.5 (Smoking)</p>
                        <p>Sens = 31%, Spec* = 99%</p>
                        <p>28.6 (Alcohol)</p>
                        <p>Sens = 29%, Spec* = 100%</p>
                        <p>83.3 (BP)</p>
                        <p>Sens = 83%, Spec* = 93%</p>
                        <p>PR</p>
                        <p>% agreement between DM &amp; PR:</p>
                        <p>81.8 (Smoking)</p>
                        <p>Sens = 74%, Spec* = 94%</p>
                        <p>75.0 (Alcohol)</p>
                        <p>Sens = 75%, Spec* = 94%</p>
                        <p>100 (BP)</p>
                        <p>Sens = 100%, Spec* = 90%</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Ward </b>
                           <abbrgrp>
                              <abbr bid="B8">8</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1996</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. PQ</p>
                        <p>2. Questionnaire mailed to patient within 2 days of consultation</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>AR</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.74 to 0.94 (kappa)</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Sens = 93% (smoking status)</p>
                        <p>Spec = 79%</p>
                        <p>Sens = 92% (cessation advice)</p>
                        <p>Spec = 82%</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Zuckerman </b>
                           <abbrgrp>
                              <abbr bid="B9">9</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1975</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. MR</p>
                        <p>2. At end of consultation</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>AR</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>NR</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Sens* = 0% (side effects) &#8211; 100% (Diagnosis)</p>
                        <p>Spec* = 9% (Diagnosis) &#8211; 100% (side effects)</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Luck </b>
                           <abbrgrp>
                              <abbr bid="B10">10</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>2000</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. MR</p>
                        <p>2. At end of consultation</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SP (27) each role-playing 1 of 8 case simulations</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>NR</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="left">
                        <p>ANOVA (4-way)</p>
                        <p>Necessary care:</p>
                        <p>Sens = 70%, Spec = 81%</p>
                        <p>Unnecessary care:</p>
                        <p>Sens = 65%, Spec = 64%</p>
                     </c>
                     <c ca="left">
                        <p>&lt;0.0001</p>
                        <p>NA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Page </b>
                           <abbrgrp>
                              <abbr bid="B11">11</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1980</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. V (4)</p>
                        <p>2. Up to 6 weeks before or 3 weeks after SP visit</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SP (4) each role-playing 1 case simulation</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="left">
                        <p>r = .56 &amp; .68</p>
                        <p>r = .26 &amp; .37</p>
                        <p>"Must do" actions</p>
                        <p>Sens* = 97%, Spec* = 33%</p>
                        <p>"Must not do" actions</p>
                        <p>Sens* = 30%, Spec* = 98%</p>
                     </c>
                     <c ca="left">
                        <p>&gt;0.05</p>
                        <p>&lt;0.05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Gerbert </b>
                           <abbrgrp>
                              <abbr bid="B12">12</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1988</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. CI; MR; PI</p>
                        <p>2. At end of consultation</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>VR</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>NR</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>k = 0.67 (SR)</p>
                        <p>k = 0.54 (MR)</p>
                        <p>k = 0.50 (PR)</p>
                     </c>
                     <c ca="left">
                        <p>&lt;0.001</p>
                        <p>&lt;0.001</p>
                        <p>&lt;0.001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Pbert </b>
                           <abbrgrp>
                              <abbr bid="B13">13</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1999</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. CI; PI</p>
                        <p>2. At end of consultation</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>AR</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>NR</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="left">
                        <p>r = 0.77 (SR)</p>
                        <p>r = 0.67 (PR)</p>
                     </c>
                     <c ca="left">
                        <p>&lt;0.0001</p>
                        <p>&lt;0.0001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Gerbert </b>
                           <abbrgrp>
                              <abbr bid="B14">14</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1986</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. CI; MR; PI</p>
                        <p>2. At end of consultation</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>VR</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.52 to 0.93 (kappa)</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Median % agreement (All categories):</p>
                        <p>0.84 (SR)</p>
                        <p>0.88 (MR)</p>
                        <p>0.86 (PR)</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Dresselhaus </b>
                           <abbrgrp>
                              <abbr bid="B15">15</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>2000</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. V (8); MR</p>
                        <p>2. NR</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SP (4) each role-playing a simple and complex case presentation</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>NA</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>ANOVA (3-way)</p>
                     </c>
                     <c ca="left">
                        <p>&lt;0.01</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Rethans </b>
                           <abbrgrp>
                              <abbr bid="B16">16</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1987</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. V (1)</p>
                        <p>2. Completed 2 months after SP visit</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SP (3) each role-playing same case simulation</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>0.78 to 1.0 (kappa)</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="left">
                        <p>T-test:</p>
                        <p>Overall</p>
                        <p>"Obligatory"</p>
                        <p>"Intermediate"</p>
                        <p>"Superfluous"</p>
                     </c>
                     <c ca="left">
                        <p>ns</p>
                        <p>&lt;0.005</p>
                        <p>&lt;0.05</p>
                        <p>&lt;0.05</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Rethans </b>
                           <abbrgrp>
                              <abbr bid="B17">17</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>1994</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. MR</p>
                        <p>2. Charts reviewed two years after SP visit</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SP (4) each role-playing 1 of 4 case simulations</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>0.93 (kappa)</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="left">
                        <p>r = 0.54 (Overall)</p>
                        <p>r = 0.17 (History taking)</p>
                        <p>r = 0.45 (Physical exam)</p>
                        <p>r = 0.75 (Lab exam)</p>
                        <p>r = 0.50 (Advice)</p>
                        <p>r = 0.43 (Medication)</p>
                        <p>r = -0.04 (Follow-up)</p>
                     </c>
                     <c ca="left">
                        <p>&lt;0.05</p>
                        <p>ns</p>
                        <p>ns</p>
                        <p>&lt;0.01</p>
                        <p>&lt;0.05</p>
                        <p>ns</p>
                        <p>ns</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Peabody </b>
                           <abbrgrp>
                              <abbr bid="B18">18</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>2000</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. V (8); MR</p>
                        <p>2. Completed "several weeks" after SP visit</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>SP (4) each role-playing a simple and complex case presentation</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="left">
                        <p>NA</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="left">
                        <p>ANOVA (4-way)</p>
                     </c>
                     <c ca="left">
                        <p>&lt;0.001</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>O'Boyle </b>
                           <abbrgrp>
                              <abbr bid="B19">19</abbr>
                           </abbrgrp>
                        </p>
                        <p>
                           <b>2001</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>1. % time practiced hand hygiene</p>
                        <p>2. Up to one month prior to observation period</p>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>DO</p>
                        <p>Nurses observed for 2 hours or until 10 indications for handwashing had occurred</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.94 to 0.98</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>&#8730;</p>
                     </c>
                     <c ca="left">
                        <p>r = 0.21</p>
                        <p>SEM = 0.201</p>
                     </c>
                     <c ca="left">
                        <p>&lt;0.05</p>
                        <p>&lt;0.05</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>* Calculated by authors; NA = Not applicable; NR = Not reported; ns = non-significant</p>
               </tblfn>
            </tbl>
            <p>The proxy measure of clinical behaviour was based on one of the following: clinician self-report of recent behaviour on self-completion questionnaire or by exit interview <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B19">19</abbr></abbrgrp>; clinician self-report of simulated behaviour in a specified clinical situation using clinical vignettes <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B18">18</abbr></abbrgrp>; medical record review <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B12">12</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B17">17</abbr></abbrgrp>; or patient report on self-completion questionnaire or by exit interview <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Eight reports evaluated multiple proxy measures <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B19">19</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Methodological quality of included studies</p>
            </st>
            <sec>
               <st>
                  <p>External validity</p>
               </st>
               <p>The target populations in nine reports were regional <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B14">14</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B19">19</abbr></abbrgrp>; all other reports targeted local populations, such as physicians in two general internal primary care outpatient clinics <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr></abbrgrp>, attending physicians at a university medical centre <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B13">13</abbr></abbrgrp>, and general practitioners in ten general practices <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Six reports approached all participants in their target population <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B11">11</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, three randomly sampled a group of clinicians <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr></abbrgrp>, and six used convenience sampling <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B8">8</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B19">19</abbr></abbrgrp>. The number of clinicians enrolled and analysed in each report ranged from three <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> to 138 <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> (median 34). Ten reports retained and analysed 100% of recruited clinicians <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr></abbrgrp>.
The median number of consultations observed was 160, with a range from 27 <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> to 4,454 <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. For further details see Additional File <supplr sid="S2">2</supplr>.</p>
            </sec>
            <sec>
               <st>
                  <p>Internal validity</p>
               </st>
               <sec>
                  <st>
                     <p>Validity of the checklists used</p>
                  </st>
                  <p>In six reports, the content of the checklist was based on national guidelines for the behaviour in question <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B10">10</abbr><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>, and for a further six reports content was derived by expert consensus <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. Two reports asked simply whether or not a physician asked about a particular lifestyle behaviour (<it>e.g.</it>, smoking), and whether or not they offered counselling <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. One report did not give the rationale for its choice of clinical actions <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Inter-rater reliability for assignment of weights to individual checklist items was presented in one report <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and was 0.73.</p>
                  <p>An important criterion for validity is that a measure should be reliable. Inter-rater reliability of scores generated from checklists using direct measures was reported for eight of the 15 included reports <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B11">11</abbr><abbr bid="B14">14</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B19">19</abbr></abbrgrp>, and ranged from 0.39 <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> to 1.00 <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B16">16</abbr></abbrgrp> (Table <tblr tid="T2">2</tblr>). Five additional reports evaluated the reliability of scoring between raters &#8211; stating these to be 'good' &#8211; but did not present inter-rater reliability statistics <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B10">10</abbr><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr></abbrgrp>. Two reports presented intra-rater reliabilities, which were 0.78 to 0.96 <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> and 0.74 to 1.0 <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Two reports did not discuss the reliability of the scoring procedure <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B12">12</abbr></abbrgrp>. One report evaluated the reliability of the proxy measures used <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>.</p>
               </sec>
               <sec>
                  <st>
                     <p>Validity of the direct methods used</p>
                  </st>
                  <p>Only one report presented an assessment of the ability of the direct measure to detect the behaviours of interest <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The authors found that videorecording captured a median of 48% of the content of the overall consultation observed, but that the level of capture varied from 10% to 100% depending on the clinical action.</p>
                  <p>Of the six reports that used standardised patients as the direct measure, four assessed the content and face validity of the patient scripts using expert review <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr></abbrgrp>. All reported that training was provided to standardised patients, but two reports did not provide detail about the duration or nature of the training <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. In three studies, standardised patients were experienced actors, who were trained according to a published protocol which was delivered by experienced university-based educators <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr></abbrgrp>. One report used graduate students who were trained for four hours as standardised patients <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. The experience of the trainer was not reported, but standardised patients pilot tested one of their simulated roles with a community pharmacist, and their checklist ratings were compared across four videotaped standardised patient encounters with pharmacists. Three reports gave detection rates of the standardised patient (<it>i.e.</it>, the clinician realised that standardised patients were not genuine patients), and these were low (3%) <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr></abbrgrp>.</p>
               </sec>
               <sec>
                  <st>
                     <p>Validity of the proxy methods used</p>
                  </st>
                  <p>With the exception of one report <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, the proxy method was directly related to the study visit; for example, reports using medical record review as the proxy method abstracted medical records pertaining only to the study visit, or patients were asked about a specific consultation. The proxy measure used by O'Boyle <it>et al. </it><abbrgrp><abbr bid="B19">19</abbr></abbrgrp> was collected two weeks to four months before the direct measurement.</p>
                  <p>Of the four reports that compared performance on the direct measure with a written vignette <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B18">18</abbr></abbrgrp>, all but one <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> reported these to be identical case matches. In the latter report, two standardised patient case protocols differed from the corresponding written vignette in the nature of the clinical complication presented by the standardised patient <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. The correspondence of standardised patient and vignette case protocols for two reports was not reported <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B17">17</abbr></abbrgrp>.</p>
               </sec>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Appropriateness of statistical methods used to summarise and report the relationship between direct and proxy measures</p>
            </st>
            <sec>
               <st>
                  <p>Studies comparing items</p>
               </st>
               <p>Thirteen reports compared measures of behaviour item-by-item <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. Four of these studies estimated the sensitivity of the proxy measure for each clinical action measured <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, two the specificity <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B8">8</abbr></abbrgrp> and one <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> the false positive rate from which we calculated specificity. It was possible to calculate the sensitivity and specificity for individual clinical actions from the raw data presented in a further report <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Three studies grouped clinical actions into categories: 'necessary' and 'unnecessary' actions <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>; 'must do', 'should do', 'must not do' and 'should not do' actions <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>; and 'essential' and 'intermediate' actions <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Luck <it>et al. </it><abbrgrp><abbr bid="B10">10</abbr></abbrgrp> then estimated the sensitivity and specificity within each category, and it was possible to estimate the sensitivity and specificity for each category specified by Page <it>et al. </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp> from the raw data presented. Rethans <it>et al. </it><abbrgrp><abbr bid="B17">17</abbr></abbrgrp> also calculated the sensitivity of each item (referred to by the authors as 'content scores') but reported only the mean and inter-quartile range of sensitivities within each clinical area. 
Hence, sensitivities were available for seven studies and specificities for six studies.</p>
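               <p>As an illustrative sketch only (it is not the analysis code used for this review), the item-level quantities described above can be computed from a two-by-two cross-classification of the proxy measure against the direct measure, with the direct measure treated as the reference standard. The function names and the counts are hypothetical. Recovering specificity from a reported false positive rate assumes the standard identity that specificity equals one minus the false positive rate.</p>

```python
def sensitivity(true_pos, false_neg):
    """Share of actions detected by the direct measure that the proxy also recorded."""
    performed = true_pos + false_neg
    if performed == 0:
        raise ValueError("the direct measure detected no performed actions")
    return true_pos / performed


def specificity(true_neg, false_pos):
    """Share of actions not detected by the direct measure that the proxy also omitted."""
    not_performed = true_neg + false_pos
    if not_performed == 0:
        raise ValueError("the direct measure detected no omitted actions")
    return true_neg / not_performed


def specificity_from_false_positive_rate(false_positive_rate):
    """Standard identity: specificity = 1 - false positive rate."""
    return 1.0 - false_positive_rate


# Hypothetical counts for one clinical action (not data from any included report).
example_sensitivity = sensitivity(true_pos=30, false_neg=10)
example_specificity = specificity(true_neg=45, false_pos=5)
```

               <p>A report that gives only a false positive rate of, say, 0.06 would therefore correspond to a specificity of 0.94 under this identity.</p>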
               <p>Six reports comparing item-by-item used other statistical methods to compare their data <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. These studies assessed 'agreement' and/or 'disagreement' between measures: five reported agreement as the percentage of recommended behaviours performed as recorded on the direct and proxy measures <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>; one also reported disagreement as the proportion of behaviours not recorded by the proxy measure that were detected by the direct measure <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>; and one estimated the 'total agreement' and 'total disagreement' between measures, reporting median 'convergent validity' for 20 individual items and five clinical categories <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
            </sec>
            <sec>
               <st>
                  <p>Studies comparing summary scores</p>
               </st>
               <p>Seven reports aggregated items into summary scores of clinicians' behaviour <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B13">13</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Three studies used ANOVA to compare summary scores <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B13">13</abbr><abbr bid="B18">18</abbr></abbrgrp>; one study used paired t-tests <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>; and four studies reported Pearson correlation coefficients <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B13">13</abbr><abbr bid="B17">17</abbr><abbr bid="B19">19</abbr></abbrgrp>.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Relationship between direct and proxy measures of behaviour</p>
            </st>
            <sec>
               <st>
                  <p>Studies comparing items</p>
               </st>
               <sec>
                  <st>
                     <p>Patient report</p>
                  </st>
                  <p>Three reports comparing item-by-item and reporting sensitivity and specificity <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, and one reporting sensitivity only <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, examined patient report as a proxy measure of clinician performance. Measurement techniques used were either patient questionnaire or patient interview, which were compared with direct observation <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> and audio-recording <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> (Table <tblr tid="T2">2</tblr>).</p>
                  <p>Median sensitivities for clinical actions relating to the provision of general outpatient services <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and for health advice on a range of patient behaviours <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> were 53% (range 25% to 89%) and 43% (range 11% to 76%), respectively. Sensitivities for the provision of smoking cessation advice were 74% <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, 93% <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, and 76% <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>; for asking about alcohol use they were 75% <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and 29% <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>; and for measuring blood pressure sensitivity was 100% <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> (Figure <figr fid="F2">2</figr>). Median specificity for patient report was 98% (range 83% to 99%) <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> across a number of services, 79% <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> and 94% <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> for smoking cessation counselling, and 90% for the measurement of blood pressure <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> (Figure <figr fid="F2">2</figr>).</p>
                  <fig id="F2">
                     <title>
                        <p>Figure 2</p>
                     </title>
                     <caption>
                        <p>Sensitivities and specificities for six studies</p>
                     </caption>
                     <text>
                        <p><b>Sensitivities and specificities for six studies</b>.</p>
                     </text>
                     <graphic file="1748-5908-4-37-2"/>
                  </fig>
                  <p>Positive and negative predictive values could be calculated from the raw data of two reports evaluating the provision of smoking and alcohol advice and the measurement of blood pressure <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. The positive predictive values for patient-report were: 0.49 <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, 0.42, and 0.55 <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> for smoking advice; 0.40 for alcohol advice <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>; and 0.70 for the measurement of blood pressure <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> (Figure <figr fid="F3">3</figr>). The negative predictive values for patient-report of the same behaviours were high for both studies (>0.90) <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. This suggests that patients accurately reported not receiving advice and not having their blood pressure measured, but were less accurate in reporting that clinicians did perform these behaviours.</p>
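                  <p>The following minimal sketch, which assumes a two-by-two table of proxy report against the direct measure, shows how predictive values of this kind are derived from raw counts. The function name and the counts are hypothetical and are not data from the included reports; they were chosen only to show how a low positive predictive value can coexist with a high negative predictive value when the behaviour is performed infrequently.</p>

```python
def predictive_values(true_pos, false_pos, false_neg, true_neg):
    """Return (PPV, NPV) for a proxy measure judged against a direct measure."""
    proxy_positive = true_pos + false_pos  # proxy says the action was performed
    proxy_negative = true_neg + false_neg  # proxy says the action was not performed
    if proxy_positive == 0 or proxy_negative == 0:
        raise ValueError("predictive values need both positive and negative proxy reports")
    ppv = true_pos / proxy_positive
    npv = true_neg / proxy_negative
    return ppv, npv


# Hypothetical counts: the action is confirmed in 25 of 200 consultations.
ppv, npv = predictive_values(true_pos=20, false_pos=20, false_neg=5, true_neg=155)
```

                  <p>With these made-up counts the positive predictive value is 0.50 and the negative predictive value is approximately 0.97. Both values depend on how often the behaviour actually occurs, so they are not directly comparable across settings with different base rates.</p>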
                  <fig id="F3">
                     <title>
                        <p>Figure 3</p>
                     </title>
                     <caption>
                        <p>Positive and negative predictive values for six studies</p>
                     </caption>
                     <text>
                        <p><b>Positive and negative predictive values for six studies</b>.</p>
                     </text>
                     <graphic file="1748-5908-4-37-3"/>
                  </fig>
                  <p>Three further reports compared item-by-item but did not report sensitivity or specificity for their data <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Gerbert <it>et al. </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp> report a median 'total agreement' of 86% between measures for the performance of clinical actions relating to the management of COPD. Gerbert <it>et al. </it><abbrgrp><abbr bid="B12">12</abbr></abbrgrp> present a kappa coefficient of 0.50 for the level of concordance between patient report and their direct measure of video-recording and a 'disagreement' between the measures of 24%. Pbert <it>et al. </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp> made comparisons across measures for the detection of individual items using Cochran's <it>Q </it>tests. These comparisons suggested that patients tended to over-report their clinician's behaviour compared to the direct measure of audio-recording.</p>
               </sec>
               <sec>
                  <st>
                     <p>The accuracy of patient-report</p>
                  </st>
                  <p>ROC curves were plotted for the three studies where both sensitivity and specificity were available <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp> (Figure <figr fid="F4">4</figr>). The accuracy of patient report varied according to the clinical action of interest. The behaviours located in the top-left quadrant of this plot were reported most accurately by patients. These included the provision of counselling for health behaviours such as smoking, alcohol use, seat belt use, and breast self-examination, which were more accurately reported by patients than the provision of counselling for accident prevention, dental health, contraception, and exercise (behaviours located in the bottom-left quadrant). The accuracy of patient report for clinical actions relating to physical examination, laboratory tests, and screening services also varied with the type of examination, test, or service undertaken <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
                  <fig id="F4">
                     <title>
                        <p>Figure 4</p>
                     </title>
                     <caption>
                        <p>ROC plots of sensitivities and specificities for three proxy measures</p>
                     </caption>
                     <text>
                        <p><b>ROC plots of sensitivities and specificities for three proxy measures</b>. Behaviours/actions in the top left-hand quadrant have both high sensitivity and specificity. See Stange 1998 <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> for additional sensitivities and specificities for 78 items.</p>
                     </text>
                     <graphic file="1748-5908-4-37-4"/>
                  </fig>
               </sec>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Medical record review</p>
            </st>
            <p>Four reports comparing item-by-item and reporting sensitivity and specificity evaluated medical record review: one compared it with direct observation <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, two with audio-recording <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr></abbrgrp>, and one with standardised patient accounts <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> (Table <tblr tid="T2">2</tblr>).</p>
            <p>Median sensitivity for a range of clinical actions relating to the provision of general outpatient services was 60% (range 8% to 92%) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> and 83% (range 0% to 100%) <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> for clinical actions undertaken during routine patient consultations (Figure <figr fid="F2">2</figr>). For smoking cessation advice, alcohol counselling, and the measurement of blood pressure, sensitivities were 31%, 29%, and 83%, respectively <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, and for 'necessary' and 'unnecessary' actions sensitivities were 70% and 65%, respectively <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> (Figure <figr fid="F2">2</figr>). Median specificity for medical record review across a number of services was 90% (range 81% to 100%) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, and 97% (range 9% to 100%) <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Specificities for smoking counselling, alcohol counselling, and the measurement of blood pressure were 99%, 100%, and 93%, respectively <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, and 64% and 81% for 'necessary' and 'unnecessary' actions, respectively <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> (Figure <figr fid="F2">2</figr>).</p>
            <p>As the raw data were available for three reports evaluating medical record review <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, it was possible to calculate a range of positive and negative predictive values for this proxy method (Figure <figr fid="F3">3</figr>). The positive predictive ability of medical record review ranged from 0.30 to 0.92 (Median = 0.86) across different clinical actions, and was highest for 'necessary' care items (PPV = 0.85) <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, recording of drug dosage (PPV = 0.88), diagnostic behaviours (PPV = 0.91) <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, and the measurement of blood pressure (PPV = 0.84) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> (Figure <figr fid="F3">3</figr>). The negative predictive ability of medical record review ranged from 0.39 to 1.00 (Median = 0.73) across different clinical actions, and was lowest (&lt;0.50) for the recording of drug dosages and drug action <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, and highest for advice-giving behaviours and the measurement of blood pressure <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> (Figure <figr fid="F3">3</figr>).</p>
            <p>Four further reports compared item-by-item but used other statistical methods to do this <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B17">17</abbr></abbrgrp>. Gerbert <it>et al. </it>(1986) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> report total agreement of 88% between medical record review and video-recording for behaviours relating to the general management of COPD. Gerbert <it>et al. </it>(1988) <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> present a kappa coefficient of 0.54 for the level of concordance between medical record review and video-recording, and a total disagreement between these measures of 21%. Rethans <it>et al. </it><abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and Dresselhaus <it>et al. </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp> presented summary percentage scores (65.6%, 54.0%, and 45.8%, respectively) that were consistently lower than scores reported by a standardised patient (76.2%, 68.0%, and 61.7%, respectively). Rethans <it>et al. </it><abbrgrp><abbr bid="B17">17</abbr></abbrgrp> also reported a correlation coefficient of r = 0.54 between summary scores relating to the management of commonly presenting outpatient conditions (Table <tblr tid="T2">2</tblr>).</p>
         </sec>
         <sec>
            <st>
               <p>The accuracy of medical record review</p>
            </st>
            <p>ROC curves were plotted for four studies where both sensitivity and specificity were reported <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> or could be calculated from the raw data presented <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp> (Figure <figr fid="F4">4</figr>). The accuracy of medical record review varied according to the type of clinical behaviour or action that was being measured. Review of medical records yielded more accurate estimates of clinician performance for actions relating to physical examination, blood pressure measurements, laboratory tests, and screening services (which were located in the top-left quadrant) than for actions relating to the provision of a wide range of counselling services, including smoking cessation advice, and alcohol counselling.</p>
         </sec>
         <sec>
            <st>
               <p>Clinician self-report</p>
            </st>
            <p>The sensitivity and specificity for clinical behaviours categorised as 'must do' and 'must not do' actions are presented in Figure <figr fid="F2">2</figr> for one report that used clinical vignettes to elicit clinician self-reported behaviour <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>.</p>
            <p>Sensitivities and specificities ranged from 0.47 to 0.95 and 0.40 to 0.80, respectively, for 'must do' and 'should do' behaviours, and from 0.20 to 0.70 and 0.45 to 0.90, respectively, for 'must not do' and 'should not do' behaviours (Figure <figr fid="F2">2</figr>). Positive (PPV) and negative (NPV) predictive values were also calculated for this study <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. PPVs ranged from 0.17 (cold relief: physician required/should not do) to 0.89 (cold relief: recommend medication/should not do), median = 0.42 (Figure <figr fid="F3">3</figr>). NPVs ranged from 0.50 (cold relief: physician required/should do) to 1.00 (cold relief: recommend medication/must not do), median = 0.80 (Figure <figr fid="F3">3</figr>).</p>
            <p>Item-by-item comparisons evaluating clinician self-report were made by three further reports that used methods other than sensitivities and specificities <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Gerbert <it>et al. </it>(1986) <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> report 84% total agreement between clinician self-report and a video-recording of the consultation. Gerbert <it>et al. </it>(1988) <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> presented a kappa coefficient of 0.67 for the level of concordance between clinician self-report during interview and video-recording, and a total disagreement between these measures of 13%. Pbert <it>et al. </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp> made comparisons across measures for the detection of individual items using Cochran's <it>Q </it>tests. These comparisons suggest that clinicians tended to over-report their behaviour on some items compared to audio-recording.</p>
         </sec>
         <sec>
            <st>
               <p>The accuracy of clinician self-report</p>
            </st>
            <p>A ROC curve was plotted for the one study where both sensitivity and specificity could be calculated for several 'must do/not do' and 'should do/not do' clinical actions <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> (Figure <figr fid="F4">4</figr>). Behaviours categorised as 'should not do' tended to group in the top left quadrant of the plot, tentatively suggesting that clinicians accurately report such behaviours (<it>e.g.</it>, should not recommend medication for cold relief). Accuracy was poorer for behaviours categorised as 'must not do' and 'should do' (which tended to group in the bottom left quadrant of the plot) and behaviours categorised as 'must do' (which tended to fall into the top right quadrant of the plot).</p>
         </sec>
         <sec>
            <st>
               <p>Studies combining items into summary scores</p>
            </st>
            <sec>
               <st>
                  <p>Patient report</p>
               </st>
               <p>One report that evaluated patient report and made item-by-item comparisons also combined items into summary scores <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Pbert <it>et al. </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp> calculated scores that represented the number of smoking advice intervention steps taken by a clinician during a patient consultation. The correlation of these scores between patient report and audio-recording was r = 0.67.</p>
            </sec>
            <sec>
               <st>
                  <p>Medical record review</p>
               </st>
               <p>Three reports evaluating medical record review <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp> presented summary percentage scores (65.6%, 54.0%, and 45.8%, respectively) that were consistently lower than scores reported by a standardised patient (76.2%, 68.0%, and 61.7%, respectively). One of these reports <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> presented an overall correlation coefficient of r = 0.54 between summary scores relating to the management of commonly presenting outpatient conditions (Table <tblr tid="T2">2</tblr>).</p>
            </sec>
            <sec>
               <st>
                  <p>Clinician self-report</p>
               </st>
               <p>Six reports evaluating clinician self-report calculated summary scores <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Different reports compared these self-reports to different direct measures.</p>
               <p>One report <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> presented scores for the mean number of clinical actions performed by a group of clinicians as measured by each method in relation to the management of urinary tract infection (mean (SD) self-report = 9.88 (3.44), standardised patient report = 10.04 (3.37)). Rethans <it>et al. </it><abbrgrp><abbr bid="B16">16</abbr></abbrgrp> also presented subgroup means that suggest clinicians under-report their performance for 'obligatory' actions and over-report for less essential 'intermediate' and 'superfluous' actions (Table <tblr tid="T2">2</tblr>). Two reports calculated the proportions of actions correctly performed: one in relation to the management of common outpatient conditions (% (SD) self-report = 71.0 (5.4), standardised patient report = 76.2 (7.2)) <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>, and one in relation to the provision of preventive care advice (% (SD) self-report = 48.3 (14.4), standardised patient report = 61.7 (12.9)) <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Page <it>et al</it>. <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> present an overall total agreement of 66% between self-report and standardized patient report.</p>
               <p>Three reports <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B13">13</abbr><abbr bid="B19">19</abbr></abbrgrp> present correlation coefficients of: 0.26 to 0.68 <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> for the relationship between performance on clinical vignettes and standardized patient reports; 0.21 for a global self-estimate of performance of hand hygiene actions with direct observation <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>; and 0.54 for clinician self-reported provision of smoking cessation counselling compared with audio-taped accounts of the consultation <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <sec>
            <st>
               <p>Validity of the direct measures used</p>
            </st>
            <p>A problem in assessing any proxy measure of clinician performance is the validity of the direct measure itself as a true reflection of actual behaviour. Simulated patients (standardised patients) have been widely used in medical education, and there is an extensive literature to support their validity as a 'gold standard' method for measuring clinical behaviour <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B14">14</abbr><abbr bid="B18">18</abbr></abbrgrp>. Standardised patients require careful and detailed training in the clinical case they are to represent <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, and for those studies reviewed here that provide information about the training of standardised patients, this appears to have been adequate <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Three included studies assessed detection rates by clinicians, and reported these to be low. The six studies <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp> that used simulated patients specify very precisely the characteristics of the cases presented to the clinicians. The other studies observed the clinicians' behaviour with actual patients and therefore had less control over the clinical situation in which behaviour was assessed, but are likely to be more generalisable to real-life clinical situations.</p>
            <p>Direct observation by trained observers and audio- or video-recording are also commonly used as direct measures of clinical behaviour. However, one study <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> using video-recording of consultations found that relevant clinical detail &#8211; for example, assessment of symptoms and signs &#8211; was more frequently reported as having been done when measured by clinician self-report. Taken at face value, this may suggest over-reporting on the part of clinicians. However, it is feasible that some aspects of the clinical assessment of symptoms and signs are performed non-verbally. In another study, the measurement of blood pressure was accurately recorded in the patient medical record but was not detected by the direct measure used (audio-recording) <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. It is also plausible that, while we can expect that standardized patients may observe a clinician making an entry in a medical record, they could not accurately comment on the content of the entry. A further example of the limits of capture for direct measures can be seen in one of four reports that compared the direct measure of audio-recording with the proxy of medical record review <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. This report found that while some clinical actions investigated (for example, the discussion of a diagnosis or drug name during a consultation with a patient) were not detected during evaluation of the audiotaped session, a diagnosis and the name and dosage of drugs prescribed had been recorded in medical records by the physician. As an aim of this report was to evaluate clinician communication with patients, the direct measure was valid as it gave an accurate account of what the physician did, and did not, communicate to the patient. However, audio-recording would lack validity as a direct measure for the making or documenting of a diagnosis and some related management decisions.</p>
            <p>This suggests that there are very few gold standard, direct methods for assessing clinical performance &#8211; possibly only standardised patient methodology and participant observation &#8211; that can validly cover an extensive range of clinical actions, and that none can truly capture all aspects of behaviour. A direct measure can only be a valid gold standard for a given behaviour of interest if it can reliably capture that behaviour.</p>
         </sec>
         <sec>
            <st>
               <p>Validity of the proxy measures used</p>
            </st>
            <p>The accuracy of three proxy measures was reviewed: patient report, medical record review, and clinician self-report. These indirect measures were used by the included reports to estimate the performance of a wide range of clinical actions. The accuracy of each proxy measure varied across the clinical behaviours measured. Reports evaluating clinician self-report and patient-report also used different techniques to capture the measure of behaviour (<it>e.g.</it>, interview, self-completion questionnaire, patient vignettes).</p>
         </sec>
         <sec>
            <st>
               <p>Patient report</p>
            </st>
            <p>Patient-report measures demonstrated greater accuracy than the other two proxy measures for reporting clinician performance, particularly with respect to counselling behaviours and routine procedures. A cautionary adjunct to this, however, is the finding of one study that the predictive validity of patient-reported information deteriorates markedly as the time between patient exposure to clinician behaviour and the timing of their recall of events increases <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Also, patient recall was found by another study to be significantly influenced by the duration of the advice and factors relating to relevancy, <it>i.e.</it>, advice provided during well-care consultations and the presence of a health behaviour-relevant diagnosis during an illness visit <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Medical record review</p>
            </st>
            <p>Medical record review appeared to underestimate many aspects of clinician behaviour, particularly in the domain of patient counselling. Thus, our findings suggest that medical record review, in the outpatient setting, lacks validity as a general measure of clinician behaviour. However, there was evidence to suggest that the predictive ability of medical record review improves substantially for, but is restricted to, specific types of clinical action, for example, physical examination, the recording of drug dosages, and the ordering of laboratory tests. Medical records may therefore be a relatively low-cost and accessible proxy measure for these clinical behaviours. Medical records may also be advantageous in that they can be good 'history keepers' because they can store information from several consultations and a variety of conditions.</p>
         </sec>
         <sec>
            <st>
               <p>Clinician self-report</p>
            </st>
            <p>The accuracy of clinician self-report as a measure of actual behaviour is harder to establish because different studies using different methods produced different outcomes. Also, none of the studies evaluating clinician report used appropriate statistical methods to summarise and/or report the relationship between the measures used.</p>
            <p>Four reports that calculated summary scores of performance on vignettes appear to suggest that clinicians' self-reported estimates of their behaviour were, overall, close to those generated by the direct measure. However, closer examination of the individual behaviours contributing to the overall summary scores by one of these studies <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> revealed that clinicians were overestimating their performance of some clinical actions and underestimating their performance of others, an observation lost in the summary score due to counterbalancing. Over- and underestimation was also tentatively suggested on the ROC plot for an additional study <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, albeit in a contrasting direction.</p>
            <p>Of these two studies demonstrating over- and underestimation of self-reported behaviour, one provided clinicians with a closed-ended checklist of possible behaviours <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. The second study used an open-ended response mode with responses coded later by an independent observer <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. This may explain the conflicting outcomes of these two studies: because closed-ended checklists provide clinicians with an extensive list of possible actions, they may cue clinicians to select additional actions or act as a prompt to elicit knowledge about what they could, or should not, do <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Such variation in the ability of vignettes to predict the occurrence of important behaviours that clinicians should or should not do undermines their validity. However, this may be a problem that can be overcome by careful and rigorous development of vignette cases and the method of their presentation <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
            <p>Measures that use vignettes require clinicians to report their behaviour in the context of what they would do in a given clinical scenario. The remaining studies evaluating clinician self-report collected retrospective accounts of actual behaviour using either interview or questionnaire methods and report correlation coefficients and measures of 'total agreement' that suggest good agreement between measures. However, correlation is a measure of association, and a high correlation can effectively disguise important disagreement if there is a consistent bias in one measure <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. A similar problem exists with the interpretation of 'total' or 'observed' agreement in that a large proportion of the agreement may be for behaviours that were reported by both measures as not performed, again disguising important deficits in a proxy measure to accurately detect actual performance <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
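            <p>Both caveats can be illustrated with a small worked example. The sketch below uses entirely hypothetical numbers (not data from any included report) and hypothetical function names. It first shows 'total agreement' inflated by actions that neither measure recorded as performed, and then a perfect correlation coexisting with a constant over-report.</p>
            <p>
```python
# Illustrative only: all numbers are hypothetical, not data from any included study.
import math


def observed_agreement(both_yes, proxy_only, direct_only, both_no):
    """Proportion of items on which proxy and direct measure give the same answer."""
    total = both_yes + proxy_only + direct_only + both_no
    return (both_yes + both_no) / total


def cohen_kappa(both_yes, proxy_only, direct_only, both_no):
    """Agreement corrected for the agreement expected by chance."""
    total = both_yes + proxy_only + direct_only + both_no
    p_obs = (both_yes + both_no) / total
    proxy_yes = (both_yes + proxy_only) / total
    direct_yes = (both_yes + direct_only) / total
    p_exp = proxy_yes * direct_yes + (1 - proxy_yes) * (1 - direct_yes)
    return (p_obs - p_exp) / (1 - p_exp)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Case 1: a rarely performed action; most agreement is on "not performed".
table = dict(both_yes=2, proxy_only=5, direct_only=5, both_no=88)
agreement = observed_agreement(**table)
kappa = cohen_kappa(**table)
print(f"total agreement = {agreement:.2f}, kappa = {kappa:.2f}")

# Case 2: self-report exceeds the direct measure by 15 points for every clinician.
direct_scores = [40, 50, 60, 70, 80]
self_report_scores = [score + 15 for score in direct_scores]
r = pearson_r(direct_scores, self_report_scores)
mean_bias = sum(s - d for s, d in zip(self_report_scores, direct_scores)) / len(direct_scores)
print(f"r = {r:.2f}, mean over-report = {mean_bias:.1f} points")
```
            </p>
            <p>In the first case total agreement is 0.90 while kappa is only about 0.23, because the two measures agree on just two of the twelve actions that either recorded as performed. In the second case the correlation is perfect even though every self-report overstates performance by 15 points.</p>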
         </sec>
         <sec>
            <st>
               <p>Review limitations</p>
            </st>
            <p>Many references reviewed were sourced from the reference lists of retrieved articles. We did not find a common terminology for describing written case simulations or proxy methods, and it is therefore possible that this limited our database search. A common terminology for measures would greatly facilitate research in this area. The literature search only covered up to August 2004; an update of this review could provide further useful information. A further limitation of this review is that we were not able to combine data due to the heterogeneity of the included reports. We tried to minimise publication bias by searching not only the peer-reviewed literature but also abstracts of conferences and unpublished theses. As we were unable to conduct a formal meta-analysis because of the heterogeneity in the designs, proxy measures, and summary statistics used in the included studies, we could not use conventional methods of assessing publication bias <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Nevertheless, the included studies presented various results &#8211; seven studies <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B9">9</abbr><abbr bid="B11">11</abbr><abbr bid="B14">14</abbr><abbr bid="B17">17</abbr></abbrgrp> presented a range of both positive and negative findings, six studies <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B10">10</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B15">15</abbr><abbr bid="B18">18</abbr></abbrgrp> presented positive findings only and one <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> presented only negative or inconclusive findings &#8211; suggesting that there is no apparent systematic tendency towards publication bias in the current review.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In validating a proxy measure of clinical behaviour it is imperative that the direct measure for comparison is itself both reliable and valid. In some of the included reports the direct measure lacked validity. Only four studies were found that used appropriate statistical methods to compare measures. The validity of patient report and medical record review varied widely across a number of clinical actions but was high for some specific clinical actions. The evidence for the validity of clinician self-report is inconclusive.</p>
         <p>Two recent systematic reviews evaluated the efficacy of social cognitive models of behaviour in explaining clinical performance <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B27">27</abbr></abbrgrp>. Both reviews found that the relationship between clinicians' self-reported intention and their behaviour is not perfect (maximum R<sup>2 </sup>reported was 0.44 <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>), and that the strength of the relationship often varied depending on the method used to measure their behaviour. The current review supports the notion that at least some of the discrepancy between intentions and behaviour can be explained by error originating from unreliable measures of behaviour.</p>
         <p>Valid measures of clinical behaviour are of fundamental importance for accurately identifying gaps in care delivery, for continuous improvement of quality of care, and ultimately for improved patient care. However, the evidence base for three commonly used proxy measures of clinicians' behaviour is very limited. Further research needs to establish the scope of capture for a range of both direct and indirect measures of clinical behaviour and the potential for using a combination of proxy measures to obtain an all-round picture of clinical behaviour.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>All authors contributed to the conception, design, and analysis of the study and approved the submitted draft. MPE, JJF, EK, SH, and HD reviewed the articles and abstracted the data.</p>
      </sec>
   </bdy>
   <bm>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Quality and Outcomes Framework for GP practices</p>
            </title>
            <aug>
               <au>
                  <cnm>The Information Centre</cnm>
               </au>
            </aug>
            <url>http://www.ic.nhs.uk/</url>
            <note>[cited 28.08.2008]</note>
         </bibl>
         <bibl id="B2">
            <title>
               <p>New GMS Contract 2003. Investing in general practice</p>
            </title>
            <aug>
               <au>
                  <cnm>Department of Health</cnm>
               </au>
            </aug>
            <publisher>NHS Confederation and the British Medical Association. London</publisher>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Do self-reported intentions predict clinicians' behaviour: a systematic review</p>
            </title>
            <aug>
               <au>
                  <snm>Eccles</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Hrisos</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Francis</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kaner</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Dickinson</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Beyer</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Johnston</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Implement Sci.</source>
            <pubdate>2006</pubdate>
            <volume>1</volume>
            <fpage>28</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1664582</pubid>
                  <pubid idtype="pmpid" link="fulltext">17118180</pubid>
                  <pubid idtype="doi">10.1186/1748-5908-1-28</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <aug>
               <au>
                  <snm>Streiner</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Norman</snm>
                  <fnm>GR</fnm>
               </au>
            </aug>
            <source>Health Measurement Scales: a practical guide to their development and use</source>
            <publisher>Oxford: Oxford University Press</publisher>
            <edition>3</edition>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B5">
            <title>
               <p>How valid are medical records and patient questionnaires for physician profiling and health services research? A comparison with direct observation of patients visits</p>
            </title>
            <aug>
               <au>
                  <snm>Stange</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Zyzanski</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Kelly</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Langa</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Flocke</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Jaen</snm>
                  <fnm>CR</fnm>
               </au>
            </aug>
            <source>Medical Care</source>
            <pubdate>1998</pubdate>
            <volume>36</volume>
            <fpage>851</fpage>
            <lpage>867</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1097/00005650-199806000-00009</pubid>
                  <pubid idtype="pmpid">9630127</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Direct observation and patient recall of health behavior advice</p>
            </title>
            <aug>
               <au>
                  <snm>Flocke</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Stange</snm>
                  <fnm>KC</fnm>
               </au>
            </aug>
            <source>Prev Med</source>
            <pubdate>2004</pubdate>
            <volume>38</volume>
            <fpage>343</fpage>
            <lpage>349</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.ypmed.2003.11.004</pubid>
                  <pubid idtype="pmpid" link="fulltext">14766118</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Comparison of patient questionnaire, medical record, and audio tape in assessment of health promotion in general practice consultations</p>
            </title>
            <aug>
               <au>
                  <snm>Wilson</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMJ</source>
            <editor>McDonald P</editor>
            <pubdate>1994</pubdate>
            <volume>309</volume>
            <fpage>1483</fpage>
            <lpage>1485</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2541640</pubid>
                  <pubid idtype="pmpid" link="fulltext">7804055</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Accuracy of patient recall of opportunistic smoking cessation advice in general practice</p>
            </title>
            <aug>
               <au>
                  <snm>Ward</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sanson-Fisher</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Tobacco Control</source>
            <pubdate>1996</pubdate>
            <volume>5</volume>
            <issue>2</issue>
            <fpage>110</fpage>
            <lpage>113</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1759509</pubid>
                  <pubid idtype="pmpid">8910991</pubid>
                  <pubid idtype="doi">10.1136/tc.5.2.110</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Validating the content of pediatric outpatient medical records by means of tape-recording doctor-patient encounters</p>
            </title>
            <aug>
               <au>
                  <snm>Zuckerman</snm>
                  <fnm>ZE</fnm>
               </au>
               <au>
                  <snm>Starfield</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hochreiter</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kovasznay</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Pediatrics</source>
            <pubdate>1975</pubdate>
            <volume>56</volume>
            <issue>3</issue>
            <fpage>407</fpage>
            <lpage>411</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1161397</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>How well does chart abstraction measure quality? A prospective comparison of standardized patients with the medical record</p>
            </title>
            <aug>
               <au>
                  <snm>Luck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Peabody</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Dresselhaus</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Glassman</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>American Journal of Medicine</source>
            <pubdate>2000</pubdate>
            <volume>108</volume>
            <issue>8</issue>
            <fpage>642</fpage>
            <lpage>649</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0002-9343(00)00363-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">10856412</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Performance on PMPs and performance in practice: are they related?</p>
            </title>
            <aug>
               <au>
                  <snm>Page</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Fielding</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <pubdate>1980</pubdate>
            <volume>55</volume>
            <fpage>529</fpage>
            <lpage>537</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7381906</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Agreement among physician assessment methods. Searching for the truth among fallible methods</p>
            </title>
            <aug>
               <au>
                  <snm>Gerbert</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Stone</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Stulbarg</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gullion</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Greenfield</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Medical Care</source>
            <pubdate>1988</pubdate>
            <volume>26</volume>
            <fpage>519</fpage>
            <lpage>535</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1097/00005650-198806000-00001</pubid>
                  <pubid idtype="pmpid">3379984</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The patient exit interview as an assessment of physician-delivered smoking intervention: a validation study</p>
            </title>
            <aug>
               <au>
                  <snm>Pbert</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Quirk</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Herbert</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Ockene</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Luippold</snm>
                  <fnm>RS</fnm>
               </au>
            </aug>
            <source>Health Psychol</source>
            <pubdate>1999</pubdate>
            <volume>18</volume>
            <fpage>183</fpage>
            <lpage>188</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1037/0278-6133.18.2.183</pubid>
                  <pubid idtype="pmpid" link="fulltext">10194054</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Measuring physician behavior</p>
            </title>
            <aug>
               <au>
                  <snm>Gerbert</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hargreaves</snm>
                  <fnm>WA</fnm>
               </au>
            </aug>
            <source>Medical Care</source>
            <pubdate>1986</pubdate>
            <volume>24</volume>
            <fpage>838</fpage>
            <lpage>847</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1097/00005650-198609000-00005</pubid>
                  <pubid idtype="pmpid">3762247</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Measuring compliance with preventive care guidelines: standardized patients, clinical vignettes, and the medical record</p>
            </title>
            <aug>
               <au>
                  <snm>Dresselhaus</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Peabody</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Luck</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Journal of General Internal Medicine</source>
            <pubdate>2000</pubdate>
            <volume>15</volume>
            <issue>11</issue>
            <fpage>782</fpage>
            <lpage>788</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1495610</pubid>
                  <pubid idtype="pmpid">11119170</pubid>
                  <pubid idtype="doi">10.1046/j.1525-1497.2000.91007.x</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Simulated patients in general practice: a different look at the consultation</p>
            </title>
            <aug>
               <au>
                  <snm>Rethans</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>van Boven</snm>
                  <fnm>CPA</fnm>
               </au>
            </aug>
            <source>British Medical Journal</source>
            <pubdate>1987</pubdate>
            <volume>294</volume>
            <fpage>809</fpage>
            <lpage>812</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1245868</pubid>
                  <pubid idtype="pmpid">3105753</pubid>
                  <pubid idtype="doi">10.1136/bmj.294.6575.809</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>To what extent do clinical notes by general practitioners reflect actual medical performance? A study using simulated patients</p>
            </title>
            <aug>
               <au>
                  <snm>Rethans</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Metsemakers</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>British Journal of General Practice</source>
            <pubdate>1994</pubdate>
            <volume>44</volume>
            <issue>381</issue>
            <fpage>153</fpage>
            <lpage>156</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1238838</pubid>
                  <pubid idtype="pmpid">8185988</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Comparison of vignettes, standardized patients, and chart abstraction: a prospective validation study of 3 methods for measuring quality</p>
            </title>
            <aug>
               <au>
                  <snm>Peabody</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Luck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Glassman</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Dresselhaus</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>JAMA</source>
            <pubdate>2000</pubdate>
            <volume>283</volume>
            <issue>13</issue>
            <fpage>1715</fpage>
            <lpage>1722</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1001/jama.283.13.1715</pubid>
                  <pubid idtype="pmpid" link="fulltext">10755498</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Understanding adherence to hand hygiene recommendations: the theory of planned behavior</p>
            </title>
            <aug>
               <au>
                  <snm>O'Boyle</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Henly</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Larson</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>American Journal of Infection Control</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>352</fpage>
            <lpage>360</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1067/mic.2001.18405</pubid>
                  <pubid idtype="pmpid" link="fulltext">11743481</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The use of standardised patients in research in general practice</p>
            </title>
            <aug>
               <au>
                  <snm>Beullens</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rethans</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Goedhuys</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Buntinx</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Family Practice</source>
            <pubdate>1997</pubdate>
            <volume>14</volume>
            <fpage>58</fpage>
            <lpage>62</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/fampra/14.1.58</pubid>
                  <pubid idtype="pmpid" link="fulltext">9061346</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Measuring the quality of physician practice by using clinical vignettes: a prospective validation study</p>
            </title>
            <aug>
               <au>
                  <snm>Peabody</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Luck</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Glassman</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Jain</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hansen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Spell</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Annals of Internal Medicine</source>
            <pubdate>2004</pubdate>
            <volume>141</volume>
            <fpage>771</fpage>
            <lpage>780</lpage>
            <xrefbib>
               <pubid idtype="pmpid">15545677</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Which data source in clinical performance assessment? A pilot study comparing self-recording with patient records and observation</p>
            </title>
            <aug>
               <au>
                  <snm>Spies</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Mokkink</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>De Vries Robbe</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Grol</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>International Journal for Quality in Health Care</source>
            <pubdate>2004</pubdate>
            <volume>16</volume>
            <issue>1</issue>
            <fpage>65</fpage>
            <lpage>72</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/intqhc/mzh001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15020562</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Written case simulations: do they predict physicians' behaviour?</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>TV</fnm>
               </au>
               <au>
                  <snm>Gerrity</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Earp</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Journal of Clinical Epidemiology</source>
            <pubdate>1990</pubdate>
            <volume>43</volume>
            <issue>8</issue>
            <fpage>805</fpage>
            <lpage>815</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0895-4356(90)90241-G</pubid>
                  <pubid idtype="pmpid" link="fulltext">2200852</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Association or Agreement?</p>
            </title>
            <aug>
               <au>
                  <snm>Chia</snm>
                  <fnm>KS</fnm>
               </au>
            </aug>
            <source>Annals of the Academy of Medicine, Singapore</source>
            <pubdate>2000</pubdate>
            <volume>29</volume>
            <fpage>263</fpage>
            <lpage>264</lpage>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Measuring agreement in medical informatics reliability studies</p>
            </title>
            <aug>
               <au>
                  <snm>Hripcsak</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Heitjan</snm>
                  <fnm>DF</fnm>
               </au>
            </aug>
            <source>Journal of Biomedical Informatics</source>
            <pubdate>2002</pubdate>
            <volume>35</volume>
            <issue>2</issue>
            <fpage>99</fpage>
            <lpage>110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1532-0464(02)00500-2</pubid>
                  <pubid idtype="pmpid">12474424</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <aug>
               <au>
                  <snm>Egger</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Davey Smith</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <cnm>(Eds)</cnm>
               </au>
            </aug>
            <source>Investigating and dealing with publication and other biases. Chapter 11 in Systematic reviews in health care: meta-analysis in context</source>
            <publisher>London: BMJ Books</publisher>
            <edition>2</edition>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Healthcare professionals' intentions and behaviours: A systematic review of studies based on social cognitive theories</p>
            </title>
            <aug>
               <au>
                  <snm>Godin</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Belanger-Gravel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eccles</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Grimshaw</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Implementation Science</source>
            <pubdate>2008</pubdate>
            <volume>3</volume>
            <issue>36</issue>
         </bibl>
      </refgrp>
   </bm>
</art>

