Blended & layered research: Avdi & Evans 2020

Created 11.ix.21.

Yesterday I went, virtually, to Malta to the 8th conference of Qualitative Research in Mental Health, QRMH8 to its loyal friends. I was there because Professor Avdi, from Aristotle University in Thessaloniki, gave a presentation of the work she and I did for the paper in the title of this post: Avdi, E., & Evans, C. (2020). Exploring Conversational and Physiological Aspects of Psychotherapy Talk. Frontiers in Psychology, 11, 591124. https://doi.org/10.3389/fpsyg.2020.591124. (Open access at https://www.frontiersin.org/articles/10.3389/fpsyg.2020.591124/full.)

I’m very proud of that paper as it was a genuine attempt to do more than a “mixed methods” piece of work, i.e. a mix of qualitative and quantitative methods. The paper came out of work Evrinomy had done within the Relational Mind research project, a Finnish-led collaborative project using both qualitative and quantitative methods to explore that title: minds in relationship and perhaps constituted by relationships. I’ve been following their qualitative work, and, intermittently, the QRMH conferences, for some years now, and Evrinomy and I have known each other for many years, starting with the Greek translation of the CORE-OM co-led by myself and Dr. Damaskinidou: a good friend to both of us who introduced us through that work.

Evrinomy approached me some years back, 2015 or 2016 perhaps, as I think I was still in clinical work. At that point she was asking my views on the work she was doing with colleagues in Thessaloniki trying to link physiological arousal indicators with the processes in couple and individual therapies in which therapists and clients wore heart and respiratory rate recorders. That led to me not being terribly useful to a very tolerant PhD student supervised by Evrinomy, Anna Mylona, on work she was doing from 2018 to 2020 linking “rupture” classification to the transcripts, and that led, in turn, to this paper that Evrinomy and I got done last year.

While Evrinomy, with I think some genuinely useful input from me, worked up a fascinating conversation analytic (CA) unpicking of the session, we (well, probably “I” is more accurate) worked through a series of quantitative tools to look at the changes in ASV, the arousal measure, through the session we were dissecting. I taught myself a lot about time series analyses and got to understand PDC: partial directed coherence analysis, a method her Thessaloniki colleagues (from neurophysiology) had advocated. In the end we agreed to use only very simple plots of the data against the eight “Topical Episodes” (TEs) that emerged from the CA. That led to plots like these. (Click on them to get the full plot.)
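We can’t reproduce the paper’s figures here, but a minimal sketch of that kind of plot is easy in base R. Everything below (the ASV series, the episode boundaries, the 45 minute session length) is simulated purely for illustration and is not the study’s data:

```r
## Sketch: a simulated arousal (ASV) series with Topical Episode boundaries.
## All numbers here are invented for illustration only.
set.seed(1)
n_sec <- 2700                                   # a 45-minute session, one value per second
asv <- as.numeric(arima.sim(list(ar = 0.95), n_sec)) + 5  # smooth-ish autocorrelated series
te_bounds <- sort(sample(100:2600, 7))          # 7 cut points give 8 "Topical Episodes"

plot(seq_len(n_sec) / 60, asv, type = "l",
     xlab = "Minutes into session", ylab = "ASV (simulated)",
     main = "Simulated ASV with Topical Episode boundaries")
abline(v = te_bounds / 60, lty = 2, col = "grey40")   # dashed lines at TE boundaries
text(x = c(0, te_bounds) / 60, y = max(asv),
     labels = paste0("TE", 1:8), pos = 4, cex = 0.7)  # label each episode
```

The point of a plot this simple is that a reader can hold the qualitative segmentation and the physiological trace in one view without any statistical machinery in between.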

If you’re interested in reading more, particularly the excerpts and CA, do look at the paper. As an example of truly blended, rather than simply mixed, research it’s not sophisticated, but what I think did emerge was what happens when a largely qualitative researcher (Evrinomy is seriously experienced and skilled) and a quant geek like myself, both sharing a clinical background, try to complement each other. It’s not particularly clear in the paper (it’s big and quite dense as it is!) but we each learned a lot about blending.

Three simple methodological things emerged for me:
1. one huge strength of statistical quantitative research is the ability to formulate “objective” tests to tell us whether we appear to have non-random things in our data;
2. however, very often the purity of those models is not really a good model of how the actual data arose and sometimes “descriptive, exploratory and ‘estimating'” statistical methods may be more robustly useful;
3. if your methods are so sophisticated, complex and unfamiliar that practitioners are essentially reduced to the role of an audience at a display of magic, then an odd “relational mind” relationship is created between researchers/authors, readers (practitioners) and the data.

#2 was clearly the case for our data: a lot of the sophisticated things I had hoped might be useful were clearly stretching the relationship between data and model, and others (for me, the PDC method) fell into that “this is magical” territory of #3, so we ended up with very simple plot methods but tried to keep a genuine blending of quantitative and qualitative data.

Perhaps more interestingly and importantly, this pushed us into a lot of thinking about the fact that methodological issues like those, or any of the many qualitative methodological choices, actually sit on top of epistemological choices. (Quick acknowledgement to Professor Edith Steffen, now in Plymouth, who, when we overlapped at the University of Roehampton, challenged me to take epistemology seriously despite the brain ache that causes me!)

There is an odd polarisation that goes with the general qual/quant polarisation in research about minds: qualitative papers almost always have some statement of epistemological position and, largely implicitly, locate that in the mind of the author(s), exposing it for consideration by the readers; by contrast, epistemological position statements are hardly ever seen in quantitative papers. This has the effect of leaving the reader of quant papers to assume the paper isn’t arising from an authorial mind or minds, but from some abstract “reality”: in fact the papers claim truth value in what seems to me a completely untenable “empirical positivist” position. I’m left wondering if we could blend our methods so much more usefully if we started to insist that all papers have at least a one line statement of epistemological position. I’m trying to make sure I put mine into all my papers now and to insist that authors should put that into their work when I’m peer reviewing. I think it’s going to be a long time before this might become a norm, and until it does I don’t think we’ll tap the real power of genuinely blended methods instead of often very tokenistic mixed methods.

Onwards! Oh, here’s a souvenir from my only non-virtual visit to Malta, in 2018.

I do like the blending of languages and a clear plot of the message! Malta is a fascinating place. Perhaps I’ll get round to doing the intended second blog post in my personal blog that was supposed to complement the first, rather negative one. If you’re wondering about the choice of header image: it’s a recent image from the terrace outside my current workplace and I thought the juxtaposition of the mountains through atmospheric haze and the 1970s brutalist balcony and wooden fences had something of the flavour of blending qual and quant! For more on living there you might try “Squatting (with my coffee)“!

NICE consultation 2021

[Written 9.vii.21]

NICE is having a consultation. As the Email I got says:

We have now embarked on the latest phase of this user research. I’d like to invite you to contribute so we can better understand your views and experiences of NICE. Your feedback is truly important and will help us continue our journey to transform over the next 5 years.
The survey is open until Friday 16 July 2021. So, please do take 10 minutes to share your views before then.
Complete our short survey
Gillian Leng CBE
Chief executive, NICE

So I started it and got to “Please explain why you feel unfavourably towards NICE.” which had a nice big free text box. So I typed in my set of, I think, fair and carefully thought out criticisms (below) and hit the button to move on to the next question and got this.

We’re sorry but your answer has too much text. The current length of your answer is 3944 and the maximum length is 1024 characters. Please change your answer and try again.

Wonderful! No initial warning that only 1024 characters were allowed, no warning as you approach 1024, no block when you hit 1024. Terrible design!

For what it’s worth, these were my 3944 characters.

What was originally a system to provide information has morphed relentlessly into something that is used in the commoditisation of health care to dictate what practitioners should do. It is so preoccupied, to a large extent understandably, with containing exploding pharmaceutical costs that it is very focused on RCT evidence used to assess cost effectiveness. That’s not bad for those pharmaceutical interventions that can be given double blind, but even there generalisability appraisal is poor, with a dearth of attention to post-marketing, “practice based evidence” to see how RCT findings do or do not generalise. For most interventions, including all psychosocial interventions, where double blind allocation is impossible, this is crazy and leads almost all research funding to be diverted into RCTs “as they have political influence” but where the findings are such that it is essentially impossible to disentangle expectancy/placebo/nocebo effects from “real effects” (there is an interesting argument about that separation but there is some meaning in it). This goes on to make it impossible with your methodologies to evaluate complex real world interventions including psycho-social ones, impossible to compare those with pharmaceutical or surgical/technological ones and impossible to evaluate mixed interventions.

Decisions are theoretically about quality of life but, at least in the mental health field, all work I have seen has been based on short term symptom data and makes no attempt to weight in what QoL and functioning data does exist. This is not a new issue: McPherson, S., Evans, C., & Richardson, P. (2009). The NICE Depression Guidelines and the recovery model: Is there an evidence base for IAPT? Journal of Mental Health, 18, 405–414. https://doi.org/10.3109/09638230902968258 showed this clearly 12 years ago (yes, I contributed to that). In addition, foci are not always, but are generally, on diseases, leading to a neglect of the growing complexities of multi-diagnostic morbidity and of the whole complex interactions of mind and body even when there are crystal clear, organic, primary disorders (Diabetes Mellitus and cancers are classic examples of clear organic pathologies where the complexities of how individuals and families handle the same organic pathology make huge differences in problem and QoL trajectories). In the mental health domain, to make a rather crude physical/mental distinction, there are crystal clear diagnoses of organic origin (Huntington’s Disease and a tiny subset of depression, anxiety disorders, much but not all intellectual disability and some psychotic states) but the disease model, certainly in its simple “diagnosis is all and dictates treatment to NICE guidelines” form, is often more of a handicap than an aid.

That focus also leaves NICE almost irrelevant when it has to address “public health attitude” issues like obesity, diet more generally, smoking, alcohol and other substance abuse and, spectacularly at the moment, attitudes to vaccination and social interventions to minimise cross-infection. Again, cv-19 has exposed this, and the slowness of NICE, horribly, but all the warnings have been there for decades.

In addition, NICE processes come across as increasingly smug (routine Emails I get from NICE long ago lost any sense that there could be any real doubts about your decisions) and the history of the recent depression guideline should be a marker that the Good Law Project should turn from the government to NICE processes. From what I see of that, NICE has come across as opaque and more concerned to protect its processes than to recognise the huge problems with the particular emerging guideline, and really with its processes more generally.

Why waste time typing all this? This is all so old, and has so consistently developed to avoid and minimise problems, that I suspect this will be another process of claiming to have been open and listening but changing little.

New developments here

Created 14.iv.21

Oh dear, about 16 months since I last posted here. Still, I hope that featured image above gives a sense of spring arriving! I guess some of those months have been pretty affected by the coronavirus pandemic. There’s a little bit more about how that impacted on me and how I spent a lot of those months high up in the Alps, very protected from cv-19. During this time I have been working very hard and been fairly successful getting papers I like accepted (see publication list and CV).

In the last month I have protected some work time away from Emails, data crunching and paper writing and come back to web things. That has resulted in:

  1. My SAFAQ or Rblog. This is a set of Self-Answered Frequently Answered Questions (hence SAFAQ) and is the best way I have found to present how I use R and allows me to do that in a way that I can’t here in WordPress.

That is quite closely linked with:

  2. The CECPfuns R package. R (r-project.org) is a brilliant, open source, completely free system for statistical computation that runs on pretty much any Linux, on Macs and on Windows. It is partly made up of packages of functions and I have written one that I hope will grow into a useful resource for people wanting to use R for therapy and psychology work but wanting a fairly “newbie friendly” rather than R-geeky hand with that. It complements the SAFAQ/Rblog. cecpfuns.psyctc.org is a web site built out of the package documentation and all the geeky details are at github.com/cpsyctc/CECPfuns.

Those are both developing quite fast with the latter getting updates daily, sometimes more often, and the former getting new items more often than once a week. I suspect that now I have announced them here, I may do intermittent posts here that just give an update about one or both of those and they will get to be linked more into pages here. There are two more key developments coming up:

  1. My own shiny server here that will provide online apps for data analysis and providing explanations of analytic tools.
  2. A book “Outcome Measures and Evaluation in Counselling and Psychotherapy” written with my better half, Jo-anne Carlyle, that should be coming out through SAGE in late November (and they have just sent us the first proofs exactly to their schedule so perhaps I should start believing that it will come out then.) That is aimed at demystifying that huge topic area and, we hope, making it easier for practitioners both to understand, and critique, it, and to start doing it. That will lean heavily on SAFAQ pages and the apps on the shiny server.

And now, in radical contrast to the featured/header image, something completely different:

An 8GB Raspberry Pi 4

That’s sitting on my desk, about the size of a small book. It’s a local version of the system that, I hope, will host the shiny server!

Ethics committees and the fear of questionnaires about distress

Created 10.xii.19

Perhaps this post should be a post, or even an FAQ, in my CORE web site, but then I fear it would be taken too formally so it’s here. However, I’ll put a link to this from the CORE blog: this thinking started with one of a sudden slew of Emails I’ve had coming to me via the CORE site.

I won’t name names or Universities but I will say that it came from the UK. I think it could easily have come from other countries, but my general experience is that many countries still have less of this problem and seem less fearful of asking people about unhappiness or even self-destructiveness than many UK ethics committees seem to be.

The specific problem is the idea that if a research project asks people, particularly young people (teenagers or university students), about distress, and particularly about thoughts of self-harm or suicide, there’s a terrible risk involved and the project shouldn’t happen. This sometimes takes the form of saying that it would be safer only to ask about “well-being” (or “wellbeing”, I’m not sure any of us know if it needs its hyphen or not).

A twist of this, the one that prompted this post, is the idea that the risk might be OK if the researcher using the measure, offering it to people, is clinically qualified or at least training in a clinical course. That goes with a question I do get asked fairly regularly about the CORE measures: “do you need any particular qualifications to use the measures?” which has always seemed to me to be about the fantasy that if we have the right rules about who can do what, everything will be OK.

This post is not against ethics committees. I have worked on three ethics committees and have a huge respect for them. They’re necessary. One pretty convincing reading of their history is that their current form arose particularly out of horrors perpetrated by researchers, both in the US and the UK, and also in the concentration camps. Certainly the US and UK research horrors did lead to the “Institutional Review Boards (IRBs)” in the States and the “Research Ethics Committees (RECs)” in the UK. Those horrors that really were perpetrated by researchers, particularly medical researchers, but not only medical researchers, are terrifying, completely unconscionable. It’s clearly true that researchers, and health care workers, can get messianic: can believe that they have divine powers and infallibility about what’s good and what’s bad. Good ethics committees can be a real corrective to this.

Looking back, I think some of the best work I saw done by those ethics committees, and some of my contributions to those bits of work, were among the best things I’ve been involved with in my clinical and research careers, so I hope it’s clear this isn’t just about a researcher railing against ethics committees. However, my experience of that work brought home to me how difficult it was to be a good ethics committee and I saw much of the difficulty being the pressure to serve, in the Freudian model, as the superego of systems riven with desires, including delusional aspirations to do good through research. I came to feel that those systems often wanted the ethics committee to solve all ethical problems partly because the wider systems were underendowed with Freud’s ego: the bits of the system that are supposed to look after reality orientation, to do all the constant, everyday, ethics they needed done.

In Freud’s system the superego wasn’t conscience: a well functioning conscience is a crucial part, a conscious part, of his ego. You can’t have safe “reality orientation” without a conscience and, as it’s largely conscious, it’s actually popping out of the top of his tripartite model, out of the unconscious. His model wasn’t about the conscious, it was about trying to think about what can’t be thought about (not by oneself alone, not by our normal methods). It was about the “system unconscious”: that which we can’t face, a whole system of the unreachable which nevertheless, he was arguing, seemed to help understand some of the mad and self-destructive things we all do.

In my recycling of Freud’s structural, tripartite, model, only his id, the urges and desires is unequivocally and completely unconscious, the superego has some conscious manifestations and these do poke into our conscious conscience, and the ego straddles the unconscious (Ucs from here on, to speed things up) and the conscious. (I think I’m remembering Freud, the basics, correctly, it was rather a long time ago for me now!)

I’m not saying that this model of Freud’s is correct. (After all, Freud, with theories, was rather like Groucho Marx with principles: they both had others if you didn’t like their first one …) What I am arguing (I know, I do remember, I’ll come back to ethics committees and questionnaires in a bit) is that this theory, in my cartoon simplification of it, may help us understand organisations and societies, even though Freud with that theory was really talking about individuals.

As I understand Freud’s model it was a double model. He was combining his earlier exploration of layers of conscious, subconscious and Ucs with this new model with its id, superego and ego. They were three interacting systems with locations in those layers. With the id, ego and superego Freud was mostly interested in their location in the unconscious. Implicitly (mostly, I think) he was saying that the conscious (Cs) could be left to deal with itself.

That makes a lot of sense. After all consciousness is our capacity to look after ourselves by thinking and feeling for, and about, ourselves. To move my metaphors on a century, it’s our debugging capability. Freud’s Ucs, like layers of protection in modern computer operating systems, was hiding layers of our functioning from the debugger: our malware could run down there, safely out of reach of the debugger.

The id, superego, ego model is surely wrong as a single metatheory, as a “one and only” model of the mind. It’s far too simple, far too static, far too crude. Freud did build some two-person and three-person relatedness into it, but it was still a very late steam age, one-person model and desperately weak on us as interactional, relational, networked nodes; it was a monadic model really.

However, sometimes it fits! My experience on those committees, and equally over many more years intersecting with such committees, is that they can get driven rather mad by the responsibilities to uphold ethics. They become, like the damaging aspects of Freud’s model of the individual’s superego, harsh, critical, paralysing, sometimes frankly destructive. The more the “primitive” (rampant desire, even for good, anger and fears of premature death and disability) gets to be the focus, the more they risk losing reality orientation, losing common sense, and the more the thinking becomes rigid. It becomes all about rules and procedures.

The challenge is that ethics committees really are there to help manage rampant desire (even for good), anger and fears of premature death and disability. They were created specifically to regulate those areas. They have an impossible task and it’s salutary to learn that the first legally defined medical/research ethics committees were created in Germany shortly before WWII and theoretically had oversight of the medical “experiments” in the concentration camps. When society loses its conscience and gives in to rigid ideologies (Aryanism for example) and rampant desires (to kill, to purify, to have Lebensraum even) perhaps no structure of laws can cope.

OK, so let’s come back to questionnaires. The particular example was the fear that a student on a non-clinical degree course might use the GP-CORE to explore students’ possible distress in relation to some practical issues that might plausibly not help students with their lives, or, if done differently, might help them. The central logic has plausibility. I have no idea how well or badly the student was articulating her/his research design, I don’t even know what it was. From her/his reaction to one suggestion I made about putting pointers to health and self-care resources at the end of the online form, I suspect that the proposal might not have been perfect. Ha, there’s my superego: no proposal is perfect, I’m not sure any proposal ever can be perfect!

What seemed worrying to me was that the committee had suggested that, as someone doing a non-clinical training, s/he should leave such work and such questionnaires to others. To me this is hard to understand. S/he will have fellow students who self-harm, some who have thoughts of ending it all. One of them may well decide to talk to her/him about that after playing squash, or after watching a film together.

Sure, none of us find being faced with that easy: we shouldn’t. Sure, I learned much in a clinical training that helped me continue conversations when such themes emerged. I ended up having a 32 year clinical career in that realm and much I was taught helped (quite a bit didn’t, but we’ll leave that for now!). It seems to me that a much more useful, less rule bound, reaction of an ethics committee is to ask the applicant “have you thought through how you will react if this questionnaire reveals that someone is really down?” and then to judge the quality of the answer(s). The GP-CORE has no “risk” items. It was designed that way precisely because the University of Leeds, which commissioned it to be used to find out about the mental state of its students, simply didn’t want to know about risk. (That was about twenty years ago and it’s really the same issue as the ethics committee issue.)

One suggestion from the committee to the student was only to use a “well-being” measure. Again, this seems to me to be fear driven, not reality orienting. There is much good in “well-being work”, in positive psychology, and there is a very real danger that problem focus can pathologise and paralyse. However, if we only use positively cued items in questionnaires and get a scale of well-being then we have a “floor effect”: we’re not addressing what’s really, really not well for some people. We specifically designed all the CORE measures to have both problem and well-being items to get coverage of a full range of states. The GP-CORE is tuned not to dig into the self-harm realm but it still has problem items; the CORE-OM, as a measure designed to be used where help is being offered to people who are asking for it, digs much more into self-harm.

Many people, many younger people, many students, are desperately miserable; many self-harm; tragically, a few do kill themselves. Yes, clinical trainings help some people provide forms of help with this. However, improving social situations and many other things that are not “clinical” can also make huge differences in Universities. (In the midst of industrial action, I of course can’t resist suggesting that not overworking, not underpaying academics, not turning Universities into minimum wage, temporary contract, degree factories, might help.)

Misery, including student misery, is an issue for all of us, not just for some select cadre thought to be able to deal with it by virtue of a training. So too, ethics is everyone’s responsibility. Perhaps we are institutionalising it into ethics committees, into “research governance” and hence putting the impossible into those systems. We create a production line for ethics alongside the production lines for everything else. Too often perhaps researchers start to think we just have to “get this through ethics” and not really own our responsibility to decide if the work is ethical. Perhaps too many research projects now are the production line through which our governments commission the research they want, probably not the research that will question them. Perhaps that runs with the production line that produces docile researchers. It’s time we thought more about ethics ourselves, and both trusted ourselves and challenged ourselves, and our peers, to engage in discussions about that, to get into collective debugging of what’s wrong. Oops, I nearly mentioned the UK elections … but it was a slip of the keyboard, it’ll get debugged out before Thursday … or perhaps it would if I wrote still needing to be on the right conveyor belts, the right production lines.

Oh, that image at the top: commemoration ‘photos, from family albums I would say, of the “disappeared” and others known to have died at the hands of the military, from Cordoba, Argentina. From our work/holiday trip there this summer. A country trying to own its past and not fantasize.

I too was a medical student in 1975. Would I have been brave? Ethical?

Data entry: two out of range item scores can really affect Cronbach’s alpha

This little saga started over a year ago when I helped at a workshop a psychological therapies department held about how they might improve their use of routine outcome measures. They were using the CORE-OM plus a sensible local measure that added details they wanted and for which they weren’t seeking comparability with other data.

In the lunch break someone told me s/he had CORE-OM data from a piece of work done in another NHS setting (with full research governance approval!). The little team that had put a lot of work into a small pragmatic study felt stymied because the Cronbach alpha for their CORE-OM data was .65 and they were worried that this meant that perhaps the CORE-OM didn’t work well for their often highly disturbed clientèle. They had stopped there but thought of asking me about it.

My reaction was that I shared the concern about self-report measures, not just the CORE-OM, perhaps not having the same psychometrics, not working as well, in severely disturbed client groups as in the less disturbed or non-clinical samples in which they’re usually developed. However, I hadn’t thought that would bring the alpha down that low and wondered if they had forgotten to reverse score the positively cued items.

As everyone’s crazily busy I didn’t hear anything for a long while but then got a message that they had checked and the coding was definitely right, would I have a look at their data in case it really was about the client group as they knew I was interested in how severity, chronicity and type of disturbance may affect clients’ use of measures.

I agreed and received the well anonymised data. About 700 participants had completed all the items and the alpha was .65 (not that I really doubted them, I just like to recheck everything!). So I checked the item score ranges, though I hadn’t really thought there was likely to be much by way of data entry errors. There wasn’t much: just two out-of-range items in over 23,000. One was 11 and the other was 403. Changing them to missing, and hence dropping two participants, resulted in an alpha of .93 with a parametric 95% confidence interval from .93 to .94, i.e. absolutely typical for CORE-OM data.

I would never have believed that just 0.008% incorrect items could affect alpha that much, even if one was 403 when the item upper score limit is 4: I was wrong! Well, perhaps it’s not quite that low a percentage. If that 11 was a 1 for the one item (item 22) and another 1 which should have gone into item 23, then perhaps many of the remaining items for that client were wrong; same for the 403 for item 28: after all, 1, 4, 0 and 3 are all possible item scores on the CORE-OM. That would take the incorrect entries up to 0.08%. However, if something like failure to hit the carriage return is the explanation, then there should have been one or more missing items at the end of the entries for that client and their data would never have made it into the computation of alpha. Perhaps a really badly out-of-range item at a rate of just 0.008% is enough to bring alpha down this much. Only checking back to the original data will tell. I hope they still have the original data.
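It is easy to convince yourself of this effect with a quick simulation: build well-behaved, correlated 0 to 4 item data, corrupt just two cells with out-of-range values like the ones found (11 and 403), and compare the alphas. This is a sketch with invented data, not the team’s dataset, and the cronbach_alpha() helper is written out from the standard formula so the example needs no packages:

```r
## Simulate CORE-OM-like data: 700 clients, 34 items scored 0-4,
## items correlated through a shared latent severity score.
set.seed(12345)
n <- 700; k <- 34
true_score <- rnorm(n)                        # latent severity per client
items <- sapply(seq_len(k), function(i) {
  raw <- true_score + rnorm(n, sd = 0.8)      # item = latent + noise
  pmin(4, pmax(0, round(raw + 2)))            # squash onto the 0-4 range
})

## Cronbach's alpha from the standard formula:
## (k/(k-1)) * (1 - sum(item variances) / variance(total scores))
cronbach_alpha <- function(x) {
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}

alpha_clean <- cronbach_alpha(items)

## Corrupt just two cells with out-of-range entries like those found
items_bad <- items
items_bad[1, 22] <- 11
items_bad[2, 28] <- 403
alpha_bad <- cronbach_alpha(items_bad)

round(c(clean = alpha_clean, corrupted = alpha_bad), 2)
```

With this seed the clean alpha sits high, as it should for strongly correlated items, and the corrupted one falls far below it: the same pattern as the .93 versus .65 above, because a single wildly out-of-range value inflates one item’s variance enormously relative to its covariance with the rest.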

OK, but does this merit a blog post? (Well, I’ve got to start somewhere!) I think there are some points of interest.

  • it shows just how influential a few out of range scores can be
  • it shows that alpha can sometimes detect this and hooray for the people involved that they did calculate alpha and sensed that something was so wrong that they couldn’t just go ahead with the analyses they had planned
  • it does show though that simple range checks on items were a quicker and more certain way of detecting what was at root here
  • it shows that, though I think you should always do all the range and coherence checks on the data that you can think of that make sense for it …
  • … it’s stronger to have duplicate data entry but which of us can afford this?
  • even if you can do duplicate entry (assuming that the clients complete the measures on paper) you should use a data entry system that as far as possible detects impossible or improbable data at the point of entry
  • (and if you do have direct entry by clients please make sure it does that entry checking and in a user-friendly way)
  • but while absurd sums of money are put into healthcare data systems and into funding psychological therapy RCTs, where is the money to fund good data entry, clinician research and practice based evidence?

To finish on a gritty note about data entry: at least twenty years ago, before I discovered S+ and R, I mainly used SPSS for statistics and back then, for a while, SPSS had a “data entry module”. It was slow, which was perhaps why they dropped it, but it was brilliant: you could set up range checks and all the coherence checks you wanted (pregnant male: I think not).

After that died I tended to enter my data into spreadsheets and until about a year ago I was encouraging colleagues I work with around the world to use Excel (yes, I tried encouraging them to use Libre/OpenOffice but everyone had and knew Excel and often wasn’t allowed to install anything else). They or I would write data checking into the spreadsheets to the extent that Excel allows and I wrote data checking code in R (https://www.r-project.org/) to double check that and to catch things we couldn’t in Excel. I still use that for one huge project but it’s a nightmare: updates of Windoze seem to break backwards compatibility, M$’s way of handling local character sets seems to create problems, its data checking seems to break easily and I find it almost impossible to lock spreadsheets so that people can enter data but not change anything else. I’m sure that there are Excel magicians who can do better but I’m equally sure there are better alternatives.

At the moment, with Dr. Clara Paz in Ecuador, we’re using the open source LimeSurvey survey software hosted on the server that hosts all my sites (thanks to Mythic Beasts for excellent open source based hosting). If you have a host who gives you raw access to the operating system, LimeSurvey is pretty easy to install (and I think it runs on nasty closed source systems too!). Its user interface isn’t the easiest but so far we’ve been able to do most things we’ve wanted to with a bit of thought, and the main thing is that it’s catching data entry errors at entry and has proved totally reliable so far.