
Library Impact Data Project

Just another JISC Activity Data project blog!


Library Impact Data Toolkit

We are pleased to release the LIDP Toolkit.

One of the outcomes of the project was to provide a toolkit to assist other institutions that may want to test their own data against our hypothesis. The toolkit aims to give general guidelines on:

1. Data Requirements
2. Legal Issues
3. Analysis of the Data
4. Focus Groups
5. Suggestions for Further Analysis
6. Release of the Data

LIDP data has been made available under the Open Data Commons Attribution License

If you have used this toolkit to look at your data, we ask you to share your data too. Please let us know and we will link to it from the project blog.

The Library Impact Data Project Toolkit

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License

Release of Project Data

A little later than planned, but we’re pleased to announce that a subset of the data used by the project is now available under an Open Data Commons licence:

This data set is made available under the Open Data Commons Attribution License

The data contains final grade and library usage figures for 33,074 students studying undergraduate degrees at UK universities.


Each of the 8 project partners provided a set of data, based on the initial data requirements document. Not all partners were able to provide data for e-resource logins and library visits, but all were able to provide library loans data.

In order to ensure anonymity:

1) the 8 partners are not named in the data release, instead they have been allocated a randomly selected name (from LIB1 to LIB8)

2) the names of schools and/or departments at each institution have been replaced with a randomly generated ID

3) the year of graduation has been removed from the data

4) where a course had fewer than 30 students, the course name has been replaced with a randomly generated ID

5) some course names have been “generalised” in order to remove elements that may identify the institution
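For institutions repeating this process on their own data, the first four steps above could be sketched as follows (an illustrative sketch only: the field names and the `anonymise`/`id_mapper` helpers are invented here, and the project's actual scripts were not released; step 5, generalising course names, needs manual judgement and is omitted):

```python
import random

random.seed(0)  # deterministic only for this example

def id_mapper():
    """Map each distinct name to a stable, randomly generated ID."""
    seen = {}
    def get(name):
        if name not in seen:
            seen[name] = f"ID{random.randrange(10**6):06d}"
        return seen[name]
    return get

school_id, course_id = id_mapper(), id_mapper()

def anonymise(rows, course_sizes):
    """rows: dicts with 'library', 'school', 'course' and 'year' keys."""
    lib_names = random.sample([f"LIB{n}" for n in range(1, 9)], 8)
    libs, out = {}, []
    for row in rows:
        row = dict(row)
        # 1) partner replaced with a randomly allocated name LIB1..LIB8
        if row["library"] not in libs:
            libs[row["library"]] = lib_names[len(libs)]
        row["library"] = libs[row["library"]]
        # 2) school/department replaced with a randomly generated ID
        row["school"] = school_id(row["school"])
        # 3) year of graduation removed
        row.pop("year", None)
        # 4) courses with fewer than 30 students get a randomly generated ID
        if course_sizes.get(row["course"], 0) < 30:
            row["course"] = course_id(row["course"])
        out.append(row)
    return out
```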


The awarded degree has been mapped to the following code:

A = first (1)
B = upper second (2:1)
C = lower second (2:2)
D = third (3)
E = pass without honours

Library Usage

Where supplied by the project partner, the following library usage data measures are included:

ISSUES = total number of items borrowed from the library by that student (n.b. this may include renewals)
ERES = a measure of e-resource/database usage, e.g. total number of logins to MetaLib or Athens by that student
VISITS = total number of times that student visited the library

Other Notes

1) each graduate has been allocated a randomly generated unique ID

2) where the course/school/department name was not supplied, it has been replaced with N/A

3) where the measure of library usage was not supplied by the partner, the value is blank/empty
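For anyone loading the released file, a minimal reader that respects these conventions might look like the following (a sketch: the column names ISSUES, ERES and VISITS follow the descriptions above, but should be checked against the actual CSV header):

```python
import csv
import io

USAGE_COLS = ("ISSUES", "ERES", "VISITS")

def load_lidp(fileobj):
    """Parse LIDP-style rows, turning blank usage values into None."""
    records = []
    for row in csv.DictReader(fileobj):
        for col in USAGE_COLS:
            value = (row.get(col) or "").strip()
            # blank/empty means the partner did not supply that measure
            row[col] = int(value) if value else None
        records.append(row)
    return records

# Example with an in-memory file; 'N/A' course names pass through untouched:
sample = io.StringIO(
    "ID,COURSE,DEGREE,ISSUES,ERES,VISITS\n"
    "1,N/A,A,42,7,\n"
)
rows = load_lidp(sample)
```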

Focus group analysis

The focus group analysis has just been released to each individual collaborating institution. The groups were designed to draw out additional qualitative data on usage of library resources and facilities, asking students how much they used library facilities and resources, where they chose to use the resources, any difficulties they experienced, and whether the library satisfied their information and learning space requirements.

Students volunteered in return for a small reimbursement for their time and involvement, with varying success at each institution (if you’ve been following the blog, you’ll have already seen De Montfort’s focus group discussion), but the result was a huge amount of data to analyse!

The coding process involved reading through transcripts to bring out broad themes, and refining the themes into smaller groups where applicable.  Transcripts were then re-read for the analysis itself, with the aim not just of coding them, but of using thematic clues to develop and elaborate on what students discussed.  For example, a student discussing problems they had encountered using a resource may simultaneously be indicating indirectly that their student group could benefit from more in-depth information literacy training, or that there could be improved subscription options for that subject area.

Analysis was also based around frequency of mentions: the more often a code or theme was discussed, the more important an element it represented in student library use/non-use.  This method can be problematic in that it doesn’t always capture the emphasis and enthusiasm that materialise in the group discussion, and it can be heavily influenced by current issues the students are experiencing, but it does still demonstrate what is important to the participant at that time and thus what is meaningful to them.  Additionally, when used in combination with other codes and the analysis technique above, it can result in a revealing image of student experiences and usage, and provide material to lead further research at a later date if appropriate.

MUDL workshop at the 9th Northumbria Conference

Our own Dave Pattern is part of the MUDL workshop at the 9th Northumbria Conference on 22nd August.

“The MUDL (Managing and Understanding Data in Libraries) group is organising a panel workshop at the 9th Northumbria Conference on Performance Measurement in Libraries. Even if you can’t get to the conference, you can benefit from it by submitting your questions for our invited panel to consider and respond to. Topics can include any aspect of managing and using library data, from e-resource use data to finance, SCONUL statistics, footfall, performance indicators, benchmarking or whatever you like. Our panellists are:

Dave Pattern (Library Systems Manager at Huddersfield University and IWR Information Professional of the Year)

Claire Creaser (Director of LISU, Loughborough University)

Carol Tenopir (Professor of Information Science at the University of Tennessee, member of the COUNTER Board of Directors and a keynote speaker at the conference)

Ann Davies (Open University, a member of SCONUL Working Group on Performance and Quality and Chair of the SCONUL Statistics Sub Committee)

Angela Conyers (Research Fellow at Evidence Base, the evaluation and research unit within Library and Learning Resources at Birmingham City University)

You can review questions already submitted and suggest your own via the MUDL wiki or by email. Submitted questions will be reviewed by the committee and those considered best will be presented to the panel. Questions will be posted on the MUDL wiki, as will the responses from the panel. Further information about MUDL is available at”

The Final Blog Post

It has been a short but extremely productive 6 months for the Library Impact Data Project Team. Before we report on what we have done and look to the future, we have to say a huge thank you to our partners. We thought we would be taking a lot on at the start of the project in getting eight universities to partner in a six month project; however, it has all gone extremely smoothly and as always everyone has put in far more effort and work than originally agreed. So thanks go to all the partners, in particular:

Phil Adams, Leo Appleton, Iain Baird, Polly Dawes, Regina Ferguson, Pia Krogh, Marie Letzgus, Dominic Marsh, Habby Matharoo, Kate Newell, Sarah Robbins, Paul Stainthorp

Also to Dave Pattern and Bryony Ramsden at Huddersfield.

So did we do what we said we would do?

Is there a statistically significant correlation across a number of universities between library activity data and student attainment?

The answer is a YES!

There is a statistically significant relationship between both book loans and e-resource use and student attainment, and this holds across all of the universities in the study that provided data in these areas. In some cases the relationship was more significant than in others, but our statistical testing shows that you can believe what you see when you look at our graphs and charts!

Where we didn’t find statistical significance was in entries to the library: although there looks to be a difference between students with a 1st and a 3rd, there is no overall significance. This is not surprising, as many of us have group study facilities, lecture theatres, cafes and student services in the library. A student is therefore just as likely to be entering the library for those reasons as for studying purposes.

We want to stress here again that we realise THIS IS NOT A CAUSAL RELATIONSHIP!  Other factors make a difference to student achievement, and there are always exceptions to the rule, but we have been able to link use of library resources to academic achievement.

So what is our output?

Firstly, we have provided all the partners in the project with short library director reports and are in the process of sending out longer in-depth reports. Regrettably, due to the nature of the content of these reports, we cannot share this data; however, we are in the process of anonymising partners’ graphs in order to release charts of averaged results for general consumption.

Furthermore, we are also planning to release the raw data from each partner for others to examine. Data will be released under an Open Data licence at

Finally, we have been astonished by how much interest there has been in our project. To date we have two articles ready for imminent publication and another two in the pipeline. In addition, by the end of October we will have delivered 11 conference papers on the project. All articles and conference presentations are accessible at:

Next steps

Although this project has had a finite goal in proving or disproving the hypothesis, we would now like to go back to the original project which provided the inspiration. This was to seek to engage low/non users of library resources and to raise student achievement by increasing the use of library resources.
This has certainly been a popular theme in questions at the SCONUL and LIBER conferences, so we feel there is a lot of interest in this in the library community. Some of these ideas have also been discussed at the recent Business Librarians Association Conference.

There are a number of ways of doing this, some based on business intelligence and others based on targeting staffing resources. However, we firmly believe that although there is a business intelligence string to what we would like to take forward, the real benefits will be achieved by actively engaging with the students to improve their experience. We think this could be covered in a number of ways.

  • Gender and socio-economic background? This came out in questions from library directors at SCONUL and LIBER. We need to re-visit the data to see whether there are any effects of gender, nationality (UK, other European and international could certainly be investigated) and socio-economic background in use and attainment.
  • We need to look into what types of data are needed by library directors, e.g. for the scenario ‘if budget cuts result in fewer resources, does attainment fall?’ The Balanced Scorecard approach could be used for this.
  • We are keen to see if we add value as a library through better use of resources, and we have thought of a number of possible scenarios we would like to investigate further:
    • Does a student who comes in with high grades leave with high grades? If so why? What do they use that makes them so successful?
    • What if a student comes in with lower grades but achieves a higher grade on graduation after using library resources? What did they do to show this improvement?
    • Quite often students who look to be heading for a 2nd drop to a 3rd in the final part of their course; why is this so?
    • What about high achievers that don’t use our resources? What are they doing in order to be successful and should we be adopting what they do in our resources/literacy skills sessions?
  • We have not investigated VLE use, and it would be interesting to see if this had an effect
  • We have set up meetings with the University of Wollongong (Australia) and Mary Ellen Davis (executive director of ACRL) to discuss the project further. In addition we have had interest from the Netherlands and Denmark for future work surrounding the improvement of student attainment through increased use of resources

In respect to targeting non/low users we would like to achieve the following:

  • Find out what students on selected non/low-use courses think, to understand why students do not engage
  • To check the amount and type of contact subject teams have had with the specific courses to compare library hours to attainment (poor attainment does not reflect negatively on the library support!)
  • Use data already available to see if there is correlation across all years of the courses. We have some interesting data on course year, some courses have no correlation in year one with final grade, but others do. By delving deeper into this we could target our staffing resources more effectively to help students at the point of demand.
    • To target staffing resources
  • Begin profiling by looking at reading lists
    • To target resource allocation
    • Does use of resources + wider reading lead to better attainment – indeed, is this what high achievers actually do?
  • To flesh out themes from the focus groups to identify areas for improvement
    • To target promotion
    • Tutor awareness
    • Inductions etc.
  • Look for a connection between selected courses and internal survey results/NSS results
  • Create a baseline questionnaire or exercise for new students to establish level of info literacy skills
    • Net Generation students tend to overestimate their own skills and then demonstrate poor critical analysis once they get onto resources.
    • Use to inform use of web 2.0 technologies on different cohorts, e.g. health vs. computing
  • Set up new longitudinal focus groups or re-interview groups from last year to check progress of project
  • Use data collected to make informed decisions on stock relocation and use of space
  • Refine data collected and impact of targeted help
  • Use this information to create a toolkit which will offer best practice to a given profile
    • E.g. scenario based

Ultimately our goal will be to help increase student engagement with the library and its resources, which as we can now prove, leads to better attainment. This work would also have an impact on library resources, by helping to target our precious staff resources in the right place at the right time and to make sure that we are spending limited funds on the resources most needed to help improve student attainment.

How can others benefit?

There has been a lot of interest from other universities throughout the project. Some universities may want to take our research as proof in itself and just look at their own data; we have provided instructions on how to do this. We will also make available the recipes written with the Synthesis project in the documentation area of the blog, and we will be adding specific recipes for different library management systems in the coming weeks:

For those libraries that want to do their own statistical analysis: this was a complex issue for the project, particularly given the nature of the data we could obtain versus the nature of the data required to specifically find correlations. As a result, we used the Kruskal-Wallis (KW) test, designed to measure whether there are differences between groups of non-normally distributed data. To confirm non-normal distribution, a Kolmogorov-Smirnov test was run first. KW unfortunately does not tell us where the differences lie, so the Mann-Whitney test was used on specific pairings of degree results, selected on the basis of the boxplot graphs. The number of Mann-Whitney tests has to be limited, because the more tests conducted, the stricter the significance threshold required; we therefore limited them to three, at a required significance value of 0.0167 (5% divided by 3). Once the Mann-Whitney tests had been conducted, the effect size of each difference was calculated. All tests other than effect size were run in PASW 18; effect size was calculated manually. It should be noted that we are aware that the size of the samples we are dealing with could have indicated relationships where none exist, but we feel our visual data demonstrates relationships that are confirmed by the analysis, and thus that we have a stable basis for rejecting the null hypothesis that there is no relationship between library use and degree result.
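The sequence described above can be sketched with SciPy on simulated data (the project itself ran the tests in PASW 18; the loan figures below are invented, and the effect size r = Z/√N is recovered manually from the Mann-Whitney U statistic):

```python
import math
import random

from scipy import stats

random.seed(42)
# Simulated, skewed (non-normal) loan counts per degree class,
# A = first ... D = third, 200 students per class.
loans = {g: [int(random.expovariate(1 / m)) for _ in range(200)]
         for g, m in [("A", 60), ("B", 50), ("C", 35), ("D", 25)]}

# 1. Kruskal-Wallis: is there any difference between the groups?
h, p_kw = stats.kruskal(*loans.values())

# 2. Mann-Whitney follow-up on three chosen pairings, at the corrected
#    threshold 0.05 / 3 = 0.0167.
alpha = 0.05 / 3
effects = {}
for g1, g2 in [("A", "C"), ("A", "D"), ("B", "D")]:
    u, p = stats.mannwhitneyu(loans[g1], loans[g2], alternative="two-sided")
    # 3. Effect size r = Z / sqrt(N), with Z recovered manually from U.
    n1, n2 = len(loans[g1]), len(loans[g2])
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    effects[(g1, g2)] = (p, abs(z) / math.sqrt(n1 + n2))

print(f"KW H={h:.1f}, p={p_kw:.2g}; pairwise (p, r): {effects}")
```

A Kolmogorov-Smirnov check (`stats.kstest`) would precede step 1 on real data; it is skipped here because the samples are deliberately non-normal.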

Full instructions on how the tests were run will first be made available to partner institutions and then disseminated publicly through a toolkit in July/August.

Lessons we learned during the project

The three major lessons learned were:

Forward planning for the retention of data. Make sure all your internal systems and people are communicating with each other. Do not delete data without first checking that other parts of the University require the data. Often this appears to be based on arbitrary decisions and not on institutional policy. You can only work with what you’re able to get!

Beware e-resources data. We always made it clear that the data we were collecting for e-resource use was questionable; during the project we found that much of this data is not collected in the same way across a single institution, let alone eight! Athens, Shibboleth and EZProxy data may all be handled differently – some may not be collected at all. If others find that there is no significance between e-resources data and attainment, they should dig deeper into their data before accepting the outcome.

Legal issues. For more details on this lesson, see our earlier blog post on the legal stuff.

Final thoughts

Although this post is labelled the final blog post, we will be back!

We are adding open data in the next few weeks and during August we will be blogging about the themes that have been brought out in the focus groups.

The intention is then to use this blog to talk about specific issues we come across with data etc. as we carry our findings forward. At our recent final project meeting, it was agreed that all 8 partners would continue to do this via the blog.

Finally a huge thank you to Andy McGregor for his support as Programme Manager and to the JISC for funding us.

Huddersfield — borrowing year on year

If you’ve seen Graham or myself presenting recently about the LIDP, you’ve probably seen this graph (described here)…

The graph shows 5 years of graduating students (2005/6 through to 2009/10, with approx. 3,000 graduates per year) and the average number of books they borrowed. So, “2005/6” shows the average number of books borrowed by the 2005/6 graduates.

Quite early on during our data analysis, I noticed that the correlation in book borrowing seemed to be there from day one for students — in other words, students who eventually get the highest grades borrow more in their first year of study than those who eventually get lower grades.

So, here’s a year-by-year breakdown of the above graph, where “year 3” is the year the student graduated in…

[Graph: borrowing in year one only]

I’m actually quite surprised how clear the gaps are between each grade, even though we’re not talking about large numbers of loans.

[Graph: borrowing in year two only]

The borrowing by students who go on to get a first is fairly similar in the second year, as is the borrowing by those who’ll get a third. However, the borrowing by 2:1 students increases to a similar level to firsts (although you can see in 2009/10, second year borrowing by the firsts is breaking away).

[Graph: borrowing in year three only]

In the final year of studies, we see a marked increase in borrowing (no surprises there!). As with the original graph, we can see that 2:2s and thirds are showing a declining trend in borrowing.

In many of the data sets we’ve looked at in this project, we’ve seen similar(ish) borrowing levels for firsts and 2:1s. At most, in 2009/10, the gap in average borrowing is only 4 books. However, it does look like borrowing by 2:1s in their final year of study is also showing a declining trend.
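For anyone wanting to reproduce breakdowns like these, the per-year averages behind the graphs amount to a simple group-by (the field names below are assumed for illustration, not taken from the released data):

```python
from collections import defaultdict
from statistics import mean

def avg_loans_by_grade(records, year):
    """Average loans per degree class for one year of study.

    records: dicts with a 'grade' key and per-year counts
    'loans_y1' .. 'loans_y3'.
    """
    grouped = defaultdict(list)
    for r in records:
        grouped[r["grade"]].append(r[f"loans_y{year}"])
    return {grade: mean(counts) for grade, counts in grouped.items()}

# Tiny hypothetical cohort:
graduates = [
    {"grade": "1st", "loans_y1": 12, "loans_y2": 14, "loans_y3": 30},
    {"grade": "1st", "loans_y1": 10, "loans_y2": 16, "loans_y3": 34},
    {"grade": "3rd", "loans_y1": 4, "loans_y2": 5, "loans_y3": 12},
]
```

Calling `avg_loans_by_grade(graduates, 1)` gives the first-year averages per eventual grade; repeating for years 2 and 3 yields the three graphs above.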

Some thoughts from Lincoln

Thanks to Paul Stainthorp at the University of Lincoln for allowing us to cut and paste this blog post. You can see the original at:

I submitted Lincoln’s data on 13 June. It consists of fully anonymised entries for 4,268 students who graduated from the University of Lincoln with a named award, at all levels of study, at the end of the academic year 2009/10 – along with a selection of their library activity over three* years (2007/08, 2008/09, 2009/10).

The library activity data represents:

  1. The number of library items (book loans etc.) issued to each student in each of the three years; taken from the circ_tran (“circulation transactions”, presumably) table within our SirsiDynix Horizon Library Management System (LMS). We also needed a copy of Horizon’s borrower table to associate each transaction with an identifiable student.
  2. The number of times each student visited our main GCW University Library, using their student ID card to pass through the Library’s access control gates in each of the three* years; taken directly from our ‘Sentry’ access control/turnstile system. These data apply only to the main GCW University Library: there is no access control at the University of Lincoln’s other four campus libraries, so many students have ‘0’ for these data. Thanks are due to my colleague Dave Masterson from the Hull Campus Library, who came in early one day, well before any students arrived, in order to break into the Sentry system and extract this data!
  3. The number of times each student was authenticated against an electronic resource via AthensDA; taken from our Portal server access logs. Although by no means all of our e-resources go via Athens, we’re relying on it as a sort of proxy for e-resource usage more generally. Thanks to Tim Simmonds of the Online Services Team (ICT) for recovering these logs from the UL data archive.

I had also hoped to provide numbers of PC/network logins for the same students for the same three years (as Huddersfield themselves have done), but this proved impossible. We do have network login data from 2007-, but while we can associate logins with PCs in the Library for our current PCs, we can’t say with any confidence whether a login to the network in 2007-2010 occurred within the Library or elsewhere: PCs have just been moved around too much in the last four years.

Student data itself—including the ‘primary key’ of the student account ID—was kindly supplied by our Registry department from the University’s QLS student records management system.

Once we’d gathered all these various datasets together, I prevailed upon Alex Bilbie to collate them into one huge .csv file: this he did by knocking up a quick SQL database on his laptop (he’s that kind of developer), rather than the laborious Excel-heavy approach using nested COUNTIF statements which would have been my solution. (I did have a go at this method—it clearly worked well for at least one of the other LIDP partners—but my PC nearly melted under the strain.)
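That SQL route amounts to loading each dataset into a database and letting COUNT do the collation, roughly like this (a sketch with SQLite; the table and column names are invented for illustration, not Alex's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id TEXT PRIMARY KEY, course TEXT, grade TEXT);
    CREATE TABLE loans    (student_id TEXT, year TEXT);
    CREATE TABLE visits   (student_id TEXT, year TEXT);
""")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [("s1", "BA History", "B"), ("s2", "BSc Maths", "A")])
conn.executemany("INSERT INTO loans VALUES (?, ?)",
                 [("s1", "2009/10")] * 12 + [("s2", "2009/10")] * 3)

# One row per student, with usage counted in, ready to dump as CSV.
collated = conn.execute("""
    SELECT s.id, s.course, s.grade,
           (SELECT COUNT(*) FROM loans  WHERE student_id = s.id) AS issues,
           (SELECT COUNT(*) FROM visits WHERE student_id = s.id) AS visits
    FROM students s
    ORDER BY s.id
""").fetchall()
```

Writing `collated` out with `csv.writer` then reproduces the one-big-CSV result, without any nested COUNTIFs.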

The final .csv data has gone to Huddersfield for analysis and a copy is lodged in our Repository for safe keeping. Once the agreement has been made to release the LIDP data under an open licence, I’ll make the Repository copy publicly accessible.

*N.B. In the end, there was no visitor data for the year 2007/08: the access control / visitor data for that year was missing for almost all students. This may correspond to a re-issuing of library access cards for all users around that time, or the data may be missing for some other reason.

On the road with the #LIDP project

We have just updated the Articles and Conferences pages with the slides from recent events, including CILIPS, SCONUL, LIBER, BLA, UC&R East Midlands and Welsh Higher Education Library Forum colloquium.

In addition, the Ariadne article will be out soon and we have just had our paper accepted for LIBER Quarterly.

Catch us next at the 9th Northumbria International Conference on Performance Measurement in Libraries and Information Services, NAG and Internet Librarian.

Talking to Business Librarians at the BLA Conference

We have been out and about disseminating the early findings of the LIDP project over the last few weeks. We have been delighted with the feedback we have received from conference delegates, and a lot of the comments about possible future directions for research from the CILIPS, SCONUL and LIBER conferences have given us food for thought. Many of these comments will appear in the final project blog post before the end of July. However, at the Business Librarians Association Conference in Sheffield we had the opportunity to test some of these thoughts. After our presentation, we divided delegates into a number of groups to discuss a variety of scenarios.

Scenario 1
If we assume a link between library usage and attainment, what does good practice look like? What are the students who gain a first doing differently to their colleagues who get lower grades? Do high achievers choose ‘better’ resources, or are they ‘better’ at choosing resources?
Two groups reported back on this scenario with the following recommendations:

  • Talk to high achievers to find out what they are doing, e.g.
    • Working
    • Using data effectively
    • Using the right resources
  • Establish what good practice is, e.g. finding, using, interpreting
  • Consider the requirements of the subject, for example mathematics courses often require much less resource use than other subjects such as history
  • Qualitative statistics need to be considered in addition to quantitative statistics
  • Consider the impact of information literacy and support services
  • Find out the student’s own personal goals, e.g. why are they attending the course – as a work requirement etc.
  • Look at which resources are being used, such as extended reading, not just how much
  • Teach the students evaluation skills to help them find appropriate resources, not just ‘better’

Scenario 2
If students are not using the library or the resources, what can we do to change their behaviour? Is non-use a resourcing issue or an academic/information skills issue? How could gender, culture and socio-economic background affect library usage, and how could this be addressed? Are there scenarios where we should NOT try to increase library use?

Groups considered a number of factors that could be used to change behaviour:

  • Incentives
    • Attached to an assignment
  • Work with and win over the academics
  • Encourage student champions
  • Make sure the resources are embedded and relevant to the subject

Regarding non-use, the groups thought that both issues were relevant. The skills issues required further training and the resources needed simplifying.
Gender, culture and socio-economic background were themes brought out at both the SCONUL and LIBER conferences. One group looked at international students, who it was felt were too dependent on Google – does this mean our resources are too difficult to understand? It was also considered that there is a focus on generalisations, e.g. ‘international students’, rather than looking at individuals. Another group considered that it was a cultural issue and that students were guided to the ‘right answer’ via reading lists, rather than reading around the subject.
Finally discussion turned to work-life balance and whether students should be logging in at 2am, and whether our culture of 24×7 access was a healthy one.

Scenario 3
Can we actually demonstrate that the library adds value? E.g. if a student enters university with average UCAS points and attains a first class degree having used library resources to a high level, does this prove the library has added value to the student’s achievement? Have we done anything? Do they need us?

The short answer to this scenario was yes!
We receive feedback, both internal and external, and have provided learning spaces and essential resources at the very least. We can also show that we have promoted our services and embedded information literacy skills into the curriculum by working successfully with academic staff. It was thought that we add to the employability of students by teaching them research skills and giving certification, e.g. Bloomberg etc.

Scenario 4
If the hypothesis is proved to be correct, does cutting library budgets mean that attainment will fall? Is this something that can be used at director level to protect resource budgets/subject librarians? Should we be concerned about implications for publishers if the hypothesis is proven?

The group that looked at this scenario considered that further use of statistics was required to find out what students were reading. This would allow stock to be rationalised, and the reduced budget could be used to better target appropriate resources.

In addition, it was suggested that other services, such as inductions and information literacy training, be audited and evaluated in order to provide more effective targeting.

It was also felt that there was an absolute minimum spend for resources; once spending fell below this level the impact would be huge, with insufficient resources to support courses.

The group felt that this could be used at Director level and that evidence would be required to support this.
Big deals came up in the final point from this scenario. Discussion centred on the standoff between the need for better products and ongoing financial commitments.

Many thanks to all the delegates for allowing us to blog about their comments and to the BLA for letting us loose at their conference. We’ll be adding some of these comments to our final blog post.

Reflections on Huddersfield’s data

Following on from De Montfort’s blog post about the nature of their data submission, we’ve been thinking a bit more about what we could have included (and indeed what we might look at when we finish this project).
