Beyond the List
OpenDOAR has always been intended to provide more than a simple list of repositories. It can certainly be used as a list, and it serves as a catalogue and as a source of analysis and statistics for comparison purposes.
However, there is also a need for a more structured information service that catalogues and describes repositories, so that service providers can know what it is they can harvest from each repository, and what rights they have in the use of the harvested information. Repository administrators need to know, in turn, the requirements of service providers: what features should a repository support in order for its content to be fully harvested and re-used by the latest innovative service?
Repositories need to be categorised with clear information on their policies regarding tagging peer-reviewed/non-peer-reviewed material, their subject coverage, their collection and preservation policies, etc.
Harvesting analyses carried out by Peter Millington show that two-thirds of repositories have neither metadata nor full-text re-use policies in place. Without this information, service providers do not know what they are allowed to do with the data that they can harvest. Unless service providers can be sure that they have permission to analyse or re-use the data, repository service development will be severely hampered.
Tools in development by the OpenDOAR team address these problems. As part of outreach and community building activities, OpenDOAR will contact the two-thirds of repositories without stated policies and advise them of the situation. If a repository administrator wishes the material they hold to be widely used and referenced, then such operational policies will have to be created.
OpenDOAR tools allow the easy creation of standardised policies. Repository administrators will be able to select options from easy-to-read policy choices, which the tool will then compile into a suitable format for inclusion as a harvestable policy. The OpenDOAR tool also has the option of selecting recommended policies, in line with Open Access re-use ideas, which the administrator can select to set up their policies.
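The idea of compiling selected options into a policy statement can be sketched as follows. This is only an illustration of the general approach: the option keys, the policy wording, the `RECOMMENDED` preset and the `compile_policy` function are all hypothetical and are not taken from the OpenDOAR tool itself.

```python
# Illustrative option catalogue. The keys and the wording are placeholders,
# not the text or options offered by the OpenDOAR policy tool.
POLICY_OPTIONS = {
    "metadata": {
        "open": "Metadata may be re-used in any medium without prior permission.",
        "non_commercial": "Metadata may be re-used for not-for-profit purposes only.",
        "restricted": "Metadata may not be re-used without permission.",
    },
    "full_text": {
        "open": "Full-text items may be re-used without prior permission.",
        "non_commercial": "Full-text items may be re-used for not-for-profit purposes only.",
        "restricted": "Full-text items may not be re-used without permission.",
    },
}

# Hypothetical "recommended" preset that an administrator could accept as-is.
RECOMMENDED = {"metadata": "open", "full_text": "non_commercial"}


def compile_policy(repository_name, choices=None):
    """Merge the administrator's choices over the preset and return policy text."""
    selected = dict(RECOMMENDED)
    selected.update(choices or {})

    unknown_areas = set(selected) - set(POLICY_OPTIONS)
    if unknown_areas:
        raise ValueError(f"Unknown policy area(s): {sorted(unknown_areas)}")

    lines = [f"Policies for {repository_name}"]
    for area, options in POLICY_OPTIONS.items():
        option = selected[area]
        if option not in options:
            raise ValueError(f"Unknown option {option!r} for {area}")
        lines.append(f"{area}: {options[option]}")
    return "\n".join(lines)


# Accept the recommended metadata policy but restrict full-text re-use.
print(compile_policy("Example Repository", {"full_text": "restricted"}))
```

Keeping the wording in a single shared catalogue is what would give the standardising effect: every repository that picks the same option ends up publishing the same sentence.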
In this way, repositories will gain policies which will show service providers clear rights for information re-use. The material will then be more widely used and services will have the necessary body of data on which to build and grow. In addition, the policy generators will have the by-product of helping to standardise terminology and rights amongst repositories.
OpenDOAR plans to take the same approach in other areas, acting as a bridge between repository administrators and service providers, facilitating communication and development between the two groups and helping to create an understanding of the needs and capacities of each.
OpenDOAR also acts as a machine-to-machine (m2m) bridge. The Nottingham team have developed an OpenDOAR API that allows OpenDOAR data to be queried and/or re-used within other people's systems and websites.
For example, a third-party website giving a medicine-based view of repositories could use the API to find and index dedicated medical repositories of interest. The information about repositories is maintained by OpenDOAR, so the third-party developer simply sets up the API query they need and uses the resultant list to harvest documents and references for their website.
If the site provided search facilities, it could use the m2m interface to restrict its search to just these repositories. Drawing on OpenDOAR information about metadata re-use policies and full-text re-use policies, it could then offer data-mining of results, according to the different rights given by different repositories.
Using OpenDOAR information, complex customised searches could be structured. For example, a search could start with the dedicated medical repositories mentioned above; add repositories with a significant proportion of medical papers; filter the combined list by country; discard those whose metadata policies preclude data-mining of full text; and then search and data-mine the resulting list.
Without OpenDOAR, a service provider would have to identify the contents of each repository; create a list of repositories of interest; maintain the list for currency and accuracy; harvest the metadata and full-text re-use policies individually; analyse the policies to check for data-mining, commercial re-use, etc.; exclude those repositories without the correct rights policies; and then search the result. OpenDOAR holds this information, and the API will allow all of this to be done automatically.
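As a rough sketch of how a service provider might apply such a layered selection once it has repository records in hand, the example below filters an in-memory list. Everything here is assumed for illustration: the `Repository` fields, the 25% threshold for a "significant proportion", the sample data and the `select_repositories` function are hypothetical, and none of it reflects the actual OpenDOAR API or its data format. Repositories with no stated policy are excluded, on the cautious assumption that re-use rights are unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Repository:
    """Hypothetical record; the field names are illustrative only."""
    name: str
    country: str
    dedicated_medical: bool         # repository is devoted to medicine
    medical_share: float            # fraction of holdings that are medical papers
    mining_allowed: Optional[bool]  # None means no stated re-use policy


def select_repositories(repos, country, min_medical_share=0.25):
    """Apply the selection steps in order and return the surviving repositories."""
    selected = []
    for repo in repos:
        # Dedicated medical repositories, plus those with enough medical content.
        is_medical = repo.dedicated_medical or repo.medical_share >= min_medical_share
        if not is_medical:
            continue
        # Filter by country.
        if repo.country != country:
            continue
        # Keep only repositories whose policy explicitly permits data-mining.
        if repo.mining_allowed is not True:
            continue
        selected.append(repo)
    return selected


SAMPLE = [
    Repository("MedArchive", "GB", True, 1.0, True),
    Repository("UniRepo", "GB", False, 0.4, True),
    Repository("PhysicsStore", "GB", False, 0.02, True),
    Repository("ClinicPapers", "DE", True, 1.0, True),
    Repository("HealthDocs", "GB", True, 1.0, False),
    Repository("NoPolicyMed", "GB", True, 1.0, None),
]

print([repo.name for repo in select_repositories(SAMPLE, "GB")])
# prints ['MedArchive', 'UniRepo']
```

The resulting list is what the service would then search and data-mine; the point of the sketch is that each step is a simple test on data that is already collected and maintained centrally.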
It is this sort of value-added feature and integration with other systems which takes OpenDOAR beyond a simple and limited list of repositories.