Bibliographic Citations For Data Files

This brief document is intended to help you prepare citations for files used from the Data Liberation Initiative (DLI) materials. It is liberally excerpted by Alberta Auringer Wood from "Bibliographic citations for computer files" prepared by Laine G.M. Ruus and Anna Bombak.

As with any other physical form of published or unpublished information (manuscripts, audio recordings, etc.), computer files used as primary or secondary sources in research, or mentioned in scholarly writing of any sort, should be acknowledged in bibliographies and references. Doing so serves three functions. First, it ensures that authors of computer-readable works, such as data files, receive due acknowledgement. Creating a clean data file that is comprehensively documented and usable by other researchers is often as much work as producing a monograph or periodical article, if not more; the same is true of creating a unique software product. Second, it ensures that readers have all the information needed to obtain a copy of the same source(s) for further information or for independent judgement or analysis. Third, it ensures that these citations are included in citation indices.

There is as yet no universal standard for citing computer files. The purpose of this brief guide is to outline, with examples, the fields of information which will unambiguously identify DLI files in citations. The order of fields and the punctuation are normally specified by individual style manuals, some of which are listed in the references that follow, which were provided by Laine and Anna.

Citations generally have two components: a set of fields which serve to identify the work uniquely, and additional fields which provide the information needed to locate a copy of the same work. The fields which should be used to cite a computer file are much the same as those used to cite other formats. The difficulty with computer files often lies in finding this information. You should include all relevant information, taking it from the file or resource itself (if given), or from the accompanying codebook, manual or other documentation where appropriate. Other possible sources of identifying information are listings or catalogues of data files or electronic resources by the same producer or distributor, or even labels on the outside of a cd-rom or floppy disk. Wherever possible, give the information as provided by the author, the producer, or the distributor.

When citing a computer-readable work which is accompanied by documentation (e.g. a codebook or user manual), cite the computer file, not the accompanying documentation, unless it is only the accompanying documentation that you have used.

Basic identifying information:

1. Who is the author? Give the full name of the author(s), principal investigator(s), or corporate body or issuing agency responsible for the intellectual content of the file(s), if known.

2. What is it called? Give the full title, including subtitle(s), descriptive phrases, dates, geographic information, etc. If it does not have a discernible title, create as descriptive a title as you can, and give it in brackets. If the title is an acronym or initialism, give it followed by the expanded title:

E.g. CANSIM : Canadian socio-economic information management system

If the work is part of a larger work, give the title of the specific part you are referring to, followed by "in:" and the title of the larger work. This might occur, for example, if you have used only one 'sub-database' of a larger database.

3. What is it? Provide an indicator of the type of resource. Follow the title by a descriptive label to indicate the function of the computer-readable resource. Use "[computer file]", "[computer data]", "[computer program]" or "[computer data and program]" as appropriate.

E.g. CANSIM : Canadian socio-economic information management system [computer data]

Laine and Anna note that this field is the one in which citation and style manuals differ most widely. Most recommend the indication of the physical medium (online database, cd-rom, diskette, tape, etc.). They feel that this information is too transitory to be of use in distinguishing a work uniquely. They recommend the use of a function-based nomenclature (e.g. computer file, computer program) which will further serve to identify the work, as well as provide additional information to the reader. However, since some DLI products come in a number of formats, you may wish to distinguish them.

4. Which version is it? If relevant, give the edition, version, level, release, or issue of the work that indicates that it is different from other editions, versions, etc. of the same item.

E.g. Version 8.1

5. Who is responsible for the creation of the physical file(s)? The producer is that person or institution responsible for the creation of the physical computer file(s). Give the place in which the producer is located (city followed by province or state where appropriate), followed by the name of the producer. You may optionally include a statement of function (i.e. '[producer]') following the name to clarify the role played.

6. Who is responsible for distributing the work? The distributor is that person or institution that has the right to disseminate copies of the file(s) or provide access to the file(s), e.g. an online database. If the file(s) was distributed by someone other than the producer or the author, give the place (city, followed by province or state if appropriate) and the name of the distributor. You may optionally include a statement of function (i.e. '[distributor]') following the name, to clarify the role played. If the producer and distributor are one and the same, follow the name by a joint statement of function (i.e. '[producer and distributor]').

E.g. Ottawa, Ont.: Statistics Canada [producer and distributor]

7. When was the work produced or 'published'? Give the date the file was produced, or the year copies of the version or edition you have were first distributed or made generally available, or another relevant date, such as a copyright date if present.

E.g. Ottawa, Ont.: Statistics Canada [producer and distributor], May 1993.
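
The seven basic fields above can be assembled mechanically once they have been identified. The following is only a minimal sketch in Python, not a prescribed method: the class and function names (`Citation`, `format_citation`) are hypothetical, and the field order and punctuation simply imitate the examples in this guide. A particular style manual may require a different arrangement, and the semicolon used here to separate a producer from a separate distributor is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """Basic identifying fields (items 1 to 7); names are illustrative only."""
    title: str                               # 2. full title
    resource_type: str                       # 3. e.g. "computer file"
    producer_place: str                      # 5. city, province or state
    producer: str                            # 5. name of the producer
    date: str                                # 7. date produced or distributed
    author: Optional[str] = None             # 1. omitted if not known
    version: Optional[str] = None            # 4. omitted if not relevant
    distributor_place: Optional[str] = None  # 6. only if distributor differs
    distributor: Optional[str] = None        # 6. None: producer also distributes


def _sentence(text: str) -> str:
    """End a segment with a period unless it already has one."""
    return text if text.endswith(".") else text + "."


def _imprint(place: Optional[str], name: str, role: str) -> str:
    """Place (if known), name, and a bracketed statement of function."""
    prefix = f"{place}: " if place else ""
    return f"{prefix}{name} [{role}]"


def format_citation(c: Citation) -> str:
    parts = []
    if c.author:
        parts.append(_sentence(c.author))
    parts.append(f"{c.title} [{c.resource_type}].")
    if c.version:
        parts.append(_sentence(c.version))
    if c.distributor is None:
        # Producer and distributor are one and the same.
        imprint = _imprint(c.producer_place, c.producer,
                           "producer and distributor")
    else:
        # Assumed separator: a semicolon between the two statements.
        imprint = "; ".join([
            _imprint(c.producer_place, c.producer, "producer"),
            _imprint(c.distributor_place, c.distributor, "distributor"),
        ])
    parts.append(f"{imprint}, {c.date}.")
    return " ".join(parts)


# Field values taken from the SABAL example in this guide
# (the parallel French title is left out for brevity).
sabal = Citation(
    title="SABAL, small area business and labour database",
    resource_type="computer file",
    producer_place="Ottawa, Ont.",
    producer="Statistics Canada",
    date="1996",
)
print(format_citation(sabal))
```

Keeping each field separate until the last step makes it easier to rearrange the output for whichever style manual applies.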

Additional information:
Additional information may be added to aid others in locating the work, or to aid them in evaluating whether or not they will be able to use the file(s). These include:

8. Series information. If the computer file is part of a series, give in parentheses the title of the series and, if present, any relevant part numbers by which the file is permanently identified.

E.g. (Census of Canada, 1991)

9. File size. You may indicate the size of the computer file(s) by giving the number of physical files and, in parentheses, the number of logical records or the size in kilobytes, megabytes, etc. Use the measure which provides the most information, such as logical records for a quantitative data file or megabytes for a system-dependent database. If documentation accompanies the file(s), add the phrase 'and accompanying documentation' followed, in parentheses, by an indication of the size of the documentation (measured in pages, physical records, or bytes). Alternatively, you may give a brief description of the number and type of physical carriers (e.g. diskettes, cd-roms, etc.) on which the file(s) was received, but this is not generally recommended, since this information changes often and is not usually relevant.

E.g. data file (14,826 logical records) and accompanying documentation (189 pp.)

E.g. 3 data files (7.2, 5.6, and 7.6 megabytes) and accompanying documentation (9 computer files, size varies)

E.g. 1 cd-rom and accompanying documentation

10. Hardware requirements. If you feel it is necessary, and if the information is relevant to evaluating the appropriateness of a reference, you may give a brief listing of the hardware and software environment required to use the work, including type of computer, operating system, peripheral hardware requirements, etc.

E.g. (Requires DOS 2.0 or higher, 640k RAM, hard disk)
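
The optional fields in items 8 to 10 can be appended to an already formatted basic citation. The sketch below is one possible way to do this in Python; the function name `append_additional` and its parameters are hypothetical, and the placement and punctuation (series in parentheses, then size, then requirements in parentheses) follow the pattern of the examples in this guide rather than any particular style manual.

```python
from typing import Optional


def append_additional(base: str,
                      series: Optional[str] = None,
                      size: Optional[str] = None,
                      requirements: Optional[str] = None) -> str:
    """Append series, file size, and requirements to a basic citation.

    Each argument is optional; fields that are not supplied are skipped.
    """
    parts = [base]
    if series:
        # 8. Series title (and part numbers) in parentheses.
        parts.append(f"({series}).")
    if size:
        # 9. Size statement, capitalized and closed with a period.
        text = size[0].upper() + size[1:]
        parts.append(text if text.endswith(".") else text + ".")
    if requirements:
        # 10. Hardware and software requirements in parentheses.
        parts.append(f"({requirements})")
    return " ".join(parts)


# Values taken from the SABAL example in this guide.
basic = ("SABAL, small area business and labour database [computer file]. "
         "Ottawa, Ont.: Statistics Canada [producer and distributor], 1996.")
full = append_additional(
    basic,
    size="1 cd-rom",
    requirements=("Requires Windows 3.1 or later, MS-DOS 3.1 or later, "
                  "minimum 4 MB RAM, minimum 4 MB free space on hard disc"),
)
print(full)
```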


Estimates by economic regions, annual average, 1987-95, Prince Edward Island [chart]. In SABAL, small area business and labour database [computer file]. Ottawa, Ont.: Statistics Canada [producer and distributor], 1996.

SABAL, small area business and labour database [computer file] = BIDET, base de données infraprovinciales sur les entreprises et le travail. Ottawa, Ont.: Statistics Canada [producer and distributor], 1996. 1 cd-rom. (Requires Windows 3.1 or later, MS-DOS 3.1 or later, minimum 4 MB RAM, minimum 4 MB free space on hard disc)

[Saskatchewan unemployment, 1995] [map]. In SABAL, small area business and labour database [computer file]. Ottawa, Ont.: Statistics Canada [producer and distributor], 1996.

Union File. In General Social Survey, The Family, 1995 [computer data]. Rev. March 4, 1997. [Ottawa, Ont.]: Housing, Family and Social Statistics Division, Statistics Canada [producer and distributor], 1997. (GSS - Cycle 10). Data file (10,938 logical records) and accompanying documentation (138 p.).

