Breaking Records: The History of Bibliographic Records and Their Influence in Conceptualizing Bibliographic Data

A bibliographic record is a conceptual whole that includes all bibliographic information about a resource together in one place. With the Semantic Web, individual data statements are linked across the web. This position article argues that the traditional conceptualization of bibliographic records affects the affordances and limitations of that data. A historical analysis of the development of bibliographic records contrasted with the Semantic Web model reveals how the “record” model shaped library cataloging and the implications for library catalogs today. Reification of the record model for bibliographic data hampers possibilities for innovation in cataloging, inspiring a reconceptualization of bibliographic description.


Introduction
Cataloging and bibliographic description are arguably fundamental pillars of librarianship, supporting the selection, management, and preservation of information. When collections of information become too large to organize and access directly, librarians create surrogate "bibliographic records" to represent specific items in a library's collection, physical or digital. 1 The purpose of these bibliographic records is ostensibly to describe a document in sufficient detail to identify it uniquely among other documents and to specify where the record can be located within a file of other bibliographic records. 2 Originally, mere title and author information was enough to distinguish a work among others and order its record within a collection of records. But over time, the amounts and types of data that constitute sufficient detail to uniquely describe a resource and offer access to its description have grown. As more resources arose, more data was considered necessary to distinguish one resource from another.
The conceptual model of a bibliographic record, then, is one of a collection of data elements. This collective model served American librarianship well in its first 100 years. However, changes in the form and distribution methods of resources, especially in digital environments, raise questions about whether librarians' traditional conceptual model is still relevant and applicable. Contemporary data models like the Semantic Web's model of Linked Data directly conflict with this traditional conceptualization of bibliographic records. A bibliographic record is a conceptual whole that includes all the bibliographic information about a resource together in one place, like a catalog card or a MARC record. With Linked Data, bibliographic data need not be a conceptual whole. Bibliographic data about a resource is no longer collected in a single location, but rather linked from multiple locations across the web. What, then, are the implications of the traditional conceptualization of the bibliographic record for librarianship?
This position paper argues that the way in which librarians conceptualize bibliographic data (as a "record") affects the affordances and limitations of that data, especially in digital environments. By tracing the development of the concept of the bibliographic record from early American librarianship through the present, and contrasting that model with the current Semantic Web model, this paper reveals how the "record" model shaped library cataloging from early physical catalogs through contemporary digital software and interfaces.

Early American bibliographic records
Early American library catalogs in the colonial period were typically made available in book form, either manuscript or printed. American library catalogs had no systematically articulated or standardized rules for recording bibliographic information. 3 Rather, they looked to 17th-century English library catalogs, such as the ones from the Bodleian Library, for guidance.
Bibliographic description in Thomas Hyde's 1674 Bodleian Library catalog mandated the inclusion of the following elements: 4
• author surname
• brief title
• number and size of volumes
On occasion, when warranted, entries might also include the date and place of publication and/or a note about the identity of the donor. What was the reasoning for including these data elements and not others? During the colonial period, little attention had been given to articulating the purposes of the library catalog. Many printed book catalogs were not designed to be finding aids, but rather served as marketing tools to acknowledge donations, publicize the collection, and solicit new patrons and further donations. 5 It wasn't until 1826, when the American Philosophical Society suggested that the purpose of library catalogs was to show library holdings by subject and editions, that catalogs started to function as user tools. 6 Once this designation was in place, standardized sets of rules became necessary.
The most famous early cataloging rules are Panizzi's 91 rules. Many cataloging scholars believe that our current cataloging structure can be traced back to these very rules. 7 Panizzi stipulates that entries include the following data elements: 8
• title
• author
• "any striking imperfection in a book...and any remarkable peculiarity"
• the number of parts, volumes, etc.
• place of publication, printer's name, date of publication
• printing notes (such as whether the work was printed on fine paper, vellum, etc.)
With the advent of the card catalog in American libraries in the 1850s, the shift to catalogs as aids for library users continued. By the late 1800s, Cutter convinced librarians that the purpose of the library catalog is to assist users in finding books by title, author and subject. 9 This required that subject data be added to the body of bibliographic data recorded about a work.
While book catalogs grouped descriptive bibliographic entries by subject, catalog cards recorded subject heading information alongside the descriptive cataloging information, thus adding subject data elements to the burgeoning bibliographic record. The individual nature of cards, with each card representing or acting as a surrogate for one specific resource, began to solidify the conceptualization of the bibliographic record as its own entity, one that carries a collection of data elements. While each line item or multi-line description in a book catalog was a record in that it recorded data about a resource, it was the full catalog that was the conceptual whole. Book catalog entries were not separate entities that could be managed or manipulated individually. The transition to individual slips or cards afforded the treatment of bibliographic records as items in and of themselves.

Bibliographic records in the 20th century
In the early 1900s, the Library of Congress demonstrated support for card catalogs when it began its Card Distribution Service. In this era, what became known as "added entries" began to be added to bibliographic records in addition to the previously stipulated elements. This data was intended to represent conceptual relationships, such as serials that were preceded or followed by other serials. 10 Rules for cataloging continued to evolve, and many rules documents emerged, from Catalog Rules, Title and Author Entries 11 in 1908 and its subsequent revisions, to Rules for Descriptive Cataloging in the Library of Congress 12 in 1949. These sets of rules struggled to address issues regarding selection, form, order, and arrangement of values for the data elements. However, none of these rules questioned the fundamental principle that bibliographic records were the appropriate tool for recording these values. Even the shift in the 1950s and 1960s toward cataloging based on underlying principles and objectives 13, 14 assumed the concept of the bibliographic record.
As new technological possibilities arose in the 1960s, the Library of Congress became interested in the possibility of "automating" library work, such as the printing of catalog cards. Studies demonstrated how bibliographic data from catalog cards could be input "by punching on a tape punching typewriter," making the data "machine-interpretable," meaning that each field of data (title, author, etc.) would be parsed separately by the machine. 15 To assist with data entry, a "workform" was used to collect bibliographic data about an item. 16 The workform, like the catalog card, was an entity of its own, with boundaries, space limitations, and prescribed data elements, thus reinforcing the conception of the bibliographic record as a thing. Additionally, the computerized collection of data elements created for Machine Readable Cataloging (MARC) became known as a "MARC record," a term still in use today. Physically, a MARC record is formless: as a compendium of electronic bytes, it has no tangible physical form.
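The idea that a machine-readable record is a flat run of characters, given structure only by agreed character positions and delimiters, can be illustrated with a short sketch. The Python below is a simplified illustration rather than a full MARC implementation. It assumes ASCII-only data (so that character counts equal byte counts), the helper names build_record and parse_record are hypothetical, and the sample control number and field contents are invented for the example. The 24-character leader, the 12-character directory entries, and the three delimiter characters follow the MARC 21 exchange format.

```python
# Delimiter characters used by the MARC 21 exchange format.
FIELD_TERMINATOR = "\x1e"
RECORD_TERMINATOR = "\x1d"
SUBFIELD_DELIMITER = "\x1f"


def build_record(fields):
    """Serialize (tag, data) pairs into one flat string (hypothetical helper)."""
    directory, body, offset = "", "", 0
    for tag, data in fields:
        chunk = data + FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(chunk):04d}{offset:05d}"
        body += chunk
        offset += len(chunk)
    base = 24 + len(directory) + 1      # leader + directory + its terminator
    total = base + len(body) + 1        # plus the record terminator
    # Leader positions 0-4 hold the record length, 12-16 the base address.
    leader = f"{total:05d}nam a22{base:05d} a 4500"
    return leader + directory + FIELD_TERMINATOR + body + RECORD_TERMINATOR


def parse_record(raw):
    """Recover (tag, data) pairs using only character positions."""
    base = int(raw[12:17])
    directory = raw[24:base - 1]        # stop before the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length]
        fields.append((tag, data.rstrip(FIELD_TERMINATOR)))
    return fields


# Invented sample data: a control number and a title field with indicators.
fields = [
    ("001", "demo0001"),
    ("245", "10" + SUBFIELD_DELIMITER + "aAmerican scenery"),
]
raw = build_record(fields)
print(repr(raw))
print(parse_record(raw))
```

Nothing in the string itself marks where the title begins; only the offsets and lengths recorded in the directory do.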
A raw MARC record has no shape at all; it simply consists of a string of unformatted characters (see Figure 2). Structure is imposed through standardized character positions (such as in the leader), delimiters and punctuation. Character placement is also used in the directory, which defines the structure of the variable fields. 17 While this raw data is intended for machine processing, the human interface continued to resemble the workform. As technology evolved, data entry shifted from physical paper workforms to direct electronic input. Yet even today these input interfaces still retain the shape and structure of the physical workform, reifying the idea that the bibliographic record is a collection of specific data elements within a bounded space.
ed. (1980), where it is defined as "a transcription of the complete cataloging information for any item." 18 According to Wynar, the purpose of bibliographic records is to
1. provide all information necessary to describe an item accurately both physically and intellectually in order to distinguish it from every other item
2. provide its location in the collection
Subsequent editions of the same textbook define bibliographic records as "catalog entr[ies] in a card, microtext, machine-readable or other form carrying full cataloging information for a given item in the library." 19 (491, 499). Other well-known texts such as Chan's Cataloging and Classification: An Introduction 22, 23 and Downing's Introduction to Cataloging and Classification 24 that have multiple editions in this time frame offer similar definitions. These definitions reinforce the bibliographic record as "thing" by stipulating a form of existence, be it physical or digital. This notion is further reinforced by giving the record physical shape through the use of catalog cards or conceptual shape through tools like the workforms used in the creation of machine-readable records. Just as these conceptual boundaries shape the input of bibliographic data, so too do they shape the output in the way that records are displayed. Even modern texts and instructional manuals define records as things with space and place, such as "the place where information about each item is written down, either on a physical 3 x 5 card or in the online database." 25 Definitions for all these textbook examples describe descriptive information about a resource in plural terms; that is, the multiple data elements collectively necessary to describe a resource.
Emerging definitions and statements of purpose for bibliographic records also often refer to "full," "enough" and/or "accurate" bibliographic data as a requirement for bibliographic records.
"retrieval of any item in a collection depends on a full and accurate description of the work" 26
But what is meant by "full" and "accurate"?
unknown future uses. However, the decision of which data elements to include was ultimately left to the Library of Congress, since it was the major institution supporting the project in terms of both staff and funding. 28 Early MARC records included only the bibliographic data that was of use to the Library of Congress. As other libraries encountered the need for additional data elements, "enhanced" bibliographic records were developed. These enhancements added data elements based on the most popular suggestions from libraries, the most tested elements, and the most feasible additions, which mainly consisted of the following: 29
• increased number of subject headings
• inclusion of tables of contents
• inclusion of summaries and abstracts
• improved use of classification
• inclusion of back-of-book indexes
• inclusion of subject descriptors from other vocabularies in addition to Library of Congress Subject Headings (LCSH), including both specialty controlled vocabularies and natural language terms
At the same time, cooperative cataloging among institutions was also on the rise as the ease of sharing MARC records increased, likely due to new media formats for record distribution as well as network technologies. Enhanced records and other deviations meant that not all the shared records contained the same bibliographic data elements. In the 1990s, the cooperative cataloging community developed a set of standards for "core" records, "an agreed minimum set of bibliographic data for a catalogue record which every library participating in a cooperative programme commits to provide." 30 Additional data elements were considered optional. AACR2r also stipulated varying levels of data elements needed for bibliographic records: Level 1 (which omits information like subtitles, series titles, illustrations, size, and additional authors); Level 2 (which requires the inclusion of subtitles, series titles, statements of responsibility and additional physical details); and Level 3, which is "by definition, the most thorough work possible," requiring the inclusion of every applicable detail mandated by the rules. 32 Yet "every applicable detail" is still limited by the extent of the AACR2r rules, which address only the eight traditional areas of bibliographic description. A traditional bibliographic record, then, is a unit of a limited set of multiple data elements embedded in a physical or digital space meant to stand in for a physical or digital resource. These characteristics shape the current mental model of how librarians conceptualize bibliographic data.

The turn of the 21st century
Despite the long service of the bibliographic record as conceptualized in the 20th century, as user demands and digital technologies continued to advance, librarians attempted to keep pace.
Years of analysis of relationships between resources 33 led to the abstract model Functional Requirements for Bibliographic Records (FRBR). 34 While certainly an advance over traditional cataloging, even the name itself reveals the insidious unconscious attachment to the bibliographic record as the unit of description for a resource. Therefore, any new cataloging codes and standards based on FRBR will continue to be based on the idea of records. Resource Description and Access (RDA), for example, relies on linking of records for works, expressions, manifestations, and items. 35 This means that catalogers must create separate records for each work, each expression of that work, each manifestation of each expression, and possibly each item. 36 These records are then related to one another using FRBR relationships. Relationships therefore exist at the record level, relating a collection of data elements to another collection of data elements, rather than at the level of the data elements themselves.

New conceptual models
The traditional library approach to describing bibliographic data in the format of a bibliographic record (a formalized unit containing multiple data elements) is clearly embedded in the psyche of most librarians as well as in their practices and technologies. While such conceptualization proved a valuable tool for American libraries in the 20th century, new conceptual models of data at large from outside the library world may also prove useful, especially in the context of data sharing via the World Wide Web.
The idea of the Semantic Web is one such model that many agree can revolutionize librarianship. The purpose of the Semantic Web is to "bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users," such as more precise and automated searching. 37 In addition to the benefits of more precise search and retrieval, Semantic Web technologies can also benefit libraries by making library data more openly accessible and interoperable with other library and non-library systems, and library resources more easily findable on the Web at large. 38 The Semantic Web is made possible through "Linked Data," a method of publishing data (rather than documents) in a structured, machine-readable way, thus enabling connection ("linking") of data among various sources for improved information retrieval and experiences. 39 Unlike the traditional model of bibliographic records, which associates multiple descriptive properties with one resource, the Semantic Web model uses multiple statements, each one individually associating a descriptive property with a resource. 40 These statements follow standards for publishing data on the web that hinge on two major conceptual foundations: 1) Uniform Resource Identifiers (URIs) and 2) Semantic Web standards, such as the Resource Description Framework (RDF). 41 These statements, known as "triples," are formulated as assertions that include a subject, a predicate, and an object. 42 For example, the author of a work may be conceptualized as shown in Figure 4.
For the model to work successfully, however, subjects and predicates must be represented by Uniform Resource Identifiers (URIs), unique character strings used to "distinguish one resource from all other resources." 43 Every resource, physical or digital, concrete or abstract, is assigned a standardized character string that uniquely identifies that resource among other resources. A URI may also specify the location of a resource, especially in the case of digital resources. 44 A URI is a single data element that uniquely describes (and more often than not, locates) a resource. Each triple describes one, and only one, data element. Any number of triples may be used to describe a resource. These triple statements may be asserted by the same source (such as a single library or institution), or they may be asserted by other institutions and organizations across the Web (such as VIAF or Wikipedia, for example). The major point is that individual triple statements about a single resource need not be collocated in the same physical or intellectual space.
Additionally, with the conceptual model of Linked Data, it is no longer necessary to record a complete "set" of data elements to sufficiently describe a work. Unlike a bibliographic record, which as we have seen requires a collection of multiple data elements to describe a specific resource, the model of the Semantic Web relies on the idea that a single data element, the URI, uniquely identifies a resource. In the Linked Data world, bibliographic data need not be a conceptual whole. There is no need for a minimum (or maximum) number of data elements to describe a resource. In fact, the collective unit model espoused by the traditional bibliographic record directly contradicts the underlying concept of Linked Data and the Semantic Web. The purpose of the RDF model is to allow data assertions to be parsed independently of one another.
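The contrast between collocated records and independently asserted statements can be sketched in a few lines of code. The Python below is a minimal illustration, not an RDF implementation: every URI is an invented placeholder under example.org, the three "sources" are hypothetical, and the names Statement and statements_about are assumptions made for this example.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One subject-predicate-object assertion."""
    subject: str
    predicate: str
    obj: str


# Placeholder URIs (assumptions for illustration only).
BOOK = "http://example.org/work/huckleberry-finn"
TWAIN = "http://example.org/person/mark-twain"
AUTHOR = "http://example.org/vocab/author"
TITLE = "http://example.org/vocab/title"
NAME = "http://example.org/vocab/name"

# Statements asserted independently by three hypothetical sources.
assertions_by_source = {
    "library catalog": [Statement(BOOK, AUTHOR, TWAIN)],
    "authority file": [Statement(TWAIN, NAME, "Mark Twain")],
    "encyclopedia": [Statement(BOOK, TITLE, "The Adventures of Huckleberry Finn")],
}


def statements_about(uri, sources):
    """Collect every statement whose subject is `uri`, noting who asserted it."""
    found = []
    for source, statements in sources.items():
        for statement in statements:
            if statement.subject == uri:
                found.append((source, statement))
    return found


for source, statement in statements_about(BOOK, assertions_by_source):
    print(f"{source}: <{statement.predicate}> -> {statement.obj}")
```

The shared URI is the only thing tying the statements together; no source needs to hold a complete set, and each statement can be read on its own.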
While a traditional bibliographic record requires all mandatory elements to be present for that record to exist, a resource description in the Semantic Web model comes into existence with the very first URI triple. Additional descriptive triple statements may be created at the same time, or they may be created and added over time. And unlike cards or workforms that pose limitations on physical space and therefore on the amount of descriptive data, triple statements are unlimited. Libraries and other institutions may choose to display only a selected set of triple data, and different institutions may choose to display the data elements that are most useful and desirable to their unique users. It is the underlying concept of separate, individually stated data elements that allows for the accessibility and interoperability so desirable to libraries. Yet based on the review of bibliographic records presented earlier in this paper, libraries and librarians still seem entrenched in conceptualizing resource descriptions as cohesive collections of data and educating future librarians to do the same.
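As a rough sketch of the two properties described above (a description that exists from its first statement and grows over time, and institution-specific display choices), consider the following Python. It is illustrative only: predicates are shortened to plain labels for readability, whereas in RDF they would themselves be URIs, and the function names, profiles, and sample values are all assumptions made for this example.

```python
# Placeholder identifier (an assumption for illustration only).
BOOK = "http://example.org/work/huckleberry-finn"

description = set()   # a description is simply a growing set of triples


def assert_statement(store, subject, predicate, obj):
    """Add one independent statement; no minimum set of elements is required."""
    store.add((subject, predicate, obj))


def display(store, subject, profile):
    """Return only the statements a given institution chooses to show."""
    return sorted((p, o) for s, p, o in store if s == subject and p in profile)


# The description exists as soon as the first statement is asserted.
assert_statement(description, BOOK, "title", "The Adventures of Huckleberry Finn")
first_view = display(description, BOOK, {"title", "author"})

# Further statements can arrive later, from the same or another institution.
assert_statement(description, BOOK, "author", "http://example.org/person/mark-twain")
assert_statement(description, BOOK, "local-note", "Placeholder note of local interest")

# Hypothetical display choices made by two different institutions.
public_profile = {"title", "author"}
local_profile = {"title", "local-note"}

print(display(description, BOOK, public_profile))
print(display(description, BOOK, local_profile))
```

The underlying statements are the same in both cases; only the selection shown to users differs.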

Implications
Throughout the 20th century, catalogers and cataloging scholars supported the notion that library cataloging is independent of technology, and that the principles of cataloging exist independently of technological medium. 45 The preceding analysis of the history and evolution of bibliographic records and the ways in which they are conceptualized demonstrates that cataloging rules and principles were, and continue to be, closely tied to the technologies of the eras of their development. The catalog card in particular had a long-standing effect on which and how much bibliographic data could be recorded about a resource, concepts which continued to be enforced in cataloging even as new technologies (such as computing) arose that did not impose the same limitations. This historical analysis of the discourse around bibliographic records reveals that the need for the record's existence in its current form has been taken for granted and gone unquestioned for more than 100 years. Early analysis of cataloging rules in the 1950s found them to be based on "a long accumulation of essentially ad hoc decisions" rather than any stipulated purposes or objectives. 46 Later rules, such as ISBD, AACR, and even RDA, purport to support cataloging objectives: finding works by title, author, or subject, identifying them, selecting them from among other works, and obtaining them. 47 Yet the development of machine-readable cataloging data, which had a significant impact on the conceptualization of the bibliographic record, was originally undertaken not with cataloging objectives in mind but rather with goals of improved internal efficiency. 48 Additionally, despite the claim of user service, it seems clear from years of user studies that the traditional cataloging objectives are not as well aligned with user needs as perhaps was once thought. Users do not use library catalogs in the same way librarians assume or expect they should. Users seem to have a different mental model of how bibliographic data and links between these data are structured. Why shouldn't the underlying structure of library catalogs evolve to reflect how users model and conceptualize information? Such a new model would be based on 1) how users conceptualize information relationships (rather than how librarians or existing bibliographic records conceptualize them) and 2) the influences of current technology, both its effect on users' conceptualizations and its new affordances.
Traditional bibliographic records were also conceptualized as surrogates for physical resources.
Yet few contemporary libraries restrict their collections to solely physical holdings. Even as librarians' resource collections have evolved, the definition of the bibliographic record remained the same, save for small wording changes like "book" to "information package." While this reflects the shift to describing the intellectual content of a resource separate from the physical carrier that contains it, the assumption remains that a bibliographic record composed of collective data elements is the appropriate tool for description. Despite the acknowledgement of significant resource evolution, little to no consideration was given to the underlying structural methods of bibliographic description. Which data elements were necessary for inclusion and how to record their values underwent constant renegotiation, yet the fundamental carrier for this data remained unquestioningly accepted.
With increasingly self-describing digital resources, there is less need for surrogate bibliographic records to provide description and location information. Digital documents can represent themselves with URIs and linked triple statements that specify locations. Even physical, non-dereferenceable resources that need pointers to physical locations (like physical books, archival materials, or even people) can be located so long as a triple statement asserts the location information (such as a call number, box number, or address, respectively). In a traditional bibliographic record manifested via a catalog card or MARC record, this type of location data could not be conceptually divested and used separately from the rest of the included bibliographic data. But triple statements do not need to be carried alongside other data elements. So long as a URI is assigned to the resource, any bibliographic data or other descriptive information that references that URI is available as an assertion. It need not be included in the same document, the same database, even the same library or the same country.
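A brief sketch may make the point about location data concrete. In the Python below, location assertions are held entirely apart from any descriptive data and can still be resolved through the resource's URI. The predicate, the URIs, and the location values are invented placeholders, and the function name locate is an assumption made for this example.

```python
LOCATED_AT = "http://example.org/vocab/locatedAt"   # hypothetical predicate

# Location assertions only; no title, author, or other description travels
# with them. All values are invented placeholders.
location_statements = [
    ("http://example.org/item/book-1", LOCATED_AT, "call number: EXAMPLE 123"),
    ("http://example.org/item/archive-7", LOCATED_AT, "box number: 42"),
    ("http://example.org/person/donor-3", LOCATED_AT, "address: 1 Example Street"),
]


def locate(uri, statements):
    """Return every location asserted for the resource identified by `uri`."""
    return [o for s, p, o in statements if s == uri and p == LOCATED_AT]


print(locate("http://example.org/item/archive-7", location_statements))
```

A descriptive statement about the same archival item could live in another database entirely and would still refer to it through the same URI.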
Under a Linked Data model, bibliographic data would be conceptualized as individual statements rather than as collections of descriptive data elements. As sets of data, traditional bibliographic records required a minimum number of elements or mandatory amount of data to be considered full and accurate. Bibliographic records offered stable boundaries indicating which elements were included and which were excluded. With individual triple-structure assertions, this notion becomes moot. While actual implementation may vary, with a new conceptual model of cataloging based on Linked Data, catalogers would no longer think in terms of records at all; rather, they would conceive of data as individual triple statement assertions. The conceptual work of catalogers then begins to shift. Instead of seeing a bibliographic record as some sort of "checklist" of which data can be included and which data do not fit, catalogers could be free to include any and all triple statements deemed useful to describe a resource. Catalogers may continue to be guided by traditional cataloging objectives: if one objective is to find works by author, then author is clearly a necessary triple statement (or multiple triples in the case of multiple authors). Or catalogers may choose to reexamine traditional cataloging objectives in light of the contemporary information environment and the needs of their local users. Catalogers might make much more localized and specific triple statement assertions based on local needs. Because the Linked Data model supports tracing data via URIs, patrons can follow triple statements to find additional useful data.
However, catalogers can only begin to work in this way if they are not bound by the traditional definitions and conceptualizations of bibliographic records. Many catalogers are certainly interested in the Semantic Web, as evidenced by professional interest groups 49 and even national cataloging committees. 50 However, it is almost always more difficult to break out of an existing conceptualization than to start fresh. For this reason, we need to closely examine cataloging education and how we teach future catalogers about bibliographic description and data. The use of traditional definitions and descriptions of bibliographic records, especially the continued use of the catalog card as an illustrative analogy, is problematic in shaping the perspective of future catalogers. If we continue to educate catalogers using the traditional model of the bibliographic record, we risk remaining entrenched in this model ten, twenty, fifty years down the line, keeping libraries and library catalogs on the trailing edge of technology, perhaps even contributing to libraries' obsolescence. Instead, let us educate the future members of our profession to be open-minded to new possibilities, both technological and conceptual. By doing so, perhaps libraries will once again develop the next generation of technological innovation, whatever comes after the Semantic Web.

Conclusion
The idea of the bibliographic record, a collection of descriptive information that stood as a surrogate for a physical resource, enabled significant advances in American library cataloging.
Yet as other components of libraries changed, such as resource types and technological tools, the model of the bibliographic record remained static. Types of data elements and the source and format of their values were often highly disputed, yet the fundamental underlying structure of the bibliographic record remained unquestioned for nearly 100 years.
However, the evolution of digital resources and technologies has spurred new ways of conceptualizing data. The Semantic Web's Linked Data model, with its individual descriptive statements, offers an opportunity to reconceptualize bibliographic records away from conceptual collective units that include all the bibliographic information about a resource together in one place, like a catalog card or a MARC record. Linked Data allows for individual data elements to be collected in and linked from multiple locations across the web, opening up new opportunities for the ways library cataloging is conceptualized and practiced.
Libraries have already shown interest in Linked Data as a means of improving catalogs for both patrons and librarians. But if libraries and librarians continue to assume the bibliographic record as the basic unit of the library catalog, and continue to teach that model to future librarians, they risk perpetuating and reifying it. Linked Data clearly holds much more potential when used to link data instead of records. However, so long as the concept of the record as a collective unit holds sway, library bibliographic data remains conceptually incompatible with these new models, and so long as cataloging continues to be based on records rather than data, library catalogs will not change.

Figure 1: Sample of one of the earliest workforms used to record bibliographic data. From Lawrence F. Buckland, The Recording of Library of Congress Bibliographical Data in Machine Form. Washington D.C.: Council on Library Resources, Inc., 1965, figure 3.

Figure 2. Raw MARC data for the book American Scenery, or, Land, Lake, and River Illustrations of Transatlantic Nature. Data from http://www.hray.com/5264/marc.htm

Figure 3: A contemporary digital workform in OCLC's Connexion. From http://oclc.org/connexion/interface/client.en.html

By the 1980s, "essential parts of a full catalogue entry for a modern book" (per Anglo-American Cataloguing Rules, 2nd edition 1.0B and International Standard Bibliographic Description for Monographic Publications) included the following elements: 27
• title
• statement of responsibility
• edition
• imprint [i.e. place, publisher, year of publication]
• collation [physical description]
• series statement
• any necessary annotation [e.g., contents list, summary]
• ISBN
In addition to the data elements prescribed by accepted cataloging standards, interest arose in adding additional useful data to bibliographic records, especially MARC records. The inclusion of additional bibliographic data in records was actually discussed during the development of MARC: most library representatives favored the inclusion of as much data as possible to address

IFLA stipulated that these minimal records included the following data elements: 31
• title
• statement of responsibility
• edition
• place of publication
• name of publisher
• date of publication
• specific materials designation and extent
• title proper of series
• numbering within series
• standard number

Figure 4: Visualization of an RDF triple statement declaring Mark Twain to be the author of The Adventures of Huckleberry Finn. Illustration by the author.

Figure 5. Visualization of RDF triple statements resolving to a single URI.Illustration by the author.

Figure 6: Visualization illustrating diverse and disparate sources of triples. Illustration by the author.