John A. Kunze - Curriculum Vitae


1325 Josephine St.
Berkeley, CA 94703
+1 510-684-2376
jak at ucop dot edu

RESEARCH AND DEVELOPMENT

Information Standards

Metadata Semantic Standards: chair of the National Information Standards Organization committee that moved the Dublin Core (DC) metadata specification to approval as NISO Z39.85 (2001); co-author and editor of the original DC metadata specification, RFC 2413; member of the DC Advisory Board; chair of the DC Kernel metadata, DC Agents, and DC Date working groups; inventor in 1992 of the Z39.50 Info-1 generic attribute and element set, a precursor of DC

Metadata Syntax Standards: author of the specification for encoding Dublin Core in HTML, RFC 2731; author of the ANVL (A Name Value Language) specification for simple labels and values in the style of RFC822; author of the TEMPER specification for dates; inventor of the Z39.50 Generic Record Syntax (GRS)

Uniform Resource Identifiers: in the context of the URI working group of the IETF, coined the term URC (originally Uniform Resource Citation); contributed substantively to the URL standard itself, including authoring RFC 1736, functional requirements for the URL; led the creation of RFC 2056 describing Z39.50 URLs

Electronic Permanence

Web Archiving: wrote a winning grant proposal to the Library of Congress to create a service allowing curators to harvest and preserve web sites on demand, creating a geographically replicated archive that they can annotate, browse, and search; results of this 3-year grant used to advise Congress on a national preservation strategy

Permanence Ratings: defined the framework for and participated centrally in the US National Library of Medicine's multi-dimensional specification of permanence levels (about object availability, content invariance, and identifier validity); this work was published and the framework adopted by the US National Agricultural Library

ARK Persistent Identification Scheme: at NLM, analyzed 1998-era persistent naming schemes (URN, DOI, PURL, PDI), and formulated a new scheme, the Archival Resource Key (ARK), founded on the principle that persistence is purely about the service commitments of current object holders, not scheme syntax or the intent of name assigners that have no extant object service responsibility

Persistent Identifier Infrastructure and Outreach: at the California Digital Library (CDL), built a generalized ARK minting system that produces random, unique, non-sequential identifiers designed for permanence; these are compact, transcribable, semantically opaque, and furnished with a check character that guards each identifier against single-digit and transcription errors; gave talks on this subject at international meetings (Digital Library Federation, American Library Association, European Conference on Digital Libraries, Society for Scholarly Publishing, Digital Curation Centre, etc.)

Networked Information Systems

Z39.50 Protocol: at UC Berkeley, designed, wrote, and released the first complete Z39.50 client and server protocol engine, participating in the first three-way interoperability demonstration at NET '92 with UC Office of the President (UCOP) and Penn State; worked closely with the Z39.50 Implementors Group, creating and winning adoption of the Generic Record Syntax in support of non-bibliographic applications; at NLM, designed and wrote a Z39.50 server for the MEDLINE database, and designed a generalized thesaurus mechanism (Structured Vocabulary Browse) that would work with NLM's MeSH headings; at UC San Francisco (UCSF), directed a team of programmers in developing a Web interface and HTTP/Z39.50 gateway to MEDLINE; for UCOP, developed the first production instance of the Web-based MELVYL system's access methods for external Z39.50 databases (in this case, to PreMEDLINE running at NLM)

Online Information Systems: proposed, designed, wrote, and maintained in production a client-server information system, Infocal, which provided the first general online access to UC Berkeley's main administrative datasets (schedule of classes, course catalog, phone directories, job vacancy listings, press releases); in the pre-Web era, this required extensive liaison with management and staff of campus units that maintained these datasets, and who were intimidated by information technology, unsure of the benefits of online access, and anxious about losing control over their data; Infocal was also an early client (1992) of both the Z39.50 and Web protocols (HTTP, Gopher, FTP)

Computer Aided Instruction: designed, wrote, and maintained the Berkeley UNIX help system, a general-purpose online information system for exploring documentation files, directories, and executables (it had a user interface foreshadowing that of the Gopher browser); revived and significantly enhanced the UNIX learn program, a hypertext-based teaching tool from Bell Labs (late 1970s); both help and learn were disseminated globally with 4.2/4.3BSD (Berkeley Software Distribution); by 1982 the production help documentation tree was replicated on a daily basis across about twenty file servers on the Berkeley campus network

Digital Libraries

Tobacco Control Research Collection: at UCSF, architected the American Legacy Foundation-funded Legacy National Tobacco Documents Library, some 40 million page images released by court order from the world's major tobacco companies; during the implementation phase, focused on problems of poor indexing, corrupt or missing data, and incorporation of new material via web crawlers from ever-changing industry sites; wrote the main software components that support access via persistent identifiers and the browsing of individual document pages

Tobacco Document Digitization: created a detailed processing specification for digitizing shops to use in converting paper documents into standardized digital objects suitable for incorporation into digital library collections; this included scanning to create page images, OCRing to extract searchable text, and indexing (the hand-keying of metadata such as author, title, etc.)

Search Engine: wrote a new digital library searching and ranking engine designed to fill the gap between full text engines and fielded search systems, and to scale to millions of objects; the resulting system, tested on 1.5 million text objects and their metadata, performed fast boolean searching with wildcards and term list scanning

Electronic Publishing: managed the subscription-based electronic publication of the UC Press book, "The Cigarette Papers"; this experiment explored technical and marketing aspects of online publishing with access fees

Printer Cost Recovery: at UCSF, architected a network-based charging mechanism for publicly available print stations, a problem that plagued libraries across the country; the resulting system was probably the first low-cost solution enabling the library directly to manage all aspects of the authorization and routing technology involved

Authentication Infrastructure: at UCSF, architected the development of a network-queryable campus authentication infrastructure, requiring coordination with the campus units that supply payroll and student registration data on an ongoing basis; oversaw the development of various campus proxy servers

Computer Operating Systems

Performance Analysis: wrote programs to summarize and plot UNIX 4.2BSD file system activity; debugged kernel traces and a multi-strategy cache simulator based on them (presented at ACM SigOps, 1985)

Heterogeneous Distributed Systems: worked on a project (1985) to bring up ULTRIX and VMS clients on a network of fully sharable computing resources, including file servers and print servers; wrote ULTRIX kernel code to implement a transport-level network interface, consisting of routines to generate server-side programs in a simple network command language

Command-Level Operating System Primitives: conceived a method of general-purpose software tool design; wrote three UNIX tools distributed with 4.2/4.3BSD using this method (Master's Degree Project, 1983); these tools ("jot", "rs", and "lam") are distributed in current Mac OS X systems

VLSI Coprocessor Design: worked on the design for a decimal coprocessor that IBM implemented along with Micro/370, an IBM/370 architecture on a chip; wrote and tested microcode for the decimal EDIT AND MARK instruction (1984)

Terminal Capability Database: in the early 1980s, maintained the Berkeley UNIX termcap file for a period of about five years; this file remains an integral part of modern Linux and other UNIX-based systems

Communication/Instruction

Intellectual Property: consulted with the World Intellectual Property Organization in Geneva regarding design of a federated IP digital library (covering patents, trademarks, designs, and traditional knowledge), and developed a proof-of-concept demonstration system

Instructional Support: at UCSF, managed the creation of a flexible software system for faculty to author and maintain course home pages

Instruction: at UC Berkeley, accrued ten years' experience in general consulting (advising) with users, students, and programmers on UNIX and IBM CMS; teaching assistant for undergraduate computer architecture; seven years' experience designing and teaching courses on UNIX and CMS; outside UCB, gave several week-long corporate training sessions (at AT&T, NCC, IRS) on Common Lisp and UNIX

SIGWEB: created and organized a 400-person, all-day event at UCB for the SF Bay Area special interest group in state of the art network information retrieval applications; recruited most of the speakers and presented at this 1994 meeting

Common Lisp: as a private consultant, was designer and principal author of the 899-page manual, Common Lisp: the Reference (Addison-Wesley, 1988); supervised two co-authors for this work

PROFESSIONAL HISTORY

University of California Office of the President
US National Library of Medicine
University of California San Francisco
University of California Berkeley

EDUCATION

MS in Computer Science, UC Berkeley, all but thesis (1985)
BA, UC Berkeley, Double Major in Math and Computer Science (1982)

PUBLICATIONS

[ARK: One of] A Dozen Primers on Standards, J. Kunze, Computers in Libraries, Vol 24, Issue 2, February 2004, http://www.infotoday.com/cilmag/feb04/primers.shtml

Towards Electronic Persistence Using ARK Identifiers, J. Kunze, Proceedings of the 3rd ECDL Workshop on Web Archives, August 2003, http://bibnum.bnf.fr/ecdl/2003/proceedings.php?f=kunze

Reference Models for Digital Libraries: Actors and Roles, J. Borbinha, J. Kunze, et al. DELOS/NSF Working Group Report, July 2003, http://www.dli2.nsf.gov/internationalprojects/working_group_reports/actors_final_report.pdf

A Metadata Kernel for Electronic Permanence, J. Kunze, Journal of Digital Information, Vol 2, Issue 2, January 2002, ISSN 1368-7506, http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/

NISO/ANSI Z39.85-2001, Dublin Core Metadata Element Set, July 2001

The ARK Persistent Identifier Scheme, J. Kunze, R. Rodgers, March 2001, (work in progress) http://www.ietf.org/internet-drafts/draft-kunze-ark-01.txt

RFC 2731, Encoding Dublin Core Metadata in HTML, J. Kunze, December 1999, ftp://ftp.isi.edu/in-notes/rfc2731.txt

RFC 2413, Dublin Core Metadata for Resource Discovery, S. Weibel, J. Kunze, C. Lagoze, M. Wolf, September 1998, ftp://ftp.isi.edu/in-notes/rfc2413.txt

RFC 2056, Uniform Resource Locators for Z39.50, R. Denenberg, J. Kunze, D. Lynch, November 1996, ftp://ftp.isi.edu/in-notes/rfc2056.txt

RFC 1736, Functional Recommendations for Internet Resource Locators, J. Kunze, February 1995, ftp://ftp.isi.edu/in-notes/rfc1736.txt

RFC 1625, WAIS over Z39.50-1988, M. St. Pierre, J. Fullton, K. Gamiel, J. Goldman, B. Kahle, J. Kunze, H. Morris, and F. Schiettecatte, (WAIS, Inc., CNIDR, Thinking Machines Corp., UC Berkeley, FS Consulting), June 1994, ftp://ftp.isi.edu/in-notes/rfc1625.txt

The Cigarette Papers: Issues in Publishing Materials in Multiple Formats, K. Butter, R. Chandler, J. Kunze, D-Lib Magazine, November 1996

Evolution of a Digital Library for the Health Sciences, J. Kunze, B. Warling, D-Lib Magazine, March 1996

Basic Z39.50 Server Concepts and Creation, J. Kunze, NIST Special Publication 500-229 on Z39.50 Implementation Experiences, September 1995

Resource Citations for Electronic Discovery and Retrieval, J. Kunze, January 1993, position paper distributed to the mailing list uri@bunyip.com and privately at a CNI-sponsored meeting in Denver prior to ALA

Nonbibliographic Applications of Z39.50, J. Kunze, The Public-Access Computer Systems Review 3, no. 5 (1992): 4-30 (email to listserv@uhupvm1.uh.edu: get kunze prv3n5 f=mail)

Common Lisp: the Reference, Addison-Wesley, 1988 (899 pages)

A Trace-Driven Analysis of the UNIX 4.2 BSD File System, J. Ousterhout, H. Da Costa, D. Harrison, J. Kunze, M. Kupfer, J. Thompson, Proceedings of the Tenth ACM Symposium on Operating Systems Principles, 1985

Z39.50 in a Nutshell (An Introduction to Z39.50), J. Kunze, R. P. C. Rodgers, Lister Hill National Center for Biomedical Communications, July 1995, http://www.nlm.nih.gov/pubs/staffpubs/rodgers/z39.50/z39.50.html