SURFACE File Format Support

The preferred file type for documents housed in the SURFACE institutional repository is Adobe Acrobat PDF. SURFACE automatically converts materials submitted as Microsoft Word (doc and docx) or Rich Text Format (rtf) to PDF and retains copies of the originals for preservation purposes. Other file types are not automatically converted to PDF as part of the submission process. For documents in “known” rather than “supported” formats (see the definitions below), the SURFACE administrators recommend submitting a second version in a “supported” format, PDF where possible, along with the original.

When a file is submitted to SURFACE, we assign it one of the following categories:

  • Supported: we support this format. There is a high likelihood that its content, appearance, and functions will be preserved over time.
  • Known: we recognize this format but cannot guarantee its support over time. Among file types in this category, Microsoft Word (doc and docx) and Rich Text Format (rtf) files are automatically converted to PDF, and the original submitted document is preserved as a backup copy.
  • Not Supported: we do not recognize this format. At best, the bit stream will be preserved but not appearance or functionality. SURFACE staff will work with the author(s)/contributor(s) to determine if the document can be hosted.

File formats have the greatest likelihood of preservation into the future when they exhibit all or many of the following characteristics: open documentation; support across a range of software platforms; wide adoption; no compression (or lossless compression); no embedded files or embedded programs/scripts; and a non-proprietary format.

For supported formats, such as PDF or TIFF, we might choose to bulk-transform files from a current format version to a future one. SURFACE staff will continually monitor formats and techniques to ensure we can accommodate needs as they arise.

All computer file formats depend on the availability of the appropriate software to render the functions and appearance intended by the file’s creator. Over time, older software applications may no longer function on new computer platforms, leaving the files created with those applications inoperable. As migration paths become available, SURFACE will provide support for converting files so that they remain easily accessible. Extremely popular but proprietary formats (such as Microsoft .doc, .xls, and .ppt) are more likely to remain accessible into the future simply because their prevalence makes it likely that tools will be available. However, the proprietary nature of many specific file types makes it impossible to make preservation guarantees.

File Formats

The following list is neither exhaustive nor exclusive, but is meant to give a sense of the variety of formats that might be housed in SURFACE. The SURFACE team will partner with SU researchers to explore ways to support file formats not included on this list.

File Format | Extensions | MIME type | Level
--- | --- | --- | ---
Adobe PDF | pdf | application/pdf | supported
XML | xml | text/xml | supported
Text | txt, asc | text/plain | supported
HTML | htm, html | text/html | supported
OpenDocument Text | odt | application/vnd.oasis.opendocument.text | supported
OpenDocument Presentation | odp | application/vnd.oasis.opendocument.presentation | supported
OpenDocument Spreadsheet | ods | application/vnd.oasis.opendocument.spreadsheet | supported
Rich Text Format | rtf, rtx | text/richtext | supported
MARC | marc, mrc | application/marc | supported
JPEG | jpeg, jpg | image/jpeg | supported
GIF | gif | image/gif | supported
PNG | png | image/png | supported
TIFF | tiff, tif | image/tiff | supported
AIFF | aiff, aif, aifc, iff | audio/x-aiff | supported
PostScript | ps, eps | application/postscript | supported
Microsoft Word | doc, docx | application/msword | known
Microsoft PowerPoint | ppt, pptx | application/vnd.ms-powerpoint | known
Microsoft Excel | xls | application/vnd.ms-excel | known
WordPerfect | wpd | application/wordperfect5.1 | known
audio/basic | au, snd | audio/basic | known
WAV | wav | audio/x-wav | known
MPEG | mpeg, mpg, mpe | video/mpeg | known
Microsoft Visio | vsd | application/vnd.visio | known
FMP3 | fm | application/x-filemaker | known
BMP | bmp | image/x-ms-bmp | known
Photoshop | psd, pdd | application/x-photoshop | known
Video QuickTime | mov, qt | video/quicktime | known
MPEG Audio | mpa, abs, mpega | audio/x-mpeg | known
Microsoft Project | mpp, mpx, mpd | application/vnd.ms-project | known
Mathematica | ma | application/mathematica | known
LaTeX | latex | application/x-latex | known
TeX | tex | application/x-tex | known
TeX dvi | dvi | application/x-dvi | known
SGML | sgm, sgml | application/sgml | known
RealAudio | ra, ram | audio/x-pn-realaudio | known
AutoCAD | dwg | | known
AutoCAD Exchange Format | dxf | | known
AutoCAD Internet Files | dwf | | known
DjVu | djv | | known
RealVideo | ra, ram | video/x-pn-realvideo | known
Unknown | | application/octet-stream | unknown
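
The table above can be read as a lookup from file extension to MIME type and preservation level. The following Python sketch illustrates that reading for a handful of rows; the names `FORMAT_TABLE`, `AUTO_CONVERTED`, and `classify` are hypothetical and are not part of any SURFACE software, and the dictionary is only an excerpt of the list.

```python
from pathlib import Path

# Hypothetical excerpt of the format list: extension -> (MIME type, level).
FORMAT_TABLE = {
    "pdf": ("application/pdf", "supported"),
    "xml": ("text/xml", "supported"),
    "txt": ("text/plain", "supported"),
    "tif": ("image/tiff", "supported"),
    "tiff": ("image/tiff", "supported"),
    "rtf": ("text/richtext", "supported"),
    "doc": ("application/msword", "known"),
    "docx": ("application/msword", "known"),
    "xls": ("application/vnd.ms-excel", "known"),
    "wav": ("audio/x-wav", "known"),
}

# Formats this page says are converted to PDF during submission.
AUTO_CONVERTED = {"doc", "docx", "rtf"}


def classify(filename):
    """Return the extension, MIME type, level, and conversion flag for a file name."""
    ext = Path(filename).suffix.lower().lstrip(".")
    # Anything not in the table falls back to the "Unknown" row.
    mime, level = FORMAT_TABLE.get(ext, ("application/octet-stream", "unknown"))
    return {
        "extension": ext,
        "mime": mime,
        "level": level,
        "converted_to_pdf": ext in AUTO_CONVERTED,
    }


if __name__ == "__main__":
    for name in ("Thesis.DOCX", "scan.tif", "model.xyz"):
        print(name, classify(name))
```

A lookup keyed on extension alone is a simplification: it cannot tell apart files that share an extension, and it says nothing about whether the file content actually matches its name.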

Note: Microsoft Word and Rich Text Format files will be automatically converted to PDF and distributed in that format as part of the submission process. The original document will be preserved but will not be distributed via SURFACE unless the author makes arrangements with SURFACE to release it as a supplemental version. XML file submissions should, ideally, be accompanied by a validation schema and stylesheet. Consider providing a text-based format, such as tab- or comma-delimited text, in addition to Excel or OpenOffice files, especially if calculations, formulas, and other special attributes are included in the file.
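
As one illustration of the delimited-text recommendation, the sketch below writes the same rows as comma-delimited and tab-delimited text using Python's standard `csv` module. The function name `rows_to_delimited` and the sample data are hypothetical. Calculated cells are written as their resulting values, since delimited text cannot carry formulas.

```python
import csv
import io


def rows_to_delimited(rows, delimiter=","):
    """Serialize rows (lists of cell values) as delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical spreadsheet content; the last column stands in for a formula,
# stored here as its computed value.
rows = [
    ["sample", "mass_g", "volume_ml", "density_g_per_ml"],
    ["A", 12.5, 5.0, 12.5 / 5.0],
    ["B", 9.0, 4.5, 9.0 / 4.5],
]

csv_text = rows_to_delimited(rows)                  # comma-delimited
tsv_text = rows_to_delimited(rows, delimiter="\t")  # tab-delimited

if __name__ == "__main__":
    print(csv_text)
```

Submitting a file like this alongside the spreadsheet keeps the data readable even if the spreadsheet application is no longer available.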