SUrface File Format Support
The preferred file type for documents housed in the SUrface institutional repository is Adobe Acrobat PDF. Materials submitted as Microsoft Word (doc and docx) or Rich Text Format (rtf) are automatically converted to PDF, and copies of the originals are retained for preservation purposes. Other file types are not automatically converted to PDF as part of the submission process. For documents in “known” rather than “supported” formats (see the definitions below), the SUrface administrators recommend submitting a second version in a “supported” format, PDF where possible, along with the original.
When a file is submitted to SUrface, we assign it one of the following categories:
- Supported: we support this format. There is a high likelihood that its content, appearance, and functions will be preserved over time.
- Known: we recognize this format but cannot guarantee its support over time. Among file types in this category, Microsoft Word (doc and docx) and Rich Text Format (rtf) are automatically converted to PDF, and the original submitted document is preserved as a backup copy.
- Not Supported: we do not recognize this format. At best, the bit stream will be preserved, but not its appearance or functionality. SUrface staff will work with the author(s)/contributor(s) to determine if the document can be hosted.
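The category assignment described above can be sketched as a simple lookup on the file extension. This is a minimal illustration only, not SUrface's actual submission logic: the names `LEVEL_BY_EXTENSION`, `AUTO_CONVERTED`, and `classify` are hypothetical, and the mapping holds just a handful of entries taken from the format table later in this document.

```python
from pathlib import PurePath

# Illustrative subset of the format table; the real list is longer.
# Note: the table lists rtf as "supported", although the category text
# groups it with Word for automatic conversion.
LEVEL_BY_EXTENSION = {
    "pdf": "supported",
    "txt": "supported",
    "tif": "supported",
    "tiff": "supported",
    "rtf": "supported",
    "doc": "known",
    "docx": "known",
    "xls": "known",
    "wav": "known",
}

# Formats the text says are converted to PDF automatically on submission.
AUTO_CONVERTED = {"doc", "docx", "rtf"}


def classify(filename: str) -> tuple[str, bool]:
    """Return (level, converted_to_pdf) for a submitted file name."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    # Anything not in the mapping falls through to "not supported".
    level = LEVEL_BY_EXTENSION.get(ext, "not supported")
    return level, ext in AUTO_CONVERTED
```

Under these assumptions, `classify("thesis.docx")` yields a "known" level with the conversion flag set, while an unrecognized extension falls into "not supported", the case in which staff would work with the contributor directly.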
File formats have the greatest likelihood of preservation into the future when they exhibit all or many of the following characteristics: open documentation; support across a range of software platforms; wide adoption; no compression (or lossless compression); no embedded files or embedded programs/scripts; and a non-proprietary format.
For supported formats, such as PDF or TIFF, we might choose to bulk-transform files from a current format version to a future one. SUrface staff will continually monitor formats and techniques to ensure we can accommodate needs as they arise.
All computer file formats depend on the availability of the appropriate software to render the functions and appearance intended by the file’s creator. Over time, older software applications may no longer function on new computer platforms, leaving the files created with those applications inoperable. As migration paths become available, SUrface will provide support for converting files so that they may remain easily accessible. Extremely popular but proprietary formats (such as Microsoft .doc, .xls, and .ppt) are more likely to remain accessible into the future simply because their prevalence makes it likely tools will be available. However, the proprietary nature of many specific file types makes it impossible to make preservation guarantees.
File Formats
The following list is neither exhaustive nor exclusive, but is meant to give a sense of the variety of formats that might be housed in SUrface. The SUrface team will partner with SU researchers to explore ways to support file formats not included on this list.
| File Format | Extensions | MIME type | Level |
|---|---|---|---|
| Adobe PDF | pdf | application/pdf | supported |
| XML | xml | text/xml | supported |
| Text | txt, asc | text/plain | supported |
| HTML | htm, html | text/html | supported |
| OpenDocument Text | odt | application/vnd.oasis.opendocument.text | supported |
| OpenDocument Presentation | odp | application/vnd.oasis.opendocument.presentation | supported |
| OpenDocument Spreadsheet | ods | application/vnd.oasis.opendocument.spreadsheet | supported |
| Rich Text Format | rtf, rtx | text/richtext | supported |
| MARC | marc, mrc | application/marc | supported |
| JPEG | jpeg, jpg | image/jpeg | supported |
| GIF | gif | image/gif | supported |
| PNG | png | image/png | supported |
| TIFF | tiff, tif | image/tiff | supported |
| AIFF | aiff, aif, aifc, iff | audio/x-aiff | supported |
| PostScript | ps, eps | application/postscript | supported |
| Microsoft Word | doc, docx | application/msword | known |
| Microsoft PowerPoint | ppt, pptx | application/vnd.ms-powerpoint | known |
| Microsoft Excel | xls | application/vnd.ms-excel | known |
| WordPerfect | wpd | application/wordperfect5.1 | known |
| audio/basic | au, snd | audio/basic | known |
| WAV | wav | audio/x-wav | known |
| MPEG | mpeg, mpg, mpe | video/mpeg | known |
| Microsoft Visio | vsd | application/vnd.visio | known |
| FMP3 | fm | application/x-filemaker | known |
| BMP | bmp | image/x-ms-bmp | known |
| Photoshop | psd, pdd | application/x-photoshop | known |
| Video Quicktime | mov, qt | video/quicktime | known |
| MPEG Audio | mpa, abs, mpega | audio/x-mpeg | known |
| Microsoft Project | mpp, mpx, mpd | application/vnd.ms-project | known |
| Mathematica | ma | application/mathematica | known |
| LaTeX | latex | application/x-latex | known |
| TeX | tex | application/x-tex | known |
| TeX dvi | dvi | application/x-dvi | known |
| SGML | sgm, sgml | application/sgml | known |
| RealAudio | ra, ram | audio/x-pn-realaudio | known |
| AutoCAD | dwg | | known |
| AutoCAD Exchange Format | dxf | | known |
| AutoCAD Internet Files | dwf | | known |
| DjVu | djv | | known |
| RealVideo | ra, ram | video/x-pn-realvideo | known |
| Unknown | | application/octet-stream | unknown |
Note: Microsoft Word and Rich Text Format files are automatically converted to PDF and distributed in that format as part of the submission process. The original document is preserved but is not distributed via SUrface unless the author arranges with SUrface to release it as a supplemental version.
XML file submissions should, ideally, be accompanied by a validation schema and a stylesheet.
For spreadsheets, consider providing a text-based format, such as tab- or comma-delimited text, in addition to Excel or OpenOffice files, especially if the file includes calculations, formulas, or other special attributes.
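As a rough illustration of the delimited-text suggestion above, the sketch below writes tabular data as comma- or tab-delimited text using Python's standard `csv` module. The function name `rows_to_delimited` and the sample rows are hypothetical. The export records computed values only, so any formulas in the spreadsheet are not carried over, which is one reason to submit the delimited copy alongside the original rather than in place of it.

```python
import csv
import io


def rows_to_delimited(rows, delimiter=","):
    """Serialize rows (lists of cell values) as delimited text."""
    buffer = io.StringIO()
    # A fixed "\n" line terminator keeps the output identical across platforms.
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data; the last column holds values that a
# spreadsheet would have computed with a formula.
rows = [
    ["sample", "mass_g", "volume_ml", "density_g_per_ml"],
    ["A", 12.5, 5.0, 2.5],
    ["B", 9.0, 4.5, 2.0],
]

comma_text = rows_to_delimited(rows)                 # comma-delimited copy
tab_text = rows_to_delimited(rows, delimiter="\t")   # tab-delimited copy
```

Either string could then be saved with a `txt` extension, which the table above lists as a supported text format.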