Attachment file types for indexing

This topic lists the attachment file types that can be searched by content and metadata. Generally, every attachment is indexed by its metadata: file name, file extension, author ID, author name, update date, and file size. The content of the attachment file types listed in this topic is also indexed, and therefore searchable, unless otherwise noted.

Attachment content is indexed, and therefore searchable, only if the system configuration property attachments.indexingOfAttachmentContent.enabled is enabled. (It is enabled by default.) Polarion has a hard-coded set of attachment file types whose content is indexed by default. Administrators can extend or reduce this set via system properties in the polarion.properties file. For example, extensions are added to the set with the attachments.indexingOfAttachmentContent.additionalFileTypes property.

Polarion uses the Tika library for indexing. Only the file types supported by this library can be parsed. For a complete list of those types, please consult the Tika documentation.

Warning:

Polarion selects files to pass to Tika based on their file extension (excluding the period character), but Tika indexes files based on an analysis of their content, not their file extension. Therefore, a file with an incorrect file extension can cause an unwanted file to be indexed, or a desired file to be excluded from indexing.
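The effect described in this warning can be illustrated with a minimal Python sketch. It is not Polarion's implementation: the function names, the small extension set, and the case-insensitive comparison are all assumptions made for illustration. It only shows that a selection step based purely on the extension never looks at the file's actual content.

```python
from pathlib import PurePath

# Hypothetical subset of the default content-indexed extensions (illustration only).
INDEXED_EXTENSIONS = {"pdf", "docx", "txt", "xml"}


def extension_of(file_name: str) -> str:
    """Return the extension without the leading period, lowercased.

    Lowercasing is an assumption of this sketch, not documented behavior.
    """
    return PurePath(file_name).suffix.lstrip(".").lower()


def is_selected_for_content_indexing(file_name: str) -> bool:
    """Select purely by extension; the file content is never inspected here."""
    return extension_of(file_name) in INDEXED_EXTENSIONS


if __name__ == "__main__":
    for name in ("report.pdf", "report.dat", "photo.txt"):
        print(name, is_selected_for_content_indexing(name))
```

In this sketch, a PDF document attached as report.dat would be skipped even though its content could be parsed, while an image attached as photo.txt would be passed on for parsing even though it is not text.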

General categories indexed by default

The following attachment file categories have content indexed by default.

  • HyperText Markup Language

  • XML and derived formats

  • Microsoft Office document formats (except .csv)

  • OpenDocument Format

  • iWork document formats

  • WordPerfect document formats

  • Portable Document Format

  • Electronic Publication Format

  • Rich Text Format

  • Text formats

  • Mail formats

  • Crypto formats

  • Source code

General categories not indexed by default

The following attachment file categories do not have content indexed by default.

  • Compression and packaging formats

  • Java class files and archives

  • Help formats

  • Feed and Syndication formats

  • Audio formats

  • Image formats

  • Video formats

  • CAD formats

  • Font formats

  • Scientific formats

  • Executable programs and libraries

  • Database formats

Default indexed attachment file types

apxl, asp, aspx, c, cc, cpp, cxx, doc, docm, docx, dotm, dotx, dwfx, epub, fb2, fbz, groovy, htm, html, ibook, java, key, mbox, mdb, mpp, msg, ncx, numbers, odc, odf, odft, odg, odi, odm, odp, ods, odt, ole, opf, otc, otg, oth, oti, otp, ots, ott, p7c, p7m, p7s, pages, pdf, pot, potm, potx, ppa, ppam, pps, ppsm, ppsx, ppt, pptm, pptx, pst, pub, rfc822, rtf, sldasm, slddrw, sldm, sldprt, sxw, tnef, tsd, txt, vsd, vsdm, vsdx, vssm, vssx, vstm, vstx, wp6, wps, xht, xhtml, xla, xlam, xlr, xls, xlsb, xlsm, xlsx, xlt, xltm, xltx, xlw, xml, xps

Administrators: when specifying extensions in system properties, do not include a preceding period (.) character. Wildcards are not supported.

Common file types not indexed by default

csv, dat, svg, svgz, zip

In general, content is not indexed for any file extension that does not belong to one of the General categories indexed by default (above). This includes media types, for example.

Note:

To have the content of csv files indexed, an administrator must add the csv extension, and any other extensions for this file type that users may attach, to the system property attachments.indexingOfAttachmentContent.additionalFileTypes.

Extensions must be specified explicitly, without a preceding period (.) character. Wildcards are not supported.
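As a sketch of the note above, the entry in the polarion.properties file might look like the following. The property name is taken from this topic, and the key=value layout is the standard Java properties syntax. The syntax for listing more than one extension is not described here, so only a single extension is shown.

```properties
# Sketch only: adds csv to the set of content-indexed attachment file types.
# The extension is written without a preceding period and without wildcards
# (so not ".csv" and not "*.csv").
attachments.indexingOfAttachmentContent.additionalFileTypes=csv
```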

Caution:

Archive files, such as zip, are not included in the default list of content-indexed file types. If you explicitly add such file types to the attachments.indexingOfAttachmentContent.additionalFileTypes property, every file inside an archive attachment whose type is supported by the Tika library will be indexed.