Attachment file types for indexing
This topic documents the supported attachment file types searchable by their content and metadata. Generally, every attachment is indexed by its metadata: file name, file extension, author ID, author name, update date and file size. The content of attachment file types listed in this topic is also indexed, and therefore searchable, unless otherwise noted.
Attachment content is indexed, and therefore searchable, only if the system configuration property attachments.indexingOfAttachmentContent.enabled is enabled. (It is enabled by default.) Polarion has a hard-coded set of attachment file types whose content is indexed by default. Administrators can extend or reduce this set with the following properties in the polarion.properties file:
attachments.indexingOfAttachmentContent.additionalFileTypes
attachments.indexingOfAttachmentContent.fileTypesToIgnore
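The following Python sketch models how the enable switch and the two properties above might combine with the built-in defaults. It is a hypothetical illustration, not Polarion's implementation. It assumes that property values are comma-separated lists, that extensions are compared case-insensitively, and that fileTypesToIgnore takes precedence over additionalFileTypes. DEFAULT_TYPES is only a small excerpt of the real default list.

```python
# Hypothetical model of the content-indexing configuration.
# ASSUMPTIONS (not taken from Polarion's source): comma-separated values,
# case-insensitive matching, and fileTypesToIgnore wins over additionalFileTypes.

DEFAULT_TYPES = {"doc", "docx", "pdf", "txt", "xml"}  # excerpt of the default list

ENABLED = "attachments.indexingOfAttachmentContent.enabled"
ADDITIONAL = "attachments.indexingOfAttachmentContent.additionalFileTypes"
IGNORE = "attachments.indexingOfAttachmentContent.fileTypesToIgnore"


def parse_list(value: str) -> set[str]:
    """Split a comma-separated property value into lowercase extensions."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def effective_types(props: dict[str, str]) -> set[str]:
    """Extensions whose content would be indexed under the assumptions above."""
    if props.get(ENABLED, "true").strip().lower() != "true":
        return set()  # content indexing is switched off entirely
    added = parse_list(props.get(ADDITIONAL, ""))
    ignored = parse_list(props.get(IGNORE, ""))
    return (DEFAULT_TYPES | added) - ignored


props = {ADDITIONAL: "csv, dat", IGNORE: "txt"}
print(sorted(effective_types(props)))  # ['csv', 'dat', 'doc', 'docx', 'pdf', 'xml']
```

Treating the ignore list as the final filter means an extension listed in both properties is not indexed; whether Polarion resolves such a conflict the same way is an assumption.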
Polarion uses the Tika library for indexing. Only the file types supported by this library can be parsed. For a complete list of those types, please consult the Tika documentation.
Polarion selects the files to pass to Tika based on their file extension (without the period character), but Tika parses files based on an analysis of their content, not their file extension. A file with an incorrect extension can therefore be indexed when it should not be, or be excluded from indexing when it should be included.
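The two cases above can be illustrated with a short Python sketch. The selection step looks only at the extension, while the detection step looks at the leading bytes. detect_by_content() is a hypothetical toy stand-in for Tika's content detection that recognizes just two signatures, CONTENT_INDEXED is an illustrative excerpt of the default list, and case-insensitive extension matching is an assumption.

```python
# Toy illustration of extension-based selection vs. content-based detection.
# detect_by_content() is NOT Tika; it only checks two well-known signatures.
from pathlib import PurePath

CONTENT_INDEXED = {"pdf", "txt", "docx"}  # excerpt, for illustration only


def extension_of(filename: str) -> str:
    """Extension without the leading period, lowercased (case handling assumed)."""
    return PurePath(filename).suffix.lstrip(".").lower()


def detect_by_content(data: bytes) -> str:
    """Guess a media type from the first bytes of the file."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"PK\x03\x04"):
        return "application/zip"  # also the container format of docx, xlsx, odt, ...
    return "application/octet-stream"


def describe(filename: str, data: bytes) -> tuple[bool, str]:
    """Return (selected by extension?, type detected from content)."""
    return extension_of(filename) in CONTENT_INDEXED, detect_by_content(data)


# A PDF saved with a .dat extension is skipped although its content is parseable.
print(describe("spec.dat", b"%PDF-1.7 ..."))    # (False, 'application/pdf')
# A ZIP archive renamed to .txt is selected although zip is not indexed by default.
print(describe("notes.txt", b"PK\x03\x04..."))  # (True, 'application/zip')
```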
General categories indexed by default
The following attachment file categories have content indexed by default.
HyperText Markup Language
XML and derived formats
Microsoft Office document formats (except .csv)
OpenDocument Format
iWorks document formats
WordPerfect document formats
Portable Document Format
Electronic Publication Format
Rich Text Format
Text formats
Mail formats
Crypto formats
Source code
General categories not indexed by default
The following attachment file categories do not have content indexed by default.
Compression and packaging formats
Java class files and archives
Help formats
Feed and Syndication formats
Audio formats
Image formats
Video formats
CAD formats
Font formats
Scientific formats
Executable programs and libraries
Database formats
Default indexed attachment file types
apxl, asp, aspx, c, cc, cpp, cxx, doc, docm, docx, dotm, dotx, dwfx, epub, fb2, fbz, groovy, htm, html, ibook, java, key, mbox, mdb, mpp, msg, ncx, numbers, odc, odf, odft, odg, odi, odm, odp, ods, odt, ole, opf, otc, otg, oth, oti, otp, ots, ott, p7c, p7m, p7s, pages, pdf, pot, potm, potx, ppa, ppam, pps, ppsm, ppsx, ppt, pptm, pptx, pst, pub, rfc822, rtf, sldasm, slddrw, sldm, sldprt, sxw, tnef, tsd, txt, vsd, vsdm, vsdx, vssm, vssx, vstm, vstx, wp6, wps, xht, xhtml, xla, xlam, xlr, xls, xlsb, xlsm, xlsx, xlt, xltm, xltx, xlw, xml, xps
Administrators, please note: when specifying extensions in system properties, do not include a preceding period (.) character. Wildcards are not supported.
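Before editing polarion.properties, an administrator could check a planned extension list against these two rules with a small helper such as the hypothetical invalid_entries() below. It encodes only the rules stated above. The comma separator and the treatment of "?" as a wildcard character are assumptions.

```python
# Hypothetical pre-flight check for an extension list destined for
# polarion.properties. Encodes only "no leading period" and "no wildcards";
# the comma separator and treating "?" as a wildcard are assumptions.

WILDCARD_CHARS = set("*?")


def invalid_entries(value: str) -> list[str]:
    """Return the entries that break the no-period / no-wildcard rules."""
    problems = []
    for entry in (item.strip() for item in value.split(",")):
        if not entry:
            continue  # ignore empty items such as a trailing comma
        if entry.startswith(".") or WILDCARD_CHARS & set(entry):
            problems.append(entry)
    return problems


print(invalid_entries("csv, .dat, xls*"))  # ['.dat', 'xls*']
print(invalid_entries("csv, dat"))         # []
```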
Common file types not indexed by default
csv, dat, svg, svgz, zip
In general, any other file extension not covered by General categories indexed by default (above), such as the extensions of media files.
To have csv files indexed with content, an administrator must add the csv extension, and any other extensions for this file type that users may attach, to the system property attachments.indexingOfAttachmentContent.additionalFileTypes.
Extensions must be specified explicitly, without a preceding period (.) character. Wildcards are not supported.
Archive files, such as zip, are not included in the default list of content-indexed file types. If you explicitly add such a file type to the attachments.indexingOfAttachmentContent.additionalFileTypes property, the content of every file inside an archive attachment whose type the Tika library supports will be indexed.
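The consequence of adding zip can be sketched in Python: once an archive is passed on, its members are filtered by what Tika can parse, not by Polarion's extension lists. TIKA_PARSEABLE is a hypothetical, heavily abbreviated stand-in for Tika's real format support. Matching members by extension is also a simplification, because Tika detects member types by content.

```python
# Sketch: which members of a zip attachment would be parsed once "zip" is
# added to additionalFileTypes. TIKA_PARSEABLE is a hypothetical excerpt;
# extension matching simplifies Tika's content-based detection.
import io
import zipfile
from pathlib import PurePosixPath

TIKA_PARSEABLE = {"csv", "pdf", "txt", "xml"}


def members_that_would_be_parsed(archive: bytes) -> list[str]:
    """List archive members whose (assumed) type Tika can parse."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
    return [
        name for name in names
        if PurePosixPath(name).suffix.lstrip(".").lower() in TIKA_PARSEABLE
    ]


# Build a small archive in memory for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/results.csv", "a,b\n1,2\n")
    zf.writestr("logo.bin", b"\x00\x01")

print(members_that_would_be_parsed(buf.getvalue()))  # ['data/results.csv']
```

In this sketch the csv member is parsed even though csv is not in the default list, which mirrors the behavior described above for archive contents.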