Search and metadata

Often when implementing customizations in Telligent Community, we need to add metadata to content beyond the fields the platform supports natively. ExtendedAttributes make this easy, as does the ContentMetadata add-on I created. This works well for most scenarios: the metadata travels with the content, so it can be used, for example, when presenting the data in a widget. Often, however, we also need to filter or search on this data, which requires adding the metadata to the search index. For example, we may want to tag content with the language it was written in.

An example

In the example below, I pull the language from an ExtendedAttribute and assume that the value was set either in the UI or by some logic when the content was created. I’m also limiting this example to forum threads. This is similar to a customization we implemented for a customer who wanted threads, but not replies, tagged with a language; the assumption was that replies would be in the same language as the original post. The tag was then used to build a custom search facet filter on the search results page, which let users filter search results by language. You could also include the new field in any search query to filter results by language.
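
If you query Solr directly, filtering and faceting on the new field come down to the standard `fq` and `facet.field` request parameters. The sketch below builds such a query string for Solr's /select handler. It assumes the language is indexed in a dynamic string field named `language_s` and that language values are simple codes such as "fr"; the function name `build_language_query` is hypothetical. Inside a Telligent widget you would more likely go through the platform's own search API, which this sketch does not attempt to model.

```python
from urllib.parse import urlencode


def build_language_query(text: str, language: str, facet: bool = True) -> str:
    """Build a Solr /select query string restricted to a single language.

    Assumes (hypothetically) that the language was indexed in the dynamic
    string field 'language_s' and that `language` contains no quote characters.
    """
    params = [
        ("q", text),
        # fq filters the result set without changing relevance scores
        ("fq", f'language_s:"{language}"'),
    ]
    if facet:
        # Per-language document counts: the data a language facet filter displays
        params += [("facet", "true"), ("facet.field", "language_s")]
    return urlencode(params)


print(build_language_query("password reset", "fr"))
# q=password+reset&fq=language_s%3A%22fr%22&facet=true&facet.field=language_s
```

Putting the language restriction in `fq` rather than in `q` keeps it out of relevance scoring, and Solr can cache the filter independently of the search text.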

Dynamic fields

To make it easy to add custom fields to the Solr index, the schema.xml config file in the Solr instance contains some wildcard (dynamic field) definitions. This means you can add as many fields as you like without needing any extra environment configuration to support them. If you take a look at the schema.xml file in your Solr/Config install location, you will see a section like the following.

   <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
   <dynamicField name="*_imv" type="int" indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_s" type="string" indexed="true" stored="true" />
   <dynamicField name="*_smv" type="string" indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_l" type="long" indexed="true" stored="true"/>
   <dynamicField name="*_lmv" type="long" indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
   <dynamicField name="*_txt" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_en" type="text_en" indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
   <dynamicField name="*_bmv" type="boolean" indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_f" type="float" indexed="true" stored="true"/>
   <dynamicField name="*_fmv" type="float" indexed="true" stored="true" multiValued="true"/>
   <dynamicField name="*_d" type="double" indexed="true" stored="true"/>
   <dynamicField name="*_dmv" type="double" indexed="true" stored="true" multiValued="true"/>

This allows many different dynamic field types to be created, and each is stored in the index as the correct data type. This means that if you want to store an int, a date, or even an array of strings, you can store them and then query them using greater-than/less-than, range, and other query types. For example, when you set the field key to ‘language_s’, the value is stored as a string.
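
To make the suffix convention concrete, here is a small Python sketch that resolves a field name to the Solr type it would receive under the schema.xml excerpt above. The `DYNAMIC_FIELDS` table and `resolve_dynamic_field` function are hypothetical helpers written for illustration; they mirror Solr's documented matching rule that longer dynamicField patterns are tried first, with ties going to whichever pattern appears first in the schema.

```python
from typing import Optional, Tuple

# Suffix -> (Solr field type, multiValued), copied from the schema.xml excerpt.
# Dict order mirrors schema order, which matters for equal-length ties.
DYNAMIC_FIELDS = {
    "_i": ("int", False),
    "_imv": ("int", True),
    "_s": ("string", False),
    "_smv": ("string", True),
    "_l": ("long", False),
    "_lmv": ("long", True),
    "_t": ("text_general", False),
    "_txt": ("text_general", True),
    "_en": ("text_en", True),
    "_b": ("boolean", False),
    "_bmv": ("boolean", True),
    "_f": ("float", False),
    "_fmv": ("float", True),
    "_d": ("double", False),
    "_dmv": ("double", True),
}


def resolve_dynamic_field(name: str) -> Optional[Tuple[str, bool]]:
    """Return (type, multiValued) for a field name, or None if no pattern matches."""
    # Longest suffix first; sorted() is stable, so equal lengths keep schema order
    for suffix in sorted(DYNAMIC_FIELDS, key=len, reverse=True):
        if name.endswith(suffix):
            return DYNAMIC_FIELDS[suffix]
    return None


print(resolve_dynamic_field("language_s"))  # ('string', False)
print(resolve_dynamic_field("tags_smv"))    # ('string', True)
```

A name that matches none of the patterns, such as plain `language`, resolves to nothing here; Solr itself would typically reject that field when indexing unless the schema defines it explicitly.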

The plugin

To add these fields to the Solr index when content is indexed, all we need to do is create a simple IPlugin and register an event handler for ISearchIndexing.Events.BeforeBulkIndex. In the event handler, we loop through the documents [1] and set the field value after pulling it from ExtendedAttributes.
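
The real plugin is a .NET class wired to ISearchIndexing.Events.BeforeBulkIndex. The Python sketch below only models the handler's logic so the flow is easy to follow. `SearchDocument`, `before_bulk_index`, the content-type string "forum-thread", the attribute key "Language", and the `get_extended_attribute` lookup are all hypothetical stand-ins, not Telligent API types or members.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

FORUM_THREAD = "forum-thread"     # hypothetical content-type identifier
LANGUAGE_ATTRIBUTE = "Language"   # hypothetical ExtendedAttributes key
LANGUAGE_FIELD = "language_s"     # matches *_s, so Solr stores it as a string


@dataclass
class SearchDocument:
    """Stand-in for one document in the bulk-index batch (not the Telligent type)."""
    content_id: str
    content_type: str
    fields: Dict[str, str] = field(default_factory=dict)


def before_bulk_index(documents: List[SearchDocument],
                      get_extended_attribute: Callable[[str, str], Optional[str]]) -> int:
    """Add the language field to forum-thread documents; return how many were tagged."""
    tagged = 0
    for doc in documents:
        # The batch mixes content types, so skip everything except threads (replies included)
        if doc.content_type != FORUM_THREAD:
            continue
        language = get_extended_attribute(doc.content_id, LANGUAGE_ATTRIBUTE)
        if language:  # leave the field off when no language was recorded
            doc.fields[LANGUAGE_FIELD] = language
            tagged += 1
    return tagged


# Usage: a dict plays the role of the ExtendedAttributes store
attrs = {("t1", LANGUAGE_ATTRIBUTE): "fr"}
batch = [SearchDocument("t1", FORUM_THREAD),
         SearchDocument("r1", "forum-reply"),
         SearchDocument("t2", FORUM_THREAD)]
print(before_bulk_index(batch, lambda cid, key: attrs.get((cid, key))))  # 1
```

Skipping documents that have no stored language, rather than writing an empty value, keeps untagged threads out of any language facet instead of grouping them under a blank entry.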

You can also see the source on GitHub:

[1] The event args contain all documents to be indexed in the current run. The maximum number of documents is defined by the search indexing plugin and defaults to 500. In practice, depending on how active your community is and what content was flagged for indexing since the last run, the number of documents to index will usually be much lower. The batch can include many different content types, depending on what required indexing since the last run, which is why we filter it down to only the content type we need.
