Import data from Oracle database in Apache Solr with DataImportHandlers

Kavita Jadhav
3 min readJan 12, 2019

Apache Solr is open source search platform used for enhanced search and indexing.For more details please visit Apache Solr’s official website.

If you are using Oracle database in your application and want to enable Solr search then you can use data importers provided by Solr to read data from your database and create create documents in Solr. You will be able to find this option on Solr UI once you select the core. You need to provide Oracle database details in order to use this functionality.

Lets start with creating a new core and connecting existing database to core and creating documents using DataImporterHandler.

Create a new core from terminal using Solr create command(you can also create a core from Solr UI).

solr create -c sample_core

Now open Solr UI and select sample_core. Go to DataImport tab. You will be able to see following error as you haven’t configured your DataImportHandler yet.

The solrconfig.xml file for this index does not have an operational DataImportHandler defined!

Follow below steps to configure DataImportHandler:

Download and copy solr-dataimporthandler-7.0.0.jar and ojdbc7–12.1.0.2.jar files in configsets/lib where Solr is installed. Add path of these jars in conf/solrconfig.xml as below:

<lib path="/usr/local/Cellar/solr/7.5.0/server/solr/configsets/lib/solr-dataimporthandler-7.0.0.jar"/>
<lib path="/usr/local/Cellar/solr/7.5.0/server/solr/configsets/lib/ojdbc7-12.1.0.2.jar"/>

Create a new file db_configs.xml and add your database configs in it as below:

<dataConfig>
<dataSource driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@//localhost:1521/store_db" user="readolnyuser" password="readolnypassword"/>
</dataConfig>

Create a new request handler to import this data in solrconfig.xml file. In this request handler we will be using db_configs.xml file created in above step.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">db_configs.xml</str>
</lst>
</requestHandler>

Now reload your core from Solr web UI. This time you will not see any error. Instead you will be able to see database configurations provided in db_configs.xml file.

But when you click on execute button you will be able to see error as:

Data Config problem: DataImportHandler configuration file must have one <document> node.

Here we have just created database connection. You need to create document tab But we haven’t yet added a way to read db and create documents on which search will be performed. now edit your db_configs.xml file and add details mentioning how to index documents.

<dataConfig>
<dataSource driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@//localhost:1521/store_db" user="readolnyuser" password="readolnypassword"/>
<document>
<entity name="users" query="select * from users">
<field column="FIRST_NAME" name="first_name"/>
<field column="LAST_NAME" name="last_name"/>
</entity>
</document>
</dataConfig>

Here we are reading records from users table adding first_name and last_name column values in documents. For that we need to create these fields in managed-schema file like below:

<field name="first_name" type="text_general" docValues="false" indexed="true" stored="true"/>
<field name="last_name" type="text_general" docValues="false" indexed="true" stored="true"/>

Now reload sample_core and check documents count in overview section. You will not be having documents.

Import data using solar UI. This will create documents in Solr. check documents count in overview section. You will be able to see some documents added in your core.

Now you can use same documents in Solr search. Create a new request handler to access these documents.

<requestHandler name="/search_user" class="solr.SearchHandler">
<lst name="defaults">
<str name="fl">
first_name, last_name
</str>
</lst>
</requestHandler>

Now access this request handler from Solr UI. You will be able to see these users.

You can reimport data every time when you add new records in database. This will import all documents in core. You can choose to import only updated documents by selecting delta-import option in DataImport tab. For that you need to update db_configs.xml file to identify updated records like below:

<entity name="users" query="select * from users" deltaQuery="select id from users where updated_at > '${dataimporter.last_index_time}'">

Now you can enhance db_configs.xml file to read more attributes and use multiple tables.

Happy searching…

--

--