Friday, December 1, 2017

Logstash multiple JDBC input, multiple index output with single config file

My project currently syncs MySQL data to Elasticsearch using Logstash, where one "object type"'s table data goes into one Elasticsearch index.

Environment:
Windows 7
MySQL 5.7.10
Logstash 5.0.2
Elasticsearch 5.0.2

input {
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type.sql"
 }
}
output {
 elasticsearch {
  index => "cherryshoe_object_type_idx"
  document_id => "%{unique_id}"
 }
 stdout { }
}
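
For reference, each statement file is a plain SQL query. The one requirement for this setup is that the query selects a column aliased as "unique_id", since that value drives the elasticsearch document_id. A minimal sketch of what such a file might contain, where the table and column names are made up for illustration:

-- cherryshoe_object_type.sql (sketch - table/column names are hypothetical)
-- The "unique_id" alias must match the %{unique_id} reference in the output
SELECT obj.id            AS unique_id,
       obj.name          AS name,
       obj.modified_date AS modified_date
FROM   object_type obj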


PROBLEM:
We now needed to sync the data for two different "object types" into two separate Elasticsearch indexes.

SOLUTION:

To achieve this using a single Logstash config file:

Use the jdbc input "type" attribute in each jdbc input. In the example below, the first jdbc input defines type "object_type1" and the second defines type "object_type2".

Parameterize the "index" attribute in the elasticsearch output with the "type" attribute set by the jdbc input. Only one elasticsearch output is needed to sync "object_type1" MySQL data to the Elasticsearch "cherryshoe_object_type1_idx" index and "object_type2" MySQL data to the Elasticsearch "cherryshoe_object_type2_idx" index.

# inputs - two types of input SQLs: "object_type1" and "object_type2"
input {
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type1.sql"
  type => "object_type1"
 }
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type2.sql"
  type => "object_type2"
 }
}

# Use the "type" field to specify the index that the input(s) go to
# "unique_id" is specified in each input SQL as the document_id in the elasticsearch document
output {
 elasticsearch {
  index => "cherryshoe_%{type}_idx"
  document_id => "%{unique_id}"
 }
 
 stdout { }
}
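
With the config saved, Logstash can be pointed directly at it. A sketch of the invocation on Windows, assuming Logstash 5.0.2 is installed under C:\Apps\elasticstack\logstash-5.0.2 and the config was saved as cherryshoe_sync.conf (both paths are examples, not from the project):

cd C:\Apps\elasticstack\logstash-5.0.2
bin\logstash.bat -f C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_sync.conf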


NOTE: I don't need the jdbc input "type" field to be indexed in the elasticsearch document, so adding a mutate filter takes care of this. It copies the jdbc input type field into the event metadata, so the event metadata "type" field can be used in the parameterized elasticsearch output "index" attribute, and the original "type" field can then be removed from the event. The updated Logstash conf file is below:

# inputs - two types of input SQLs: "object_type1" and "object_type2"
input {
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type1.sql"
  type => "object_type1"
 }
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type2.sql"
  type => "object_type2"
 }
}

# Specifying "type" in the input creates an "_type" (non-indexed elasticsearch attribute) and a "type" (indexed elasticsearch attribute).
# Copy the "type" value into the event metadata, then remove "type" since it is not a required attribute in each document.
# Two separate mutate filters are used so the copy is guaranteed to run before the remove.
filter {
 mutate { add_field => { "[@metadata][type]" => "%{type}" } }
 mutate { remove_field => ["type"] }
}

# Use the event metadata "type" field to specify the index that the input(s) go to
# "unique_id" is specified in each input SQL as the document_id in the elasticsearch document
output {
 elasticsearch {
  index => "cherryshoe_%{[@metadata][type]}_idx"
  document_id => "%{unique_id}"
 }
 
 stdout { }
}
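
After a run completes, a quick way to sanity-check that both indexes exist and documents were routed correctly is to hit the local Elasticsearch instance (default port 9200) with curl or any REST client:

curl "localhost:9200/_cat/indices/cherryshoe_*?v"
curl "localhost:9200/cherryshoe_object_type1_idx/_search?size=1&pretty"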



These articles helped a lot:
https://stackoverflow.com/questions/37613611/multiple-inputs-on-logstash-jdbc
https://discuss.elastic.co/t/delete-a-field-in-filter-but-use-it-in-output/48008
https://www.elastic.co/blog/logstash-metadata

4 comments:

  1. Hello Judy,

    Nice post. Even though I have no question, I was wondering: para 2 of the solution says "object_type2" mySQL data to elasticsearch "cherryshoe_object_type1_idx" index. Isn't it "cherryshoe_object_type2_idx"?

    Thanks.

  2. Can I use the tracking_column value in place of the type parameter?

  3. Is there a way to abstract the JDBC connection details?

