Friday, December 1, 2017

Logstash multiple JDBC input, multiple index output with single config file

My project's current implementation for syncing MySQL data to Elasticsearch uses Logstash, where one "object type"'s table data goes into one Elasticsearch index.

Environment:
Windows 7
MySQL 5.7.10
Logstash 5.0.2
Elasticsearch 5.0.2

input {
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type.sql"
 }
}
output {
 elasticsearch {
  index => "cherryshoe_object_type_idx"
  document_id => "%{unique_id}"
 }
 stdout { }
}
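
For reference, the statement file referenced by statement_filepath is just the SELECT that Logstash runs against MySQL; the one requirement in this setup is that the query returns a column aliased as unique_id, since that value is used as the document_id above. As a minimal sketch with hypothetical table and column names, the same query could also be inlined with the jdbc input's statement option instead of a statement file:

input {
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  # inline SQL instead of statement_filepath; table/column names here are
  # hypothetical - what matters is the column aliased as unique_id, which
  # becomes the elasticsearch document_id
  statement => "SELECT obj.id AS unique_id, obj.name, obj.modified_date FROM object_type obj"
 }
}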


PROBLEM:
We now need the data for two different "object types" to go into two separate indexes in Elasticsearch.

SOLUTION:

To achieve this with a single Logstash config file:

Use the jdbc input's "type" attribute in each jdbc input.  In the example below, the first jdbc input defines a type of "object_type1", and the second defines a type of "object_type2".

Parameterize the "index" attribute of the elasticsearch output with the "type" value set in each jdbc input.  We only need one elasticsearch output to sync "object_type1" MySQL data to the Elasticsearch "cherryshoe_object_type1_idx" index, and "object_type2" MySQL data to the "cherryshoe_object_type2_idx" index.

# inputs - two types of input SQLs: "object_type1" and "object_type2"
input {
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type1.sql"
  type => "object_type1"
 }
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type2.sql"
  type => "object_type2"
 }
}

# Use the "type" field to specify the index that the input(s) go to
# "unique_id" is specified in each input SQL as the document_id in the elasticsearch document
output {
 elasticsearch {
  index => "cherryshoe_%{type}_idx"
  document_id => "%{unique_id}"
 }
 
 stdout { }
}


NOTE: I don't need the jdbc input's "type" field to be indexed in the Elasticsearch document, so I added a mutate filter to handle this.  It copies the jdbc input's "type" value into the event metadata and then removes the "type" field, so the metadata "type" field can still be used in the parameterized elasticsearch output "index" attribute.  The updated Logstash config file is below:

# inputs - two types of input SQLs: "object_type1" and "object_type2"
input {
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type1.sql"
  type => "object_type1"
 }
 jdbc {
  jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
  jdbc_driver_class => "com.mysql.jdbc.Driver"
  jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
  jdbc_user => "cherryshoeuser"
  jdbc_password => "cherryshoepassword"
  statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type2.sql"
  type => "object_type2"
 }
}

# Specifying "type" in the input creates an "_type"(non-indexed elasticsearch attribute) and "type"(indexed elasticsearch attribute).
# Copy the "type" value into the event metadata, and then remove "type" since it is a non-required attribute in each document
filter {
 mutate { add_field => { "[@metadata][type]" => "%{type}" } }
 mutate { remove_field => ["type"] }
}

# Use the event metadata "type" field to choose the index each event goes to
# "unique_id" is returned by each input SQL and used as the document_id of the elasticsearch document
output {
 elasticsearch {
  index => "cherryshoe_%{[@metadata][type]}_idx"
  document_id => "%{unique_id}"
 }
 
 stdout { }
}
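
One gotcha when verifying this: fields under @metadata are not written by outputs, so the plain stdout output above won't show whether [@metadata][type] was populated.  While debugging, the rubydebug codec can be told to print metadata too (this is the trick described in the logstash-metadata article linked below):

output {
 elasticsearch {
  index => "cherryshoe_%{[@metadata][type]}_idx"
  document_id => "%{unique_id}"
 }
 # debugging only - prints each event to the console including @metadata fields
 stdout { codec => rubydebug { metadata => true } }
}

Remove the metadata option (or the whole stdout output) once you've confirmed events are routing to the right indexes.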



These articles helped a lot:
https://stackoverflow.com/questions/37613611/multiple-inputs-on-logstash-jdbc
https://discuss.elastic.co/t/delete-a-field-in-filter-but-use-it-in-output/48008
https://www.elastic.co/blog/logstash-metadata

5 comments:

  1. Hello Judy,

    Nice post. Even though I have no question, I was wondering: para 2 of the solution says "object_type2" mySQL data goes to the elasticsearch "cherryshoe_object_type1_idx" index. Shouldn't it be "cherryshoe_object_type2_idx"?

    Thanks.

  2. Can I use the tracking_column value in place of the type parameter?

  3. Is there a way to abstract the jdbc connection details?

  4. Now what if we had two different connection_strings for two different databases?


I appreciate your time in leaving a comment!