Environment:
Windows 7
MySQL 5.7.10
Logstash 5.0.2
Elasticsearch 5.0.2
The original logstash conf file synced a single "object type" from mySQL into one elasticsearch index:

input {
  jdbc {
    jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
    jdbc_user => "cherryshoeuser"
    jdbc_password => "cherryshoepassword"
    statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type.sql"
  }
}
output {
  elasticsearch {
    index => "cherryshoe_object_type_idx"
    document_id => "%{unique_id}"
  }
  stdout { }
}
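The statement file referenced by statement_filepath might look like the following. This is a hypothetical sketch (the table and column names are assumptions); the important part is that the query returns a column aliased as unique_id, which the elasticsearch output uses as the document_id:

```sql
-- Hypothetical contents of cherryshoe_object_type.sql; table and column names are assumptions.
-- The alias "unique_id" must match the field referenced by document_id => "%{unique_id}".
SELECT obj.object_id AS unique_id,
       obj.name,
       obj.description,
       obj.updated_date
FROM   cherryshoe_object obj;
```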
PROBLEM:
We now needed to sync data for two different "object type"s from mySQL into two separate indexes in elasticsearch.
SOLUTION:
To achieve this with a single logstash config file -
Use the logstash jdbc input "type" attribute in each jdbc input. In the example below, the first jdbc input has a type of "object_type1", and the second jdbc input has a type of "object_type2".
Parameterize the "index" attribute in the elasticsearch output with the "type" attribute set in the jdbc input. We only need one elasticsearch output to sync "object_type1" mySQL data to the elasticsearch "cherryshoe_object_type1_idx" index, and "object_type2" mySQL data to the elasticsearch "cherryshoe_object_type2_idx" index.
# inputs - two types of input SQLs: "object_type1" and "object_type2"
input {
  jdbc {
    jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
    jdbc_user => "cherryshoeuser"
    jdbc_password => "cherryshoepassword"
    statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type1.sql"
    type => "object_type1"
  }
  jdbc {
    jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
    jdbc_user => "cherryshoeuser"
    jdbc_password => "cherryshoepassword"
    statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type2.sql"
    type => "object_type2"
  }
}
# Use the "type" field to specify the index that the input(s) go to
# "unique_id" is specified in each input SQL as the document_id in the elasticsearch document
output {
  elasticsearch {
    index => "cherryshoe_%{type}_idx"
    document_id => "%{unique_id}"
  }
  stdout { }
}
NOTE: I don't need the jdbc input "type" field to be indexed in the elasticsearch document, so adding a mutate filter takes care of this. It copies the jdbc input "type" field into the event metadata, so the event metadata "type" field can be used in the parameterized elasticsearch output "index" attribute, and the original "type" field can then be removed before the event is indexed. The updated logstash conf file is below:
# inputs - two types of input SQLs: "object_type1" and "object_type2"
input {
  jdbc {
    jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
    jdbc_user => "cherryshoeuser"
    jdbc_password => "cherryshoepassword"
    statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type1.sql"
    type => "object_type1"
  }
  jdbc {
    jdbc_driver_library => "C:\Apps\elasticstack\mysql-connector-java-5.1.40\mysql-connector-java-5.1.40-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/cherryshoe?useSSL=false"
    jdbc_user => "cherryshoeuser"
    jdbc_password => "cherryshoepassword"
    statement_filepath => "C:\workspaces\cherryshoe-team\logstash\conf\cherryshoe_object_type2.sql"
    type => "object_type2"
  }
}
# Specifying "type" in the input creates both a "_type" (non-indexed elasticsearch attribute) and a "type" (indexed elasticsearch attribute).
# Copy the "type" value into the event metadata, then remove "type" since it is not a required attribute in each document
filter {
  mutate { add_field => { "[@metadata][type]" => "%{type}" } }
  mutate { remove_field => ["type"] }
}
# Use the event metadata "type" field to specify the index that the input(s) go to
# "unique_id" is specified in each input SQL as the document_id in the elasticsearch document
output {
  elasticsearch {
    index => "cherryshoe_%{[@metadata][type]}_idx"
    document_id => "%{unique_id}"
  }
  stdout { }
}
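After logstash has run, the two indexes can be spot-checked from the command line. These curl commands are a sketch assuming elasticsearch is running on its default host and port:

```shell
# List the indexes created by the pipeline
curl "http://localhost:9200/_cat/indices/cherryshoe_*?v"

# Pull back one document from each index to verify the data and the document_id
curl "http://localhost:9200/cherryshoe_object_type1_idx/_search?size=1&pretty"
curl "http://localhost:9200/cherryshoe_object_type2_idx/_search?size=1&pretty"
```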
These articles helped a lot:
https://stackoverflow.com/questions/37613611/multiple-inputs-on-logstash-jdbc
https://discuss.elastic.co/t/delete-a-field-in-filter-but-use-it-in-output/48008
https://www.elastic.co/blog/logstash-metadata
Hello Judy,
Nice post. Even though I have no question, I was wondering: para 2 of the solution says "object_type2" mySQL data to elasticsearch "cherryshoe_object_type1_idx" index. Shouldn't it be "cherryshoe_object_type2_idx"?
Thanks.
Yes, thank you, the typo has been fixed.
Can I use the tracking_column value in place of the type parameter?
Is there a way to abstract the jdbc connection details?
Now what if we had two different connection_strings from two different databases?