Process AWS ELB logs with Logstash: an update!
I previously wrote about making a grok filter to process ELB logs.
I have since worked on this further and developed an updated filter, which has been working very well for some time now.
As there were some comments on the previous post, I thought I should also share the working copy I am using right now.
As it stands right now, I’ve broken the filter into 4 parts.
So what does each part do? Here it is, broken down!
Part1 – parses the main logs, and allows for the 4 weird example log scenarios.
-It also creates a new field named ‘elb_request’, which separates out the full request.
%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb_name} %{IP:elb_client_ip}:%{INT:elb_client_port:int} (?:%{IP:elb_backend_ip}:%{NUMBER:elb_backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{INT:elb_status_code:int}|-) (?:%{INT:backend_status_code:int}|-) %{INT:elb_received_bytes:int} %{INT:elb_sent_bytes:int} \"(?:%{GREEDYDATA:elb_request}|-)\" \"(?:%{GREEDYDATA:userAgent}|-)\" %{NOTSPACE:elb_sslcipher} %{NOTSPACE:elb_sslprotocol}
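To make the captures concrete, here is a rough Python sketch of what the Part1 pattern does (this is my own illustration, not part of the Logstash config: grok tokens like NOTSPACE and GREEDYDATA are approximated with plain `re` character classes, and the sample line is a made-up example in the classic ELB access log format):

```python
import re

# A sketch only (not the actual filter): the Part1 grok pattern,
# hand-translated to a Python regex to show which fields it captures.
# Grok's NOTSPACE/GREEDYDATA are approximated with \S+ and [^"]* here.
ELB_RE = re.compile(
    r'(?P<timestamp>\S+) '
    r'(?P<elb_name>\S+) '
    r'(?P<elb_client_ip>[\d.]+):(?P<elb_client_port>\d+) '
    r'(?:(?P<elb_backend_ip>[\d.]+):(?P<elb_backend_port>\d+)|-) '
    r'(?P<request_processing_time>-?[\d.]+) '
    r'(?P<backend_processing_time>-?[\d.]+) '
    r'(?P<response_processing_time>-?[\d.]+) '
    r'(?:(?P<elb_status_code>\d+)|-) '
    r'(?:(?P<backend_status_code>\d+)|-) '
    r'(?P<elb_received_bytes>\d+) '
    r'(?P<elb_sent_bytes>\d+) '
    r'"(?P<elb_request>[^"]*)" '
    r'"(?P<userAgent>[^"]*)" '
    r'(?P<elb_sslcipher>\S+) '
    r'(?P<elb_sslprotocol>\S+)'
)

# A made-up sample line in the classic ELB access log format.
line = ('2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
        '10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 '
        '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -')

m = ELB_RE.match(line)
print(m.group('elb_name'))     # my-loadbalancer
print(m.group('elb_request'))  # GET http://www.example.com:80/ HTTP/1.1
```

Note how the `(?:...|-)` alternations mirror the grok pattern’s handling of missing backend and status fields, which is what lets Part1 cope with the weird log scenarios.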
Part2 – matches the ‘ELB Online’ message I’d like to keep.
-I keep this so I can use the ‘ELB Online’ events in dashboards.
%{GREEDYDATA:event_name} for ELB: %{NOTSPACE:elb_name} at %{TIMESTAMP_ISO8601:timestamp}
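As another hedged sketch, this is roughly what the Part2 pattern captures, in Python terms (the exact ‘ELB Online’ message text below is my assumed example):

```python
import re

# Sketch of Part2 (approximate, not the grok itself): pull the event name,
# ELB name, and timestamp out of a status message.
EVENT_RE = re.compile(
    r'(?P<event_name>.+) for ELB: (?P<elb_name>\S+) at (?P<timestamp>\S+)'
)

# Assumed example message text.
m = EVENT_RE.match('ELB Online for ELB: my-loadbalancer at 2016-04-11T07:55:00.000000Z')
print(m.group('event_name'))  # ELB Online
print(m.group('elb_name'))    # my-loadbalancer
```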
Part3 – examines the ‘elb_request’ field from Part1 and breaks it down into 4 new fields.
-One of these is a new field named ‘http_path’.
(?:%{WORD:http_method}) (?:%{DATA:http_path})? (?:%{DATA:http_type}/%{NUMBER:http_version:float})?|%{GREEDYDATA:rawrequest}
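In Python terms, Part3 works roughly like this (again a sketch of my own, with grok’s DATA tokens approximated by `\w+`/`\S+` and field names mirroring the grok ones):

```python
import re

# Sketch of Part3 (approximate, not the grok itself): split 'elb_request'
# ("GET http://... HTTP/1.1") into method, path, and HTTP type/version.
REQ_RE = re.compile(
    r'(?P<http_method>\w+) '
    r'(?P<http_path>\S+)'
    r'(?: (?P<http_type>\w+)/(?P<http_version>[\d.]+))?'
)

m = REQ_RE.match('GET http://www.example.com:80/ HTTP/1.1')
print(m.group('http_method'))   # GET
print(m.group('http_path'))     # http://www.example.com:80/
print(m.group('http_version'))  # 1.1
```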
Part4 – breaks the new ‘http_path’ field down into its component parts.
-I use this for further analysis of web trends.
(?:%{WORD:http_path_protocol}://)?(%{NOTSPACE:http_path_site}:)?(?:%{NUMBER:http_path_port:int})?(?:%{GREEDYDATA:http_path_url})?
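And a last hedged sketch for Part4 (my own approximation: grok’s NOTSPACE is narrowed to `[^\s:/]+` here so the site stops at the port colon, and the sample path is a made-up example):

```python
import re

# Sketch of Part4 (approximate, not the grok itself): break 'http_path'
# into protocol, site, port, and URL components.
PATH_RE = re.compile(
    r'(?:(?P<http_path_protocol>\w+)://)?'
    r'(?:(?P<http_path_site>[^\s:/]+):)?'
    r'(?P<http_path_port>\d+)?'
    r'(?P<http_path_url>\S*)'
)

m = PATH_RE.match('https://www.example.com:443/index.html')
print(m.group('http_path_protocol'))  # https
print(m.group('http_path_site'))      # www.example.com
print(m.group('http_path_port'))      # 443
print(m.group('http_path_url'))       # /index.html
```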
How does the raw code look?
Note: this is useful for seeing exactly where the spaces go.
Part1:
%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb_name} %{IP:elb_client_ip}:%{INT:elb_client_port:int} (?:%{IP:elb_backend_ip}:%{NUMBER:elb_backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{INT:elb_status_code:int}|-) (?:%{INT:backend_status_code:int}|-) %{INT:elb_received_bytes:int} %{INT:elb_sent_bytes:int} \"(?:%{GREEDYDATA:elb_request}|-)\" \"(?:%{GREEDYDATA:userAgent}|-)\" %{NOTSPACE:elb_sslcipher} %{NOTSPACE:elb_sslprotocol}
Part2:
%{GREEDYDATA:event_name} for ELB: %{NOTSPACE:elb_name} at %{TIMESTAMP_ISO8601:timestamp}
Part3:
(?:%{WORD:http_method}) (?:%{DATA:http_path})? (?:%{DATA:http_type}/%{NUMBER:http_version:float})?|%{GREEDYDATA:rawrequest}
Part4:
(?:%{WORD:http_path_protocol}://)?(%{NOTSPACE:http_path_site}:)?(?:%{NUMBER:http_path_port:int})?(?:%{GREEDYDATA:http_path_url})?
And how does this look as a grok filter?
NOTE: In my ‘input’ implementation, I have labelled all my incoming ELB Logs as type ‘elblogs’
SO HERE IT IS!
This is the complete filter, which you can copy/paste to test yourself if you like:
filter {
  if [type] == "elblogs" {
    grok {
      match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb_name} %{IP:elb_client_ip}:%{INT:elb_client_port:int} (?:%{IP:elb_backend_ip}:%{NUMBER:elb_backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{INT:elb_status_code:int}|-) (?:%{INT:backend_status_code:int}|-) %{INT:elb_received_bytes:int} %{INT:elb_sent_bytes:int} \"(?:%{GREEDYDATA:elb_request}|-)\" \"(?:%{GREEDYDATA:userAgent}|-)\" %{NOTSPACE:elb_sslcipher} %{NOTSPACE:elb_sslprotocol}"]
      match => ["message", "%{GREEDYDATA:event_name} for ELB: %{NOTSPACE:elb_name} at %{TIMESTAMP_ISO8601:timestamp}"]
    }
    if [elb_request] =~ /.+/ {
      grok {
        match => ["elb_request", "(?:%{WORD:http_method}) (?:%{DATA:http_path})? (?:%{DATA:http_type}/%{NUMBER:http_version:float})?|%{GREEDYDATA:rawrequest}"]
      }
    }
    if [http_path] =~ /.+/ {
      grok {
        match => ["http_path", "(?:%{WORD:http_path_protocol}://)?(%{NOTSPACE:http_path_site}:)?(?:%{NUMBER:http_path_port:int})?(?:%{GREEDYDATA:http_path_url})?"]
      }
    }
    geoip {
      source => "elb_client_ip"
    }
  }
  date {
    match => [ "timestamp", "ISO8601" ]
  }
  useragent {
    source => "userAgent"
    prefix => "browser_"
  }
}
Again, I hope this is useful to someone.
If you found it useful, then why not leave a comment! 😉
Apr 11, 2016 at 7:55 am
Hi Kareem,
great post!
I see you had some troubles with loading the logs from S3:
https://discuss.elastic.co/t/iam-credentials-not-recognised-used-for-s3-plugins-used-input-s3-codec-cloudtrail/32464/13
I am having the same issue 😦 How did you manage to resolve it?
Thanks a lot,
Apr 11, 2016 at 12:26 pm
Hi Eran, I did manage to resolve it indeed!
I put a quick summary here for anyone that might find it useful:
https://followkman.com/2016/04/11/a-temporary-fix-for-logstash-s3-input-authentication/
🙂
Apr 11, 2016 at 10:19 am
Thanks a lot! superb post!
Jun 29, 2016 at 8:02 pm
Awesome post. Huge time saver. I’d love to see what you do in Kibana for making useful information out of these data. I’d also like to see if you have a way to handle geoip data for map drawing. Supposedly that’s possible, but I haven’t been able to get it to work yet.
Jun 29, 2016 at 8:18 pm
[…] put an update up with some more details of what I’m doing here, with updated grok […]
Nov 3, 2016 at 9:37 pm
Works like a charm! This is really useful, thank you!
Mar 9, 2017 at 2:21 am
Great post. But it seems to fail to parse logs like these (real logs from an ELB), as below:
2017-03-01T02:20:13.897023Z cf-router 88.99.90.240:56230 - -1 -1 -1 504 0 0 0 "GET https://habib0987.apps.io:443/value.php HTTP/1.1" "Mozilla/4.0 (compatible; cron-job.org; http://cron-job.org/abuse/)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
and
2017-03-01T02:24:06.194449Z cf-router 167.89.125.231:61926 10.10.81.5:80 0.000049 0.000447 0.000026 400 400 522 15 "POST http://embroker.cfapps.io:80/events"
As you can see: 1) in the first log, backend_ip and port are N/A, which is represented as a hyphen (-), and 2) in the second, the SSL-related fields have gone away.
Any good idea?
Mar 9, 2017 at 4:55 am
For #1,
%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb_name} %{IP:elb_client_ip}:%{INT:elb_client_port:int} (?:%{IP:elb_backend_ip}:%{NUMBER:elb_backend_port:int}|-) (?:%{NUMBER:request_processing_time:float}|-1) (?:%{NUMBER:backend_processing_time:float}|-1) (?:%{NUMBER:requestresponse_processing_time:float}|-1) (?:%{INT:elb_status_code:int}|-) (?:%{INT:backend_status_code:int}|-) %{INT:elb_received_bytes:int} %{INT:elb_sent_bytes:int} \"(?:%{GREEDYDATA:elb_request}|-)\" \"(?:%{GREEDYDATA:userAgent}|-)\" (?:%{NOTSPACE:elb_sslcipher}|\S) (?:%{NOTSPACE:elb_sslprotocol}|\S)
works great.
But for #2, elb_sslcipher and elb_sslprotocol do not exist. How can I handle this exception?
Mar 9, 2017 at 9:05 am
Oh, sorry to bother you. The 2nd log was malformed when I pasted it; the entire log is actually:
017-03-09T08:14:03.459507Z cf-router 167.89.125.223:43766 10.10.17.2:80 0.000029 0.000325 0.000019 400 400 521 15 "POST https://iuhad89fgyphauihdfg9p8h.cfapps.io:443/events HTTP/1.1" "SendGrid Event API" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
so
%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb_name} %{IP:elb_client_ip}:%{INT:elb_client_port:int} (?:%{IP:elb_backend_ip}:%{NUMBER:elb_backend_port:int}|-) (?:%{NUMBER:request_processing_time:float}|-1) (?:%{NUMBER:backend_processing_time:float}|-1) (?:%{NUMBER:requestresponse_processing_time:float}|-1) (?:%{INT:elb_status_code:int}|-) (?:%{INT:backend_status_code:int}|-) %{INT:elb_received_bytes:int} %{INT:elb_sent_bytes:int} \"(?:%{GREEDYDATA:elb_request}|-)\" \"(?:%{GREEDYDATA:userAgent}|-)\" (?:%{NOTSPACE:elb_sslcipher}|\S) (?:%{NOTSPACE:elb_sslprotocol}|\S)
works great. Thank you.