Process AWS ELB logs with Logstash! Update!

I previously wrote about making a grok filter to process ELB logs.
I have since worked on this further and developed an updated filter, which has been working very well for some time now.
Since there were some comments on the previous post, I thought I should also upload the working copy I’m using right now.

As it stands right now, I’ve broken the filter into four parts.

So what does this mean?
Here is what I’ve done, broken down part by part!

Part1 – filters the main logs, and allows for the four weird example log scenarios.
-It also creates a new field named ‘elb_request’, which separates out the full request.

%{TIMESTAMP_ISO8601:timestamp}
%{NOTSPACE:elb_name}
%{IP:elb_client_ip}:%{INT:elb_client_port:int}
(?:%{IP:elb_backend_ip}:%{NUMBER:elb_backend_port:int}|-)
%{NUMBER:request_processing_time:float}
%{NUMBER:backend_processing_time:float}
%{NUMBER:response_processing_time:float}
(?:%{INT:elb_status_code:int}|-)
(?:%{INT:backend_status_code:int}|-)
%{INT:elb_received_bytes:int}
%{INT:elb_sent_bytes:int}
\"(?:%{GREEDYDATA:elb_request}|-)\"
\"(?:%{GREEDYDATA:userAgent}|-)\"
%{NOTSPACE:elb_sslcipher}
%{NOTSPACE:elb_sslprotocol}
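
For reference, here is a representative line in the classic ELB access-log format that Part1 matches (the load balancer name, addresses and hostnames here are illustrative):

2015-05-13T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -

Among other fields, this yields elb_name = my-loadbalancer, elb_client_ip = 192.168.131.39, elb_status_code = 200, elb_request = GET http://www.example.com:80/ HTTP/1.1, and userAgent = curl/7.38.0.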

Part2 – matches the ‘ELB Online’ message I’d like to keep.
-I keep this part so I can use the ‘ELB Online’ events in dashboards.

%{GREEDYDATA:event_name} for ELB:
%{NOTSPACE:elb_name} at
%{TIMESTAMP_ISO8601:timestamp}
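
A hypothetical line this pattern would match (the exact event text depends on whatever writes these status messages into your log stream) is:

ELB Online for ELB: my-loadbalancer at 2016-04-01T12:00:00.000000Z

giving event_name = ELB Online, elb_name = my-loadbalancer, and the timestamp.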

Part3 – inspects the ‘elb_request’ field from Part1 and breaks it down further into four new fields.
-It also creates a new field named ‘http_path’.

(?:%{WORD:http_method})
(?:%{DATA:http_path})?
(?:%{DATA:http_type}/%{NUMBER:http_version:float})?
|%{GREEDYDATA:rawrequest}
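
As a worked example, an elb_request of GET http://www.example.com:80/ HTTP/1.1 breaks into http_method = GET, http_path = http://www.example.com:80/, http_type = HTTP and http_version = 1.1; anything that does not fit this shape falls through into rawrequest instead.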

Part4 – breaks down the new ‘http_path’ field into its component parts.
-I use this for further analysis of web trends.

(?:%{WORD:http_path_protocol}://)?
(%{NOTSPACE:http_path_site}:)?
(?:%{NUMBER:http_path_port:int})?
(?:%{GREEDYDATA:http_path_url})?
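
Continuing the example above, an http_path of http://www.example.com:80/ breaks down into http_path_protocol = http, http_path_site = www.example.com, http_path_port = 80 and http_path_url = /.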

How does the raw code look?
Note: this is useful for understanding where the spaces go, etc.

Part1:

%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb_name} %{IP:elb_client_ip}:%{INT:elb_client_port:int} (?:%{IP:elb_backend_ip}:%{NUMBER:elb_backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{INT:elb_status_code:int}|-) (?:%{INT:backend_status_code:int}|-) %{INT:elb_received_bytes:int} %{INT:elb_sent_bytes:int} \"(?:%{GREEDYDATA:elb_request}|-)\" \"(?:%{GREEDYDATA:userAgent}|-)\" %{NOTSPACE:elb_sslcipher} %{NOTSPACE:elb_sslprotocol}

Part2:

%{GREEDYDATA:event_name} for ELB: %{NOTSPACE:elb_name} at %{TIMESTAMP_ISO8601:timestamp}

Part3:

(?:%{WORD:http_method}) (?:%{DATA:http_path})? (?:%{DATA:http_type}/%{NUMBER:http_version:float})?|%{GREEDYDATA:rawrequest}

Part4:

(?:%{WORD:http_path_protocol}://)?(%{NOTSPACE:http_path_site}:)?(?:%{NUMBER:http_path_port:int})?(?:%{GREEDYDATA:http_path_url})?

And how does this look as a grok filter?
NOTE: In my ‘input’ implementation, I have labelled all my incoming ELB logs with type ‘elblogs’.
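
For reference, a minimal sketch of how that labelling could look with the S3 input plugin (the bucket name and prefix here are hypothetical placeholders):

input {
  s3 {
    bucket => "my-elb-access-logs"   # hypothetical bucket name
    prefix => "AWSLogs/"             # hypothetical key prefix
    region => "us-east-1"
    type   => "elblogs"              # the type the filter below keys on
  }
}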

SO HERE IT IS!
This is the complete filter, which you can copy/paste to test yourself if you like:

filter {
  if [type] == "elblogs" {
    # Part1 and Part2: try the main access-log pattern first,
    # then fall back to the 'ELB Online' event pattern.
    grok {
      match => [
        "message", "%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb_name} %{IP:elb_client_ip}:%{INT:elb_client_port:int} (?:%{IP:elb_backend_ip}:%{NUMBER:elb_backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{INT:elb_status_code:int}|-) (?:%{INT:backend_status_code:int}|-) %{INT:elb_received_bytes:int} %{INT:elb_sent_bytes:int} \"(?:%{GREEDYDATA:elb_request}|-)\" \"(?:%{GREEDYDATA:userAgent}|-)\" %{NOTSPACE:elb_sslcipher} %{NOTSPACE:elb_sslprotocol}",
        "message", "%{GREEDYDATA:event_name} for ELB: %{NOTSPACE:elb_name} at %{TIMESTAMP_ISO8601:timestamp}"
      ]
    }
    # Part3: break the parsed elb_request into its HTTP components.
    if [elb_request] =~ /.+/ {
      grok {
        match => ["elb_request", "(?:%{WORD:http_method}) (?:%{DATA:http_path})? (?:%{DATA:http_type}/%{NUMBER:http_version:float})?|%{GREEDYDATA:rawrequest}"]
      }
    }
    # Part4: break http_path into protocol, site, port and URL.
    if [http_path] =~ /.+/ {
      grok {
        match => ["http_path", "(?:%{WORD:http_path_protocol}://)?(%{NOTSPACE:http_path_site}:)?(?:%{NUMBER:http_path_port:int})?(?:%{GREEDYDATA:http_path_url})?"]
      }
    }
    # Enrich the client IP with GeoIP location data.
    geoip {
      source => "elb_client_ip"
    }
  }
  # Note: these last two run for every event type, not just elblogs.
  # Use the log's own timestamp as the event time.
  date {
    match => [ "timestamp", "ISO8601" ]
  }
  # Split the user agent string into browser_* fields.
  useragent {
    source => "userAgent"
    prefix => "browser_"
  }
}
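
If you want to test the filter quickly, a minimal sketch of a throwaway pipeline could look like this (paste the filter block above between the input and the output, then feed it log lines on stdin):

input { stdin { type => "elblogs" } }
# ... the filter block from above goes here ...
output { stdout { codec => rubydebug } }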

Again, I hope this is useful to someone.

If you found it useful, then why not leave a comment? 😉

9 thoughts on “Process AWS ELB logs with Logstash! Update!”

    Eran said:
    Apr 11, 2016 at 7:55 am

    Hi Kareem,
    great post!

I see you had some trouble loading the logs from S3:
    https://discuss.elastic.co/t/iam-credentials-not-recognised-used-for-s3-plugins-used-input-s3-codec-cloudtrail/32464/13

    I am having the same issue 😦 how did you manage to resolve it?

    Thanks a lot,

    Eran said:
    Apr 11, 2016 at 10:19 am

Thanks a lot! Superb post!

    meathouse said:
    Jun 29, 2016 at 8:02 pm

Awesome post. Huge time saver. I’d love to see what you do in Kibana to make useful information out of this data. I’d also like to see if you have a way to handle GeoIP for map drawing. Supposedly that’s possible, but I haven’t been able to get it to work yet.

Pingback: […] put an update up with some more details of what I’m doing here, with updated grok […]

    Laurent Jalbert-Simard said:
    Nov 3, 2016 at 9:37 pm

Works like a charm! This is really useful, thank you!

    hongquan said:
    Mar 9, 2017 at 2:21 am

Great post. But it seems it fails to parse these logs (real logs from ELB), like the ones below:
2017-03-01T02:20:13.897023Z cf-router 88.99.90.240:56230 - -1 -1 -1 504 0 0 0 "GET https://habib0987.apps.io:443/value.php HTTP/1.1" "Mozilla/4.0 (compatible; cron-job.org; http://cron-job.org/abuse/)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
    and

2017-03-01T02:24:06.194449Z cf-router 167.89.125.231:61926 10.10.81.5:80 0.000049 0.000447 0.000026 400 400 522 15 "POST http://embroker.cfapps.io:80/events
As you can see: 1) in the first log, the backend IP and port are N/A, represented as a hyphen (-), and 2) the SSL-related fields are gone.

Any good ideas?

    hongquan said:
    Mar 9, 2017 at 4:55 am

For #1,
%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb_name} %{IP:elb_client_ip}:%{INT:elb_client_port:int} (?:%{IP:elb_backend_ip}:%{NUMBER:elb_backend_port:int}|-) (?:%{NUMBER:request_processing_time:float}|-1) (?:%{NUMBER:backend_processing_time:float}|-1) (?:%{NUMBER:requestresponse_processing_time:float}|-1) (?:%{INT:elb_status_code:int}|-) (?:%{INT:backend_status_code:int}|-) %{INT:elb_received_bytes:int} %{INT:elb_sent_bytes:int} \"(?:%{GREEDYDATA:elb_request}|-)\" \"(?:%{GREEDYDATA:userAgent}|-)\" (?:%{NOTSPACE:elb_sslcipher}|\S) (?:%{NOTSPACE:elb_sslprotocol}|\S)

    works great.

But for #2, elb_sslcipher and elb_sslprotocol do not exist; how do I handle this exception?

    hongquan said:
    Mar 9, 2017 at 9:05 am

Oh, sorry to bother you, the 2nd log was malformed when generated. The entire log is like:

2017-03-09T08:14:03.459507Z cf-router 167.89.125.223:43766 10.10.17.2:80 0.000029 0.000325 0.000019 400 400 521 15 "POST https://iuhad89fgyphauihdfg9p8h.cfapps.io:443/events HTTP/1.1" "SendGrid Event API" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2

    so

%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb_name} %{IP:elb_client_ip}:%{INT:elb_client_port:int} (?:%{IP:elb_backend_ip}:%{NUMBER:elb_backend_port:int}|-) (?:%{NUMBER:request_processing_time:float}|-1) (?:%{NUMBER:backend_processing_time:float}|-1) (?:%{NUMBER:requestresponse_processing_time:float}|-1) (?:%{INT:elb_status_code:int}|-) (?:%{INT:backend_status_code:int}|-) %{INT:elb_received_bytes:int} %{INT:elb_sent_bytes:int} \"(?:%{GREEDYDATA:elb_request}|-)\" \"(?:%{GREEDYDATA:userAgent}|-)\" (?:%{NOTSPACE:elb_sslcipher}|\S) (?:%{NOTSPACE:elb_sslprotocol}|\S)

works great. Thank you.

Leave a Reply if you find this useful