Kinesis Firehose stream to Redshift cluster (stream tcpdump logs from EC2)

Configure an EC2 instance with the AWS Kinesis Agent to stream logs to the Kinesis Firehose delivery stream "firehose-to-redshift", which pushes the data to the S3 bucket "tcpdump2525" and then to the Redshift cluster "kinesis-redshift".

Prerequisite:

1. Launch a Redshift cluster "kinesis-redshift":

  • database name: firehose
  • database port: 5439
  • master user name: firehose
  • password: ****
  • node configuration: default (single node)
  • default VPC, with a security group allowing inbound port 5439 from anywhere (for testing only)
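The cluster above can also be launched from the AWS CLI; a sketch under the settings listed (the node type and the password placeholder are assumptions, not from the original post):

```shell
# Launch a single-node Redshift cluster matching the settings above.
# --node-type dc1.large is an assumption; pick any available node type.
aws redshift create-cluster \
  --cluster-identifier kinesis-redshift \
  --db-name firehose \
  --port 5439 \
  --master-username firehose \
  --master-user-password '<password>' \
  --node-type dc1.large \
  --cluster-type single-node

# Wait until the cluster is available, then fetch its endpoint:
aws redshift wait cluster-available --cluster-identifier kinesis-redshift
aws redshift describe-clusters --cluster-identifier kinesis-redshift \
  --query 'Clusters[0].Endpoint.Address' --output text
```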

 

2. Connect to the Redshift cluster from EC2 using psql:

  • Install the PostgreSQL client (yum install -y postgresql)
  • psql -h kinesis-redshift.chxyvg4uqwv9.us-east-1.redshift.amazonaws.com -U firehose -d firehose -p 5439

Password for user firehose:
psql (9.2.15, server 8.0.2)
WARNING: psql version 9.2, server version 8.0.
Some psql features might not work.
SSL connection (cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.
firehose=#

-U: user name, -d: database name, -p: TCP port

3. Create a table "tcpdump" with a single column "col1", then test an insert and verify:

firehose=# create table tcpdump (col1 varchar(10000));
CREATE TABLE

firehose=# insert into tcpdump (col1) values ('21:27:07.172306 IP 72-21-196-65.amazon.com.26483 > ip-172-31-4-2.ec2.internal.ssh: Flags [.], ack 153, win 4094, options [nop,nop,TS val 205283057 ecr 38584], length 0');
INSERT 0 1

 firehose=# select * from tcpdump;
                                  col1
-------------------------------------------------------------------------
 21:27:07.172306 IP 72-21-196-65.amazon.com.26483 > ip-172-31-4-2.ec2.internal.ssh: Flags [.], ack 153, win 4094, options [nop,nop,TS val 205283057 ecr 38584], length 0
(1 row)
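Behind the scenes, Firehose loads data into Redshift by issuing a COPY from the intermediate S3 bucket, so it helps to know roughly what that command looks like when filling in the COPY options later. A hedged sketch for this single wide varchar column (the manifest path and role ARN are placeholders; Firehose supplies the actual location and credentials):

```sql
-- Sketch of the COPY that Firehose runs against the cluster; the
-- S3 manifest path and the credentials string are filled in by
-- Firehose itself and are placeholders here.
COPY tcpdump (col1)
FROM 's3://tcpdump2525/<manifest>'
CREDENTIALS 'aws_iam_role=arn:aws:iam::<account-id>:role/firehose_delivery_role';
```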

 

Step 2: Create an IAM role with the following AWS managed policies.

  • AmazonKinesisFirehoseFullAccess
  • CloudWatchFullAccess

Step 3: Create an AWS Kinesis Firehose delivery stream to the Redshift cluster "kinesis-redshift"

AWS console -> create a Kinesis Firehose delivery stream "firehose-to-redshift", select the S3 bucket "tcpdump2525" (created already), and attach the IAM role "firehose_delivery_role", keeping all other defaults.
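Once created, the delivery stream can be confirmed from the CLI as well; a quick sketch (assumes the AWS CLI is configured with credentials for the same account):

```shell
# List Firehose delivery streams and show the status of the new one;
# it should report ACTIVE once provisioning finishes.
aws firehose list-delivery-streams
aws firehose describe-delivery-stream \
  --delivery-stream-name firehose-to-redshift \
  --query 'DeliveryStreamDescription.DeliveryStreamStatus'
```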

Step 4: Launch an EC2 instance and attach the IAM role created above.

Step 5: Install the aws-kinesis-agent on the EC2 instance:

  • yum install aws-kinesis-agent -y
  • service aws-kinesis-agent start
  • chkconfig aws-kinesis-agent on

Step 6: Configure tcpdump to write a continuous log file

  • yum install -y tcpdump
  • nohup tcpdump >> /tmp/tcpdump.out &
  • tail -f /tmp/tcpdump.out
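Before pointing the agent at the file, it is worth sanity-checking that the lines look like tcpdump output; each one begins with an HH:MM:SS.usec timestamp. A small local sketch, using the sample line from the table test above:

```shell
# Check that a log line looks like tcpdump output: it should begin
# with an HH:MM:SS.usec timestamp.
line='21:27:07.172306 IP 72-21-196-65.amazon.com.26483 > ip-172-31-4-2.ec2.internal.ssh: Flags [.], ack 153, win 4094, options [nop,nop,TS val 205283057 ecr 38584], length 0'
ts=$(echo "$line" | grep -Eo '^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+')
echo "$ts"   # prints 21:27:07.172306
```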

Step 7: Configure the aws-kinesis-agent to forward log records to the Kinesis stream "tcpdump" (created beforehand) and to the Firehose delivery stream "firehose-to-redshift":

cat /etc/aws-kinesis/agent.json
{
  "cloudwatch.emitMetrics": true,
  "kinesis.endpoint": "",
  "firehose.endpoint": "",

  "flows": [
    {
      "filePattern": "/tmp/tcpdump.out",
      "kinesisStream": "tcpdump",
      "partitionKeyOption": "RANDOM"
    },
    {
      "filePattern": "/tmp/tcpdump.out*",
      "deliveryStream": "firehose-to-redshift"
    }
  ]
}

The first flow ships records to the Kinesis stream "tcpdump"; the second ships the same file to the Firehose delivery stream "firehose-to-redshift". Note that JSON does not allow comments, so keep any annotations out of the file itself.
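The agent will not start with a malformed agent.json (curly quotes from copy-pasting are a common cause), so it pays to validate the file after editing. A quick sketch using python3's stdlib json.tool module (the helper name check_json is my own, not part of the agent):

```shell
# check_json FILE: report whether FILE parses as JSON.
check_json() {
  if python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "valid"
  else
    echo "invalid"
  fi
}

# Example: check_json /etc/aws-kinesis/agent.json
```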

 

Step 8: Restart the aws-kinesis-agent:

# service aws-kinesis-agent restart
aws-kinesis-agent shutdown                                 [  OK  ]
aws-kinesis-agent startup                                  [  OK  ]

 

Step 9: Verify the Kinesis agent log:

a. tail -f /var/log/aws-kinesis-agent/aws-kinesis-agent.log (it takes a few minutes; wait until records are reported sent successfully to the destination)
2016-08-11 16:52:53.017+0000 ip-172-31-4-2 (Agent STARTING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Startup completed in 78 ms.
2016-08-11 16:53:23.022+0000 ip-172-31-4-2 (FileTailer[fh:firehose-to-redhift:/tmp/tcpdump.out].MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.tailing.FileTailer [INFO] FileTailer[fh:firehose-to-redhift:/…
2016-08-11 17:03:11.646+0000 ip-172-31-4-2 (Agent.MetricsEmitter RUNNING) com.amazon.kinesis.streaming.agent.Agent [INFO] Agent: Progress: 1371 records parsed (4256040 bytes), and 1000 records sent successfully to destinations. Uptime: 90084ms

b. Verify the S3 bucket "tcpdump2525" to ensure the data has been streamed.

c. Verify the Redshift table ("select * from tcpdump;") to ensure the data exists.

d. In the AWS console, check the Firehose "Monitoring", "S3 Logs", and "Redshift Logs" tabs for any errors.
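The Progress lines in the agent log are the quickest health signal: the "sent" counter should climb toward "parsed" once delivery is working. For scripting, the counters can be pulled out with grep; a small sketch using the sample log line from step 9:

```shell
# Extract the parsed/sent counters from an agent "Progress" log line.
logline='Agent: Progress: 1371 records parsed (4256040 bytes), and 1000 records sent successfully to destinations. Uptime: 90084ms'
parsed=$(echo "$logline" | grep -Eo '[0-9]+ records parsed' | grep -Eo '[0-9]+')
sent=$(echo "$logline" | grep -Eo '[0-9]+ records sent' | grep -Eo '[0-9]+')
echo "parsed=$parsed sent=$sent"   # prints parsed=1371 sent=1000
```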

 
