An Automated Approach to Convert Okta System Logs into Open Cybersecurity Schema Framework (OCSF) Schema

This solution demonstrates the setup required to convert Okta System Log events into Apache Parquet files. The converted, OCSF-formatted Okta System Log records are stored in an Amazon S3 bucket, where Amazon Security Lake consumers use them for downstream analytical processes. 

This post shows how you can convert Okta System Log events using Amazon EventBridge, Amazon Kinesis Data Firehose, and AWS Lambda functions. The Kinesis Data Firehose delivery stream uses record format conversion to convert the JSON data into Parquet before delivering it to the S3 bucket backing Amazon Security Lake.  

Open Cybersecurity Schema Framework (OCSF)

The Open Cybersecurity Schema Framework is an open-source project that delivers an extensible framework for developing schemas and a vendor-agnostic core security schema. Vendors and other data producers can adopt and extend the schema for their specific domains. Data engineers can map differing schemas to help security teams simplify data ingestion and normalization so that data scientists and analysts can work with a common language for threat detection and investigation. The goal is to provide an open standard adopted in any environment, application, or solution while complementing existing security standards and processes. You can find more information here: https://github.com/ocsf/

Okta produces critical logging information about your identities and actions. The Okta System Log includes events related to your organization and provides an audit trail that helps you understand platform activity and diagnose security problems. Converting Okta's System Log to an OCSF-compatible format lets customers query security events using an open-standard schema, complementing their existing security event sources.
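For reference, a trimmed Okta System Log entry looks roughly like the sketch below. The field subset is illustrative, not the full schema; consult Okta's System Log API reference for the complete event object.

```python
# A trimmed, illustrative example of an Okta System Log event, as
# delivered to EventBridge inside the "detail" field. Values are made up.
sample_okta_event = {
    "eventType": "user.session.start",
    "published": "2023-05-01T13:22:00.000Z",
    "actor": {"alternateId": "alice@example.com", "type": "User"},
    "outcome": {"result": "SUCCESS"},
    "client": {"ipAddress": "203.0.113.10"},
}
```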

Architecture diagram

Here's how the solution works: an AWS Lambda function converts the incoming Okta System Log JSON data into OCSF format, and the record format conversion feature of Kinesis Data Firehose converts the OCSF JSON into Parquet. 


 

  • Step 1: Create an integration between Okta and Amazon EventBridge.
  • Step 2: Define an event rule that captures Okta System Log events and routes them to Amazon Kinesis Data Firehose.
  • Step 3: The Firehose delivery stream invokes a Lambda function as it receives events from EventBridge.
  • Step 4: The Lambda function transforms the Okta System Log data into OCSF-formatted JSON.
  • Step 5: The Firehose delivery stream converts the OCSF JSON from step 4 into Parquet records. (Parquet is the only format Amazon Security Lake accepts.)
  • Step 6: The converted Parquet files, in the OCSF schema, are stored in an S3 bucket as part of Amazon Security Lake. 
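The transformation in steps 3 and 4 can be sketched as a minimal Firehose data-transformation Lambda. This is an illustrative sketch, not the code shipped in the repository: the OCSF mapping is reduced to a handful of fields, and the helper name `to_ocsf` is an assumption.

```python
import base64
import json

def to_ocsf(okta_event):
    """Map a few Okta System Log fields to an OCSF-style
    Authentication event (class_uid 3002). Illustrative subset only."""
    return {
        "class_name": "Authentication",
        "class_uid": 3002,
        "time": okta_event.get("published"),
        "activity_name": okta_event.get("eventType"),
        "actor": {"user": {"name": okta_event.get("actor", {}).get("alternateId")}},
        "status": okta_event.get("outcome", {}).get("result"),
    }

def lambda_handler(event, context):
    """Firehose data-transformation handler: decode each record,
    convert it to OCSF JSON, and return it base64-encoded."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        # EventBridge wraps the Okta payload in a "detail" field
        detail = payload.get("detail", payload)
        ocsf = to_ocsf(detail)
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(ocsf).encode()).decode(),
        })
    return {"records": output}
```

Firehose requires the handler to echo each `recordId` and mark the record `Ok`, `Dropped`, or `ProcessingFailed`; the transformed payload goes back base64-encoded in `data`.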

Prerequisite: Preparing the environment

Before diving in, you need the Okta and Amazon EventBridge integration: Okta sends System Log events to Amazon EventBridge. You must add an Amazon EventBridge log stream in Okta and configure it in the Amazon EventBridge console. You can find the documentation for setting up the Okta and Amazon EventBridge integration here.  

After completing the Okta and EventBridge integration, you will notice an event bus created in the EventBridge console. Note the event bus name; it is an input to the AWS CloudFormation template. 

Preparation

This approach uses AWS CloudFormation to model, provision, and manage AWS services, treating infrastructure as code. The AWS CloudFormation template launches the following AWS resources:

  • AWS Lambda function
  • Amazon Kinesis Data Firehose delivery stream
  • Amazon S3 bucket
  • AWS Glue database
  • AWS Glue table
  • Amazon EventBridge rule 
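Among these resources, the EventBridge rule routes Okta partner events to the Firehose stream. A hedged boto3 parameter sketch follows; the rule name, bus name, and event pattern are assumptions, since the actual rule is defined in the CloudFormation template.

```python
import json

# Sketch of the EventBridge rule the CloudFormation template provisions.
# The rule name, partner bus name, and event pattern are assumptions.
partner_bus = "aws.partner/okta.com/<org>/<app-id>"  # noted in the prerequisite step

rule_params = {
    "Name": "okta-system-log-to-firehose",
    "EventBusName": partner_bus,
    # Match any event whose source starts with the Okta partner prefix
    "EventPattern": json.dumps({"source": [{"prefix": "aws.partner/okta.com"}]}),
    "State": "ENABLED",
}
# In live code these parameters would be passed to
#   boto3.client("events").put_rule(**rule_params)
```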

The CloudFormation template for deploying the solution can be found here: https://github.com/okta/okta-ocsf-syslog 

Step 1 – Create Lab resources using AWS CloudFormation

Click Create stack and choose With new resources (standard). 


On the Create stack page, specify the CloudFormation template. The template can be uploaded from your local system or referenced from an S3 bucket. Click Next.


On the Specify stack details page, enter the partner event bus name captured in the prerequisite section, and click Next.


On the Configure stack options page, leave the defaults, scroll down, and click Next.


On the Review page, review the configuration, check the acknowledgment at the bottom, and click Create stack.


The CloudFormation stack creation can take up to five minutes to complete. You will notice the status changes from CREATE_IN_PROGRESS to CREATE_COMPLETE during the creation process.

Once the process is complete, review the Outputs section. 

S3BucketName will be the destination bucket for OCSF Parquet files. 


Step 2 - Verify CloudFormation Resources

Follow the steps below to verify the resources created by the CloudFormation stack in the previous step.

Open the Kinesis Data Firehose console. You will see a delivery stream named access-logs-sink-parquet. Click it.

 


Review delivery stream configuration details.


Notice that data transformation is enabled. The Lambda function transforms incoming Okta System Log events into OCSF JSON format.


Review the Convert record format section of the delivery stream. This section shows that record format conversion is enabled and the output format is Apache Parquet.

Click the Lambda function to review the Lambda configuration.


Click the Code tab to review the Lambda function code. The Lambda function transforms the incoming Okta System Log into OCSF JSON format.


Return to the Kinesis Data Firehose details window and view the Destination settings:

  • Amazon S3 destination - the destination bucket for the final OCSF schema logs
  • Dynamic partitioning - the Amazon S3 bucket is partitioned by event source (Okta System Log), region, account, and time. Values for the Amazon S3 bucket prefix are configured via the CloudFormation template. 
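Dynamic partitioning keys can be supplied by the transformation Lambda through each record's metadata. Below is a minimal sketch of building those keys; the key names mirror the partitioned prefix described later in this post, and the exact names used by the template may differ.

```python
from datetime import datetime, timezone

def partition_keys(region, account_id, event_time_iso):
    """Build dynamic-partitioning keys that yield an S3 prefix like
    <eventSource>/region=<region>/account=<account>/eventhour=<yyyyMMddHH>/."""
    ts = datetime.fromisoformat(event_time_iso.replace("Z", "+00:00"))
    return {
        "source": "OktaSystemLog",       # assumed eventSource value
        "region": region,
        "accountId": account_id,
        "eventhour": ts.astimezone(timezone.utc).strftime("%Y%m%d%H"),
    }

# Attached to each transformed Firehose record as:
#   {"recordId": ..., "result": "Ok", "data": ...,
#    "metadata": {"partitionKeys": partition_keys(...)}}
```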


Step 3 - Review the new S3 Destination Configuration

Go to the S3 console. Verify that an S3 bucket named in the format access-logs-parquet-region-ACCOUNTID (your AWS account ID) is present. At this point, the bucket should be empty. 


Your environment is ready to send data to Firehose.

Generate Streaming Data (Test the setup)

This walkthrough uses a successful Okta authentication event; the Lambda function, OCSF schema, and other artifacts are designed around it. To generate the event, a user must sign in to the Okta portal.

Once you sign in to Okta as a user, the System Log event is generated. 

Verify the data in the S3 bucket

Navigate to the S3 Console.

Select the S3 bucket created in the previous section, prefixed with access-logs-parquet-. You will see data created under the access-logs-parquet/ folder with a <eventSource>/region=<region>/account=<accountid>/eventhour=<yyyyMMddHH>/ directory structure. The partitioning is dynamic, per Amazon Security Lake guidelines.


Verify Parquet files and OCSF schema

Select the converted Parquet file, click Actions, and then click Query with S3 Select.


Select Input and Output settings

Select Apache Parquet as the input format and JSON as the output format.


Query the result 

Go to the SQL query section and click the Run SQL query button. You will notice that the query results are in the OCSF schema. 
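The same query can be issued programmatically with S3 Select. Below is a sketch of the request parameters; the bucket name and object key are illustrative placeholders, and in live code they would be passed to `boto3.client("s3").select_object_content(**select_params)`.

```python
# Request parameters for an S3 Select query over one converted Parquet
# object. Bucket and key below are illustrative placeholders.
select_params = {
    "Bucket": "access-logs-parquet-us-east-1-123456789012",
    "Key": ("access-logs-parquet/OktaSystemLog/region=us-east-1/"
            "account=123456789012/eventhour=2023050113/example.parquet"),
    "ExpressionType": "SQL",
    "Expression": "SELECT * FROM s3object s LIMIT 5",
    "InputSerialization": {"Parquet": {}},   # input format: Apache Parquet
    "OutputSerialization": {"JSON": {}},     # output format: JSON
}
```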


Conclusion

Security teams struggle to normalize data that arrives from many sources in many formats. Adopting the OCSF schema for all generated logs helps security teams efficiently analyze and query security events and incidents. The approach described here shows how customers can fully automate their data normalization process.