An Automated Approach to Convert Okta System Logs into Open Cybersecurity Schema Framework (OCSF) Schema
This solution demonstrates the setup required to convert Okta System Log events into Apache Parquet files. The converted, OCSF-formatted Okta System Log events are stored in an Amazon S3 bucket. Amazon Security Lake consumers use the OCSF logs from the S3 bucket for downstream analytical processes.
This post shows how you can convert Okta System Log events using Amazon EventBridge, Amazon Kinesis Data Firehose, and AWS Lambda functions. The Kinesis Data Firehose delivery stream uses record format conversion to convert the JSON data into Parquet format before sending it to the S3 bucket backed by Amazon Security Lake.
Open Cybersecurity Schema Framework (OCSF)
The Open Cybersecurity Schema Framework is an open-source project that delivers an extensible framework for developing schemas and a vendor-agnostic core security schema. Vendors and other data producers can adopt and extend the schema for their specific domains. Data engineers can map differing schemas to help security teams simplify data ingestion and normalization so that data scientists and analysts can work with a common language for threat detection and investigation. The goal is to provide an open standard adopted in any environment, application, or solution while complementing existing security standards and processes. You can find more information here: https://github.com/ocsf/
Okta produces critical logging information about your identities and actions. The Okta System Log includes events related to your organization and provides an audit trail that is helpful in understanding platform activity and diagnosing security problems. Converting Okta’s System Log events to an OCSF-compatible format helps customers query security events using an open-standard schema, while complementing all existing security events.
Architecture diagram
The solution transforms incoming Okta System Log JSON data into OCSF using an AWS Lambda function, then uses the record format conversion feature of Kinesis Data Firehose to convert the JSON data into Parquet.
- Step 1: Create integration between Okta and Amazon EventBridge
- Step 2: Define an event rule filter to capture events from Okta System Log and forward them to Amazon Kinesis Data Firehose.
- Step 3: The Firehose delivery stream invokes a Lambda function as it receives an event from EventBridge. The Lambda function transforms Okta’s System Log data into OCSF-formatted JSON.
- Step 4: Configure the Firehose delivery stream to transform Okta System Log events into OCSF format by invoking the Lambda function.
- Step 5: Configure the Firehose delivery stream to convert the OCSF-formatted JSON from step 4 into Parquet records. (Parquet is the only acceptable format for Amazon Security Lake.)
- Step 6: Converted Parquet files with OCSF schema will be stored in an S3 bucket as part of Amazon Security Lake.
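The transformation in steps 3 and 4 can be sketched as a Firehose data-transformation Lambda function. This is a minimal illustration, not the function shipped with the CloudFormation template: the OCSF field mapping in `okta_to_ocsf` (a hypothetical helper) is a small, assumed subset of the Authentication class, and the sketch assumes that EventBridge delivers the Okta event under the `detail` key. Only the Firehose contract (base64-encoded `data`, and a response carrying `recordId`, `result`, and `data`) is fixed by the service.

```python
import base64
import json
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def okta_to_ocsf(okta_event):
    """Map one Okta System Log event to an OCSF-style Authentication record.

    The field selection is illustrative only; consult the OCSF schema
    for the full set of required attributes.
    """
    time_ms = None
    published = okta_event.get("published")  # e.g. "2023-01-01T12:00:00.000Z"
    if published:
        parsed = datetime.strptime(published, "%Y-%m-%dT%H:%M:%S.%fZ")
        time_ms = (parsed - EPOCH) // timedelta(milliseconds=1)

    actor = okta_event.get("actor") or {}
    client = okta_event.get("client") or {}
    outcome = okta_event.get("outcome") or {}

    return {
        "class_uid": 3002,      # OCSF Authentication class
        "category_uid": 3,      # Identity & Access Management
        "activity_id": 1,       # Logon
        "time": time_ms,
        "status_id": 1 if outcome.get("result") == "SUCCESS" else 2,
        "message": okta_event.get("displayMessage"),
        "user": {"name": actor.get("alternateId"), "uid": actor.get("id")},
        "src_endpoint": {"ip": client.get("ipAddress")},
        "metadata": {"product": {"vendor_name": "Okta", "name": "Okta System Log"}},
    }


def lambda_handler(event, context):
    """Transform each Firehose record and report a per-record result."""
    output = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Assumption: the Okta event is nested under "detail".
            okta_event = payload.get("detail", payload)
            line = json.dumps(okta_to_ocsf(okta_event)) + "\n"
            output.append({
                "recordId": record_id,
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, KeyError, AttributeError):
            # Return the original payload so Firehose can route it to the error prefix.
            output.append({
                "recordId": record_id,
                "result": "ProcessingFailed",
                "data": record.get("data", ""),
            })
    return {"records": output}
```

Marking unparseable records as `ProcessingFailed` rather than raising keeps one malformed event from failing the whole batch.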
Prerequisite: Preparing the environment
Before diving in, you need the Okta and Amazon EventBridge integration, through which Okta sends Okta System Log events to Amazon EventBridge. You must add an Amazon EventBridge log stream in Okta and configure it in the Amazon EventBridge console. You can find the documentation for setting up the Okta and Amazon EventBridge integration here.
After completing the Okta and EventBridge integration, you will notice an Event bus created in the EventBridge service console. Please note the Event bus name. This Event bus name is an input to the AWS CloudFormation template.

Preparation
An approach here uses AWS CloudFormation to model, provision, and manage AWS Services by treating infrastructure as code. An AWS CloudFormation script will launch the following AWS resources:
- AWS Lambda function
- Amazon Kinesis Firehose Delivery Stream
- Amazon S3 bucket
- AWS Glue Database
- AWS Glue Table
- Amazon EventBridge rule
The CloudFormation template for deploying the solution below can be found here: https://github.com/okta/okta-ocsf-syslog
Step 1 – Create Lab resources using AWS CloudFormation
Click Create a stack with new resources.
On the Create stack page, select the CloudFormation template, which can be on your local system or in an S3 bucket, and click Next.
On the Specify stack details page, enter Partner EventBus, as captured in the prerequisite section, and click Next.
On the Stack options page, scroll down without changing anything, and click Next.
On the Review page, review the configuration, check the acknowledgment at the bottom, and click Create stack.
The CloudFormation stack creation can take up to five minutes to complete. You will notice the status changes from CREATE_IN_PROGRESS to CREATE_COMPLETE during the creation process.
Once the process is complete, review the Outputs section.
S3BucketName will be the destination bucket for OCSF Parquet files.
Step 2 - Verify CloudFormation Resources
Follow the steps below to verify the resources created by the CloudFormation stack in the previous step.
Open the Kinesis Data Firehose Console. Here, you will see “access-logs-sink-parquet” as a delivery stream. Click the access-logs-sink-parquet delivery stream.
Review delivery stream configuration details.
Notice that data transformation is enabled. The Lambda function will transform incoming Okta System Log events into OCSF JSON format.
Review the Convert record format section for the Kinesis Firehose. This section shows that the record format conversion is enabled, and the output format is Apache Parquet.
Click the Lambda function to review the Lambda configuration.
Click the Code tab to review the Lambda function code. The Lambda function transforms the incoming Okta System Log events into OCSF JSON format.
Return to the Kinesis Firehose details window, and view Destination Settings.
- Amazon S3 Destination - Destination bucket for the final OCSF-schema logs
- Dynamic Partitioning - The Amazon S3 bucket is partitioned by the source of events (Okta System Log), region, account, and time. Values for the Amazon S3 bucket prefix are configured via the CloudFormation template.
Step 3 - Review the new S3 Destination Configuration
Go to the S3 Console. Verify that the S3 bucket in the format access-logs-parquet-region-ACCOUNTID (your AWS account ID) is present. At this point, your bucket content should be empty.
Your environment is ready to send data to Firehose.
Generate Streaming Data (Test the setup)
This test uses an Okta successful authentication event. The Lambda function, OCSF schema, and other artifacts are designed for this event type. To initiate the event, a user must log in to an Okta portal.
Once you sign in to Okta as a user, Okta generates the corresponding System Log event, and no further action is needed to produce test data.
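To make the test event concrete, here is a rough sketch of an EventBridge event pattern that would select only successful sign-in events, together with a tiny local matcher for checking sample events. The pattern is an assumption for illustration (the rule created by the CloudFormation template may filter differently), and `matches` is a hypothetical helper that handles only exact-value lists, not the full EventBridge pattern language. `user.session.start` and `outcome.result` are standard Okta System Log fields; their placement under `detail` is assumed.

```python
import json

# Hypothetical event pattern: match successful Okta sign-in events only.
EVENT_PATTERN = {
    "detail": {
        "eventType": ["user.session.start"],
        "outcome": {"result": ["SUCCESS"]},
    }
}


def matches(pattern, event):
    """Return True if the event satisfies the pattern.

    Supports nested objects and exact-match value lists only; operators
    such as prefix or numeric matching are out of scope for this sketch.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    # The JSON form is what an EventBridge rule accepts as its event pattern.
    print(json.dumps(EVENT_PATTERN, indent=2))
```

Checking a captured sample event against the pattern locally is a quick way to confirm why an event did or did not reach Firehose.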
Verify the data in the S3 bucket
Navigate to the S3 Console.
Select the S3 bucket created in the previous section, prefixed with access-logs-parquet-. You will see the data created under the access-logs-parquet/ folder in a <eventSource>/region=<region>/account=<accountid>/eventhour=<yyyyMMddHH>/ directory structure. The data is dynamically partitioned as per Amazon Security Lake guidelines.
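The partition layout can be illustrated with a short sketch that builds the prefix for a given event. The function name `partition_prefix` and the sample values are hypothetical; in the deployed solution Firehose derives the prefix from the values configured in the CloudFormation template.

```python
from datetime import datetime, timezone


def partition_prefix(event_source, region, account_id, event_time):
    """Build the <eventSource>/region=/account=/eventhour= prefix.

    event_time must be a timezone-aware datetime; it is normalized to UTC
    before the hour partition (yyyyMMddHH) is formatted.
    """
    hour = event_time.astimezone(timezone.utc).strftime("%Y%m%d%H")
    return f"{event_source}/region={region}/account={account_id}/eventhour={hour}/"


if __name__ == "__main__":
    # Sample values are placeholders, not names used by the template.
    when = datetime(2023, 1, 1, 12, 30, tzinfo=timezone.utc)
    print(partition_prefix("OktaSystemLog", "us-east-1", "123456789012", when))
```

Knowing the expected prefix makes it easier to locate the Parquet file produced by a specific test login.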
Verify Parquet files and OCSF schema
Select the converted parquet file, click Actions, and then click Query with S3 Select.
Select Input and Output settings
Select input as Apache Parquet and output as JSON string.
Query the result
Go to the SQL query section, and click the Run SQL query button. You will notice that the query result data is in the OCSF schema.
Conclusion
Security teams struggle to normalize data across various sources and in multiple formats. Adopting the OCSF schema for all generated logs helps security teams efficiently analyze and query security events and incidents. The approach described here demonstrates how customers can extend their data normalization process to Okta System Log events and automate it end to end.