The limitations of SCIM search
Searching for users in a large Okta org is not as easy as it sounds. If I want to find all sales contractors in Los Angeles, I have to get the SCIM filter syntax exactly right, or I won't get the expected results. For example, to find those users by profile attributes, I'd need something like:
profile.department eq "Sales" and profile.city eq "Los Angeles" and profile.userType eq "Contractor"
This syntax is challenging for many users, because:
- You have to know the specific syntax.
- Search is keyword-based, so a query like “my direct reports” won’t return the users who report to the current admin. The Okta Admin Console supports filtering on Manager ID, but keyword search doesn’t know that “my direct reports” should translate to profile.managerId equal to the admin’s own ID.
- A small typo means zero or unexpected results.
For a recent internal hackathon, we attempted to answer a simple question: What if Okta admins could just type what they meant in plain English and let the system figure out the SCIM details? For example:
| Instead of | We can use |
|---|---|
| search=profile.department eq "Sales" and profile.title eq "Manager" | Find me the sales managers and directors |
| profile.managerId eq "00u1ab2c3d4e" | Show me all my direct reports |
To answer this, we built a prototype: an AI-powered semantic search engine that interprets natural language, maps it to Okta resources, and returns relevant users and applications without requiring the admin to use any SCIM syntax. In the remainder of this article, I’ll walk you through how we assembled the prototype and our vision for a future production-ready implementation.
High-level workflow
When an Okta admin types a natural language query into the global search bar, the Okta Java back-end intercepts the API calls (/api/internal/admin/search and /api/v1/internal/apps). Instead of immediately running a standard SCIM query, it makes an API call to the Python AI service with the user’s raw query.
The Python service performs its multi-stage search pipeline and returns a ranked list of relevant user and app IDs. The Java back-end then takes those IDs, fetches the full resource objects from the existing Okta data layer, and constructs the final JSON response for the UI to display.
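Conceptually, the Python side exposes a single search endpoint. Here is a minimal sketch, assuming FastAPI; the route name, request shape, and `run_pipeline` helper are illustrative rather than the actual internal contract between the Java back-end and the service:

```python
# Minimal sketch of the Python AI service entry point (assumes FastAPI).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SearchRequest(BaseModel):
    query: str        # raw natural language query typed by the admin
    limit: int = 20   # maximum number of resource IDs to return

def run_pipeline(query: str, limit: int) -> list[dict]:
    """Placeholder for the multi-stage pipeline described below
    (query understanding -> FAISS retrieval -> reranking)."""
    return []  # e.g. [{"id": "00u1ab2c3d4e", "kind": "user", "score": 0.91}, ...]

@app.post("/v1/semantic-search")
def semantic_search(req: SearchRequest) -> dict:
    ranked = run_pipeline(req.query, req.limit)
    # Only IDs go back to the Java back-end, which hydrates them into
    # full Okta resource objects before building the UI response.
    return {
        "userIds": [r["id"] for r in ranked if r["kind"] == "user"],
        "appIds": [r["id"] for r in ranked if r["kind"] == "app"],
    }
```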
Python AI service architecture
Data layer: For the prototype, the service uses an offline indexing process. It fetches users and applications from a demo Okta org, converts these resources into local JSON snapshots, and then creates a text representation for downstream embedding.
AI intelligence layer:
- Embedding service: We use AWS Bedrock embedding models (for example, Cohere Embed, Amazon Titan) to map each text document into a vector. Those vectors are stored in a FAISS (Facebook AI Similarity Search library) index.
- Natural language processing: We use LLMs like Anthropic Claude 4.5 to interpret queries and generate SCIM filters.
- Query understanding: A small heuristic service extracts structured attributes (like department, city, and managerId) from the natural language query to refine search results.
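To make the query-understanding step concrete, here is a simplified sketch of attribute extraction. The vocabularies and the “my direct reports” rule are illustrative; the real service derives its hints from org data:

```python
# Simplified sketch of the heuristic query-understanding step.
import re

DEPARTMENTS = {"sales", "engineering", "marketing", "finance"}
CITIES = {"los angeles", "san francisco", "chicago", "toronto"}

def extract_attributes(query: str, admin_user_id: str) -> dict:
    """Pull structured hints (department, city, managerId) out of a query."""
    q = query.lower()
    attrs = {}
    for dept in DEPARTMENTS:
        if dept in q:
            attrs["department"] = dept.title()
    for city in CITIES:
        if city in q:
            attrs["city"] = city.title()
    # "my direct reports" maps to the admin's own ID as managerId.
    if re.search(r"\bmy (direct )?reports\b", q):
        attrs["managerId"] = admin_user_id
    return attrs

print(extract_attributes("show me all my direct reports", "00u1ab2c3d4e"))
# {'managerId': '00u1ab2c3d4e'}
print(extract_attributes("sales contractors in Los Angeles", "00u1ab2c3d4e"))
# {'department': 'Sales', 'city': 'Los Angeles'}
```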
Advanced search pipeline:
- Query processing: The incoming query is analyzed, expanded with synonyms, and converted into a vector.
- Candidate retrieval: We search the FAISS index for the closest vectors, and assign each match a hybrid score.
- Result refinement: Finally, we send the top candidates to a Cohere Rerank model to re-order them before returning the final list of resource IDs.
By using the FAISS vector index in this pipeline, we narrow the search space from thousands of users and apps down to a small candidate set. This reduces the size of the context passed to the final Rerank model, cutting operational costs and latency.
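Here is a minimal sketch of the candidate retrieval step, assuming a FAISS flat inner-product index over unit-normalized embeddings. The embedding dimension and the stand-in vectors are placeholders:

```python
# Candidate retrieval sketch: inner-product search over unit-normalized
# vectors is equivalent to cosine similarity.
import numpy as np
import faiss

dim = 1024                                                   # embedding dimension (model dependent)
doc_vectors = np.random.rand(5000, dim).astype("float32")    # stand-in for real document embeddings
faiss.normalize_L2(doc_vectors)                              # unit length so inner product == cosine

index = faiss.IndexFlatIP(dim)
index.add(doc_vectors)

def retrieve_candidates(query_vector: np.ndarray, k: int = 50):
    """Return the top-k (score, document position) pairs for a query embedding."""
    q = query_vector.astype("float32").reshape(1, -1)
    faiss.normalize_L2(q)
    scores, positions = index.search(q, k)
    return list(zip(scores[0].tolist(), positions[0].tolist()))
```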
Safely building the dataset
We used a demo Okta org to avoid touching any PII data. We generated randomized profiles of users and applications to fill this org with realistic data. All the data was created by custom Python scripts and exported into local JSON files. This allowed us to run the system offline against the snapshots.
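The generation scripts themselves are not the interesting part, but to give a sense of the data shape, here is a rough sketch that assumes the Faker library; the attribute values and ID format are purely illustrative:

```python
# Illustrative sketch of generating randomized user profiles for the demo org
# and exporting them as a local JSON snapshot.
import json
import random
from faker import Faker

fake = Faker()
DEPARTMENTS = ["Sales", "Engineering", "Marketing", "Finance"]
USER_TYPES = ["Employee", "Contractor"]

def random_user(i: int) -> dict:
    return {
        "id": f"00uFAKE{i:06d}",   # placeholder ID format
        "profile": {
            "firstName": fake.first_name(),
            "lastName": fake.last_name(),
            "email": fake.company_email(),
            "department": random.choice(DEPARTMENTS),
            "city": fake.city(),
            "userType": random.choice(USER_TYPES),
        },
    }

users = [random_user(i) for i in range(1000)]
with open("users_snapshot.json", "w") as f:
    json.dump(users, f, indent=2)
```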
Converting Okta objects into searchable text
Raw user and application profiles are not enough for good semantic search; we also need to capture their full context and intent. Our offline indexing engine handles this data preparation before any vectors are created. For each Okta object, key text fields are first combined into a single text document, so the embedding model sees one coherent description to interpret.
These documents are then enriched to address known limitations in the data: for users, this means adding synonym expansions for common attributes; for applications, it means adding heuristically generated keywords based on app properties (like adding “expense” if the app is Concur).
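Here is a simplified sketch of that document-building step. The synonym map and keyword table are illustrative; the real indexer derives richer enrichments:

```python
# Sketch of turning Okta objects into single searchable text documents.
DEPARTMENT_SYNONYMS = {
    "Sales": ["selling", "account executive", "business development"],
    "Engineering": ["developer", "software", "swe"],
}
APP_KEYWORDS = {
    "Concur": ["expense", "travel"],
    "Workday": ["hr", "payroll"],
}

def user_to_document(user: dict) -> str:
    p = user["profile"]
    parts = [
        f'{p["firstName"]} {p["lastName"]}',
        f'works in the {p["department"]} department',
        f'located in {p["city"]}',
        f'user type {p["userType"]}',
    ]
    # Enrich with synonyms so "account executives in LA" still matches Sales.
    parts.extend(DEPARTMENT_SYNONYMS.get(p["department"], []))
    return ". ".join(parts)

def app_to_document(app: dict) -> str:
    parts = [app["label"], app.get("name", "")]
    # Heuristic keywords based on app properties, e.g. Concur -> "expense".
    parts.extend(APP_KEYWORDS.get(app["label"], []))
    return ". ".join(x for x in parts if x)
```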
Embedding layer on Amazon Bedrock
The embedding layer calls the Cohere model with retry logic and falls back to the Titan model in certain scenarios. The resulting vectors encode each document’s meaning in numerical form, and they are normalized to unit length before storage so they can be searched with the FAISS index.
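As a sketch of that flow, assuming boto3 and with illustrative model IDs and fallback behavior (not the exact production configuration), the embedding step looks roughly like this:

```python
# Sketch of the embedding step on Amazon Bedrock: try Cohere Embed first,
# fall back to Titan, then normalize to unit length for the FAISS index.
import json
import numpy as np
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed_with_cohere(text: str) -> np.ndarray:
    body = json.dumps({"texts": [text], "input_type": "search_document"})
    resp = bedrock.invoke_model(modelId="cohere.embed-english-v3", body=body)
    payload = json.loads(resp["body"].read())
    return np.array(payload["embeddings"][0], dtype="float32")

def embed_with_titan(text: str) -> np.ndarray:
    body = json.dumps({"inputText": text})
    resp = bedrock.invoke_model(modelId="amazon.titan-embed-text-v2:0", body=body)
    payload = json.loads(resp["body"].read())
    return np.array(payload["embedding"], dtype="float32")

def embed_document(text: str) -> np.ndarray:
    try:
        vec = embed_with_cohere(text)
    except Exception:
        # Fall back to Titan if the Cohere call fails or is throttled.
        vec = embed_with_titan(text)
    return vec / np.linalg.norm(vec)   # unit length for cosine via inner product
```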
Hybrid ranking
Our hybrid search method combines the semantic signal from vector search with more traditional attribute matching. For each query, the heuristic query understanding step extracts hints from the user-provided text (like department or city). In the candidate retrieval stage, the FAISS index is queried for the most similar vectors. Results are then assigned a hybrid score that combines vector similarity with the presence of extracted attributes. Finally, the top candidates are passed to the Cohere Rerank model, which does the final re-ordering.
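A simplified view of the scoring step, with an illustrative 0.7 / 0.3 weighting rather than a tuned production value:

```python
# Blend vector similarity with a bonus for candidates whose profile matches
# the attributes extracted from the query.
def hybrid_score(vector_similarity: float, candidate_profile: dict, query_attrs: dict) -> float:
    if not query_attrs:
        return vector_similarity
    matched = sum(
        1 for key, value in query_attrs.items()
        if str(candidate_profile.get(key, "")).lower() == str(value).lower()
    )
    attribute_score = matched / len(query_attrs)
    return 0.7 * vector_similarity + 0.3 * attribute_score

# The top candidates by hybrid score are then sent to the Cohere Rerank model
# for the final ordering before resource IDs go back to the Java back-end.
```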
Future enhancements with LLMs and event hooks
Our hackathon prototype ran on a static snapshot of data, but to make it a production-ready system, we’d need two enhancements: real-time index updates and live query execution.
To keep the index fresh, we’d replace the periodic snapshots with an event-driven architecture using Okta event hooks. A serverless function would listen for Okta syslog events and update the vector database, ensuring the search index is only seconds behind the live data.
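A minimal sketch of such a handler, assuming AWS Lambda and with illustrative event type names and an illustrative `update_vector_for_user` helper, might look like this:

```python
# Sketch of a serverless handler for Okta event hooks that keeps the vector index fresh.
import json

RELEVANT_EVENTS = {
    "user.lifecycle.create",
    "user.account.update_profile",
    "user.lifecycle.deactivate",
}

def update_vector_for_user(user_id: str, event_type: str) -> None:
    """Re-embed the user's document (or remove it) in the vector database."""
    ...  # fetch profile, rebuild the text document, upsert/delete its embedding

def lambda_handler(event, context):
    headers = event.get("headers", {})
    # Okta sends a one-time verification challenge when the hook is registered.
    challenge = headers.get("x-okta-verification-challenge") or headers.get(
        "X-Okta-Verification-Challenge"
    )
    if challenge:
        return {"statusCode": 200, "body": json.dumps({"verification": challenge})}

    payload = json.loads(event.get("body", "{}"))
    for okta_event in payload.get("data", {}).get("events", []):
        if okta_event.get("eventType") in RELEVANT_EVENTS:
            for target in okta_event.get("target", []):
                if target.get("type") == "User":
                    update_vector_for_user(target["id"], okta_event["eventType"])
    return {"statusCode": 200, "body": "ok"}
```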
Then we would use LLMs (like GPT or Claude) to interpret the admin’s query and generate a precise SCIM filter, which is then executed directly against the live Okta API, ensuring the results are fresh and accurate.
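A sketch of that live-query path, with a placeholder Bedrock model ID and a hypothetical prompt, could look like this:

```python
# Sketch of the future live-query path: ask an LLM on Bedrock to translate the
# admin's query into a SCIM filter, then run it against the live Okta API.
import os
import json
import boto3
import requests

bedrock = boto3.client("bedrock-runtime")

SYSTEM_PROMPT = (
    "Translate the admin's request into a single Okta SCIM filter expression "
    "using profile attributes (department, city, userType, title, managerId). "
    "Return only the filter, no explanation."
)

def query_to_scim_filter(query: str) -> str:
    resp = bedrock.converse(
        modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",   # placeholder model ID
        system=[{"text": SYSTEM_PROMPT}],
        messages=[{"role": "user", "content": [{"text": query}]}],
        inferenceConfig={"maxTokens": 200, "temperature": 0},
    )
    return resp["output"]["message"]["content"][0]["text"].strip()

def search_users(query: str) -> list[dict]:
    scim_filter = query_to_scim_filter(query)
    # Live search against the Okta Users API keeps results fresh.
    resp = requests.get(
        f"https://{os.environ['OKTA_DOMAIN']}/api/v1/users",
        params={"search": scim_filter},
        headers={"Authorization": f"SSWS {os.environ['OKTA_API_TOKEN']}"},
    )
    resp.raise_for_status()
    return resp.json()
```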
Conclusion
Okta semantic search started as a simple question: “What if admins did not have to write another SCIM filter by hand?” By putting together an AI pipeline (vector indexing, LLMs, and hybrid scoring), we built a working prototype that lets admins search for Okta resources in plain language.
If you are a developer interested in building AI systems to solve real-world identity problems, explore career opportunities on the Okta careers page, or start your own projects.